[WIP] Save unique kmers count in Nodegraph #1009
Codecov Report
@@ Coverage Diff @@
## master #1009 +/- ##
==========================================
+ Coverage 83.30% 92.09% +8.79%
==========================================
Files 97 72 -25
Lines 8749 5454 -3295
==========================================
- Hits 7288 5023 -2265
+ Misses 1461 431 -1030
Exciting!! Thank you so much!
I'm adding these docs and changes to #1221, which will be version 5 of the Nodegraph format (and incompatible with khmer).
Both khmer and sourmash Nodegraphs keep track of the number of unique kmers during insertion, but this information is lost when serializing to disk. This PR updates the binary format, adding an extra field to preserve it. This makes sourmash Nodegraphs incompatible with khmer, but they are effectively incompatible already, because khmer doesn't support compressed Nodegraphs.
(both changes can be ported back to khmer, but not today =] )
I copied over the version 4 binary format documentation from khmer and added the new changes for version 5 of the Nodegraph.
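To illustrate the idea of a versioned header that gains one extra field, here is a minimal sketch. The layout below (a `DEMO` magic, a one-byte version, a four-byte k-mer size, and an eight-byte unique kmers count) is a made-up example, not the actual Nodegraph v4 or v5 layout, and `write_header` / `read_header` are hypothetical helper names that do not exist in sourmash or khmer.

```python
import io
import struct

# Hypothetical layout, for illustration only (NOT the real on-disk format):
#   4-byte magic, 1-byte version, 4-byte k-mer size,
#   then, for version >= 5 only, an 8-byte unique kmers count.
MAGIC = b"DEMO"
BASE_HEADER = struct.Struct("<4sBI")   # little-endian, no padding: 9 bytes
UNIQUE_KMERS = struct.Struct("<Q")     # unsigned 64-bit integer: 8 bytes


def write_header(fp, ksize, unique_kmers, version=5):
    """Write the header; the unique kmers field is only emitted for v5+."""
    fp.write(BASE_HEADER.pack(MAGIC, version, ksize))
    if version >= 5:
        fp.write(UNIQUE_KMERS.pack(unique_kmers))


def read_header(fp):
    """Read the header back; unique_kmers is None for pre-v5 files."""
    magic, version, ksize = BASE_HEADER.unpack(fp.read(BASE_HEADER.size))
    if magic != MAGIC:
        raise ValueError("not a demo nodegraph header")
    unique_kmers = None
    if version >= 5:
        (unique_kmers,) = UNIQUE_KMERS.unpack(fp.read(UNIQUE_KMERS.size))
    return version, ksize, unique_kmers


# Round trip through an in-memory buffer.
buf = io.BytesIO()
write_header(buf, ksize=31, unique_kmers=1234)
buf.seek(0)
print(read_header(buf))
```

A reader that only understands the older version would misinterpret the extra eight bytes as table data, which is why a new field like this calls for a format version bump.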
cc @olgabot
TODO
save_v4 method to Python, to keep compatibility with khmer (the save method defaults to v5)
Checklist
make test: Did it pass the tests?
make coverage: Is the new code covered?
… without a major version increment. Changing file formats also requires a major version number increment.
… changes were made?