Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Save unique kmers count in Nodegraph #1009

Closed
wants to merge 3 commits into from
Closed

Conversation

luizirber
Copy link
Member

Both khmer and sourmash Nodegraph keep track of unique kmers during insertion, but this info is lost when serializing to disk. This PR updated the binary format, adding an extra field to keep this info. This makes sourmash Nodegraphs incompatible with khmer, but they sort of are incompatible already because khmer doesn't support compressed Nodegraphs.

(both changes can be ported back to khmer, but not today =] )

I copied over the version 4 binary format documentation from khmer and added the new changes for version 5 of the Nodegraph.

cc @olgabot

TODO

  • need to expose a save_v4 method to Python, to keep compatibility with khmer (the save method defaults to v5)
  • Fix the rust tests, they still assume everything is v4-only

Checklist

  • Is it mergeable?
  • make test Did it pass the tests?
  • make coverage Is the new code covered?
  • Did it change the command-line interface? Only additions are allowed
    without a major version increment. Changing file formats also requires a
    major version number increment.
  • Was a spellchecker run on the source code and documentation after
    changes were made?

@codecov
Copy link

codecov bot commented May 31, 2020

Codecov Report

Merging #1009 into master will increase coverage by 8.79%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1009      +/-   ##
==========================================
+ Coverage   83.30%   92.09%   +8.79%     
==========================================
  Files          97       72      -25     
  Lines        8749     5454    -3295     
==========================================
- Hits         7288     5023    -2265     
+ Misses       1461      431    -1030     
Flag Coverage Δ
#rusttests ?
Impacted Files Coverage Δ
sourmash/nodegraph.py 77.67% <0.00%> (-16.08%) ⬇️
src/core/tests/test.rs
src/core/src/ffi/utils.rs
src/core/src/ffi/mod.rs
src/core/src/ffi/signature.rs
src/core/src/sketch/ukhs.rs
src/core/src/ffi/nodegraph.rs
src/core/src/ffi/cmd/compute.rs
src/core/src/signature.rs
src/core/src/lib.rs
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bc2b168...eb74b5f. Read the comment docs.

@olgabot
Copy link
Collaborator

olgabot commented Jun 1, 2020

Exciting!! Thank you so much!

@luizirber
Copy link
Member Author

I'm adding these docs and changes to #1221, which will be version 5 of the Nodegraph format (and incompatible with khmer).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants