
More refactoring of MinHash comparison code #882

Merged
merged 13 commits into from Feb 5, 2020

ctb (Contributor) commented Feb 1, 2020

Fixes #868.

  • create Rust functions jaccard and angular_similarity to do the actual math on compatible signatures
  • make Python compare and similarity call the same compare function on the Rust side
  • refactor duplicated code for signature downsampling into a new Rust function, downsample_max_hash
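The first bullet can be sketched in Rust as below. This is a minimal, hypothetical sketch, not sourmash's actual code: it assumes each sketch is a sorted, deduplicated list of hashes (paired with abundances for the angular case) and that compatibility (same k-size, molecule type, and max_hash) has already been checked by the caller. The names jaccard and angular_similarity follow the PR description, but their signatures and the helpers count_common and l2_norm are assumptions. Angular similarity uses the common definition 1 - 2·acos(cosine)/π.

```rust
use std::cmp::Ordering;
use std::f64::consts::PI;

/// Count hashes present in both sorted, deduplicated slices
/// using a single linear merge pass.
fn count_common(a: &[u64], b: &[u64]) -> usize {
    let (mut i, mut j, mut common) = (0, 0, 0);
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                common += 1;
                i += 1;
                j += 1;
            }
        }
    }
    common
}

/// Jaccard similarity |A ∩ B| / |A ∪ B| of two sorted, deduplicated
/// hash lists (hypothetical signature). Two empty sketches give 0.0.
pub fn jaccard(a: &[u64], b: &[u64]) -> f64 {
    let common = count_common(a, b);
    let union = a.len() + b.len() - common;
    if union == 0 {
        return 0.0;
    }
    common as f64 / union as f64
}

/// Euclidean norm of the abundance values.
fn l2_norm(v: &[(u64, u64)]) -> f64 {
    v.iter().map(|&(_, c)| (c as f64) * (c as f64)).sum::<f64>().sqrt()
}

/// Angular similarity 1 - 2 * acos(cos) / PI of two (hash, abundance)
/// lists sorted by hash (hypothetical signature).
pub fn angular_similarity(a: &[(u64, u64)], b: &[(u64, u64)]) -> f64 {
    let (na, nb) = (l2_norm(a), l2_norm(b));
    if na == 0.0 || nb == 0.0 {
        return 0.0;
    }
    // Dot product over shared hashes, using the same merge walk.
    let (mut i, mut j, mut dot) = (0, 0, 0.0);
    while i < a.len() && j < b.len() {
        match a[i].0.cmp(&b[j].0) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                dot += a[i].1 as f64 * b[j].1 as f64;
                i += 1;
                j += 1;
            }
        }
    }
    // Clamp: rounding can push the cosine slightly above 1.0,
    // where acos would return NaN.
    let cos = (dot / (na * nb)).min(1.0);
    1.0 - 2.0 * cos.acos() / PI
}
```

Both functions walk the two sorted lists once, so each comparison is linear in the sketch sizes.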

I made the decision to just simplify and rationalize the Rust side of things, because our Python API for MinHash objects (in sourmash/_minhash.py) probably shouldn't change without a major version release.
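The second and third bullets, together with the decision to keep the Python API fixed, suggest a single Rust entry point that both Python methods delegate to. The sketch below is hypothetical: the Sketch type, the new constructor, the ignore_abundance flag, and the convention that a scaled sketch keeps only hashes <= max_hash are assumptions, not sourmash's API. Before comparing, both sketches are downsampled to the smaller max_hash so the two hash sets cover the same hash range.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a scaled MinHash sketch: hash -> abundance,
/// assumed to contain only hashes <= max_hash.
#[derive(Clone, Debug, PartialEq)]
pub struct Sketch {
    pub max_hash: u64,
    pub mins: BTreeMap<u64, u64>,
}

impl Sketch {
    /// Build a sketch from (hash, abundance) pairs, dropping hashes above max_hash.
    pub fn new(max_hash: u64, pairs: &[(u64, u64)]) -> Sketch {
        let mins = pairs.iter().copied().filter(|&(h, _)| h <= max_hash).collect();
        Sketch { max_hash, mins }
    }

    /// Shared downsampling step: keep only hashes <= new_max.
    /// new_max is expected to be <= self.max_hash.
    pub fn downsample_max_hash(&self, new_max: u64) -> Sketch {
        let mins = self.mins.range(..=new_max).map(|(&h, &c)| (h, c)).collect();
        Sketch { max_hash: new_max, mins }
    }

    /// Single comparison entry point: downsample both sketches to the
    /// smaller max_hash, then use Jaccard or angular similarity.
    pub fn compare(&self, other: &Sketch, ignore_abundance: bool) -> f64 {
        let max_hash = self.max_hash.min(other.max_hash);
        let a = self.downsample_max_hash(max_hash);
        let b = other.downsample_max_hash(max_hash);
        if ignore_abundance {
            jaccard(&a.mins, &b.mins)
        } else {
            angular_similarity(&a.mins, &b.mins)
        }
    }
}

/// |A ∩ B| / |A ∪ B| over the hash keys; 0.0 if both are empty.
fn jaccard(a: &BTreeMap<u64, u64>, b: &BTreeMap<u64, u64>) -> f64 {
    let common = a.keys().filter(|h| b.contains_key(*h)).count();
    let union = a.len() + b.len() - common;
    if union == 0 { 0.0 } else { common as f64 / union as f64 }
}

/// Euclidean norm of the abundance values.
fn l2_norm(m: &BTreeMap<u64, u64>) -> f64 {
    m.values().map(|&c| (c as f64) * (c as f64)).sum::<f64>().sqrt()
}

/// 1 - 2 * acos(cosine of abundance vectors) / PI.
fn angular_similarity(a: &BTreeMap<u64, u64>, b: &BTreeMap<u64, u64>) -> f64 {
    let (na, nb) = (l2_norm(a), l2_norm(b));
    if na == 0.0 || nb == 0.0 {
        return 0.0;
    }
    let dot: f64 = a
        .iter()
        .filter_map(|(h, &ca)| b.get(h).map(|&cb| ca as f64 * cb as f64))
        .sum();
    // Clamp to avoid acos(NaN) from rounding just above 1.0.
    let cos = (dot / (na * nb)).min(1.0);
    1.0 - 2.0 * cos.acos() / std::f64::consts::PI
}
```

Under this hypothetical design, a Python compare() could call it with ignore_abundance set to true and similarity() could pass its own argument through, so the Python signatures stay unchanged while the downsampling and the math live in one place.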

  • Is it mergeable?
  • make test: Did it pass the tests?
  • make coverage: Is the new code covered?
  • Did it change the command-line interface? Only additions are allowed
    without a major version increment. Changing file formats also requires a
    major version number increment.
  • Was a spellchecker run on the source code and documentation after
    changes were made?

codecov bot commented Feb 1, 2020

Codecov Report

Merging #882 into master will increase coverage by 27.83%.
The diff coverage is 100%.


@@             Coverage Diff             @@
##           master     #882       +/-   ##
===========================================
+ Coverage   50.23%   78.06%   +27.83%     
===========================================
  Files          25       95       +70     
  Lines        2365     7308     +4943     
===========================================
+ Hits         1188     5705     +4517     
- Misses       1177     1603      +426

Flag          Coverage Δ
#rusttests    50.23% <ø> (ø) ⬆️

Impacted Files                      Coverage Δ
sourmash/commands.py                84.4% <100%> (ø)
sourmash/cli/compare.py             100% <100%> (ø)
sourmash/sbt.py                     86.52% <100%> (ø)
sourmash/logging.py                 96.07% <0%> (ø)
sourmash/__main__.py                92.3% <0%> (ø)
sourmash/_minhash.py                96.2% <0%> (ø)
sourmash/cli/lca/classify.py        100% <0%> (ø)
sourmash/cli/storage/__init__.py    100% <0%> (ø)
sourmash/lca/command_summarize.py   79.26% <0%> (ø)
sourmash/sbt_storage.py             86.36% <0%> (ø)
... and 63 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d551a78...f4365aa.
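The Codecov totals can be cross-checked with integer arithmetic. The sketch below (the Coverage type and its methods are hypothetical) confirms that hits plus misses equals lines on both sides of the diff and that 1188/2365 and 5705/7308 reproduce the reported 50.23% and 78.06%. The 78.06% figure appears consistent with truncating 78.0651...% to two decimals; rounding to nearest would give 78.07%.

```rust
/// Hypothetical container for the totals in a Codecov-style summary.
pub struct Coverage {
    pub lines: u64,
    pub hits: u64,
    pub misses: u64,
}

impl Coverage {
    /// True when every tracked line is counted as either a hit or a miss.
    pub fn is_consistent(&self) -> bool {
        self.hits + self.misses == self.lines
    }

    /// Coverage in hundredths of a percent, truncated toward zero.
    pub fn hundredths_truncated(&self) -> u64 {
        self.hits * 10_000 / self.lines
    }

    /// Coverage in hundredths of a percent, rounded half up.
    pub fn hundredths_rounded(&self) -> u64 {
        (self.hits * 20_000 + self.lines) / (2 * self.lines)
    }
}
```

With these totals, the truncated values are 5023 and 7806 hundredths, and their difference, 2783, matches the reported +27.83%.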

ctb (Contributor, Author) commented Feb 1, 2020

Ready for review, @luizirber, and thanks for the suggestion, @kloetzl!

kloetzl (Contributor) commented Feb 2, 2020

Uh, nice. That really reduced the line count. Fewer lines, fewer bugs!

ctb (Contributor, Author) commented Feb 2, 2020 via email

ctb (Contributor, Author) commented Feb 5, 2020

OK I think this is ready.

luizirber (Member)

I'll fix the wasm-pack issue. Should I also fix all the failing Rust tests?

luizirber merged commit 5038f0e into master on Feb 5, 2020
luizirber deleted the refactor/sbtmh_2 branch on February 5, 2020, 20:59

Successfully merging this pull request may close these issues.

refactor compare and similarity in _minhash.py and sketch/minhash.rs
3 participants