Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Add a MultiIndex class that wraps multiple Index classes. #1374

Merged
merged 24 commits into from
Mar 26, 2021

Conversation

ctb
Copy link
Contributor

@ctb ctb commented Mar 7, 2021

This PR adds a MultiIndex class, which provides a wrapper for other Index classes. Each wrapped class should come with a paired location, which can be None; if set, the wrapper overrides the filename returned in search and gather with the paired location.

The PR also adds a filter method to LinearIndex and MultiIndex. This is a harbinger of selectors on scaled/num, #1072; I'm adding filter as an interim solution because I needed the functionality in sourmash_args and it made the code much cleaner.

Addresses several of the issues in DirectoryIndex issue, #810.

A side effect of switching to MultiIndex in various sourmash_args functions is that we can remove special-casing of SIGLIST in _load_databases and dependent functions. This further simplifies load_file_as_index and load_file_as_signatures. 👍

In particular, this approach provides source filename information for where matches are coming from; see this comment for detailed rationale. Could perhaps replace the SBT location attribute in #1373 with wrapping an SBT in one of these (which seems like a hack in the long term, but could work fine in the short term ;).

This wrapper approach may also provide a partial solution to lazy loading #1097, because it could support lazy loading of signatures by Index classes.

TODO:

  • add a few tests that show that source filename in CSV output is correct 🎉

Checklist

  • Is it mergeable?
  • make test Did it pass the tests?
  • make coverage Is the new code covered?
  • Did it change the command-line interface? Only additions are allowed
    without a major version increment. Changing file formats also requires a
    major version number increment.
  • Was a spellchecker run on the source code and documentation after
    changes were made?

@codecov
Copy link

codecov bot commented Mar 7, 2021

Codecov Report

Merging #1374 (d99828d) into latest (202bdc5) will increase coverage by 5.30%.
The diff coverage is 98.68%.

Impacted file tree graph

@@            Coverage Diff             @@
##           latest    #1374      +/-   ##
==========================================
+ Coverage   89.18%   94.49%   +5.30%     
==========================================
  Files         123       96      -27     
  Lines       18642    15176    -3466     
  Branches     1434     1447      +13     
==========================================
- Hits        16626    14340    -2286     
+ Misses       1780      603    -1177     
+ Partials      236      233       -3     
Flag Coverage Δ
python 94.49% <98.68%> (+0.06%) ⬆️
rust ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/sourmash/index.py 93.61% <94.91%> (+0.51%) ⬆️
src/sourmash/sourmash_args.py 92.93% <100.00%> (+0.49%) ⬆️
tests/test_index.py 100.00% <100.00%> (ø)
tests/test_sourmash.py 99.71% <100.00%> (+<0.01%) ⬆️
src/core/src/from.rs
src/core/src/encodings.rs
src/core/src/index/sbt/mod.rs
src/core/src/ffi/nodegraph.rs
src/core/src/sketch/nodegraph.rs
... and 22 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 202bdc5...d99828d. Read the comment docs.

@ctb ctb changed the title [WIP] Add a MultiIndex class that wraps multiple Index classes. [MRG] Add a MultiIndex class that wraps multiple Index classes. Mar 8, 2021
@ctb
Copy link
Contributor Author

ctb commented Mar 8, 2021

Ready for review @luizirber @bluegenes

@ctb
Copy link
Contributor Author

ctb commented Mar 9, 2021

OK, upgraded a bit after some ...excursions. Cleaner, better, more succinct.

@luizirber
Copy link
Member

Anything else to add to this, @ctb? Otherwise, go for merging =]

@ctb
Copy link
Contributor Author

ctb commented Mar 26, 2021

thanks @luizirber - I'm doing some important downstream cleanup in #1406, and will fix any minor nits there.

@ctb ctb merged commit 7ed6291 into latest Mar 26, 2021
@ctb ctb deleted the add/multi_index branch March 26, 2021 17:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants