[MRG] Create an Index abstract base class #556
Exciting!!
Codecov Report
@@            Coverage Diff             @@
##           master     #556      +/-   ##
==========================================
+ Coverage   89.38%   89.47%   +0.09%
==========================================
  Files          29       30       +1
  Lines        4614     4705      +91
  Branches       49       49
==========================================
+ Hits         4124     4210      +86
- Misses        486      491       +5
  Partials        4        4
@luizirber once #533 is merged I plan to work on this next & clean up some API stuff. May I haz?
Thought: I think we should provide at least two specialized functions on Index objects, equivalent to 'search' and 'gather'. The former would return matches with the best Jaccard similarity; the latter would find matches with the best containment. Since this currently covers 100% of the needed functionality, it would (IMO) be better than our current bending-over-backwards approach of working off a generic 'find' function.
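A minimal sketch of what the two specialized methods could look like, assuming signatures are modelled as plain Python sets of hashes (real sourmash signatures wrap MinHash sketches). The names ToyLinearIndex, jaccard and containment, and the method signatures, are hypothetical and chosen only for illustration:

```python
from abc import ABC, abstractmethod


def jaccard(a, b):
    """Jaccard similarity of two hash sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(query, subject):
    """Fraction of the query's hashes that also appear in subject."""
    return len(query & subject) / len(query) if query else 0.0


class Index(ABC):
    """Hypothetical ABC exposing the two specialized queries."""

    @abstractmethod
    def search(self, query, threshold):
        """Return (jaccard, name) pairs with jaccard >= threshold, best first."""

    @abstractmethod
    def gather(self, query):
        """Return the (containment, name) pair that best contains the query."""


class ToyLinearIndex(Index):
    """Keeps hash sets in a dict and scans all of them for every query."""

    def __init__(self, sets):
        self.sets = dict(sets)

    def search(self, query, threshold):
        hits = [(jaccard(query, s), name) for name, s in self.sets.items()]
        return sorted((h for h in hits if h[0] >= threshold), reverse=True)

    def gather(self, query):
        hits = [(containment(query, s), name) for name, s in self.sets.items()]
        return max(hits, default=None)


idx = ToyLinearIndex({"a": {1, 2, 3, 4}, "b": {3, 4, 5, 6}})
print(idx.search({1, 2, 3, 4}, 0.3))  # "a" scores 1.0, "b" scores 2/6
print(idx.gather({5, 6}))             # (1.0, 'b')
```

Because both methods are abstract, Index() itself cannot be instantiated; every concrete index has to say how it answers each query.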
@ctb I like that! I don't think there is a way of marking a method 'optional' with
k
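Python's abc module indeed has no "optional abstract method" marker. One common workaround, sketched here with hypothetical class names, is to make the optional method an ordinary method on the base class that raises NotImplementedError, so subclasses can still be instantiated without it:

```python
from abc import ABC, abstractmethod


class Index(ABC):
    @abstractmethod
    def search(self, query, threshold):
        """Required: a subclass cannot be instantiated without it."""

    def gather(self, query):
        """'Optional': a plain method, so subclasses may leave it out.

        Calling it on a subclass that does not override it fails loudly.
        """
        raise NotImplementedError(
            f"{type(self).__name__} does not support gather()")


class SearchOnlyIndex(Index):
    def search(self, query, threshold):
        return []


idx = SearchOnlyIndex()        # instantiates fine: search() is implemented
try:
    idx.gather({1, 2, 3})
except NotImplementedError as exc:
    print(exc)                 # SearchOnlyIndex does not support gather()
```

The trade-off is that a missing optional method is only detected when it is called, not when the class is instantiated.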
hey @luizirber your thoughts on
sourmash/index.py (outdated):

    """ """

    @abstractmethod
    def search(self, signature, *args, **kwargs):
Or even defining search like this:

    def search(self, signature, *,
               threshold=None,
               do_containment=False,
               ignore_abundance=False,
               **kwargs):

without *args, with default keyword arguments.
(The * in the middle is a Python 3 thing, but by the time this gets merged I think we will have dropped Python 2 anyway?)
Tentative: add stub files with typing information too
What is the status of this in relation to the Rust codebase?
The Rust codebase has similar concepts implemented. This is the ~equivalent trait (Rust traits are similar to Python abstract base classes): https://github.com/dib-lab/sourmash/blob/0d69b7973e1b4e690933b27aeb8d11becc876e19/src/index/mod.rs#L34-L68
And an implementation for a LinearIndex similar to the one @ctb implemented in this PR: (Note that since the Rust trait has a default implementation for
Python and Rust are not talking yet, mostly because it would involve juggling 3 different branches/PRs:
A follow-up PR to this one would then move commands (
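The Rust trait's default methods have a direct Python analogue: a concrete method on the ABC written only in terms of the abstract ones. A sketch, assuming a hypothetical signatures() accessor that yields (name, hash_set) pairs; the PR later adds a signatures() method to the LCA and SBT indices, but the return type used here is an assumption:

```python
from abc import ABC, abstractmethod


class Index(ABC):
    """Only signatures() is required; search() has a default body,
    much like a Rust trait method with a default implementation."""

    @abstractmethod
    def signatures(self):
        """Yield (name, hash_set) pairs stored in the index."""

    def search(self, query, threshold):
        # Generic fallback: a linear Jaccard scan over signatures().
        # Tree- or database-backed indices can override this with
        # something smarter.
        results = []
        for name, hashes in self.signatures():
            union = query | hashes
            score = len(query & hashes) / len(union) if union else 0.0
            if score >= threshold:
                results.append((score, name))
        return sorted(results, reverse=True)


class DictIndex(Index):
    """Hypothetical minimal subclass: implements only signatures()."""

    def __init__(self, sets):
        self._sets = dict(sets)

    def signatures(self):
        return iter(self._sets.items())


print(DictIndex({"a": {1, 2}, "b": {2, 3}}).search({1, 2}, 0.5))  # [(1.0, 'a')]
```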
Ping @phoenixAja
#672 (comment) is also related to this: SBT and LCA are the main candidates to implement
Hey @luizirber, there are still a few failing tests, and I have a sneaking suspicion that there are ways to make things much more efficient with generators, but I've done a major bit of refactoring and would welcome your input!
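One way generators could help, sketched with a hypothetical iter_matches() helper over (name, hash_set) pairs: yield each match as it is scored instead of building a full result list, and let the caller decide how much to consume:

```python
import heapq


def iter_matches(query, signatures, threshold):
    """Lazily yield (score, name) for each signature at or above threshold.

    Nothing is scored until the caller asks for the next match, and no
    intermediate list of all results is built.
    """
    for name, hashes in signatures:
        union = query | hashes
        score = len(query & hashes) / len(union) if union else 0.0
        if score >= threshold:
            yield score, name


sigs = [("a", {1, 2, 3}), ("b", {2, 3, 4}), ("c", {7, 8})]

# Take the first match in storage order without scoring the rest.
first = next(iter_matches({1, 2, 3}, sigs, 0.1), None)

# Or keep only the k best matches while streaming through everything.
best_two = heapq.nlargest(2, iter_matches({1, 2, 3}, sigs, 0.0))

print(first)     # (1.0, 'a')
print(best_two)  # [(1.0, 'a'), (0.5, 'b')]
```

heapq.nlargest keeps only k candidates in its heap, so memory for the results stays bounded even when the index holds many signatures.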
sourmash/index.py (outdated):

    class LinearIndex(Index):
        def __init__(self, signatures=[], filename=None):
Noooooooooo... Don't set a default argument like that =]
http://effbot.org/zone/default-values.htm
Suggested change:
-    def __init__(self, signatures=[], filename=None):
+    def __init__(self, signatures=None, filename=None):
+        if signatures is None:
+            signatures = []
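Why the suggestion matters: a default list is created once, when the def statement runs, and is shared by every call that omits the argument. A small demonstration with hypothetical, stripped-down classes (not the PR's actual code):

```python
class BadLinearIndex:
    def __init__(self, signatures=[]):   # one list shared by every call
        self.signatures = signatures

    def insert(self, sig):
        self.signatures.append(sig)


class LinearIndex:
    def __init__(self, signatures=None):
        if signatures is None:           # fresh list per instance
            signatures = []
        self.signatures = signatures

    def insert(self, sig):
        self.signatures.append(sig)


a, b = BadLinearIndex(), BadLinearIndex()
a.insert("sig1")
print(b.signatures)   # ['sig1']: b sees a's insert

c, d = LinearIndex(), LinearIndex()
c.insert("sig1")
print(d.signatures)   # []
```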
The Or even an
Yes, I too was thinking about a datasets-style iter! Dovetails with #672. I'm not a huge fan of the
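If "datasets-style iter" means making the index object itself iterable (an assumption, since the exact proposal isn't recoverable here), the standard Python route is to implement __iter__ and __len__. A hypothetical sketch:

```python
class LinearIndex:
    """Hypothetical sketch: an index that can be looped over directly."""

    def __init__(self, signatures=None, filename=None):
        # Copy the input so the caller's list is never mutated by insert().
        self._signatures = list(signatures) if signatures is not None else []
        self.filename = filename

    def insert(self, sig):
        self._signatures.append(sig)

    def __len__(self):
        return len(self._signatures)

    def __iter__(self):
        # Each call starts a fresh pass, so the index can be iterated
        # repeatedly (unlike a one-shot generator object).
        return iter(self._signatures)


idx = LinearIndex(["sig1", "sig2"])
idx.insert("sig3")
print(len(idx), list(idx))   # 3 ['sig1', 'sig2', 'sig3']
```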
A few other thoughts - first, Second, looking at the
(ref
Note to self: should gather take
Note to self: the LCA db
Note, we could now easily provide a generic
hi @luizirber this is ready for review!
Co-Authored-By: Luiz Irber <luizirber@users.noreply.github.com>
…into refactor/index_cleanup
* add signatures() method to both LCA and SBT indices
* Update tests/test_sbt.py

  Co-Authored-By: Luiz Irber <luizirber@users.noreply.github.com>
* SBT.insert now matches Index.insert, while SBT.add_node does what insert used to
* clean up signature loading
* round out Index method tests, sort of :)
Updated with #796 and added some scaled tests. Review & merge at will @luizirber.
From #545 (comment):
Another way to test this abc is to define a Linear index: save all the signatures in a list/dictionary and search through it linearly. It's also good as a base case for testing whether SBT/LCA are finding all matches correctly (there are some test cases that already do this).

Remaining TODO before review:
* SearchResult
* signatures method?
* scaled values.

Rejected for this PR:
* Index objects.

Checklist:
* make test Did it pass the tests?
* make coverage Is the new code covered?
* without a major version increment. Changing file formats also requires a major version number increment.
* changes were made?