start thinking about a standard selector framework for signature search/compatibility #1072
Comments
random thought: having a robust API that raises appropriate exceptions for incompatible signatures could dovetail nicely with a more consistent and informative user experience.
side note, it'd be awesome to be able to use accessions as selectors as well as md5sums, because then you could swizzle between ksize/moltypes fearlessly, too.
Much progress made in #1420:
Items not tackled in #1420:
(not all of these may be good ideas, either ;) Once #1420 is merged we should close this issue and create a new issue with the remaining unimplemented ideas.
Interesting conversation here about how selectors should work. tl;dr: they should pick out signatures that can satisfy the conditions, but not actually modify them (e.g. by downsampling). This dovetails with some stuff I've been thinking about in #1392, where we face a similar choice in searching: when we actually do the comparisons, we need to modify the signatures to match, but in terms of returning signatures, we probably want to return the original unmodified signature.
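A minimal sketch of the "pick out, don't modify" behavior described above. ToySig and select_scaled are hypothetical names made up for illustration, not sourmash's API. The only domain fact it relies on is that a scaled sketch can be downsampled to an equal or larger scaled value, never a smaller one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySig:
    """Hypothetical stand-in for a signature; not sourmash's real class."""
    name: str
    ksize: int
    scaled: int


def select_scaled(sigs, scaled):
    """Return the original signatures that *could* be downsampled to `scaled`.

    Nothing is downsampled here: anything with sig.scaled > scaled cannot
    satisfy the condition and is dropped, everything else is returned as-is.
    """
    return [sig for sig in sigs if sig.scaled <= scaled]


sigs = [ToySig("a", 31, 1000), ToySig("b", 31, 10000), ToySig("c", 31, 100)]
picked = select_scaled(sigs, scaled=1000)
print([sig.name for sig in picked])
```

The comparison code would then be responsible for downsampling its own working copies, while callers still get back the unmodified objects.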
note to self: #1427 suggests we need more
closing for #1524
Note: Updated in #1524, which contains the unresolved parts of this issue.
#936 added Index.select(ksize=ksize, moltype=moltype, ...) and with #1059 merged we have a standard API for loading piles of signatures, and I keep on coming back to doing cool things with selectors.
this issue replaces #599, which is just about md5sum selectors, with a place for more general discussion.
note that #934 fell apart into a mess of ugly code, and selector frameworks could provide some simplicity here.
this is all just brainstorming without any attempt to make the code work... but I like the idea of providing a few pieces of functionality.
the current situation
Index subclasses support a select function that currently takes ksize and moltype. it behaves differently on LinearIndex and on databases.
on LinearIndex, which can contain many different kinds of signatures, it applies the selector to all signatures and returns a new LinearIndex.
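To make the LinearIndex behavior concrete, here is a rough sketch using toy classes. ToySig and ToyLinearIndex are hypothetical and only mimic the described behavior (filter on ksize and moltype, return a new index); this is not the actual sourmash implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySig:
    """Hypothetical stand-in for a signature."""
    name: str
    ksize: int
    moltype: str


class ToyLinearIndex:
    """Hypothetical flat collection that may hold mixed kinds of signatures."""

    def __init__(self, sigs=()):
        self.sigs = list(sigs)

    def select(self, ksize=None, moltype=None):
        """Apply the selector to every signature and return a *new* index."""
        def keep(sig):
            return ((ksize is None or sig.ksize == ksize) and
                    (moltype is None or sig.moltype == moltype))
        return ToyLinearIndex(sig for sig in self.sigs if keep(sig))


idx = ToyLinearIndex([
    ToySig("a", 31, "DNA"),
    ToySig("b", 21, "DNA"),
    ToySig("c", 31, "protein"),
])
sub = idx.select(ksize=31, moltype="DNA")
print([sig.name for sig in sub.sigs])
```

The original index is left untouched, so any later use of sub carries the condition with it while idx stays unrestricted.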
the underlying idea is to be able to say obj.select(<condition>) and have that condition hold for any future uses of obj.
this would dovetail nicely with #198 in terms of supporting richer databases (e.g. multiple ksizes, moltypes, etc.)
some brainstormy thoughts for more selector foo
it'd be nice to have a selector object that could be used to apply partial restrictions to collections, e.g. just ksize selection. That's kind of how it works now in theory, but selectors are not very rich at the moment.
part of the idea is that selectors could be lazy, so that all the conditions could be resolved once, when you actually use the database or collection of signatures.
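One possible shape for such a selector object, purely as a brainstorming sketch. The Selector class and its restrict/apply methods are hypothetical names; signatures are faked with SimpleNamespace. Conditions are only collected up front, and nothing is checked until the result of apply is iterated.

```python
from types import SimpleNamespace


class Selector:
    """Hypothetical selector: a bag of attribute conditions, applied lazily."""

    def __init__(self, **conditions):
        self.conditions = dict(conditions)

    def restrict(self, **more):
        """Return a new Selector with extra conditions; reject contradictions."""
        merged = dict(self.conditions)
        for key, value in more.items():
            if key in merged and merged[key] != value:
                raise ValueError(
                    f"conflicting values for {key!r}: {merged[key]!r} vs {value!r}")
            merged[key] = value
        return Selector(**merged)

    def matches(self, sig):
        return all(getattr(sig, key) == value
                   for key, value in self.conditions.items())

    def apply(self, sigs):
        """Return a generator, so conditions are resolved only when consumed."""
        return (sig for sig in sigs if self.matches(sig))


sigs = [
    SimpleNamespace(name="a", ksize=31, moltype="DNA"),
    SimpleNamespace(name="b", ksize=21, moltype="DNA"),
    SimpleNamespace(name="c", ksize=31, moltype="protein"),
]
k31 = Selector(ksize=31)                 # a partial restriction: ksize only
k31_dna = k31.restrict(moltype="DNA")    # refined later, k31 is unchanged
names = [sig.name for sig in k31_dna.apply(sigs)]
print(names)
```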
I really like the idea of method chaining although I'm not 100% sure exactly what that would look like. maybe db.select(moltype='DNA').select(ksize=57)?
we might want to add scaled and num, and then allow selector functions to do the necessary downsampling (or raise objections).
similarly, we could provide MinHash flattening via selector.
(#611 relevant to both ideas)
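A sketch of what chained selection with "objections" might look like. ToyDB is a hypothetical class, not an existing sourmash type. Two assumptions are baked in: a contradictory second select raises ValueError, and a scaled condition keeps signatures that could be downsampled to that value (sig.scaled <= requested) rather than requiring an exact match. No downsampling or flattening is performed here.

```python
from types import SimpleNamespace


class ToyDB:
    """Hypothetical collection whose select() calls can be chained."""

    def __init__(self, sigs, conditions=None):
        self._sigs = list(sigs)
        self._conditions = dict(conditions or {})

    def select(self, **kwargs):
        """Accumulate conditions and return a new ToyDB; object to conflicts."""
        merged = dict(self._conditions)
        for key, value in kwargs.items():
            if merged.get(key, value) != value:
                raise ValueError(
                    f"conflicting selection for {key}: {merged[key]} vs {value}")
            merged[key] = value
        return ToyDB(self._sigs, merged)

    def _matches(self, sig):
        for key, value in self._conditions.items():
            if key == "scaled":
                # assumption: compatible if it could be downsampled to `value`
                if sig.scaled > value:
                    return False
            elif getattr(sig, key) != value:
                return False
        return True

    def signatures(self):
        """Resolve all accumulated conditions only when actually iterated."""
        return (sig for sig in self._sigs if self._matches(sig))


db = ToyDB([
    SimpleNamespace(name="a", moltype="DNA", ksize=57, scaled=1000),
    SimpleNamespace(name="b", moltype="DNA", ksize=31, scaled=2000),
    SimpleNamespace(name="c", moltype="protein", ksize=57, scaled=1000),
])
chained = db.select(moltype="DNA").select(ksize=57)
print([sig.name for sig in chained.signatures()])
```

Returning a new object from each select keeps the chain side-effect free, which is one way to get "the condition holds for any future use" without mutating the underlying database.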
md5sum is an obvious selector (#599)
would be nice to be able to use a signature or a database (so, a signature or an Index object?) as a selector, so that only compatible signatures/databases are selected. this would probably help resolve #809 / #934 more cleanly :)
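As a rough illustration of using a signature as a selector: the sketch below builds a predicate from a query signature and keeps only compatible candidates. ToySig and compatible_with are hypothetical, and "compatible" is assumed here to mean same ksize and same moltype only; a real check would likely also need to consider scaled vs num.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySig:
    """Hypothetical stand-in for a signature."""
    name: str
    ksize: int
    moltype: str


def compatible_with(query):
    """Build a predicate that accepts signatures comparable to `query`.

    Assumption for this sketch: compatibility == matching ksize and moltype.
    """
    def check(sig):
        return sig.ksize == query.ksize and sig.moltype == query.moltype
    return check


query = ToySig("query", 31, "DNA")
candidates = [
    ToySig("a", 31, "DNA"),
    ToySig("b", 21, "DNA"),
    ToySig("c", 31, "protein"),
]
usable = [sig for sig in candidates if compatible_with(query)(sig)]
print([sig.name for sig in usable])
```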
you could imagine applying taxonomic filtering via selectors, although that's kind of a different thing conceptually.