Should we specify a search function? #28
Comments
I was drawn back to this issue by a recent use case which made me look at this in a different light. It would be a separate This can be incredibly powerful because it enables both discovery of sequence collections in a given server and increased interoperability between servers:
I think these discussions on search, existence checks, and lookup by level 1 digest are good things to look at next. They probably won't make it into 1.0, but they are clearly useful extensions for a 1.1 or similar.
I implemented a basic version of this here: https://seqcolapi.databio.org/docs#/Discovering%20data/attribute_search_list_collections__attribute___attribute_digest__get Basically, you list collections with a given attribute like this:
This can't answer the question "is compatible with", but it can answer whether a specific attribute is identical. So it's a bit weaker than what I originally proposed, but still very useful, and it may solve the main use cases. It also becomes more powerful as additional custom attributes are added to enable particular searches. I think this is exactly what Tim proposed above.
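To make the "identical attribute" lookup concrete, here is a minimal in-memory sketch. It is not the code behind the linked server: the store layout, the function name `list_by_attribute`, and all digests are hypothetical placeholders, and it assumes each collection is stored with a mapping from attribute name to attribute digest.

```python
# Hypothetical store: collection digest -> {attribute name: attribute digest}.
# Every digest below is a made-up placeholder, not a real seqcol digest.
COLLECTIONS = {
    "col_A": {"names": "n1", "lengths": "l1", "sequences": "s1"},
    "col_B": {"names": "n2", "lengths": "l1", "sequences": "s1"},
    "col_C": {"names": "n1", "lengths": "l2", "sequences": "s2"},
}


def list_by_attribute(store, attribute, attribute_digest):
    """Return digests of collections whose `attribute` digest equals the query digest."""
    return sorted(
        digest
        for digest, attributes in store.items()
        if attributes.get(attribute) == attribute_digest
    )


# Collections that share exactly the same lengths attribute.
same_lengths = list_by_attribute(COLLECTIONS, "lengths", "l1")
```

Because the match is digest equality, this only finds collections where the whole attribute is identical; it says nothing about subsets or reorderings.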
Here's what I've written that we could add to the specification under the Variant: List with filter
I agree barring my comment on
Just a thought on the list with filter idea. In discussions on schema registry, it came up that if the filters used query parameters, instead of path parameters, then it would be easy and natural to specify more than one filter. So if, instead of this:
You used this:
Then you could more easily enable this:
Is this desirable?
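As a rough illustration of why query parameters compose more naturally than path parameters, the sketch below parses any number of filters from a query string and requires all of them to match. The path `/list/collection`, the store layout, and the function names are assumptions for illustration only, not the URLs proposed in this thread.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical store: collection digest -> {attribute name: attribute digest}.
COLLECTIONS = {
    "col_A": {"names": "n1", "lengths": "l1"},
    "col_B": {"names": "n2", "lengths": "l1"},
    "col_C": {"names": "n1", "lengths": "l2"},
}


def parse_filters(url):
    """Read attribute=digest pairs from the query string of a request URL."""
    return dict(parse_qsl(urlsplit(url).query))


def list_with_filters(store, filters):
    """Return digests of collections that match every filter (logical AND)."""
    return sorted(
        digest
        for digest, attributes in store.items()
        if all(attributes.get(name) == value for name, value in filters.items())
    )


# One filter and two filters use the same code path; the path is hypothetical.
one = list_with_filters(COLLECTIONS, parse_filters("/list/collection?lengths=l1"))
two = list_with_filters(
    COLLECTIONS, parse_filters("/list/collection?names=n1&lengths=l1")
)
```

A real endpoint would also need to separate filter parameters from any other query parameters it accepts, which this sketch does not attempt.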
Various times we've discussed what we refer to as the search function. It's been raised in discussion and also in issues, e.g.:
Brief description of the search function
Given a sequence collection, find other sequence collections that are compatible with it. "Compatible" can have a variety of meanings here... it could mean looking for subset relationships, collections with same content but in different order, collections with same sequences but with different names, same lengths and names but different sequences, etc.
Now that we've come to agreement on the comparison function, we could think about the search function. The search function seems very useful, but it also seems computationally expensive. In a naive database that just stores the objects, answering a search would essentially require running the comparison function against every other collection in the database.
I suppose we could do something like pre-compute the comparison function for all pairs of collections, and then a search function might be more possible. Or, perhaps there's another way this could be implemented.
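The cost of the naive approach can be sketched as follows. The `compare` function here is a deliberately simplified stand-in, not the comparison function agreed on for the specification: it works on raw arrays, ignores duplicate elements when testing subsets, and uses made-up relationship labels.

```python
def relation(a, b):
    """Classify how array `a` relates to array `b` (simplified, set-based)."""
    if a == b:
        return "identical"
    if sorted(a) == sorted(b):
        return "same elements, different order"
    if set(a) <= set(b):
        return "subset"
    if set(a) >= set(b):
        return "superset"
    if set(a) & set(b):
        return "partial overlap"
    return "disjoint"


def compare(col_a, col_b):
    """Compare two collections attribute by attribute (shared attributes only)."""
    return {attr: relation(col_a[attr], col_b[attr]) for attr in col_a if attr in col_b}


def naive_search(query, store):
    """Run one comparison per stored collection: O(N) comparisons per search."""
    return {digest: compare(query, col) for digest, col in store.items()}


# Hypothetical store keyed by placeholder digests.
STORE = {
    "X": {"names": ["chr1", "chr2"], "lengths": [100, 200]},
    "Y": {"names": ["chr2", "chr1"], "lengths": [200, 100]},
    "Z": {"names": ["1", "2"], "lengths": [100, 200]},
}
results = naive_search({"names": ["chr1", "chr2"], "lengths": [100, 200]}, STORE)
```

Pre-computing all pairs would move this work to insert time, at the price of storing on the order of N² comparison results.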
At the moment I'm not sure this should be within scope for sequence collections, at least not at this point. This seems like a separate service that could be built on top of the collection and comparison endpoints by computing lots of comparisons and then structuring the results into some kind of smart data structure so that a given search query wouldn't take too long to compute. Very useful, yes, but also probably an extension to seqcol.

Thoughts?
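One possible shape for such a data structure, offered only as a sketch and not something proposed in this thread, is an inverted index from individual array elements to the collections that contain them. A search then touches only collections that share at least one element with the query, and the full comparison can be reserved for those candidates. All names and digests below are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(store):
    """Map (attribute, element) -> set of collection digests containing that element."""
    index = defaultdict(set)
    for digest, collection in store.items():
        for attr, values in collection.items():
            for value in values:
                index[(attr, value)].add(digest)
    return index


def candidates(query, index):
    """Count, per stored collection, how many distinct query elements it shares."""
    hits = Counter()
    for attr, values in query.items():
        for value in set(values):
            for digest in index.get((attr, value), ()):
                hits[digest] += 1
    return hits


# Hypothetical store keyed by placeholder digests.
STORE = {
    "X": {"names": ["chr1", "chr2"]},
    "Z": {"names": ["1", "2"]},
}
INDEX = build_index(STORE)
shared = candidates({"names": ["chr1", "chr3"]}, INDEX)
```

Collections that never appear in the result share nothing with the query, so they can be skipped without running a comparison at all.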