Should we specify a search function? #28

nsheff · 2022-02-01T21:47:42Z

Various times we've discussed what we refer to as the search function. It's been raised in discussion and also in issues, e.g.:

Brief description of the search function

Given a sequence collection, find other sequence collections that are compatible with it. "Compatible" can have a variety of meanings here... it could mean looking for subset relationships, collections with same content but in different order, collections with same sequences but with different names, same lengths and names but different sequences, etc.

Now that we've come to agreement on the comparison function, we could think about the search function. The search function seems very useful, but it also seems time consuming. In a naive database that is just storing the objects, to calculate this would basically require running the compare function across all other collections in the database.

I suppose we could do something like pre-compute the comparison function for all pairs of collections, and then a search function might be more possible. Or, perhaps there's another way this could be implemented.
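To make the cost concrete, here is a minimal sketch of the naive approach: one comparison per stored collection for every query. The `compare` function below is a simplified stand-in written for illustration, not the agreed seqcol comparison function or its output format, and the in-memory `store` and the relation names (`same_order`, `same_elements`, `a_subset_of_b`) are assumptions.

```python
from typing import Dict, List

# Hypothetical in-memory store: collection id -> {attribute name: list of values}
Collection = Dict[str, list]


def compare(a: Collection, b: Collection) -> Dict[str, Dict[str, bool]]:
    """Simplified stand-in for a comparison function (not the seqcol one)."""
    report = {}
    for attr in sorted(set(a) & set(b)):
        set_a, set_b = set(a[attr]), set(b[attr])
        report[attr] = {
            "same_order": a[attr] == b[attr],
            "same_elements": set_a == set_b,
            "a_subset_of_b": set_a <= set_b,
        }
    return report


def naive_search(query: Collection, store: Dict[str, Collection],
                 attribute: str, relation: str) -> List[str]:
    """Run compare() against every stored collection: O(N) comparisons per query."""
    hits = []
    for coll_id, coll in store.items():
        if compare(query, coll).get(attribute, {}).get(relation, False):
            hits.append(coll_id)
    return hits


store = {
    "A": {"names": ["chr1", "chr2"], "lengths": [100, 200]},
    "B": {"names": ["chr2", "chr1"], "lengths": [200, 100]},
    "C": {"names": ["chr1", "chr2", "chrM"], "lengths": [100, 200, 16]},
}
query = {"names": ["chr1", "chr2"], "lengths": [100, 200]}

print(naive_search(query, store, "names", "same_order"))
print(naive_search(query, store, "names", "same_elements"))
print(naive_search(query, store, "names", "a_subset_of_b"))
```

Pre-computing all pairs would move this cost to insert time, at the price of storing on the order of N² comparison results.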

At the moment I'm not sure this should be within scope for sequence collections, at least not at this point. This seems like a separate service that could be built on top of the collection and comparison endpoints by computing lots of comparisons and then structuring the results into some kind of smart data structure so that a given search query wouldn't take too long to compute. Very useful, yes, but also probably an extension to seqcol.

thoughts?


tcezard · 2024-03-20T14:00:53Z

I was drawn back to this issue by a recent use case which made me look at this in a different light.
What I have in mind sits somewhere between the search described here and the existence test.
@sveinugu also mentioned it.

It would be a separate collections endpoint (name to be debated) that can be queried with a property and its level1 digest, and returns the list of level0 sequence collections that match,
i.e.
/collections?names=4925cdbd780a71e332d13145141863c1
would return all the collections with that ordered set of names, which can then be queried further on the collection endpoint or compared.
We could extend the concept to the list endpoint that has been discussed in the past:
/collections
would return the whole list.

This can be incredibly powerful because it enables both discovery of sequence collections on a given server and increased interoperability between servers:

Increased discoverability, because users can find out which sequence collections a server has available
Increased interoperability, because two services will likely have incompatible level0 digests but compatible level1 digests (for names, sequences and lengths at least)
Enabling precomputed search as a service: a server can allow search by sorted_names_length_pairs or sorted_sequences or any other precomputed array digest.
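One way to picture the lookup is an inverted index from (attribute, level1 digest) to the level0 digests that carry it. The sketch below is only an illustration under assumptions: the function names, the in-memory dictionary, and all digest strings except the `names` digest quoted above are hypothetical placeholders, and no real digest algorithm is computed.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical level1 records: level0 digest -> {attribute: level1 digest}
level1_records = {
    "level0-aaa": {"names": "4925cdbd780a71e332d13145141863c1", "lengths": "len-1"},
    "level0-bbb": {"names": "4925cdbd780a71e332d13145141863c1", "lengths": "len-2"},
    "level0-ccc": {"names": "names-other", "lengths": "len-1"},
}


def build_index(records: Dict[str, Dict[str, str]]) -> Dict[Tuple[str, str], List[str]]:
    """Map each (attribute, level1 digest) pair to the level0 digests that have it."""
    index = defaultdict(list)
    for level0_digest, attrs in records.items():
        for attribute, level1_digest in attrs.items():
            index[(attribute, level1_digest)].append(level0_digest)
    return index


def find_collections(index, attribute: str, level1_digest: str) -> List[str]:
    """Exact-match lookup: a single dictionary access, no pairwise comparison."""
    return list(index.get((attribute, level1_digest), []))


index = build_index(level1_records)
print(find_collections(index, "names", "4925cdbd780a71e332d13145141863c1"))
print(find_collections(index, "lengths", "len-1"))
```

Because level1 digests are already stored with each collection, this kind of index needs no extra computation beyond what the server does at insert time.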

nsheff · 2024-03-20T14:33:58Z

I think discussions on these search/existence/lookup-by-level1-digest are good things to look at next.

They probably won't make it into 1.0, but they are clearly useful extensions for a 1.1 or something.

nsheff · 2024-07-29T18:14:57Z

I implemented a basic version of this here: https://seqcolapi.databio.org/docs#/Discovering%20data/attribute_search_list_collections__attribute___attribute_digest__get

Basically, you list collections with a given attribute like this:

/list/collections/{attribute}/{attribute_digest}

This can't answer an "is compatible with" question, but it can answer whether a specific attribute is identical. So it's a bit weaker than I originally proposed, but still very useful and maybe solves the main use cases; it also becomes more powerful as additional custom attributes are added to enable particular searches. I think this is exactly what Tim proposed above.

nsheff · 2024-08-07T12:20:23Z

Here's what I've written that we could add to the specification under the /list endpoint. I'm envisioning this as an endpoint that lives underneath the generic list endpoint.

Variant: List with filter

Endpoint: GET /list/:object_type/:attribute/:attribute_digest?page=:page&page_size=:page_size (REQUIRED)
Description: Lists identifiers for a given object type (e.g. collections), filtered to only those that have a specific attribute value. This endpoint provides a way to discover sequence collections with a certain attribute.
Return value: The output format matches the more general /list endpoint. It is simply filtered.
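A minimal sketch of what a server-side handler for this variant might do, assuming an in-memory mapping of collection digests to their attribute digests. The response shape (`pagination` and `results` keys), the zero-based `page`, and the default `page_size` are assumptions for illustration, since the /list output format is not reproduced in this thread.

```python
from typing import Dict, List

# Hypothetical records: collection digest -> {attribute: attribute digest}
records = {
    "c1": {"names": "n1", "lengths": "l1"},
    "c2": {"names": "n1", "lengths": "l2"},
    "c3": {"names": "n2", "lengths": "l1"},
}


def list_with_filter(records: Dict[str, Dict[str, str]], attribute: str,
                     attribute_digest: str, page: int = 0,
                     page_size: int = 100) -> dict:
    """Filter to collections whose attribute digest matches, then paginate."""
    matches: List[str] = [
        digest for digest, attrs in records.items()
        if attrs.get(attribute) == attribute_digest
    ]
    start = page * page_size  # assumes zero-based page numbering
    return {
        "pagination": {"page": page, "page_size": page_size, "total": len(matches)},
        "results": matches[start:start + page_size],
    }


print(list_with_filter(records, "names", "n1", page=0, page_size=1))
print(list_with_filter(records, "names", "n1", page=1, page_size=1))
```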

tcezard · 2024-08-07T13:31:54Z

I agree, barring my comment on :object_type, which should be defined as collections for now.
I would also change attribute_digest to level1_value or something similar to make it clear where the value is coming from.
It could also support single-value attributes, which would enable all kinds of predefined searches.
Something like
GET /list/collections/:attribute/:level1_value?page=:page&page_size=:page_size

nsheff · 2024-08-23T21:59:31Z

Just a thought on the list with filter idea. In discussions on schema registry, it came up that if the filters used query parameters, instead of path parameters, then it would be easy and natural to specify more than one filter.

So if, instead of this:

/list/collections/{attribute}/{attribute_digest}

You used this:

/list/collections/?{attribute}={attribute_digest}

Then you could more easily enable this:

/list/collections/?{attribute1}={attribute_digest}&{attribute2}={attribute_digest2}

Is this desirable?
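To illustrate the query-parameter form, here is a small sketch that parses several filters from a URL and combines them with a logical AND. The AND semantics, the treatment of `page` and `page_size` as reserved parameter names, and the record data are assumptions for illustration, not something decided in this thread.

```python
from typing import Dict, List
from urllib.parse import parse_qsl, urlsplit

# Assumed reserved names that are pagination controls rather than filters
RESERVED = {"page", "page_size"}

# Hypothetical records: collection digest -> {attribute: attribute digest}
records = {
    "c1": {"names": "n1", "lengths": "l1"},
    "c2": {"names": "n1", "lengths": "l2"},
    "c3": {"names": "n2", "lengths": "l1"},
}


def filters_from_url(url: str) -> Dict[str, str]:
    """Collect every non-reserved query parameter as an attribute filter."""
    query = urlsplit(url).query
    return {key: value for key, value in parse_qsl(query) if key not in RESERVED}


def filter_collections(records: Dict[str, Dict[str, str]],
                       filters: Dict[str, str]) -> List[str]:
    """Keep collections matching all filters (logical AND)."""
    return [
        digest for digest, attrs in records.items()
        if all(attrs.get(attr) == value for attr, value in filters.items())
    ]


filters = filters_from_url("/list/collections/?names=n1&lengths=l2&page=0")
print(filters)
print(filter_collections(records, filters))
```

One consequence of this design is that attribute names share a namespace with pagination parameters, so an attribute called `page` could not be used as a filter without some extra convention.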

nsheff mentioned this issue Feb 1, 2022

How will the seqcol compatibility flags be encoded? #7

Closed

nsheff added the enhancement New feature or request label Feb 22, 2024

tcezard added this to the v1.1 milestone Mar 20, 2024

nsheff mentioned this issue Jun 12, 2024

Use case: a digest for a collection of sequences #76

Open
