Filter Ext: Dynamic queryables difficult for item-search #182

rsmith013 · 2021-07-28T10:44:55Z

One feature we are really interested in is the ability to request from the server the possible queryables and their accepted values.

As defined, there are two routes:

/queryables - The global intersect of all collection queryables
/collections/{collection_id}/queryables - Collection-specific queryables

As noted in the documentation, this falls short in providing a useful interface where an API presents diverse content.

Issues ogcapi-582 and ogcapi-576 go some way towards addressing this by requesting wildcard schema definitions and /queryables?collections=collection1,collection2, but I feel this still leaves limitations, some of which are discussed in the issues themselves.

Wildcard definitions have issues because you lose the information about possible queryable properties and values. We are likely to end up with millions of items and 100s if not 1000s of collections. It is unreasonable to expect a user to search these to find some possible attributes they might wish to search on.
Adding the ?collections parameter requires the client to know the collections they are interested in up front. One of the benefits of item-search is cross-collection search.

I wonder whether a more useful approach would be to allow the same query parameters as /search on /queryables.
The implementation can then search for the list of results that match and provide the intersect of queryables to the user for further refinement.

e.g.

Return the list of items which match the filter expression
/search?filter=sentinel:data_coverage > 50 OR eo:cloud_cover < 10

Return the queryables available for the results which match the current filter expression
/queryables?filter=sentinel:data_coverage > 50 OR eo:cloud_cover < 10
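
A minimal sketch of how a server might build that dynamic intersect, assuming it can evaluate the filter against its items and knows each collection's queryables. The in-memory data, the function names, and the hand-written predicate standing in for a parsed filter expression are all hypothetical, not part of any STAC implementation.

```python
from typing import Callable, Iterable, Set

# Hypothetical stand-in for a catalogue: each item carries its collection id
# and properties, and each collection lists its queryable names.
ITEMS = [
    {"collection": "sentinel2", "properties": {"eo:cloud_cover": 5, "sentinel:data_coverage": 80}},
    {"collection": "sentinel2", "properties": {"eo:cloud_cover": 40, "sentinel:data_coverage": 20}},
    {"collection": "landsat8", "properties": {"eo:cloud_cover": 8}},
    {"collection": "cmip6", "properties": {"cmip6:source_id": "UKESM1-0-LL"}},
]
COLLECTION_QUERYABLES = {
    "sentinel2": {"datetime", "platform", "eo:cloud_cover", "sentinel:data_coverage"},
    "landsat8": {"datetime", "platform", "eo:cloud_cover"},
    "cmip6": {"datetime", "cmip6:source_id"},
}


def example_filter(props: dict) -> bool:
    """Hand-written equivalent of:
    sentinel:data_coverage > 50 OR eo:cloud_cover < 10
    A missing property never satisfies its comparison."""
    return props.get("sentinel:data_coverage", 0) > 50 or props.get("eo:cloud_cover", 100) < 10


def dynamic_queryables(items: Iterable[dict], predicate: Callable[[dict], bool]) -> Set[str]:
    """Intersect the queryables of every collection that has a matching item."""
    matched = {item["collection"] for item in items if predicate(item["properties"])}
    if not matched:
        return set()
    return set.intersection(*(COLLECTION_QUERYABLES[cid] for cid in matched))


print(sorted(dynamic_queryables(ITEMS, example_filter)))
```

Here the filter matches items in sentinel2 and landsat8 only, so the intersect is computed over those two collections rather than over the whole catalogue.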

The /queryables?collections=collection1,collection2 approach requires the API to return a list of relevant collections to be useful IMO. Perhaps allow the context extension to return an array of collections in the response which are relevant for the current search.

Proposed solutions:

Accept the /search query parameters on the /queryables endpoint to dynamically build the intersect
Return collection ids in the context response; these can then be used via the proposed /queryables?collections=...

rsmith013 · 2021-09-17T13:49:45Z

I have created an extension to the context extension to add collections into the response.
https://github.com/cedadev/stac-context-collections

In my implementation, the response returns at most 10 collections, because the more collections are returned, the less likely there is to be an intersect. The value from this response is then used via /queryables?collections=..., as suggested.
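
A rough sketch of that flow, under stated assumptions: the `collections` key inside `context`, the helper names, and the example base URL are illustrative and not taken from the extension itself. Only the cap of 10 and the /queryables?collections=... form come from this thread.

```python
from urllib.parse import urlencode

MAX_COLLECTIONS = 10  # cap described above


def context_with_collections(matched_items: list, limit: int = MAX_COLLECTIONS) -> dict:
    """Server side: summarise a search result, listing the distinct
    collections it touches (first-seen order), capped at `limit`."""
    distinct = list(dict.fromkeys(item["collection"] for item in matched_items))
    return {"returned": len(matched_items), "collections": distinct[:limit]}


def queryables_url(search_response: dict, base_url: str) -> str:
    """Client side: turn the collections from the context object into a
    /queryables?collections=... request URL."""
    collections = search_response.get("context", {}).get("collections", [])
    if not collections:
        return f"{base_url}/queryables"
    query = urlencode({"collections": ",".join(collections)}, safe=",")
    return f"{base_url}/queryables?{query}"


items = [{"collection": "collection1"}, {"collection": "collection2"}, {"collection": "collection1"}]
response = {"features": items, "context": context_with_collections(items)}
print(queryables_url(response, "https://example.com"))
```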

Open to thoughts on this approach. I appreciate the filter extension based on OGC OA Feat: Part 3 is still a moving target.

dwilson1988 · 2021-10-26T18:25:13Z

@rsmith013

I'm looking into this now. Has anyone proposed a different response to those working on OGC OA Feat: Part 3 for global queryables?

Instead of /queryables returning the intersection, why not separate them out by collection similar to the collections endpoint?
e.g.:

GET /queryables

{
  "queryables": [
    {
      "$id": "collection1",
      "title": "Collection 1",
      ...
    }, {
      "$id": "collection2",
      "title": "Collection 2",
      ...
    }
  ]
}

GET /queryables?collections=collection1

{
  "queryables": [
    {
      "$id": "collection1",
      "title": "Collection 1",
      ...
    }
  ]
}
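
To make the suggested response shape concrete, here is a small sketch of a handler that groups queryables by collection and honours an optional collections parameter. The schema contents and the function name are hypothetical; this response shape is only a suggestion in this thread, not something defined by OGC API Features Part 3.

```python
from typing import Optional

# Hypothetical per-collection queryables schemas, keyed by collection id.
SCHEMAS = {
    "collection1": {"title": "Collection 1", "properties": {"platform": {"type": "string"}}},
    "collection2": {"title": "Collection 2", "properties": {"cmip6:source_id": {"type": "string"}}},
}


def grouped_queryables(schemas: dict, collections: Optional[str] = None) -> dict:
    """Return one queryables entry per collection instead of an intersection.

    `collections` is the raw comma-separated query parameter; when it is
    absent, every collection is included. Unknown ids are skipped.
    """
    wanted = collections.split(",") if collections else list(schemas)
    return {"queryables": [{"$id": cid, **schemas[cid]} for cid in wanted if cid in schemas]}


print(grouped_queryables(SCHEMAS, "collection1"))
```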

I'm struggling to understand the logic of using the intersection of queryables given the collections could be RADICALLY different (in our use case, they certainly are).

rsmith013 · 2021-11-01T11:30:29Z

@dwilson1988 We are expecting to have many hundreds of collections so I am not sure this would be a scalable approach either. We are working under the assumption that each collection will have its own set of queryables. These could be very different. We are then working to create a separate service that will create global search facets (queryables) and make a mapping between the collection-specific terms and the global set.

I am not sure which area you're working in, but we are coming from an earth system modeling perspective (climate models mixed with earth observation remote and local sensing).

As an example:

CMIP6

cmip6:source_id -> global:model
global:general_data_type

CMIP5

cmip5:model -> global:model
global:general_data_type

Sentinel 1

global:general_data_type
platform
processing_level

Sentinel 3

global:general_data_type
platform
processing_level

So although I agree that, at the top level, an intersection will likely tend to zero for heterogeneous STAC collections, we are hoping to have a few key search facets available at the top level, e.g. data type, permitted_use, inspire_theme, gemet_topic... as well as incorporating free-text search. We hope that this will allow the user to narrow their search sufficiently that an intersection will return a meaningful overlap. The context collections extension I have proposed, based on thoughts in another issue, allows clients to submit a set of collections to the queryables endpoint and return their intersection.

So to use the above examples, if you were looking for satellite data, then it is likely that processing_level would bubble up from a queryables search. Or if you were looking for model data then model would appear as a queryable.
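
A minimal sketch of that mapping idea, using the example terms above. The dictionary layout, the collection ids, and the function name are assumptions made for illustration; the separate mapping service itself is not shown.

```python
# Collection-specific terms mapped onto the global set (from the example above).
TERM_MAP = {
    "cmip6": {"cmip6:source_id": "global:model"},
    "cmip5": {"cmip5:model": "global:model"},
}

# Queryables per collection, as listed in the example above.
COLLECTION_QUERYABLES = {
    "cmip6": ["cmip6:source_id", "global:general_data_type"],
    "cmip5": ["cmip5:model", "global:general_data_type"],
    "sentinel1": ["global:general_data_type", "platform", "processing_level"],
    "sentinel3": ["global:general_data_type", "platform", "processing_level"],
}


def global_facets(collection_ids: list) -> set:
    """Translate each collection's queryables to global terms, then intersect."""
    translated = [
        {TERM_MAP.get(cid, {}).get(term, term) for term in COLLECTION_QUERYABLES[cid]}
        for cid in collection_ids
    ]
    return set.intersection(*translated) if translated else set()


print(sorted(global_facets(["cmip6", "cmip5"])))
print(sorted(global_facets(["sentinel1", "sentinel3"])))
print(sorted(global_facets(list(COLLECTION_QUERYABLES))))
```

With the model collections selected, global:model survives the intersection; with the Sentinel collections, platform and processing_level do; across all four, only global:general_data_type remains.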

I would be interested to hear other thoughts and approaches, or feedback on our proposed answer to the issue.

dwilson1988 · 2021-11-01T14:39:21Z

@rsmith013 We definitely have a lot of earth observation datasets as well, plus a few others. There is definitely a wide breadth of data types and very heterogeneous field names, except for a few that have imposed consistency like start_datetime, end_datetime, datetime, etc.

Not sure I see a scaling issue with the approach I suggested. Unless there are dozens of fields on every dataset, I wouldn't expect the response to be much larger than just a /collections response? In the end, it's probably not a huge deal, but it would be nice to be able to get the queryables for everything without first making a request to collections and then to each queryables endpoint.

Free text search is at least partially implemented in CQL/CQL2; I would expect a function there might become the canonical way to do it. I do really like the idea of the context collections extension - I might implement that on our side.

rsmith013 · 2021-11-01T14:59:50Z

I am also thinking from a UI perspective. If you return the queryables for each collection in isolation, how do you display that to a user? Even some kind of post-processing, such as the term mapping I am suggesting, would still result in an unusable interface (assuming you have many varied vocabs).
To some extent, having a very small number of high-level queryables and expanding these as the user query gets more specific would be a more user-friendly experience, IMO.

The filters extension is developing, so will keep an eye on that. Always good to remove code.

How were you thinking you would handle the client/UI side of things with separated queryables for each collection?

dwilson1988 · 2021-11-01T15:08:05Z

Well, the queryables by collection is already in the OGC API Features Part 3 spec, but our primary usage is not exactly UI driven; it is part API client and part machine to machine. We use the collection-specific queryables endpoint (collections//queryables) with a Dataset (in our usage, a superset of a Collection) object in our API, so a user of that will be able to check what they are able to use to query it. Our user interface allows a user to browse these datasets individually, so queryables would just be displayed as a list of sorts in that Dataset's display.

rsmith013 · 2021-11-01T15:37:05Z

If I understand correctly you have a structure like this:

Where collection queryables are an aggregation of the item properties and dataset queryables are an aggregation of the collections associated with it. I guess you are wanting all the queryables in one hit and then would sort it by dataset yourself? I can see why you would want to lose the intersect. I would think something like Memcached would be an answer here, as you don't need all the queryables every time, just a subset for each dataset.

Use Memcached (other caches do exist) to store the response from collections//queryable, then the lookup to each member collection should be lightning fast. Assuming also that these things change infrequently, caching the dataset response for the same (i.e. the aggregation of all the /collections//queryables responses) would be even better.
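
The two caching layers could be sketched as follows. To keep the example self-contained, Python's in-process functools.lru_cache stands in for Memcached, and the fetch function is a placeholder for the real HTTP request; all names and data here are hypothetical.

```python
from functools import lru_cache

FETCH_COUNT = {"calls": 0}  # lets us see how often the slow path runs

# Placeholder data standing in for real per-collection queryables responses.
_FAKE_BACKEND = {
    "collection1": ("datetime", "platform"),
    "collection2": ("datetime", "processing_level"),
}


def fetch_collection_queryables(collection_id: str) -> tuple:
    """Placeholder for the slow request to one collection's queryables endpoint."""
    FETCH_COUNT["calls"] += 1
    return _FAKE_BACKEND.get(collection_id, ())


@lru_cache(maxsize=1024)
def cached_collection_queryables(collection_id: str) -> tuple:
    """First layer: cache each collection's queryables response."""
    return fetch_collection_queryables(collection_id)


@lru_cache(maxsize=128)
def cached_dataset_queryables(collection_ids: tuple) -> tuple:
    """Second layer: cache the aggregation over a dataset's member collections."""
    merged = set()
    for cid in collection_ids:
        merged.update(cached_collection_queryables(cid))
    return tuple(sorted(merged))
```

A real deployment would also need an expiry or invalidation policy for when queryables change, which lru_cache does not provide.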

Although the global intersect is flawed where you have heterogeneous collection queryables, I am defending the idea because I think that from a STAC client (UI or otherwise) perspective, it is more useful. It sounds like your Dataset concept already reduces the number of collections needed per request, and I would think that caching would help with performance.

dwilson1988 · 2021-11-01T15:41:44Z

No, I wasn't very clear - Dataset is just a STAC Collection + more stuff, not multiple STAC Collections.

I'm just looking for flexibility, but querying individual collections isn't a huge burden, just a potential annoyance.

rsmith013 · 2021-11-01T15:45:04Z

ok, I'm with you now.

rsmith013 · 2021-12-06T17:17:37Z

Having thought more deeply about this issue and played with my original suggestion, context-collections, I find it still doesn't quite do the job. The issue is that once you enter a collection, there is no further refinement. As you need to know the search context to generate the facets, the /queryables endpoint does not seem the most sensible place to do this (you would have to perform the search on the queryables endpoint, e.g. /queryables?datetime=...&bbox=...&filter=...).

Other solutions, such as Google Custom Search, place facet counts and further refinements in the search context.
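
As an illustration of facet counts living in the search context, here is a minimal sketch that counts property values across the items matched by a search. The `facets` key, the function name, and the sample items are hypothetical and not part of the context extension.

```python
from collections import Counter


def facet_counts(items: list, facet_names: list) -> dict:
    """Count how often each value of each facet occurs in the matched items.
    Facets that no matched item carries are omitted."""
    counts = {name: Counter() for name in facet_names}
    for item in items:
        for name in facet_names:
            value = item["properties"].get(name)
            if value is not None:
                counts[name][value] += 1
    return {name: dict(counter) for name, counter in counts.items() if counter}


matched = [
    {"properties": {"platform": "sentinel-1a", "processing_level": "L1"}},
    {"properties": {"platform": "sentinel-1b", "processing_level": "L1"}},
    {"properties": {"platform": "sentinel-1a"}},
]
context = {
    "returned": len(matched),
    "facets": facet_counts(matched, ["platform", "processing_level", "global:model"]),
}
print(context)
```

Because the counts are derived from the same result set as the items, they narrow automatically as the user adds filters, which is the refinement behaviour described above.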

philvarner · 2023-01-30T14:22:51Z

Moved to: stac-api-extensions/filter#9

rsmith013 mentioned this issue Sep 17, 2021

Patch 1 #211

Merged

4 tasks

philvarner mentioned this issue Jan 30, 2023

Dynamic queryables difficult for item-search stac-api-extensions/filter#9

Open

philvarner closed this as not planned Jan 30, 2023
