Filter Ext: Dynamic queryables difficult for item-search #182
Comments
I have created an extension to the context extension to add

In my implementation, the number of collections returned in this response is at most 10, since the more collections are returned, the less likely there is to be an intersect. The value from this response is then used via

Open to thoughts on this approach. I appreciate that the filter extension, based on OGC OA Feat: Part 3, is still a moving target.
I'm looking into this now. Has anyone proposed a different response to those working on OGC OA Feat: Part 3 for global queryables? Instead of /queryables returning the intersection, why not separate them out by collection similar to the collections endpoint?
{
  "queryables": [
    {
      "$id": "collection1",
      "title": "Collection 1",
      ...
    }, {
      "$id": "collection2",
      "title": "Collection 2",
      ...
    }
  ]
}
{
  "queryables": [
    {
      "$id": "collection1",
      "title": "Collection 1",
      ...
    }
  ]
}
I'm struggling to understand the logic of using the intersection of queryables given that the collections could be RADICALLY different (in our use case, they certainly are).
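To make the suggested response shape concrete, here is a minimal sketch of how a server might wrap per-collection queryables schemas into a single grouped response. The function name `build_grouped_queryables`, the sample schemas, and the inclusion of a `properties` key in each entry are assumptions for illustration only; none of this is defined by OGC API Features Part 3 or by STAC.

```python
from typing import Any


def build_grouped_queryables(
    schemas: dict[str, dict[str, Any]],
) -> dict[str, list[dict[str, Any]]]:
    """Wrap per-collection queryables schemas in one response.

    Hypothetical helper: one entry per collection instead of a
    global intersection of all collections' queryables.
    """
    entries = []
    for collection_id in sorted(schemas):
        schema = schemas[collection_id]
        entries.append(
            {
                "$id": collection_id,
                "title": schema.get("title", collection_id),
                "properties": schema.get("properties", {}),
            }
        )
    return {"queryables": entries}


# Illustrative input: two hypothetical collections with unrelated fields.
per_collection = {
    "collection1": {
        "title": "Collection 1",
        "properties": {"eo:cloud_cover": {"type": "number"}},
    },
    "collection2": {
        "title": "Collection 2",
        "properties": {"cmip6:source_id": {"type": "string"}},
    },
}

response = build_grouped_queryables(per_collection)
```

With this shape a client can read every collection's queryables in one request, at the cost of a response that grows with the number of collections.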
@dwilson1988 We are expecting to have many hundreds of collections, so I am not sure this would be a scalable approach either. We are working under the assumption that each collection will have its own set of queryables. These could be very different. We are then working to create a separate service that will create global search facets (queryables) and make a mapping between the collection-specific terms and the global set.

I am not sure which area you're working in, but we are coming from an earth system modeling perspective (climate models mixed with earth observation remote and local sensing).

As an example:
CMIP6: cmip6:source_id -> global:model
CMIP5: cmip5:model -> global:model
Sentinel 1: global:general_data_type
Sentinel 3: global:general_data_type

So although I agree that at the top level an intersection will likely tend to zero for heterogeneous STAC collections, we are hoping to have a few key search facets available at the top level e.g.

So to use the above examples, if you were looking for satellite data, then it is likely that processing_level would bubble up from a queryables search. Or if you were looking for model data, then model would appear as a queryable. I would be interested to hear other thoughts and approaches, or feedback on our proposed answer to the issue.
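As a rough illustration of the mapping service described in this comment, the sketch below translates between collection-specific terms and a global facet. Only the two CMIP mappings come from the example above; the names `FACET_MAP`, `to_global` and `to_collection_terms`, and the sample property value, are hypothetical.

```python
# Collection-specific term -> global facet, per collection.
# Only the CMIP entries are taken from the thread; the structure is assumed.
FACET_MAP: dict[str, dict[str, str]] = {
    "CMIP6": {"cmip6:source_id": "global:model"},
    "CMIP5": {"cmip5:model": "global:model"},
}


def to_global(collection: str, properties: dict[str, str]) -> dict[str, str]:
    """Rename a collection's properties to their global facet names.

    Properties without a mapping keep their original name.
    """
    mapping = FACET_MAP.get(collection, {})
    return {mapping.get(name, name): value for name, value in properties.items()}


def to_collection_terms(global_facet: str) -> dict[str, str]:
    """Return {collection: local term} for collections exposing the facet."""
    result = {}
    for collection, mapping in FACET_MAP.items():
        for local_term, facet in mapping.items():
            if facet == global_facet:
                result[collection] = local_term
    return result


# A search on the global facet can be fanned out to each collection's own term.
terms = to_collection_terms("global:model")
renamed = to_global("CMIP5", {"cmip5:model": "EXAMPLE-MODEL"})
```

The reverse lookup is what would let a top-level search on `global:model` be rewritten into per-collection filters.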
@rsmith013 We definitely have a lot of earth observation datasets as well, plus a few others. Definitely a wide breadth of data types and very heterogeneous field names, except for a few that have imposed consistency like start_datetime, end_datetime, datetime, etc. Not sure I see a scaling issue with the approach I suggested. Unless there are dozens of fields on every dataset, I wouldn't expect the response to be much larger than just a /collections response? In the end, it's probably not a huge deal, but it would be nice to be able to get the queryables for everything without first making a request to collections and then to each queryables endpoint. Free text search is at least partially implemented in CQL/CQL2; I would expect a function there might become the canonical way to do it. I do really like the idea of the context collections extension - I might implement that on our side.
I am also thinking from a UI perspective. If you return the queryables for each collection in isolation, how do you display that to a user? Even some kind of post-processing, such as the term mapping I am suggesting, would still result in an unusable interface (assuming you have many varied vocabs). The filter extension is developing, so I will keep an eye on that. Always good to remove code. How were you thinking you would handle the client/UI side of things with separated queryables for each collection?
Well, queryables by collection are already in the OGC API Features Part 3 spec, and our primary usage is not exactly UI driven; it is part API client and part machine to machine. We use the collection-specific queryables endpoint (collections//queryables) with a Dataset (in our usage, a superset of a Collection) object in our API, so a user of that will be able to check what they are able to use to query it. Our user interface allows a user to browse these datasets individually, so queryables would just be displayed as a list of sorts in that Dataset's display.
If I understand correctly you have a structure like this:

Where collection queryables are an aggregation of the item properties and dataset queryables are an aggregation of the collections associated with it. I guess you want all the queryables in one hit and would then sort them by dataset yourself? I can see why you would want to lose the intersect.

I would think something like Memcached would be an answer here, as you don't need all the queryables every time, just a subset for each dataset. Use Memcached (other caches do exist) to store the response from

Although the global intersect is flawed where you have heterogeneous collection queryables, I am protecting the idea because I think that from a STAC client (UI or otherwise) perspective, it is more useful. It sounds like your
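The caching idea in this comment could look roughly like the sketch below. It uses a small in-process cache with a time-to-live as a stand-in for Memcached, so it runs without an external service; `QueryablesCache`, `fake_fetch` and the 60-second TTL are all assumptions, and a real deployment would swap the dictionary for a Memcached client.

```python
import time
from typing import Any, Callable


class QueryablesCache:
    """Cache per-collection queryables responses for a limited time.

    In-memory stand-in for an external cache such as Memcached.
    """

    def __init__(
        self,
        fetch: Callable[[str], dict[str, Any]],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, dict[str, Any]]] = {}

    def get(self, collection_id: str) -> dict[str, Any]:
        now = self._clock()
        hit = self._entries.get(collection_id)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh, skip the upstream request
        value = self._fetch(collection_id)
        self._entries[collection_id] = (now, value)
        return value


# Hypothetical fetcher that records each upstream request it receives.
calls: list[str] = []


def fake_fetch(collection_id: str) -> dict[str, Any]:
    calls.append(collection_id)
    return {"$id": collection_id, "properties": {}}


cache = QueryablesCache(fake_fetch, ttl_seconds=60)
cache.get("collection1")
cache.get("collection1")  # served from the cache, no second upstream call
```

Only the collections a client actually asks about are fetched, which matches the point that not all queryables are needed every time.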
No, I wasn't very clear - Dataset is just a STAC Collection + more stuff, not multiple STAC Collections. I'm just looking for flexibility, but querying individual collections isn't a huge burden, just a potential annoyance.
OK, I'm with you now.
Thinking deeper on this issue, and having played with my original suggestion, context-collections, it still doesn't quite do the job. The issue is that once you enter a collection, there is no further refinement. As you need to know the search context to generate the facets, the

Other solutions such as Google Custom Search
Moved to: stac-api-extensions/filter#9
One feature we are really interested in is the ability to request from the server the possible queryables and their accepted values.
As defined, there are two routes:
- /queryables - The global intersect of all collection queryables

As noted in the documentation, this falls short in providing a useful interface where an API presents diverse content.
Issues ogcapi-582 and ogcapi-576 go some way in addressing this by requesting wildcard schema definitions and /queryables?collections=collection1,collection2, but I feel this still leaves limitations, some of which are discussed in the issues themselves.
- The ?collections parameter requires the client to know the collections they are interested in up-front. One of the benefits of item-search is cross-collection search.

I wonder whether a more useful approach would be to allow the same query parameters as /search on /queryables. The implementation can then search for the list of results that match and provide the intersect of queryables to the user for further refinement.
e.g.
Return the list of items which match the filter expression
/search?filter=sentinel:data_coverage > 50 OR eo:cloud_cover < 10
Return the queryables available for the results which match the current filter expression
/queryables?filter=sentinel:data_coverage > 50 OR eo:cloud_cover < 10
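One way to picture the proposal is the sketch below: the server runs the same filter it would use for /search, then intersects the property names of the matching items to build the queryables response. The in-memory item list, the Python predicate standing in for the CQL filter expression, and the function name `queryables_for_results` are all assumptions; a real implementation would push this down to its search backend.

```python
from typing import Any, Callable

Item = dict[str, Any]


def queryables_for_results(
    items: list[Item], matches: Callable[[Item], bool]
) -> list[str]:
    """Intersect property names across the items that satisfy the filter."""
    names: set[str] | None = None
    for item in items:
        if not matches(item):
            continue
        keys = set(item.get("properties", {}))
        names = keys if names is None else names & keys
    return sorted(names or [])


def example_filter(item: Item) -> bool:
    # Stand-in for: sentinel:data_coverage > 50 OR eo:cloud_cover < 10
    props = item.get("properties", {})
    return props.get("sentinel:data_coverage", 0) > 50 or props.get(
        "eo:cloud_cover", 100
    ) < 10


# Hypothetical catalogue with three items from different collections.
catalogue = [
    {"id": "a", "properties": {"datetime": "2021-01-01T00:00:00Z",
                               "eo:cloud_cover": 5,
                               "sentinel:data_coverage": 80}},
    {"id": "b", "properties": {"datetime": "2021-01-02T00:00:00Z",
                               "eo:cloud_cover": 8}},
    {"id": "c", "properties": {"datetime": "2021-01-03T00:00:00Z",
                               "cmip6:source_id": "EXAMPLE-MODEL"}},
]

available = queryables_for_results(catalogue, example_filter)
```

Here item `c` does not match the filter, so its fields never enter the intersect, and the client is offered only fields that every remaining result can be refined on.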
The /queryables?collections=collection1,collection2 approach requires the API to return a list of relevant collections to be useful IMO. Perhaps allow the context extension to return an array of collections in the response which are relevant for the current search.

Proposed solutions:
- /search query parameters on the /queryables endpoint to dynamically build the intersect
- /queryables?collections=...,
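For the context extension idea, a minimal sketch might count which collections the matched items belong to and return the most common ones, capped at 10 as mentioned elsewhere in this thread. The `collections` key in the context object and the helper name `relevant_collections` are hypothetical and not part of the published context extension.

```python
from collections import Counter
from typing import Any


def relevant_collections(items: list[dict[str, Any]], limit: int = 10) -> list[str]:
    """Return the collections most represented in the matched items."""
    counts = Counter(item["collection"] for item in items)
    return [collection_id for collection_id, _ in counts.most_common(limit)]


# Hypothetical search results: three items from one collection, one from another.
matched = [
    {"id": "a", "collection": "collection1"},
    {"id": "b", "collection": "collection2"},
    {"id": "c", "collection": "collection1"},
    {"id": "d", "collection": "collection1"},
]

context = {
    "returned": len(matched),
    # Assumed extra field carrying the collections relevant to this search.
    "collections": relevant_collections(matched),
}
```

A client could then pass `context["collections"]` straight into /queryables?collections=... without knowing the collections up-front.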