Commit 0a654da

GET /collections filter extension (#475)
**Related Issue(s):**

- #459

**Description:**

- Added support for both cql2-json and cql2-text in GET /collections

**PR Checklist:**

- [x] Code is formatted and linted (run `pre-commit run --all-files`)
- [x] Tests pass (run `make test`)
- [x] Documentation has been updated to reflect changes, if applicable
- [x] Changes are added to the changelog
1 parent 57afb55 commit 0a654da

8 files changed: +227 -26 lines changed


CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -9,6 +9,8 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

 ### Added

+- GET `/collections` collection search structured filter extension with support for both cql2-json and cql2-text formats. [#475](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/475)
+
 ### Changed

 ### Fixed

README.md

Lines changed: 11 additions & 2 deletions
@@ -131,9 +131,18 @@ SFEOS implements extended capabilities for the `/collections` endpoint, allowing
   - Searches across multiple text fields including title, description, and keywords
   - Supports partial word matching and relevance-based sorting

+- **Structured Filtering**: Filter collections using CQL2 expressions
+  - JSON format: `/collections?filter={"op":"=","args":[{"property":"id"},"sentinel-2"]}&filter-lang=cql2-json`
+  - Text format: `/collections?filter=id='sentinel-2'&filter-lang=cql2-text` (note: string values must be quoted)
+  - Advanced text format: `/collections?filter=id LIKE '%sentinel%'&filter-lang=cql2-text` (supports LIKE, BETWEEN, etc.)
+  - Supports both CQL2 JSON and CQL2 text formats with various operators
+  - Enables precise filtering on any collection property
+
+> **Note on HTTP Methods**: All collection search extensions (sorting, field selection, free text search, and structured filtering) currently only support GET requests. POST requests with these parameters in the request body are not yet supported.
+
 These extensions make it easier to build user interfaces that display and navigate through collections efficiently.

-> **Configuration**: Collection search extensions can be disabled by setting the `ENABLE_COLLECTIONS_SEARCH` environment variable to `false`. By default, these extensions are enabled.
+> **Configuration**: Collection search extensions (sorting, field selection, free text search, and structured filtering) can be disabled by setting the `ENABLE_COLLECTIONS_SEARCH` environment variable to `false`. By default, these extensions are enabled.

 > **Note**: Sorting is only available on fields that are indexed for sorting in Elasticsearch/OpenSearch. With the default mappings, you can sort on:
 > - `id` (keyword field)
@@ -156,7 +165,7 @@ This project is organized into several packages, each with a specific purpose:
   - Shared logic and utilities that improve code reuse between backends

 - **stac_fastapi_elasticsearch**: Complete implementation of the STAC API using Elasticsearch as the backend database. This package depends on both `stac_fastapi_core` and `sfeos_helpers`.
--
+
 - **stac_fastapi_opensearch**: Complete implementation of the STAC API using OpenSearch as the backend database. This package depends on both `stac_fastapi_core` and `sfeos_helpers`.

 ## Examples
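
The README examples in this hunk show the `filter` values unencoded for readability. In an actual request the value generally needs percent-encoding, in particular the `%` wildcards of a `LIKE` pattern. A minimal sketch using only the Python standard library; the base URL `http://localhost:8080` and the helper name `collections_filter_url` are assumptions made for illustration and are not part of the project:

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical base URL; replace it with the address of your deployment.
BASE_URL = "http://localhost:8080"


def collections_filter_url(filter_value, filter_lang):
    """Build a GET /collections URL with a percent-encoded CQL2 filter."""
    if filter_lang == "cql2-json" and not isinstance(filter_value, str):
        # Serialize dict filters compactly before encoding them.
        filter_value = json.dumps(filter_value, separators=(",", ":"))
    query = urlencode({"filter": filter_value, "filter-lang": filter_lang})
    return f"{BASE_URL}/collections?{query}"


json_url = collections_filter_url(
    {"op": "=", "args": [{"property": "id"}, "sentinel-2"]}, "cql2-json"
)
text_url = collections_filter_url("id LIKE '%sentinel%'", "cql2-text")

# Decoding the query string recovers the original filter expression.
decoded = parse_qs(urlsplit(text_url).query)
print(text_url)
print(decoded["filter"][0])
```

`urlencode` encodes spaces as `+`, which matches the `unquote_plus` call used on the server side in this commit.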

stac_fastapi/core/stac_fastapi/core/core.py

Lines changed: 59 additions & 2 deletions
@@ -228,6 +228,8 @@ async def all_collections(
         self,
         fields: Optional[List[str]] = None,
         sortby: Optional[str] = None,
+        filter_expr: Optional[str] = None,
+        filter_lang: Optional[str] = None,
         q: Optional[Union[str, List[str]]] = None,
         **kwargs,
     ) -> stac_types.Collections:
@@ -236,7 +238,9 @@ async def all_collections(
         Args:
             fields (Optional[List[str]]): Fields to include or exclude from the results.
             sortby (Optional[str]): Sorting options for the results.
-            q (Optional[List[str]]): Free text search terms.
+            filter_expr (Optional[str]): Structured filter expression in CQL2 JSON or CQL2-text format.
+            filter_lang (Optional[str]): Must be 'cql2-json' or 'cql2-text' if specified, other values will result in an error.
+            q (Optional[Union[str, List[str]]]): Free text search terms.
             **kwargs: Keyword arguments from the request.

         Returns:
@@ -276,8 +280,61 @@ async def all_collections(
         if q is not None:
             q_list = [q] if isinstance(q, str) else q

+        # Parse the filter parameter if provided
+        parsed_filter = None
+        if filter_expr is not None:
+            try:
+                # Check if filter_lang is specified and not one of the supported formats
+                if filter_lang is not None and filter_lang not in [
+                    "cql2-json",
+                    "cql2-text",
+                ]:
+                    # Raise an error for unsupported filter languages
+                    raise HTTPException(
+                        status_code=400,
+                        detail=f"Input should be 'cql2-json' or 'cql2-text' for collections. Got '{filter_lang}'.",
+                    )
+
+                # Handle different filter formats
+                try:
+                    if filter_lang == "cql2-text" or filter_lang is None:
+                        # For cql2-text or when no filter_lang is specified, try both formats
+                        try:
+                            # First try to parse as JSON
+                            parsed_filter = orjson.loads(unquote_plus(filter_expr))
+                        except Exception:
+                            # If that fails, use pygeofilter to convert CQL2-text to CQL2-JSON
+                            try:
+                                # Parse CQL2-text and convert to CQL2-JSON
+                                text_filter = unquote_plus(filter_expr)
+                                parsed_ast = parse_cql2_text(text_filter)
+                                parsed_filter = to_cql2(parsed_ast)
+                            except Exception as e:
+                                # If parsing fails, provide a helpful error message
+                                raise HTTPException(
+                                    status_code=400,
+                                    detail=f"Invalid CQL2-text filter: {e}. Please check your syntax.",
+                                )
+                    else:
+                        # For explicit cql2-json, parse as JSON
+                        parsed_filter = orjson.loads(unquote_plus(filter_expr))
+                except Exception as e:
+                    # Catch any other parsing errors
+                    raise HTTPException(
+                        status_code=400, detail=f"Error parsing filter: {e}"
+                    )
+            except Exception as e:
+                raise HTTPException(
+                    status_code=400, detail=f"Invalid filter parameter: {e}"
+                )
+
         collections, next_token = await self.database.get_all_collections(
-            token=token, limit=limit, request=request, sort=sort, q=q_list
+            token=token,
+            limit=limit,
+            request=request,
+            sort=sort,
+            q=q_list,
+            filter=parsed_filter,
         )

         # Apply field filtering if fields parameter was provided
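
The nested `try`/`except` blocks in the hunk above reduce to three rules: reject an unsupported `filter_lang`, try JSON first, and fall back to CQL2-text only when `filter_lang` is `cql2-text` or absent. The following is a flattened sketch of that control flow, not the code in this commit. It makes several assumptions: the standard-library `json` stands in for `orjson`, `ValueError` stands in for `HTTPException`, and the text parser is passed in as a hypothetical `text_to_cql2_json` callable that replaces the `parse_cql2_text`/`to_cql2` pair.

```python
import json
from typing import Any, Callable, Optional
from urllib.parse import unquote_plus

SUPPORTED_LANGS = ("cql2-json", "cql2-text")


def parse_collection_filter(
    filter_expr: Optional[str],
    filter_lang: Optional[str],
    text_to_cql2_json: Callable[[str], Any],
) -> Optional[Any]:
    """Return the filter as CQL2-JSON, or None when no filter was given."""
    if filter_expr is None:
        return None
    if filter_lang is not None and filter_lang not in SUPPORTED_LANGS:
        raise ValueError(
            f"Input should be 'cql2-json' or 'cql2-text' for collections. Got '{filter_lang}'."
        )

    decoded = unquote_plus(filter_expr)
    try:
        # JSON is attempted first for every accepted filter_lang value.
        return json.loads(decoded)
    except ValueError as exc:
        if filter_lang == "cql2-json":
            raise ValueError(f"Error parsing filter: {exc}") from exc

    # filter_lang is 'cql2-text' or None: fall back to the text parser.
    try:
        return text_to_cql2_json(decoded)
    except Exception as exc:
        raise ValueError(
            f"Invalid CQL2-text filter: {exc}. Please check your syntax."
        ) from exc


def fake_text_parser(text: str) -> dict:
    # Hypothetical stand-in for a real CQL2-text parser; it only records its input.
    return {"parsed_from_text": text}


print(parse_collection_filter('{"op":"="}', "cql2-json", fake_text_parser))
print(parse_collection_filter("id%3D%27sentinel-2%27", None, fake_text_parser))
```

One point that may be worth checking in the committed version: `HTTPException` is itself an `Exception`, so the outer `except Exception` handlers catch and re-wrap the errors raised by the inner blocks, and the final `detail` can end up carrying more than one prefix.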

stac_fastapi/elasticsearch/stac_fastapi/elasticsearch/app.py

Lines changed: 5 additions & 4 deletions
@@ -34,9 +34,10 @@
     create_collection_index,
     create_index_templates,
 )
-from stac_fastapi.extensions.core import (  # CollectionSearchFilterExtension,
+from stac_fastapi.extensions.core import (
     AggregationExtension,
     CollectionSearchExtension,
+    CollectionSearchFilterExtension,
     FilterExtension,
     FreeTextExtension,
     SortExtension,
@@ -123,9 +124,9 @@
         # QueryExtension(conformance_classes=[QueryConformanceClasses.COLLECTIONS]),
         SortExtension(conformance_classes=[SortConformanceClasses.COLLECTIONS]),
         FieldsExtension(conformance_classes=[FieldsConformanceClasses.COLLECTIONS]),
-        # CollectionSearchFilterExtension(
-        # conformance_classes=[FilterConformanceClasses.COLLECTIONS]
-        # ),
+        CollectionSearchFilterExtension(
+            conformance_classes=[FilterConformanceClasses.COLLECTIONS]
+        ),
         FreeTextExtension(conformance_classes=[FreeTextConformanceClasses.COLLECTIONS]),
     ]

stac_fastapi/elasticsearch/stac_fastapi/elasticsearch/database_logic.py

Lines changed: 26 additions & 4 deletions
@@ -176,6 +176,7 @@ async def get_all_collections(
         request: Request,
         sort: Optional[List[Dict[str, Any]]] = None,
         q: Optional[List[str]] = None,
+        filter: Optional[Dict[str, Any]] = None,
     ) -> Tuple[List[Dict[str, Any]], Optional[str]]:
         """Retrieve a list of collections from Elasticsearch, supporting pagination.

@@ -185,6 +186,7 @@ async def get_all_collections(
             request (Request): The FastAPI request object.
             sort (Optional[List[Dict[str, Any]]]): Optional sort parameter from the request.
             q (Optional[List[str]]): Free text search terms.
+            filter (Optional[Dict[str, Any]]): Structured query in CQL2 format.

         Returns:
             A tuple of (collections, next pagination token if any).
@@ -225,6 +227,9 @@ async def get_all_collections(
         if token:
             body["search_after"] = [token]

+        # Build the query part of the body
+        query_parts = []
+
         # Apply free text query if provided
         if q:
             # For collections, we want to search across all relevant fields
@@ -251,10 +256,27 @@ async def get_all_collections(
                     }
                 )

-            # Add the query to the body using bool query with should clauses
-            body["query"] = {
-                "bool": {"should": should_clauses, "minimum_should_match": 1}
-            }
+            # Add the free text query to the query parts
+            query_parts.append(
+                {"bool": {"should": should_clauses, "minimum_should_match": 1}}
+            )
+
+        # Apply structured filter if provided
+        if filter:
+            # Convert string filter to dict if needed
+            if isinstance(filter, str):
+                filter = orjson.loads(filter)
+            # Convert the filter to an Elasticsearch query using the filter module
+            es_query = filter_module.to_es(await self.get_queryables_mapping(), filter)
+            query_parts.append(es_query)
+
+        # Combine all query parts with AND logic
+        if query_parts:
+            body["query"] = (
+                query_parts[0]
+                if len(query_parts) == 1
+                else {"bool": {"must": query_parts}}
+            )

         # Execute the search
         response = await self.client.search(
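
The "combine all query parts with AND logic" step in the hunk above can be read in isolation. A minimal sketch, assuming a free function named `combine_query_parts` (a hypothetical name; the commit inlines this logic) and a hand-written `term` clause standing in for whatever `filter_module.to_es` returns:

```python
from typing import Any, Dict, List, Optional


def combine_query_parts(query_parts: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Combine query clauses with AND semantics, mirroring the hunk above."""
    if not query_parts:
        # No "query" key is set in the body, so the search is unrestricted.
        return None
    if len(query_parts) == 1:
        # A single clause is used directly, without a bool wrapper.
        return query_parts[0]
    return {"bool": {"must": query_parts}}


# Free text clause in the shape built by the commit.
free_text = {
    "bool": {
        "should": [{"match": {"title": "sentinel"}}],
        "minimum_should_match": 1,
    }
}
# Hypothetical output of the CQL2 conversion, for illustration only.
structured = {"term": {"id": "sentinel-2"}}

print(combine_query_parts([]))
print(combine_query_parts([structured]))
print(combine_query_parts([free_text, structured]))
```

Clauses under `must` all have to match and also contribute to relevance scoring, so combining `q` with `filter` narrows the result set to collections that satisfy both.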

stac_fastapi/opensearch/stac_fastapi/opensearch/app.py

Lines changed: 5 additions & 4 deletions
@@ -28,9 +28,10 @@
 from stac_fastapi.core.route_dependencies import get_route_dependencies
 from stac_fastapi.core.session import Session
 from stac_fastapi.core.utilities import get_bool_env
-from stac_fastapi.extensions.core import (  # CollectionSearchFilterExtension,
+from stac_fastapi.extensions.core import (
     AggregationExtension,
     CollectionSearchExtension,
+    CollectionSearchFilterExtension,
     FilterExtension,
     FreeTextExtension,
     SortExtension,
@@ -123,9 +124,9 @@
         # QueryExtension(conformance_classes=[QueryConformanceClasses.COLLECTIONS]),
         SortExtension(conformance_classes=[SortConformanceClasses.COLLECTIONS]),
         FieldsExtension(conformance_classes=[FieldsConformanceClasses.COLLECTIONS]),
-        # CollectionSearchFilterExtension(
-        # conformance_classes=[FilterConformanceClasses.COLLECTIONS]
-        # ),
+        CollectionSearchFilterExtension(
+            conformance_classes=[FilterConformanceClasses.COLLECTIONS]
+        ),
         FreeTextExtension(conformance_classes=[FreeTextConformanceClasses.COLLECTIONS]),
     ]

stac_fastapi/opensearch/stac_fastapi/opensearch/database_logic.py

Lines changed: 30 additions & 8 deletions
@@ -160,15 +160,17 @@ async def get_all_collections(
         request: Request,
         sort: Optional[List[Dict[str, Any]]] = None,
         q: Optional[List[str]] = None,
+        filter: Optional[Dict[str, Any]] = None,
     ) -> Tuple[List[Dict[str, Any]], Optional[str]]:
-        """Retrieve a list of collections from Elasticsearch, supporting pagination.
+        """Retrieve a list of collections from Opensearch, supporting pagination.

         Args:
             token (Optional[str]): The pagination token.
             limit (int): The number of results to return.
             request (Request): The FastAPI request object.
             sort (Optional[List[Dict[str, Any]]]): Optional sort parameter from the request.
             q (Optional[List[str]]): Free text search terms.
+            filter (Optional[Dict[str, Any]]): Structured query in CQL2 format.

         Returns:
             A tuple of (collections, next pagination token if any).
@@ -191,7 +193,7 @@ async def get_all_collections(
                         raise HTTPException(
                             status_code=400,
                             detail=f"Field '{field}' is not sortable. Sortable fields are: {', '.join(sortable_fields)}. "
-                            + "Text fields are not sortable by default in OpenSearch. "
+                            + "Text fields are not sortable by default in Opensearch. "
                             + "To make a field sortable, update the mapping to use 'keyword' type or add a '.keyword' subfield. ",
                         )
                     formatted_sort.append({field: {"order": direction}})
@@ -209,6 +211,9 @@ async def get_all_collections(
         if token:
             body["search_after"] = [token]

+        # Build the query part of the body
+        query_parts = []
+
         # Apply free text query if provided
         if q:
             # For collections, we want to search across all relevant fields
@@ -235,11 +240,29 @@ async def get_all_collections(
                     }
                 )

-            # Add the query to the body using bool query with should clauses
-            body["query"] = {
-                "bool": {"should": should_clauses, "minimum_should_match": 1}
-            }
+            # Add the free text query to the query parts
+            query_parts.append(
+                {"bool": {"should": should_clauses, "minimum_should_match": 1}}
+            )
+
+        # Apply structured filter if provided
+        if filter:
+            # Convert string filter to dict if needed
+            if isinstance(filter, str):
+                filter = orjson.loads(filter)
+            # Convert the filter to an Opensearch query using the filter module
+            es_query = filter_module.to_es(await self.get_queryables_mapping(), filter)
+            query_parts.append(es_query)
+
+        # Combine all query parts with AND logic
+        if query_parts:
+            body["query"] = (
+                query_parts[0]
+                if len(query_parts) == 1
+                else {"bool": {"must": query_parts}}
+            )

+        # Execute the search
         response = await self.client.search(
             index=COLLECTIONS_INDEX,
             body=body,
@@ -255,7 +278,6 @@ async def get_all_collections(

         next_token = None
         if len(hits) == limit:
-            # Ensure we have a valid sort value for next_token
             next_token_values = hits[-1].get("sort")
             if next_token_values:
                 next_token = next_token_values[0]
@@ -276,7 +298,7 @@ async def get_one_item(self, collection_id: str, item_id: str) -> Dict:
             NotFoundError: If the specified Item does not exist in the Collection.

         Notes:
-            The Item is retrieved from the Elasticsearch database using the `client.get` method,
+            The Item is retrieved from the Opensearch database using the `client.get` method,
             with the index for the Collection as the target index and the combined `mk_item_id` as the document id.
         """
         try:
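
The pagination hunk (`@@ -255,7 +278,6 @@`) in this file derives the next-page token from the sort values of the last hit, and the token is later sent back as `body["search_after"] = [token]`. A small sketch of that derivation, assuming a free function named `next_page_token` (hypothetical; the commit inlines this logic) and hand-written hits shaped like search results:

```python
from typing import Any, Dict, List, Optional


def next_page_token(hits: List[Dict[str, Any]], limit: int) -> Optional[Any]:
    """Return the first sort value of the last hit when the page is full."""
    if len(hits) != limit:
        # A short page means there is nothing further to fetch.
        return None
    sort_values = hits[-1].get("sort")
    # Hits without sort values cannot produce a search_after token.
    return sort_values[0] if sort_values else None


# Illustrative hits; real responses carry many more fields.
page = [
    {"_id": "landsat-8", "sort": ["landsat-8"]},
    {"_id": "sentinel-2", "sort": ["sentinel-2"]},
]

print(next_page_token(page, limit=2))
print(next_page_token(page, limit=10))
```

Only the first sort value is kept, so this approach assumes the primary sort field is enough to resume the listing.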
