
Paginated search queries always return a next token, even if it's the last page of the pagination. #242

Closed
pedro-cf opened this issue May 4, 2024 · 4 comments


@pedro-cf
Collaborator

pedro-cf commented May 4, 2024

Describe the bug
Paginated search queries always return a next token, even if it's the last page of the pagination.

To Reproduce
Steps to reproduce the behavior:

  1. Run the app: `docker-compose up`
  2. Create a test collection
  3. Create 1 test item
  4. Run a search query with `"limit": 1` -> the response will contain a "next" link
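
For concreteness, the reproduction boils down to POSTing a search body like the one below (the collection id is a placeholder, not taken from the issue):

```python
import json

# Hypothetical minimal /search request body reproducing the bug:
# one matching item in the collection, page size of one.
search_body = {
    "collections": ["test-collection"],  # placeholder collection id
    "limit": 1,
}

# The response to this request is a FeatureCollection that should NOT
# contain a {"rel": "next"} link, since the single item fits on one page.
print(json.dumps(search_body))
```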

Expected behavior
I assume the last page of the pagination should not include a "next" link.

@jonhealy1
Collaborator

It's true. If there are no more items, there shouldn't be a next link.

@pedro-cf
Collaborator Author

pedro-cf commented May 5, 2024

```python
async def execute_search(
    self,
    search: Search,
    limit: int,
    token: Optional[str],
    sort: Optional[Dict[str, Dict[str, str]]],
    collection_ids: Optional[List[str]],
    ignore_unavailable: bool = True,
) -> Tuple[Iterable[Dict[str, Any]], Optional[int], Optional[str]]:
    """Execute a search query with limit and other optional parameters.

    Args:
        search (Search): The search query to be executed.
        limit (int): The maximum number of results to be returned.
        token (Optional[str]): The token used to return the next set of results.
        sort (Optional[Dict[str, Dict[str, str]]]): Specifies how the results should be sorted.
        collection_ids (Optional[List[str]]): The collection ids to search.
        ignore_unavailable (bool, optional): Whether to ignore unavailable collections. Defaults to True.

    Returns:
        Tuple[Iterable[Dict[str, Any]], Optional[int], Optional[str]]: A tuple containing:
            - An iterable of search results, where each result is a dictionary with keys and values
              representing the fields and values of each document.
            - The total number of results (if the count could be computed), or None if the count
              could not be computed.
            - The token to be used to retrieve the next set of results, or None if there are no
              more results.

    Raises:
        NotFoundError: If the collections specified in `collection_ids` do not exist.
    """
    search_after = None
    if token:
        search_after = urlsafe_b64decode(token.encode()).decode().split(",")

    query = search.query.to_dict() if search.query else None

    index_param = indices(collection_ids)

    search_task = asyncio.create_task(
        self.client.search(
            index=index_param,
            ignore_unavailable=ignore_unavailable,
            query=query,
            sort=sort or DEFAULT_SORT,
            search_after=search_after,
            size=limit,
        )
    )

    count_task = asyncio.create_task(
        self.client.count(
            index=index_param,
            ignore_unavailable=ignore_unavailable,
            body=search.to_dict(count=True),
        )
    )

    try:
        es_response = await search_task
    except exceptions.NotFoundError:
        raise NotFoundError(f"Collections '{collection_ids}' do not exist")

    hits = es_response["hits"]["hits"]
    items = (hit["_source"] for hit in hits)

    next_token = None
    if hits and (sort_array := hits[-1].get("sort")):
        next_token = urlsafe_b64encode(
            ",".join([str(x) for x in sort_array]).encode()
        ).decode()

    # (1) count should not block returning results, so don't wait for it to be done
    # (2) don't cancel the task so that it will populate the ES cache for subsequent counts
    maybe_count = None
    if count_task.done():
        try:
            maybe_count = count_task.result().get("count")
        except Exception as e:
            logger.error(f"Count task failed: {e}")

    return items, maybe_count, next_token
```

The issue is here, but I'm not quite sure how to fix it, because I'm not so familiar with Elasticsearch.
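
One common way to detect the last page with `search_after` (a suggestion on my part, not necessarily the fix the maintainers will choose) is to over-fetch by one document, i.e. request `size=limit + 1`, and only emit a next token when more than `limit` hits come back. The trimming logic can be sketched standalone:

```python
from base64 import urlsafe_b64encode
from typing import Any, Dict, List, Optional, Tuple


def page_hits(
    hits: List[Dict[str, Any]], limit: int
) -> Tuple[List[Dict[str, Any]], Optional[str]]:
    """Trim an over-fetched hit list (from size=limit+1) and build the next token.

    A hit here mimics the Elasticsearch response shape: a dict with
    "_source" and "sort" keys.
    """
    has_more = len(hits) > limit  # the extra hit proves another page exists
    page = hits[:limit]
    next_token = None
    if has_more and page and (sort_array := page[-1].get("sort")):
        next_token = urlsafe_b64encode(
            ",".join(str(x) for x in sort_array).encode()
        ).decode()
    return page, next_token


# limit=1, two hits returned -> there is a next page, so a token is built
hits = [
    {"_source": {"id": "a"}, "sort": [1714800000, "a"]},
    {"_source": {"id": "b"}, "sort": [1714800001, "b"]},
]
page, token = page_hits(hits, limit=1)

# limit=1, one hit returned -> last page, no token
page2, token2 = page_hits(hits[:1], limit=1)
```

This avoids any extra count request: the presence of the (limit+1)-th hit alone decides whether a "next" link is warranted.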

@pedro-cf
Collaborator Author

pedro-cf commented May 5, 2024

I can't seem to figure out a good way to obtain the "page" number in Elasticsearch.

We could pass a page number inside the token, but sadly this would cause issues in other places.

My example:

    async def execute_search(
...
    ) -> Tuple[Iterable[Dict[str, Any]], Optional[int], Optional[str]]:
        """
...
        """
        search_after = None
        page = 1

        if token:
            decoded_token = urlsafe_b64decode(token.encode()).decode().split(",")
            search_after = decoded_token[:-1]
            page = int(decoded_token[-1]) + 1

        query = search.query.to_dict() if search.query else None

        index_param = indices(collection_ids)

        search_task = asyncio.create_task(
            self.client.search(
                index=index_param,
                ignore_unavailable=ignore_unavailable,
                query=query,
                sort=sort or DEFAULT_SORT,
                search_after=search_after,
                size=limit,
            )
        )

        try:
            es_response = await search_task
        except exceptions.NotFoundError:
            raise NotFoundError(f"Collections '{collection_ids}' do not exist")

        matched = es_response["hits"]["total"]["value"]
        hits = es_response["hits"]["hits"]
        items = (hit["_source"] for hit in hits)

        next_token = None
        if matched > page * limit:
            if hits and (sort_array := hits[-1].get("sort")):
                next_token = urlsafe_b64encode(
                    ",".join([str(x) for x in sort_array] + [str(page)]).encode()
                ).decode()

        return items, matched, next_token
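
The token round-trip in the example above can be shown in isolation (the sort values below are hypothetical): the last hit's sort values plus the current page number are joined with commas and base64-encoded, then split back apart on decode.

```python
from base64 import urlsafe_b64decode, urlsafe_b64encode

# Hypothetical sort values from the last hit of the current page
sort_array = [1714800000, "item-1"]
page = 1

# Encode: sort values plus the page number, comma-joined, urlsafe base64
token = urlsafe_b64encode(
    ",".join([str(x) for x in sort_array] + [str(page)]).encode()
).decode()

# Decode: everything but the last element feeds search_after,
# the last element is the page counter
decoded = urlsafe_b64decode(token.encode()).decode().split(",")
search_after = decoded[:-1]       # ["1714800000", "item-1"]
next_page = int(decoded[-1]) + 1  # 2
```

One caveat with this scheme: a sort value that itself contains a comma would break the `split(",")` on decode, so it relies on the sort fields never containing that character.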

Note that this version of the function also makes use of `es_response["hits"]["total"]["value"]`, which, as I understand it, represents the number of matched documents and could therefore replace the separate count request:

```python
count_task = asyncio.create_task(
    self.client.count(
        index=index_param,
        ignore_unavailable=ignore_unavailable,
        body=search.to_dict(count=True),
    )
)

maybe_count = None
if count_task.done():
    try:
        maybe_count = count_task.result().get("count")
    except Exception as e:
        logger.error(f"Count task failed: {e}")

return items, maybe_count, next_token
```

@jonhealy1
Collaborator

I added another test related to this here: #244

jonhealy1 added a commit that referenced this issue May 8, 2024

**Related Issue(s):**

- #242

**Merge dependencies:**

- #241

**Description:**
- Paginated search queries now don't return a token on the last page.
- Made some fixes to the respective tests. In particular,
`test_pagination_token_idempotent` had an indentation issue
- Improved `execute_search` to make use of
`es_response["hits"]["total"]["value"]`

**PR Checklist:**

- [x] Code is formatted and linted (run `pre-commit run --all-files`)
- [x] Tests pass (run `make test`)
- [x] Documentation has been updated to reflect changes, if applicable
- [x] Changes are added to the changelog

---------

Co-authored-by: Jonathan Healy <jonathan.d.healy@gmail.com>