Paginated search queries now don't return a token on the last page #243
…quests for 6 items
So I'd suggest using the count from the search only if …
Maybe we can add to this test, or do something similar, to make sure that the last page isn't returning a token? #244
Greetings, a test for this already exists. It used to expect 7 requests for 6 items; I've now fixed it to expect 6 requests for 6 items.
These are all the tests I've fixed up.
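To make the 7-versus-6 arithmetic concrete, here is a small pure-Python model of a client paging through 6 items one at a time. It is only a sketch: `count_requests` and its `token_on_full_page` flag are hypothetical names, and the real test goes through the STAC API rather than a list slice.

```python
def count_requests(items, limit, token_on_full_page):
    """Count how many requests a client needs to page through `items`.

    token_on_full_page=True models the old behaviour: a token is returned
    whenever a page is full, so the last full page still yields a token.
    token_on_full_page=False models the fix: a token is returned only when
    more items actually remain.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    requests = 0
    offset = 0
    while True:
        requests += 1
        page = items[offset:offset + limit]
        offset += len(page)
        if token_on_full_page:
            has_token = len(page) == limit
        else:
            has_token = offset < len(items)
        if not has_token:
            return requests


items = list(range(6))
print(count_requests(items, limit=1, token_on_full_page=True))   # prints 7
print(count_requests(items, limit=1, token_on_full_page=False))  # prints 6
```

The old behaviour costs one trailing request that returns an empty page; the fixed behaviour stops as soon as the last item has been served.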
Greetings @StijnCaerts, thank you so much for the feedback. Do you think this approach is correct?

```python
# Run the search and the (potentially slower) count concurrently.
search_task = asyncio.create_task(
    self.client.search(
        index=index_param,
        ignore_unavailable=ignore_unavailable,
        body=search_body,
        size=limit,
    )
)
count_task = asyncio.create_task(
    self.client.count(
        index=index_param,
        ignore_unavailable=ignore_unavailable,
        body=search.to_dict(count=True),
    )
)

try:
    es_response = await search_task
except exceptions.NotFoundError:
    raise NotFoundError(f"Collections '{collection_ids}' do not exist")

hits = es_response["hits"]["hits"]
items = (hit["_source"] for hit in hits)

# hits.total is exact only when its relation is "eq"; otherwise it is a
# lower bound, so prefer the separate count if it has already finished.
matched = es_response["hits"]["total"]["value"]
if es_response["hits"]["total"]["relation"] != "eq":
    if count_task.done():
        try:
            matched = count_task.result().get("count")
        except Exception as e:
            logger.error(f"Count task failed: {e}")
    else:
        count_task.cancel()
```
I think either getting the correct count or no count at all would be the preferred behaviour. See https://github.com/stac-api-extensions/context.
Are there situations where the count can fail? Is this why it's wrapped in a try/except? With the addition of the page in the token, it's critical to get a matched value every time.
Probably for the same reasons a search request could fail, e.g. an invalid collection, a bad query, etc. Without an accurate count, it is impossible to tell whether we're on the last page. The only case where you can be sure you're on the last page is when the current page size is less than the limit.
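The reasoning above can be sketched as a small decision function. `has_next_page` is a hypothetical helper, not code from the repository; `offset` is assumed to be the number of items already returned on earlier pages, and `matched` is `None` when no reliable count is available.

```python
from typing import Optional


def has_next_page(offset: int, page_len: int, limit: int,
                  matched: Optional[int]) -> bool:
    """Decide whether a next-page token should be handed out.

    With an accurate total (`matched`) the answer is exact. Without it, a
    short page is certainly the last one, but a full page may or may not be
    followed by more results, so we have to assume it is.
    """
    if matched is not None:
        return offset + page_len < matched
    # No reliable count: only a short page proves we are at the end.
    return page_len == limit
```

For 6 items with `limit=1`, the sixth page has `offset=5`: with `matched=6` no token is issued, while with `matched=None` a token is issued and the client makes one more request that comes back empty.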
I don't know why it's in a try/except myself. A lot of the DB code was written by a former contributor.
`es_response["hits"]["total"]["value"]` is accurate up to 10,000 results. If the count_task itself fails (which is probably unlikely), we could fall back to this value, since most people are not going to paginate through more than 10,000 results.
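One way to combine this fallback with the "correct count or no count" preference is sketched below. `resolve_matched` is a hypothetical helper; it assumes `total` is the `hits.total` object from an Elasticsearch search response (whose `relation` is `"eq"` when the value is exact and `"gte"` when it is only a lower bound, which by default happens above 10,000 hits) and that `count` is `None` when the separate count request failed.

```python
from typing import Optional


def resolve_matched(total: dict, count: Optional[int]) -> Optional[int]:
    """Pick the `matched` value for a search response.

    `total` is hits.total from the search response, e.g.
    {"value": 10000, "relation": "gte"}. `count` is the result of the
    separate count request, or None if it failed or was not used.
    """
    if total.get("relation") == "eq":
        # Exact: the search itself tracked every hit.
        return total["value"]
    if count is not None:
        # hits.total is only a lower bound; trust the dedicated count.
        return count
    # Only a lower bound is known: report no count rather than a wrong one.
    return None
```

Below the tracking cap the search response alone is enough, so a failed count request only matters for result sets larger than 10,000 hits.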
Indeed, if you are paging through more than 10,000 hits, I think there is no harm in 1 extra request with an empty response.
Another option would be to implement this workaround: https://stackoverflow.com/a/67200853/9339603
Pros:
Cons:
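The approach tried in the next comment fetches one more hit than the limit. A minimal in-memory sketch of that idea follows; it models `search_after` keyset pagination over an ascending list of integer IDs, and `fetch_page` is a hypothetical name rather than anything in the codebase.

```python
import bisect
from typing import List, Optional, Tuple


def fetch_page(sorted_ids: List[int], search_after: Optional[int],
               limit: int) -> Tuple[List[int], Optional[int]]:
    """Return one page plus the cursor for the next page (or None).

    Asks for `limit + 1` results: the extra hit is never returned to the
    caller, it only proves that another page exists.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Like search_after: start strictly after the cursor's sort value.
    start = 0 if search_after is None else bisect.bisect_right(sorted_ids, search_after)
    hits = sorted_ids[start:start + limit + 1]  # one more than requested
    page = hits[:limit]
    # Cursor = sort value of the last returned item, only if more exist.
    next_cursor = page[-1] if len(hits) > limit else None
    return page, next_cursor


ids = [1, 2, 3, 4, 5, 6]
cursor = None
requests = 0
while True:
    page, cursor = fetch_page(ids, cursor, limit=2)
    requests += 1
    if cursor is None:
        break
print(requests)  # prints 3: no trailing empty request
```

The trade-off is one extra document per request instead of an extra round trip at the end, and no dependence on an accurate count.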
@StijnCaerts I've tried implementing this approach:

```python
search_after = None
if token:
    # The token is the base64-encoded, comma-joined sort values of the last hit.
    search_after = urlsafe_b64decode(token.encode()).decode().split(",")

query = search.query.to_dict() if search.query else None
index_param = indices(collection_ids)

search_task = asyncio.create_task(
    self.client.search(
        index=index_param,
        ignore_unavailable=ignore_unavailable,
        query=query,
        sort=sort or DEFAULT_SORT,
        search_after=search_after,
        size=limit + 1,  # Fetch one more result than the limit
    )
)
count_task = asyncio.create_task(
    self.client.count(
        index=index_param,
        ignore_unavailable=ignore_unavailable,
        body=search.to_dict(count=True),
    )
)

try:
    es_response = await search_task
except exceptions.NotFoundError:
    raise NotFoundError(f"Collections '{collection_ids}' do not exist")

hits = es_response["hits"]["hits"]
# Return at most `limit` items; the extra hit only signals a next page.
items = (hit["_source"] for hit in hits[:limit])

next_token = None
if len(hits) > limit:
    # Build the token from the sort values of the last returned hit.
    if hits and (sort_array := hits[limit - 1].get("sort")):
        next_token = urlsafe_b64encode(
            ",".join([str(x) for x in sort_array]).encode()
        ).decode()

matched = None
if count_task.done():
    try:
        matched = count_task.result().get("count")
    except Exception as e:
        logger.error(f"Count task failed: {e}")

return items, matched, next_token
```

But I'm getting errors on these tests:
As I understand it, Elasticsearch itself doesn't allow limits above 10,000 by default (`index.max_result_window`), so when you pass 10,000 + 1 it responds with a bad request. I was a bit confused about why these tests weren't failing on the main branch, but it's because of this entry in the stac-fastapi changelog:
Ended up applying this #243 (comment) with edge-case handling for the 10,000 limit.
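A sketch of what such edge-case handling could look like, assuming the over-fetch is used only while `limit + 1` stays within the 10,000 window and that stac-fastapi already rejects limits above 10,000. `plan_request_size` and `has_more` are hypothetical helpers; the merged code in `database_logic.py` may handle the cap differently.

```python
MAX_RESULT_WINDOW = 10_000  # Elasticsearch default index.max_result_window


def plan_request_size(limit: int, max_result_window: int = MAX_RESULT_WINDOW) -> int:
    """Size to send to Elasticsearch: one extra hit when the window allows it."""
    if limit > max_result_window:
        raise ValueError("limit exceeds the result window")
    return limit + 1 if limit < max_result_window else limit


def has_more(num_hits: int, limit: int, size: int) -> bool:
    """Whether a next-page token should be returned for this response."""
    if size > limit:
        # Over-fetched: the extra hit proves another page exists.
        return num_hits > limit
    # At the window cap no extra hit could be requested, so a full page
    # may be followed by one more (possibly empty) request.
    return num_hits == limit
```

At the cap, this accepts the cost discussed earlier in the thread: at most one extra request with an empty response when paging through more than 10,000 hits.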
stac_fastapi/elasticsearch/stac_fastapi/elasticsearch/database_logic.py
What do you think, @jonhealy1? I don't really like this hardcoded 10,000, but it's basically set by:
Can we import the max limit from stac-fastapi?
Yes, we can use this:

```python
import stac_fastapi.types.search

max_result_window = stac_fastapi.types.search.Limit.le
```

or would you prefer this:

```python
from stac_fastapi.types.search import Limit

max_result_window = Limit.le
```

I prefer option 1 since it makes it clear where the limit comes from.
I prefer option one too.
Added.
stac_fastapi/elasticsearch/stac_fastapi/elasticsearch/database_logic.py
Nice work here, Pedro. I will wait until tomorrow to merge, just in case anyone else has any thoughts.
stac_fastapi/elasticsearch/stac_fastapi/elasticsearch/database_logic.py
I just left one small remark. Otherwise it looks good to me. Thanks for the contribution!
Approved!
Related Issue(s):
Merge dependencie(s):
Description:
- `test_pagination_token_idempotent` had an indentation issue
- Updated `execute_search` to make use of `es_response["hits"]["total"]["value"]`

PR Checklist:
- `pre-commit run --all-files`
- `make test`