Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RawDocument reports to have next if not all rows can be fetched in a single query #1090

Closed
ivansenic opened this issue Jul 6, 2021 · 4 comments
Assignees
Labels
bug Something isn't working documents API

Comments

@ivansenic
Copy link
Contributor

ivansenic commented Jul 6, 2021

Seems that implementation of the RawDocument#hasNext depends on the number of rows fetched in a single DB query (db page size). And it reports that it has next if not all of them can fit one query.

Steps to reproduce:

  • set DocsApiConfiguration#getSearchPageSize to 1 (can not reproduce with 2 here)
  • run integration tests

So the paging state imo should have nothing to do with the search DB page size. Let's check.

@ivansenic ivansenic added bug Something isn't working documents API labels Jul 6, 2021
@ivansenic
Copy link
Contributor Author

@dimas-b Any idea if this is false alarm?

@dimas-b dimas-b self-assigned this Jul 6, 2021
@dimas-b
Copy link
Contributor

dimas-b commented Jul 6, 2021

DocsApiConfiguration#getSearchPageSize essentially defines the storage-level page size that the Docs API uses for fetching raw document rows.

With 1 row storage-level pages, read requests are satisfied by the first found row, so any read that finds some data will not check if there is more data because it does not have to check for that, so it will return a non-null paging state.

At the Docs API level this means that if we find the number of docs that the user requested (the HTTP-level page size), the last fetched page will most likely still have a paging state, which will translate into the HTTP-level paging state. This is normal because we cannot guarantee that there is no more data available. The next request using the paging state may have an empty result set, but IMHO, this is normal too.

AFAIK, the only case when a 1 row page will not have a paging state at the storage level is when the found row's C* token is the last token in the ring, which is a very rare case, I guess.

A similar situation may arise in tests when the available number of rows matches the storage page size exactly. This is a rate case too.

For the sake of clarity: read requests with multi-row pages will try to fill the page and will actively check for more data if the page is not full, AFAIK, and will look ahead for more data to the end of the token ring and will be able to indicate that no more data is available by providing a null paging state in the result set.

@dimas-b
Copy link
Contributor

dimas-b commented Jul 6, 2021

That said, I'm not sure we can make integration tests absolutely robust in this case. The basic problem is that we cannot always know whether more data is available or not.

Currently tests work by virtue of having a large (1000 rows) default storage-level page size, which covers all test data rows.

With smaller fetch sizes (e.g. 2, 3, etc.) there will be odd cases when the page size is a factor (mathematically) of the number of selected rows. In those cases "exact" reads will also have a paging state. I made a comment about that in QueryExecutorTest.testFullScanFinalPagingState().

I personally think this is an acceptable tradeoff, but please feel free to disagree. I'm closing this issue for now.

@dimas-b dimas-b closed this as completed Jul 6, 2021
@ivansenic
Copy link
Contributor Author

Thanks for clarification @dimas-b 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working documents API
Projects
None yet
Development

No branches or pull requests

2 participants