RawDocument reports to have next if not all rows can be fetched in a single query #1090

ivansenic · 2021-07-06T14:42:12Z

Seems that implementation of the RawDocument#hasNext depends on the number of rows fetched in a single DB query (db page size). And it reports that it has next if not all of them can fit one query.

Steps to reproduce:

set DocsApiConfiguration#getSearchPageSize to 1 (can not reproduce with 2 here)
run integration tests

So the paging state imo should have nothing to do with the search DB page size. Let's check.

The text was updated successfully, but these errors were encountered:

ivansenic · 2021-07-06T14:42:26Z

@dimas-b Any idea if this is false alarm?

dimas-b · 2021-07-06T16:35:55Z

DocsApiConfiguration#getSearchPageSize essentially defines the storage-level page size that the Docs API uses for fetching raw document rows.

With 1 row storage-level pages, read requests are satisfied by the first found row, so any read that finds some data will not check if there is more data because it does not have to check for that, so it will return a non-null paging state.

At the Docs API level this means that if we find the number of docs that the user requested (the HTTP-level page size), the last fetched page will most likely still have a paging state, which will translate into the HTTP-level paging state. This is normal because we cannot guarantee that there is no more data available. The next request using the paging state may have an empty result set, but IMHO, this is normal too.

AFAIK, the only case when a 1 row page will not have a paging state at the storage level is when the found row's C* token is the last token in the ring, which is a very rare case, I guess.

A similar situation may arise in tests when the available number of rows matches the storage page size exactly. This is a rate case too.

For the sake of clarity: read requests with multi-row pages will try to fill the page and will actively check for more data if the page is not full, AFAIK, and will look ahead for more data to the end of the token ring and will be able to indicate that no more data is available by providing a null paging state in the result set.

dimas-b · 2021-07-06T16:43:58Z

That said, I'm not sure we can make integration tests absolutely robust in this case. The basic problem is that we cannot always know whether more data is available or not.

Currently tests work by virtue of having a large (1000 rows) default storage-level page size, which covers all test data rows.

With smaller fetch sizes (e.g. 2, 3, etc.) there will be odd cases when the page size is a factor (mathematically) of the number of selected rows. In those cases "exact" reads will also have a paging state. I made a comment about that in QueryExecutorTest.testFullScanFinalPagingState().

I personally think this is an acceptable tradeoff, but please feel free to disagree. I'm closing this issue for now.

ivansenic · 2021-07-07T08:34:33Z

Thanks for clarification @dimas-b 👍

ivansenic added bug Something isn't working documents API labels Jul 6, 2021

dimas-b self-assigned this Jul 6, 2021

dimas-b closed this as completed Jul 6, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RawDocument reports to have next if not all rows can be fetched in a single query #1090

RawDocument reports to have next if not all rows can be fetched in a single query #1090

ivansenic commented Jul 6, 2021 •

edited by sync-by-unito bot

ivansenic commented Jul 6, 2021

dimas-b commented Jul 6, 2021

dimas-b commented Jul 6, 2021 •

edited

ivansenic commented Jul 7, 2021

RawDocument reports to have next if not all rows can be fetched in a single query #1090

RawDocument reports to have next if not all rows can be fetched in a single query #1090

Comments

ivansenic commented Jul 6, 2021 • edited by sync-by-unito bot

ivansenic commented Jul 6, 2021

dimas-b commented Jul 6, 2021

dimas-b commented Jul 6, 2021 • edited

ivansenic commented Jul 7, 2021

ivansenic commented Jul 6, 2021 •

edited by sync-by-unito bot

dimas-b commented Jul 6, 2021 •

edited