Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternator Query with ScanIndexForward = false is inefficient and limited to 100MB #7586

Closed
nyh opened this issue Nov 10, 2020 · 3 comments
Closed
Assignees
Labels
area/alternator Alternator related Issues bug user request
Milestone

Comments

@nyh
Copy link
Contributor

nyh commented Nov 10, 2020

Note: This is an Alternator (i.e., DynamoDB API) issue. For a similar issue in the CQL API #1413.

The Alternator Query operations supports the ScanIndexForward=false parameter, which requests the query to return items in reversed sort order. This feature already works, and uses the same internal mechanism (query::partition_slice::option::reversed) which we use in CQL's ORDER BY ... DESC feature.

However, as explained in issue #1413, our implementation of query::partition_slice::option::reversed is inefficient - because we don't have support for reading a partition backwards so it resorts to reading the entire partition into memory and then reversing it. This isn't just inefficient - it is also limited to max_memory_for_unlimited_query_hard_limit (by default 100MB), beyond which the reverse query will fail.

Beyond a single query page, we will also want to be able to resume readers on the following pages, even when iterating in reverse order. See issue #6278 about Alternator missing support for resuming readers for forward-ordered queries - we need to remember to do this also for reverse-ordered queries.

@nyh nyh added bug area/alternator Alternator related Issues labels Nov 10, 2020
@slivne slivne added this to the 4.x milestone Nov 15, 2020
@dorlaor dorlaor self-assigned this Jul 13, 2021
@nyh nyh self-assigned this Sep 30, 2021
psarna pushed a commit that referenced this issue Sep 30, 2021
This patch adds a reproducer for issue #7586 - that Alternator queries
(Query) operating in reverse order (ScanIndexForward = false) are
artificially limited to 100 MB partitions because of their memory use.

This test generates a partition over 100 MB in size and then tries various
reverse queries on it - with or without Limit, starting at the end or
the middle of the partition. The test currently fails when a reverse query
refuses to operate on such a large partition - the log reports this:

  ERROR ... Memory usage of reversed read exceeds hard limit of 104857600
  (configured via max_memory_for_unlimited_query_hard_limit), while reading
  partition K1H6ON3A1C

With yet-uncommitted reverse-scan improvements, the test proceeds further,
but still fails where we test that a reverse query with Limit not
explicitly specified should still be limited to a certain size (e.g. 1MB)
and cannot return the entire 100 MB partition in one response.

Please note that this is not a comprehensive test for Scylla's reverse
scan implementation: In particular we do not have separate tests for
reverse scan's implementation on different sources - memtables, sstables,
or the cache. Nor do we check all sorts of edge cases. We assume that
Scylla's reverse scan implementation will have its own unit tests
elsewhere that will check these things - and this test can focus on the
Alternator use case.

This test is marked "xfail" because it still fails on Alternator. It is
marked "veryslow" because it's a (relatively) slow test, taking multiple
seconds to set up the 100 MB partition. So run the test with the
pytest options "--runxfail --runveryslow" to see how it fails.

Refs #7586

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210930063700.407511-1-nyh@scylladb.com>
@nyh
Copy link
Contributor Author

nyh commented Oct 18, 2021

Status update:

There has been a lot of work recently on improving reverse queries in Scylla. The work is done in for the direct benefit of CQL reverse queries (#1413) but it also benefits Alternator reverse queries because both use the same underlying implementation.

We already have a test for reverse queries in Alternator, which can be run by

test/alternator/run --runveryslow --runxfail test_query.py::test_query_reverse_long

The test is still failing, but it demonstrates that a reverse query with Limit does work as expected now - even on very large partition - and the remaining problem is for queries that don't specify a Limit. Such queries should still be limited to 1MB per page - but today aren't. This is issue #9487.

nyh added a commit to nyh/scylla that referenced this issue Dec 21, 2021
Now that issues scylladb#7586 and scylladb#9487 were fixed, reverse queries - even in
long partitions - work well, we can drop the claim in
alternator/docs/compatibility.md that reverse queries are buggy for
large partitions.

We can also remove the "xfail" mark from the tes that checks this
feature, as it now passes.

Refs scylladb#7586
Refs scylladb#9487

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
mikolajsieluzycki pushed a commit to mikolajsieluzycki/scylla that referenced this issue Jan 3, 2022
When the whole cluster is already supporting
separate_page_size_and_safety_limit,
start sending page_size in read_command. This new value will be used
for determining the page size instead of hard_limit.

Fixes scylladb#9487
Fixes scylladb#7586

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
avikivity pushed a commit that referenced this issue Jan 16, 2022
Now that issues #7586 and #9487 were fixed, reverse queries - even in
long partitions - work well, we can drop the claim in
alternator/docs/compatibility.md that reverse queries are buggy for
large partitions.

We can also remove the "xfail" mark from the tes that checks this
feature, as it now passes.

Refs #7586
Refs #9487

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #9831
psarna pushed a commit that referenced this issue Feb 21, 2022
The regression test we have for Alternator's issue #9487 (where a reverse
query without a Limit given was broken into 100MB pages instead of the
expected 1MB) is test_query.py::test_query_reverse_long. But this is a
very long test requiring a 100MB partition, and because of its slowness
isn't run by default.

This patch adds another version of that test, test_query_reverse_longish,
which reproduces the same issue #9487 with a partition 50 times shorter
(2MB) so it only takes a fraction of a second and can be enabled by
default. It also requires much less network traffic which is important
when running these tests non-locally.

We leave the original test test_query_reverse_long behind, it can be
still useful to stress Scylla even beyond the 100MB boundary, but it
remains in @veryslow mode so won't run in default test runs.

Refs #9487
Refs #7586

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220220161905.852994-1-nyh@scylladb.com>
@nyh
Copy link
Contributor Author

nyh commented Mar 14, 2022

This issue was fixed by a bunch of fixes to reverse scanning in the underlying database. We need to consider if we need (or don't need) to backport any or all of these fixes.

Here are some of the relevant merges I found. This is probably an incomplete list (@kbr- do you have a better list? I don't see any mention of backports in #1413) but it can serve as a useful guide on which release already has a full fix for this issue.

9e74556 - Merge 'Support reverse reads in the row cache natively' from @tgrabiec, Dec 29, 2021
4a32377 - Merge 'Use the same page size limit in reverse queries as in forward reads' from @haaawk, Dec 29, 2021
d8832b9 - Merge 'Memtable make reversing reader' from @enedil , Oct. 13, 2021
e89b979 - Merge 'sstable mx reader: implement reverse single-partition reads' from @kbr-, Oct. 4, 2021

It seems like the last two merges from October are in branch-4.6, but the ones from December are not - so branch 4.6 does not have complete support for ScanIndexForward=false unless we complete the backports.

All patches are in branch 5.0, so it seems that branch 5.0 will have full support for this feature and doesn't need more backports for it.

@avikivity
Copy link
Member

Not backporting as it's not a regression.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/alternator Alternator related Issues bug user request
Projects
None yet
Development

No branches or pull requests

6 participants