-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Alternator Query with ScanIndexForward = false is inefficient and limited to 100MB #7586
Comments
This patch adds a reproducer for issue #7586 - that Alternator queries (Query) operating in reverse order (ScanIndexForward = false) are artificially limited to 100 MB partitions because of their memory use. This test generates a partition over 100 MB in size and then tries various reverse queries on it - with or without Limit, starting at the end or the middle of the partition. The test currently fails when a reverse query refuses to operate on such a large partition - the log reports this: ERROR ... Memory usage of reversed read exceeds hard limit of 104857600 (configured via max_memory_for_unlimited_query_hard_limit), while reading partition K1H6ON3A1C With yet-uncommitted reverse-scan improvements, the test proceeds further, but still fails where we test that a reverse query with Limit not explicitly specified should still be limited to a certain size (e.g. 1MB) and cannot return the entire 100 MB partition in one response. Please note that this is not a comprehensive test for Scylla's reverse scan implementation: In particular we do not have separate tests for reverse scan's implementation on different sources - memtables, sstables, or the cache. Nor do we check all sorts of edge cases. We assume that Scylla's reverse scan implementation will have its own unit tests elsewhere that will check these things - and this test can focus on the Alternator use case. This test is marked "xfail" because it still fails on Alternator. It is marked "veryslow" because it's a (relatively) slow test, taking multiple seconds to set up the 100 MB partition. So run the test with the pytest options "--runxfail --runveryslow" to see how it fails. Refs #7586 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210930063700.407511-1-nyh@scylladb.com>
Status update: There has been a lot of work recently on improving reverse queries in Scylla. The work is done in for the direct benefit of CQL reverse queries (#1413) but it also benefits Alternator reverse queries because both use the same underlying implementation. We already have a test for reverse queries in Alternator, which can be run by
The test is still failing, but it demonstrates that a reverse query with |
Now that issues scylladb#7586 and scylladb#9487 were fixed, reverse queries - even in long partitions - work well, we can drop the claim in alternator/docs/compatibility.md that reverse queries are buggy for large partitions. We can also remove the "xfail" mark from the tes that checks this feature, as it now passes. Refs scylladb#7586 Refs scylladb#9487 Signed-off-by: Nadav Har'El <nyh@scylladb.com>
When the whole cluster is already supporting separate_page_size_and_safety_limit, start sending page_size in read_command. This new value will be used for determining the page size instead of hard_limit. Fixes scylladb#9487 Fixes scylladb#7586 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Now that issues #7586 and #9487 were fixed, reverse queries - even in long partitions - work well, we can drop the claim in alternator/docs/compatibility.md that reverse queries are buggy for large partitions. We can also remove the "xfail" mark from the tes that checks this feature, as it now passes. Refs #7586 Refs #9487 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #9831
The regression test we have for Alternator's issue #9487 (where a reverse query without a Limit given was broken into 100MB pages instead of the expected 1MB) is test_query.py::test_query_reverse_long. But this is a very long test requiring a 100MB partition, and because of its slowness isn't run by default. This patch adds another version of that test, test_query_reverse_longish, which reproduces the same issue #9487 with a partition 50 times shorter (2MB) so it only takes a fraction of a second and can be enabled by default. It also requires much less network traffic which is important when running these tests non-locally. We leave the original test test_query_reverse_long behind, it can be still useful to stress Scylla even beyond the 100MB boundary, but it remains in @veryslow mode so won't run in default test runs. Refs #9487 Refs #7586 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220220161905.852994-1-nyh@scylladb.com>
This issue was fixed by a bunch of fixes to reverse scanning in the underlying database. We need to consider if we need (or don't need) to backport any or all of these fixes. Here are some of the relevant merges I found. This is probably an incomplete list (@kbr- do you have a better list? I don't see any mention of backports in #1413) but it can serve as a useful guide on which release already has a full fix for this issue. 9e74556 - Merge 'Support reverse reads in the row cache natively' from @tgrabiec, Dec 29, 2021 It seems like the last two merges from October are in branch-4.6, but the ones from December are not - so branch 4.6 does not have complete support for ScanIndexForward=false unless we complete the backports. All patches are in branch 5.0, so it seems that branch 5.0 will have full support for this feature and doesn't need more backports for it. |
Not backporting as it's not a regression. |
Note: This is an Alternator (i.e., DynamoDB API) issue. For a similar issue in the CQL API #1413.
The Alternator
Query
operations supports theScanIndexForward=false
parameter, which requests the query to return items in reversed sort order. This feature already works, and uses the same internal mechanism (query::partition_slice::option::reversed
) which we use in CQL'sORDER BY ... DESC
feature.However, as explained in issue #1413, our implementation of
query::partition_slice::option::reversed
is inefficient - because we don't have support for reading a partition backwards so it resorts to reading the entire partition into memory and then reversing it. This isn't just inefficient - it is also limited tomax_memory_for_unlimited_query_hard_limit
(by default 100MB), beyond which the reverse query will fail.Beyond a single query page, we will also want to be able to resume readers on the following pages, even when iterating in reverse order. See issue #6278 about Alternator missing support for resuming readers for forward-ordered queries - we need to remember to do this also for reverse-ordered queries.
The text was updated successfully, but these errors were encountered: