New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restriction on clustering-key only should require ALLOW FILTERING #7608
Comments
Here is how we initially decided to change the behaviour. Our thinking has since evolved; we now want to match Cassandra and require |
And here is where we decided to switch back. |
The original goal of this patch was to replace the two single-node dtests allow_filtering_test and allow_filtering_secondary_indexes_test, which recently caused us problems when we wanted to change the ALLOW FILTERING behavior but the tests were outside the tree. I'm hoping that after this patch, those two tests could be removed from dtest. But this patch actually tests more cases then those original dtest, and moreover tests not just whether ALLOW FILTERING is required or not, but also that the results of the filtering is correct. Currently, four of the included tests are expected to fail ("xfail") on Scylla, reproducing two issues: 1. Refs #5545: "WHERE x IN ..." on indexed column x wrongly requires ALLOW FILTERING 2. Refs #7608: "WHERE c=1" on clustering key c should require ALLOW FILTERING, but doesn't. All tests, except the one for issue #5545, pass on Cassandra. That one fails on Cassandra because doesn't support IN on an indexed column at all (regardless of whether ALLOW FILTERING is used or not). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201115124631.1224888-1-nyh@scylladb.com>
The original goal of this patch was to replace the two single-node dtests allow_filtering_test and allow_filtering_secondary_indexes_test, which recently caused us problems when we wanted to change the ALLOW FILTERING behavior but the tests were outside the tree. I'm hoping that after this patch, those two tests could be removed from dtest. But this patch actually tests more cases then those original dtest, and moreover tests not just whether ALLOW FILTERING is required or not, but also that the results of the filtering is correct. Currently, four of the included tests are expected to fail ("xfail") on Scylla, reproducing two issues: 1. Refs scylladb#5545: "WHERE x IN ..." on indexed column x wrongly requires ALLOW FILTERING 2. Refs scylladb#7608: "WHERE c=1" on clustering key c should require ALLOW FILTERING, but doesn't. All tests, except the one for issue scylladb#5545, pass on Cassandra. That one fails on Cassandra because doesn't support IN on an indexed column at all (regardless of whether ALLOW FILTERING is used or not). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201115124631.1224888-1-nyh@scylladb.com>
Another example of an inconsistency of the current behavior:
This doesn't make sense. If we come up with some justification why we should allow the first query without |
…ering key' from Jan Ciołek Add examples from issue #8991 to tests Both of these tests pass on `cassandra 4.0` but fail on `scylla 4.4.3` First test tests that selecting values from indexed table using only clustering key returns correct values. The second test tests that performing this operation requires filtering. The filtering test looks similar to [the one for #7608](https://github.com/scylladb/scylla/blob/1924e8d2b63e6611b78ac6252b5ddbc4884f9d22/test/cql-pytest/test_allow_filtering.py#L124) but there are some differences - here the table has two clustering columns and an index, so it could test different code paths. Contains a quick fix for the `needs_filtering()` function to make these tests pass. It returns `true` for this case and the one described in #7708. This implementation is a bit conservative - it might sometimes return `true` where filtering isn't actually needed, but at least it prevents scylla from returning incorrect results. Fixes #8991. Fixes #7708. Closes #8994 * github.com:scylladb/scylla: cql3: Fix need_filtering on indexed table cql-pytest: Test selecting using only clustering key requires filtering cql-pytest: Test selecting from indexed table using clustering key
When a query covers multiple partitions, require that it also says ALLOW FILTERING, even if do_filter() isn't invoked. Fixes scylladb#7608; see comments there for justification. Tests: unit (debug, dev), dtest (next-gating, cql*) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
GitHub closed this automatically due to overzealous interpretation of a dtest commit message. In fact, it needs to get fixed in the Scylla repo. Reopening. |
When a query covers multiple partitions, require that it also says ALLOW FILTERING, even if do_filter() isn't invoked. Fixes scylladb#7608; see comments there for justification. Tests: unit (debug, dev), dtest (next-gating, cql*) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
When a query requests a partition slice but doesn't limit the number of partitions, require that it also says ALLOW FILTERING. Although do_filter() isn't invoked for such queries, the performance can still be unexpectedly slow, and we want to signal that to the user by demanding they explicitly say ALLOW FILTERING. Because we now reject queries that worked fine before, existing applications can break. Therefore, the behavior is controlled by a flag currently defaulting to off. We will default to "on" in the next Scylla version. Fixes scylladb#7608; see comments there for justification. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
When a query requests a partition slice but doesn't limit the number of partitions, require that it also says ALLOW FILTERING. Although do_filter() isn't invoked for such queries, the performance can still be unexpectedly slow, and we want to signal that to the user by demanding they explicitly say ALLOW FILTERING. Because we now reject queries that worked fine before, existing applications can break. Therefore, the behavior is controlled by a flag currently defaulting to off. We will default to "on" in the next Scylla version. Fixes scylladb#7608; see comments there for justification. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
When a query requests a partition slice but doesn't limit the number of partitions, require that it also says ALLOW FILTERING. Although do_filter() isn't invoked for such queries, the performance can still be unexpectedly slow, and we want to signal that to the user by demanding they explicitly say ALLOW FILTERING. Because we now reject queries that worked fine before, existing applications can break. Therefore, the behavior is controlled by a flag currently defaulting to off. We will default to "on" in the next Scylla version. Fixes scylladb#7608; see comments there for justification. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
… ' from Dejan Mircevski Return the pre- 6773563 behavior of demanding ALLOW FILTERING when partition slice is requested but on potentially unlimited number of partitions. Put it on a flag defaulting to "off" for now. Fixes #7608; see comments there for justification. Tests: unit (debug, dev), dtest (cql_additional_test, paging_test) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Closes #9126 * github.com:scylladb/scylla: cql3: Demand ALLOW FILTERING for unlimited, sliced partitions cql3: Track warnings in prepared_statement test: Use ALLOW FILTERING more strictly cql3: Add statement_restrictions::to_string
Already in 4.6, and I don't want to backport this to earlier branches. I don't think it's a great idea that CQL behavior will change in bugfix releases - even if we consider the new behavior more correct. |
I may have prematurely decided that we shouldn't backport this fix - @tzach even opened an issue #9536 specifically about 4.5. @fee-mendes asked in #9536 to backport this to 4.5. |
Backporting of this to 4.5 is more difficult than I hoped. It assumes a bunch of other prior changes to the configuration (a3d6f50) and to handling of restrictions. Unless there is a strong need to do this backport, I propose that we don't. |
Fix present on all active branches, not backporting. |
Our documentation, https://docs.scylladb.com/getting-started/dml/, states that that queries that do not need
ALLOW FILTERING
"have predictable performance in the sense that they will execute in a time that is proportional to the amount of data returned by the query (which can be controlled through LIMIT)."Under this definition, the restriction
WHERE c=1
, wherec
is a clustering key, should only work withALLOW FILTERING
: Such a query may return just a few or even no matches, but still needs to go over all the partitions - and for each partition it needs to inspect the promoted index and quite likely also needs to read a chunk of rows just to check if any of them hasc=1
. The time to do that may be huge in a table with a huge number of partitions - and not proportional to the amount of data returned by the query.Cassandra indeed requires
ALLOW FILTERING
in this case, but currently Scylla does not. This is a regression from commit 756b14f by @dekimir - Scylla used to behave like Cassandra in this case, but that commit changed both implementation details (good) and the need for "ALLOW FILTERING" (less good).In cql-pytest, I created three tests for this in
test_allow_filtering.py
:test_allow_filtering_clustering_key
,test_allow_filtering_clustering_key_token_range
andtest_allow_filtering_clustering_key_token_specific
. These three tests pass on Cassandra, but xfail on Scylla.The text was updated successfully, but these errors were encountered: