Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restriction on clustering-key only should require ALLOW FILTERING #7608

Closed
nyh opened this issue Nov 15, 2020 · 9 comments · Fixed by #9126
Closed

Restriction on clustering-key only should require ALLOW FILTERING #7608

nyh opened this issue Nov 15, 2020 · 9 comments · Fixed by #9126

Comments

@nyh
Copy link
Contributor

nyh commented Nov 15, 2020

Our documentation, https://docs.scylladb.com/getting-started/dml/, states that that queries that do not need ALLOW FILTERING "have predictable performance in the sense that they will execute in a time that is proportional to the amount of data returned by the query (which can be controlled through LIMIT)."

Under this definition, the restriction WHERE c=1, where c is a clustering key, should only work with ALLOW FILTERING: Such a query may return just a few or even no matches, but still needs to go over all the partitions - and for each partition it needs to inspect the promoted index and quite likely also needs to read a chunk of rows just to check if any of them has c=1. The time to do that may be huge in a table with a huge number of partitions - and not proportional to the amount of data returned by the query.

Cassandra indeed requires ALLOW FILTERING in this case, but currently Scylla does not. This is a regression from commit 756b14f by @dekimir - Scylla used to behave like Cassandra in this case, but that commit changed both implementation details (good) and the need for "ALLOW FILTERING" (less good).

In cql-pytest, I created three tests for this in test_allow_filtering.py: test_allow_filtering_clustering_key, test_allow_filtering_clustering_key_token_range and test_allow_filtering_clustering_key_token_specific. These three tests pass on Cassandra, but xfail on Scylla.

@dekimir
Copy link
Contributor

dekimir commented Nov 26, 2020

Here is how we initially decided to change the behaviour. Our thinking has since evolved; we now want to match Cassandra and require ALLOW FILTERING for both queries quoted in the link.

@dekimir
Copy link
Contributor

dekimir commented Nov 27, 2020

And here is where we decided to switch back.

avikivity pushed a commit that referenced this issue Dec 6, 2020
The original goal of this patch was to replace the two single-node dtests
allow_filtering_test and allow_filtering_secondary_indexes_test, which
recently caused us problems when we wanted to change the ALLOW FILTERING
behavior but the tests were outside the tree. I'm hoping that after this
patch, those two tests could be removed from dtest.

But this patch actually tests more cases then those original dtest, and
moreover tests not just whether ALLOW FILTERING is required or not, but
also that the results of the filtering is correct.

Currently, four of the included tests are expected to fail ("xfail") on
Scylla, reproducing two issues:

1. Refs #5545:
   "WHERE x IN ..." on indexed column x wrongly requires ALLOW FILTERING
2. Refs #7608:
   "WHERE c=1" on clustering key c should require ALLOW FILTERING, but
   doesn't.

All tests, except the one for issue #5545, pass on Cassandra. That one
fails on Cassandra because doesn't support IN on an indexed column at all
(regardless of whether ALLOW FILTERING is used or not).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20201115124631.1224888-1-nyh@scylladb.com>
syuu1228 pushed a commit to syuu1228/scylla that referenced this issue Dec 14, 2020
The original goal of this patch was to replace the two single-node dtests
allow_filtering_test and allow_filtering_secondary_indexes_test, which
recently caused us problems when we wanted to change the ALLOW FILTERING
behavior but the tests were outside the tree. I'm hoping that after this
patch, those two tests could be removed from dtest.

But this patch actually tests more cases then those original dtest, and
moreover tests not just whether ALLOW FILTERING is required or not, but
also that the results of the filtering is correct.

Currently, four of the included tests are expected to fail ("xfail") on
Scylla, reproducing two issues:

1. Refs scylladb#5545:
   "WHERE x IN ..." on indexed column x wrongly requires ALLOW FILTERING
2. Refs scylladb#7608:
   "WHERE c=1" on clustering key c should require ALLOW FILTERING, but
   doesn't.

All tests, except the one for issue scylladb#5545, pass on Cassandra. That one
fails on Cassandra because doesn't support IN on an indexed column at all
(regardless of whether ALLOW FILTERING is used or not).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20201115124631.1224888-1-nyh@scylladb.com>
@nyh
Copy link
Contributor Author

nyh commented Jan 26, 2021

Another example of an inconsistency of the current behavior:
In a table with partition key p, clustering key c, and regular column r, the current situation is that:

  1. The restriction WHERE c=1 works without ALLOW FILTERING
  2. The restriction WHERE c=1 AND r=2 only works with ALLOW FILTERING

This doesn't make sense. If we come up with some justification why we should allow the first query without ALLOW FILTERING, then the second one should be allowed as well, with the same justification. For reasons I explained above, I think we should refuse both.

@nyh
Copy link
Contributor Author

nyh commented Jul 8, 2021

This bug was independently reported by @tzach in #8991. I think we should indeed consider it a bug, not a feature.

nyh added a commit that referenced this issue Jul 19, 2021
…ering key' from Jan Ciołek

Add examples from issue #8991 to tests
Both of these tests pass on `cassandra 4.0` but fail on `scylla 4.4.3`

First test tests that selecting values from indexed table using only clustering key returns correct values.
The second test tests that performing this operation requires filtering.

The filtering test looks similar to [the one for #7608](https://github.com/scylladb/scylla/blob/1924e8d2b63e6611b78ac6252b5ddbc4884f9d22/test/cql-pytest/test_allow_filtering.py#L124) but there are some differences - here the table has two clustering columns and an index, so it could test different code paths.

Contains a quick fix for the `needs_filtering()` function to make these tests pass.
It returns `true` for this case and the one described in #7708.

This implementation is a bit conservative - it might sometimes return `true` where filtering isn't actually needed, but at least it prevents scylla from returning incorrect results.

Fixes #8991.
Fixes #7708.

Closes #8994

* github.com:scylladb/scylla:
  cql3: Fix need_filtering on indexed table
  cql-pytest: Test selecting using only clustering key requires filtering
  cql-pytest: Test selecting from indexed table using clustering key
dekimir pushed a commit to dekimir/scylla that referenced this issue Jul 21, 2021
When a query covers multiple partitions, require that it also says
ALLOW FILTERING, even if do_filter() isn't invoked.

Fixes scylladb#7608; see comments there for justification.

Tests: unit (debug, dev), dtest (next-gating, cql*)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
@amoskong amoskong closed this as completed Aug 2, 2021
@dekimir
Copy link
Contributor

dekimir commented Aug 2, 2021

GitHub closed this automatically due to overzealous interpretation of a dtest commit message. In fact, it needs to get fixed in the Scylla repo. Reopening.

@dekimir dekimir reopened this Aug 2, 2021
dekimir pushed a commit to dekimir/scylla that referenced this issue Aug 2, 2021
When a query covers multiple partitions, require that it also says
ALLOW FILTERING, even if do_filter() isn't invoked.

Fixes scylladb#7608; see comments there for justification.

Tests: unit (debug, dev), dtest (next-gating, cql*)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
@nyh nyh added the type/bug label Aug 3, 2021
dekimir pushed a commit to dekimir/scylla that referenced this issue Aug 8, 2021
When a query requests a partition slice but doesn't limit the number
of partitions, require that it also says ALLOW FILTERING.  Although
do_filter() isn't invoked for such queries, the performance can still
be unexpectedly slow, and we want to signal that to the user by
demanding they explicitly say ALLOW FILTERING.

Because we now reject queries that worked fine before, existing
applications can break.  Therefore, the behavior is controlled by a
flag currently defaulting to off.  We will default to "on" in the next
Scylla version.

Fixes scylladb#7608; see comments there for justification.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
dekimir pushed a commit to dekimir/scylla that referenced this issue Aug 8, 2021
When a query requests a partition slice but doesn't limit the number
of partitions, require that it also says ALLOW FILTERING.  Although
do_filter() isn't invoked for such queries, the performance can still
be unexpectedly slow, and we want to signal that to the user by
demanding they explicitly say ALLOW FILTERING.

Because we now reject queries that worked fine before, existing
applications can break.  Therefore, the behavior is controlled by a
flag currently defaulting to off.  We will default to "on" in the next
Scylla version.

Fixes scylladb#7608; see comments there for justification.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
dekimir pushed a commit to dekimir/scylla that referenced this issue Aug 25, 2021
When a query requests a partition slice but doesn't limit the number
of partitions, require that it also says ALLOW FILTERING.  Although
do_filter() isn't invoked for such queries, the performance can still
be unexpectedly slow, and we want to signal that to the user by
demanding they explicitly say ALLOW FILTERING.

Because we now reject queries that worked fine before, existing
applications can break.  Therefore, the behavior is controlled by a
flag currently defaulting to off.  We will default to "on" in the next
Scylla version.

Fixes scylladb#7608; see comments there for justification.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
avikivity added a commit that referenced this issue Aug 31, 2021
… ' from Dejan Mircevski

Return the pre- 6773563 behavior of demanding ALLOW FILTERING when partition slice is requested but on potentially unlimited number of partitions.  Put it on a flag defaulting to "off" for now.

Fixes #7608; see comments there for justification.

Tests: unit (debug, dev), dtest (cql_additional_test, paging_test)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>

Closes #9126

* github.com:scylladb/scylla:
  cql3: Demand ALLOW FILTERING for unlimited, sliced partitions
  cql3: Track warnings in prepared_statement
  test: Use ALLOW FILTERING more strictly
  cql3: Add statement_restrictions::to_string
@nyh
Copy link
Contributor Author

nyh commented Dec 29, 2021

Already in 4.6, and I don't want to backport this to earlier branches. I don't think it's a great idea that CQL behavior will change in bugfix releases - even if we consider the new behavior more correct.

@nyh
Copy link
Contributor Author

nyh commented Dec 29, 2021

I may have prematurely decided that we shouldn't backport this fix - @tzach even opened an issue #9536 specifically about 4.5.
@tzach, @eliransin please state here if you want this patch backported, and why. My thinking about was that it shouldn't be backported because we don't want such CQL changes in point releases (e.g., between 4.5.2 and 4.5.3) but maybe my thinking is wrong. Until we decide, I'm re-adding the Backport candidate tag.

@fee-mendes asked in #9536 to backport this to 4.5.

@nyh
Copy link
Contributor Author

nyh commented Feb 9, 2022

Backporting of this to 4.5 is more difficult than I hoped. It assumes a bunch of other prior changes to the configuration (a3d6f50) and to handling of restrictions. Unless there is a strong need to do this backport, I propose that we don't.

@avikivity
Copy link
Member

Fix present on all active branches, not backporting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants