Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filtering may ignore clustering key restrictions if they form a prefix without a partition key #4541

Closed
psarna opened this issue Jun 12, 2019 · 5 comments
Assignees
Milestone

Comments

@psarna
Copy link
Contributor

psarna commented Jun 12, 2019

Ref: #3803

The solution for issue #3803 provided a way of fetching columns that are not included in the select clause if they are needed for filtering. However, one corner case is still unsolved: if clustering columns form a prefix, we do not need to fetch them for filtering, but only if the whole partition key is present as well. Without the partition key, clustering key prefix cannot be optimized out.

I have the solution ready, I'll post it soon.

psarna added a commit to psarna/scylla that referenced this issue Jun 12, 2019
When a column is not present in the select clause, but used for
filtering, it usually needs to be fetched from replicas.
Sometimes it can be avoided, e.g. if primary key columns form a valid
prefix - then, they will be optimized out before filtering itself.
However, clustering key prefix can only be qualified for this
optimization if the whole partition key is restricted - otherwise
the clustering columns still need to be present for filtering.

Fixes scylladb#4541
psarna added a commit to psarna/scylla that referenced this issue Jun 12, 2019
The test cases makes sure that clustering key restriction
columns are fetched for filtering if they form a clustering key prefix,
but not a primary key prefix (partition key columns are missing).

Ref scylladb#4541
psarna added a commit to psarna/scylla that referenced this issue Jun 12, 2019
When a column is not present in the select clause, but used for
filtering, it usually needs to be fetched from replicas.
Sometimes it can be avoided, e.g. if primary key columns form a valid
prefix - then, they will be optimized out before filtering itself.
However, clustering key prefix can only be qualified for this
optimization if the whole partition key is restricted - otherwise
the clustering columns still need to be present for filtering.

Fixes scylladb#4541
psarna added a commit to psarna/scylla that referenced this issue Jun 12, 2019
The test cases makes sure that clustering key restriction
columns are fetched for filtering if they form a clustering key prefix,
but not a primary key prefix (partition key columns are missing).

Ref scylladb#4541
psarna added a commit to psarna/scylla that referenced this issue Jun 12, 2019
When a column is not present in the select clause, but used for
filtering, it usually needs to be fetched from replicas.
Sometimes it can be avoided, e.g. if primary key columns form a valid
prefix - then, they will be optimized out before filtering itself.
However, clustering key prefix can only be qualified for this
optimization if the whole partition key is restricted - otherwise
the clustering columns still need to be present for filtering.

Fixes scylladb#4541
psarna added a commit to psarna/scylla that referenced this issue Jun 12, 2019
The test cases makes sure that clustering key restriction
columns are fetched for filtering if they form a clustering key prefix,
but not a primary key prefix (partition key columns are missing).

Ref scylladb#4541
psarna added a commit to psarna/scylla that referenced this issue Jun 12, 2019
Clustering key restrictions can sometimes avoid filtering if they form
a prefix, but that can happen only if the whole partition key is
restricted as well.

Ref scylladb#4541
psarna added a commit to psarna/scylla that referenced this issue Jun 12, 2019
The test cases makes sure that clustering key restriction
columns are fetched for filtering if they form a clustering key prefix,
but not a primary key prefix (partition key columns are missing).

Ref scylladb#4541
psarna added a commit to psarna/scylla that referenced this issue Jun 12, 2019
Clustering key restrictions can sometimes avoid filtering if they form
a prefix, but that can happen only if the whole partition key is
restricted as well.

Ref scylladb#4541
psarna added a commit to psarna/scylla that referenced this issue Jun 12, 2019
The test cases makes sure that clustering key restriction
columns are fetched for filtering if they form a clustering key prefix,
but not a primary key prefix (partition key columns are missing).

Ref scylladb#4541
psarna added a commit to psarna/scylla that referenced this issue Jun 12, 2019
When a column is not present in the select clause, but used for
filtering, it usually needs to be fetched from replicas.
Sometimes it can be avoided, e.g. if primary key columns form a valid
prefix - then, they will be optimized out before filtering itself.
However, clustering key prefix can only be qualified for this
optimization if the whole partition key is restricted - otherwise
the clustering columns still need to be present for filtering.

Fixes scylladb#4541
psarna added a commit to psarna/scylla that referenced this issue Jun 12, 2019
Clustering key restrictions can sometimes avoid filtering if they form
a prefix, but that can happen only if the whole partition key is
restricted as well.

Ref scylladb#4541
psarna added a commit to psarna/scylla that referenced this issue Jun 12, 2019
The test cases makes sure that clustering key restriction
columns are fetched for filtering if they form a clustering key prefix,
but not a primary key prefix (partition key columns are missing).

Ref scylladb#4541
psarna added a commit to psarna/scylla that referenced this issue Jun 12, 2019
Clustering key restrictions can sometimes avoid filtering if they form
a prefix, but that can happen only if the whole partition key is
restricted as well.

Ref scylladb#4541
psarna added a commit to psarna/scylla that referenced this issue Jun 12, 2019
The test cases makes sure that clustering key restriction
columns are fetched for filtering if they form a clustering key prefix,
but not a primary key prefix (partition key columns are missing).

Ref scylladb#4541
psarna added a commit to psarna/scylla that referenced this issue Jun 12, 2019
When a column is not present in the select clause, but used for
filtering, it usually needs to be fetched from replicas.
Sometimes it can be avoided, e.g. if primary key columns form a valid
prefix - then, they will be optimized out before filtering itself.
However, clustering key prefix can only be qualified for this
optimization if the whole partition key is restricted - otherwise
the clustering columns still need to be present for filtering.

This commit also fixes tests in cql_query_test suite, because they now
expect more values - columns fetched for filtering will be present as
well (only internally, the clients receive only data they asked for).

Fixes scylladb#4541
psarna added a commit to psarna/scylla that referenced this issue Jun 12, 2019
Clustering key restrictions can sometimes avoid filtering if they form
a prefix, but that can happen only if the whole partition key is
restricted as well.

Ref scylladb#4541
psarna added a commit to psarna/scylla that referenced this issue Jun 12, 2019
The test cases makes sure that clustering key restriction
columns are fetched for filtering if they form a clustering key prefix,
but not a primary key prefix (partition key columns are missing).

Ref scylladb#4541
@nyh
Copy link
Contributor

nyh commented Jun 12, 2019

Hi @psarna I'm in the middle of reviewing your patch (the github UI is less convenient than I thought before we started to experiment with it...), but have a general question about the entire approach or even the problem raised here:

If the full ck prefix is being restricted, we can do a quick read - and don't need "filtering" - on each partition. Yes, if the pk isn't restricted, we still need to go through all the partitions (so this is filtering), but we don't strictly need to read the entire partition and filter through it. But maybe you're going here for correctness, not absolute optimal performance?

@psarna
Copy link
Contributor Author

psarna commented Jun 12, 2019

@nyh yes, this issue is strictly for correctness - currently clustering restrictions can be ignored in certain corner cases, which yields incorrect results. I didn't take any optimizations involving using ck prefix for each partition into consideration (although we could create a separate issue about that).

psarna added a commit to psarna/scylla that referenced this issue Jun 13, 2019
Clustering key restrictions can sometimes avoid filtering if they form
a prefix, but that can happen only if the whole partition key is
restricted as well.

Ref scylladb#4541
psarna added a commit to psarna/scylla that referenced this issue Jun 13, 2019
The test cases makes sure that clustering key restriction
columns are fetched for filtering if they form a clustering key prefix,
but not a primary key prefix (partition key columns are missing).

Ref scylladb#4541
psarna added a commit to psarna/scylla that referenced this issue Jun 13, 2019
The test cases makes sure that clustering key restriction
columns are fetched for filtering if they form a clustering key prefix,
but not a primary key prefix (partition key columns are missing).

Ref scylladb#4541
nyh pushed a commit that referenced this issue Jun 13, 2019
Clustering key restrictions can sometimes avoid filtering if they form
a prefix, but that can happen only if the whole partition key is
restricted as well.

Ref #4541
Message-Id: <9656396ee831e29c2b8d3ad4ef90c4a16ab71f4b.1560410018.git.sarna@scylladb.com>
nyh pushed a commit that referenced this issue Jun 13, 2019
The test cases makes sure that clustering key restriction
columns are fetched for filtering if they form a clustering key prefix,
but not a primary key prefix (partition key columns are missing).

Ref #4541
Message-Id: <3612dc1c6c22c59ac9184220a2e7f24e8d18407c.1560410018.git.sarna@scylladb.com>
@slivne slivne added this to the 3.2 milestone Jun 13, 2019
avikivity pushed a commit that referenced this issue Jun 16, 2019
When a column is not present in the select clause, but used for
filtering, it usually needs to be fetched from replicas.
Sometimes it can be avoided, e.g. if primary key columns form a valid
prefix - then, they will be optimized out before filtering itself.
However, clustering key prefix can only be qualified for this
optimization if the whole partition key is restricted - otherwise
the clustering columns still need to be present for filtering.

This commit also fixes tests in cql_query_test suite, because they now
expect more values - columns fetched for filtering will be present as
well (only internally, the clients receive only data they asked for).

Fixes #4541
Message-Id: <f08ebae5562d570ece2bb7ee6c84e647345dfe48.1560410018.git.sarna@scylladb.com>

(cherry picked from commit adeea0a)
avikivity pushed a commit that referenced this issue Jun 16, 2019
Clustering key restrictions can sometimes avoid filtering if they form
a prefix, but that can happen only if the whole partition key is
restricted as well.

Ref #4541
Message-Id: <9656396ee831e29c2b8d3ad4ef90c4a16ab71f4b.1560410018.git.sarna@scylladb.com>

(cherry picked from commit c4b9357)
avikivity pushed a commit that referenced this issue Jun 16, 2019
The test cases makes sure that clustering key restriction
columns are fetched for filtering if they form a clustering key prefix,
but not a primary key prefix (partition key columns are missing).

Ref #4541
Message-Id: <3612dc1c6c22c59ac9184220a2e7f24e8d18407c.1560410018.git.sarna@scylladb.com>

(cherry picked from commit 2c2122e)
@avikivity
Copy link
Member

@psarna While I was able to backport to 3.1, 3.0 is much harder. Please assist.

psarna added a commit to psarna/scylla that referenced this issue Jun 17, 2019
When a column is not present in the select clause, but used for
filtering, it usually needs to be fetched from replicas.
Sometimes it can be avoided, e.g. if primary key columns form a valid
prefix - then, they will be optimized out before filtering itself.
However, clustering key prefix can only be qualified for this
optimization if the whole partition key is restricted - otherwise
the clustering columns still need to be present for filtering.

This commit also fixes tests in cql_query_test suite, because they now
expect more values - columns fetched for filtering will be present as
well (only internally, the clients receive only data they asked for).

Fixes scylladb#4541
psarna added a commit to psarna/scylla that referenced this issue Jun 17, 2019
Clustering key restrictions can sometimes avoid filtering if they form
a prefix, but that can happen only if the whole partition key is
restricted as well.

Ref scylladb#4541
psarna added a commit to psarna/scylla that referenced this issue Jun 17, 2019
The test cases makes sure that clustering key restriction
columns are fetched for filtering if they form a clustering key prefix,
but not a primary key prefix (partition key columns are missing).

Ref scylladb#4541
@psarna
Copy link
Contributor Author

psarna commented Jun 17, 2019

I pushed a version rebased on 3.0 here: https://github.com/psarna/scylla/commits/fix_ignoring_ck_restrictions_in_filtering_for_3.0 , but I still need to compile and test it on 3.0, which I'll do soon, after dealing with #4540. After I manage to test on 3.0 properly, I'll push it to the list.

psarna added a commit to psarna/scylla that referenced this issue Jun 18, 2019
The test cases makes sure that clustering key restriction
columns are fetched for filtering if they form a clustering key prefix,
but not a primary key prefix (partition key columns are missing).

Ref scylladb#4541
psarna added a commit to psarna/scylla that referenced this issue Jun 18, 2019
The test cases makes sure that clustering key restriction
columns are fetched for filtering if they form a clustering key prefix,
but not a primary key prefix (partition key columns are missing).

Ref scylladb#4541
avikivity added a commit that referenced this issue Jun 24, 2019
"
Tests: unit(dev)
Refs #4541
"

* 'fix_ignoring_ck_restrictions_in_filtering_for_3.0_2' of https://github.com/psarna/scylla:
  tests: add a test case for filtering clustering key
  cql3: fix qualifying clustering key restrictions for filtering
  cql3: fix fetching clustering key columns for filtering
psarna added a commit to psarna/scylla that referenced this issue Jun 25, 2019
When a column is not present in the select clause, but used for
filtering, it usually needs to be fetched from replicas.
Sometimes it can be avoided, e.g. if primary key columns form a valid
prefix - then, they will be optimized out before filtering itself.
However, clustering key prefix can only be qualified for this
optimization if the whole partition key is restricted - otherwise
the clustering columns still need to be present for filtering.

This commit also fixes tests in cql_query_test suite, because they now
expect more values - columns fetched for filtering will be present as
well (only internally, the clients receive only data they asked for).

Fixes scylladb#4541
psarna added a commit to psarna/scylla that referenced this issue Jun 25, 2019
Clustering key restrictions can sometimes avoid filtering if they form
a prefix, but that can happen only if the whole partition key is
restricted as well.

Ref scylladb#4541
psarna added a commit to psarna/scylla that referenced this issue Jun 25, 2019
The test cases makes sure that clustering key restriction
columns are fetched for filtering if they form a clustering key prefix,
but not a primary key prefix (partition key columns are missing).

Ref scylladb#4541
avikivity added a commit that referenced this issue Jun 25, 2019
"
Tests: unit(dev)
Refs #4541
"

* 'fix_ignoring_ck_restrictions_in_filtering_for_3.0_2' of https://github.com/psarna/scylla:
  tests: add a test case for filtering clustering key
  cql3: fix qualifying clustering key restrictions for filtering
  cql3: fix fetching clustering key columns for filtering
@slivne
Copy link
Contributor

slivne commented Jul 29, 2019

This was released as part of 3.0.8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants