New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pk_indexes aren't generated for queries with duplicate named bind variables #15374
Comments
This is most likely a regression caused by scylladb/scylladb@19a6e69 On
|
Here's the code I used to print the metadata: https://gist.github.com/cvybhu/23e5964a8daef57f8a5bd70651e6793d |
cvybhu
added a commit
to cvybhu/scylladb
that referenced
this issue
Sep 22, 2023
…ind variables When presented with queries that use the same named bind variables twice, like this one: ```cql SELECT p FROM table WHERE p = :x AND c = :x ``` Scylla generated empty partition_key_bind_indexes (pk_indexes). pk_indexes tell the driver which bind variables it should use to calculate the partition token for a query. Without it, the driver is unable to determine the token and it will send the query to a random node. Scylla should generate pk_indexes which tell the driver that it can use bind variable with bind_index = 0 to calculate the partition token for a query. The problem was that _target_columns keep only a single target_column for each bind variable. In the example above :x is compared with both p and c, but _target_columns would contain only one of them, and Scylla wasn't able to tell that this bind variable is compared with a partition key column. To fix it, let's replace _target_columns with _targets. _targets keeps all comparisons between bind variables and other expressions, so none of them will be forgotten/overwritten. Fixes: scylladb#15374 Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
cvybhu
added a commit
to cvybhu/scylladb
that referenced
this issue
Sep 22, 2023
…variable name Add a cql-pytest which reproduces issue scylladb#15374. When presented with the following query: ```cql SELECT p FROM table WHERE p = :x AND c = :x ``` Scylla generated empty pk_indexes, and driver wasn't able to determine the right token for request routing. Scylla should generate pk_indexes which tell the driver that it can use bind variable with bind_index = 0 to calculate the partition token for a query. I couldn't find a good place for this test, so I created a new file - test_prepare.py. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
cvybhu
added a commit
to cvybhu/scylladb
that referenced
this issue
Sep 25, 2023
Add some tests that test whether `pk indexes` are generated correctly. When a driver asks to prepare a statement, Scylla's response includes the metadata for this prepared statement. In this metadata there's `pk indexes`, which tells the driver which bind variable values it should use to calculate the partition token. For a query like: SELECT * FROM t WHERE p2 = ? AND p1 = ? AND p3 = ? The correct pk_indexes would be [1, 0, 2], which means "To calculate the token calculate Hash(bind_vars[1] | bind_vars[0] | bind_vars[2])". More information is available in the specification: https://github.com/apache/cassandra/blob/1959502d8b16212479eecb076c89945c3f0f180c/doc/native_protocol_v4.spec#L699-L707 Two tests are marked as xfail because of scylladb#15374 - Scylla doesn't correctly handle using the same named variable in multiple places. This will be fixed soon. I couldn't find a good place for these tests, so I created a new file - test_prepare.py. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
cvybhu
added a commit
to cvybhu/scylladb
that referenced
this issue
Sep 25, 2023
…te_named_variables Issue scylladb#15374 has been fixed, so these tests can be enabled. Duplicate bind variable names are now handled correctly. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
avikivity
added a commit
that referenced
this issue
Sep 26, 2023
… named bind variables' from Jan Ciołek When presented with queries that use the same named bind variables twice, like this one: ```cql SELECT p FROM table WHERE p = :x AND c = :x ``` Scylla generated empty `partition_key_bind_indexes` (`pk_indexes`). `pk_indexes` tell the driver which bind variables it should use to calculate the partition token for a query. Without it, the driver is unable to determine the token and it will send the query to a random node. Scylla should generate pk_indexes which tell the driver that it can use bind variable with `bind_index = 0` to calculate the partition token for this query. The problem was that `_target_columns` keep only a single target_column for each bind variable. In the example above `:x` is compared with both `p` and `c`, but `_target_columns` would contain only one of them, and Scylla wasn't able to tell that this bind variable is compared with a partition key column. To fix it, let's replace `_target_columns` with `_targets`. `_targets` keeps all comparisons between bind variables and other expressions, so none of them will be forgotten/overwritten. A `cql-pytest` reproducer is added. I also added some comments in `prepare_context.hh/cc` to make it easier to read. Fixes: #15374 Closes #15526 * github.com:scylladb/scylladb: cql-pytest/test-prepare: remove xfail marker from *pk_indexes_duplicate_named_variables cql3/prepare_context: fix generating pk_indexes for duplicate named bind variables cql3: improve readability of prepare_context cql-pytest: test generation of pk indexes during PREPARE
Doesn't apply cleanly to 5.2, @cvybhu please prepare a backport PR. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
When the driver asks to prepare a statement, Scylla responds with a message that contains metadata about this prepared statement.
pk_indexes
is a field in this metadata which lets the driver know which bind variables it should use to calculate the partition token.For example:
When the driver prepares this
SELECT
statement, Scylla responds with the following pk_indexes:The meaning is: "To calculate partition key token, the driver should take the bound values with indexes 0 and 1, concatenate them and then generate the hash"
This allows the driver to correctly compute the token, and send the queries to the nodes that own those token ranges.
The problem is that for some statements that use the same named variable twice, Scylla doesn't produce any
pk_indexes
, even though it could.In this example:
PreparedMetadata
looks like this:The
pk_indexes
field is empty, even though the driver would be able to calculate the token value using the first bind variable.Because of this the queries will be send to a random node instead of the one that owns this token range. This isn't the end of the world, as the random node will forward the query to the correct one, but it would be more efficient to send it to the right node immediately.
Here's the relevant part in the CQL binary protocol: https://github.com/apache/cassandra/blob/1959502d8b16212479eecb076c89945c3f0f180c/doc/native_protocol_v4.spec#L699
The text was updated successfully, but these errors were encountered: