Weird return values from batch updates #7113
Comments
There's absolutely no guarantee regarding the order of statements in a batch, at least when no LWT statement is present. Does LWT change anything, @kostja?
The result set is returned in the order of the clustering key, one row per clustering key updated. So if your batch updates clustering key 2, and then clustering key 1, you'll get back the previous row for key 1, and then the previous row for key 2.
Nil would naturally be returned if the UPDATE actually inserts a row: in that case the previous row is missing, so the respective cells are nil.
The case of returning fewer rows than there are distinct clustering keys in the batch needs to be investigated--thanks for reporting it!
In addition to the missing-rows bug, I'd like to suggest changing at least one of these behaviors. Consider a BATCH operation which updates at least two keys which may not exist. Since those keys don't exist prior to the BATCH, they'll both be reported as null. If BATCH returned update responses in the same order as the BATCH query itself, this wouldn't be a problem: you could disambiguate based on order. If BATCH returned the key (perhaps as metadata?), you could disambiguate based on the key. But as it stands, these two features interact to make BATCH return values impossible to use correctly, at least for certain classes of updates.
I was able to verify the issue without Jepsen. Scylla returns fewer rows than there are distinct clustering keys in the batch if there was no previous row for one of the keys. When there are multiple updates per batch, this can be quite confusing, since some of the updates will be missing any return value:
Cassandra:

```
cassandra@cqlsh:test> CREATE TABLE IF NOT EXISTS "batch_ret" (
                  ...     part int,
                  ...     key int,
                  ...     lwt_trivial int,
                  ...     int1 int,
                  ...     int2 int,
                  ...     PRIMARY KEY (part, key)
                  ... );
cassandra@cqlsh:test> BEGIN BATCH
                  ... UPDATE batch_ret SET int1 = 6 WHERE part = 0 AND key = 4 IF lwt_trivial = null
                  ... APPLY BATCH;

 [applied]
-----------
      True

cassandra@cqlsh:test> BEGIN BATCH
                  ... UPDATE batch_ret SET int2 = 0, int1 = 0 WHERE part = 0 AND key = 0 IF lwt_trivial = null
                  ... UPDATE batch_ret SET int2 = 1, int1 = 6 WHERE part = 0 AND key = 7 IF lwt_trivial = null
                  ... APPLY BATCH;

 [applied]
-----------
      True

cassandra@cqlsh:test> BEGIN BATCH
                  ... UPDATE batch_ret SET int2 = 0, int1 = 2 WHERE part = 0 AND key = 9 IF lwt_trivial = null
                  ... UPDATE batch_ret SET int1 = 7 WHERE part = 0 AND key = 0 IF lwt_trivial = null
                  ... APPLY BATCH;

 [applied]
-----------
      True

cassandra@cqlsh:test> BEGIN BATCH
                  ... UPDATE batch_ret SET int1 = 6, int2 = 7 WHERE part = 0 AND key = 1 IF lwt_trivial = null
                  ... UPDATE batch_ret SET int2 = 4 WHERE part = 0 AND key = 0 IF lwt_trivial = null
                  ... UPDATE batch_ret SET int2 = 2 WHERE part = 0 AND key = 3 IF lwt_trivial = null
                  ... APPLY BATCH;

 [applied]
-----------
      True

cassandra@cqlsh:test> BEGIN BATCH
                  ... UPDATE batch_ret SET int2 = 1 WHERE part = 0 AND key = 4 IF lwt_trivial = null
                  ... UPDATE batch_ret SET int2 = 1 WHERE part = 0 AND key = 0 IF lwt_trivial = null
                  ... UPDATE batch_ret SET int1 = 4, int2 = 8 WHERE part = 0 AND key = 9 IF lwt_trivial = null
                  ... UPDATE batch_ret SET int1 = 0, int2 = 9 WHERE part = 0 AND key = 0 IF lwt_trivial = null
                  ... APPLY BATCH;

 [applied]
-----------
      True
```
Scylla:

```
cqlsh:test> CREATE TABLE IF NOT EXISTS "batch_ret" (
        ...     part int,
        ...     key int,
        ...     lwt_trivial int,
        ...     int1 int,
        ...     int2 int,
        ...     PRIMARY KEY (part, key)
        ... );
cqlsh:test> BEGIN BATCH
        ... UPDATE batch_ret SET int1 = 6 WHERE part = 0 AND key = 4 IF lwt_trivial = null
        ... APPLY BATCH;

 [applied] | part | key  | lwt_trivial
-----------+------+------+-------------
      True | null | null |        null

cqlsh:test> BEGIN BATCH
        ... UPDATE batch_ret SET int2 = 0, int1 = 0 WHERE part = 0 AND key = 0 IF lwt_trivial = null
        ... UPDATE batch_ret SET int2 = 1, int1 = 6 WHERE part = 0 AND key = 7 IF lwt_trivial = null
        ... APPLY BATCH;

 [applied] | part | key  | lwt_trivial
-----------+------+------+-------------
      True | null | null |        null

cqlsh:test> BEGIN BATCH
        ... UPDATE batch_ret SET int1 = 6, int2 = 7 WHERE part = 0 AND key = 1 IF lwt_trivial = null
        ... UPDATE batch_ret SET int2 = 4 WHERE part = 0 AND key = 0 IF lwt_trivial = null
        ... UPDATE batch_ret SET int2 = 2 WHERE part = 0 AND key = 3 IF lwt_trivial = null
        ... APPLY BATCH;

 [applied] | part | key | lwt_trivial
-----------+------+-----+-------------
      True |    0 |   0 |        null

cqlsh:test> BEGIN BATCH
        ... UPDATE batch_ret SET int2 = 1 WHERE part = 0 AND key = 4 IF lwt_trivial = null
        ... UPDATE batch_ret SET int2 = 1 WHERE part = 0 AND key = 0 IF lwt_trivial = null
        ... UPDATE batch_ret SET int1 = 4, int2 = 8 WHERE part = 0 AND key = 9 IF lwt_trivial = null
        ... UPDATE batch_ret SET int1 = 0, int2 = 9 WHERE part = 0 AND key = 0 IF lwt_trivial = null
        ... APPLY BATCH;

 [applied] | part | key | lwt_trivial
-----------+------+-----+-------------
      True |    0 |   0 |        null
      True |    0 |   4 |        null
      True |    0 |   9 |        null
```
As can be seen from this example, Scylla uses nulls for the partition/clustering key to indicate that the previous row was missing. This is derived from the way Cassandra handles missing rows. While this is a good idea for non-batch updates, it is useless in batches.
Agree. |
Yeah, I think that makes good sense! |
@kostja Makes sense. |
(Alternatively, you could order response rows so they correspond to the order of UPDATEs in the batch. Clients know which keys they used in queries, and that'd let users tell whether they'd created new rows or not.) |
Or both! |
An LWT statement returns the previous row. If you have multiple statements in a batch, that is the previous row as of before the entire batch statement, since the batch statement is atomic. Other than auto-generated batches, I would not expect multiple statements in a batch to update the same clustering key.
The results arrive in the order of the clustering key. We could perhaps preserve the original order as you suggest, but then we would need to always return one result set row per update. |
Previously, the batch statement result set included rows only for those updates which had prefetch data present (i.e. there was a "previous" row for a key). Also, these rows were sorted not in the order in which statements appear in the batch, but in the order of the updated clustering keys.

If we have a batch which updates a few non-existent keys, it's impossible to figure out which update inserted a new key by looking at the query response: not only may the responses not correspond to the order of statements in the batch, but some rows may not show up in the result set at all.

The patch proposes the following fix: for conditional batch statements, the result set now always includes a row for each LWT statement, in the same order in which the individual statements appear in the batch. This way we can always tell which update actually inserted a new key or updated an existing one.

Technically, the following changes were made:

* The `update_parameters::prefetch_data::row::is_in_cas_result_set` member was removed, along with the supporting code in `cas_request::applies_to` which iterated through CAS updates and marked individual `prefetch_data` rows as "needs to be in the CAS result set".
* `cas_request::applies_to` was substantially simplified, since it does nothing more than check `stmt.applies_to()` in a short-circuiting manner.
* The `modification_statement::build_cas_result_set` method was moved to `cas_request`. This makes it easy to iterate through individual `cas_row_update` instances and preserve the order of the rows in the result set.
* A little helper, `cas_request::find_matching_prefetch_data`, was introduced to find a row in `prefetch_data` based on the (pk, ck) combination obtained from the current `cas_request` and a given `cas_row_update`.
* A few tests for the issue were written, and other LWT-batch-related tests adjusted accordingly.

Tests: unit(dev, debug)
Fixes: scylladb#7113
Co-authored-by: Konstantin Osipov <kostja@scylladb.com>
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
…sult set" from Pavel Solodovnikov

Previously, the batch statement result set included rows only for those updates which had prefetch data present (i.e. there was an "old" (pre-existing) row for a key). Also, these rows were sorted not in the order in which statements appear in the batch, but in the order of the updated clustering keys.

If we have a batch which updates a few non-existent keys, it's impossible to figure out which update inserted a new key by looking at the query response: not only may the responses not correspond to the order of statements in the batch, but some rows may not show up in the result set at all. Please see #7113 on GitHub for a detailed description of the problem: #7113

The patch set proposes the following fix: for conditional batch statements, the result set now always includes a row for each LWT statement, in the same order in which the individual statements appear in the batch. This way we can always tell which update actually inserted a new key or updated an existing one.

Technically, the following changes were made:

* The `update_parameters::prefetch_data::row::is_in_cas_result_set` member was removed, along with the supporting code in `cas_request::applies_to` which iterated through CAS updates and marked individual `prefetch_data` rows as "needs to be in the CAS result set".
* `cas_request::applies_to` was substantially simplified, since it does nothing more than check `stmt.applies_to()` in a short-circuiting manner.
* The `modification_statement::build_cas_result_set` method was moved to `cas_request`. This makes it easy to iterate through individual `cas_row_update` instances and preserve the order of the rows in the result set.
* A little helper, `cas_request::find_old_row`, was introduced to find a row in `prefetch_data` based on the (pk, ck) combination obtained from the current `cas_request` and a given `cas_row_update`.
* A few tests for issue #7113 were written, and other LWT-batch-related tests adjusted accordingly.
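Under this fix, a conditional batch should return exactly one result-set row per LWT statement, in statement order. A sketch of the expected shape (hypothetical transcript; the values are illustrative and assume the `batch_ret` schema used elsewhere in this thread, with a pre-existing row for key 3 and none for key 5):

```
-- hypothetical transcript: values are illustrative
BEGIN BATCH
UPDATE batch_ret SET int1 = 1 WHERE part = 0 AND key = 5 IF lwt_trivial = null
UPDATE batch_ret SET int1 = 2 WHERE part = 0 AND key = 3 IF lwt_trivial = null
APPLY BATCH;

 [applied] | part | key  | lwt_trivial
-----------+------+------+-------------
      True | null | null |        null
      True |    0 |    3 |        null
```

The first result row corresponds to the key-5 update (null keys, no previous row) and the second to the key-3 update, even though clustering-key order would put key 3 first.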
It looks like this is addressed in the test build @ManManson gave me the other day--we now get a separate row per UPDATE, and returned keys are either the key the UPDATE touched or null, for rows which didn't previously exist.
I've been trying to figure out how to perform batched reads, in order to work around limitations in SELECT queries. @kostja suggested that the return value from batch UPDATE IFs should be the state of the row prior to the update taking place--by performing no-op updates, I could turn UPDATEs into reads. Scylla's docs don't seem to talk about this at all, but Cassandra's CQL docs say that updates with `IF` clauses do return at least some information about the row. This appears to be true in Scylla as well:
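A minimal sketch of such a conditional update (hypothetical transcript; it uses the `batch_ret` schema from the transcripts above and assumes the row for key 4 already exists):

```
-- hypothetical transcript: values are illustrative
BEGIN BATCH
UPDATE batch_ret SET int1 = 6 WHERE part = 0 AND key = 4 IF lwt_trivial = null
APPLY BATCH;

 [applied] | part | key | lwt_trivial
-----------+------+-----+-------------
      True |    0 |   4 |        null
```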
This query returns a single row with both `part` and `key`, and the `lwt_trivial` column--we use that as a trivial condition to allow the update to always succeed.

If you do multiple updates in a batch, you might expect to get back one row per UPDATE, in the order they were presented to the batch. This happens sometimes:
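For instance, when statement order happens to coincide with clustering-key order (hypothetical transcript; values are illustrative):

```
-- hypothetical transcript: values are illustrative
BEGIN BATCH
UPDATE batch_ret SET int1 = 1 WHERE part = 0 AND key = 1 IF lwt_trivial = null
UPDATE batch_ret SET int1 = 2 WHERE part = 0 AND key = 2 IF lwt_trivial = null
APPLY BATCH;

 [applied] | part | key | lwt_trivial
-----------+------+-----+-------------
      True |    0 |   1 |        null
      True |    0 |   2 |        null
```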
But about half the time, results come back out of order:
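A sketch of the out-of-order case (hypothetical transcript; values are illustrative): the batch updates key 2 first, but the row for key 1 comes back first, because results are sorted by clustering key:

```
-- hypothetical transcript: values are illustrative
BEGIN BATCH
UPDATE batch_ret SET int1 = 2 WHERE part = 0 AND key = 2 IF lwt_trivial = null
UPDATE batch_ret SET int1 = 1 WHERE part = 0 AND key = 1 IF lwt_trivial = null
APPLY BATCH;

 [applied] | part | key | lwt_trivial
-----------+------+-----+-------------
      True |    0 |   1 |        null
      True |    0 |   2 |        null
```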
This suggests that perhaps BATCH returns a set of changes, which raises the question--when you perform two UPDATEs which affect the same key... which one's results get returned? Is it deterministic?
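The Scylla transcript above already shows this case (a condensed sketch of its final batch, which updates key 0 twice yet yields a single row for it):

```
-- condensed sketch based on the Scylla transcript above
BEGIN BATCH
UPDATE batch_ret SET int2 = 1 WHERE part = 0 AND key = 0 IF lwt_trivial = null
UPDATE batch_ret SET int1 = 0, int2 = 9 WHERE part = 0 AND key = 0 IF lwt_trivial = null
APPLY BATCH;

 [applied] | part | key | lwt_trivial
-----------+------+-----+-------------
      True |    0 |   0 |        null
```

There's no indication of which of the two UPDATEs this row belongs to.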
At least you'll get back a collection with the same set of keys you put in, right?
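A sketch of the kind of response being described (hypothetical transcript; it assumes a previous row existed for key 7 but not for key 0):

```
-- hypothetical transcript: values are illustrative
BEGIN BATCH
UPDATE batch_ret SET int1 = 0 WHERE part = 0 AND key = 0 IF lwt_trivial = null
UPDATE batch_ret SET int1 = 6 WHERE part = 0 AND key = 7 IF lwt_trivial = null
APPLY BATCH;

 [applied] | part | key | lwt_trivial
-----------+------+-----+-------------
      True |    0 |   7 |        null
```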
Our update to key 0 doesn't even appear here. In a quick test I threw together (and this likely depends on the distribution chosen and the length of the test), about 7% of batch transactions failed to return some of the keys they were supposed to affect. Weirder still, sometimes queries returned null values for keys:
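The single-statement batch in the Scylla transcript above shows this: key 4 had no previous row, and the returned keys are null:

```
-- from the Scylla transcript above (key 4 had no previous row)
BEGIN BATCH
UPDATE batch_ret SET int1 = 6 WHERE part = 0 AND key = 4 IF lwt_trivial = null
APPLY BATCH;

 [applied] | part | key  | lwt_trivial
-----------+------+------+-------------
      True | null | null |        null
```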
I think this might have something to do with how Scylla handles the first update to a nonexistent row--these are all clustered at the start of the test. But if Scylla's returning a set of results, rather than an ordered list, how are we supposed to figure out which of the updates operated on nonexistent rows?
You can reproduce this with jepsen.scylla 40b42d4 by running something like
This behavior doesn't require concurrency, high rates of updates, or faults to reproduce.
Installation details
Scylla version (or git commit hash): 4.2
Cluster size: 5
OS (RHEL/CentOS/Ubuntu/AWS AMI): Debian 10