Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weird return values from batch updates #7113

Closed
1 task done
aphyr opened this issue Aug 25, 2020 · 17 comments
Closed
1 task done

Weird return values from batch updates #7113

aphyr opened this issue Aug 25, 2020 · 17 comments
Assignees

Comments

@aphyr
Copy link

aphyr commented Aug 25, 2020

I've been trying to figure out how to perform batched reads in order to work around limitations in SELECT queries. @kostja suggested that the return value from batch UPDATE IFs should return the state of the row prior to the update taking place--by performing no-op updates, I could turn UPDATEs into reads. Scylla's docs don't seem to talk about this at all, but Cassandra's CQL docs say that updates with IF clauses do return at least some information about the row. This appears to be true in Scylla as well:

BEGIN BATCH
UPDATE batch_ret SET int1 = 6 WHERE part = 0 AND key = 4 IF lwt_trivial = null;
APPLY BATCH;

This query returns a single row with both part and key, and the lwt_trivial column--we use that as a trivial condition to allow the update to always succeed.

({:[applied] true, :part 0, :key 4, :lwt_trivial nil})

If you do multiple updates in a batch, you might expect to get back one row per UPDATE, in the order they were presented to the batch. This happens sometimes:

BEGIN BATCH
UPDATE batch_ret SET int2 = 0, int1 = 0 WHERE part = 0 AND key = 0 IF lwt_trivial = null;
UPDATE batch_ret SET int2 = 1, int1 = 6 WHERE part = 0 AND key = 7 IF lwt_trivial = null;
APPLY BATCH;
({:[applied] true, :part 0, :key 0, :lwt_trivial nil}
 {:[applied] true, :part 0, :key 7, :lwt_trivial nil})

But about half the time, results come back out of order:

BEGIN BATCH
UPDATE batch_ret SET int2 = 0, int1 = 2 WHERE part = 0 AND key = 9 IF lwt_trivial = null;
UPDATE batch_ret SET int1 = 7 WHERE part = 0 AND key = 0 IF lwt_trivial = null;
APPLY BATCH
({:[applied] true, :part 0, :key 0, :lwt_trivial nil}
 {:[applied] true, :part 0, :key 9, :lwt_trivial nil})

This suggests that perhaps BATCH returns a set of changes, which raises the question--when you perform two UPDATEs which affect the same key... which one's results get returned? Is it deterministic?

At least you'll get back a collection with the same set of keys you put in, right?

BEGIN BATCH
UPDATE batch_ret SET int1 = 6, int2 = 7 WHERE part = 0 AND key = 1 IF lwt_trivial = null;
UPDATE batch_ret SET int2 = 4 WHERE part = 0 AND key = 0 IF lwt_trivial = null;
UPDATE batch_ret SET int2 = 2 WHERE part = 0 AND key = 3 IF lwt_trivial = null;
APPLY BATCH;
{:[applied] true, :part 0, :key 1, :lwt_trivial nil}
{:[applied] true, :part 0, :key 3, :lwt_trivial nil}

Our update to key 0 doesn't even appear here. In a quick test I threw together (and this likely depends on the distribution chosen and length of test), about 7% of batch transactions failed to return some of the keys they were supposed to affect. Weirder still, sometimes queries returned null values for keys:

BEGIN BATCH UPDATE batch_ret SET int2 = 1 WHERE part = 0 AND key = 4 IF lwt_trivial = null;
UPDATE batch_ret SET int2 = 1 WHERE part = 0 AND key = 0 IF lwt_trivial = null;
UPDATE batch_ret SET int1 = 4, int2 = 8 WHERE part = 0 AND key = 9 IF lwt_trivial = null;
UPDATE batch_ret SET int1 = 0, int2 = 9 WHERE part = 0 AND key = 0 IF lwt_trivial = null;
APPLY BATCH;
({:[applied] true, :part nil, :key nil, :lwt_trivial nil})}

I think this might have something to do with how Scylla handles the first update to a nonexistent row--these are all clustered at the start of the test. But if Scylla's returning a set of results, rather than an ordered list, how are we supposed to figure out which of the updates operated on nonexistent rows?

You can reproduce this with jepsen.scylla 40b42d4 by running something like

lein run test -w batch-return -r 1 --concurrency 1 --time-limit 60

This behavior doesn't require concurrency, high rates of updates, or faults to reproduce.

This is Scylla's bug tracker, to be used for reporting bugs only.
If you have a question about Scylla, and not a bug, please ask it in
our mailing-list at scylladb-dev@googlegroups.com or in our slack channel.

  • I have read the disclaimer above, and I am reporting a suspected malfunction in Scylla.

Installation details
Scylla version (or git commit hash): 4.2
Cluster size: 5
OS (RHEL/CentOS/Ubuntu/AWS AMI): Debian 10

@haaawk
Copy link
Contributor

haaawk commented Aug 25, 2020

If you do multiple updates in a batch, you might expect to get back one row per UPDATE, in the order they were presented to the batch.

There's absolutely no guarantee regarding the order of statements in the batch. At least when no LWT statement is present in the batch. Does LWT change anything @kostja ?

@kostja
Copy link
Contributor

kostja commented Aug 26, 2020

The result set is returned in the order of the clustering key, one row per clustering key updated. So if your batch updates clustering key 2, and then clustering key 1, you'll get back the previous row for key 1, and then the previous row for key 2.
There is no guarantee about the order that we've ever made, I'm just describing here the order in which the implementation does it.

@kostja
Copy link
Contributor

kostja commented Aug 26, 2020

Nil would naturally be returned if UPDATE actually inserts a row. In this case the previous row is missing, so respective cells are nil.

@kostja
Copy link
Contributor

kostja commented Aug 26, 2020

The case of returning less rows than there are distinct clustering keys in the batch needs to be investigated, thanks for reporting it!

@kostja kostja added the lwt label Aug 26, 2020
@kostja kostja assigned alecco and kostja and unassigned alecco and gleb-cloudius Aug 26, 2020
@aphyr
Copy link
Author

aphyr commented Aug 26, 2020

The result set is returned in the order of the clustering key, one row per clustering key updated. So if your batch updates clustering key 2, and then clustering key 1, you'll get back the previous row for key 1, and then the previous row for key 2.
There is no guarantee about the order that we've ever made, I'm just describing here the order in which the implementation does it.

Nil would naturally be returned if UPDATE actually inserts a row. In this case the previous row is missing, so respective cells are nil.

In addition to the missing-rows bug, I'd like to suggest changing at least one of these behaviors. Consider a BATCH operation which updates at least two keys which may not exist. Since their keys don't exist prior to the BATCH, they'll be reported as null in the response. Because responses are unordered, this makes it impossible to figure out which row corresponds to which batch update.

If BATCH returned update responses in the same order as the BATCH query itself, this wouldn't be a problem: you could disambiguate based on order. If BATCH returned the key (perhaps as metadata?) you could disambiguate based on the key. But as it stands, these two features interact to make BATCH return values impossible to use correctly, at least for certain classes of updates.

@kostja
Copy link
Contributor

kostja commented Sep 1, 2020

I was able to verify the issue without Jepsen.

Scylla returns "fewer" rows than there are distinct clustering keys in the batch if there was no previous row for one of the keys. When there are multiple updates per batch, this can be quite confusing, since some of the updates will be missing any return value:

@kostja
Copy link
Contributor

kostja commented Sep 1, 2020

Cassandra:

cassandra@cqlsh:test> CREATE TABLE IF NOT EXISTS "batch_ret" (
        ...     part int,
        ...     key int,
        ...     lwt_trivial int,
        ...     int1 int,
        ...     int2 int,
        ...     PRIMARY KEY (part, key)
        ... );
cassandra@cqlsh:test> BEGIN BATCH
        ...     UPDATE batch_ret SET int1 = 6 WHERE part = 0 AND key = 4 IF lwt_trivial = null
        ... APPLY BATCH;

 [applied]
-----------
      True

cassandra@cqlsh:test> BEGIN BATCH
        ...     UPDATE batch_ret SET int2 = 0, int1 = 0 WHERE part = 0 AND key = 0 IF lwt_trivial = null
        ...     UPDATE batch_ret SET int2 = 1, int1 = 6 WHERE part = 0 AND key = 7 IF lwt_trivial = null
        ... APPLY BATCH;

 [applied]
-----------
      True

cassandra@cqlsh:test> 
cassandra@cqlsh:test> BEGIN BATCH
        ...     UPDATE batch_ret SET int2 = 0, int1 = 2 WHERE part = 0 AND key = 9 IF lwt_trivial = null
        ...     UPDATE batch_ret SET int1 = 7 WHERE part = 0 AND key = 0 IF lwt_trivial = null
        ... APPLY BATCH;

 [applied]
-----------
      True

cassandra@cqlsh:test> BEGIN BATCH
        ... UPDATE batch_ret SET int1 = 6, int2 = 7 WHERE part = 0 AND key = 1 IF lwt_trivial = null
        ... UPDATE batch_ret SET int2 = 4 WHERE part = 0 AND key = 0 IF lwt_trivial = null
        ... UPDATE batch_ret SET int2 = 2 WHERE part = 0 AND key = 3 IF lwt_trivial = null
        ... APPLY BATCH;

 [applied]
-----------
      True

cassandra@cqlsh:test> BEGIN BATCH
        ...     UPDATE batch_ret SET int2 = 1 WHERE part = 0 AND key = 4 IF lwt_trivial = null
        ...     UPDATE batch_ret SET int2 = 1 WHERE part = 0 AND key = 0 IF lwt_trivial = null
        ...     UPDATE batch_ret SET int1 = 4, int2 = 8 WHERE part = 0 AND key = 9 IF lwt_trivial = null
        ...     UPDATE batch_ret SET int1 = 0, int2 = 9 WHERE part = 0 AND key = 0 IF lwt_trivial = null
        ... APPLY BATCH;

 [applied]
-----------
      True

@kostja
Copy link
Contributor

kostja commented Sep 1, 2020

Scylla:

cqlsh:test> CREATE TABLE IF NOT EXISTS "batch_ret" (                                                                                        
        ...     part int,                                                                                                                   
        ...     key int,                                                                                                                    
        ...     lwt_trivial int,                                                                                                            
        ...     int1 int,                                                                                                                   
        ...     int2 int,                                                                                                                   
        ...     PRIMARY KEY (part, key)                                                                                                     
        ... );                                                                                                                              
cqlsh:test> BEGIN BATCH                                                                                                                     
        ...     UPDATE batch_ret SET int1 = 6 WHERE part = 0 AND key = 4 IF lwt_trivial = null                                              
        ... APPLY BATCH;                                                                                                                    
                                                                                                                                            
 [applied] | part | key  | lwt_trivial                                                                                                      
-----------+------+------+-------------                                                                                                     
      True | null | null |        null                                                                                                      
                                                                                                                                            
cqlsh:test>                                                                                                                                 
cqlsh:test> BEGIN BATCH                                                                                                                     
        ...     UPDATE batch_ret SET int2 = 0, int1 = 0 WHERE part = 0 AND key = 0 IF lwt_trivial = null                                    
        ...     UPDATE batch_ret SET int2 = 1, int1 = 6 WHERE part = 0 AND key = 7 IF lwt_trivial = null                                    
        ... APPLY BATCH;                                                                                                                    
                                                                                                                                            
 [applied] | part | key  | lwt_trivial                                                                                                      
-----------+------+------+-------------
      True | null | null |        null

cqlsh:test> 
cqlsh:test> BEGIN BATCH
        ... UPDATE batch_ret SET int1 = 6, int2 = 7 WHERE part = 0 AND key = 1 IF lwt_trivial = null
        ... UPDATE batch_ret SET int2 = 4 WHERE part = 0 AND key = 0 IF lwt_trivial = null
        ... UPDATE batch_ret SET int2 = 2 WHERE part = 0 AND key = 3 IF lwt_trivial = null
        ... APPLY BATCH;

 [applied] | part | key | lwt_trivial
-----------+------+-----+-------------
      True |    0 |   0 |        null

cqlsh:test> 
cqlsh:test> BEGIN BATCH
        ...     UPDATE batch_ret SET int2 = 1 WHERE part = 0 AND key = 4 IF lwt_trivial = null
        ...     UPDATE batch_ret SET int2 = 1 WHERE part = 0 AND key = 0 IF lwt_trivial = null
        ...     UPDATE batch_ret SET int1 = 4, int2 = 8 WHERE part = 0 AND key = 9 IF lwt_trivial = null
        ...     UPDATE batch_ret SET int1 = 0, int2 = 9 WHERE part = 0 AND key = 0 IF lwt_trivial = null
        ... APPLY BATCH;

 [applied] | part | key | lwt_trivial
-----------+------+-----+-------------
      True |    0 |   0 |        null
      True |    0 |   4 |        null
      True |    0 |   9 |        null

@kostja
Copy link
Contributor

kostja commented Sep 1, 2020

As can be seen from this example, Scylla is using nulls for partition/clustering key to indicate that the previous row was missing. This is derived from the way Cassandra handles missing rows. While this is a good idea for non-batch updates, it seems to be useless in batches.
I think best course of action for batches is to always supply primary key/clustering key in response.

@avikivity
Copy link
Member

Agree.

@aphyr
Copy link
Author

aphyr commented Sep 1, 2020

Yeah, I think that makes good sense!

@tgrabiec
Copy link
Contributor

tgrabiec commented Sep 1, 2020

@kostja Makes sense.

@aphyr
Copy link
Author

aphyr commented Sep 1, 2020

(Alternatively, you could order response rows so they correspond to the order of UPDATEs in the batch. Clients know which keys they used in queries, and that'd let users tell whether they'd created new rows or not.)

@avikivity
Copy link
Member

Or both!

@kostja
Copy link
Contributor

kostja commented Sep 1, 2020

This suggests that perhaps BATCH returns a set of changes, which raises the question--when you perform two UPDATEs
which affect the same key... which one's results get returned? Is it deterministic?

LWT statement returns the previous row. If you have multiple statements in a batch, that would be the previous row before the batch statement - the batch statement is atomic. Other than auto-generated batches, I would not expect multiple statements in a batch to update the same clustering key.

@kostja
Copy link
Contributor

kostja commented Sep 1, 2020

But about half the time, results come back out of order:

The results arrive in the order of the clustering key. We could perhaps preserve the original order as you suggest, but then we would need to always return one result set row per update.

ManManson pushed a commit to ManManson/scylla that referenced this issue Sep 3, 2020
ManManson pushed a commit to ManManson/scylla that referenced this issue Sep 3, 2020
Previously batch statement result set included rows for only
those updates which have a prefetch data present (i.e. there
was a "previous" row for a key).

Also, these rows were sorted not in the order in which statements
appear in the batch, but in the order of updated clustering keys.

If we have a batch which updates a few non-existent keys, then
it's impossible to figure out which update inserted a new key
by looking at the query response. Not only because the responses
may not correspond to the order of statements in the batch, but
even some rows may not show up in the result set at all.

The patch proposes the following fix:

For conditional batch statements the result set now always
includes a row for each LWT statement, in the same order
in which individual statements appear in the batch.

This way we can always tell which update did actually insert
a new key or update the existing one.

Technically, the following changes were made:
 * `update_parameters::prefetch_data::row::is_in_cas_result_set`
   member removed as well as the supporting code in
   `cas_request::applies_to` which iterated through cas updates
   and marked individual `prefetch_data` rows as "need to be in
   cas result set".
 * `cas_request::applies_to` substantially simplified since it
   doesn't do anything more than checking `stmt.applies_to()`
   in short-circuiting manner.
 * `modification_statement::build_cas_result_set` method moved
   to `cas_request`. This allows to easily iterate through
   individual `cas_row_update` instances and preserve the order
   of the rows in the result set.
 * A little helper `cas_request::find_matching_prefetch_data`
   is introduced to find a row in `prefetch_data` based on the
   (pk, ck) combination obtained from the current `cas_request`
   and a given `cas_row_update`.
 * A few tests for the issue are written, other lwt-batch-related
   tests adjusted accordingly.

Tests: unit(dev, debug)
Fixes: scylladb#7113

Co-authored-by: Konstantin Osipov <kostja@scylladb.com>
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
ManManson pushed a commit to ManManson/scylla that referenced this issue Sep 3, 2020
Previously batch statement result set included rows for only
those updates which have a prefetch data present (i.e. there
was a "previous" row for a key).

Also, these rows were sorted not in the order in which statements
appear in the batch, but in the order of updated clustering keys.

If we have a batch which updates a few non-existent keys, then
it's impossible to figure out which update inserted a new key
by looking at the query response. Not only because the responses
may not correspond to the order of statements in the batch, but
even some rows may not show up in the result set at all.

The patch proposes the following fix:

For conditional batch statements the result set now always
includes a row for each LWT statement, in the same order
in which individual statements appear in the batch.

This way we can always tell which update did actually insert
a new key or update the existing one.

Technically, the following changes were made:
 * `update_parameters::prefetch_data::row::is_in_cas_result_set`
   member removed as well as the supporting code in
   `cas_request::applies_to` which iterated through cas updates
   and marked individual `prefetch_data` rows as "need to be in
   cas result set".
 * `cas_request::applies_to` substantially simplified since it
   doesn't do anything more than checking `stmt.applies_to()`
   in short-circuiting manner.
 * `modification_statement::build_cas_result_set` method moved
   to `cas_request`. This allows to easily iterate through
   individual `cas_row_update` instances and preserve the order
   of the rows in the result set.
 * A little helper `cas_request::find_matching_prefetch_data`
   is introduced to find a row in `prefetch_data` based on the
   (pk, ck) combination obtained from the current `cas_request`
   and a given `cas_row_update`.
 * A few tests for the issue are written, other lwt-batch-related
   tests adjusted accordingly.

Tests: unit(dev, debug)
Fixes: scylladb#7113

Co-authored-by: Konstantin Osipov <kostja@scylladb.com>
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
tgrabiec added a commit that referenced this issue Sep 4, 2020
…sult set" from Pavel Solodovnikov

Previously batch statement result set included rows for only
those updates which have a prefetch data present (i.e. there
was an "old" (pre-existing) row for a key).

Also, these rows were sorted not in the order in which statements
appear in the batch, but in the order of updated clustering keys.

If we have a batch which updates a few non-existent keys, then
it's impossible to figure out which update inserted a new key
by looking at the query response. Not only because the responses
may not correspond to the order of statements in the batch, but
even some rows may not show up in the result set at all.

Please see #7113 on Github for detailed description
of the problem:
#7113

The patch set proposes the following fix:

For conditional batch statements the result set now always
includes a row for each LWT statement, in the same order
in which individual statements appear in the batch.

This way we can always tell which update did actually insert
a new key or update the existing one.

Technically, the following changes were made:
 * `update_parameters::prefetch_data::row::is_in_cas_result_set`
   member removed as well as the supporting code in
   `cas_request::applies_to` which iterated through cas updates
   and marked individual `prefetch_data` rows as "need to be in
   cas result set".
 * `cas_request::applies_to` substantially simplified since it
   doesn't do anything more than checking `stmt.applies_to()`
   in short-circuiting manner.
 * `modification_statement::build_cas_result_set` method moved
   to `cas_request`. This allows to easily iterate through
   individual `cas_row_update` instances and preserve the order
   of the rows in the result set.
 * A little helper `cas_request::find_old_row`
   is introduced to find a row in `prefetch_data` based on the
   (pk, ck) combination obtained from the current `cas_request`
   and a given `cas_row_update`.
 * A few tests for the issue #7113 are written, other lwt-batch-related
   tests adjusted accordingly.
@slivne slivne added the jepsen label Sep 6, 2020
@aphyr
Copy link
Author

aphyr commented Sep 7, 2020

It looks like this is addressed in the test build @ManManson gave me the other day--we now get a separate row per UPDATE, and returned keys are either nil or equal to the corresponding update's key!

@aphyr aphyr closed this as completed Sep 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants