Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LWT update with empty clustering key range causes a crash #13129

Closed
cvybhu opened this issue Mar 9, 2023 · 12 comments · Fixed by #14429
Closed

LWT update with empty clustering key range causes a crash #13129

cvybhu opened this issue Mar 9, 2023 · 12 comments · Fixed by #14429

Comments

@cvybhu
Copy link
Contributor

cvybhu commented Mar 9, 2023

Trying to perform an LWT update where the clustering key has 0 possible values,
for example when it's restricted by c = 1 AND c = 2, causes a crash.

Here's a cql-pytest reproducer:

import pytest
from util import new_test_table, unique_key_int

@pytest.fixture(scope="module")
def table1(cql, test_keyspace):
    schema='p int, c int, r int, s int static, PRIMARY KEY(p, c)'
    with new_test_table(cql, test_keyspace, schema) as table:
        yield table

# Generate an LWT update where there is no value for the clustering key,
# as the WHERE restricts it using `c = 2 AND c = 3`.
# Such queries are rejected.
def test_lwt_empty_clustering_range(cql, table1):
    with pytest.raises(InvalidRequest):
        cql.execute(f"UPDATE {table1} SET r = 9000 WHERE p = {p} AND c = 2 AND c = 3 IF r = 3")

Cassandra rejects such queries, running it there gives an error:

[Invalid query] message="c cannot be restricted by more than one relation if it includes an Equal"

Decoded backtrace:

__GI___sigaction at :?
__normal_iterator at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/stl_iterator.h:1073
 (inlined by) std::vector<nonwrapping_interval<clustering_key_prefix>, std::allocator<nonwrapping_interval<clustering_key_prefix> > >::begin() const at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/stl_vector.h:879
 (inlined by) std::vector<nonwrapping_interval<clustering_key_prefix>, std::allocator<nonwrapping_interval<clustering_key_prefix> > >::front() const at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/stl_vector.h:1219
 (inlined by) cql3::statements::cas_request::find_old_row(cql3::statements::cas_row_update const&) const at ./cql3/statements/cas_request.cc:135
cql3::statements::cas_request::applies_to() const at ./cql3/statements/cas_request.cc:104
 (inlined by) cql3::statements::cas_request::apply(seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, query::partition_slice const&, long) at ./cql3/statements/cas_request.cc:115
service::storage_proxy::cas(seastar::lw_shared_ptr<schema const>, seastar::shared_ptr<service::cas_request>, seastar::lw_shared_ptr<query::read_command>, std::vector<nonwrapping_interval<dht::ring_position>, std::allocator<nonwrapping_interval<dht::ring_position> > >, service::storage_proxy::coordinator_query_options, db::consistency_level, db::consistency_level, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, bool) at ./service/storage_proxy.cc:5895
std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<bool>::promise_type>::resume() const at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/coroutine:244
 (inlined by) seastar::internal::coroutine_traits_base<bool>::promise_type::run_and_dispose() at ././seastar/include/seastar/core/coroutine.hh:78

Scylla version: 25cf325

Found when looking at the causes of #13001

Doing such an update with empty partition range properly throws an error:

Unrestricted partition key in a conditional update
cvybhu added a commit to cvybhu/scylladb that referenced this issue Mar 9, 2023
Add a test case which performs an LWT UPDATE, but the clustering key
has 0 possible values, because it's supposed to be equal to two
different values.

This currently causes a crash, see scylladb#13129

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
@nyh
Copy link
Contributor

nyh commented Mar 9, 2023

Nice catch.

By the way, what happens when we do the same query without LWT (without the "IF ..."), is it handled correctly (i.e., get a sensible error)?

By the way, doing a "c=2 AND c=3" may not be the only way to get a restriction that matches nothing, there is also "c IN ()", "c = 123" (where there is no match with 123), "c = null" (never true), so please verify the fix is more general than just refusing two equality conditions (which apparently Cassandra does).

@cvybhu
Copy link
Contributor Author

cvybhu commented Mar 9, 2023

By the way, what happens when we do the same query without LWT (without the "IF ..."), is it handled correctly (i.e., get a sensible error)?

It should be ok, it's the same case as in #13001 and #13007, LWT modifications use a different code path that doesn't handle empty pk ranges.
There might be existing tests with impossible CQL conditions somewhere.

nyh added a commit that referenced this issue Mar 12, 2023
… ranges are empty' from Jan Ciołek

Adds two test cases which test what happens when we perform an LWT UPDATE, but the partition/clustering key has 0 possible values. This can happen e.g when a column is supposed to be equal to two different values (`c = 0 AND c = 1`).

Empty partition ranges work properly, empty clustering range currently causes a crash (#13129).
I added tests for both of these cases.

Closes #13130

* github.com:scylladb/scylladb:
  cql-pytest/test_lwt: test LWT update with empty clustering range
  cql-pytest/test_lwt: test LWT update with empty partition range
@DoronArazii DoronArazii added this to the 5.3 milestone Mar 15, 2023
@DoronArazii
Copy link

DoronArazii commented Jun 6, 2023

I see it in Eliran's backlog, unsure when we are going to commit to it - putting also in scylla backlog until we will have a different update.

@DoronArazii DoronArazii modified the milestones: 5.3, 5.x Jun 6, 2023
@mykaul mykaul added the P1 Urgent label Jun 28, 2023
@mykaul mykaul modified the milestones: 5.x, 5.4 Jun 28, 2023
@cvybhu
Copy link
Contributor Author

cvybhu commented Jun 28, 2023

Backports:

  • 2023.1.0~rc6-0.20230517.ca8d6a0d4fa7: needs backport
  • 2022.2.9-0.20230618.843304f9f734: OK
  • 2021.1.20.0.20230604.0f9b75eaf1: weird behavior, needs backport
  • 5.3.0~rc0-0.20230523.3c3621db072d: needs backport
  • 5.2.3-0.20230608.ea08d409f155: needs backport
  • 5.1.13-0.20230622.b635a30b59a9: OK
  • 5.0.13-0.20230423.a0ca8abe4: needs backport

On 2021.1 a LWT query with an IF condition behaves in a weird way.
Scylla accepts the query, and says that the change was applied, but in reality it doesn't apply it:

cqlsh:ks> create table t (p int, c int , r int , primary key (p, c));
cqlsh:ks> insert into t (p, c, r) VALUES (0, 1, 2);
cqlsh:ks> UPDATE t SET r = 3 WHERE p = 0 AND c = 1 AND c = 2 AND r = 2;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"
cqlsh:ks> UPDATE t SET r = 3 WHERE p = 0 AND c = 1 AND c = 2 IF r = 2;

 [applied] | r
-----------+---
      True | 2

cqlsh:ks> select * from t;

 p | c | r
---+---+---
 0 | 1 | 2

(1 rows)
cqlsh:ks> UPDATE t SET r = 4 WHERE p = 0 AND c = 1 IF r = 2;

 [applied] | r
-----------+---
      True | 2

cqlsh:ks> select * from t;

 p | c | r
---+---+---
 0 | 1 | 4

(1 rows)

This could be fixed by backporting the change that rejects such queries.

@bhalevy
Copy link
Member

bhalevy commented Jul 30, 2023

A user just reported this on the users forum:

I have set up a 3-node Scylla cluster in GCP (machine type: n1-highmem-8(8 vCPU, 52 GB), version of scylla-server: 4.5.1-0.20211024.4c0eac049, OS: Linux 8-gcp #24~20.04.1-Ubuntu SMP Mon Sep 12 06:14:01 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux). We have a keyspace, and here is the information about the keyspace:

mykeyspace | True | {‘class’: ‘org.apache.cassandra.locator.SimpleStrategy’, ‘replication_factor’: ‘2’}

The Scylla instance sometimes restarts, and we see this message in the logs:

Jul 25 18:14:16 db-scylla1 scylla[2190565]: Segmentation fault on shard 4.
Jul 25 18:14:16 db-scylla1 scylla[2190565]: Backtrace:
Jul 25 18:14:16 db-scylla1 scylla[2190565]: 0x3f75e08
Jul 25 18:14:16 db-scylla1 scylla[2190565]: 0x3fa7ee6
Jul 25 18:14:16 db-scylla1 scylla[2190565]: 0x7f63611c81df
Jul 25 18:14:16 db-scylla1 scylla[2190565]: 0x1c6f863
Jul 25 18:14:16 db-scylla1 scylla[2190565]: 0x217a7e2
Jul 25 18:14:16 db-scylla1 scylla[2190565]: 0x3f88c4f
Jul 25 18:14:16 db-scylla1 scylla[2190565]: 0x3f89e37
Jul 25 18:14:16 db-scylla1 scylla[2190565]: 0x3fa8488
Jul 25 18:14:16 db-scylla1 scylla[2190565]: 0x3f5438a
Jul 25 18:14:16 db-scylla1 scylla[2190565]: /opt/scylladb/libreloc/libpthread.so.0+0x93f8
Jul 25 18:14:16 db-scylla1 scylla[2190565]: /opt/scylladb/libreloc/libc.so.6+0x101902
Jul 25 18:19:34 db-scylla1 scylla[2212677]: Scylla version 4.5.1-0.20211024.4c0eac049 with build-id e0df888020bbfef43aee10264ff14c85581609f7 starting …
Jul 25 18:19:34 db-scylla1 scylla[2212677]: command used: "/usr/bin/scylla --log-to-syslog 1 --log-to-stdout 0 --default-log-level info --network-stack posix --reserve-memory 4G --io-properties-file=>
Jul 25 18:19:34 db-scylla1 scylla[2212677]: parsed command line options: [log-to-syslog: 1, log-to-stdout: 0, default-log-level: info, network-stack: posix, reserve-memory: 4G, io-properties-file: /e>

Decoded:

[Backtrace #0]
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:59
 (inlined by) seastar::backtrace_buffer::append_backtrace() at ./build/release/seastar/./seastar/src/core/reactor.cc:758
 (inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at ./build/release/seastar/./seastar/src/core/reactor.cc:788
seastar::print_with_backtrace(char const*, bool) at ./build/release/seastar/./seastar/src/core/reactor.cc:800
 (inlined by) seastar::sigsegv_action() at ./build/release/seastar/./seastar/src/core/reactor.cc:3593
 (inlined by) operator() at ./build/release/seastar/./seastar/src/core/reactor.cc:3579
 (inlined by) __invoke at ./build/release/seastar/./seastar/src/core/reactor.cc:3575
?? ??:0
__normal_iterator at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/stl_iterator.h:979
 (inlined by) std::vector<nonwrapping_interval<clustering_key_prefix>, std::allocator<nonwrapping_interval<clustering_key_prefix> > >::begin() const at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/stl_vector.h:821
 (inlined by) std::vector<nonwrapping_interval<clustering_key_prefix>, std::allocator<nonwrapping_interval<clustering_key_prefix> > >::front() const at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/stl_vector.h:1135
 (inlined by) cql3::statements::cas_request::find_old_row(cql3::statements::cas_row_update const&) const at ./cql3/statements/cas_request.cc:163
 (inlined by) cql3::statements::cas_request::applies_to() const at ./cql3/statements/cas_request.cc:131
 (inlined by) cql3::statements::cas_request::apply(seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, query::partition_slice const&, long) at ./cql3/statements/cas_request.cc:141
service::storage_proxy::cas(seastar::lw_shared_ptr<schema const>, seastar::shared_ptr<service::cas_request>, seastar::lw_shared_ptr<query::read_command>, std::vector<nonwrapping_interval<dht::ring_position>, std::allocator<nonwrapping_interval<dht::ring_position> > >, service::storage_proxy::coordinator_query_options, db::consistency_level, db::consistency_level, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >, bool) at ./service/storage_proxy.cc:4506
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/release/seastar/./seastar/src/core/reactor.cc:2263
 (inlined by) seastar::reactor::run_some_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:2672
seastar::reactor::run() at ./build/release/seastar/./seastar/src/core/reactor.cc:2831
operator() at ./build/release/seastar/./seastar/src/core/reactor.cc:4025
 (inlined by) void std::__invoke_impl<void, seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_97&>(std::__invoke_other, seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_97&) at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/invoke.h:60
 (inlined by) std::enable_if<is_invocable_r_v<void, seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_97&>, void>::type std::__invoke_r<void, seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_97&>(seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_97&) at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/invoke.h:110
 (inlined by) std::_Function_handler<void (), seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_97>::_M_invoke(std::_Any_data const&) at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/std_function.h:291
std::function<void ()>::operator()() const at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/std_function.h:622
 (inlined by) seastar::posix_thread::start_routine(void*) at ./build/release/seastar/./seastar/src/core/posix.cc:60

@bhalevy
Copy link
Member

bhalevy commented Jul 30, 2023

@cvybhu how come no backport is needed to 5.1.13 but it is required for newer releases?

@bhalevy
Copy link
Member

bhalevy commented Jul 30, 2023

@scylladb/scylla-maint please consider backport

@nyh
Copy link
Contributor

nyh commented Jul 30, 2023

@cvybhu how come no backport is needed to 5.1.13 but it is required for newer releases?

I have the same question. How can 5.1 be immune to this issue, but both older and newer be vulnerable.

@bhalevy please note that even if I backport this to 5.2 (which I plan to do now) and/or 5.1, it won't help the user you quoted who was using the ancient 4.5. We'll not backport anything to that.

@bhalevy
Copy link
Member

bhalevy commented Jul 30, 2023

@bhalevy please note that even if I backport this to 5.2 (which I plan to do now) and/or 5.1, it won't help the user you quoted who was using the ancient 4.5. We'll not backport anything to that.

Yes, I've already let him know that 4.5.x is not support any more and they'd need to upgrade to a newer (best to latest available) version.

Also, this fix will turn the crash into a CQL error, which is much better, bu it will not solve the root cause, which is apparently a bad CQL LWT query, which they's need to fix in any case.
Fixing the query should prevent the crash.

nyh added a commit that referenced this issue Jul 30, 2023
LWT queries with empty clustering range used to cause a crash.
For example in:
```cql
UPDATE tab SET r = 9000 WHERE p = 1  AND c = 2 AND c = 2000 IF r = 3
```
The range of `c` is empty - there are no valid values.

This caused a segfault when accessing the `first` range:
```c++
op.ranges.front()
```

Cassandra rejects such queries at the preparation stage. It doesn't allow two `EQ` restriction on the same clustering column when an IF is involved.
We reject them during runtime, which is a worse solution. The user can prepare a query with `c = ? AND c = ?`, and then run it, but unexpectedly it will throw an `invalid_request_exception` when the two bound variables are different.

We could ban such queries as well, we already ban the usage of `IN` in conditional statements. The problem is that this would be a breaking change.

A better solution would be to allow empty ranges in `LWT` statements. When an empty range is detected we just wouldn't apply the change. This would be a larger change, for now let's just fix the crash.

Fixes: #13129

Closes #14429

* github.com:scylladb/scylladb:
  modification_statement: reject conditional statements with empty clustering key
  statements/cas_request: fix crash on empty clustering range in LWT

(cherry picked from commit 49c8c06)
@nyh
Copy link
Contributor

nyh commented Jul 30, 2023

Backported to 5.2 (bc127b8) - the cherry-pick required trivial but still non-empty changes to go in, so I tested manually that it compiles and works with the test included in the patch.

Didn't backport to 5.1 yet, waiting for a clarification from @cvybhu.

denesb pushed a commit that referenced this issue Jul 31, 2023
LWT queries with empty clustering range used to cause a crash.
For example in:
```cql
UPDATE tab SET r = 9000 WHERE p = 1  AND c = 2 AND c = 2000 IF r = 3
```
The range of `c` is empty - there are no valid values.

This caused a segfault when accessing the `first` range:
```c++
op.ranges.front()
```

Cassandra rejects such queries at the preparation stage. It doesn't allow two `EQ` restriction on the same clustering column when an IF is involved.
We reject them during runtime, which is a worse solution. The user can prepare a query with `c = ? AND c = ?`, and then run it, but unexpectedly it will throw an `invalid_request_exception` when the two bound variables are different.

We could ban such queries as well, we already ban the usage of `IN` in conditional statements. The problem is that this would be a breaking change.

A better solution would be to allow empty ranges in `LWT` statements. When an empty range is detected we just wouldn't apply the change. This would be a larger change, for now let's just fix the crash.

Fixes: #13129

Closes #14429

* github.com:scylladb/scylladb:
  modification_statement: reject conditional statements with empty clustering key
  statements/cas_request: fix crash on empty clustering range in LWT

(cherry picked from commit 49c8c06)
@cvybhu
Copy link
Contributor Author

cvybhu commented Aug 9, 2023

Didn't backport to 5.1 yet, waiting for a clarification from @cvybhu.

5.1 throws an error, so there's nothing to fix:

Connected to c1-5.1 at 127.0.0.1:9042
[cqlsh 6.0.9 | Scylla 5.1.15-0.20230730.12966e84352e | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh> use ks;
cqlsh:ks> create table t (p int, c int , r int , primary key (p, c));
cqlsh:ks> insert into t (p, c, r) VALUES (0, 1, 2);
cqlsh:ks> UPDATE t SET r = 3 WHERE p = 0 AND c = 1 AND c = 2 AND r = 2;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"
cqlsh:ks> UPDATE t SET r = 3 WHERE p = 0 AND c = 1 AND c = 2 IF r = 2;
InvalidRequest: Error from server: code=2200 [Invalid query] message="clustering key ranges empty - probably caused by an unset value"
cqlsh:ks> 

The check is from another PR that fixed a similar problem: #13133

@DoronArazii
Copy link

IIUC, it's not required in 5.1 - removing labels.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants