Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[branch-5.1] - minimal fix for crash caused by empty primary key range in LWT update #13133

Merged

Conversation

cvybhu
Copy link
Contributor

@cvybhu cvybhu commented Mar 9, 2023

This is another attempt to fix #13001 on branch-5.1.

In #13001 we found a test case which causes a crash on branch-5.1 because it didn't handle UNSET_VALUE properly:

def test_unset_insert_where(cql, table2):
    p = unique_key_int()
    stmt = cql.prepare(f'INSERT INTO {table2} (p, c) VALUES ({p}, ?)')
    with pytest.raises(InvalidRequest, match="unset"):
        cql.execute(stmt, [UNSET_VALUE])

def test_unset_insert_where_lwt(cql, table2):
    p = unique_key_int()
    stmt = cql.prepare(f'INSERT INTO {table2} (p, c) VALUES ({p}, ?) IF NOT EXISTS')
    with pytest.raises(InvalidRequest, match="unset"):
        cql.execute(stmt, [UNSET_VALUE])

This problem has been fixed on master by PR #12517. I tried to backport it to branch-5.1 (#13029), but this didn't go well - it was a big change that touched a lot of components. It's hard to make sure that it won't cause some unexpected issues.

Then I made a simpler fix for branch-5.1, which achieves the same effect as the original PR (#13057).
The problem is that this effect includes backwards incompatible changes - it bans UNSET_VALUE in some places that branch-5.1 used to allow.

Breaking changes are bad, so I made this PR, which does an absolutely minimal change to fix the crash.
It adds a check the moment before the crash would happen.

To make sure that everything works correctly, and to detect any possible breaking changes, I wrote a bunch of tests that validate the current behavior.
I also ported some tests from the master branch, at least the ones that were in line with the behavior on branch-5.1.

Doing an LWT INSERT/UPDATE and passing UNSET_VALUE
for the primary key column used to caused a crash.

This is a minimal fix for this crash.

Crash backtrace pointed to a place where
we tried doing .front() on an empty vector
of primary key ranges.

I added a check that the vector isn't empty.
If it's empty then let's throw an error
and mention that it's most likely
caused by an unset value.

This has been fixed on master,
but the PR that fixed it introduced
breaking changes, which I don't want
to add to branch-5.1.

This fix is absolutely minimal
- it performs the check at the
last moment before a crash.

It's not the prettiest, but it works
and can't introduce breaking changes,
because the new code gets activated
only in cases that would've caused
a crash.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Add some tests which test what happens when an UNSET_VALUE
is passed to an INSERT statement.

Passing it for partition key column is impossible
because python driver doesn't allow it.

Passing it for clustering key column causes Scylla
to silently ignore the INSERT.

Passing it for a regular or static column
causes this column to remain unchanged,
as expected.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Add tests which test INSERT statements with IF NOT EXISTS,
when an UNSET_VLAUE is passed for some column.
The test are similar to the previous ones done for simple
INSERTs without IF NOT EXISTS.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Test what happens when an UNSET_VALUE is passed to
an UPDATE statement.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Test what happens when an UNSET_VALUE is passed to
an UPDATE statement with IF EXISTS condition.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Test what happens when an UNSET_VALUE is passed to
an UPDATE statement with an LWT condition.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
I copied cql-pytest tests from the master branch,
at least the ones that were compatible with branch-5.1

Some of them were expecting an InvalidRequest exception
in case of UNSET VALUES being present in places that
branch-5.1 allows, so I skipped these tests.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
@scylladb-promoter
Copy link
Contributor

Copy link
Contributor

@nyh nyh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good option for the backport. I tend to think it's the better option, but if you prefer the other option, I don't feel strongly about this one.

@@ -119,6 +119,9 @@ std::optional<mutation> cas_request::apply(foreign_ptr<lw_shared_ptr<query::resu

const update_parameters::prefetch_data::row* cas_request::find_old_row(const cas_row_update& op) const {
static const clustering_key empty_ckey = clustering_key::make_empty();
if (_key.empty()) {
throw exceptions::invalid_request_exception("partition key ranges empty - probably caused by an unset value");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is acceptable for branch 5.1, but please remember to also add such a check in master (maybe as on_internal_error), because if _key is empty because of some unknown bug, this code will crash - also on master.

@scylladb-promoter
Copy link
Contributor

@cvybhu
Copy link
Contributor Author

cvybhu commented Mar 10, 2023

CI state FAILURE - https://jenkins.scylladb.com/job/releng/job/Scylla-CI/4862/

The CI failure is the same as in #13057. In there it failed 2 times, but then there was a different run for a non-master branch (scylla-5.1/job/scylla-ci instead of releneg/job/Scylla-CI) and this different CI build passed. @benipeled can I trigger the CI for a non-master branch somehow?

@benipeled
Copy link
Contributor

CI state FAILURE - https://jenkins.scylladb.com/job/releng/job/Scylla-CI/4862/

The CI failure is the same as in #13057. In there it failed 2 times, but then there was a different run for a non-master branch (scylla-5.1/job/scylla-ci instead of releneg/job/Scylla-CI) and this different CI build passed. @benipeled can I trigger the CI for a non-master branch somehow?

Triggered - https://jenkins.scylladb.com/job/scylla-5.1/job/scylla-ci/3

Until we make the switch officially (hopefully next week) you can trigger it manually - https://jenkins.scylladb.com/job/scylla-5.1/job/scylla-ci (this one is for 5.1, we have the same on 5.2 and master folders)

@scylladb-promoter
Copy link
Contributor

@nyh
Copy link
Contributor

nyh commented Mar 12, 2023

Merged to next-5.1.
If I remember correctly, despite the "Closes" on the merge commit, this issue won't automatically close when the fix is promoted to brach-5.1, so we'll need to do it manually.

@cvybhu what about backporting to 5.2 and/or 5.0? We could (?) backport the original fix to 5.2 and this new simplified fix to 5.0, but I don't know if you had some other plan.

@nyh
Copy link
Contributor

nyh commented Mar 12, 2023

@cvybhu what about backporting to 5.2 and/or 5.0? We could (?) backport the original fix to 5.2 and this new simplified fix to 5.0, but I don't know if you had some other plan.

Correction: 5.2 is not vulnerable because it already had the UNSET-handling changes.
So 5.0 is the only one left to consider.

@scylladb-promoter scylladb-promoter merged commit 7007c94 into scylladb:branch-5.1 Mar 12, 2023
@cvybhu
Copy link
Contributor Author

cvybhu commented Mar 14, 2023

Merged to next-5.1. If I remember correctly, despite the "Closes" on the merge commit, this issue won't automatically close when the fix is promoted to brach-5.1, so we'll need to do it manually.

@cvybhu what about backporting to 5.2 and/or 5.0? We could (?) backport the original fix to 5.2 and this new simplified fix to 5.0, but I don't know if you had some other plan.

I'll add a backport to 5.0

denesb added a commit that referenced this pull request May 8, 2023
…ey range in LWT update' from Jan Ciołek

In #13001 we found a test case which causes a crash because it didn't handle `UNSET_VALUE` properly:

```python3
def test_unset_insert_where(cql, table2):
    p = unique_key_int()
    stmt = cql.prepare(f'INSERT INTO {table2} (p, c) VALUES ({p}, ?)')
    with pytest.raises(InvalidRequest, match="unset"):
        cql.execute(stmt, [UNSET_VALUE])

def test_unset_insert_where_lwt(cql, table2):
    p = unique_key_int()
    stmt = cql.prepare(f'INSERT INTO {table2} (p, c) VALUES ({p}, ?) IF NOT EXISTS')
    with pytest.raises(InvalidRequest, match="unset"):
        cql.execute(stmt, [UNSET_VALUE])
```

This PR does an absolutely minimal change to fix the crash.
It adds a check the moment before the crash would happen.

To make sure that everything works correctly, and to detect any possible breaking changes, I wrote a bunch of tests that validate the current behavior.
I also ported some tests from the `master` branch, at least the ones that were in line with the behavior on `branch-5.0`.

The changes are the same as in #13133, just cherry-picked to `branch-5.0`

Closes #13178

* github.com:scylladb/scylladb:
  cql-pytest/test_unset: port some tests from master branch
  cql-pytest/test_unset: test unset value in UPDATEs with LWT conditions
  cql-pytest/test_unset: test unset value in UPDATEs with IF EXISTS
  cql-pytest/test_unset: test unset value in UPDATE statements
  cql-pytest/test_unset: test unset value in INSERTs with IF NOT EXISTS
  cql-pytest/test_unset: test unset value in INSERT statements
  cas_request: fix crash on unset value in primary key with LWT
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants