Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

/storage_service/keyspace_upgrade_sstables/{keyspace} crashes the server if keyspace uses tablets #17452

Closed
lkshminarayanan opened this issue Feb 21, 2024 · 1 comment
Assignees
Labels
area/compaction area/storage Storage space reduction and other storage enhancements area/upgrade Backport candidate
Milestone

Comments

@lkshminarayanan
Copy link
Member

The PR #17335 fixes an issue in the sstables upgrade flow : it sends nullptr for owned_ranges to cleanup compaction when the keyspace uses tablets. But this has introduced another bug in get_ranges_for_invalidation, where the function crashes as it tries to de-reference owned_ranges, which in this case is a nullptr.

Resulting crash on upgrade of a keyspace with tablets :

INFO  2024-02-21 22:39:10,843 [shard 0:strm] api - upgrade_sstables: keyspace=cql_test_1708535288739 tables={table{name=cql_test_1708535288744, id=c938eb70-d0db-11ee-aa20-f7e40560782c}} exclude_current_version=true
INFO  2024-02-21 22:39:33,831 [shard 0:strm] api - upgrade_sstables: keyspace=cql_test_1708535288739 tables={table{name=cql_test_1708535288744, id=c938eb70-d0db-11ee-aa20-f7e40560782c}} exclude_current_version=false
INFO  2024-02-21 22:39:33,831 [shard 0:strm] compaction - [Upgrade cql_test_1708535288739.cql_test_1708535288744 fbf05170-d0db-11ee-aa20-f7e40560782c] Cleaning [/home/lkshminarayanan/Repo/scylladb-workdir/workdir/data/cql_test_1708535288739/cql_test_1708535288744-c938eb70d0db11eeaa20f7e40560782c/me-3gdt_1blk_4j3ww2l4zpw8xj6ur0-big-Data.db:level=0:origin=memtable]
Segmentation fault on shard 0.
Backtrace:
  0x30ee7bc
  0x311dfb2
  /usr/lib/libc.so.6+0x3e70f
  0x334cd66
  0x1ad682c
  0x1ad6687
  0x1ad344b
  0x1ad026b
  0x1aca5d4
  0x316fe96

@lkshminarayanan lkshminarayanan self-assigned this Feb 21, 2024
@lkshminarayanan lkshminarayanan added area/storage Storage space reduction and other storage enhancements area/compaction area/upgrade area/tablets labels Feb 21, 2024
@lkshminarayanan lkshminarayanan added this to the 6.0 milestone Feb 21, 2024
lkshminarayanan added a commit to lkshminarayanan/scylladb that referenced this issue Feb 22, 2024
Commit 7a98877 sets owned_ranges_ptr to null when initiating an upgrade
on keyspaces with tablets. owned_ranges_ptr is set to null because
cleanup is not required for tables using tablets. As a result, cache
invalidation is also not required for any ranges. Update
get_ranges_for_invalidation to handle null values of owned_ranges_ptr by
returning empty ranges for cache invalidation.

Fixes scylladb#17452

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
@lkshminarayanan
Copy link
Member Author

This will be fixed by #17502

xtrey pushed a commit to xtrey/scylladb that referenced this issue Mar 1, 2024
…anup work

Since commit f1bbf70, many compaction types can do cleanup work, but turns out
we forgot to invalidate cache on their completion.

So if a node regains ownership of token that had partition deleted in its previous
owner (and tombstone is already gone), data can be resurrected.

Tablet is not affected, as it explicitly invalidates cache during migration
cleanup stage.

Scylla 5.4 is affected.

Fixes scylladb#17501.
Fixes scylladb#17452.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb#17502
denesb pushed a commit that referenced this issue Mar 13, 2024
…anup work

Since commit f1bbf70, many compaction types can do cleanup work, but turns out
we forgot to invalidate cache on their completion.

So if a node regains ownership of token that had partition deleted in its previous
owner (and tombstone is already gone), data can be resurrected.

Tablet is not affected, as it explicitly invalidates cache during migration
cleanup stage.

Scylla 5.4 is affected.

Fixes #17501.
Fixes #17452.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #17502

(cherry picked from commit f07c233)
dgarcia360 pushed a commit to dgarcia360/scylla that referenced this issue Apr 30, 2024
…anup work

Since commit f1bbf70, many compaction types can do cleanup work, but turns out
we forgot to invalidate cache on their completion.

So if a node regains ownership of token that had partition deleted in its previous
owner (and tombstone is already gone), data can be resurrected.

Tablet is not affected, as it explicitly invalidates cache during migration
cleanup stage.

Scylla 5.4 is affected.

Fixes scylladb#17501.
Fixes scylladb#17452.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb#17502
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/compaction area/storage Storage space reduction and other storage enhancements area/upgrade Backport candidate
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants