Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compaction: Fix sstable cleanup after resharding on refresh #14147

Conversation

raphaelsc
Copy link
Member

Problem can be reproduced easily:

  1. wrote some sstables with smp 1
  2. shut down scylla
  3. moved sstables to upload
  4. restarted scylla with smp 2
  5. ran refresh (resharding happens, adds sstable to cleanup set and never removes it)
  6. cleanup (tries to cleanup resharded sstables which were leaked in the cleanup set)

Bumps into assert "Assertion `!sst->is_shared()' failed", as cleanup picks a shared sstable that was leaked and already processed by resharding.

Fix is about not inserting shared sstables into cleanup set, as shared sstables are restricted to resharding and cannot be processed later by cleanup (nor it should because resharding itself cleaned up its input files).

Dtest: https://github.com/scylladb/scylla-dtest/pull/3206

Fixes #14001.

Problem can be reproduced easily:
1) wrote some sstables with smp 1
2) shut down scylla
3) moved sstables to upload
4) restarted scylla with smp 2
5) ran refresh (resharding happens, adds sstable to cleanup
set and never removes it)
6) cleanup (tries to cleanup resharded sstables which were
leaked in the cleanup set)

Bumps into assert "Assertion `!sst->is_shared()' failed", as
cleanup picks a shared sstable that was leaked and already
processed by resharding.

Fix is about not inserting shared sstables into cleanup set,
as shared sstables are restricted to resharding and cannot
be processed later by cleanup (nor it should because
resharding itself cleaned up its input files).

Dtest: scylladb/scylla-dtest#3206

Fixes scylladb#14001.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
@raphaelsc raphaelsc requested a review from nyh as a code owner June 6, 2023 01:16
@raphaelsc raphaelsc requested a review from bhalevy June 6, 2023 01:16
@scylladb-promoter
Copy link
Contributor

@@ -1590,6 +1590,10 @@ bool needs_cleanup(const sstables::shared_sstable& sst,

bool compaction_manager::update_sstable_cleanup_state(table_state& t, const sstables::shared_sstable& sst, const dht::token_range_vector& sorted_owned_ranges) {
auto& cs = get_compaction_state(&t);
if (sst->is_shared()) {
throw std::runtime_error(format("Shared SSTable {} cannot be marked as requiring cleanup, as it can only be processed by resharding",
sst->get_filename()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we now have a printer for shared_sstable that prints also the sstable origin, that might be relevant here, so better pass just sst.

co_await d.filter_sstables([&] (sstables::shared_sstable sst) -> future<bool> {
if (table.update_sstable_cleanup_state(sst, *sorted_owned_ranges_ptr)) {
if (needs_cleanup(sst, *sorted_owned_ranges_ptr)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, the reason this wasn't the root cause for #14001 is that here we filter the unshared sstables, meaning it's benign to cleanup those sstables later on from the multi-shard sharing perspective.
But the hunk is still needed.

The original intent was to detect sstables requiring cleanup on start-up, when populating the table.
But we're not there yet as we don't have the token metadata in this early stage.

Copy link
Member

@bhalevy bhalevy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, thanks!
Just one nit regarding the runtime error message.

bhalevy pushed a commit to bhalevy/scylla that referenced this pull request Jun 11, 2023
Problem can be reproduced easily:
1) wrote some sstables with smp 1
2) shut down scylla
3) moved sstables to upload
4) restarted scylla with smp 2
5) ran refresh (resharding happens, adds sstable to cleanup
set and never removes it)
6) cleanup (tries to cleanup resharded sstables which were
leaked in the cleanup set)

Bumps into assert "Assertion `!sst->is_shared()' failed", as
cleanup picks a shared sstable that was leaked and already
processed by resharding.

Fix is about not inserting shared sstables into cleanup set,
as shared sstables are restricted to resharding and cannot
be processed later by cleanup (nor it should because
resharding itself cleaned up its input files).

Dtest: scylladb/scylla-dtest#3206

Fixes scylladb#14001.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb#14147

(cherry picked from commit 156d771)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
denesb pushed a commit that referenced this pull request Jun 13, 2023
Problem can be reproduced easily:
1) wrote some sstables with smp 1
2) shut down scylla
3) moved sstables to upload
4) restarted scylla with smp 2
5) ran refresh (resharding happens, adds sstable to cleanup
set and never removes it)
6) cleanup (tries to cleanup resharded sstables which were
leaked in the cleanup set)

Bumps into assert "Assertion `!sst->is_shared()' failed", as
cleanup picks a shared sstable that was leaked and already
processed by resharding.

Fix is about not inserting shared sstables into cleanup set,
as shared sstables are restricted to resharding and cannot
be processed later by cleanup (nor it should because
resharding itself cleaned up its input files).

Dtest: scylladb/scylla-dtest#3206

Fixes #14001.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #14147

(cherry picked from commit 156d771)
bhalevy pushed a commit to bhalevy/scylla that referenced this pull request Jun 14, 2023
Problem can be reproduced easily:
1) wrote some sstables with smp 1
2) shut down scylla
3) moved sstables to upload
4) restarted scylla with smp 2
5) ran refresh (resharding happens, adds sstable to cleanup
set and never removes it)
6) cleanup (tries to cleanup resharded sstables which were
leaked in the cleanup set)

Bumps into assert "Assertion `!sst->is_shared()' failed", as
cleanup picks a shared sstable that was leaked and already
processed by resharding.

Fix is about not inserting shared sstables into cleanup set,
as shared sstables are restricted to resharding and cannot
be processed later by cleanup (nor it should because
resharding itself cleaned up its input files).

Dtest: scylladb/scylla-dtest#3206

Fixes scylladb#14001.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb#14147

(cherry picked from commit 156d771)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
bhalevy pushed a commit to bhalevy/scylla that referenced this pull request Aug 31, 2023
Problem can be reproduced easily:
1) wrote some sstables with smp 1
2) shut down scylla
3) moved sstables to upload
4) restarted scylla with smp 2
5) ran refresh (resharding happens, adds sstable to cleanup
set and never removes it)
6) cleanup (tries to cleanup resharded sstables which were
leaked in the cleanup set)

Bumps into assert "Assertion `!sst->is_shared()' failed", as
cleanup picks a shared sstable that was leaked and already
processed by resharding.

Fix is about not inserting shared sstables into cleanup set,
as shared sstables are restricted to resharding and cannot
be processed later by cleanup (nor it should because
resharding itself cleaned up its input files).

Dtest: scylladb/scylla-dtest#3206

Fixes scylladb#14001.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb#14147

(cherry picked from commit 156d771)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
bhalevy pushed a commit to bhalevy/scylla that referenced this pull request Sep 19, 2023
Problem can be reproduced easily:
1) wrote some sstables with smp 1
2) shut down scylla
3) moved sstables to upload
4) restarted scylla with smp 2
5) ran refresh (resharding happens, adds sstable to cleanup
set and never removes it)
6) cleanup (tries to cleanup resharded sstables which were
leaked in the cleanup set)

Bumps into assert "Assertion `!sst->is_shared()' failed", as
cleanup picks a shared sstable that was leaked and already
processed by resharding.

Fix is about not inserting shared sstables into cleanup set,
as shared sstables are restricted to resharding and cannot
be processed later by cleanup (nor it should because
resharding itself cleaned up its input files).

Dtest: scylladb/scylla-dtest#3206

Fixes scylladb#14001.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb#14147

(cherry picked from commit 156d771)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
bhalevy pushed a commit to bhalevy/scylla that referenced this pull request Nov 13, 2023
Problem can be reproduced easily:
1) wrote some sstables with smp 1
2) shut down scylla
3) moved sstables to upload
4) restarted scylla with smp 2
5) ran refresh (resharding happens, adds sstable to cleanup
set and never removes it)
6) cleanup (tries to cleanup resharded sstables which were
leaked in the cleanup set)

Bumps into assert "Assertion `!sst->is_shared()' failed", as
cleanup picks a shared sstable that was leaked and already
processed by resharding.

Fix is about not inserting shared sstables into cleanup set,
as shared sstables are restricted to resharding and cannot
be processed later by cleanup (nor it should because
resharding itself cleaned up its input files).

Dtest: scylladb/scylla-dtest#3206

Fixes scylladb#14001.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb#14147

(cherry picked from commit 156d771)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants