Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

staging sstables are not cleaned up in cleanup compaction potentially causing data resurrection #9559

Closed
bhalevy opened this issue Nov 1, 2021 · 20 comments · Fixed by #10657 or #13812
Assignees
Labels
area/compaction Backport candidate backport/5.2 Issues that should be backported to 5.2 branch once they'll be fixed symptom/data resurrection type/bug
Milestone

Comments

@bhalevy
Copy link
Member

bhalevy commented Nov 1, 2021

Currently (scylla 4.6.dev 034f79c), when cleanup compaction is performed via compaction_manager::perform_cleanup
https://github.com/scylladb/scylla/blob/bd0b573a9252045bcc86bc48eb919ff807c740a4/compaction/compaction_manager.cc#L844

We select the sstables to compact with get_candidates
https://github.com/scylladb/scylla/blob/bd0b573a9252045bcc86bc48eb919ff807c740a4/compaction/compaction_manager.cc#L185

that calls table::in_strategy_sstables
https://github.com/scylladb/scylla/blob/bd0b573a9252045bcc86bc48eb919ff807c740a4/table.cc#L1140-L1146

that filters out staging sstables that are undergoing view building.

This means that when view building completes and the sstables are moved out of staging
https://github.com/scylladb/scylla/blob/bd0b573a9252045bcc86bc48eb919ff807c740a4/db/view/view_update_generator.cc#L128
calling https://github.com/scylladb/scylla/blob/bd0b573a9252045bcc86bc48eb919ff807c740a4/table.cc#L2214

They may contain tokens that are no longer owned by the current node, if a topology change occurred after the said sstables were written and before view build completed.

@bhalevy
Copy link
Member Author

bhalevy commented Nov 1, 2021

I think that the right way to go here is not to fix the cleanup case specifically, but rather allow compaction of sstables in staging, and mark them for deletion when compaction is done, as we do in the delete_sstables_for_interrupted_compaction case:
https://github.com/scylladb/scylla/blob/bd0b573a9252045bcc86bc48eb919ff807c740a4/compaction/compaction.cc#L778-L786

Then, when view building is done, the sstable will be auto-deleted in close_files:
https://github.com/scylladb/scylla/blob/bd0b573a9252045bcc86bc48eb919ff807c740a4/sstables/sstables.cc#L2443-L2462

and move_sstables_from_staging can be skipped.

@bhalevy
Copy link
Member Author

bhalevy commented Nov 1, 2021

An important corner case that needs to be handled is keeping the sstable around until view building successfully completes.
For this we may want to introduce a new mark_for_deletion code:
https://github.com/scylladb/scylla/blob/bd0b573a9252045bcc86bc48eb919ff807c740a4/sstables/sstables.hh#L528-L532

e.g. tentative, that can be examined and turned into marked by the view building code, but will not cause auto-deletion, by itself, in sstable::close_files().

@bhalevy
Copy link
Member Author

bhalevy commented Nov 10, 2021

Cc @raphaelsc @psarna @nyh

@slivne slivne added this to the 4.8 milestone Dec 23, 2021
@bhalevy
Copy link
Member Author

bhalevy commented Jan 11, 2022

Simplified (and untested yet) idea: create the sstables with hardlinkes both in main table directory and under staging.
Normal reads, compaction, snapshot, etc. will not consider the staging directory at all and will treat the staging sstables in the main directory as any other sstable. The hardlink in the staging directory will only function to mark the sstables that need view building as it does today and when vie building is done it will just unlink the sstable unconditionally from the staging dir.
Compaction (and kind) will happen orthogonally and independently on the hardlink in the main directory regardless of staging.

Truncate will need to remove all the hardlinks in staging after stopping the view build.

Snapshot will consider only sstables in the main directory.

bhalevy added a commit to bhalevy/scylla that referenced this issue Feb 14, 2022
staging sstables need to be compacted as any other sstable.
In particular, they need to be subject to cleanup compaction
and scrubbing.

To achieve that, link them to the base dir when sealed.
When moved out of staging, table::move_sstables_from_staging
already unlinks the sstable from the staging subdirectory
if its get_dir() is set to the base dir.

Fix scylladb#9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
bhalevy added a commit to bhalevy/scylla that referenced this issue Feb 15, 2022
staging sstables need to be compacted as any other sstable.
In particular, they need to be subject to cleanup compaction
and scrubbing.

To achieve that, link them to the base dir when sealed.
When moved out of staging, table::move_sstables_from_staging
already unlinks the sstable from the staging subdirectory
if its get_dir() is set to the base dir.

Fix scylladb#9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
bhalevy added a commit to bhalevy/scylla that referenced this issue May 23, 2022
clone staging sstables so their content may be compacted while
views are built.  When done, the hard-linked copy in the staging
subsirectory will be simply unlinked.

Fixes scylladb#9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
bhalevy added a commit to bhalevy/scylla that referenced this issue May 23, 2022
clone staging sstables so their content may be compacted while
views are built.  When done, the hard-linked copy in the staging
subsirectory will be simply unlinked.

Fixes scylladb#9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
xemul added a commit that referenced this issue May 24, 2022
Before touching any of this code for #9559,
that requires a change when loading sstables from the staging subdirectory,
simplify it using coroutines.

Closes #10609

* github.com:scylladb/scylla:
  replica: distributed_loader: reindent populate_keyspace
  replica: distributed_loader: coroutinize populate_keyspace
  replica: distributed_loader: reindent handle_sstables_pending_delete
  replica: distributed_loader: coroutinize handle_sstables_pending_delete
  replica: distributed_loader: reindent cleanup_column_family_temp_sst_dirs
  replica: distributed_loader: coroutinize cleanup_column_family_temp_sst_dirs
  replica: distributed_loader: reindent make_sstables_available
  replica: distributed_loader: coroutinize make_sstables_available
  sstable_directory: parallel_for_each_restricted: keep func alive across calls
  replica: distributed_loader: reindent reshape
  replica: distributed_loader: coroutinize reshape
  replica: distributed_loader: coroutinize reshard
  replica: distributed_loader: reindent run_resharding_jobs
  replica: distributed_loader: coroutinize run_resharding_jobs
  replica: distributed_loader: reindent distribute_reshard_jobs
  replica: distributed_loader: coroutinize distribute_reshard_jobs
  replica: distributed_loader: reindent collect_all_shared_sstables
  replica: distributed_loader: coroutinize collect_all_shared_sstables
  replica: distributed_loader: reindent process_sstable_dir
  replica: distributed_loader: coroutinize process_sstable_dir
bhalevy added a commit to bhalevy/scylla that referenced this issue May 25, 2022
clone staging sstables so their content may be compacted while
views are built.  When done, the hard-linked copy in the staging
subsirectory will be simply unlinked.

Fixes scylladb#9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
bhalevy added a commit to bhalevy/scylla that referenced this issue May 25, 2022
clone staging sstables so their content may be compacted while
views are built.  When done, the hard-linked copy in the staging
subsirectory will be simply unlinked.

Fixes scylladb#9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
bhalevy added a commit to bhalevy/scylla that referenced this issue Jun 5, 2022
clone staging sstables so their content may be compacted while
views are built.  When done, the hard-linked copy in the staging
subsirectory will be simply unlinked.

Fixes scylladb#9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
bhalevy added a commit to bhalevy/scylla that referenced this issue Jun 7, 2022
clone staging sstables so their content may be compacted while
views are built.  When done, the hard-linked copy in the staging
subsirectory will be simply unlinked.

Fixes scylladb#9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
bhalevy added a commit to bhalevy/scylla that referenced this issue Jun 7, 2022
clone staging sstables so their content may be compacted while
views are built.  When done, the hard-linked copy in the staging
subsirectory will be simply unlinked.

Fixes scylladb#9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
bhalevy added a commit to bhalevy/scylla that referenced this issue Jun 12, 2022
clone staging sstables so their content may be compacted while
views are built.  When done, the hard-linked copy in the staging
subdirectory will be simply unlinked.

Fixes scylladb#9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
bhalevy added a commit to bhalevy/scylla that referenced this issue Jun 13, 2022
clone staging sstables so their content may be compacted while
views are built.  When done, the hard-linked copy in the staging
subdirectory will be simply unlinked.

Fixes scylladb#9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
bhalevy added a commit to bhalevy/scylla that referenced this issue Jun 14, 2022
clone staging sstables so their content may be compacted while
views are built.  When done, the hard-linked copy in the staging
subdirectory will be simply unlinked.

Fixes scylladb#9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
bhalevy added a commit to bhalevy/scylla that referenced this issue Jun 14, 2022
clone staging sstables so their content may be compacted while
views are built.  When done, the hard-linked copy in the staging
subdirectory will be simply unlinked.

Fixes scylladb#9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
denesb added a commit that referenced this issue Jun 24, 2022
This series decouples the staging sstables from the table's sstable set.

The current behavior keeps the sstables in the staging directory until view building is done. They are readable as any other sstable, but fenced off from compaction, so they don't go away in the meanwhile.

Currently, when views are built, the sstables are moved into the main table directory where they will then be compacted normally.

The problem with this design is that the staging sstables are never compacted, in particular they won't get cleaned up or scrubbed.

The cleanup scenario open a backdoor for data resurrection when the staging sstables are moved after view building while possibly containing stale partitions (#9559) which will not be cleaned up until next time cleanup compaction is performed.

With this series, SSTables that are created in or moved to the staging sub-directory are "cloned" into the base table directory by hard-linking the components there and creating a new sstable object which loads the cloned files.

The former, in the staging directory is used solely for view building and is not added to the table's sstable set, while the latter, its clone, behaves like any other sstable and is added either to the regular or maintenance set and is read and compacted normally.

When view building is done, instead of moving the staging sstable into the table's base directory, it is simply unlinked.
If its "clone" wasn't compacted away yet, then it will just remain where it is, exactly like it would be after it was moved there in the present state of things.  If it was already compacted and no longer exists, then unlinking will then free its storage.

Note that snapshot is based on the sstables listed by the table, which do not include the staging sstables with this change.
But that shouldn't matter since even today, the sstables in the snapshot has no notion of "staging" directory and it is expected that the MV's are either updated view `nodetool refresh` if restoring sstables from snapshot using the uploads dir, or if restoring the whole table from backup - MV's are effectively expected to be rebuilt from scratch (they are not included in automatic snapshots anyway since we don't have snapshot-coherency across tables).

A fundamental infrastructure change was done to achieve that which is to change the sstable_list which was a std::unordered_set<shared_sstable> into a std::unordered_map<generation_type, shared_sstable> that keeps the shared_sstable objects indexed by generation number (that must be unique).  With this model, sstables are supposed to be searched by the generation number, not by their pointer, since when the staging sstable is clones, there will be 2 shared_sstable objects with the same generation (and different `dir()`) and we must distinguish between them.

Special care was taken to throw a runtime_error exception if when looking up a shared sstable and finding another one with the same generation, since they must never exist in the same sstable_map.

Fixes #9559

Closes #10657

* github.com:scylladb/scylla:
  table: clone staging sstables into table dir
  view_update_generator: discover_staging_sstables: reindent
  table: add get_staging_sstables
  view_update_generator: discover_staging_sstables: get shared table ptr earlier
  distributed_loader: populate table directory first
  sstables: time_series_sstable_set: insert: make exception safe
  sstables: move_to_new_dir: fix debug log message
bhalevy pushed a commit to bhalevy/scylla that referenced this issue Jun 11, 2023
…m Benny Halevy

This series extends sstable cleanup to resharding and other (offstrategy, major, and regular) compaction types so to:
* cleanup uploaded sstables (scylladb#11933)
* cleanup staging sstables after they are moved back to the main directory and become eligible for compaction (scylladb#9559)

When perform_cleanup is called, all sstables are scanned, and those that require cleanup are marked as such, and are added for tracking to table_state::cleanup_sstable_set.  They are removed from that set once released by compaction.
Along with that sstables set, we keep the owned_ranges_ptr used by cleanup in the table_state to allow other compaction types (offstrategy, major, or regular) to cleanup those sstables that are marked as require_cleanup and that were skipped by cleanup compaction for either being in the maintenance set (requiring offstrategy compaction) or in staging.

Resharding is using a more straightforward mechanism of passing the owned token ranges when resharding uploaded sstables and using it to detect sstable that require cleanup, now done as piggybacked on resharding compaction.

\Closes scylladb#12422

* github.com:scylladb/scylladb:
  table: discard_sstables: update_sstable_cleanup_state when deleting sstables
  compaction_manager: compact_sstables: retrieve owned ranges if required
  sstables: add a printer for shared_sstable
  compaction_manager: keep owned_ranges_ptr in compaction_state
  compaction_manager: perform_cleanup: keep sstables in compaction_state::sstables_requiring_cleanup
  compaction: refactor compaction_state out of compaction_manager
  compaction: refactor compaction_fwd.hh out of compaction_descriptor.hh
  compaction_manager: compacting_sstable_registration: keep a ref to the compaction_state
  compaction_manager: refactor get_candidates
  compaction_manager: get_candidates: mark as const
  table, compaction_manager: add requires_cleanup
  sstable_set: add for_each_sstable_until
  distributed_loader: reshard: update sstable cleanup state
  table, compaction_manager: add update_sstable_cleanup_state
  compaction_manager: needs_cleanup: delete unused schema param
  compaction_manager: perform_cleanup: disallow empty sorted_owened_ranges
  distributed_loader: reshard: consider sstables for cleanup
  distributed_loader: process_upload_dir: pass owned_ranges_ptr to reshard
  distributed_loader: reshard: add optional owned_ranges_ptr param
  distributed_loader: reshard: get a ref to table_state
  distributed_loader: reshard: capture creator by ref
  distributed_loader: reshard: reserve num_jobs buckets
  compaction: move owned ranges filtering to base class
  compaction: move owned_ranges into descriptor

(cherry picked from commit f1bbf70)
bhalevy added a commit to bhalevy/scylla that referenced this issue Jun 11, 2023
…eaned up

cleanup_compaction should resolve only after all
sstables that require cleanup are cleaned up.

Since it is possible that some of them are in staging
and therefore cannot be cleaned up, retry once a second
until they become eligible.

Timeout if there is no progress within 5 minutes
to prevent hanging due to view building bug.

\Fixes scylladb#9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit a5a8020)
bhalevy pushed a commit to bhalevy/scylla that referenced this issue Jun 11, 2023
…m Benny Halevy

cleanup_compaction should resolve only after all
sstables that require cleanup are cleaned up.

Since it is possible that some of them are in staging
and therefore cannot be cleaned up, retry once a second
until they become eligible.

Timeout if there is no progress within 5 minutes
to prevent hanging due to view building bug.

\Fixes scylladb#9559

\Closes scylladb#13812

* github.com:scylladb/scylladb:
  table: signal compaction_manager when staging sstables become eligible for cleanup
  compaction_manager: perform_cleanup: wait until all candidates are cleaned up
  compaction_manager: perform_cleanup: perform_offstrategy if needed
  compaction_manager: perform_cleanup: update_sstables_cleanup_state in advance
  sstable_set: add for_each_sstable_gently* helpers

(cherry picked from commit 3b424e3)
bhalevy added a commit to bhalevy/scylla that referenced this issue Jun 11, 2023
If any of the sstables to-be-compacted requires cleanup,
retrive the owned_ranges_ptr from the table_state.

With that, staging sstables will eventually be cleaned up
via regular compaction.

\Refs scylladb#9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 4db961e)
bhalevy pushed a commit to bhalevy/scylla that referenced this issue Jun 11, 2023
…m Benny Halevy

This series extends sstable cleanup to resharding and other (offstrategy, major, and regular) compaction types so to:
* cleanup uploaded sstables (scylladb#11933)
* cleanup staging sstables after they are moved back to the main directory and become eligible for compaction (scylladb#9559)

When perform_cleanup is called, all sstables are scanned, and those that require cleanup are marked as such, and are added for tracking to table_state::cleanup_sstable_set.  They are removed from that set once released by compaction.
Along with that sstables set, we keep the owned_ranges_ptr used by cleanup in the table_state to allow other compaction types (offstrategy, major, or regular) to cleanup those sstables that are marked as require_cleanup and that were skipped by cleanup compaction for either being in the maintenance set (requiring offstrategy compaction) or in staging.

Resharding is using a more straightforward mechanism of passing the owned token ranges when resharding uploaded sstables and using it to detect sstable that require cleanup, now done as piggybacked on resharding compaction.

\Closes scylladb#12422

* github.com:scylladb/scylladb:
  table: discard_sstables: update_sstable_cleanup_state when deleting sstables
  compaction_manager: compact_sstables: retrieve owned ranges if required
  sstables: add a printer for shared_sstable
  compaction_manager: keep owned_ranges_ptr in compaction_state
  compaction_manager: perform_cleanup: keep sstables in compaction_state::sstables_requiring_cleanup
  compaction: refactor compaction_state out of compaction_manager
  compaction: refactor compaction_fwd.hh out of compaction_descriptor.hh
  compaction_manager: compacting_sstable_registration: keep a ref to the compaction_state
  compaction_manager: refactor get_candidates
  compaction_manager: get_candidates: mark as const
  table, compaction_manager: add requires_cleanup
  sstable_set: add for_each_sstable_until
  distributed_loader: reshard: update sstable cleanup state
  table, compaction_manager: add update_sstable_cleanup_state
  compaction_manager: needs_cleanup: delete unused schema param
  compaction_manager: perform_cleanup: disallow empty sorted_owened_ranges
  distributed_loader: reshard: consider sstables for cleanup
  distributed_loader: process_upload_dir: pass owned_ranges_ptr to reshard
  distributed_loader: reshard: add optional owned_ranges_ptr param
  distributed_loader: reshard: get a ref to table_state
  distributed_loader: reshard: capture creator by ref
  distributed_loader: reshard: reserve num_jobs buckets
  compaction: move owned ranges filtering to base class
  compaction: move owned_ranges into descriptor

(cherry picked from commit f1bbf70)
bhalevy added a commit to bhalevy/scylla that referenced this issue Jun 11, 2023
…eaned up

cleanup_compaction should resolve only after all
sstables that require cleanup are cleaned up.

Since it is possible that some of them are in staging
and therefore cannot be cleaned up, retry once a second
until they become eligible.

Timeout if there is no progress within 5 minutes
to prevent hanging due to view building bug.

\Fixes scylladb#9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit a5a8020)
bhalevy pushed a commit to bhalevy/scylla that referenced this issue Jun 11, 2023
…m Benny Halevy

cleanup_compaction should resolve only after all
sstables that require cleanup are cleaned up.

Since it is possible that some of them are in staging
and therefore cannot be cleaned up, retry once a second
until they become eligible.

Timeout if there is no progress within 5 minutes
to prevent hanging due to view building bug.

\Fixes scylladb#9559

\Closes scylladb#13812

* github.com:scylladb/scylladb:
  table: signal compaction_manager when staging sstables become eligible for cleanup
  compaction_manager: perform_cleanup: wait until all candidates are cleaned up
  compaction_manager: perform_cleanup: perform_offstrategy if needed
  compaction_manager: perform_cleanup: update_sstables_cleanup_state in advance
  sstable_set: add for_each_sstable_gently* helpers

(cherry picked from commit 3b424e3)
bhalevy added a commit to bhalevy/scylla that referenced this issue Jun 14, 2023
If any of the sstables to-be-compacted requires cleanup,
retrive the owned_ranges_ptr from the table_state.

With that, staging sstables will eventually be cleaned up
via regular compaction.

\Refs scylladb#9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 4db961e)
bhalevy pushed a commit to bhalevy/scylla that referenced this issue Jun 14, 2023
…m Benny Halevy

This series extends sstable cleanup to resharding and other (offstrategy, major, and regular) compaction types so to:
* cleanup uploaded sstables (scylladb#11933)
* cleanup staging sstables after they are moved back to the main directory and become eligible for compaction (scylladb#9559)

When perform_cleanup is called, all sstables are scanned, and those that require cleanup are marked as such, and are added for tracking to table_state::cleanup_sstable_set.  They are removed from that set once released by compaction.
Along with that sstables set, we keep the owned_ranges_ptr used by cleanup in the table_state to allow other compaction types (offstrategy, major, or regular) to cleanup those sstables that are marked as require_cleanup and that were skipped by cleanup compaction for either being in the maintenance set (requiring offstrategy compaction) or in staging.

Resharding is using a more straightforward mechanism of passing the owned token ranges when resharding uploaded sstables and using it to detect sstable that require cleanup, now done as piggybacked on resharding compaction.

\Closes scylladb#12422

* github.com:scylladb/scylladb:
  table: discard_sstables: update_sstable_cleanup_state when deleting sstables
  compaction_manager: compact_sstables: retrieve owned ranges if required
  sstables: add a printer for shared_sstable
  compaction_manager: keep owned_ranges_ptr in compaction_state
  compaction_manager: perform_cleanup: keep sstables in compaction_state::sstables_requiring_cleanup
  compaction: refactor compaction_state out of compaction_manager
  compaction: refactor compaction_fwd.hh out of compaction_descriptor.hh
  compaction_manager: compacting_sstable_registration: keep a ref to the compaction_state
  compaction_manager: refactor get_candidates
  compaction_manager: get_candidates: mark as const
  table, compaction_manager: add requires_cleanup
  sstable_set: add for_each_sstable_until
  distributed_loader: reshard: update sstable cleanup state
  table, compaction_manager: add update_sstable_cleanup_state
  compaction_manager: needs_cleanup: delete unused schema param
  compaction_manager: perform_cleanup: disallow empty sorted_owened_ranges
  distributed_loader: reshard: consider sstables for cleanup
  distributed_loader: process_upload_dir: pass owned_ranges_ptr to reshard
  distributed_loader: reshard: add optional owned_ranges_ptr param
  distributed_loader: reshard: get a ref to table_state
  distributed_loader: reshard: capture creator by ref
  distributed_loader: reshard: reserve num_jobs buckets
  compaction: move owned ranges filtering to base class
  compaction: move owned_ranges into descriptor

(cherry picked from commit f1bbf70)
bhalevy added a commit to bhalevy/scylla that referenced this issue Jun 14, 2023
…eaned up

cleanup_compaction should resolve only after all
sstables that require cleanup are cleaned up.

Since it is possible that some of them are in staging
and therefore cannot be cleaned up, retry once a second
until they become eligible.

Timeout if there is no progress within 5 minutes
to prevent hanging due to view building bug.

\Fixes scylladb#9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit a5a8020)
bhalevy pushed a commit to bhalevy/scylla that referenced this issue Jun 14, 2023
…m Benny Halevy

cleanup_compaction should resolve only after all
sstables that require cleanup are cleaned up.

Since it is possible that some of them are in staging
and therefore cannot be cleaned up, retry once a second
until they become eligible.

Timeout if there is no progress within 5 minutes
to prevent hanging due to view building bug.

\Fixes scylladb#9559

\Closes scylladb#13812

* github.com:scylladb/scylladb:
  table: signal compaction_manager when staging sstables become eligible for cleanup
  compaction_manager: perform_cleanup: wait until all candidates are cleaned up
  compaction_manager: perform_cleanup: perform_offstrategy if needed
  compaction_manager: perform_cleanup: update_sstables_cleanup_state in advance
  sstable_set: add for_each_sstable_gently* helpers

(cherry picked from commit 3b424e3)
denesb added a commit that referenced this issue Aug 9, 2023
…om Benny Halevy

This mini-series introduces dht::tokens_filter and uses it for consuming staging sstable in the view_update_generator.

The tokens_filter uses the token ranges owned by the current node, as retrieved by get_keyspace_local_ranges.

Refs #9559

Closes #11932

* github.com:scylladb/scylladb:
  db: view_update_generator: always clean up staging sstables
  compaction: extract incremental_owned_ranges_checker out to dht

(cherry picked from commit 3aff59f)
bhalevy added a commit to bhalevy/scylla that referenced this issue Aug 31, 2023
If any of the sstables to-be-compacted requires cleanup,
retrive the owned_ranges_ptr from the table_state.

With that, staging sstables will eventually be cleaned up
via regular compaction.

\Refs scylladb#9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 4db961e)
bhalevy pushed a commit to bhalevy/scylla that referenced this issue Aug 31, 2023
…m Benny Halevy

This series extends sstable cleanup to resharding and other (offstrategy, major, and regular) compaction types so to:
* cleanup uploaded sstables (scylladb#11933)
* cleanup staging sstables after they are moved back to the main directory and become eligible for compaction (scylladb#9559)

When perform_cleanup is called, all sstables are scanned, and those that require cleanup are marked as such, and are added for tracking to table_state::cleanup_sstable_set.  They are removed from that set once released by compaction.
Along with that sstables set, we keep the owned_ranges_ptr used by cleanup in the table_state to allow other compaction types (offstrategy, major, or regular) to cleanup those sstables that are marked as require_cleanup and that were skipped by cleanup compaction for either being in the maintenance set (requiring offstrategy compaction) or in staging.

Resharding is using a more straightforward mechanism of passing the owned token ranges when resharding uploaded sstables and using it to detect sstable that require cleanup, now done as piggybacked on resharding compaction.

\Closes scylladb#12422

* github.com:scylladb/scylladb:
  table: discard_sstables: update_sstable_cleanup_state when deleting sstables
  compaction_manager: compact_sstables: retrieve owned ranges if required
  sstables: add a printer for shared_sstable
  compaction_manager: keep owned_ranges_ptr in compaction_state
  compaction_manager: perform_cleanup: keep sstables in compaction_state::sstables_requiring_cleanup
  compaction: refactor compaction_state out of compaction_manager
  compaction: refactor compaction_fwd.hh out of compaction_descriptor.hh
  compaction_manager: compacting_sstable_registration: keep a ref to the compaction_state
  compaction_manager: refactor get_candidates
  compaction_manager: get_candidates: mark as const
  table, compaction_manager: add requires_cleanup
  sstable_set: add for_each_sstable_until
  distributed_loader: reshard: update sstable cleanup state
  table, compaction_manager: add update_sstable_cleanup_state
  compaction_manager: needs_cleanup: delete unused schema param
  compaction_manager: perform_cleanup: disallow empty sorted_owened_ranges
  distributed_loader: reshard: consider sstables for cleanup
  distributed_loader: process_upload_dir: pass owned_ranges_ptr to reshard
  distributed_loader: reshard: add optional owned_ranges_ptr param
  distributed_loader: reshard: get a ref to table_state
  distributed_loader: reshard: capture creator by ref
  distributed_loader: reshard: reserve num_jobs buckets
  compaction: move owned ranges filtering to base class
  compaction: move owned_ranges into descriptor

(cherry picked from commit f1bbf70)
bhalevy added a commit to bhalevy/scylla that referenced this issue Aug 31, 2023
…eaned up

cleanup_compaction should resolve only after all
sstables that require cleanup are cleaned up.

Since it is possible that some of them are in staging
and therefore cannot be cleaned up, retry once a second
until they become eligible.

Timeout if there is no progress within 5 minutes
to prevent hanging due to view building bug.

\Fixes scylladb#9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit a5a8020)
bhalevy pushed a commit to bhalevy/scylla that referenced this issue Aug 31, 2023
…m Benny Halevy

cleanup_compaction should resolve only after all
sstables that require cleanup are cleaned up.

Since it is possible that some of them are in staging
and therefore cannot be cleaned up, retry once a second
until they become eligible.

Timeout if there is no progress within 5 minutes
to prevent hanging due to view building bug.

\Fixes scylladb#9559

\Closes scylladb#13812

* github.com:scylladb/scylladb:
  table: signal compaction_manager when staging sstables become eligible for cleanup
  compaction_manager: perform_cleanup: wait until all candidates are cleaned up
  compaction_manager: perform_cleanup: perform_offstrategy if needed
  compaction_manager: perform_cleanup: update_sstables_cleanup_state in advance
  sstable_set: add for_each_sstable_gently* helpers

(cherry picked from commit 3b424e3)
bhalevy added a commit to bhalevy/scylla that referenced this issue Sep 19, 2023
If any of the sstables to-be-compacted requires cleanup,
retrive the owned_ranges_ptr from the table_state.

With that, staging sstables will eventually be cleaned up
via regular compaction.

\Refs scylladb#9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 4db961e)
bhalevy pushed a commit to bhalevy/scylla that referenced this issue Sep 19, 2023
…m Benny Halevy

This series extends sstable cleanup to resharding and other (offstrategy, major, and regular) compaction types so to:
* cleanup uploaded sstables (scylladb#11933)
* cleanup staging sstables after they are moved back to the main directory and become eligible for compaction (scylladb#9559)

When perform_cleanup is called, all sstables are scanned, and those that require cleanup are marked as such, and are added for tracking to table_state::cleanup_sstable_set.  They are removed from that set once released by compaction.
Along with that sstables set, we keep the owned_ranges_ptr used by cleanup in the table_state to allow other compaction types (offstrategy, major, or regular) to cleanup those sstables that are marked as require_cleanup and that were skipped by cleanup compaction for either being in the maintenance set (requiring offstrategy compaction) or in staging.

Resharding is using a more straightforward mechanism of passing the owned token ranges when resharding uploaded sstables and using it to detect sstable that require cleanup, now done as piggybacked on resharding compaction.

\Closes scylladb#12422

* github.com:scylladb/scylladb:
  table: discard_sstables: update_sstable_cleanup_state when deleting sstables
  compaction_manager: compact_sstables: retrieve owned ranges if required
  sstables: add a printer for shared_sstable
  compaction_manager: keep owned_ranges_ptr in compaction_state
  compaction_manager: perform_cleanup: keep sstables in compaction_state::sstables_requiring_cleanup
  compaction: refactor compaction_state out of compaction_manager
  compaction: refactor compaction_fwd.hh out of compaction_descriptor.hh
  compaction_manager: compacting_sstable_registration: keep a ref to the compaction_state
  compaction_manager: refactor get_candidates
  compaction_manager: get_candidates: mark as const
  table, compaction_manager: add requires_cleanup
  sstable_set: add for_each_sstable_until
  distributed_loader: reshard: update sstable cleanup state
  table, compaction_manager: add update_sstable_cleanup_state
  compaction_manager: needs_cleanup: delete unused schema param
  compaction_manager: perform_cleanup: disallow empty sorted_owened_ranges
  distributed_loader: reshard: consider sstables for cleanup
  distributed_loader: process_upload_dir: pass owned_ranges_ptr to reshard
  distributed_loader: reshard: add optional owned_ranges_ptr param
  distributed_loader: reshard: get a ref to table_state
  distributed_loader: reshard: capture creator by ref
  distributed_loader: reshard: reserve num_jobs buckets
  compaction: move owned ranges filtering to base class
  compaction: move owned_ranges into descriptor

(cherry picked from commit f1bbf70)
bhalevy added a commit to bhalevy/scylla that referenced this issue Sep 19, 2023
…eaned up

cleanup_compaction should resolve only after all
sstables that require cleanup are cleaned up.

Since it is possible that some of them are in staging
and therefore cannot be cleaned up, retry once a second
until they become eligible.

Timeout if there is no progress within 5 minutes
to prevent hanging due to view building bug.

\Fixes scylladb#9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit a5a8020)
bhalevy pushed a commit to bhalevy/scylla that referenced this issue Sep 19, 2023
…m Benny Halevy

cleanup_compaction should resolve only after all
sstables that require cleanup are cleaned up.

Since it is possible that some of them are in staging
and therefore cannot be cleaned up, retry once a second
until they become eligible.

Timeout if there is no progress within 5 minutes
to prevent hanging due to view building bug.

\Fixes scylladb#9559

\Closes scylladb#13812

* github.com:scylladb/scylladb:
  table: signal compaction_manager when staging sstables become eligible for cleanup
  compaction_manager: perform_cleanup: wait until all candidates are cleaned up
  compaction_manager: perform_cleanup: perform_offstrategy if needed
  compaction_manager: perform_cleanup: update_sstables_cleanup_state in advance
  sstable_set: add for_each_sstable_gently* helpers

(cherry picked from commit 3b424e3)
@mykaul mykaul added backport/5.2 Issues that should be backported to 5.2 branch once they'll be fixed Requires-Backport-to-5.1 labels Nov 1, 2023
bhalevy added a commit to bhalevy/scylla that referenced this issue Nov 13, 2023
If any of the sstables to-be-compacted requires cleanup,
retrive the owned_ranges_ptr from the table_state.

With that, staging sstables will eventually be cleaned up
via regular compaction.

\Refs scylladb#9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 4db961e)
bhalevy pushed a commit to bhalevy/scylla that referenced this issue Nov 13, 2023
…m Benny Halevy

This series extends sstable cleanup to resharding and other (offstrategy, major, and regular) compaction types so to:
* cleanup uploaded sstables (scylladb#11933)
* cleanup staging sstables after they are moved back to the main directory and become eligible for compaction (scylladb#9559)

When perform_cleanup is called, all sstables are scanned, and those that require cleanup are marked as such, and are added for tracking to table_state::cleanup_sstable_set.  They are removed from that set once released by compaction.
Along with that sstables set, we keep the owned_ranges_ptr used by cleanup in the table_state to allow other compaction types (offstrategy, major, or regular) to cleanup those sstables that are marked as require_cleanup and that were skipped by cleanup compaction for either being in the maintenance set (requiring offstrategy compaction) or in staging.

Resharding is using a more straightforward mechanism of passing the owned token ranges when resharding uploaded sstables and using it to detect sstable that require cleanup, now done as piggybacked on resharding compaction.

\Closes scylladb#12422

* github.com:scylladb/scylladb:
  table: discard_sstables: update_sstable_cleanup_state when deleting sstables
  compaction_manager: compact_sstables: retrieve owned ranges if required
  sstables: add a printer for shared_sstable
  compaction_manager: keep owned_ranges_ptr in compaction_state
  compaction_manager: perform_cleanup: keep sstables in compaction_state::sstables_requiring_cleanup
  compaction: refactor compaction_state out of compaction_manager
  compaction: refactor compaction_fwd.hh out of compaction_descriptor.hh
  compaction_manager: compacting_sstable_registration: keep a ref to the compaction_state
  compaction_manager: refactor get_candidates
  compaction_manager: get_candidates: mark as const
  table, compaction_manager: add requires_cleanup
  sstable_set: add for_each_sstable_until
  distributed_loader: reshard: update sstable cleanup state
  table, compaction_manager: add update_sstable_cleanup_state
  compaction_manager: needs_cleanup: delete unused schema param
  compaction_manager: perform_cleanup: disallow empty sorted_owened_ranges
  distributed_loader: reshard: consider sstables for cleanup
  distributed_loader: process_upload_dir: pass owned_ranges_ptr to reshard
  distributed_loader: reshard: add optional owned_ranges_ptr param
  distributed_loader: reshard: get a ref to table_state
  distributed_loader: reshard: capture creator by ref
  distributed_loader: reshard: reserve num_jobs buckets
  compaction: move owned ranges filtering to base class
  compaction: move owned_ranges into descriptor

(cherry picked from commit f1bbf70)
bhalevy added a commit to bhalevy/scylla that referenced this issue Nov 13, 2023
…eaned up

cleanup_compaction should resolve only after all
sstables that require cleanup are cleaned up.

Since it is possible that some of them are in staging
and therefore cannot be cleaned up, retry once a second
until they become eligible.

Timeout if there is no progress within 5 minutes
to prevent hanging due to view building bug.

\Fixes scylladb#9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit a5a8020)
bhalevy pushed a commit to bhalevy/scylla that referenced this issue Nov 13, 2023
…m Benny Halevy

cleanup_compaction should resolve only after all
sstables that require cleanup are cleaned up.

Since it is possible that some of them are in staging
and therefore cannot be cleaned up, retry once a second
until they become eligible.

Timeout if there is no progress within 5 minutes
to prevent hanging due to view building bug.

\Fixes scylladb#9559

\Closes scylladb#13812

* github.com:scylladb/scylladb:
  table: signal compaction_manager when staging sstables become eligible for cleanup
  compaction_manager: perform_cleanup: wait until all candidates are cleaned up
  compaction_manager: perform_cleanup: perform_offstrategy if needed
  compaction_manager: perform_cleanup: update_sstables_cleanup_state in advance
  sstable_set: add for_each_sstable_gently* helpers

(cherry picked from commit 3b424e3)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/compaction Backport candidate backport/5.2 Issues that should be backported to 5.2 branch once they'll be fixed symptom/data resurrection type/bug
Projects
None yet
7 participants