Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incremental compaction is not working in cleanup #14035

Closed
raphaelsc opened this issue May 25, 2023 · 3 comments
Closed

Incremental compaction is not working in cleanup #14035

raphaelsc opened this issue May 25, 2023 · 3 comments
Assignees
Labels
Milestone

Comments

@raphaelsc
Copy link
Member

Since c7826aa, cleanup compacts entire runs, not only the individual fragments. Turns out cleanup is incorrectly holding reference to input runs, meaning incremental compaction cannot release space earlier. So the promise of low temporary space requirement is not hold when running cleanup with ICS.

@raphaelsc raphaelsc self-assigned this May 25, 2023
raphaelsc added a commit to raphaelsc/scylla that referenced this issue May 25, 2023
After c7826aa, sstable runs are cleaned up together.

The procedure which executes cleanup was holding reference to all
input sstables, such that it could later retry the same cleanup
job on failure.

Turns out it was not taking into account that incremental compaction
will exhaust the input set incrementally.

Therefore cleanup is affected by the 100% space overhead.

To fix it, cleanup will now have the input set updated, by removing
the sstables that were already cleaned up. On failure, cleanup
will retry the same job with the remaining sstables that weren't
exhausted by incremental compaction.

New unit test reproduces the failure, and passes with the fix.

Fixes scylladb#14035.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
@DoronArazii DoronArazii added this to the 5.4 milestone May 30, 2023
raphaelsc added a commit to raphaelsc/scylla that referenced this issue May 30, 2023
After c7826aa, sstable runs are cleaned up together.

The procedure which executes cleanup was holding reference to all
input sstables, such that it could later retry the same cleanup
job on failure.

Turns out it was not taking into account that incremental compaction
will exhaust the input set incrementally.

Therefore cleanup is affected by the 100% space overhead.

To fix it, cleanup will now have the input set updated, by removing
the sstables that were already cleaned up. On failure, cleanup
will retry the same job with the remaining sstables that weren't
exhausted by incremental compaction.

New unit test reproduces the failure, and passes with the fix.

Fixes scylladb#14035.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
@DoronArazii
Copy link

@scylladb/scylla-maint please consider backport to 2022.2 + 2023.1
@raphaelsc do we need it any further than that?

denesb pushed a commit that referenced this issue Jun 6, 2023
After c7826aa, sstable runs are cleaned up together.

The procedure which executes cleanup was holding reference to all
input sstables, such that it could later retry the same cleanup
job on failure.

Turns out it was not taking into account that incremental compaction
will exhaust the input set incrementally.

Therefore cleanup is affected by the 100% space overhead.

To fix it, cleanup will now have the input set updated, by removing
the sstables that were already cleaned up. On failure, cleanup
will retry the same job with the remaining sstables that weren't
exhausted by incremental compaction.

New unit test reproduces the failure, and passes with the fix.

Fixes #14035.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #14038

(cherry picked from commit 23443e0)
@denesb
Copy link
Contributor

denesb commented Jun 6, 2023

Backported to 5.3. Doesn't apply cleanly to 5.2. @raphaelsc please open backport PRs for 5.2 and 5.1.

@mykaul mykaul added the backport/5.2 Issues that should be backported to 5.2 branch once they'll be fixed label Jun 7, 2023
raphaelsc added a commit to raphaelsc/scylla that referenced this issue Jun 9, 2023
After c7826aa, sstable runs are cleaned up together.

The procedure which executes cleanup was holding reference to all
input sstables, such that it could later retry the same cleanup
job on failure.

Turns out it was not taking into account that incremental compaction
will exhaust the input set incrementally.

Therefore cleanup is affected by the 100% space overhead.

To fix it, cleanup will now have the input set updated, by removing
the sstables that were already cleaned up. On failure, cleanup
will retry the same job with the remaining sstables that weren't
exhausted by incremental compaction.

New unit test reproduces the failure, and passes with the fix.

Fixes scylladb#14035.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb#14038

(cherry picked from commit 23443e0)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
raphaelsc added a commit to raphaelsc/scylla that referenced this issue Jun 9, 2023
After c7826aa, sstable runs are cleaned up together.

The procedure which executes cleanup was holding reference to all
input sstables, such that it could later retry the same cleanup
job on failure.

Turns out it was not taking into account that incremental compaction
will exhaust the input set incrementally.

Therefore cleanup is affected by the 100% space overhead.

To fix it, cleanup will now have the input set updated, by removing
the sstables that were already cleaned up. On failure, cleanup
will retry the same job with the remaining sstables that weren't
exhausted by incremental compaction.

New unit test reproduces the failure, and passes with the fix.

Fixes scylladb#14035.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb#14038

(cherry picked from commit 23443e0)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
raphaelsc added a commit to raphaelsc/scylla that referenced this issue Jun 9, 2023
After c7826aa, sstable runs are cleaned up together.

The procedure which executes cleanup was holding reference to all
input sstables, such that it could later retry the same cleanup
job on failure.

Turns out it was not taking into account that incremental compaction
will exhaust the input set incrementally.

Therefore cleanup is affected by the 100% space overhead.

To fix it, cleanup will now have the input set updated, by removing
the sstables that were already cleaned up. On failure, cleanup
will retry the same job with the remaining sstables that weren't
exhausted by incremental compaction.

New unit test reproduces the failure, and passes with the fix.

Fixes scylladb#14035.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb#14038

(cherry picked from commit 23443e0)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
denesb pushed a commit that referenced this issue Jun 13, 2023
After c7826aa, sstable runs are cleaned up together.

The procedure which executes cleanup was holding reference to all
input sstables, such that it could later retry the same cleanup
job on failure.

Turns out it was not taking into account that incremental compaction
will exhaust the input set incrementally.

Therefore cleanup is affected by the 100% space overhead.

To fix it, cleanup will now have the input set updated, by removing
the sstables that were already cleaned up. On failure, cleanup
will retry the same job with the remaining sstables that weren't
exhausted by incremental compaction.

New unit test reproduces the failure, and passes with the fix.

Fixes #14035.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #14038

(cherry picked from commit 23443e0)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #14193
denesb pushed a commit that referenced this issue Jun 13, 2023
After c7826aa, sstable runs are cleaned up together.

The procedure which executes cleanup was holding reference to all
input sstables, such that it could later retry the same cleanup
job on failure.

Turns out it was not taking into account that incremental compaction
will exhaust the input set incrementally.

Therefore cleanup is affected by the 100% space overhead.

To fix it, cleanup will now have the input set updated, by removing
the sstables that were already cleaned up. On failure, cleanup
will retry the same job with the remaining sstables that weren't
exhausted by incremental compaction.

New unit test reproduces the failure, and passes with the fix.

Fixes #14035.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #14038

(cherry picked from commit 23443e0)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #14195
@denesb
Copy link
Contributor

denesb commented Jun 13, 2023

5.2 and 5.1 backports are queued.

@denesb denesb removed Backport candidate backport/5.2 Issues that should be backported to 5.2 branch once they'll be fixed labels Jun 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants