Enhancement proposal for running cleanup after scaling #1300
Conversation
cc @bhalevy - can you review the suggestion?
Since this PR only contains the document for the proposal, not the enhancement itself, how about renaming the PR subject to be the same as the patch:
Also, what is this file for:
## Summary
The Operator supports both horizontal and vertical scaling, but the procedure isn’t in sync with ScyllaDB documentation, because the cleanup part is not implemented. It’s important because upon scaling, the stored on node disk might become
nit: stored data?
done
The Operator supports both horizontal and vertical scaling, but the procedure isn’t in sync with ScyllaDB documentation, because the cleanup part is not implemented. It’s important because upon scaling, the stored on node disk might become stale taking up unnecessary space. Operator should support running the cleanup to keep disk space as low as possible and ensure clusters are stable and reliable over time.
The most serious problem with not running cleanup is the possibility of data resurrection.
This happens when tombstones delete data on other nodes and the tombstone is eventually purged, leaving behind neither the data nor the tombstone. A later decommission or removenode may then move token ownership back to the original node, which might still hold the stale data that was never cleaned up. Since the tombstone that deleted it was purged, the data gets resurrected.
Added a mention of data resurrection to the Motivation paragraph.
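To make the resurrection scenario above concrete, here is a toy, single-process Go sketch; the `cell` type, the merge rule, and the timeline are invented for illustration and do not model ScyllaDB's storage engine.

```go
package main

import "fmt"

// cell is a stored value or a tombstone for a key, with a write timestamp.
type cell struct {
	value     string
	tombstone bool
	ts        int64
}

// merge keeps the newer cell, mimicking last-write-wins reconciliation.
func merge(a, b *cell) *cell {
	if a == nil {
		return b
	}
	if b == nil {
		return a
	}
	if b.ts > a.ts {
		return b
	}
	return a
}

func main() {
	// t=1: key "k" is written; node A owns its token and stores the data.
	nodeA := map[string]*cell{"k": {value: "v", ts: 1}}

	// t=2: ownership of the token moves to node B (e.g. a node was added).
	// Cleanup is NOT run on A, so A keeps its now-stale copy of "k".
	nodeB := map[string]*cell{"k": {value: "v", ts: 1}}

	// t=3: "k" is deleted; the tombstone lands on the current owner, B.
	nodeB["k"] = &cell{tombstone: true, ts: 3}

	// t=4: after gc_grace_seconds, compaction purges the tombstone on B,
	// leaving behind neither the data nor the tombstone.
	delete(nodeB, "k")

	// t=5: B is decommissioned/removed and ownership returns to A, which
	// still holds the stale copy; with no tombstone left to shadow it,
	// the deleted value wins the merge and is resurrected.
	fmt.Printf("read of k: %+v\n", merge(nodeA["k"], nodeB["k"]))
}
```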
the node disks. When nodes are added or removed from the cluster, they gain or lose some tokens, which can result in files stored on the node disks still containing data associated with lost tokens. Over time, this can lead to a build-up of unnecessary data and cause disk space issues. By running node cleanup after scaling, these files can be cleared, freeing up disk space.
See above. It's more than just cleaning up disk space.
Added a mention of data resurrection to the Motivation paragraph.
### Non-Goals
Running node cleanup during off-peak hours. |
We can add: Running node cleanup after vertical scaling.
(Since it is not needed in this case)
It's a drawback, not a non-goal. Mentioned it in the Drawbacks paragraph.
tokens for each node as an annotation in the member service. In addition, a new controller in Operator will be responsible for managing Jobs that will execute a cleanup on nodes that require it. The trigger for the Job creation will be a mismatch between the current and latest hash. The controller will ensure that there will be only one cleanup Job running at the same time to prevent extraneous load on the cluster
nit: s/Job/job/
Should be one cleanup job per node I guess.
But we can (and should) run cleanup on all nodes in parallel.
yep, I think we agreed on running in parallel. the proposal should reflect it.
fixed
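For illustration, here is a minimal Go sketch of the hash-mismatch trigger discussed in this thread. The hashing scheme, the helper names, and the annotation key are assumptions made for the example, not the Operator's actual implementation.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// tokenRingHash returns a stable digest of a node's owned tokens. Sorting
// first makes the hash independent of the order tokens are reported in.
func tokenRingHash(tokens []string) string {
	sorted := append([]string(nil), tokens...)
	sort.Strings(sorted)
	sum := sha256.Sum256([]byte(strings.Join(sorted, ",")))
	return hex.EncodeToString(sum[:])
}

// needsCleanup compares the current ring hash against the one recorded on
// the member Service's annotations (hypothetical key). A mismatch means the
// ring changed since the last cleanup, so a cleanup Job should be created.
func needsCleanup(annotations map[string]string, currentTokens []string) bool {
	const lastCleanedHashKey = "internal.scylla-operator.scylladb.com/last-cleaned-token-ring-hash" // assumed key
	return annotations[lastCleanedHashKey] != tokenRingHash(currentTokens)
}

func main() {
	svcAnnotations := map[string]string{
		"internal.scylla-operator.scylladb.com/last-cleaned-token-ring-hash": tokenRingHash([]string{"-9223372036854775808", "0"}),
	}
	fmt.Println(needsCleanup(svcAnnotations, []string{"-9223372036854775808", "0"}))       // false: ring unchanged
	fmt.Println(needsCleanup(svcAnnotations, []string{"-9223372036854775808", "0", "42"})) // true: ring changed after scaling
}
```

With parallel execution, the controller would evaluate this predicate per node and create one Job for every node whose hash mismatches, rather than serializing them.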
This design doesn’t take into account whether a node received a token or lost it, it only detects ring changes and reacts with a cleanup trigger upon change. When a node is decommissioned, tokens are redistributed and nodes getting them doesn’t require a cleanup since there’s no stale data on their disks associated with these new tokens. Operator
nit: s/doesn't/don't/
done
#### Cleanup is not run when necessary
When keyspace RF is decreased, nodes no longer need to keep extraneous copies of the data, cleanup could free the disks. Approach designed here doesn't detect this case because the token ring is not changed.
Similarly, cleanup is needed after changing the replication strategy, e.g. from Simple to NetworkTopology, as token ownership of secondary replicas will change.
By the way, speaking of automation in this respect: when increasing RF, repair needs to run in order to build the additional replicas. This is probably not automated either, IIUC.
Added a mention of it.
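A small sketch of why this blind spot follows directly from the design: the trigger hashes only the token set, and an RF or replication-strategy change leaves that input untouched. Function and variable names here are illustrative.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
	"strings"
)

// ringHash digests only the node's token set; keyspace settings such as RF
// are not part of the input, so changing them cannot change the hash.
func ringHash(tokens []string) [sha256.Size]byte {
	s := append([]string(nil), tokens...)
	sort.Strings(s)
	return sha256.Sum256([]byte(strings.Join(s, ",")))
}

func main() {
	tokens := []string{"-100", "0", "100"}
	hashBefore := ringHash(tokens)       // before ALTER KEYSPACE changes replication
	hashAfter := ringHash(tokens)        // after: the token set is identical
	fmt.Println(hashBefore == hashAfter) // true: no mismatch, so no cleanup Job is created
}
```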
git doesn't track folders, so this was used as a placeholder for that directory until we have the first proposal here, which should remove it.
Force-pushed c6bb3c0 to 9ff0699
Force-pushed 9f4da8f to 377fe05
Force-pushed 377fe05 to 4475e8b
Force-pushed 4475e8b to e169e66
I think that instead of adding a file with the proposal directly to the top directory, we should adopt a naming scheme similar to the original KEP repository, i.e. a subdirectory prefixed with a tracking issue number and a readme inside of it. I believe it provides better readability, as well as a reference to the origin of the proposal. See “What is the number at the beginning of the KEP name?”.
Force-pushed e169e66 to 090c893
I agree; in addition, having a separate directory allows for including external files like images, YAMLs, etc. Moved it.
/approve
/lgtm
thanks
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: tnozicka, zimnx. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
The Operator supports both horizontal and vertical scaling, but the procedure isn’t in sync with ScyllaDB documentation, because the cleanup part is not implemented. It’s important because upon scaling, the data stored on node disks might become stale, taking up unnecessary space. The Operator should support running cleanup to keep disk usage as low as possible and ensure clusters are stable and reliable over time.
Details can be found in the actual enhancement.