repair: Add ranges_parallelism option #4847

Closed
asias opened this issue Aug 14, 2019 · 3 comments · Fixed by #14886

Comments

@asias
Contributor

asias commented Aug 14, 2019

In commit 131acc0 (repair: Adjust
parallelism according to memory size), we choose the number of ranges to
repair in parallel automatically, according to the memory size.

However, this automatically chosen number might not be the optimal one.
For example, a user may want repair to have minimal impact on user CQL
reads and writes, or may want to repair at full speed, allowing it more
CPU and memory resources.

We can add an option that controls the number of ranges that we repair
in parallel, so that advanced users can control the parallelism.

asias added a commit to asias/scylla that referenced this issue Aug 14, 2019
In commit 131acc0 (repair: Adjust
parallelism according to memory size), we choose the number of ranges to
repair in parallel automatically, according to the memory size.

However, this automatically chosen number might not be the optimal one.
For example, a user may want repair to have minimal impact on user CQL
reads and writes, or may want to repair at full speed, allowing it more
CPU and memory resources.

This patch introduces a ranges_parallelism option that controls the number
of ranges that we repair in parallel, so that advanced users can control
the parallelism.

Here is an example:

In a cluster with high network latency, e.g., multiple DCs connected by a
high-latency link, a user can increase ranges_parallelism to compensate.

For example, in a two-DC cluster with 7 nodes, a 60 ms round-trip time, RF
(dc1:3, dc2:4), and a keyspace with 3 empty tables:

2019-04-10 11:00:03.303549 Repair with ks ranges_parallelism=1 on node 1 started ...
2019-04-10 11:32:32.673043 Repair with ks on node 1 finished ...

2019-04-10 11:32:32.673292 Repair with ks ranges_parallelism=16 on node 1 started ...
2019-04-10 11:36:49.475977 Repair with ks on node 1 finished ...

2019-04-10 11:36:49.476145 Repair with ks ranges_parallelism=256 on node 1 started ...
2019-04-10 11:38:04.242553 Repair with ks on node 1 finished ...

That is 1949s vs 257s vs 75s to complete repair respectively, i.e.
ranges_parallelism=16 and ranges_parallelism=256 are roughly 7.6X and 26X
faster than ranges_parallelism=1.

Fixes scylladb#4847
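
The durations and ratios quoted in the commit message above can be re-derived from the logged timestamps. The following is a small Python sketch that does only that arithmetic; the timestamps are copied from the log lines, while the variable and function names are illustrative and not part of Scylla.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# (ranges_parallelism, start, finish), copied from the log lines in the commit message
runs = [
    (1, "2019-04-10 11:00:03.303549", "2019-04-10 11:32:32.673043"),
    (16, "2019-04-10 11:32:32.673292", "2019-04-10 11:36:49.475977"),
    (256, "2019-04-10 11:36:49.476145", "2019-04-10 11:38:04.242553"),
]


def duration_seconds(start: str, finish: str) -> float:
    """Elapsed wall-clock time between two log timestamps, in seconds."""
    return (datetime.strptime(finish, FMT) - datetime.strptime(start, FMT)).total_seconds()


# Map each ranges_parallelism value to its repair duration.
durations = {p: duration_seconds(start, finish) for p, start, finish in runs}

# Speedup of each run relative to the ranges_parallelism=1 run.
baseline = durations[1]
speedups = {p: baseline / d for p, d in durations.items()}

for p, d in durations.items():
    print(f"ranges_parallelism={p}: {d:.0f}s, {speedups[p]:.1f}x vs ranges_parallelism=1")
```

Rounded to whole seconds, the three durations match the 1949s, 257s and 75s stated in the message. Note that this is a single measurement on a keyspace with empty tables, so the ratios say more about latency-bound overhead than about repair throughput with real data.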
@slivne
Contributor

slivne commented Aug 14, 2019

@asias I know high-latency links were in scope when row-level repair was merged. AFAIK we chose not to provide user control over this for row-level repair and settled on an approach that makes sure the links are utilized if possible.

How does this change align with those decisions?

@asias
Contributor Author

asias commented Aug 14, 2019

In addition to choosing the parallelism automatically, we agreed that user control over the parallelism is useful and needed.

@slivne slivne added this to the 3.2 milestone Aug 19, 2019
@slivne slivne assigned asias and unassigned avikivity Aug 19, 2019
@asias
Contributor Author

asias commented Oct 8, 2019

Ping. The PR was sent almost 2 months ago.

asias added a commit to asias/scylla that referenced this issue Oct 9, 2019
@slivne slivne modified the milestones: 3.2, 3.4 Feb 9, 2020
@slivne slivne modified the milestones: 4.0, 4.1 Mar 24, 2020
@slivne slivne modified the milestones: 4.1, 4.2 Jun 1, 2020
@slivne slivne modified the milestones: 4.2, 4.3 Nov 26, 2020
@slivne slivne modified the milestones: 4.3, 4.5 Jan 19, 2021
@slivne slivne modified the milestones: 4.5, 4.6 Mar 29, 2021
@slivne slivne modified the milestones: 4.6, 4.7 Nov 10, 2021
@slivne slivne modified the milestones: 5.0, 5.1 Apr 13, 2022
@DoronArazii DoronArazii removed this from the 5.1 milestone Oct 12, 2022
@DoronArazii DoronArazii added this to the 5.2 milestone Oct 12, 2022
@DoronArazii DoronArazii modified the milestones: 5.2, 5.x Nov 22, 2022
asias added a commit to asias/scylla that referenced this issue Jul 31, 2023
This patch adds the ranges_parallelism option to the repair RESTful API.

Users can use this option to set the number of ranges repaired in
parallel per repair job to a value smaller than the default
max_repair_ranges_in_parallel calculated by Scylla core.

Scylla Manager can also use this option to provide more than N ranges in
a single repair job while repairing only N of them in parallel
(ranges_parallelism=N), instead of providing N ranges per repair job.

To make it safer, unlike PR scylladb#4848, this patch does not allow the
user to exceed max_repair_ranges_in_parallel.

Fixes scylladb#4847
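
The behaviour described in this commit message (a job may carry many ranges, but at most N are repaired at once, and N never exceeds the core-calculated maximum) can be illustrated with a small Python sketch. This is not Scylla's implementation, which is C++. All names here (`effective_parallelism`, `repair_ranges`, `fake_repair`) are hypothetical, and the sketch assumes an over-large request is clamped to the maximum; the message does not say whether the real code clamps or rejects such a request.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, Optional


def effective_parallelism(requested: Optional[int], max_in_parallel: int) -> int:
    """Pick how many ranges to repair at once.

    `requested` is None when the caller did not pass the option.
    Assumption: a request above the maximum is clamped, not rejected.
    """
    if requested is None:
        return max_in_parallel
    if requested < 1:
        raise ValueError("ranges_parallelism must be >= 1")
    return min(requested, max_in_parallel)


async def repair_ranges(
    ranges: Iterable[object],
    requested: Optional[int],
    max_in_parallel: int,
    repair_one: Callable[[object], Awaitable[None]],
) -> int:
    """Repair every range, never running more than the effective limit at once.

    Returns the peak number of ranges observed in flight, for inspection.
    """
    sem = asyncio.Semaphore(effective_parallelism(requested, max_in_parallel))
    in_flight = 0
    peak = 0

    async def worker(r: object) -> None:
        nonlocal in_flight, peak
        async with sem:  # waits here while the limit is reached
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                await repair_one(r)
            finally:
                in_flight -= 1

    await asyncio.gather(*(worker(r) for r in ranges))
    return peak


async def fake_repair(r: object) -> None:
    """Stand-in for repairing one range; only yields to the event loop."""
    await asyncio.sleep(0.001)


if __name__ == "__main__":
    # 20 ranges in one job, but only 4 repaired concurrently.
    print(asyncio.run(repair_ranges(range(20), 4, 16, fake_repair)))
```

The semaphore is what lets a caller such as Scylla Manager hand over more ranges than it wants repaired concurrently: the job size and the concurrency limit become independent knobs.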
denesb added a commit that referenced this issue Aug 3, 2023
This patch adds the ranges_parallelism option to the repair RESTful API.

Users can use this option to set the number of ranges repaired in parallel per repair job to a value smaller than the default max_repair_ranges_in_parallel calculated by Scylla core.

Scylla Manager can also use this option to provide more than N ranges in a single repair job while repairing only N of them in parallel (ranges_parallelism=N), instead of providing N ranges per repair job.

To make it safer, unlike PR #4848, this patch does not allow the user to exceed max_repair_ranges_in_parallel.

Fixes #4847

Closes #14886

* github.com:scylladb/scylladb:
  repair: Add ranges_parallelism option
  repair: Change to use coroutine in do_repair_ranges
@DoronArazii DoronArazii modified the milestones: Backlog, 5.4 Aug 6, 2023