repair: Add ranges_parallelism option #4847

asias · 2019-08-14T09:31:13Z

In commit 131acc0 (repair: Adjust
parallelism according to memory size), we choose the number of ranges to
repair in parallel automatically, according to the memory size.

However, this automatic number might not be the optimal one. For example,
a user may want repair to have minimal impact on CQL reads and writes,
or may want to repair at full speed, allowing it more CPU and
memory resources.

We can add an option that controls the number of ranges that we repair
in parallel, so that advanced users can control the parallelism.


In commit 131acc0 (repair: Adjust parallelism according to memory size), we choose the number of ranges to repair in parallel automatically, according to the memory size.

However, this automatic number might not be the optimal one. For example, a user may want repair to have minimal impact on CQL reads and writes, or may want to repair at full speed, allowing it more CPU and memory resources.

This patch introduces a ranges_parallelism option that controls the number of ranges that we repair in parallel, so that advanced users can control the parallelism.

Here is an example: in a cluster with high network latency, e.g., multiple DCs connected by a high-latency link, the user can increase ranges_parallelism to compensate. For example, in a two-DC cluster with 7 nodes + 60ms round trip time + RF (dc1:3, dc2:4) + a keyspace with 3 empty tables:

2019-04-10 11:00:03.303549 Repair with ks ranges_parallelism=1 on node 1 started ...
2019-04-10 11:32:32.673043 Repair with ks on node 1 finished ...
2019-04-10 11:32:32.673292 Repair with ks ranges_parallelism=16 on node 1 started ...
2019-04-10 11:36:49.475977 Repair with ks on node 1 finished ...
2019-04-10 11:36:49.476145 Repair with ks ranges_parallelism=256 on node 1 started ...
2019-04-10 11:38:04.242553 Repair with ks on node 1 finished ...

That is 1949s vs 257s vs 75s to complete repair respectively, which gives roughly 7X and 25X speedups over ranges_parallelism=1.

Fixes scylladb#4847
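The durations quoted in the commit message can be re-derived from the six log timestamps. The following is a small sketch for checking that arithmetic; the names `RUNS` and `duration_seconds` are made up for this illustration and are not part of Scylla.

```python
from datetime import datetime

# Timestamp layout used by the log lines quoted in the commit message.
FMT = "%Y-%m-%d %H:%M:%S.%f"

# (ranges_parallelism, started, finished), copied from the quoted log lines.
RUNS = [
    (1, "2019-04-10 11:00:03.303549", "2019-04-10 11:32:32.673043"),
    (16, "2019-04-10 11:32:32.673292", "2019-04-10 11:36:49.475977"),
    (256, "2019-04-10 11:36:49.476145", "2019-04-10 11:38:04.242553"),
]


def duration_seconds(started: str, finished: str) -> float:
    """Return the wall-clock time between two log timestamps, in seconds."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return delta.total_seconds()


durations = {p: duration_seconds(s, f) for p, s, f in RUNS}
baseline = durations[1]  # the ranges_parallelism=1 run is the reference point

for parallelism, seconds in durations.items():
    print(f"ranges_parallelism={parallelism}: {seconds:.0f}s, "
          f"{baseline / seconds:.1f}x vs ranges_parallelism=1")
```

The rounded durations match the 1949s, 257s and 75s in the message; the computed ratios come out slightly above the quoted 7X and 25X, which appear to be conservative round-offs.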

slivne · 2019-08-14T10:52:05Z

@asias I know we had high-latency links in scope when row-level repair was merged. AFAIK we chose not to provide user control over this for row-level repair and settled on a way to make sure the links are utilized if possible.

How does this change align with those decisions?

asias · 2019-08-14T10:58:25Z

In addition to choosing parallelism automatically, we agreed that a user control of the parallelism is useful and needed.

asias · 2019-10-08T03:21:29Z

Ping. The PR was sent almost 2 months ago.



This patch adds the ranges_parallelism option to the repair RESTful API.

Users can use this option to specify the number of ranges to repair in parallel per repair job, setting it to a smaller number than the default max_repair_ranges_in_parallel calculated by Scylla core.

Scylla Manager can also use this option to provide more ranges (>N) in a single repair job while repairing only N ranges in parallel (ranges_parallelism=N), instead of providing N ranges per repair job.

To make it safer, unlike PR #4848, this patch does not allow the user to exceed max_repair_ranges_in_parallel.

Fixes #4847

Closes #14886

* github.com:scylladb/scylladb:
  repair: Add ranges_parallelism option
  repair: Change to use coroutine in do_repair_ranges
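The behaviour described above (a per-job limit that can lower, but never raise, the core-calculated maximum) can be sketched as follows. This is an illustrative Python model only, not Scylla's C++ implementation: `effective_parallelism`, `repair_ranges` and `peak_concurrency` are hypothetical names, and treating a missing or non-positive value as "use the default" is an assumption of this sketch.

```python
import asyncio


def effective_parallelism(requested, core_max):
    """Clamp a user-supplied ranges_parallelism to the core-calculated max.

    Assumption of this sketch: None or a non-positive value means
    "not specified", so the core default applies.
    """
    if requested is None or requested <= 0:
        return core_max
    return min(requested, core_max)


async def repair_ranges(ranges, requested, core_max, repair_one):
    """Run repair_one(range) for every range, at most N at a time."""
    limit = asyncio.Semaphore(effective_parallelism(requested, core_max))

    async def guarded(token_range):
        async with limit:  # waits while N ranges are already in flight
            await repair_one(token_range)

    await asyncio.gather(*(guarded(r) for r in ranges))


async def peak_concurrency(num_ranges, requested, core_max):
    """Drive repair_ranges with a stand-in repair step and report the peak."""
    active = 0
    peak = 0

    async def fake_repair(_token_range):
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0)  # yield so other ranges get a chance to start
        active -= 1

    await repair_ranges(range(num_ranges), requested, core_max, fake_repair)
    return peak


if __name__ == "__main__":
    # 20 ranges in one job, user asks for 4, core allows up to 16.
    print(asyncio.run(peak_concurrency(20, requested=4, core_max=16)))
```

The second use case in the message corresponds to calling `repair_ranges` with many more ranges than the limit: the job carries all of them, while the semaphore keeps only N in flight at once.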

asias mentioned this issue Aug 14, 2019

repair: Add ranges_parallelism option #4848

Closed

slivne added this to the 3.2 milestone Aug 19, 2019

slivne added the area/repair label Aug 19, 2019

slivne assigned avikivity Aug 19, 2019

slivne added the feature/enhancement label Aug 19, 2019

slivne assigned asias and unassigned avikivity Aug 19, 2019

slivne modified the milestones: 3.2, 3.4 Feb 9, 2020

slivne modified the milestones: 4.0, 4.1 Mar 24, 2020

slivne modified the milestones: 4.1, 4.2 Jun 1, 2020

asias mentioned this issue Sep 18, 2020

Allow throttling repair-based node ops #7255

Closed

slivne modified the milestones: 4.2, 4.3 Nov 26, 2020

slivne modified the milestones: 4.3, 4.5 Jan 19, 2021

slivne modified the milestones: 4.5, 4.6 Mar 29, 2021

slivne modified the milestones: 4.6, 4.7 Nov 10, 2021

slivne modified the milestones: 5.0, 5.1 Apr 13, 2022

DoronArazii removed this from the 5.1 milestone Oct 12, 2022

DoronArazii added this to the 5.2 milestone Oct 12, 2022

DoronArazii modified the milestones: 5.2, 5.x Nov 22, 2022

asias mentioned this issue Jul 31, 2023

repair: Add ranges_parallelism option #14886

Merged

scylladb-promoter closed this as completed in 9b3fd94 Aug 4, 2023

DoronArazii modified the milestones: Backlog, 5.4 Aug 6, 2023
