repair: Add ranges_parallelism option #4847
Comments
In commit 131acc0 (repair: Adjust parallelism according to memory size), we choose the number of ranges to repair in parallel automatically, according to the memory size. However, this automatic number might not be the optimal one. For example, a user may want repair to have minimal impact on user CQL reads and writes, or may want to repair at full speed, allowing it more CPU and memory resources.

This patch introduces a ranges_parallelism option that controls the number of ranges that we repair in parallel, so that advanced users can control the parallelism.

Here is an example: in a cluster with high network latency, e.g., multiple DCs connected by a high-latency link, the user can increase ranges_parallelism to compensate. For example, in a two-DC cluster with 7 nodes + 60ms round trip time + RF (dc1:3, dc2:4) + a keyspace with 3 empty tables:

    2019-04-10 11:00:03.303549 Repair with ks ranges_parallelism=1 on node 1 started ...
    2019-04-10 11:32:32.673043 Repair with ks on node 1 finished ...
    2019-04-10 11:32:32.673292 Repair with ks ranges_parallelism=16 on node 1 started ...
    2019-04-10 11:36:49.475977 Repair with ks on node 1 finished ...
    2019-04-10 11:36:49.476145 Repair with ks ranges_parallelism=256 on node 1 started ...
    2019-04-10 11:38:04.242553 Repair with ks on node 1 finished ...

That is 1949s vs 257s vs 75s to complete repair, respectively, which is roughly a 7.6X and 26X difference.

Fixes scylladb#4847
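The durations and speedups quoted in the commit message can be re-derived from the six log timestamps. The following is only a small arithmetic check: the timestamps are copied from the log lines, while the names `RUNS` and `durations` are made up for this sketch.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# (ranges_parallelism, started, finished), copied from the repair log lines.
RUNS = [
    (1, "2019-04-10 11:00:03.303549", "2019-04-10 11:32:32.673043"),
    (16, "2019-04-10 11:32:32.673292", "2019-04-10 11:36:49.475977"),
    (256, "2019-04-10 11:36:49.476145", "2019-04-10 11:38:04.242553"),
]


def durations(runs):
    """Map each ranges_parallelism value to its repair duration in seconds."""
    result = {}
    for parallelism, started, finished in runs:
        delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
        result[parallelism] = delta.total_seconds()
    return result


secs = durations(RUNS)
baseline = secs[1]
for parallelism, seconds in secs.items():
    # Speedup is measured against the ranges_parallelism=1 run.
    print(f"ranges_parallelism={parallelism}: {seconds:.0f}s, "
          f"{baseline / seconds:.1f}x vs ranges_parallelism=1")
```

Note that all three tables are empty in this test, so the numbers mostly reflect round-trip latency per range rather than data streaming.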
@asias I know high-latency links were in scope when row-level repair was merged. AFAIK we chose not to provide user control over this for row-level repair, and settled on an approach that makes sure the links are utilized where possible. How does this change align with those decisions?
In addition to choosing parallelism automatically, we agreed that a user control of the parallelism is useful and needed.
Ping. The PR was sent almost 2 months ago.
This patch adds the ranges_parallelism option to the repair RESTful API. Users can use this option to specify the number of ranges to repair in parallel per repair job, choosing a number smaller than the default max_repair_ranges_in_parallel calculated by Scylla core. Scylla Manager can also use this option to provide more ranges (>N) in a single repair job while repairing only N ranges in parallel, instead of providing N ranges per repair job. To make it safer, unlike PR scylladb#4848, this patch does not allow the user to exceed max_repair_ranges_in_parallel. Fixes scylladb#4847
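To make the semantics concrete, here is a minimal Python sketch of "many ranges in one job, at most N repaired at a time". It is not Scylla's implementation (which is C++ in do_repair_ranges). The helper names `effective_parallelism`, `repair_ranges`, and `run_demo` are hypothetical, and clamping an oversized request down to the maximum is an assumption: the patch description only says the user cannot exceed max_repair_ranges_in_parallel.

```python
import asyncio


def effective_parallelism(requested, max_in_parallel):
    """Return how many ranges to repair at once.

    None means "use the automatically calculated maximum". Larger requests
    are clamped to the maximum (an assumption made for this sketch).
    """
    if requested is None:
        return max_in_parallel
    if requested < 1:
        raise ValueError("ranges_parallelism must be at least 1")
    return min(requested, max_in_parallel)


async def repair_ranges(ranges, requested, max_in_parallel, repair_one):
    """Repair every range, keeping at most `limit` repairs in flight."""
    limit = effective_parallelism(requested, max_in_parallel)
    semaphore = asyncio.Semaphore(limit)

    async def guarded(token_range):
        async with semaphore:
            return await repair_one(token_range)

    return await asyncio.gather(*(guarded(r) for r in ranges))


def run_demo(num_ranges=10, requested=3, max_in_parallel=4):
    """Run fake repairs and return the peak number that ran concurrently."""
    state = {"in_flight": 0, "peak": 0}

    async def fake_repair(token_range):
        state["in_flight"] += 1
        state["peak"] = max(state["peak"], state["in_flight"])
        await asyncio.sleep(0.01)  # stands in for network round trips
        state["in_flight"] -= 1
        return token_range

    asyncio.run(repair_ranges(range(num_ranges), requested,
                              max_in_parallel, fake_repair))
    return state["peak"]


print("peak concurrent repairs:", run_demo())
```

With this shape, a caller can hand over all ranges in one job and let the limit, rather than the job size, decide how much work runs concurrently.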
This patch adds the ranges_parallelism option to the repair RESTful API. Users can use this option to specify the number of ranges to repair in parallel per repair job, choosing a number smaller than the default max_repair_ranges_in_parallel calculated by Scylla core. Scylla Manager can also use this option to provide more ranges (>N) in a single repair job while repairing only N ranges in parallel, instead of providing N ranges per repair job. To make it safer, unlike PR #4848, this patch does not allow the user to exceed max_repair_ranges_in_parallel.

Fixes #4847
Closes #14886

* github.com:scylladb/scylladb:
  repair: Add ranges_parallelism option
  repair: Change to use coroutine in do_repair_ranges
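As a rough illustration of how a client might pass the option, the sketch below only builds a request URL and sends nothing. The endpoint path `/storage_service/repair_async/{keyspace}` and port 10000 are assumptions that this thread does not confirm, so check them against the API definition of your Scylla version. `repair_url` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode


def repair_url(host, keyspace, ranges_parallelism=None, port=10000):
    """Build a repair request URL (assumed endpoint path and port)."""
    path = f"/storage_service/repair_async/{quote(keyspace, safe='')}"
    params = {}
    if ranges_parallelism is not None:
        # Omitting the option leaves the automatically chosen parallelism.
        params["ranges_parallelism"] = ranges_parallelism
    query = f"?{urlencode(params)}" if params else ""
    return f"http://{host}:{port}{path}{query}"


print(repair_url("127.0.0.1", "ks", ranges_parallelism=16))
```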
In commit 131acc0 (repair: Adjust
parallelism according to memory size), we choose the number of ranges to
repair in parallel automatically, according to the memory size.
However, this automatic number might not be the optimal one. For example,
a user may want repair to have minimal impact on user CQL reads and
writes, or may want to repair at full speed, allowing it more CPU and
memory resources.
We can add an option that controls the number of ranges that we repair
in parallel, so that advanced users can control the parallelism.