repair: add control for repair percentage for partition count estimation #18615

Closed
denesb opened this issue May 10, 2024 · 2 comments

@denesb
Contributor

denesb commented May 10, 2024

642f9a1 cuts partition count estimates during repair to 10% of the total, based on the assumption that, on average, around 10% of the data is moved by repair. This number is pulled from thin air, and while I expect it will be fine in most cases, in certain cases it will be a gross under-estimation and can lead to bloom filters violating their FP chance and generating more IO. We need a config item to control this estimate so the field can intervene in case of disaster. The config item has to be live-updatable.
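To put a rough number on the under-estimation risk, here is a minimal sketch using the textbook bloom filter formulas. It is not Scylla's actual filter code: `filter_shape`, `size_filter` and `actual_fp` are hypothetical names, and the sizing rule is the standard optimal one, assumed here for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical description of a bloom filter: total bits and hash count.
struct filter_shape {
    double bits;
    int hashes;
};

// Size a filter for `estimated_partitions` keys at `target_fp`, using the
// textbook optimum: m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2.
// Assumption: this is the classic formula, not Scylla's implementation.
inline filter_shape size_filter(double estimated_partitions, double target_fp) {
    assert(estimated_partitions > 0.0);
    assert(target_fp > 0.0 && target_fp < 1.0);
    const double ln2 = std::log(2.0);
    const double bits = -estimated_partitions * std::log(target_fp) / (ln2 * ln2);
    const long k = std::lround(bits / estimated_partitions * ln2);
    return {bits, static_cast<int>(std::max(1L, k))};
}

// Approximate false-positive rate once `actual_partitions` keys have been
// inserted: p = (1 - e^(-k * n / m))^k.
inline double actual_fp(const filter_shape& f, double actual_partitions) {
    return std::pow(1.0 - std::exp(-f.hashes * actual_partitions / f.bits), f.hashes);
}
```

Under these assumptions, calling `size_filter(0.1 * n, 0.01)` and then `actual_fp(f, n)` models the worst case where repair writes every partition while the filter was sized for a tenth of them. The resulting rate ends up close to 1, so the filter stops rejecting almost anything.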

denesb added this to the 6.0 milestone May 10, 2024
denesb added the backport/5.2 and backport/6.0 labels May 10, 2024
asias added a commit to asias/scylla that referenced this issue May 13, 2024
In commit 642f9a1 (repair: Improve
estimated_partitions to reduce memory usage), a 10% hard coded
estimation ratio is used.

This patch introduces a new config option to specify the estimation
ratio of partitions written by repair out of the total partitions.

It is set to 0.1 by default.

Fixes scylladb#18615
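
As a rough illustration of what such an option does, here is a sketch of scaling a partition estimate by a ratio that can change at runtime. Only the option name `repair_partition_count_estimation_ratio` and the 0.1 default come from this thread. The `live_ratio` class and `estimate_repair_partitions` function are hypothetical, `std::atomic<double>` is just a stand-in for a live-updatable config value, and clamping to [0, 1] with a floor of one partition is an assumption rather than the patch's confirmed behaviour.

```cpp
#include <algorithm>
#include <atomic>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for a live-updatable config value such as
// repair_partition_count_estimation_ratio (default 0.1 per the commit message).
class live_ratio {
    std::atomic<double> _value;
public:
    explicit live_ratio(double v = 0.1) : _value(v) {}
    // Can be called while repairs are running; readers see the new value
    // the next time they call get().
    void set(double v) { _value.store(v, std::memory_order_relaxed); }
    double get() const { return _value.load(std::memory_order_relaxed); }
};

// Scale the total partition count by the current ratio.
// Assumption: the ratio is clamped to [0, 1] and the result is at least 1.
inline std::uint64_t estimate_repair_partitions(std::uint64_t total_partitions,
                                                const live_ratio& ratio) {
    const double r = std::min(1.0, std::max(0.0, ratio.get()));
    const double scaled = std::ceil(static_cast<double>(total_partitions) * r);
    return std::max<std::uint64_t>(static_cast<std::uint64_t>(scaled), 1);
}
```

Reading the ratio at the point of use, rather than caching it when repair starts, is what lets an operator raise it mid-incident without a restart.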
@michoecho
Contributor

michoecho commented May 13, 2024

> 642f9a1 cuts partition count estimates during repair to 10% of the total, based on the assumption that, on average, around 10% of the data is moved by repair. This number is pulled from thin air, and while I expect it will be fine in most cases, in certain cases it will be a gross under-estimation and can lead to bloom filters violating their FP chance and generating more IO.

What's the interaction between estimation and RBNO, which moves all of the data?

After e.g. bootstrap, are the repair-written sstables used as they are (and in this case — won't 642f9a1 make their filters very ineffective?), or are they guaranteed to go through some kind of compaction (reshape?) first, which will fix their filters?

@asias
Contributor

asias commented May 13, 2024

> 642f9a1 cuts partition count estimates during repair to 10% of the total, based on the assumption that, on average, around 10% of the data is moved by repair. This number is pulled from thin air, and while I expect it will be fine in most cases, in certain cases it will be a gross under-estimation and can lead to bloom filters violating their FP chance and generating more IO.

> What's the interaction between estimation and RBNO, which moves all of the data?

Most RBNO operations move all of the data for a given range. Node ops like rebuild might move only a small amount of data if the node already has some of it.

> After e.g. bootstrap, are the repair-written sstables used as they are (and in this case, won't 642f9a1 make their filters very ineffective?), or are they guaranteed to go through some kind of compaction (reshape?) first, which will fix their filters?

The sstables generated by RBNO go through off-strategy compaction, which integrates them into the main dataset. This fixes the filters.

Also, the new node is supposed to have lower "heat", so heat_load_balance will route less traffic to it.

mergify bot pushed a commit that referenced this issue May 14, 2024
In commit 642f9a1 (repair: Improve
estimated_partitions to reduce memory usage), a 10% hard coded
estimation ratio is used.

This patch introduces a new config option to specify the estimation
ratio of partitions written by repair out of the total partitions.

It is set to 0.1 by default.

Fixes #18615

(cherry picked from commit 340eae0)

# Conflicts:
#	db/config.cc
#	db/config.hh
mergify bot pushed a commit that referenced this issue May 14, 2024
In commit 642f9a1 (repair: Improve
estimated_partitions to reduce memory usage), a 10% hard coded
estimation ratio is used.

This patch introduces a new config option to specify the estimation
ratio of partitions written by repair out of the total partitions.

It is set to 0.1 by default.

Fixes #18615

(cherry picked from commit 340eae0)
asias added a commit to asias/scylla that referenced this issue May 21, 2024
Since commit 952dfc6 "repair: Introduce
repair_partition_count_estimation_ratio config option", get_config() is
used. We need to include db/config.hh for that.

Spotted when backporting to 5.4 branch.

Refs scylladb#18615
denesb pushed a commit that referenced this issue May 22, 2024
Since commit 952dfc6 "repair: Introduce
repair_partition_count_estimation_ratio config option", get_config() is
used. We need to include db/config.hh for that.

Spotted when backporting to 5.4 branch.

Refs #18615

Closes #18780
asias added a commit to asias/scylla that referenced this issue May 27, 2024
In commit 642f9a1 (repair: Improve
estimated_partitions to reduce memory usage), a 10% hard coded
estimation ratio is used.

This patch introduces a new config option to specify the estimation
ratio of partitions written by repair out of the total partitions.

It is set to 0.1 by default.

Fixes scylladb#18615

(cherry picked from commit 340eae0)
asias added a commit to asias/scylla that referenced this issue May 27, 2024
Since commit 952dfc6 "repair: Introduce
repair_partition_count_estimation_ratio config option", get_config() is
used. We need to include db/config.hh for that.

Spotted when backporting to 5.4 branch.

Refs scylladb#18615

Closes scylladb#18780

(cherry picked from commit 1a03e3d)