scrub compaction: segregate mode: unbounded number of buckets can cause OOM #9400

denesb · 2021-09-28T07:41:36Z

The number of buckets that partition_based_splitting_writer can create is unbounded, which can cause severe memory pressure. Recently we observed a case where a single sstable caused 1.7K buckets to be created, resulting in OOM.


…x buckets

Recently we observed an OOM caused by the partition based splitting writer going crazy, creating 1.7K buckets while scrubbing an especially broken sstable. To avoid situations like that in the future, this patch provides a max limit for the number of live buckets. When the number of buckets reaches this limit, the largest bucket is closed and replaced by a new bucket. This will end up creating more output sstables during scrub overall, but now they won't all be written at the same time, causing insane memory pressure and possibly OOM. Scrub compaction sets this limit to 100, the same limit the TWCS's timestamp based splitting writer uses (implemented through the classifier - time_window_compaction_strategy::max_data_segregation_window_count).

Fixes: scylladb#9400

Tests: unit(dev)

…x buckets

Recently we observed an OOM caused by the partition based splitting writer going crazy, creating 1.7K buckets while scrubbing an especially broken sstable. To avoid situations like that in the future, this patch provides a max limit for the number of live buckets. When the number of buckets reaches this limit, the largest bucket is closed and replaced by a new bucket. This will end up creating more output sstables during scrub overall, but now they won't all be written at the same time, causing insane memory pressure and possibly OOM. Scrub compaction sets this limit to 100, the same limit the TWCS's timestamp based splitting writer uses (implemented through the classifier - time_window_compaction_strategy::max_data_segregation_window_count).

Fixes: scylladb#9400

Tests: unit(dev)

Closes scylladb#9401

(cherry picked from commit 970fe9a)
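For readers unfamiliar with the mechanism, the bounding strategy described in the commit message can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code from the patch: `bucket`, `bounded_splitting_writer` and all of their members are hypothetical names, keys are plain integers instead of partitions, and a bucket's "size" is simply its key count. The one behavior taken from the commit message is that when the live bucket limit is reached, the largest bucket is closed and replaced by a new one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical stand-in for one output sstable: it only accepts keys in
// strictly increasing order.
struct bucket {
    std::vector<int> keys;
    bool accepts(int key) const { return keys.empty() || keys.back() < key; }
};

// Toy model (not the real writer): caps the number of live buckets.
class bounded_splitting_writer {
    std::size_t _max_buckets;
    std::vector<bucket> _live;    // buckets still being written
    std::vector<bucket> _closed;  // finished outputs
    std::size_t _peak_live = 0;
public:
    explicit bounded_splitting_writer(std::size_t max_buckets)
        : _max_buckets(std::max<std::size_t>(max_buckets, 1)) {}

    void consume(int key) {
        // Prefer an existing live bucket that can take the key in order.
        auto it = std::find_if(_live.begin(), _live.end(),
                [key] (const bucket& b) { return b.accepts(key); });
        if (it == _live.end()) {
            if (_live.size() >= _max_buckets) {
                // Limit reached: close the largest live bucket and reuse
                // its slot for a fresh, empty bucket.
                assert(!_live.empty());
                auto largest = std::max_element(_live.begin(), _live.end(),
                        [] (const bucket& a, const bucket& b) {
                            return a.keys.size() < b.keys.size();
                        });
                _closed.push_back(std::move(*largest));
                *largest = bucket{};
                it = largest;
            } else {
                _live.emplace_back();
                it = std::prev(_live.end());
            }
        }
        it->keys.push_back(key);
        _peak_live = std::max(_peak_live, _live.size());
    }

    // Flush every remaining live bucket to the closed set.
    void close_all() {
        for (auto& b : _live) {
            _closed.push_back(std::move(b));
        }
        _live.clear();
    }

    std::size_t live_count() const { return _live.size(); }
    std::size_t closed_count() const { return _closed.size(); }
    std::size_t peak_live() const { return _peak_live; }
};
```

Feeding this model strictly descending keys is the worst case: without a cap every key would need its own live bucket, whereas with a cap the number of live buckets never exceeds the limit and the extra buckets show up as additional closed outputs instead.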

avikivity · 2022-07-07T17:10:16Z

Fix present on all active branches, not backporting.

denesb mentioned this issue Sep 28, 2021

mutation_writer: partition_based_splitting_writer: limit number of max buckets #9401

Closed

scylladb-promoter closed this as completed in 970fe9a Sep 29, 2021

scylladb-promoter added the Backport candidate label Sep 29, 2021

avikivity removed the Backport candidate label Jul 7, 2022
