needs_cleanup(), used by SSTable cleanup, is SLOW #6730

raphaelsc · 2020-06-29T16:58:18Z

needs_cleanup() is the procedure that cleanup uses to determine whether an SSTable needs cleanup.

The problem is that it iterates through all owned ranges for each SSTable.

A node can have thousands of SSTables, and the number of ranges for a given node can be 768 with an RF of 3 and 256 tokens per node: (RF / NODES) * (NODES * NUM_TOKENS) = RF * NUM_TOKENS = 3 * 256.

The complexity is O(NUM_SSTABLES * NUM_RANGES), but we can make it O(NUM_SSTABLES * LOG(NUM_RANGES)), given that ranges are sorted and non-overlapping.
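To illustrate the idea, here is a minimal sketch of the per-SSTable check using a binary search. It is not the actual patch: `token`, `token_range`, `sstable_span` and `needs_cleanup_sketch` are hypothetical simplified names, ranges are modelled as closed integer intervals, and wrap-around or open bounds are ignored. It only assumes what is stated above, namely that the owned ranges are sorted and non-overlapping.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical simplified types for illustration only; the real code
// uses Scylla's own token and range types.
using token = std::int64_t;

struct token_range {
    token start; // inclusive
    token end;   // inclusive
};

// First and last token covered by an SSTable (assumed first <= last).
struct sstable_span {
    token first;
    token last;
};

// Returns true iff the SSTable's token span is NOT fully contained in a
// single owned range. Precondition (assumed, not checked): `owned` is
// sorted by start and its ranges do not overlap, so their ends are
// increasing as well.
bool needs_cleanup_sketch(const sstable_span& sst,
                          const std::vector<token_range>& owned) {
    // Binary search for the first owned range that does not end before
    // the SSTable's first token: O(log(owned.size())).
    auto it = std::lower_bound(owned.begin(), owned.end(), sst.first,
        [](const token_range& r, token t) { return r.end < t; });

    // Only that range can contain the whole span; any later range
    // starts after it and any earlier range ends before sst.first.
    if (it != owned.end() && it->start <= sst.first && sst.last <= it->end) {
        return false;
    }
    return true;
}
```

Because only one candidate range has to be inspected after the search, the cost per SSTable drops from a scan over all ranges to a single `std::lower_bound` call.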

avikivity · 2020-08-09T14:52:42Z

Not a regression, performance impact only. Not backporting.

avikivity · 2020-08-27T08:31:07Z

This is also related to #6662, which is a 16-second stall! So will backport.

avikivity · 2020-08-27T08:36:10Z

Backported to 4.0, 4.1, 4.2.

needs_cleanup() returns true if a sstable needs cleanup.

Turns out it's very slow because it iterates through all the local ranges for all sstables in the set, making its complexity: O(num_sstables * local_ranges).

We can optimize it by taking into account that abstract_replication_strategy documents that get_ranges() will return a list of ranges that is sorted and non-overlapping. Compaction for cleanup already takes advantage of that when checking if a given partition can be actually purged.

So needs_cleanup() can be optimized into O(num_sstables * log(local_ranges)).

With num_sstables=1000, RF=3, then local_ranges=256(num_tokens)*3, it means the max # of checks performed will go from 768000 to ~9584.

Fixes #6730.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200629171355.45118-2-raphaelsc@scylladb.com>
(cherry picked from commit cf352e7)

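For reference, the figures quoted in the commit message can be reproduced as follows, assuming the logarithm is base 2 (the commit message does not state the base):

```latex
\begin{align*}
\text{linear scan:}   \quad & 1000 \times 768 = 768000 \\
\text{binary search:} \quad & 1000 \times \log_2 768 = 1000 \times (8 + \log_2 3) \approx 1000 \times 9.585 \approx 9585
\end{align*}
```

This agrees with the ~9584 in the commit message up to rounding.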

scylladb-promoter added the Backport candidate label Jul 1, 2020

scylladb-promoter closed this as completed in cf352e7 Jul 1, 2020

avikivity removed the Backport candidate label Aug 9, 2020
