compaction: Fix key estimation per sstable to produce efficient filters #15727

raphaelsc · 2023-10-16T19:57:27Z

When estimating the number of partitions for each output sstable, the estimation accounts for the size of all sstable components. But the sstables are split according to the data file size, so the size of the other files is irrelevant for the estimation.

With certain data models, like single-row partitions containing small values, the index can be even larger than the data.
For example, if the index is as large as the data, the estimation says that 2x more sstables will be generated, and as a result each sstable is estimated to have half as many keys as it will actually hold.

Fix it by accounting only for the size of the data file.
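
To make the arithmetic concrete, here is a minimal sketch of the two estimates. It is not the actual compaction code: `input_sizes`, `div_ceil`, `keys_per_sstable`, `estimate_before` and `estimate_after` are hypothetical names used only for illustration, and the index file stands in for all non-data components.

```cpp
#include <cstdint>

// Hypothetical component sizes of the input sstables, in bytes.
struct input_sizes {
    std::uint64_t data_bytes;   // data file
    std::uint64_t index_bytes;  // index file (stands in for all non-data components)
};

// Ceiling division, so a partially filled last sstable still counts as one.
constexpr std::uint64_t div_ceil(std::uint64_t a, std::uint64_t b) {
    return (a + b - 1) / b;
}

// Splits total_keys evenly across the number of output sstables predicted
// from size_bytes, given a cap on the data file size of each output sstable.
constexpr std::uint64_t keys_per_sstable(std::uint64_t total_keys,
                                         std::uint64_t size_bytes,
                                         std::uint64_t max_data_bytes) {
    const std::uint64_t n = div_ceil(size_bytes, max_data_bytes);
    return div_ceil(total_keys, n == 0 ? 1 : n);
}

// Before the fix, as described above: every component counts toward the size.
constexpr std::uint64_t estimate_before(std::uint64_t total_keys,
                                        input_sizes s,
                                        std::uint64_t max_data_bytes) {
    return keys_per_sstable(total_keys, s.data_bytes + s.index_bytes, max_data_bytes);
}

// After the fix: only the data file size counts, matching how outputs are split.
constexpr std::uint64_t estimate_after(std::uint64_t total_keys,
                                       input_sizes s,
                                       std::uint64_t max_data_bytes) {
    return keys_per_sstable(total_keys, s.data_bytes, max_data_bytes);
}
```

With 1,000,000 keys, 1000 MiB of data, 1000 MiB of index and a 100 MiB cap per output data file, `estimate_before` predicts 20 output sstables and 50,000 keys each, while `estimate_after` predicts 10 output sstables and 100,000 keys each.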

Fixes #15726.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

scylladb-promoter · 2023-10-16T22:27:59Z

🟢 CI State: SUCCESS

✅ - Build
✅ - Unit Tests
✅ - Sanity Tests

Build Details:

Build URL: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/4218/
Duration: 2 hr 25 min
Builder: spider4.cloudius-systems.com

raphaelsc requested a review from nyh as a code owner October 16, 2023 19:57

raphaelsc requested a review from denesb October 16, 2023 19:57

denesb approved these changes Oct 17, 2023

denesb requested a review from bhalevy October 17, 2023 08:02

bhalevy approved these changes Oct 17, 2023

scylladb-promoter closed this in da04fea Oct 17, 2023
