Compaction throughput is incorrectly based on the amount of data written #14533

Closed
raphaelsc opened this issue Jul 5, 2023 · 1 comment

@raphaelsc
Member

We calculate throughput using output size, but that's wrong: if compaction expires 99% of the data, throughput will be reported on the 1% that is left, which can mislead the user into thinking compaction is terribly slow, when in reality it is not.
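To make the distortion concrete, here is a minimal back-of-the-envelope sketch (hypothetical numbers, not Scylla's actual accounting code) contrasting the two ways of computing the metric:

```cpp
#include <cstdio>

int main() {
    // Hypothetical example: compaction reads 100 GiB of input, 99% of it
    // expired, so only 1 GiB is written out. The run takes 100 seconds.
    const double input_bytes  = 100.0 * 1024 * 1024 * 1024;
    const double output_bytes =   1.0 * 1024 * 1024 * 1024;
    const double duration_s   = 100.0;
    const double mib          = 1024.0 * 1024.0;

    // Output-based metric (current behavior): reports on the 1% written.
    std::printf("output-based: %.1f MiB/s\n",
                output_bytes / duration_s / mib);  // ~10.2 MiB/s -- looks slow
    // Input-based metric (the fix): reports on all the data processed.
    std::printf("input-based:  %.1f MiB/s\n",
                input_bytes / duration_s / mib);   // 1024.0 MiB/s -- actual work
}
```

Same compaction, two orders of magnitude apart; only the input-based number reflects the work actually performed.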

raphaelsc assigned raphaelsc and rayakurl and unassigned rayakurl on Jul 5, 2023
raphaelsc added a commit to raphaelsc/scylla that referenced this issue Jul 5, 2023
Today, we base compaction throughput on the amount of data written,
but it should be based on the amount of input data compacted
instead, to show how much data compaction had to process during
its execution.

A good example is a compaction which expires 99% of its data:
today, throughput would be calculated on the 1% written,
misleading the reader into thinking that compaction was
terribly slow.

Fixes scylladb#14533.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
raphaelsc added a commit to raphaelsc/scylla that referenced this issue Jul 10, 2023
DoronArazii added this to the 5.4 milestone on Aug 29, 2023
avikivity pushed a commit that referenced this issue Sep 14, 2023
Closes #14615

(cherry picked from commit 3b1829f)
avikivity pushed a commit that referenced this issue Sep 14, 2023
@avikivity
Member

Backported to 5.1, 5.2.
