compaction: base compaction throughput on amount of data read
Today, we base compaction throughput on the amount of data
written, but it should be based on the amount of input data
compacted instead, so it reflects how much data the compaction
had to process during its execution.

A good example is a compaction that expires 99% of its data:
today, throughput would be calculated on the 1% written, which
misleads the reader into thinking the compaction was terribly
slow.
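
As a rough standalone sketch of that arithmetic (the sizes,
duration, and the helper below are hypothetical, standing in
for utils::pretty_printed_throughput from the diff):

#include <chrono>
#include <cstdint>
#include <iostream>

// Hypothetical stand-in for utils::pretty_printed_throughput:
// bytes processed divided by elapsed seconds, in MiB/s.
static double throughput_mib_per_s(uint64_t bytes, std::chrono::milliseconds d) {
    return (bytes / (1024.0 * 1024.0)) / std::chrono::duration<double>(d).count();
}

int main() {
    using namespace std::chrono_literals;
    // A compaction that expires 99% of its input in 100 seconds:
    uint64_t start_size = 100ull << 30; // 100 GiB read (input)
    uint64_t end_size   =   1ull << 30; //   1 GiB written (output)
    auto duration = 100000ms;
    // Output-based (old): ~10 MiB/s -- looks terribly slow.
    std::cout << throughput_mib_per_s(end_size, duration) << " MiB/s\n";
    // Input-based (new): ~1024 MiB/s -- reflects the data actually processed.
    std::cout << throughput_mib_per_s(start_size, duration) << " MiB/s\n";
}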

Fixes #14533.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #14615
raphaelsc authored and avikivity committed Jul 11, 2023
1 parent 25f4a7c commit 3b1829f
Showing 1 changed file with 1 addition and 1 deletion.

compaction/compaction.cc
@@ -788,7 +788,7 @@ class compaction {
     log_info("{} {} sstables to {}. {} to {} (~{}% of original) in {}ms = {}. ~{} total partitions merged to {}.",
         report_finish_desc(),
         _input_sstable_generations.size(), new_sstables_msg, utils::pretty_printed_data_size(_start_size), utils::pretty_printed_data_size(_end_size), int(ratio * 100),
-        std::chrono::duration_cast<std::chrono::milliseconds>(duration).count(), utils::pretty_printed_throughput(_end_size, duration),
+        std::chrono::duration_cast<std::chrono::milliseconds>(duration).count(), utils::pretty_printed_throughput(_start_size, duration),
         _cdata.total_partitions, _cdata.total_keys_written);

     return ret;

