compaction: base compaction throughput on amount of data read
Today we base compaction throughput on the amount of data written,
but it should be based on the amount of input data compacted
instead, to reflect how much data the compaction had to process
during its execution.

A good example is a compaction that expires 99% of its data: today
the throughput would be calculated on the 1% that was written,
misleading the reader into thinking that the compaction was
terribly slow.
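
As a concrete (hypothetical) illustration of the arithmetic, assume a
compaction that reads 100 GiB of input and writes 1 GiB of output in
100 seconds. The variable names below mirror the _start_size and
_end_size fields touched by the diff, but the numbers and the program
itself are a made-up sketch, not ScyllaDB code:

#include <chrono>
#include <cstdio>

int main() {
    using namespace std::chrono;
    constexpr double GiB = 1024.0 * 1024.0 * 1024.0;
    const double start_size = 100 * GiB; // input data read by the compaction
    const double end_size = 1 * GiB;     // output left after expiring ~99% of it
    const auto dur = seconds(100);       // wall-clock time the compaction took

    const double secs = duration_cast<duration<double>>(dur).count();
    // Old formula: throughput from the output size -- looks terribly slow.
    std::printf("output-based: %8.2f MiB/s (misleading)\n",
                end_size / secs / (1024.0 * 1024.0));
    // New formula: throughput from the input size -- the data actually processed.
    std::printf("input-based:  %8.2f MiB/s\n",
                start_size / secs / (1024.0 * 1024.0));
}

With these made-up numbers, the output-based formula reports about
10 MiB/s while the input-based one reports about 1 GiB/s, even though
the compaction did exactly the same amount of work in both cases.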

Fixes scylladb#14533.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
raphaelsc committed Jul 5, 2023
1 parent 5d34db2 commit 15576ef
Showing 1 changed file with 1 addition and 1 deletion.
compaction/compaction.cc: 1 addition, 1 deletion

@@ -788,7 +788,7 @@ class compaction {
     log_info("{} {} sstables to {}. {} to {} (~{}% of original) in {}ms = {}. ~{} total partitions merged to {}.",
         report_finish_desc(),
         _input_sstable_generations.size(), new_sstables_msg, utils::pretty_printed_data_size(_start_size), utils::pretty_printed_data_size(_end_size), int(ratio * 100),
-        std::chrono::duration_cast<std::chrono::milliseconds>(duration).count(), utils::pretty_printed_throughput(_end_size, duration),
+        std::chrono::duration_cast<std::chrono::milliseconds>(duration).count(), utils::pretty_printed_throughput(_start_size, duration),
         _cdata.total_partitions, _cdata.total_keys_written);

     return ret;
