compaction: base compaction throughput on amount of data read
Today we base compaction throughput on the amount of data written,
but it should be based on the amount of input data compacted
instead, to reflect how much data the compaction had to process
during its execution.

A good example is a compaction that expires 99% of its data: today
the throughput would be calculated on the 1% that was written,
misleading the reader into thinking that the compaction was
terribly slow.
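
As a concrete (hypothetical) illustration of the arithmetic, assume a
compaction that reads 100 GiB of input and writes 1 GiB of output in
100 seconds. The variable names below mirror the _start_size and
_end_size fields touched by the diff, but the numbers and the program
itself are a made-up sketch, not ScyllaDB code:

#include <chrono>
#include <cstdio>

int main() {
    using namespace std::chrono;
    constexpr double GiB = 1024.0 * 1024.0 * 1024.0;
    const double start_size = 100 * GiB; // input data read by the compaction
    const double end_size = 1 * GiB;     // output left after expiring ~99% of it
    const auto dur = seconds(100);       // wall-clock time the compaction took

    const double secs = duration_cast<duration<double>>(dur).count();
    // Old formula: throughput from the output size -- looks terribly slow.
    std::printf("output-based: %8.2f MiB/s (misleading)\n",
                end_size / secs / (1024.0 * 1024.0));
    // New formula: throughput from the input size -- the data actually processed.
    std::printf("input-based:  %8.2f MiB/s\n",
                start_size / secs / (1024.0 * 1024.0));
}

With these made-up numbers, the output-based formula reports about
10 MiB/s while the input-based one reports about 1 GiB/s, even though
the compaction did exactly the same amount of work in both cases.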

Fixes scylladb#14533.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
raphaelsc committed Jul 5, 2023
1 parent 5d34db2 commit 15576ef
Showing 1 changed file with 1 addition and 1 deletion.
compaction/compaction.cc: 1 addition, 1 deletion

@@ -788,7 +788,7 @@ class compaction {
     log_info("{} {} sstables to {}. {} to {} (~{}% of original) in {}ms = {}. ~{} total partitions merged to {}.",
         report_finish_desc(),
         _input_sstable_generations.size(), new_sstables_msg, utils::pretty_printed_data_size(_start_size), utils::pretty_printed_data_size(_end_size), int(ratio * 100),
-        std::chrono::duration_cast<std::chrono::milliseconds>(duration).count(), utils::pretty_printed_throughput(_end_size, duration),
+        std::chrono::duration_cast<std::chrono::milliseconds>(duration).count(), utils::pretty_printed_throughput(_start_size, duration),
         _cdata.total_partitions, _cdata.total_keys_written);

     return ret;
