Compaction throughput is incorrectly based on the amount of data written #14533

Closed
raphaelsc opened this issue Jul 5, 2023 · 1 comment

@raphaelsc
Member

We calculate throughput using output size, but that's wrong: if compaction expires 99% of the data, throughput will be reported on the 1% that is left, which can mislead the user into thinking compaction is terribly slow, when in reality it is not.
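To make the distortion concrete, here is a minimal back-of-the-envelope sketch (hypothetical numbers, not Scylla's actual accounting code) contrasting the two ways of computing the metric:

```cpp
#include <cstdio>

int main() {
    // Hypothetical example: compaction reads 100 GiB of input, 99% of it
    // expired, so only 1 GiB is written out. The run takes 100 seconds.
    const double input_bytes  = 100.0 * 1024 * 1024 * 1024;
    const double output_bytes =   1.0 * 1024 * 1024 * 1024;
    const double duration_s   = 100.0;
    const double mib          = 1024.0 * 1024.0;

    // Output-based metric (current behavior): reports on the 1% written.
    std::printf("output-based: %.1f MiB/s\n",
                output_bytes / duration_s / mib);  // ~10.2 MiB/s -- looks slow
    // Input-based metric (the fix): reports on all the data processed.
    std::printf("input-based:  %.1f MiB/s\n",
                input_bytes / duration_s / mib);   // 1024.0 MiB/s -- actual work
}
```

Same compaction, two orders of magnitude apart; only the input-based number reflects the work actually performed.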

raphaelsc assigned raphaelsc and rayakurl and unassigned rayakurl on Jul 5, 2023
raphaelsc added a commit to raphaelsc/scylla that referenced this issue Jul 5, 2023
Today, we base compaction throughput on the amount of data written,
but it should be based on the amount of input data compacted
instead, to show how much data compaction had to process during
its execution.

A good example is a compaction which expires 99% of its data:
today, throughput would be calculated on the 1% written,
misleading the reader into thinking that compaction was
terribly slow.

Fixes scylladb#14533.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
raphaelsc added a commit to raphaelsc/scylla that referenced this issue Jul 10, 2023
DoronArazii added this to the 5.4 milestone on Aug 29, 2023
avikivity pushed a commit that referenced this issue Sep 14, 2023
Closes #14615

(cherry picked from commit 3b1829f)
avikivity pushed a commit that referenced this issue Sep 14, 2023
@avikivity
Member

Backported to 5.1, 5.2.
