data_read_resolver::resolve() computes for hundreds of milliseconds #2361
Some other traces:
I don't see anything obviously wrong. We have O(N log N) apply and a bunch of linear operations, like freeze and unfreeze. Overall, it is not very surprising that resolve is an expensive operation, though hundreds of ms is probably a bit excessive.
N was something ridiculous; there was a JMX call to get compaction history, and we implement that by internally performing a non-paging select on the compactionhistory table. Since that node was busy resharding for a while, there was lots of history.
On Mon, May 08, 2017 at 08:46:46AM -0700, Avi Kivity wrote:
> N was something ridiculous; there was a jmx call to get compaction history, and we implement that by internally performing a non-paging select on the compactionhistory table. Since that node was busy resharding for a while, there was lots of history.
Note that compaction history is a local table, so the reconciliation step can
be completely dropped.
--
Gleb.
Why was it even attempted, for RF=1?
On Mon, May 08, 2017 at 10:24:04AM -0700, Avi Kivity wrote:
> Why was it even attempted, for RF=1?
Range queries always go through the reconciliation code path, since there
is no hash-matching stage. Single-key reads fall back to it only if there is
a hash mismatch. The reconciliation code should be optimized for the RF=1 case
by skipping most of the logic and jumping directly to building a result.
--
Gleb.
Another one:
Recent stalls in "[scylla-perf-results] Performance Regression Compare Results - PerformanceRegressionCDCTest.test_mixed_throughput - 5.1.dev - 2022-05-01 07:16:36.190302" For example:
Decoded:
Allow yielding in data_read_resolver::resolve to prevent reactor stalls.
TODO: unfreeze_gently, to prevent stalls due to large partitions.
Refs scylladb#2361
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
And use in data_read_resolver::resolve
Fixes scylladb#2361
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
… from Benny Halevy

This series futurizes two synchronous functions used for data reconciliation, `data_read_resolver::resolve` and `to_data_query_result`, and does so by introducing lower-level asynchronous infrastructure: `mutation_partition_view::accept_gently`, `frozen_mutation::unfreeze_gently`, `frozen_mutation::consume_gently`, and `mutation::consume_gently`. This trades some cycles on this cold path to prevent known reactor stalls.

Fixes #2361
Fixes #10038
Closes #10482

* github.com:scylladb/scylla:
  mutation: add consume_gently
  frozen_mutation: add consume_gently
  query: coroutinize to_data_query_result
  frozen_mutation: add unfreeze_gently
  mutation_partition_view: add accept_gently methods
  storage_proxy: futurize data_read_resolver::resolve
Performance only, and not a regression, so not backporting.
Seen around this line:
We need both to add defer points, and to understand why this is so slow.