read timeout metrics are misleading #876
Comments
@glommer for future issues, can you please add the dashboard name and Scylla version (when applicable)?
Timeouts appear in both the overview and detailed dashboards.
Not all of them have LWT support, but I'll verify what is applicable.
This has nothing to do with LWT. All I am saying is that LWT introduces new read and write types as well, so if we fix this, we should fix it in a way that doesn't stumble on the very next issue. Please consider patching Scylla to use labels to make this easier in the future, but for now we need range and normal reads separated.
I've already opened a Scylla issue about it, but it's too late for OS 4.0, and I'm fixing the rest for the next Monitoring release.
This is the metric we use for the "Read timeouts" graph:
$func(delta(scylla_storage_proxy_coordinator_read_timeouts{instance=~"[[node]]",cluster=~"$cluster|$^", dc=~"$dc", shard=~"[[shard]]"}[1m])) by ([[by]])
The reason it is misleading is that, while it says "Read", there are many types of reads, each with its own timeout metric. For instance, range queries are accumulated in the metric `scylla_storage_proxy_coordinator_range_timeouts`. With the introduction of LWT there are now also CAS reads and CAS writes.
Currently users are blind to that. I propose that we accumulate all read types into one graph in the Overview dashboard, and show per-type metrics in the detailed dashboard.
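As a rough sketch of the accumulated Overview query, assuming the range metric accepts the same label matchers as the read metric, the two expressions could simply be added together. CAS read timeouts are left out because their metric name is not given here; they would be added the same way once confirmed.

```promql
# Sketch only: normal reads + range reads in one series per [[by]] group.
# Assumes both sides yield identical label sets after aggregation.
  $func(delta(scylla_storage_proxy_coordinator_read_timeouts{instance=~"[[node]]",cluster=~"$cluster|$^", dc=~"$dc", shard=~"[[shard]]"}[1m])) by ([[by]])
+ $func(delta(scylla_storage_proxy_coordinator_range_timeouts{instance=~"[[node]]",cluster=~"$cluster|$^", dc=~"$dc", shard=~"[[shard]]"}[1m])) by ([[by]])
```

Note that `+` only returns groups present on both sides, so a group missing from one metric would disappear from the graph. The detailed dashboard would instead plot each of the two expressions as its own query.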
Also @amnonh for newer versions please consider patching Scylla as well to use labels for each of those operation types instead of explicit names. With labels the dashboards would have worked out of the box and for free.
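To illustrate the label-based layout, here is a hypothetical query. The metric name `scylla_storage_proxy_coordinator_timeouts` and the `op_type` label with its values are assumptions made up for this example, not names confirmed to exist in Scylla.

```promql
# Hypothetical metric and label names, for illustration only.
# One metric, with the operation type carried in a label; adding op_type to the
# grouping gives the per-type view, dropping it gives the accumulated view.
$func(delta(scylla_storage_proxy_coordinator_timeouts{op_type=~"read|range_read|cas_read", instance=~"[[node]]",cluster=~"$cluster|$^", dc=~"$dc", shard=~"[[shard]]"}[1m])) by ([[by]], op_type)
```

With a layout like this, a new operation type would show up as a new label value without any dashboard change.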