timeouts are not properly displayed (function is not applied correctly) #2085

Closed
tarzanek opened this issue Oct 16, 2023 · 5 comments · Fixed by #2086
Labels
bug Something isn't working right

Comments

tarzanek commented Oct 16, 2023

Installation details
Panel Name: Detailed
Dashboard Name: Read Timeout
Scylla-Monitoring Version: 4.4.5
Scylla-Version: 2022.2.13

$func is applied to the whole sum. The fix below applies $func to each metric first and THEN sums the results:

$func(rate(scylla_storage_proxy_coordinator_read_timeouts{instance=~"[[node]]",cluster=~"$cluster|$^", dc=~"$dc", shard=~"[[shard]]"}[1m])) by ([[by]]) + $func(rate(scylla_storage_proxy_coordinator_cas_read_timeouts{instance=~"[[node]]",cluster=~"$cluster|$^", dc=~"$dc", shard=~"[[shard]]"}[1m])) by ([[by]]) + $func((rate(scylla_storage_proxy_coordinator_range_timeouts{instance=~"[[node]]",cluster=~"$cluster|$^", dc=~"$dc", shard=~"[[shard]]"}[1m]))) by ([[by]])

@amnonh can you please fix this? If $func is applied on top of the whole sum, and not as above, we can sometimes miss timeouts.

cc @vladzcloudius
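
One way the whole-sum form can lose samples, sketched in Python rather than PromQL: PromQL binary operators between instant vectors match one-to-one on identical label sets, so a `+` applied to raw `rate()` outputs only keeps series whose labels agree on both sides. The toy model below assumes, purely for illustration, that one metric carries an extra label (`scheduling_group`, a hypothetical name here) that the other lacks. `vec_add` and `sum_by` are stand-ins for PromQL's `+` and `sum(...) by (...)`, not real APIs, and the sample values are made up.

```python
from collections import defaultdict


def labels(**kv):
    """Build a hashable label set, e.g. labels(instance="n1", shard="0")."""
    return frozenset(kv.items())


def vec_add(left, right):
    """Toy model of PromQL `+`: keep only label sets present on both sides."""
    return {k: left[k] + right[k] for k in left.keys() & right.keys()}


def sum_by(vec, by):
    """Toy model of `sum(vec) by (by)`: drop all labels except those in `by`."""
    out = defaultdict(float)
    for label_set, value in vec.items():
        key = frozenset((k, v) for k, v in label_set if k in by)
        out[key] += value
    return dict(out)


# Hypothetical per-shard rates; the label names and values are made up.
read_timeouts = {labels(instance="n1", shard="0", scheduling_group="sl:default"): 2.0}
range_timeouts = {labels(instance="n1", shard="0"): 1.0}

by = {"instance"}

# func applied on top of the whole sum: the raw label sets differ, so `+` matches nothing.
whole_sum = sum_by(vec_add(read_timeouts, range_timeouts), by)

# func applied to each metric first: both sides are reduced to {instance}, so they match.
per_metric = vec_add(sum_by(read_timeouts, by), sum_by(range_timeouts, by))

print(whole_sum)   # empty: the timeouts are missed
print(per_metric)  # one series for instance n1 carrying the combined rate
```

In this sketch the per-metric form recovers the combined rate only because aggregation removes the label that differed. Whether that is the actual cause on a given cluster depends on the labels the real metrics expose.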

@tarzanek tarzanek added the bug Something isn't working right label Oct 16, 2023

vladzcloudius commented Oct 16, 2023

The reason for the issue is that if any of the metrics in the expression has "no data", the whole expression somehow becomes zero.

@amnonh this is a super urgent bug and we need it fixed and backported to every supported release ASAP.
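
The "no data" effect is consistent with PromQL's one-to-one vector matching: when one operand of `+` is an empty vector, no label set exists on both sides, so the result is empty as well. A minimal Python sketch of that behavior follows, with `vec_add` as a stand-in for PromQL's `+` and made-up sample values.

```python
def vec_add(left, right):
    """Toy model of PromQL `+`: a label set must exist on both sides to survive."""
    return {k: left[k] + right[k] for k in left.keys() & right.keys()}


# Hypothetical rates keyed by an (instance, shard) tuple; the values are made up.
read_timeouts = {("n1", "0"): 2.0}
cas_read_timeouts = {}  # the metric has "no data" for the selected range
range_timeouts = {("n1", "0"): 1.0}

total = vec_add(vec_add(read_timeouts, cas_read_timeouts), range_timeouts)
print(total)  # empty, although read and range timeouts are both non-zero
```

In this toy model, aggregating each operand before adding would not change the outcome, because an empty vector stays empty after aggregation.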

amnonh commented Oct 16, 2023

Fixing it now. Wouldn't it be better to split it into multiple lines, each limited to non-zero values?

vladzcloudius commented Oct 16, 2023

> Fixing it now. Wouldn't it be better to split it into multiple lines, each limited to non-zero values?

Actually, yes. IMO it's much better to see each of these timeout graphs independently. I'm hesitant about going all the way and moving them into separate graphs, and I think that what you suggest makes sense at least as a first step.
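
Splitting the expression into one query per metric, each limited to non-zero values, could look like the following toy model in Python. PromQL's `> 0` comparison against a scalar drops samples that fail the test; `nonzero` below is a hypothetical stand-in for it, and the query names and sample values are made up.

```python
def nonzero(vec):
    """Toy model of PromQL `vec > 0`: drop samples that are not positive."""
    return {k: v for k, v in vec.items() if v > 0}


# Hypothetical rates keyed by an (instance, shard) tuple.
queries = {
    "read": {("n1", "0"): 2.0, ("n1", "1"): 0.0},
    "cas_read": {},  # no data
    "range": {("n1", "0"): 1.0},
}

# Each metric is evaluated independently, so an empty one cannot hide the others.
panel = {name: nonzero(vec) for name, vec in queries.items()}

for name, series in panel.items():
    print(name, series)
```

Because no binary operator joins the three results, a metric with no data only leaves its own line empty.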

vladzcloudius commented

@amnonh which Monitoring version is going to have this fix?

amnonh commented Oct 19, 2023

It's already part of 4.5.0 (I re-released the RC). I'll either release 4.5.0 early next week, or I'll patch and release 4.4.6.

3 participants