-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix "job" labels in Grafana dashboards' queries #1199
Fix "job" labels in Grafana dashboards' queries #1199
Conversation
@@ -4602,15 +4602,15 @@ data: | |||
] | |||
}, | |||
"datasource": "prometheus", | |||
"definition": "query_result(count(up{job=\"scylla\"}) by (dc))", | |||
"definition": "query_result(count(up{job=~\"$cluster|$^\"}) by (dc))", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you pls explain the syntax beyond '${cluster}'
to me? in particular why the "or" (|
) and what $^
means here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I shamelessly copied it over from the other queries - and I suppose you did the same, as it originates from scylla-monitoring dashboards. I actually tried to find something on it in the docs with no success. This essentially substitutes the variables and then matches a regex, so for a cluster named simple-cluster
you end up matching e.g. simple-cluster|$^
, but I don't see how that's any different than matching just simple-cluster
, with or without the regex. @amnonh would you be able to explain? Is that some grafana/prom query magic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tnozicka an example from scylladb/scylla-monitoring
as requested: https://github.com/scylladb/scylla-monitoring/blob/606dacc88e036e884bbbe1d7171e739ad296ae55/grafana/scylla-overview.template.json#L79. Most of the expressions there use the same regex. Interestingly enough not all of them do, some use simple '$cluster'
: https://github.com/scylladb/scylla-monitoring/blob/606dacc88e036e884bbbe1d7171e739ad296ae55/grafana/scylla-overview.template.json#L360.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine deferring that one and wait for @amnonh's input. Please file an issue to track this and we can merge this at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #1209
e83c276
to
f7306ad
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
/lgtm
(better then what we have now an we'll track the regex reasoning in a follow up issue)
Description of your changes:
Currently there is a bug breaking our Grafana dashboards introduced in #1111. The "job" label in some of the PromQL queries is hardcoded to "scylla", while the ServiceMonitor manifests set it to the value of "scylla/cluster" labels of the identity Service.
scylla-operator/assets/monitoring/prometheus/v1/scylladb.servicemonitor.yaml
Line 8 in ee48da7
For reference see https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#servicemonitor.
This PR fixes it by setting it to a
cluster
label defined in the ServiceMonitor:scylla-operator/assets/monitoring/prometheus/v1/scylladb.servicemonitor.yaml
Line 25 in ee48da7