Rancher-Monitoring: Query on Cluster Grafana Dashboard reports incorrect values #24343
Comments
Same issue, and the memory metric is also exposed twice. |
@toddexp I think the added variable should be |
In my environment, the Prometheus metrics behind the memory and CPU queries on the cluster page in Grafana do not have image labels. When I applied the
Taking a closer look, it does appear that the memory graphs are also affected and their queries are incorrect. However, the memory metrics show only a minor increase, whereas the CPU metrics are doubled. I am not sure what Rancher has added for metrics gathering, but there seem to be these entries that have these metrics in them: |
@loganhz is there any plan to resolve this bug? |
I was hoping this bug had been corrected by the fix for this issue. I just installed Rancher 2.4.2 with monitoring v0.1.0, and the Grafana graphs still show double what they should. |
Continuing to follow this issue. |
Same issue on Rancher v2.3.6,
while the queries are:
So the results are all doubled, which is really confusing. |
I add |
I've got a 2.4.8 Rancher installation and pod memory usage is indeed effectively doubled when viewed in Grafana at the cluster level. |
Same on version 2.5.5, for example with Pod CPU Usage
To
|
Hello, this behavior is still present in Rancher 2.9 with the Rancher monitoring stack, chart version "104.0.2+up45.31.1", on RHEL 9. Any advice? |
#44726 is a similar, possibly duplicate, issue that has a PR. |
What kind of request is this (question/bug/enhancement/feature request):
Bug
Steps to reproduce (least amount of steps as possible):
Enable cluster monitoring
View cluster Grafana dashboard
Result:
The Pod CPU Usage and All Process CPU Usage sections of the dashboard are inaccurate. These graphs on the dashboard are doubled due to the query that was used.
Other details that may be helpful:
Rancher monitoring seems to expose the same metrics in more than one form. Because of the queries used to build the graphs, these overlapping series are summed together.
Incorrect Pod CPU Usage query:
sum (rate (container_cpu_usage_seconds_total{pod_name!="",container_name!="POD",node=~"^$Node$"}[5m])) by (pod_name)
Incorrect All Process CPU Usage query:
sum (rate (container_cpu_usage_seconds_total{namespace!="",pod_name!="",node=~"^$Node$"}[5m])) by (namespace, pod_name)
Corrected Pod CPU Usage query:
sum (rate (container_cpu_usage_seconds_total{pod_name!="",container_name!="",container_name!="POD",node=~"^$Node$"}[5m])) by (pod_name)
Added container_name!="" to the query
Corrected All Process CPU Usage query:
sum (rate (container_cpu_usage_seconds_total{namespace!="",pod_name!="",container_name!="",node=~"^$Node$"}[5m])) by (namespace, pod_name)
Added container_name!="" to the query
There was a similar issue opened for daemonset Grafana graphs: #20162
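To illustrate why the extra matcher matters, here is a minimal Python sketch of the aggregation. It assumes the doubling comes from an aggregate series per pod whose container_name label is empty, which is what the fix above implies but which this issue does not confirm. The sample series, pod name, and millicore values are hypothetical, not taken from a real cluster.

```python
from collections import defaultdict

# Hypothetical samples: (labels, CPU rate in millicores).
# ASSUMPTION: the series with container_name="" is a pod-level aggregate
# that repeats the sum of the per-container series.
SAMPLES = [
    ({"pod_name": "web-1", "container_name": "app"}, 300),
    ({"pod_name": "web-1", "container_name": "sidecar"}, 100),
    ({"pod_name": "web-1", "container_name": "POD"}, 0),
    ({"pod_name": "web-1", "container_name": ""}, 400),
]


def sum_by_pod(samples, keep):
    """Mimic `sum(...) by (pod_name)` over the samples accepted by `keep`."""
    totals = defaultdict(int)
    for labels, value in samples:
        if keep(labels):
            totals[labels["pod_name"]] += value
    return dict(totals)


def original_matchers(labels):
    # Mirrors pod_name!="", container_name!="POD"
    return labels["pod_name"] != "" and labels["container_name"] != "POD"


def corrected_matchers(labels):
    # Same as above, plus container_name!=""
    return original_matchers(labels) and labels["container_name"] != ""


doubled = sum_by_pod(SAMPLES, original_matchers)
fixed = sum_by_pod(SAMPLES, corrected_matchers)
print("original query: ", doubled)
print("corrected query:", fixed)
```

Under that assumption, the original matchers count the aggregate series on top of the per-container series, so the pod total comes out twice as large as with the corrected matchers.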
Environment information
Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI): 2.3.0
gz#15744