
sum(container_memory_usage_bytes{...}) rule doubles values #136

Closed
rrichardson opened this issue Jan 17, 2019 · 7 comments


@rrichardson

The sum(container_* ...) rules sum up duplicates of the data provided by cAdvisor within the kubelet; the duplicates are reported under the same metric names, albeit with different labels.

The label selectors in the rules in the default rules file collect both the NodeExporter (I think?) records and the kubelet cAdvisor records. This results in values that are exactly double reality.

I think the solution here is to just use the service="kubelet" and container_name!="" label selectors; then there is no need for a sum().

Originally posted here:
prometheus-operator/prometheus-operator#2302

What did you do?
Installed the Prometheus chart and friends via Helm in a K8s cluster created by kubeadm 1.11.

What did you expect to see?
Correct values aggregated by the rules:


| Rule | State | Error | Last Evaluation | Evaluation Time |
| --- | --- | --- | --- | --- |
| record: pod_name:container_memory_usage_bytes:sum expr: sum by (pod_name) (container_memory_usage_bytes{container_name!="POD",pod_name!=""}) | OK | | 16.737s ago | 17.95ms |
| record: pod_name:container_spec_cpu_shares:sum expr: sum by (pod_name) (container_spec_cpu_shares{container_name!="POD",pod_name!=""}) | OK | | 16.719s ago | 14.89ms |
| record: pod_name:container_cpu_usage:sum expr: sum by (pod_name) (rate(container_cpu_usage_seconds_total{container_name!="POD",pod_name!=""}[5m])) | OK | | 16.704s ago | 19.75ms |
| record: pod_name:container_fs_usage_bytes:sum expr: sum by (pod_name) (container_fs_usage_bytes{container_name!="POD",pod_name!=""}) | | | | |

If the rules were changed to just use the output from the kubelet, a sum() would not be necessary. This would require setting {service="kubelet", container_name!=""}, roughly as sketched below.
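A minimal sketch of one reading of that suggestion, applied to the first rule from the table above; the service label name comes straight from the comment and is not verified, and the sum() is kept here only so the record keeps its existing shape:

```yaml
groups:
  - name: k8s.rules
    rules:
      # Sketch only: restrict the rule to kubelet/cAdvisor series and exclude the
      # pod-level parent-cgroup series (container_name=""). Label names are taken
      # from the issue text, not verified against a live cluster.
      - record: pod_name:container_memory_usage_bytes:sum
        expr: >
          sum by (pod_name) (
            container_memory_usage_bytes{service="kubelet", container_name!="POD",
                                         container_name!="", pod_name!=""}
          )
```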

What did you see instead? Under which circumstances?
In addition to NodeExporter (I think?) exporting data under these metric names, the kubelet also reports data under the same names, albeit with different labels. The kubelet reports the exact sum of all containers in the Pod, so the above rules report a value that is exactly double the actual value.

Environment

  • Prometheus Operator version:

    Image ID: docker-pullable://quay.io/coreos/prometheus-operator@sha256:faa9f8a9045092b9fe311016eb3888e2c2c824eb2b4029400f188a765b97648a

  • Kubernetes version information:

Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.4", GitCommit:"f49fa022dbe63faafd0da106ef7e05a29721d3f1", GitTreeState:"clean", BuildDate:"2018-12-14T07:10:00Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T20:08:34Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind:

    kubeadm on bare metal

  • Manifests:

https://github.com/coreos/prometheus-operator/blob/master/contrib/kube-prometheus/manifests/prometheus-rules.yaml#L22

  • Prometheus Operator Logs:

not relevant

@tomwilkie
Member

Hi @rrichardson; thanks for the issue! I'd be surprised if node_exporter is exporting container_* metrics, but cAdvisor (embedded in the kubelet) exports metrics in a hierarchical fashion, and hence if we aggregate lower levels of the hierarchy with upper levels, we can get doubling...

We solve this by dropping lower levels at scrape time, see https://github.com/grafana/jsonnet-libs/blob/master/prometheus-ksonnet/lib/prometheus-config.libsonnet#L287

Could you confirm that adding image!="" to the rules fixes this for you? We should add that here, and see if we can have the Prometheus Operator also drop those series.
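A rough sketch of the scrape-time drop described above, written as a plain Prometheus scrape config rather than the linked jsonnet; the job name and exact regex here are assumptions, not a copy of that config:

```yaml
scrape_configs:
  - job_name: kubelet                 # illustrative job name
    # kubernetes_sd_configs, tls_config, etc. omitted for brevity
    metric_relabel_configs:
      # Drop cAdvisor series that have an empty image label (the aggregate,
      # parent-cgroup series), keeping only per-container series.
      - source_labels: [__name__, image]
        regex: 'container_.+;'        # container_* metric joined with image=""
        action: drop
```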

@rrichardson
Author

rrichardson commented Jan 19, 2019 via email

@metalmatze
Member

The Prometheus-Operator supports relabeling on the Endpoints of a ServiceMonitor:
https://github.com/coreos/prometheus-operator/blob/master/pkg/apis/monitoring/v1/types.go#L558

What we would need to do is to update this endpoint in the kube-prometheus stack:
https://github.com/coreos/prometheus-operator/blob/master/contrib/kube-prometheus/jsonnet/kube-prometheus/prometheus/prometheus.libsonnet#L280

Would you like to open a PR, @rrichardson? Let me know; otherwise I can make the change too, but it's a nice little contribution if you want it.
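A hedged sketch of what that could look like on the kubelet ServiceMonitor's cAdvisor endpoint; the port, path, selector, and regex below are illustrative assumptions rather than the actual kube-prometheus manifest:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubelet
  namespace: monitoring
spec:
  endpoints:
    - port: https-metrics             # illustrative endpoint settings
      path: /metrics/cadvisor
      metricRelabelings:
        # Drop cAdvisor series with an empty image label (the parent-cgroup
        # "total" series) before ingestion, so sums over containers stay correct.
        - sourceLabels: [__name__, image]
          regex: 'container_.+;'
          action: drop
  selector:
    matchLabels:
      k8s-app: kubelet                # illustrative selector
```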

@rrichardson
Author

@metalmatze Sure. I'll take a whack at it.

@rrichardson
Author

I still don't fully understand Prometheus or the scope of cAdvisor and its hierarchy of metrics.

But shouldn't https://github.com/grafana/jsonnet-libs/blob/master/prometheus-ksonnet/lib/prometheus-config.libsonnet#L286-L292 already solve my problem?

It should be dropping container_.* series where image=="". When I filter out image="", my result is correct. But shouldn't those series not exist in the first place?
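For concreteness, the image filter being described would slot into the recording rules from the issue body roughly like this (a sketch of the image!="" variant only, not necessarily the change that was eventually merged):

```yaml
groups:
  - name: k8s.rules
    rules:
      # Sketch: excluding the parent-cgroup series (image="") yields the correct
      # per-pod sum, per the observation above.
      - record: pod_name:container_memory_usage_bytes:sum
        expr: >
          sum by (pod_name) (
            container_memory_usage_bytes{container_name!="POD", pod_name!="", image!=""}
          )
```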

@Sayrus
Contributor

Sayrus commented Jul 6, 2020

Hey @rrichardson ,

Thanks for documenting your findings. I'm currently having the exact same problem with the metrics from cAdvisor not playing well with sum, for the exact same reason you opened this issue: the "total" (image="") is not filtered out because the relabeling rule does not exist in kube-prometheus.

> I still don't fully understand Prometheus or the scope of cAdvisor and its hierarchy of metrics.
>
> But shouldn't https://github.com/grafana/jsonnet-libs/blob/master/prometheus-ksonnet/lib/prometheus-config.libsonnet#L286-L292 already solve my problem?
>
> It should be dropping container_.* series where image=="". When I filter out image="", my result is correct. But shouldn't those series not exist in the first place?

I see that the PR on kube-prometheus was closed because this metric is not redundant and only the Grafana dashboards should be fixed to filter it out.
However, I can't seem to find any patch on the dashboard side.

How did you end up fixing the issue? Were the changes upstreamed and I'm missing something?

Edit:

To add a bit more context, here is an example of a value I get doubled:
https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/master/dashboards/resources/pod.libsonnet#L106
[screenshot: dashboard panel showing two series for the same pod, one with container="" and one with container="manager"]

Just like in your issue, one series is the pod total (container=""), the other is the container itself (container="manager"); container="POD" is already filtered out by the query.

Sayrus pushed a commit to Sayrus/kubernetes-mixin that referenced this issue Jul 6, 2020
…ner_cpu_cfs

Currently, Kubelet cAdvisor exports metrics for the parent cgroup as
well as for each container. This leads to having "duplicate metrics" and
especially leads to strange or wrong visualisations.

Filtering by `container!=""` excludes metrics from the parent cgroup.
This patch avoids having two time series in the CPU Throttling panel.

Related to kubernetes-monitoring#136.
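The change the commit describes boils down to adding container!="" to the affected expressions. As a rough illustration (the record name below is made up and the real kubernetes-mixin dashboard query differs), the pattern is:

```yaml
groups:
  - name: example.rules
    rules:
      # Hypothetical rule, purely illustrative: container!="" excludes the
      # parent-cgroup series so only per-container throttling data remains.
      - record: pod_container:container_cpu_cfs_throttled_periods:increase5m
        expr: >
          sum by (namespace, pod, container) (
            increase(container_cpu_cfs_throttled_periods_total{container!="", container!="POD"}[5m])
          )
```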
csmarchbanks pushed a commit that referenced this issue Jul 6, 2020
…ner_cpu_cfs (#456)

Currently, Kubelet cAdvisor exports metrics for the parent cgroup as
well as for each container. This leads to having "duplicate metrics" and
especially leads to strange or wrong visualisations.

Filtering by `container!=""` excludes metrics from the parent cgroup.
This patch avoids having two time series in the CPU Throttling panel.

Related to #136.

Signed-off-by: Mathis Raguin <mathis@cri.epita.fr>
@paulfantom
Member

I think this can be closed after #512 was merged.
