kube-state-metrics provides wrong memory metrics for pods #748

Closed
rsoika opened this issue Oct 30, 2020 · 12 comments

rsoika commented Oct 30, 2020

What happened?
I installed the latest version of kube-prometheus, with kube-state-metrics 1.9.7, on my self-managed Kubernetes cluster.

Did you expect to see something different?

I expected the metrics service to provide correct metrics about pod memory consumption.

I verified the metrics with:

$ kubectl top pod my-app

and on the corresponding worker node with:

$ docker stats

It shows that the reported memory usage of a pod is double what it should be.

How to reproduce it (as minimally and precisely as possible):

You can compare the metric data from kubectl top with docker stats on the corresponding node.

After I uninstalled the kube-prometheus stack and installed the metrics-server instead, all memory was displayed correctly and Kubernetes scheduling behaved as expected again.

Does anybody know how this can happen and what I can do about this issue?

See also discussions here:

https://stackoverflow.com/questions/64582065/why-is-openjdk-docker-container-ignoring-memory-limits-in-kubernetes
https://stackoverflow.com/questions/64440319/why-java-container-in-kubernetes-takes-more-memory-as-limits

Environment

Debian Buster

Kubernetes 1.19.3

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:50:19Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T11:48:36Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
  • Manifests:

kube-state-metrics 1.9.7

brancz commented Nov 11, 2020

You are most likely scraping both the /metrics/cadvisor and the /metrics/resource endpoints, which causes this; at least we've had many reports of this on Slack lately. How did you install kube-prometheus and with what configuration?
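
A quick way to check for this, assuming a default kube-prometheus install in the monitoring namespace with a ServiceMonitor named kubelet (both names are assumptions), is to list the scrape paths:

$ kubectl -n monitoring get servicemonitor kubelet -o yaml | grep 'path:'

If both /metrics/cadvisor and /metrics/resource show up, and a dashboard or recording rule sums over both scrape jobs without filtering, every container's memory is counted twice.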

rsoika commented Nov 11, 2020

I cloned the kube-prometheus repo from GitHub and then ran:

$ kubectl create -f manifests/setup
$ until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
$ kubectl create -f manifests/

I used kustomize only to add an ingress setup and some Grafana environment variables for SMTP support.
So for the core installation of kube-prometheus I think I followed the given instructions.

What I did not understand is that kubectl top pod ... returned wrong metrics. After I uninstalled kube-prometheus and installed the metrics-server, kubectl top showed the correct data, and my Kubernetes cluster also behaved as expected again. The strange thing was that when I looked at the memory consumption of my pods (I am running only Java services), the numbers were double the real size, but I did not realize that these numbers were wrong. I also used k9s to monitor my cluster.

So I invested most of my time in optimizing my Java containers. After uninstalling kube-prometheus and installing the metrics-server, I can now see that everything is fine with my pods and with my cluster.

brancz commented Dec 4, 2020

Sorry for taking so long to reply, but could you elaborate in what way they seem "wrong"? Also which version of kube-prometheus are you installing?

rsoika commented Dec 4, 2020

I checked out the master branch from GitHub about a month ago.
The result was that memory usage was reported at double its real value, so the too-high numbers led to wrong behaviour. I uninstalled everything and installed the metrics-server to get correct data.

brancz commented Dec 4, 2020

Would you be open to trying again and sharing some data with us so we could debug the situation?

rsoika commented Dec 4, 2020

I'm sorry, I can't test this again at the moment because my environment can't be changed now.

rsoika commented Dec 4, 2020

I fear we must close this issue, because I cannot provide more information or do new testing. Do you have any suspicion about what the root cause of my problem might have been?

brancz commented Dec 4, 2020

No problem! Please feel free to open a new issue should you encounter anything in the future! :)

@rsoika rsoika closed this as completed Dec 4, 2020

rdxmb commented Aug 25, 2021

I am seeing this issue, with memory reported at double its real size, with:

kube-prometheus-stack
appVersion: 0.49.0
version: 17.2.2

I can reproduce this with a container running a single process that uses about 7 GB of memory:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                          
     17 root      20   0 7720256   7.4g   1740 S   6.2   5.9   0:32.41 tail
      1 root      20   0    4260    592    528 S   0.0   0.0   0:00.05 sleep                            
      6 root      20   0    5992   3980   3392 S   0.0   0.0   0:00.02 bash                             
     15 root      20   0    5732   3508   3256 S   0.0   0.0   0:00.00 bash                             
     18 root      20   0    5992   3960   3372 S   0.0   0.0   0:00.02 bash                             
     27 root      20   0    7860   3616   3052 R   0.0   0.0   0:00.00 top  

The Grafana dashboard shows about 14 GB of memory used.

Screenshot_20210825_125104

When I use the TICK stack with the Docker input plugin, a usage of about 7 GB is shown. Also note the host memory, which increased by only 7 GB.

Screenshot_20210825_125336

In prometheus/kube-state-metrics, you can also reproduce this with a process that uses more than half of the memory allowed by the Kubernetes limit (a scripted version of this setup is sketched further below):

For example:

  • Kubernetes memory limit is 27 GB
  • the script in the container uses 15 GB

result:

via TICK stack:

Screenshot_20210825_130424

via kube-prometheus-stack:

Screenshot_20210825_130113

This is impossible...

Information about the Kubernetes node that is running this:

zwei Ready <none> 216d v1.19.7 10.139.0.92 <none> Ubuntu 20.04.1 LTS 5.4.0-62-generic docker://19.3.14
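
A rough, self-contained version of the setup described above (pod name, image and sizes are placeholders, not taken from the report): start a pod whose main process holds most of its memory limit, then compare the kubelet/metrics-server view with the container runtime view on the node.

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: memhog
spec:
  containers:
  - name: memhog
    image: python:3.11-slim
    command:
    - python
    - -c
    - |
      # allocate and touch roughly 15 GiB, then keep the process alive
      b = b"x" * (15 * 1024 ** 3)
      import time; time.sleep(3600)
    resources:
      limits:
        memory: "27Gi"
EOF

# kubelet / metrics-server view of the pod:
kubectl top pod memhog
# container runtime view, on the node that runs the pod:
docker stats --no-stream

If the dashboard shows roughly twice what kubectl top and docker stats agree on, the values are being double counted somewhere in the scrape or query path.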

/reopen


rdxmb commented Aug 25, 2021

/reopen please

@BalzGuenat

Ran into the same issue. I believe this is because one tool measures the total memory usage while another only measures the working set.
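
One way to see that difference is to compare cAdvisor's memory series for a single pod. A minimal sketch, assuming a port-forward to the Prometheus service on localhost:9090 and a pod named my-app (both placeholders):

# container_memory_usage_bytes includes reclaimable page cache;
# container_memory_working_set_bytes excludes the inactive cache and is what the
# kubelet and metrics-server report, so the two series can differ considerably.
for metric in container_memory_usage_bytes container_memory_working_set_bytes container_memory_rss; do
  curl -sG http://localhost:9090/api/v1/query \
    --data-urlencode "query=${metric}"'{pod="my-app", container!=""}'
  echo
done

A dashboard built on container_memory_usage_bytes (or one that sums duplicate series) will report more than a tool that looks only at the working set.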


abhinov93 commented Nov 9, 2023

I'm facing the same issue.

Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.2

If anyone has resolved it, kindly let me know.

I'm installing with the command below:
helm install prometheus prometheus-community/kube-prometheus-stack
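
One possible sanity check, assuming the chart's Prometheus runs in the current namespace and jq is available (the prometheus-operated service is created by the Prometheus Operator): list the active scrape URLs and look for the kubelet being scraped more than once per node, or on both /metrics/cadvisor and /metrics/resource.

kubectl port-forward svc/prometheus-operated 9090 &
curl -s http://localhost:9090/api/v1/targets \
  | jq -r '.data.activeTargets[].scrapeUrl' | sort | uniq -c | sort -rn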
