kube-state-metrics provides wrong memory metrics for pods #748

Closed
rsoika opened this issue Oct 30, 2020 · 12 comments

rsoika commented Oct 30, 2020

What happened?
I installed the latest version of kube-prometheus, with kube-state-metrics 1.9.7, on my self-managed Kubernetes cluster.

Did you expect to see something different?

I expected the metrics service to provide correct metrics about pod memory consumption.

I verified the metrics with:

$ kubectl top pod my-app

and on the corresponding worker node with:

$ docker stats

It shows that the reported memory usage of a pod is double what it should be.

How to reproduce it (as minimally and precisely as possible):

You can compare the metric data from kubectl top with docker stats on the corresponding node.

After I uninstalled the kube-prometheus stack and installed the metrics-server instead, all memory was displayed correctly and Kubernetes scheduling behaved as expected again.

Does anybody know how this can happen and what I can do about this issue?

See also discussions here:

https://stackoverflow.com/questions/64582065/why-is-openjdk-docker-container-ignoring-memory-limits-in-kubernetes
https://stackoverflow.com/questions/64440319/why-java-container-in-kubernetes-takes-more-memory-as-limits

Environment

Debian Buster

Kubernetes 1.19.3

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:50:19Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T11:48:36Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
  • Manifests:

kube-state-metrics 1.9.7

brancz commented Nov 11, 2020

You are most likely scraping both the /metrics/cadvisor and the /metrics/resource endpoints, which causes this; at least we've had many reports of this on Slack lately. How did you install kube-prometheus and with what configuration?
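
A quick way to check for this, assuming a default kube-prometheus install in the monitoring namespace with a ServiceMonitor named kubelet (both names are assumptions), is to list the scrape paths:

$ kubectl -n monitoring get servicemonitor kubelet -o yaml | grep 'path:'

If both /metrics/cadvisor and /metrics/resource show up, and a dashboard or recording rule sums over both scrape jobs without filtering, every container's memory is counted twice.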

rsoika commented Nov 11, 2020

I cloned the kube-prometheus repo from GitHub and then ran:

$ kubectl create -f manifests/setup
$ until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
$ kubectl create -f manifests/

I used kustomize only to add an ingress setup and some Grafana environment variables for SMTP support.
So for the core installation of kube-prometheus I think I followed the given instructions.

What I did not understand is that kubectl top pod ... returned wrong metrics. After I uninstalled kube-prometheus and installed the metrics-server, kubectl top showed the correct data, and my Kubernetes cluster also behaved as expected again. The strange thing was that when I looked at the memory consumption of my pods (I am running only Java services), the numbers were double the real size, but I did not realize that these numbers were wrong. I also used k9s to monitor my cluster.

So I invested most of my time in optimizing my Java containers. After uninstalling kube-prometheus and installing the metrics-server, I can now see that everything is fine with my pods and with my cluster.

brancz commented Dec 4, 2020

Sorry for taking so long to reply, but could you elaborate in what way they seem "wrong"? Also which version of kube-prometheus are you installing?

rsoika commented Dec 4, 2020

I checked out the master branch from GitHub about a month ago.
The result was that memory usage was reported at double its real value, so the too-high numbers led to wrong behaviour. I uninstalled everything and installed the metrics-server to get correct data.

brancz commented Dec 4, 2020

Would you be open to trying again and sharing some data with us so we could debug the situation?

rsoika commented Dec 4, 2020

I'm sorry, I can't test this again at the moment because my environment can't be changed now.

rsoika commented Dec 4, 2020

I fear we must close this issue, because I cannot provide more information or do new testing. Do you have any suspicion about what the root cause of my problem might have been?

brancz commented Dec 4, 2020

No problem! Please feel free to open a new issue should you encounter anything in the future! :)

@rsoika rsoika closed this as completed Dec 4, 2020

rdxmb commented Aug 25, 2021

I am seeing this issue, with memory reported at double its real size, with:

kube-prometheus-stack
appVersion: 0.49.0
version: 17.2.2

I can reproduce this with a container running a single process that uses about 7 GB of memory:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                          
     17 root      20   0 7720256   7.4g   1740 S   6.2   5.9   0:32.41 tail
      1 root      20   0    4260    592    528 S   0.0   0.0   0:00.05 sleep                            
      6 root      20   0    5992   3980   3392 S   0.0   0.0   0:00.02 bash                             
     15 root      20   0    5732   3508   3256 S   0.0   0.0   0:00.00 bash                             
     18 root      20   0    5992   3960   3372 S   0.0   0.0   0:00.02 bash                             
     27 root      20   0    7860   3616   3052 R   0.0   0.0   0:00.00 top  

The Grafana dashboard shows about 14 GB of memory used.

Screenshot_20210825_125104

When I use the TICK stack with the Docker input plugin, a usage of about 7 GB is shown. Also note the host memory, which increased by only 7 GB.

Screenshot_20210825_125336

In prometheus/kube-state-metrics, you can also reproduce this with a process that uses more than half of the memory allowed by the Kubernetes limit (a scripted version of this setup is sketched further below):

For example:

  • Kubernetes memory limit is 27 GB
  • the script in the container uses 15 GB

result:

via TICK stack:

Screenshot_20210825_130424

via kube-prometheus-stack:

Screenshot_20210825_130113

This is impossible...

Information about the Kubernetes node that is running this:

zwei Ready <none> 216d v1.19.7 10.139.0.92 <none> Ubuntu 20.04.1 LTS 5.4.0-62-generic docker://19.3.14
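
A rough, self-contained version of the setup described above (pod name, image and sizes are placeholders, not taken from the report): start a pod whose main process holds most of its memory limit, then compare the kubelet/metrics-server view with the container runtime view on the node.

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: memhog
spec:
  containers:
  - name: memhog
    image: python:3.11-slim
    command:
    - python
    - -c
    - |
      # allocate and touch roughly 15 GiB, then keep the process alive
      b = b"x" * (15 * 1024 ** 3)
      import time; time.sleep(3600)
    resources:
      limits:
        memory: "27Gi"
EOF

# kubelet / metrics-server view of the pod:
kubectl top pod memhog
# container runtime view, on the node that runs the pod:
docker stats --no-stream

If the dashboard shows roughly twice what kubectl top and docker stats agree on, the values are being double counted somewhere in the scrape or query path.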

/reopen


rdxmb commented Aug 25, 2021

/reopen please

@BalzGuenat

Ran into the same issue. I believe this is because one tool measures the total memory usage while another only measures the working set.
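
One way to see that difference is to compare cAdvisor's memory series for a single pod. A minimal sketch, assuming a port-forward to the Prometheus service on localhost:9090 and a pod named my-app (both placeholders):

# container_memory_usage_bytes includes reclaimable page cache;
# container_memory_working_set_bytes excludes the inactive cache and is what the
# kubelet and metrics-server report, so the two series can differ considerably.
for metric in container_memory_usage_bytes container_memory_working_set_bytes container_memory_rss; do
  curl -sG http://localhost:9090/api/v1/query \
    --data-urlencode "query=${metric}"'{pod="my-app", container!=""}'
  echo
done

A dashboard built on container_memory_usage_bytes (or one that sums duplicate series) will report more than a tool that looks only at the working set.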


abhinov93 commented Nov 9, 2023

I'm facing the same issue.

Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.2

If anyone has resolved it, kindly let me know.

I'm installing with the command below:
helm install prometheus prometheus-community/kube-prometheus-stack
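
One possible sanity check, assuming the chart's Prometheus runs in the current namespace and jq is available (the prometheus-operated service is created by the Prometheus Operator): list the active scrape URLs and look for the kubelet being scraped more than once per node, or on both /metrics/cadvisor and /metrics/resource.

kubectl port-forward svc/prometheus-operated 9090 &
curl -s http://localhost:9090/api/v1/targets \
  | jq -r '.data.activeTargets[].scrapeUrl' | sort | uniq -c | sort -rn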
