cAdvisor v0.39.1 is causing massively increased IOPS (from 100-300 IOPS to 10k-20k IOPS) #2947
@bobbypage @JohnnyG235711, could you take a look at this issue? Is there a way to opt out of this metric so that we can avoid the extra load?
Can you try running with `--disable_metrics=disk`? I think you'll need something like: cadvisor/container/docker/handler.go, lines 375 to 377 in 8c0666a
Let's pinpoint whether the commit you mentioned is the root cause of this.
Thanks @bobbypage, I will give it a try and keep you updated. Thanks!
Hi @bobbypage, I can confirm that the node load and the IOPS on the NetApp go back to the normal range as soon as I set the `--disable_metrics=disk` flag.
One important question remains: which metrics do we lose when we simply disable all `disk` metrics? Thanks & regards,
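For reference, one way to set the flag on a containerized cAdvisor looks roughly like the sketch below (the image tag and volume mounts follow the upstream README; in a Kubernetes DaemonSet the same flag would go into the container's `args`, so adjust to your deployment):

```shell
# Sketch: run cAdvisor v0.39.1 with disk metrics disabled.
# Mounts mirror the upstream quick-start; trim them to what your host needs.
docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:ro \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=8080:8080 \
  --detach=true \
  --name=cadvisor \
  gcr.io/cadvisor/cadvisor:v0.39.1 \
  --disable_metrics=disk
```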
Interesting, thanks for the note. You can see usages of disk metrics here. They are mostly used in the docker / raw handlers to get file system stats. The docker handler's file system stats code is here, which fills the FsStats struct. File system stats can also be obtained for non-docker containers (i.e. other types of cgroups) by the raw handler.
Hi @bobbypage, thanks for your answer. I just had a more detailed look at the returned metrics, once with the `--disable_metrics=disk` arg set and once without it.
So... this workaround seems to work. Nevertheless, it's just a workaround and not a fix for the root cause. Do you think there's something you could do here? Edit: here's a Bash script to reproduce the issue: https://gist.github.com/PhilipSchmid/c19d085363809e5ba2356037e5e66551 Thanks & regards,
https://10.10.10.10:10250/metrics/cadvisor — this API is not served by the cAdvisor pod; it's actually a request to the kubelet. The kubelet uses an embedded cAdvisor which can also provide metrics.
Thanks for the input! I just redid the same test against the actual cAdvisor endpoint:
As we can see, there is a difference of 19 metrics. Even odder is that there are more metrics available when `--disable_metrics=disk` is set. To be more precise, the following 24 metrics are only available when the flag is set:
On the other hand, the following 5 metrics are only available when the flag is not set:
Is there a better way to test this? This test somehow does not make sense to me. Thanks & regards,
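A comparison like the one above can be scripted with `comm` on the sorted metric names of two scrapes. This is a self-contained sketch: the two `.prom` files are fabricated stand-ins for real dumps of the `/metrics` endpoint taken with and without `--disable_metrics=disk` (the metric names used here are just examples, not the actual 24/5 sets from this issue):

```shell
#!/bin/sh
# Fabricated sample scrapes so the script runs stand-alone;
# in practice: curl -s http://<node>:8080/metrics > default.prom
printf 'container_fs_usage_bytes 1\ncontainer_cpu_usage_seconds_total 2\n' > default.prom
printf 'container_cpu_usage_seconds_total 2\ncontainer_tcp_usage_total 3\n' > disabled.prom

metric_names() {
    # drop comment lines, keep the metric name, strip labels, dedupe
    grep -v '^#' "$1" | awk '{print $1}' | sed 's/{.*//' | sort -u
}

metric_names default.prom  > a.txt
metric_names disabled.prom > b.txt

echo "only WITHOUT --disable_metrics=disk:"; comm -23 a.txt b.txt
echo "only WITH    --disable_metrics=disk:"; comm -13 a.txt b.txt
```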
@PhilipSchmid Looks like it is working as designed: there is a default ignored-metrics list, check this (lines 78 to 93 in 8b96dc6).
However, if you disable only one metric, in this case with --disable_metrics=disk, it overwrites the default ignored-metrics set (check line 101 in 8b96dc6).
@bobbypage Please confirm whether this is a side effect of --disable_metrics usage or whether it is by design. Thanks!
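The replace-instead-of-merge behavior described above can be illustrated with a tiny sketch. This is not cAdvisor's actual code (which is Go, at the lines referenced above), and the default list here uses illustrative metric names; the point is only that a user-supplied `--disable_metrics` value replaces the default ignore list rather than being added to it:

```shell
#!/bin/sh
# Illustrative default ignore list (names are examples, not cAdvisor's exact set)
DEFAULT_IGNORE="tcp udp process hugetlb referenced_memory cpu_topology resctrl"

effective_ignore() {
    # $1: value of --disable_metrics; empty if the flag was not passed
    if [ -n "$1" ]; then
        echo "$1"               # override: the defaults are dropped entirely
    else
        echo "$DEFAULT_IGNORE"  # no flag: the built-in defaults apply
    fi
}

effective_ignore ""      # prints the full default list
effective_ignore "disk"  # prints only "disk" -- tcp, udp, ... become ENABLED
```

This explains why disabling `disk` *added* metrics in the earlier test: the default-ignored groups were implicitly re-enabled.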
@xiaoping8385 Thanks for the clarification! Now it makes sense to me. Nevertheless, I personally think this isn't the desired behavior, but let's give @bobbypage some time to confirm. Thanks & regards
After we upgraded cAdvisor from v0.36.0 to v0.39.1, we noticed the load of multiple worker nodes massively increased, up to 150 (load15, 4 vCPU per worker)! At the same time we noticed a massive increase of disk IOPS on our NetApp storage system, which provides the NFS shares for the RWX volumes some workload pods use (from 100-300 IOPS to about 10k-20k IOPS!).
As soon as we terminate the cAdvisor pods, the node load and the NetApp NFS share IOPS immediately decrease back to normal values.
We ran a self-made bpftrace program to detect the massive number of "nfs_statfs" and "nfs_getattr" events from the cAdvisor process (the shown call counts (rightmost values) are measured over a 5-second interval):
12:12:19
@[17941, cadvisor, nfs_statfs]: 11993
@[17941, cadvisor, nfs_getattr]: 36093
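A bpftrace one-liner producing output in this shape could look like the following sketch (it needs root and a kernel where the `nfs_statfs`/`nfs_getattr` functions are probeable; the original program from this report is not shown in the issue, so this is a reconstruction, not the author's exact script):

```shell
# Count nfs_statfs/nfs_getattr calls per (pid, comm) and dump every 5 s
sudo bpftrace -e '
kprobe:nfs_statfs,
kprobe:nfs_getattr
{
    @[pid, comm, func] = count();
}
interval:s:5
{
    time("%H:%M:%S\n");
    print(@);
    clear(@);
}'
```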
I searched for a commit related to fs metric support; not sure if it is connected to this. Can someone take a look please? Thanks! 8c0666a#diff-dce7cce055566bed799f788cd0048e209a27a473c0f48b956fa1f1780e80d2c1