High CPU usage with low number of containers #1774
A number of performance tweaks have been made since then. From a quick scan, it looks like a large amount of that time is spent handling external requests. Might be worth measuring the number of requests cadvisor is serving, and seeing what its resource usage is when it is not handling any requests. |
Same issue: two 8-core CPUs with 9 running containers; cAdvisor takes ~13% of processor resources. |
Same here: 1-core VM with ~10 containers (running but doing nothing). cAdvisor takes 18% CPU. |
Same here. Lots of iowait. cadvisor USED mem (top) > 2G. Running |
@bmerry the graph you show looks like about what I would expect. Assuming your containers don't have anything in their r/w layer, most cpu usage generally comes from reading from cgroup files, of which there are many for the memory cgroup. Not sure about the memory leak. I'm not super familiar with the cgroup implementation, but I know they aren't real files, and are "stored" at least partially in the dentry cache. Accessing cgroup files repeatedly should make the dentry cache grow, but it should be reclaimed when free memory gets low. |
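As a rough illustration of what that means in practice (a sketch assuming the cgroup v1 memory hierarchy at its usual mount point, not commands taken from this thread):

```sh
# Count the memory-cgroup files cadvisor may re-read on every housekeeping pass
find /sys/fs/cgroup/memory -type f | wc -l

# Watch the dentry cache, which grows as those pseudo-files are re-read
sudo slabtop -o | grep -w dentry
grep -E '^(SReclaimable|SUnreclaim)' /proc/meminfo
```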
@dashpole thanks for taking a look. I've noticed that the pprof report claims ~4% CPU usage, which doesn't match what top reports (40%+). I've confirmed this by taking another profile; the graph header reports ~4% while top showed 30%+ over the entire 30s. So there is something odd in the way the profiler works - perhaps it only profiles user time? htop shows the CPU usage is mostly in the kernel. Running perf top, the top hits are
which does seem to confirm your idea that it's related to cgroups. The machine with the graph (using ~40% CPU) has 14669 files in /sys/fs/cgroup. FWIW, I've just tried with cadvisor 0.30.2 and
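For anyone trying to reproduce that kind of profile, commands along these lines should show whether the time is going to kernel-side memcg/cgroup functions (the exact invocation and output above aren't shown, so treat this as a sketch):

```sh
# Live kernel+user profile of the cadvisor process (quit with 'q')
sudo perf top -g -p "$(pgrep -x cadvisor)"

# Or record for 30s and print a text report instead of the live TUI
sudo perf record -g -p "$(pgrep -x cadvisor)" -- sleep 30
sudo perf report --stdio | head -n 40
```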
I'm hoping so - for now it's been "leaking" slowly enough that there hasn't been any memory pressure. On one machine the memory was in SUnreclaim rather than SReclaimable, but when I dropped the dentry cache manually the memory was returned. |
It seems like it's definitely some slow path in the kernel - simply reading the cgroup memory stats is slow even outside cadvisor. That sounds more like a kernel issue than cadvisor's problem, and if I get time I may try to take it up on the LKML, but if you have any suggestions on fixing or diagnosing it, I'll be happy to hear them. This is probably a separate issue from the original report, where pprof showed high CPU usage. @ZOXEXIVO @schabrolles are you seeing the same behaviour I am? |
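A quick way to check whether the slow kernel path is in the cgroup stat read itself, independent of cadvisor (cgroup v1 path assumed):

```sh
# If this alone takes a noticeable fraction of a second, the kernel-side
# aggregation across (possibly zombie) child cgroups is the slow path,
# independent of anything cadvisor does.
time cat /sys/fs/cgroup/memory/memory.stat > /dev/null
```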
After some discussions on the linux-mm mailing list and some tests, it sounds like the problem may be "zombie" cgroups: cgroups that have no processes and have been deleted but still have memory charged to them (in my case, from the dentry cache, but it could also be from page cache or tmpfs). These are still iterated over when computing the top-level memory stats. We had a service that was repeatedly failing and being restarted (by systemd), which probably churned through a lot of cgroups over a few weeks. I still need to experiment with ways to fix the underlying problem (including checking if it is better in newer kernels), but I'd like to find out if there is a way to work around it. In particular, we tend to use cadvisor only to get per-Docker-container metrics into Prometheus, and not so much for aggregate or system metrics (we have node-exporter for that). So if there is a way we can turn off collection of the aggregate/root cgroup stats, that would work around it for us. |
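For anyone wanting to check for such zombie cgroups on their own nodes, a sketch (not from the original comment):

```sh
# cgroup v2 (unified hierarchy): deleted-but-not-yet-freed cgroups are
# counted explicitly at the root of the hierarchy
grep dying /sys/fs/cgroup/cgroup.stat

# cgroup v1 has no direct counter; a slow read of the root memory.stat
# (see the timing check earlier in the thread) after heavy container churn
# is the usual symptom.
```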
We don't have the option to disable collection of the root cgroup. We have an option --docker_only, but that keeps the root cgroup around. |
Do you think that would be reasonably easy for someone not familiar with the code to implement as a command-line option, or is it pretty core? |
I would rather not add a flag for that, but you can just remove the registration of the raw factory here: https://github.com/google/cadvisor/blob/master/manager/manager.go#L335, and that would turn off collection for all cgroups that are not containers. I am planning to introduce a command line flag to control which factories are used in the future, which should allow this behavior without rebuilding cAdvisor |
That didn't work for me:
I also tried taking out the |
@bmerry curious if you have made any progress debugging this since July? |
I never did get all the way to the bottom of it. Some step in creating and destroying cgroups interacts in a non-deterministic way with the reads that cadvisor does, causing cached dentries that keep the cgroups alive as zombies. There is a patch (which I assume will go into the next Linux release) which makes the stats collection a lot faster and thus reduces the impact, but doesn't prevent the zombies in the first place. I gave up trying to fix the problem, and now we have a cron job that times reading the cgroup memory stats and drops the dentry cache when it gets too slow. |
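A minimal sketch of such a cron job, assuming a cgroup v1 layout; the threshold and the drop_caches value are illustrative, not the original script:

```sh
#!/bin/sh
# Hypothetical watchdog, run from cron as root: if reading the root memory.stat
# takes longer than ~1s, drop reclaimable slab objects (dentries/inodes) to
# release the zombie cgroups.
STAT=/sys/fs/cgroup/memory/memory.stat
start=$(date +%s%N)
cat "$STAT" > /dev/null
elapsed_ms=$(( ($(date +%s%N) - start) / 1000000 ))
if [ "$elapsed_ms" -gt 1000 ]; then
    sync
    echo 2 > /proc/sys/vm/drop_caches
fi
```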
Thanks a lot @bmerry, dropping the dentry cache fully fixed my cadvisor CPU issue. I've done that using the following command:
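The command itself isn't shown above; from the surrounding comments it was presumably the drop_caches write below (run as root):

```sh
# Free reclaimable slab objects (dentries and inodes); this releases the memory
# pinning the zombie cgroups. 2 = slab only, 3 = slab + page cache.
sync
echo 2 > /proc/sys/vm/drop_caches
```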
More information about drop_caches here. |
I am seeing the same on Ubuntu 16.04 even after clearing all caches. cadvisor goes up to 50% CPU every few seconds. |
Same issue here on Ubuntu 18.04.1, kernel 4.15.0-43. The cadvisor options and echo 2 > /proc/sys/vm/drop_caches don't help for me :( Any updates on the issue? |
+1 |
@Aschenbecher then your high CPU load issue is not related to memory stats; you need to profile cadvisor using pprof in order to find what causes those high CPU loads. @zerthimon +1 comments are always useless; use GitHub's thumbs-up reaction instead. Please do not pollute issues with useless comments. Thanks |
thanks, I will take a look into it! |
+1, CPU at 200% when the housekeeping func is running |
+1 |
We found this issue while tracking down sporadic 100ms+ latency to services on a bare-metal kube cluster. We narrowed it down to packets arriving on the system stalling in NIC RX queue processing, which caused packets to batch up and then be processed suddenly around 100ms later on random CPU cores at certain intervals. The work-around mentioned above did work for us, making the latency spikes go away.
It's worth noting that syscalls as slow as this can cause worse follow-on effects than high CPU, like what we observed. Hopefully the above helps some future person find this issue more quickly when searching for the more insidious symptom. |
long-term-issue (note to self) |
Nice write-up from @theojulienne |
try using |
same result with |
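The specific flags being referred to aren't shown above. Flags commonly suggested for reducing cAdvisor's housekeeping cost look roughly like the following; check `cadvisor --help` for the exact names and accepted values in your version:

```sh
# Only track Docker containers (the root cgroup is still collected),
# collect less often, and drop the more expensive per-container collectors.
cadvisor \
  --docker_only=true \
  --housekeeping_interval=10s \
  --disable_metrics=disk,network,tcp,udp,percpu,sched,process
```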
@tmpjg how many cores is your node? |
@dashpole 4 cores.
processor : 0
model name : ARMv7 Processor rev 4 (v7l)
BogoMIPS : 38.40
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4
processor : 1
model name : ARMv7 Processor rev 4 (v7l)
BogoMIPS : 38.40
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4
processor : 2
model name : ARMv7 Processor rev 4 (v7l)
BogoMIPS : 38.40
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4
processor : 3
model name : ARMv7 Processor rev 4 (v7l)
BogoMIPS : 38.40
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4
Hardware : BCM2835
Revision : a020d3
Serial : 00000000e14271a3
Model : Raspberry Pi 3 Model B Plus Rev 1.3 |
There is a discussion about a cgroup bug; see "Showing /sys/fs/cgroup/memory/memory.stat very slow on some machines", which may be useful for this issue. |
@theojulienne @bmerry On CentOS Linux release 7.6.1810 (Core), echo 2 > /proc/sys/vm/drop_caches runs without stopping, the CPU load stays high, and there are too many "kworker" kernel threads. Which kernel version can use echo 2 > drop_caches safely? I will reinstall the system. Thanks. |
I would suggest either upgrading to a newer kernel or, if that's not an option, disabling the code that is polling this stat file. There are some more details on this from the kernel side from the RHEL folks, with a nice summary of the patches and trace scripts: https://bugzilla.redhat.com/show_bug.cgi?id=1795049 |
@theojulienne OK, I'll try it later and get back to you soon. Thanks |
The kernel release versions containing the bugfix are listed at https://bugzilla.redhat.com/show_bug.cgi?id=1795049 |
Performance of memory.stat was improved in https://spinics.net/lists/cgroups/msg21876.html |
Environment
Version: v0.24.1 (note: we are using this version due to the Prometheus label issues in later versions)
OS: Ubuntu 16.04, 4.4.0-31-generic, x86_64
Docker: Docker version 1.12.6, build 78d1802
Containers: 4 (including cadvisor)
Cores: 2
Docker
Dockerfile:
Docker command:
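The Dockerfile and run command aren't shown above; for reference, the stock run command from the cAdvisor documentation of that era looked roughly like this (a reconstruction, not necessarily the reporter's exact invocation):

```sh
sudo docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:rw \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=8080:8080 \
  --detach=true \
  --name=cadvisor \
  google/cadvisor:v0.24.1
```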
Problem
cAdvisor CPU usage is extremely high, such that it impacts the performance of the other containers. Typical CPU usage is between 20 - 110%, usually sitting at around 60-80%.
Output of the following profiling is attached:
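The profiling commands aren't shown above; profiles like these are typically collected from cAdvisor's pprof endpoint, assuming it was started with profiling enabled and listens on the default port 8080 (a sketch):

```sh
# 30-second CPU profile (opens the interactive pprof prompt; try 'top' or 'web')
go tool pprof "http://localhost:8080/debug/pprof/profile?seconds=30"

# Heap profile
go tool pprof http://localhost:8080/debug/pprof/heap
```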
Note that I ran the profiling twice, just to compare results. I won't presume to interpret the results, other than to say that they look somewhat similar, with a lot of time being spent in syscall.Syscall / syscall.Syscall6, as well as in memory allocations and what I presume is garbage collection.
At first I thought it might be the same issue as #735 or kubernetes/kubernetes#23255, but I haven't seen any invocations of du show up in the output of ps.
It's also interesting to note that we don't see this same CPU hit on all of our nodes. To date, we see it mostly on instances running containers that use net=host and that spawn a significant number of processes/threads within their containers. These containers wrap some legacy monolith applications that deviate from the "usual" operational model of web apps.