Container hangs with memory usage almost the same as memory limit #35463
Comments
If containers start to get OOM-killed by the kernel, chances are that the host as a whole is running out of resources (in which case other processes could be killed as well). Do the system logs show information about processes that were killed by the kernel?
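For reference, OOM kills should show up in the host's kernel log; something like this (a sketch; exact commands depend on the distro and on whether systemd's journal is in use):

```
# Search the kernel ring buffer for OOM-killer activity (human-readable timestamps)
dmesg -T | grep -iE 'out of memory|killed process'

# On systemd hosts, the persistent kernel log can be filtered the same way
journalctl -k | grep -i 'oom'
```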
Oh, actually the problem is exactly the opposite. The hung container wasn't OOM-killed, but had to be restarted manually. Only after that did the normal OOM message appear in the kernel logs; docker logs show nothing around that time (but the log level is "info" now). Out of 16 GB of RAM, 6 GB were free and the other containers were fine, so I wonder if docker/the kernel tried to kill the container when it reached its memory limit but something got stuck.
Docker won't kill containers; it sets up the root filesystem, namespaces, and cgroups, and after that the kernel is in charge, so I'm not sure what happened. It could still be a kernel issue; 3.16 has come up quite a few times in issues.
Yes, so it looks like the kernel started OOM-killing processes.
Yes, but only after I tried to exec into the container. Before that it was "stuck" for more than 5 hours.
The problem is that the OOM killer is NOT triggered automatically, but only after some interference. I suspect it has something to do with swap, but I cannot test that because I have no reliable way to reproduce it.
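Next time it happens, one way to check whether the kernel considers the container's cgroup to be in an OOM state while it is stuck (a sketch, assuming cgroup v1 and the cgroupfs driver reported by `docker info`; `stuck_container` is a placeholder name):

```
# Resolve the full container ID ("stuck_container" is a placeholder)
CID=$(docker inspect --format '{{.Id}}' stuck_container)

# cgroup v1 memory controller files for this container (cgroupfs driver layout)
CG=/sys/fs/cgroup/memory/docker/$CID

cat "$CG/memory.usage_in_bytes"   # current usage; should sit just under the limit
cat "$CG/memory.limit_in_bytes"   # configured limit (256 MB in this case)
cat "$CG/memory.oom_control"      # "under_oom 1" means tasks are blocked waiting on the OOM killer
```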
I'm not sure there's much we can do about that (given that it's the kernel deciding when to act). Perhaps @mlaventure has some thoughts on this.
Looks like an issue at the kernel level. I was going to suggest it may be a 3.16 issue, but it looks like you also got the issue on a 4.x kernel on Rancher. From the description, though, I would definitely think of a kernel issue, given that doing a `docker exec` is what finally triggered the OOM killer.
Confirmed
Description
Hello. Recently we faced a strange problem: one of our containers suddenly hung (no logs, no response to AMQP requests). We monitor all containers via cgroups pseudo-files, and our monitoring system shows that at the moment the container hung, its user CPU usage went to 0% (usually it jumps between 0% and 20% constantly) while system CPU usage went to 30-40%. Most interesting of all, memory usage suddenly jumped to 255.5 MB (the memory limit is exactly 256 MB) and stayed perfectly flat until the container was manually restarted.

We also observed a huge increase in disk I/O activity on the whole host: a read rate of up to 200 MBps when it is usually almost 0, and a write rate of up to 2 MBps when it is usually about 0.5 MBps. It almost looks like the container was blocked while swapping, but we have swap disabled on all of our machines.

This also happened a few times on another machine, always with the same container, but it hasn't happened in a while. I thought it was a problem with one specific container, but it turned out it's not. We also didn't have container monitoring back then.
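One possibility (an assumption on my part, not confirmed): with swap disabled, a cgroup pinned at its memory limit can still thrash, because the kernel evicts file-backed pages (program code, mmapped files) and immediately re-reads them from disk, which would match the read-I/O spike. The per-cgroup counters we monitor are exposed in memory.stat (a sketch, assuming cgroup v1 with the cgroupfs driver; the container ID is a placeholder):

```
# cgroup v1 memory stats for one container; path layout assumes the cgroupfs driver
CG=/sys/fs/cgroup/memory/docker/<full-container-id>

# cache vs. rss shows how much of the limit is file-backed;
# pgmajfault climbing while usage stays flat at the limit suggests thrashing
grep -E '^(cache|rss|pgfault|pgmajfault) ' "$CG/memory.stat"
```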
Steps to reproduce the issue:
Unfortunately, I was unable to reproduce this. Most of the time, containers that exceed their memory limit just get killed, but once every few weeks the problem with the symptoms described above happens. I also couldn't find any reports similar to our issue, so I decided to create one.
Describe the results you received:
The container hangs, with a huge increase in I/O activity.
Describe the results you expected:
Container gets killed.
Output of `docker version`:
Client:
Version: 17.03.1-ce
API version: 1.27
Go version: go1.7.5
Git commit: c6d412e
Built: Fri Mar 24 00:34:45 2017
OS/Arch: linux/amd64
Server:
Version: 17.03.1-ce
API version: 1.27 (minimum version 1.12)
Go version: go1.7.5
Git commit: c6d412e
Built: Fri Mar 24 00:34:45 2017
OS/Arch: linux/amd64
Experimental: false
Output of `docker info`:
Containers: 123
Running: 122
Paused: 0
Stopped: 1
Images: 1360
Server Version: 17.03.1-ce
Storage Driver: aufs
Root Dir: /mnt/ssd/docker/aufs
Backing Filesystem: extfs
Dirs: 3663
Dirperm1 Supported: true
Logging Driver: gelf
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 4ab9917febca54791c5f071a9d1f404867857fcc
runc version: 54296cf40ad8143b62dbcaa1d90e520a2136ddfe
init version: 949e6fa
Kernel Version: 3.16.0-4-amd64
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.63 GiB
Name: s2.teslatele.com
ID: XOC3:KLQL:ENDZ:AANM:TEA6:FAYI:LEVE:DNQX:U7NB:AA4A:RVQ3:QFD5
Docker Root Dir: /mnt/ssd/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No kernel memory limit support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Additional environment details (AWS, VirtualBox, physical, etc.):
physical