Docker container stopped under heavy load: RPC error in /var/log/upstart/docker.log #34377
Comments
I believe this is because containerd is getting OOM killed and there was an error in how Docker's containerd supervisor was handling this situation. This won't be addressed in 17.03 but should be fixed in the upcoming 17.06 patch release.
Hi @cpuguy83, thanks for the quick reply! What you described matches what we suspected as well, though I still have two questions: #2 I thought 17.06.0-ce is already out; do you mean there's another patch release coming on top of 17.06.0-ce? Do you happen to have a PR that I can reference? Thank you very much. This error has been giving me so much trouble lately...
I'd strongly recommend not using It's better to have a container killed than (e.g.)
Well, it sure as shit messes up Docker, because the error handling for the containerd healthcheck was wrong in this case and containerd is never automatically restarted like it should be... but containers should stick around just fine unless the kernel OOM killer takes them out. But yes, you want dockerd and containerd to be less likely to be OOM killed than most of your other processes.
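For anyone wanting to check how likely the daemons are to be picked by the OOM killer, here is a minimal sketch. It reads the kernel's per-process `oom_score_adj` value (range -1000 to 1000, lower means less likely to be killed). The function name `oom_score_adj_of` and the optional proc-root argument are my own, not anything from Docker, and the process names in the usage comment are assumptions that may differ per Docker version.

```shell
# Print the oom_score_adj value of a process.
# Usage: oom_score_adj_of PID [PROC_ROOT]
# PROC_ROOT defaults to /proc; the second argument only exists so the
# function can be pointed at a different directory (hypothetical helper).
oom_score_adj_of() {
  local pid="$1"
  local procroot="${2:-/proc}"
  cat "$procroot/$pid/oom_score_adj"
}

# Example (assumes the daemon process is named "dockerd" on your host):
#   for pid in $(pgrep -x dockerd); do
#     echo "dockerd pid $pid: $(oom_score_adj_of "$pid")"
#   done
```

If the value printed for dockerd or containerd is not lower than that of your build processes, the daemons are at least as likely to be killed as the workload.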
Honestly I can't remember where the fixes came in. I think they're not in the .0 release, but they could be... I'd have to do some poking around.
When Container OOM happened, I know under Ubuntu,
@totoroliu You can find OOM messages in /var/log/syslog
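A rough sketch of how to pull those messages out, assuming the usual kernel wording ("invoked oom-killer", "Out of memory", "Killed process"); the exact text varies between kernel versions, and the function name `find_oom_events` is just illustrative.

```shell
# Print lines that look like kernel OOM-killer activity.
# Usage: find_oom_events [LOGFILE]   (defaults to /var/log/syslog)
find_oom_events() {
  local logfile="${1:-/var/log/syslog}"
  # Case-insensitive match on the common OOM-killer phrases.
  grep -iE 'out of memory|oom-killer|killed process' "$logfile"
}

# Example:
#   find_oom_events /var/log/syslog | tail -n 20
```

The process name in the "Killed process" line tells you whether the kernel killed a build process or one of the Docker daemons.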
Let me close this issue, because I don't think there's a bug at hand here, but feel free to continue the conversation.
Description
Docker containers stop unexpectedly under heavy load: RPC error in /var/log/upstart/docker.log
We are using Jenkins to start multiple Docker containers (40+ per host) and run builds inside them. When the host is fully loaded, all Docker containers on that host are stopped and I see the RPC error in the log.
The command Jenkins uses to start docker container is
docker run -t -d -u 1000:1000 --privileged -u root --memory=1400m --cpus 1.0 --oom-kill-disable
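As a back-of-the-envelope check, the sum of the per-container limits can be compared with the host's physical memory. This is only a sketch: 40 containers is the lower bound from "40+ per host", the host's actual memory size is not stated here, and `memory_headroom_mb` is a made-up helper name.

```shell
# Print host memory (MB) minus the sum of all container memory limits (MB).
# A negative result means the limits alone overcommit the host.
# Usage: memory_headroom_mb CONTAINERS LIMIT_MB [MEMINFO_FILE]
memory_headroom_mb() {
  local containers="$1"
  local limit_mb="$2"
  local meminfo="${3:-/proc/meminfo}"
  local total_kb
  # MemTotal is reported in kB in /proc/meminfo.
  total_kb=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
  echo $(( total_kb / 1024 - containers * limit_mb ))
}

# Example for the setup described above (40 containers at 1400 MB each,
# i.e. 56000 MB of limits in total):
#   memory_headroom_mb 40 1400
```

Whatever headroom is left also has to cover dockerd, containerd, Jenkins agents and the kernel itself, so a small positive number is not necessarily safe.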
Describe the results you received:
time="2017-08-02T16:01:59.170520197-07:00" level=error msg="Error running exec in container: rpc error: code = 13 desc = transport is closing"
time="2017-08-02T16:01:59.179258675-07:00" level=error msg="Error running exec in container: rpc error: code = 13 desc = transport is closing"
time="2017-08-02T16:01:59.206258346-07:00" level=error msg="Error running exec in container: rpc error: code = 14 desc = grpc: the connection is unavailable"
time="2017-08-02T16:01:59.208830605-07:00" level=error msg="Error running exec in container: rpc error: code = 14 desc = grpc: the connection is unavailable"
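To see how many of these errors there are and of which kind, the log can be summarized by gRPC status code (13 is INTERNAL and 14 is UNAVAILABLE in the gRPC status code list). A minimal sketch, with `count_rpc_codes` being a hypothetical helper name:

```shell
# Count "rpc error" lines per gRPC status code.
# Usage: count_rpc_codes LOGFILE
# Prints one "CODE COUNT" pair per line, sorted by code.
count_rpc_codes() {
  grep -oE 'rpc error: code = [0-9]+' "$1" \
    | awk '{count[$NF]++} END {for (c in count) print c, count[c]}' \
    | sort -n
}

# Example:
#   count_rpc_codes /var/log/upstart/docker.log
```

A burst of code 13 followed by code 14 at the same timestamp would be consistent with the connection to containerd dropping and then staying unavailable.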
Additional information you deem important (e.g. issue happens only occasionally):
The issue seems to only happen under heavy load.
Another entry I keep seeing in the log is:
level=warning msg="Your kernel does not support swap limit capabilities,or the cgroup is not mounted. Memory limited without swap."
Output of docker version:
Output of docker info:
Additional environment details (AWS, VirtualBox, physical, etc.):
The host is a VM running under VMware.