Docker hangs after docker stop on a host with high load #32809
Comments
Can you send |
We've collected a number of stack dumps. Seems like there's a pattern in all of them:
stacks.tar.gz |
On one, it looks like it's stuck syncing to disk:
On several others it's waiting on IO streams to exit (could be a bug):
So it seems like we aren't getting an EOF from the FIFO from containerd. |
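The missing EOF is consistent with general pipe and FIFO semantics: a reader only sees EOF once every write end has been closed, so a single leftover writer keeps the reader blocked. The following is a minimal sketch of that behaviour on a POSIX system, not dockerd or containerd code. It uses an anonymous pipe in place of the FIFO, and the helper `saw_eof` and the `leaked_fd` descriptor are hypothetical names introduced for illustration.

```python
import os
import select
import time


def saw_eof(read_fd, timeout):
    """Return True if read_fd reaches EOF within `timeout` seconds.

    Any data that arrives before EOF is drained and discarded.
    """
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        ready, _, _ = select.select([read_fd], [], [], remaining)
        if not ready:
            return False  # timed out: some writer still holds the pipe open
        if os.read(read_fd, 4096) == b"":
            return True  # EOF: every write end has been closed


read_fd, write_fd = os.pipe()
# Hypothetical stand-in for a second process that still holds the write end.
leaked_fd = os.dup(write_fd)

os.close(write_fd)                      # the expected writer closes its end
eof_with_leak = saw_eof(read_fd, 0.2)   # a writer remains, so no EOF arrives

os.close(leaked_fd)                     # the last writer goes away
eof_after_close = saw_eof(read_fd, 0.2)
os.close(read_fd)

print(eof_with_leak, eof_after_close)
```

A reader without the timeout would simply block forever in the first case, which matches the symptom of goroutines waiting on IO streams to exit.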
Hi @cpuguy83, @mlaventure! Here's the containerd stack for the last dockerd stack dump ( Should I report this as a separate issue in the containerd repo? |
@eugene-dounar it looks like you have several dockerd daemons running at the same time:
Or, the |
@mlaventure I checked the logs once again: the daemon failed to start multiple times due to #32808.
There is no indication of multiple Docker daemons running at the same time |
@eugene-dounar your log indicates that your daemon is panicking, but unfortunately the rest of the stack trace is missing. Is it possible to get the rest of it? |
ping @eugene-dounar ^^ |
@mlaventure sorry, I did not quite get which stack trace is missing. The panic was reported as a separate bug, #32808, and I don't think it's directly related.
But this particular bug does not make the Docker daemon panic |
Hi all, here is a goroutine dump from the latest incident: goroutine-stacks-2017-09-04T091905Z.log.gz. Then I grepped the syslog entries for this container. Note that Docker stopped responding to
It appears that Then I tried to find the goroutine that actually holds the lock. After categorizing all of them, I ended up with this one:
Judging by
PS. |
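Categorizing the goroutines in a dump, as described in this comment, can be partly automated. The sketch below groups goroutines by wait state and top stack frame, assuming the standard Go dump layout (a `goroutine N [state, duration]:` header followed by the top frame). The `categorize` helper is hypothetical, and `SAMPLE` is a made-up dump for illustration, not data from this issue.

```python
import re
from collections import Counter

# Matches headers such as "goroutine 7 [semacquire, 12 minutes]:".
HEADER = re.compile(r"^goroutine (\d+) \[([^\],]+)(?:,[^\]]*)?\]:$")


def categorize(dump):
    """Count goroutines per (wait state, top frame function)."""
    counts = Counter()
    lines = dump.splitlines()
    for i, line in enumerate(lines):
        match = HEADER.match(line.strip())
        if not match:
            continue
        state = match.group(2)
        top = lines[i + 1].strip() if i + 1 < len(lines) else ""
        # Drop the trailing argument list, e.g. "(0xc4200a1234)".
        func = re.sub(r"\([^()]*\)$", "", top)
        counts[(state, func)] += 1
    return counts


# Made-up sample in the Go dump layout; not taken from the attached logs.
SAMPLE = """\
goroutine 1 [chan receive, 5 minutes]:
main.main()
\t/src/app/main.go:10 +0x20

goroutine 7 [semacquire, 12 minutes]:
sync.runtime_SemacquireMutex(0xc4200a1234)
\t/usr/local/go/src/runtime/sema.go:62 +0x34

goroutine 8 [semacquire, 12 minutes]:
sync.runtime_SemacquireMutex(0xc4200a1234)
\t/usr/local/go/src/runtime/sema.go:62 +0x34
"""

groups = categorize(SAMPLE)
for (state, func), count in groups.most_common():
    print(count, state, func)
```

In a dump like the ones attached here, the largest groups are likely to be goroutines waiting for the lock, so the holder is more likely to be found among the small groups that are blocked on something else.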
We got another similar incident and I've collected
By that time |
Had the same issue. It reproduces reliably on one of our servers running Debian |
Seeing a similar issue. We've tested docker 1.13.1, 17.03.1, 17.06.0, 17.09.0 and the symptoms are the same. Here is info from 17.03.1: Stack trace: Kernel: Docker Info:
Docker logs around issue: |
This is an old issue. I will close as stale. If you see this error on 23.0 or newer, please open a new issue. |
Description
The Docker daemon gets stuck after Docker Enforcer stops a container. It seems to be a race condition that occurs under heavy load (load average ~100 on 64 CPUs).
Steps to reproduce the issue:
Not easily reproducible but the general steps are:
docker kill <cid>
Describe the results you received:
docker ps hangs
Describe the results you expected:
docker ps should show running containers
Additional information you deem important (e.g. issue happens only occasionally):
Note that containerd says the process exited at 09:51:25, but dockerd reports it failed to exit 6 seconds later, at 09:51:31. After that, docker becomes unresponsive.
Output of docker version:
Output of docker info:
Additional environment details (AWS, VirtualBox, physical, etc.):
AWS: x1.16xlarge (dedicated host)
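Because the symptom is a docker ps call that never returns, one way to notice the hang automatically is to run the command with a timeout. This is a minimal sketch, not part of Docker. The helper name `command_responds` is hypothetical, and the demo uses the Python interpreter as a stand-in for a responsive and a hung command.

```python
import subprocess
import sys


def command_responds(cmd, timeout):
    """Return True if cmd exits within `timeout` seconds, False if it hangs.

    The exit status is ignored: a command that fails quickly still counts
    as responsive. On timeout, subprocess.run kills the child process.
    """
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return True


# Stand-ins for a healthy and a hung command.
quick = command_responds([sys.executable, "-c", "pass"], 10)
stuck = command_responds(
    [sys.executable, "-c", "import time; time.sleep(30)"], 0.5
)
print(quick, stuck)
```

Against a real daemon the call would be something like `command_responds(["docker", "ps"], 10)`, with the timeout chosen generously enough that a merely slow daemon on a loaded host is not reported as hung.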