Docker daemon hanging #25321
Comments
I notice when doing this:
|
We got a bunch of these errors before it started happening. Not sure if it is relevant:
|
Without getting too specific about our application logic, we have a CRON task that runs a few
If the logfile is empty the cron task is done. Otherwise we run another command:
This causes an application we use to close
For each log we read it:
Then we remove it:
As you can see we are generating a few commands within this CRON task. After a few hours of running these commands across 20 or so docker images we see docker hang. For whatever reason it seems isolated to a single container as |
/cc @mlaventure |
Here is a stack trace from I don't know much about the stack being dumped there but I do see something about some of the "goroutines" being "525 minutes" old. This makes sense because I went to sleep and about 8 hours later came back to the docker daemon being hung/stuck. |
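For anyone reading a dump like this, the "525 minutes" figure comes from the goroutine header lines that the Go runtime prints, which look like `goroutine 1 [chan receive, 525 minutes]:`. A minimal sketch of filtering a dump for goroutines that have been blocked a long time follows. The `longBlocked` function, the `stuck` type, and the sample text are hypothetical and made up for illustration; the sample is not taken from the dump in this thread, and daemon log lines may carry prefixes this sketch does not handle.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// header matches goroutine header lines that carry a wait time,
// for example "goroutine 1 [chan receive, 525 minutes]:".
var header = regexp.MustCompile(`^goroutine (\d+) \[([^,\]]+), (\d+) minutes`)

// stuck describes one goroutine that has been waiting for a while.
// (Hypothetical type, for illustration only.)
type stuck struct {
	ID      int
	State   string
	Minutes int
}

// longBlocked returns the goroutines in dump that have waited at least
// minMinutes, in the order they appear.
func longBlocked(dump string, minMinutes int) []stuck {
	var out []stuck
	for _, line := range strings.Split(dump, "\n") {
		m := header.FindStringSubmatch(line)
		if m == nil {
			continue // not a header, or a header without a wait time
		}
		id, _ := strconv.Atoi(m[1])
		mins, _ := strconv.Atoi(m[3])
		if mins >= minMinutes {
			out = append(out, stuck{ID: id, State: m[2], Minutes: mins})
		}
	}
	return out
}

func main() {
	// Made-up sample in the runtime's header format.
	sample := "goroutine 1 [chan receive, 525 minutes]:\nmain.main()\n\n" +
		"goroutine 12 [select]:\nmain.worker()\n\n" +
		"goroutine 42 [semacquire, 3 minutes]:\nsync.(*Mutex).Lock()\n"

	for _, s := range longBlocked(sample, 60) {
		fmt.Printf("goroutine %d blocked in %q for %d minutes\n", s.ID, s.State, s.Minutes)
	}
}
```

Goroutines that have all been waiting for roughly as long as the daemon has been hung are usually the first place to look.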
Also it's worth noting that attempting to |
ping @crosbymichael @mlaventure PTAL |
I have the same issue with 1.12.0 too. "docker ps" hangs forever and "docker stop" etc. access via swarm hangs too. The log is absolutely quiet about this.
strace docker ps: ...snip...
docker info: Containers: 11 |
@b00lduck that looks like a strace of the client; could you try to get a stacktrace of the daemon? See #25321 (comment) |
The strace of dockerd is absolutely quiet here: strace -p 2853 (which is dockerd) prints "strace: Process 2853 attached". Even if I issue docker ps, nothing shows up in the trace. |
I think if you send it |
kill -s USR1 1234, where 1234 is the PID. Works for me on stock Debian Jessie. |
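To illustrate what the USR1 signal triggers, here is a minimal sketch of a Go program that dumps the stacks of all its goroutines when it receives SIGUSR1. It is not the daemon's actual handler; `dumpStacks` is a hypothetical helper, and the program signals itself only so that the demo terminates without outside input (Unix only).

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"runtime"
	"syscall"
)

// dumpStacks returns the stack traces of every goroutine as text.
// (Hypothetical helper, not taken from the Docker source.)
func dumpStacks() string {
	buf := make([]byte, 1<<16)
	for {
		n := runtime.Stack(buf, true) // true = include all goroutines
		if n < len(buf) {
			return string(buf[:n])
		}
		// The buffer filled up, so the dump may be truncated; grow and retry.
		buf = make([]byte, 2*len(buf))
	}
}

func main() {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGUSR1)

	// For a self-contained demo the process signals itself. Normally an
	// operator would run `kill -s USR1 <pid>` from a shell instead.
	if err := syscall.Kill(os.Getpid(), syscall.SIGUSR1); err != nil {
		fmt.Fprintln(os.Stderr, "kill:", err)
		return
	}

	<-sigs
	fmt.Fprintln(os.Stderr, dumpStacks())
}
```

The dump goes to stderr here; for the daemon, look for the equivalent output in the daemon log.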
I added some sleep commands in between various docker commands and we have not had a problem for about 12 hours now. |
Stacktrace attached |
@krisives could you share what commands you put the sleep between? |
Thanks @b00lduck! |
@krisives (off topic) looking at your use-case; is there a reason you're logging to files inside the container, instead of logging to stdout, and making use of the docker logging features? |
From b00lduck's stacktrace it seems that goroutine 7 is stuck while trying to connect to containerd, waiting for the connection to change its idle state. Don't know if that indicates a problem with containerd or rather an RPC or network issue? There are other goroutines IO-waiting in grpc code, but also in some other places... |
@rokkbert that goroutine is just here to handle drops in connection with @chbatey the |
@b00lduck on your stacktrace file, it looks like a container's lock was never released. There are several instances of @chbatey if you still have the daemon running, a stacktrace may be helpful too :) Thanks, |
@mlaventure @b00lduck Looks like it's stuck leaving an overlay network while the container is locked:
|
ping @mavenugo Looks like it's blocking on a channel send:
if d.notifyCh != nil {
    d.notifyCh <- ovNotify{
        action: "leave",
        nw:     n,
        ep:     ep,
    }
} |
@cpuguy83 ah, correct the lock is taken here: https://github.com/docker/docker/blob/v1.12.0/daemon/monitor.go#L31 Thanks! |
Maybe @mrjana too, since I know madhu is travelling. |
After adding the sleep command we have been running for 24+ hours without a docker hang. This could be because we are issuing fewer commands overall or because we never issue two commands within a short period of time.
I am currently sleeping after any
Yes here is everything up until the USR1 signal stack trace when it was hung:
I am not a docker expert, I just play one on TV. My next solution is to write a script which runs within the container itself, parses the logs, and inserts them into the appropriate database tables. How could I get docker to output all of the logs to one file on the host? |
To keep your "current" approach, you can bind-mount a directory on your host, and have the container write to that ( However, that also doesn't use the docker logging features. To do that, make sure the container writes logs to (sorry all for the off topic - back to the issue at hand) |
Hi, we're having a similar issue across all of our docker machines that were recently upgraded to 1.12. I'll try to get stack traces out tomorrow. docker ps and all docker compose commands now hang. Going to try to roll back to the previous version of docker. |
Same issue here on Gentoo. I also read the same from a SuSE user. |
@vanthome You may be experiencing an openrc bug. I haven't dug deep enough to figure out the problem yet but downgrading openrc from 0.21.3 to 0.19.1 fixed the hanging issue for me. |
Is everyone experiencing this running extra processes with |
I have the same issue. I'm not using "docker exec" at all. After running "/etc/init.d/docker start" I can't run a single docker command, even "docker version" is hung. Version is "1.12.1_rc1 (Gentoo)". Tried to downgrade to 1.11, same issue. |
@kingfame Can you send a USR1 signal to the daemon and provide the daemon log? Thanks |
Same issue here as @kingfame on gentoo -- no docker commands are responsive. The daemon doesn't respond to a USR1 signal. The daemon log gets to the "starting containers" bit during initialization and no further. As I said before, downgrading openrc somehow fixes it. |
I downgraded to openrc 0.19.1 without success, same issue. Here is the output of the USR1 signal:
|
@kingfame your issue is different.
What version of Could you provide |
Containerd isn't installed, and doesn't seem to be a dependency according to the Gentoo package manager (portage/emerge). I guess I should install it o0 EDIT: Sorry for the noise. This seems to be a Gentoo bug, where it is not pulling in containerd. After I installed containerd, docker seems to be working again. |
@kingfame If gentoo uses our package format, EDIT: ah, glad to see it fixed it :) |
@kingfame The atom is "app-emulation/containerd" and it is in fact a runtime dependency. |
Running with storage driver |
Several issues are now part of this thread:
I've created #26459 to keep track of the deadlock and am thus closing this thread in favor of the new issue for easier tracking. |
@mlaventure is my issue (#25321 (comment)) documented as well? |
@beedub I'll take a look today, thanks for the ping! |
@beedub can you confirm what issue you are seeing? Are you having the daemon being hung or something else? |
@mlaventure daemon hung as far as I remember |
We appear to be encountering something similar after moving to docker 1.12.2. Restarting docker appears to be the only way to recover. We are using an lvm thinpool devicemapper setup leveraging xfs. Unfortunately, overlay is not very stable on CentOS 7.2, so making that move is not currently an option; doing so would also be quite disruptive to existing workloads. Here is the stack trace from dockerd: Docker info (note that even docker info hangs in my case):
Docker client strace:
dockerd strace:
docker-containerd (pid 850188) strace:
|
@sakserv could you open a new issue? "daemon hang" can have many causes, and I'd like to avoid mixing possibly unrelated problems into the same issue on GitHub. Also, it may be worth updating to 1.12.3, because some deadlocks were fixed in that version; https://github.com/docker/docker/releases/tag/v1.12.3 |
@thaJeztah - sure, I will open a new issue. |
I have seen the daemon hang (1.8.3) under high load: #13885. However this appears different. I am now running 1.11.2 and had a hang under little to no load.
BUG REPORT INFORMATION
Output of docker version:
Output of docker info:
Additional environment details (AWS, VirtualBox, physical, etc.):
AWS, Centos 7
Steps to reproduce the issue:
Unknown
Describe the results you received:
All docker client commands hang
Stracing docker client:
So basically no response back from the daemon.
Stracing docker daemon
So the daemon looks to be reading on FD 46
Describe the results you expected:
Docker to work
Additional information you deem important (e.g. issue happens only occasionally):
Happens occasionally
I can add any additional information today. I have taken this node out of use and left it hung. The only fix is to restart docker, which I can hold off on for 24 hours.