[1.11.0] Possible deadlock on container object #22124
Comments
We are seeing these hangs regularly in our test suite, but only when using Docker 1.11. Curiously we did not see this in the 1.11 release candidates, though perhaps we were just lucky. |
@rade Can you send SIGUSR1 to the daemon and copy the traces from the logs when this happens? Just want to make sure this is the same issue. |
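The SIGUSR1 trick mentioned above works because the daemon installs a signal handler that dumps every goroutine's stack to its log. A minimal sketch of that pattern (illustrative names and buffer size, not Docker's actual code):

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"runtime"
	"syscall"
	"time"
)

// dumpStacks returns the stacks of all live goroutines, the same kind
// of data the daemon writes to its log when it receives SIGUSR1.
func dumpStacks() string {
	buf := make([]byte, 1<<20) // large enough for many goroutines
	n := runtime.Stack(buf, true)
	return string(buf[:n])
}

func main() {
	c := make(chan os.Signal, 1)
	signal.Notify(c, syscall.SIGUSR1)
	go func() {
		for range c {
			fmt.Fprintln(os.Stderr, "=== goroutine dump ===\n"+dumpStacks())
		}
	}()

	// Demo: signal ourselves once, then give the handler time to print.
	syscall.Kill(os.Getpid(), syscall.SIGUSR1)
	time.Sleep(100 * time.Millisecond)
}
```

From the shell, the equivalent against a running daemon would be sending `SIGUSR1` to the dockerd process and then reading the daemon log.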
I notice that the stack trace referenced above contains
I saw exactly the same thing when I dumped the threads on our stuck Docker. Our tests invoke |
@rade This should just mean that there was an active exec process. A goroutine is created for every exec, and it only returns when the process exits. |
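The behavior described here, one goroutine per exec that lives until the process exits, can be sketched like this (hypothetical helper name, not Docker's actual code; it shows why a long-lived exec leaves a visible goroutine in the stack dump):

```go
package main

import (
	"fmt"
	"os/exec"
)

// runExec mimics the daemon's per-exec goroutine: it starts the
// process, then blocks in a goroutine on Wait until the process
// exits, reporting the result on a channel. If the process never
// exits, this goroutine stays alive, which is why active execs show
// up in a SIGUSR1 goroutine dump.
func runExec(name string, args ...string) <-chan error {
	done := make(chan error, 1)
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		done <- err
		return done
	}
	go func() {
		done <- cmd.Wait() // returns only when the process exits
	}()
	return done
}

func main() {
	err := <-runExec("true")
	fmt.Println("exec finished, err =", err)
}
```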
I know that. I was simply highlighting a possible correlation, not causation. In our case the exec should have terminated. |
Seeing this a couple of times with 1.11.1 |
We're also seeing deadlocks in 1.11.1, but how do I confirm/deny it's this same issue? |
@bukzor Send |
We have six examples right now, and I've dumped the debugging for each. Here's the gist: https://gist.github.com/anonymous/0185edaee33ee8db8df6fca4a07df529 |
Here's the same, in more readable form: (I expanded the |
@tonistiigi Please take a look and let us know if there's any smoking gun ^ |
I've had a
The stack trace shows the attach-handling goroutine:
|
@rade Why do you think this goroutine matches the container from inspect? In your stack trace I see a bunch of goroutines in the attach stream copy phase but I don't see goroutines waiting for a container lock. So I'm not sure what "getting stuck" means in this case. |
It's a bit presumptuous, but it's the only goroutine that is dealing with an
I have a |
I haven't looked into that, but if so, and if "waiting for a container lock" is the defining criterion for this issue, then perhaps what I am seeing is a different issue. |
I'm seeing this also in 1.10.3 |
Any updates yet? |
This is confirmed fixed in 1.11.2, as far as I'm concerned. |
The changelog for 1.11.2 only has #22918 for deadlocks, not this issue. And it's still open anyway, so it probably is not confirmed, is it? |
Can anyone reproduce this with 1.12.0-rc2? |
@GameScripting would you be able to check if 1.12-rc fixes this in your use-case? |
@thaJeztah @tiborvass We've been hitting this issue at my company as well, and since I've been subscribed to this issue I figured I'd respond to your query about reproducing on the new version.

We were hitting some issues with getting 1.12-rc2 deployed last week due to other changes in 1.12, but have worked around those and, as of a few hours ago, we are running 1.12-rc2 on 4 hosts in our QA cluster that experiences these deadlocks. We'll likely expand that to 16 hosts in a few days if 1.12 proves otherwise stable.

In this particular cluster, we hit the deadlock at least once every two weeks, but sometimes more frequently (some weeks it happens almost daily, likely due to container churn). If we hit the deadlock on 1.12-rc2, I'll be sure to post it here ASAP for you to confirm, but wanted to let you know that it may take time. |
Thanks @PaulFurtado, that's really appreciated. Issues like this may be hard to reproduce. |
This was closed because the original issue is fixed in master. I can't see a deadlock in the trace from @rade; the issue from @bukzor was fixed in #22732. If you encounter deadlocks or strange unresponsiveness, please send |
@PaulFurtado We were hit by this issue on one host today. Have you had any issues so far? |
Originally reported by @mblaschke in #13885 (comment)
Creating a different issue because it may be a 1.11 regression.
https://gist.github.com/tonistiigi/9d79de62b2f7919f33a9e987619b9de8 The goroutine trace seems to indicate that lots of goroutines are waiting on a container lock. There is no obvious goroutine holding a lock in that trace, so possibly we have a code path that returns without releasing it.
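The hypothesized bug, a code path that returns while still holding the container lock, is easy to reproduce with a plain sync.Mutex; a deferred unlock is the usual fix. This is an illustrative sketch, not Docker's actual container type:

```go
package main

import (
	"fmt"
	"sync"
)

type container struct {
	mu    sync.Mutex
	state string
}

// setStateBuggy shows the failure mode: the error path returns
// without unlocking, so every later caller blocks forever on the
// mutex, producing exactly the "lots of goroutines waiting on a
// container lock" pattern seen in the trace.
func setStateBuggy(c *container, s string) error {
	c.mu.Lock()
	if s == "" {
		return fmt.Errorf("empty state") // BUG: returns without Unlock
	}
	c.state = s
	c.mu.Unlock()
	return nil
}

// setState is the safe version: defer guarantees the unlock on every
// return path, including the error path.
func setState(c *container, s string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if s == "" {
		return fmt.Errorf("empty state")
	}
	c.state = s
	return nil
}

func main() {
	c := &container{}
	_ = setState(c, "") // error path, but the lock is still released
	_ = setState(c, "running")
	fmt.Println(c.state) // running
}
```

Calling setStateBuggy with an empty state once would leave the mutex locked and hang every subsequent caller, which matches the symptom of a completely unresponsive daemon.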
Original report:
Since we updated to 1.11.0, running rspec docker image tests (~10 parallel containers running these tests on a 4-CPU machine) sometimes freezes and fails with a timeout. Docker freezes completely and doesn't respond (e.g. to docker ps). This is happening on a vserver with Debian stretch (btrfs) and on a (vagrant) Parallels VM running Ubuntu 14.04 (backported kernel 3.19.0-31-generic, ext4). The filesystem for /var/lib/docker on both servers was cleared (btrfs was recreated) after the first freeze. The freeze happens randomly when running these tests.
Stack trace is attached from both servers: docker-log.zip
strace from docker-containerd and the docker daemons:
Docker info (Ubuntu 14.04 with backported kernel)
Docker version (Ubuntu 14.04 with backported kernel)