Image remove blocked by a container lock #37072
Comments
Any chance you can get a containerd stack trace?
Ya, I'll do that now.
Attaching the containerd stack trace: containerd-stack-dump.log And the shim stack trace (pulled from I can see the containerd goroutine that is stuck on kill is in a Looking at the process list, this is the containerd-shim for the container that is locked:
From the shim stack trace it seems like the shim is stuck waiting on a channel from runc, but runc is gone so it's going to be waiting a while. I guess there needs to be some timeout so that it can detect that runc is gone?
Weird issue indeed. The shim has issued a
I don't think it has been fixed, so it is still an issue.
We hit a similar problem using 18.04 on CentOS. A container has exited, but docker ps still shows it as Up. When we run "docker inspect id", the command blocks and never returns.
I took another look at the stack trace from version This may have been fixed by containerd/containerd#2743 which seems to be in containerd 1.2.1. I see (docker-archive#129) backports containerd 1.2.1 to 18.09.1. I will test again with that version to see if it is resolved.
Thanks @dnephin - sounds hopeful!
also @cpuguy83 in case you were curious (since you took a look into this issue a while ago) I think https://github.com/containerd/go-runc/blob/master/monitor.go#L55 was a red herring. That
This issue occurs when trying to deploy new hosts using 18.03.1-ce. Previously workloads would run correctly using ~17.06.
Trying to image rm any image will block and never complete. There were 5 containers running on the host. Trying to inspect or rm two of them had the same behaviour (blocked and never complete).
Full output from SIGUSR1: goroutine-stacks-2018-05-15T151422Z.log
There are a bunch of goroutines stuck on image remove. There is also one stuck on container remove that seems to be holding the lock:
I looked around for related issues, but couldn't find anything recent.