Ghost container can't be stopped #30927
Comments
Could it fix itself if I kill the problematic shim process, or will that make the main docker service go crazy? I have seen a few people write "Just reboot the machine" or "Just restart the docker service", however there are a few database containers running that would cause complete downtime if they went down.
Killing the shim process made the whole dockerd unresponsive. Went into a quick downtime window and restarted the docker service. Everything was back within a minute and the ghost problem is now gone. I find this a dirty fix, but it works. Due to the inability to reproduce the bug this can be closed; however, I still think the StackTrace from
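The workaround above amounts to restarting the daemon. A minimal sketch of that recovery path, assuming a systemd host (`my-database` is a placeholder container name):

```
# Restart the daemon; on non-systemd hosts use the equivalent
# service command (e.g. `sudo service docker restart`).
sudo systemctl restart docker

# Containers without a restart policy stay down after a daemon restart;
# a restart policy avoids that for critical services:
docker update --restart=always my-database
```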
This just happened again. Nginx again. Can't stop the container. Will try using the non-alpine version to see if that changes anything.
This seems to be a duplicate of #10589, however that issue is closed and we are still experiencing the same problem.
The linked issue has some similarities, but is definitely not the same (the whole runtime for docker has been replaced since docker 1.5). What seems to happen in your case is that either:
The docker compose Traceback won't help debugging here, because that error is only a consequence of this situation. What could help narrow down what's happening:
/cc @mlaventure
Hello @thaJeztah, this has happened twice today, on a production machine. In both cases I had to restart the docker service, as this is the only known fix. Therefore I do not have any log files.
Thanks for the help! Any comments or ideas what could be causing these crashes?
@Foorack could you put your daemon into debug mode and provide the daemon logs when it happens? The daemon would have to have been in debug mode before the issue occurs. Usually, the only reason for a shim to stay stuck is that it is waiting for its IOs to be consumed. It could be an issue with the communication with the sub
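A sketch of putting the daemon into debug mode ahead of time, assuming the default Linux config path (merge the setting into any existing daemon.json rather than overwriting the file):

```
# /etc/docker/daemon.json should contain:
#   { "debug": true }

# dockerd reloads its configuration on SIGHUP ("debug" is one of the
# reloadable options), so running containers are not disturbed:
sudo kill -SIGHUP "$(pidof dockerd)"
```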
Hi, I have encountered the same problem. I ran mongodb and kafka images on Ubuntu 16.04 in swarm mode. Please see below:
docker version
Server:
Containers: 35
Having the same issue:
Docker version 1.13.1, build 092cba3, on Mesos/Marathon. Some files do still seem to be open:
Edit:
Cont'd: Trying to kill that process with
@sgnn7 The daemon logs would be useful here; it would also help to know which files related to docker itself are still open. When the daemon doesn't answer to
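A sketch of gathering that data (process names match Docker 1.13; `<shim-pid>` is a placeholder):

```
# Find the stuck shim and list the files it still holds open:
ps -ef | grep docker-containerd-shim
sudo lsof -p <shim-pid>

# If the daemon stops answering, SIGUSR1 makes dockerd dump its
# goroutine stack traces to the daemon log without killing it:
sudo kill -SIGUSR1 "$(pidof dockerd)"
```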
@thinkhard-j-park your issue seems to be different; your error seems to indicate that
@mlaventure Sadly I can't provide the daemon logs (security issues) and I can't do
We're seeing a very similar error with jwilder/nginx-proxy, lots of logs with
We first saw the issue when promoting to production from our Jenkins server; here are the relevant logs from that:
While debugging directly on the host, I couldn't kill or remove the container until the daemon was reloaded to turn on debugging:
Our logs from journalctl include the following; unfortunately, turning on debugging also eliminated the issue:
From the above logs, once we reloaded the daemon to turn on debugging, the ghost container finally died and we were able to deploy again using compose. Our environment consists of two docker hosts (host1 and host2), set up with swarm mode to act as a k/v store for our attachable overlay networks (we'll finish the transition to services later), and classic swarm being used to spin up the containers on these hosts. There is a logspout container on host1 that is shipping logs off to another system. One of my next steps will be to set up docker container log rotation to see if this prevents the issue. This container probably has more logs than any other due to frequent polling from a monitoring tool.
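For the log-rotation experiment, a sketch of the relevant settings (sizes are arbitrary examples; daemon-wide rotation only applies to containers created afterwards):

```
# /etc/docker/daemon.json:
#   {
#     "log-driver": "json-file",
#     "log-opts": { "max-size": "10m", "max-file": "3" }
#   }

# Or per container, without touching the daemon config:
docker run -d --log-opt max-size=10m --log-opt max-file=3 nginx:1.11.8-alpine
```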
@bmitch3020 There should be a message like the following if containerd generated the event and docker received it. Could you check whether those messages are present in your log (assuming the daemon was already in debug mode)?
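A sketch of how one might search for those messages, assuming the daemon logs to journald as on most systemd hosts (`<container-id>` is a placeholder):

```
# Pull the daemon log around the time the container got stuck and
# look for containerd events mentioning the container:
journalctl -u docker.service --since "1 hour ago" | grep -i containerd
journalctl -u docker.service | grep <container-id>
```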
@mlaventure I have some logs from after we got debugging turned on, but that's after the container failed to stop or rm:
If you are binding to things like 80:80 and 443:443, it would be interesting to see whether these failures are networking related. A quick test would be to run the container in host networking mode: it will not go through NAT or the docker proxy, and the container will have access to the host's network interfaces directly. This is just an idea, because you suggested this seems to happen under heavy network use; the test would be a way for us to rule networking out of the things that could be causing this. If anyone is comfortable with running this test, that would help a lot. The docker run command for doing this would be along the lines of the sketch below.
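A sketch of that test using the nginx image from the original report (flags as of Docker 1.13; adjust the image and name to your setup):

```
# Host networking: no NAT and no docker-proxy in the path; the container
# binds directly to the host's interfaces, so the -p mappings are dropped:
docker run -d --net host --name nginx-hostnet-test nginx:1.11.8-alpine
```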
@mlaventure I think we found a commonality between two occurrences of this error in our case: the logs for the container had grown to be massive, so both times someone did a
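A sketch of how one might check whether a container's json-file log has ballooned (`<container>` is a placeholder name or ID):

```
# Locate the log file the json-file driver writes for the container:
docker inspect --format '{{.LogPath}}' <container>

# ...and measure it:
sudo du -h "$(docker inspect --format '{{.LogPath}}' <container>)"
```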
Let me close this ticket for now, as it looks like it went stale.
Hello,
Basically got a "ghost" container, if that is the right word. It's completely unstoppable. Neither `docker-compose stop` nor `docker-compose kill` works; they both say "done" but `docker-compose ps` says the service is still up. The service is an nginx container binding to ports 80 and 443. `docker stop` also says it stopped the container, but `docker ps` says it is still running. Same with `docker kill`. `docker-compose top` throws a TypeError when it comes to the problematic container.

Steps to reproduce the issue:
Unknown how to reproduce, as I don't know what caused it. The container is a normal nginx:1.11.8-alpine that had been running for 2 days with a normal configuration.
Describe the results you received:
A shim process with no child process (the shim process with id 0585...). Every other shim process has a child process except the problematic container's, which suggests it might be stuck in a frozen state.
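A sketch of how the shim/child relationship can be inspected (process names match Docker 1.13):

```
# Show the daemon's process tree; a healthy shim has the container's
# init process as a child, while a wedged one sits there with none:
pstree -ap "$(pidof dockerd)"

# Or list shims with their parent/child PIDs directly:
ps -eo pid,ppid,args | grep docker-containerd-shim
```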
Describe the results you expected:
I expected the container to be stopped... 🤔
Additional information you deem important (e.g. issue happens only occasionally):
Issue happens completely at random. I have read that something similar happened in 2015, and the common theme seems to be that both are network-heavy processes.
`docker-compose logs` and `docker logs` just get stuck without printing anything.

Output of `docker version`:

Output of `docker-compose version`:

Output of `docker info`:

Additional environment details (AWS, VirtualBox, physical, etc.):
Physical box hosted at OVH (SoYouStart).