Dockerd fails to receive containerd's exit signal when the container process has been OOM-killed. #33192
Comments
ping @mlaventure
I wasn't able to reproduce this with a quick try. @BSWANG if you have an easy way to reproduce it, could you put your daemon in debug mode and provide its logs?
I saw a similar issue in rancher/rancher#6922 and #31614, but I cannot reproduce it easily.
I'm facing this right now.
It began after the OOM killer killed the container process.
I believe I am also experiencing this. After an OOM kill, containerd seems to get into a state where all subsequent container terminations, normal or otherwise, are ignored by the daemon. Getting docker back into a usable state requires restarting containerd. I cannot reliably reproduce this right now, but I can confirm that I'm also seeing this behaviour. I'm running 17.03.1-ce on AWS (EC2 Container Service).
I think we have a similar problem. I am pretty sure it is not an OOM issue but our application terminating itself due to a detected performance problem (probably off-topic). However, the container still shows as up and running.
It is not possible to exec into the container:
It is not possible to stop the container:
Other containers work as expected. The same thing happened two weeks ago, and I could only fix it by restarting the docker daemon.
ping @mlaventure
Can everyone affected check whether the container marked as "not found" appears in the output of
Also, in your docker daemon logs, can you see a stack trace or an indication that containerd was restarted? If you have a way to reproduce the issue, or had the daemon in debug mode when it occurred, please provide the logs; they would be helpful.
The stale container cannot be found in
@BSWANG what about a stack trace within the docker daemon log? Or a line mentioning a restart of
I have faced the same issue with
I suspect I have the same issue happening right now. I didn't notice the OOM but the host was running very close to 100% before it happened. I only noticed many hours later that the container did not terminate.
"State": {
    "Status": "running",
    "Running": true,
    "Paused": false,
    "Restarting": false,
    "OOMKilled": false,
    "Dead": false,
    "Pid": 19395,
    "ExitCode": 0,
    "Error": "",
    "StartedAt": "2017-12-13T20:11:18.876338839Z",
    "FinishedAt": "2017-12-13T20:11:05.616491875Z",
    "StartedTs": 1513195878,
    "FinishedTs": 1513195865
}
Is it odd that
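For anyone who wants to check whether a container is in this stuck state, here is a minimal Python sketch. It assumes a Linux host, that it runs in the same PID namespace as the docker daemon (so State.Pid is a host PID), and that the State object was obtained with something like docker inspect --format '{{json .State}}' <container>. The helper names pid_alive and looks_stale are made up for illustration and are not part of docker.

```python
import os


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists on the local host."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing, it only checks existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def looks_stale(state: dict, alive=pid_alive) -> bool:
    """Hypothetical check: docker reports Running, but the PID is gone.

    `state` is the "State" object from docker inspect. `alive` can be
    swapped out for testing; by default it probes the local host.
    """
    return bool(state.get("Running")) and not alive(int(state.get("Pid", 0)))
```

With the State shown above, looks_stale would report True only if PID 19395 no longer exists on the host. A PID can be reused by an unrelated process, so a False result does not prove the container is healthy.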
Hi! We have a similar problem.
As we can see, docker service ps showed service_0001_00001 as "Running 2 weeks ago", but docker ps -a showed that container service_1 had stopped 30 minutes ago with exit code 137.
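As a side note on the 137: by the usual shell convention, an exit code above 128 means the process was terminated by signal (code - 128), so 137 corresponds to signal 9 (SIGKILL). A small Python sketch of that decoding follows, assuming a POSIX host; describe_exit_code is a hypothetical helper name, not a docker API.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Decode an exit code using the 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"
```

SIGKILL is what the kernel OOM killer sends, but docker kill sends it by default too, so 137 on its own is consistent with an OOM kill without proving one.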
Any updates on this? We have a similar problem with docker 17.03. Does updating to a newer version fix this?
Any updates on this? I am facing this issue a lot.
@rishiloyola which docker version do you run?
@rishiloyola also, which kernel version? There was a bug in some recent kernel versions that prevented OOM events from being read; see containerd/cgroups#74
@SmilingNavern I am using
@thaJeztah I am using the following kernel version
@rishiloyola I suggest you try docker 18.03.
@rishiloyola @SmilingNavern We have a similar problem. Do you know for sure which docker version fixes this? Do you know which commit fixed this issue?
@thaJeztah Do you know for sure which docker version fixes this? Do you know which commit fixed this issue?
@webPageDev docker with containerd > 1.0 is good to go. I believe it's fixed inside containerd, but I don't know the specific commit.
Description
Dockerd fails to receive containerd's exit signal when the container process has been OOM-killed. The container's process, runc, and containerd have all exited, but docker ps shows the container is UP, and docker stop, docker kill, and docker exec no longer work as expected, such as below:
Steps to reproduce the issue:
Describe the results you received:
docker ps shows the container as still "Up", and docker stop, docker kill, and docker exec no longer work as expected.
Describe the results you expected:
When the container's process has been killed, the docker container should be "Exited", not "Up".
Additional information you deem important (e.g. issue happens only occasionally):
issue happens only occasionally.
Output of docker version:
Output of docker info:
Additional environment details (AWS, VirtualBox, physical, etc.):
physical
docker daemon's log: