1.13.0-rc5 healthcheck on non existant containers goes awry #30107
@RRAlex it looks like the logs are showing only the OpenMonitorChannel.

Anyhow, I never hit the …

I couldn't find any OpenMonitorChannel reference in any docker or system log.

@fntlnz told me he was looking into this, so let me put a "claimed" label on it, but let me know if you need help.
We are hitting this issue on one of our Jenkins agents running jobs that create and then kill a lot of containers. Looking at our previous job runs, it seems that in a run several days ago one container failed to turn healthy and the job failed. The next job then removed all containers as per normal procedure. Now, several days later, the health check for that container is still running every few seconds, even though the container was removed. Should we stop the containers some other way?
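For anyone wanting to check whether an agent is affected, one rough approach is to count how many of these warnings each container ID has accumulated in a captured daemon log. A minimal sketch; the log line shape below is fabricated for illustration and real daemon output differs between versions:

```shell
# Count "No such exec instance" warnings per 12-char container ID in a
# captured daemon log. The log format here is an illustrative assumption.
count_warnings() {
  grep 'No such exec instance' | grep -o '[0-9a-f]\{12\}' | sort | uniq -c
}

# Example with fabricated log lines piped through the filter:
count_warnings <<'EOF'
msg="Health check error: No such exec instance deadbeef1234"
msg="Health check error: No such exec instance deadbeef1234"
msg="Health check error: No such exec instance cafebabe5678"
EOF
```

An ID whose count keeps growing after the container was removed would match the behaviour described above.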
ping @fntlnz, did you have time to look into this?
I just found a way to reproduce this in our setup. It happens if we restart the docker service while we have containers running. After the restart, the docker service will try to start up all the containers again, but without docker-compose to take care of the … I'm not sure how easy it would be to create a reproducible …
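The restart scenario above can be sketched as a short script. Everything here is a placeholder (container name, image, intervals), and since it restarts the daemon it should only be tried on a disposable host:

```shell
# Hypothetical reproduction sketch for the daemon-restart scenario.
# Names and values are illustrative, not from the original report.
repro() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not available; skipping reproduction"
    return 0
  fi

  # 1. Start a container with a fast health check.
  docker run -d --name hc-repro \
    --health-cmd true --health-interval 2s \
    busybox sleep 3600

  # 2. Restart the daemon while the container is running.
  sudo systemctl restart docker

  # 3. Remove the container, then look for health-check warnings that
  #    still reference it in the daemon logs.
  docker rm -f hc-repro
  sudo journalctl -u docker --since '2 min ago' \
    | grep 'No such exec instance' || true
}
```

On an affected version, calling `repro` and re-checking the logs a minute later should show the warning recurring even though the container is gone.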
@esbite could you provide your … If your issue is similar to the original one, it looks like …
I managed to create a small … In short, it happens for us when we have running containers with …

Gist with instructions on how to reproduce this: …
@esbite thanks for the instructions, I can reproduce the issue following them. |
@mlaventure I actually tried to check whether your containerd fix, brought to the 17.03.x branch by #31662, solves this, but it doesn't look like it does.
A few minutes after we launch our containers, and for too long afterwards, we see ongoing health checks on non-existent containers...
Maybe the one or two containers we use at the beginning to set things up, which exit afterwards, leave their mark somewhere and aren't cleared from the health check container list.
The result is that the `No such exec instance` warning quickly (sometimes in < 1h) turns into `context deadline exceeded` errors, and the daemon reports everyone as unhealthy. Furthermore, the daemon sometimes stops responding swiftly.

Even stranger, the `containerd version:` line of `docker info` now says `N/A`, as seen below, while it was fine on the daemon's initial start.

Steps to reproduce the issue:
Hard to say; it seems we use a container whose health check stays active after it's gone, causing a build-up that ends up causing more trouble for the docker daemon.
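One way to make that build-up concrete is to compare the IDs appearing in the warnings with the containers that still exist (in a live setup, the second list would come from something like `docker ps -aq`). A hedged sketch using fabricated data; the log shape and file-based inputs are illustrative assumptions:

```shell
# List container IDs that appear in health-check warnings but are not
# in the list of existing containers. Inputs are plain text files.
orphaned_ids() {
  # $1: file of daemon log lines, $2: file of existing container IDs
  grep -o '[0-9a-f]\{12\}' "$1" | sort -u | while read -r id; do
    grep -qx "$id" "$2" || echo "orphaned health check: $id"
  done
}

# Example with fabricated data: one warned ID no longer exists.
logs=$(mktemp); existing=$(mktemp)
echo 'No such exec instance deadbeef1234' > "$logs"
echo 'cafebabe5678' > "$existing"
orphaned_ids "$logs" "$existing"
# prints: orphaned health check: deadbeef1234
rm -f "$logs" "$existing"
```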
Describe the results you received:

Describe the results you expected:

No lingering health checks on removed containers (`docker ps -a` not showing these hashes).

Additional information you deem important (e.g. issue happens only occasionally):

Output of `docker version`:

Output of `docker info`:

The interesting thing is that the `containerd version` line, after a while, becomes `N/A`...

Additional environment details (AWS, VirtualBox, physical, etc.):
It runs on a Proxmox VM.
Not sure if it relates to these other issues:
#29854
#29369