Unable to remove network "has active endpoints" #17217
This issue seems to be very intermittent and does not happen very often. |
Awesome. Thanks. |
I also just reproduced this in 1.10.3 and landed here via google looking for a work around. I can't force disconnect the active endpoints b/c none of the containers listed via I eventually had to recreate my consul container and restart the docker daemon. |
ping @mavenugo do you want this issue reopened, or prefer a new issue in case it has a different root cause? |
Clarification, docker 1.10.1
|
Let me reopen this for investigation |
Madhu, assigned you, but feel free to reassign, or point to the related workaround if it's there already 😄 |
@keithbentrup @brendandburns thanks for raising the issue. Couple of questions
FYI. for a multi-host network driver, docker maintains the endpoints for a network across the cluster in the KV-Store. Hence, if any host in that cluster still has an endpoint alive in that network, we will see this error and this is an expected condition. |
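To tell which of the two situations applies, it can help to check the driver and scope of the network in question. The following is a minimal sketch, not something posted in this thread: `network_scope` is a hypothetical helper name, while `Driver` and `Scope` are fields of the `docker network inspect` output.

```shell
# Hypothetical helper: print the driver and scope of a network.
# A scope other than "local" (for example "global" or "swarm") suggests the
# endpoints are tracked across hosts rather than on this host alone.
network_scope() {
  docker network inspect -f '{{.Driver}} {{.Scope}}' "$1"
}
```

Calling `network_scope <network-name>` on a single-host bridge network would typically report the `bridge` driver with `local` scope, in which case endpoints on other hosts cannot explain the error.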
@thaJeztah PTAL my comment above and based on the scenario, this need not be a bug. am okay to keep this issue open if that helps. |
@mavenugo Yes, I'm using the overlay driver via docker-compose with a swarm host managing 2 nodes. When I |
@keithbentrup This is a stale endpoint case. Do you happen to have the error log when that container that was originally removed (which left the endpoint in this state). |
@brendandburns can you please help reply to #17217 (comment) ? |
@mavenugo sorry for the delay. I'm not using docker multi-host networking afaik. It's a single-node Raspberry Pi and I haven't done anything other than install docker via hypriot. Here's the output you requested (
The kv file is attached, I had to name it .txt to get around github filters, but its the binary file. I created the network via direct API calls (dockerode) This has worked (create and delete) numerous times, I think in this instance, I Hope that helps. |
@mavenugo If by If you meant the |
@keithbentrup i meant the |
@brendandburns thanks for the info and it is quite useful to narrow down the issue. There is a stale reference to the endpoint which is causing this issue. The stale reference is most likely caused by the power-cycle when the endpoints were being cleaned up. we will get this inconsistency issue resolved in 1.11. |
@mavenugo glad it helped. In the meantime, if I blow away that file, will things still work? thanks |
@brendandburns yes. pls go ahead. it will just work fine for you. |
@mavenugo I think you misunderstood me. I was using the |
I was able to recreate a condition when trying to remove docker_gwbridge that might alleviate some of the confusion.
I first tried removing the container by container name (not shown), then by id, then by container endpoint id. None were successful. Then I logged onto the docker host, and used the local docker client to issue commands via the docker unix socket:
|
@keithbentrup thats correct. as I suggested earlier. the |
@mavenugo but what you suggest is not what the help says. furthermore it lacks the consistency of the most cmds where id/name are interchangeable. unless others find this thread, others will repeat this same issue, so before adding support for endpoint-id, fix the |
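For readers who land here with the same error, the force-disconnect approach discussed above can be sketched as follows. This is an illustration under stated assumptions, not the exact commands from the comments (those were lost from this copy of the thread). It assumes the names listed under `Containers` in `docker network inspect` are what `docker network disconnect -f` accepts, and that those names contain no whitespace. `force_remove_network` is a hypothetical helper name.

```shell
# Hypothetical helper: force-disconnect every endpoint that the network
# still lists, then remove the network.
# Assumes endpoint names contain no whitespace.
force_remove_network() {
  net="$1"
  # List the names recorded under "Containers" for this network.
  names=$(docker network inspect -f '{{range .Containers}}{{.Name}} {{end}}' "$net") || return 1
  for name in $names; do
    # -f / --force forces the endpoint off the network.
    docker network disconnect -f "$net" "$name"
  done
  docker network rm "$net"
}
```

If `docker network inspect` lists nothing under `Containers`, as some commenters report, the loop does nothing and the helper just attempts the removal.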
Apply workaround from moby/moby#17217 (comment) to see if it fixes those nasty errors in CI. In any case, it seems sensible and safe to remove orphans always by default.
For a closed issue this ticket is rather active. |
I just experienced the bug that makes it impossible to remove a network that has no containers attached to it. Restarting the docker daemon solved the issue. Thank you @davidroeca! |
Solution: Old post with error behavior: I am on Ubuntu 20.04 LTS, with:
|
@lkaupp from the output of your error, I suspect you have the docker "snap" installed, which are packaged and maintained by Canonical / Ubuntu. I know there's various issues with those; are you seeing the same problem when running the official docker packages (https://docs.docker.com/engine/install/ubuntu/)? |
yes you are correct, I just found the solution myself and uninstalled the snap version. Added a description for others that run into the problem. Thank you for the fast response @thaJeztah :) |
I have this error running airflow. Running
fixed it. |
Thanks for sharing your fix, @Wallace-Kelly. Adding a |
Still having this issue with docker 20.10.12 on Big Sur on darwin/amd64. Adding the |
Still an issue, had to restart docker service to remove a spurious network that existed with supposed active endpoints to a container that no longer exists. |
I had this problem after changing my docker-compose file and trying to shut down the "old" setup with my changes in place, where I had removed some containers and a network. I rolled back my code to the "old" setup and then docker-compose down worked as expected. After that I re-added my changes and ran compose up. Perhaps someone else did the same thing. |
Seeing a similar issue, albeit very intermittently, in CI (for https://github.com/envoyproxy/envoy) for an example that scales backend services. For reference, the related compose file is here: https://github.com/envoyproxy/envoy/blob/main/examples/locality-load-balancing/docker-compose.yaml and the script that tests it in CI is here: https://github.com/envoyproxy/envoy/blob/main/examples/locality-load-balancing/verify.sh. I'm going to add the |
This might resolve a very infrequent CI bug in which docker doesnt clean up containers before removing the network. This relates to a very long-standing docker bug that never seems to have been fully resolved. cf moby/moby#17217 Signed-off-by: Ryan Northey <ryan@synca.io>
Having similar issues in some of our CI machines. Essentially network removal fails since it tells me it has active endpoints. But docker network inspect shows no containers. Not sure what the cause is but a restart to docker service fixes the problem. Docker info below:
|
Restarting docker didn't do the trick for me. Added |
docker-compose down --remove-orphans |
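In a CI script, this flag can be wired into a teardown step. The sketch below is one possible arrangement, not a fix confirmed by the maintainers. `compose_teardown` and the `COMPOSE_CMD` variable are hypothetical names, and the compose file is assumed to be in the current directory.

```shell
# Hypothetical CI teardown helper.
# --remove-orphans also removes containers for services that are no longer
# defined in the compose file, which can otherwise keep the network busy.
# COMPOSE_CMD is a hypothetical override, e.g. COMPOSE_CMD="docker compose".
compose_teardown() {
  ${COMPOSE_CMD:-docker-compose} down --remove-orphans
}

# Example wiring (commented out so that sourcing this file has no side effects):
# trap compose_teardown EXIT
```

Other reports in this thread note that the flag does not help in every case, so a CI job may still need a fallback.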
Running into same issue with Docker client and server : v23.0.1 on centos . |
Same issue happening in our CI pipeline. This is happening too often nowadays and failing our pipelines. docker-compose down with --remove-orphans gives the same issue. Restarting docker and retriggering the jobs is the only solution for now. |
I run into this occasionally. Not sure how to resolve it without restarting everything.
EDIT: Ok the error totally made sense if I had just read it. |
A similar situation occurs in our CI on CentOS 7/8.
According to the response, the network does not have active endpoints. Restarting the docker service fixes the problem. PS. In case it helps: during the run we dynamically create containers and add them to the compose network from inside a container (by mounting the docker socket), then remove them. |
Restarting docker helped clear this issue
|
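Several reports above describe the same last-resort sequence: the removal fails, the daemon is restarted, and the removal then succeeds. A minimal sketch of that sequence follows, assuming a systemd-managed host where `sudo systemctl restart docker` is available. `rm_network_with_restart` is a hypothetical helper name, and restarting the daemon may interrupt running containers.

```shell
# Hypothetical helper: try to remove a network; if that fails, restart the
# Docker daemon (assumes systemd) and try once more.
rm_network_with_restart() {
  net="$1"
  if docker network rm "$net"; then
    return 0
  fi
  echo "network rm failed for $net; restarting docker" >&2
  sudo systemctl restart docker || return 1
  docker network rm "$net"
}
```

One commenter reports that a restart did not help in their case, so the second attempt can still fail; the helper returns that exit status to the caller.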
Not too sure if this belongs in this repo or libnetwork.
docker version:
Docker version 1.9.0-rc1, build 9291a0e
docker info:
uname -a:
Linux carbon1.rmb938.com 3.10.0-229.14.1.el7.x86_64 #1 SMP Tue Sep 15 15:05:51 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
List the steps to reproduce the issue:
Describe the results you received:
If the remote network driver gives an error when processing /NetworkDriver.Leave docker still kills and removes the container but does not remove the endpoint. This allows docker's internal db to think that the endpoint still exists even though the container is removed.
When you try and remove the network this error is returned
Describe the results you expected:
Docker should not be allowed to kill or remove the container if /NetworkDriver.Leave returned an error.