Swarm service / overlay breakage - starting container failed: Address already in use #34163
Also reported by @nickjj, who said it prevented a production roll-out of Swarm.
Possibly related: #31698
Steps to reproduce the issue:

Repeatedly create and remove services attached to an overlay network via the Docker/Swarm API. The watchdog process in the container binds to port 8080. (A sketch of this cycle follows below.)

Describe the results you received:

Starting a container for the service fails with `starting container failed: Address already in use`.

Describe the results you expected:

Containers should start cleanly on the overlay network every time the service is created.

Additional information you deem important (e.g. issue happens only occasionally):

This works very intermittently.
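For illustration, a minimal sketch of the create/remove cycle above, using the Docker CLI as the Swarm API client (the network, service, and image names here are hypothetical, not from the original report):

```sh
# Overlay network for the services (hypothetical name).
docker network create --driver overlay func_net

# Create a service; the container's watchdog process binds to port 8080.
docker service create --name func_echo --network func_net myorg/watchdog:latest

# Remove and immediately recreate the service: the step that intermittently
# fails with "starting container failed: Address already in use".
docker service rm func_echo
docker service create --name func_echo --network func_net myorg/watchdog:latest
```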
@alexellis as explained here #31698 (comment), if this is a temporary state -> task reconciliation happens and the task gets rescheduled on another node. If it is a permanent state, then I have a possible fix in docker/libnetwork#1853. Let me know your thoughts on this. I will give this a try again today and update the thread.
@abhinandanpb - a fix would be great. This seems like a very normal use-case for CD. The error I'm getting appears to be permanent unless this is dependent on a restart-policy?
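One way to tell the temporary case from the permanent one is to watch whether Swarm keeps rescheduling the failing task; a sketch (the service name is hypothetical):

```sh
# Show the full task history for the service, including the complete error
# message and the node each task attempt was scheduled on.
docker service ps --no-trunc func_echo

# If the same "Address already in use" error repeats across attempts and nodes,
# the state is effectively permanent rather than a transient reconciliation blip.
docker node ls
```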
I have the same issue with
I didn't try only restarting the docker daemon, but rebooting the EC2 instances does fix the problem temporarily. The issue occurs again intermittently when updating services.
So we use a `docker stack deploy` CI pipeline to deploy our services to the cluster, as opposed to `docker service create` (a minimal example follows below).
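For reference, the deploy step in such a pipeline has this shape (stack and compose file names are hypothetical):

```sh
# Deploy the stack from a compose file; re-running the same command against an
# existing stack performs rolling updates of the changed services.
docker stack deploy -c docker-compose.yml mystack
```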
Attempt #1: Engine version 17.06-ce
It has been a major roadblock for us. Not really sure how everyone else has been getting their clusters to work.
Would be more than happy to share relevant logs.
Finally, this workaround in my CD system made the magic happen:
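A sketch of the disconnect-and-redeploy cycle described below, with hypothetical stack and network names (`mystack`, `mystack_default`):

```sh
NETWORK=mystack_default   # hypothetical overlay network name

# Force-disconnect everything still attached to the overlay network;
# the force flag also clears stale endpoints left by terminated containers.
for id in $(docker network inspect -f '{{range $id, $c := .Containers}}{{println $id}}{{end}}' "$NETWORK"); do
  docker network disconnect -f "$NETWORK" "$id"
done

# Remove the stack and redeploy it.
docker stack rm mystack
docker stack deploy -c docker-compose.yml mystack
```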
IMHO it looks like terminated containers are still connected to the overlay network. When I first noticed the issue and no workaround worked, I had to disconnect all containers from the network and remove the stack to be able to redeploy again. Now I use the excerpt above with each deploy and it seems to work fine.