overlay network stops working after stack down/up cycles (possible race condition or locking issue) #2081

@jcmcote

Following these steps, you can reproduce the issue in a matter of minutes. All you need is a cluster of 2 nodes.

create a manager node

docker-machine create --driver virtualbox manager
docker-machine ssh manager

add debug setting

echo '{ "debug": true }' > /etc/docker/daemon.json

get dockerd to reload the config

kill -HUP $(pidof dockerd)

check log for releasing of overlay network

tail -f /var/log/docker.log | grep 'releasing IPv4 pools'

start another terminal and repeat the steps above for a worker node, using worker as the machine name

start another terminal and init the swarm manager

eval $(docker-machine env manager)
docker swarm init --advertise-addr 192.168.99.103

make the worker join the swarm

eval $(docker-machine env worker)
docker swarm join --token SWMTKN-1-2duh1guir5ywynuyz2p4w2 192.168.99.103:2377
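
The token and IP address above are specific to my setup. As a rough sketch, the join command can be built from values read off the manager instead of being pasted by hand. The helper name join_command is made up for illustration; it only formats the command and does not run it.

```shell
# Hypothetical helper: print the join command for a given token and manager IP.
# $1 = worker join token, $2 = manager IP address
# Port 2377 is the one used in the steps above.
join_command() {
    printf 'docker swarm join --token %s %s:2377\n' "$1" "$2"
}
```

Possible usage, assuming the machines are named manager and worker as above:

    eval $(docker-machine env manager)
    token=$(docker swarm join-token -q worker)
    ip=$(docker-machine ip manager)
    eval $(docker-machine env worker)
    $(join_command "$token" "$ip")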

you should now have a 2 node cluster

eval $(docker-machine env manager)
docker node ls
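
To confirm both nodes are up before starting the cycles, one option is to count the nodes that report Ready. This is only a sketch: count_ready_nodes is a made-up helper name, and it assumes input lines of the form "hostname status".

```shell
# Hypothetical helper: reads "hostname status" lines on stdin and
# prints how many of them have the status "Ready".
count_ready_nodes() {
    awk '$2 == "Ready" { n++ } END { print n + 0 }'
}
```

Possible usage on the manager (the result should be 2 for this cluster):

    docker node ls --format '{{.Hostname}} {{.Status}}' | count_ready_nodes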

run this until the worker log indicates that it did not release the overlay network as it should

./up-and-down.sh
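
The real script is in files.tar.gz. For readers who do not want to unpack it, the following is a rough sketch of what such a loop could look like. It is NOT the attached script: the function names, the wait time, and the stack name in the usage line are all assumptions.

```shell
#!/bin/sh
# Sketch only: not the up-and-down.sh from files.tar.gz.
# DOCKER can be overridden (e.g. DOCKER=echo) for a dry run.
DOCKER="${DOCKER:-docker}"

# Deploy the stack, wait, remove it, wait again.
# $1 = stack name, $2 = stack file, $3 = seconds to wait after each step
cycle_once() {
    "$DOCKER" stack deploy -c "$2" "$1" || return 1
    sleep "$3"
    "$DOCKER" stack rm "$1" || return 1
    sleep "$3"
}

# Repeat the up/down cycle.
# $1..$3 as for cycle_once, $4 = number of cycles
cycle_many() {
    i=1
    while [ "$i" -le "$4" ]; do
        echo "cycle $i"
        cycle_once "$1" "$2" "$3" || return 1
        i=$((i + 1))
    done
}
```

Possible usage, with a hypothetical stack name pingtest, 20 seconds between steps, and 50 cycles:

    cycle_many pingtest docker-stack.yml 20 50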

monitor the dockerd logs on both nodes

tail -f /var/log/docker.log | grep 'releasing IPv4 pools'

You'll notice that both nodes normally release the overlay network on each cycle, but sometimes (after a few cycles) the worker node does not release it. You are then in a state where the two nodes no longer use the same overlay network ID. At this point the services are unable to ping each other.
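
Instead of watching both tails by eye, the release lines can be counted per node and compared. This is a sketch under an assumption the logs have not confirmed: that every down cycle produces the same number of 'releasing IPv4 pools' lines on each node. The helper names are made up for illustration.

```shell
# Hypothetical helper: print how many "releasing IPv4 pools" lines
# the given log file contains (prints 0 when there are none).
count_releases() {
    grep -c 'releasing IPv4 pools' "$1" || true
}

# Succeeds when both log files contain the same number of release lines.
# $1 = manager log copy, $2 = worker log copy
same_release_count() {
    [ "$(count_releases "$1")" -eq "$(count_releases "$2")" ]
}
```

Possible usage, copying the logs from both machines first:

    docker-machine ssh manager cat /var/log/docker.log > manager.log
    docker-machine ssh worker cat /var/log/docker.log > worker.log
    same_release_count manager.log worker.log || echo "release counts differ"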

Files needed

up-and-down.sh: script that brings the stack up and down
ping.sh: used to ping the other service in the overlay network
Dockerfile: creates an image and puts the ping.sh script into it
docker-stack.yml: services to deploy to the swarm

files.tar.gz
