overlay network stops working after stack down/up cycles (possible race condition or locking issue) #2081

By following these steps you can reproduce the issue in a matter of minutes. All you need is a cluster of 2 nodes.

# create a manager node
docker-machine create --driver virtualbox manager
docker-machine ssh manager
# add debug setting
echo '{ "debug": true }' > /etc/docker/daemon.json
# get dockerd to reload the config
kill -HUP $(pidof dockerd)
# check log for releasing of overlay network
tail -f /var/log/docker.log | grep 'releasing IPv4 pools'

# start another terminal and do the steps above for a worker node

# start another terminal and init the swarm manager
eval $(docker-machine env manager)
docker swarm init --advertise-addr 192.168.99.103

# make the worker join the swarm
eval $(docker-machine env worker)
docker swarm join --token SWMTKN-1-2duh1guir5ywynuyz2p4w2 192.168.99.103:2377

# you should now have a 2 node cluster
eval $(docker-machine env manager)
docker node ls

# run this until the worker log indicates it did not release the overlay network as it should
./up-and-down.sh

# monitor the dockerd logs on both nodes
tail -f /var/log/docker.log | grep 'releasing IPv4 pools'

You'll notice that both nodes normally release the overlay network, but sometimes (after a few cycles) the worker node does not. You are then in a state where the two nodes do not use the same overlay network ID, and at that point the services are unable to ping each other.
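If watching two `tail -f` windows is tedious, the release lines can also be counted after the fact. The following is only a rough sketch: it assumes each node's `/var/log/docker.log` has been copied to a local file, and the function names `count_releases` and `compare_releases` are made up for illustration. A difference in counts is just a hint that one node skipped a release, since the two logs may not cover exactly the same cycles.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of the attached files.
# Copy the logs locally first, for example:
#   docker-machine ssh manager cat /var/log/docker.log > manager.log
#   docker-machine ssh worker  cat /var/log/docker.log > worker.log

# Print how many lines of the given log contain 'releasing IPv4 pools'.
count_releases() {
  # grep -c prints 0 but exits non-zero when nothing matches; keep the 0.
  grep -c 'releasing IPv4 pools' "$1" || true
}

# Compare the release counts of two saved logs (manager first, worker second).
# Returns 0 when the counts match and 1 when they differ.
compare_releases() {
  manager_count=$(count_releases "$1")
  worker_count=$(count_releases "$2")
  echo "manager=$manager_count worker=$worker_count"
  if [ "$manager_count" -eq "$worker_count" ]; then
    echo "counts match"
  else
    echo "counts differ"
    return 1
  fi
}
```

Calling `compare_releases manager.log worker.log` after a run of `./up-and-down.sh` prints both counts and returns non-zero when they differ.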

# Files needed
- up-and-down.sh: script that brings the stack up and down
- ping.sh: used to ping the other service in the overlay network
- Dockerfile: builds an image that contains the ping.sh script
- docker-stack.yml: the services to deploy to the swarm

[files.tar.gz](https://github.com/docker/libnetwork/files/1738488/files.tar.gz)
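For readers who cannot fetch the archive, here is a guess at the kind of loop an up/down script performs. This is not the attached `up-and-down.sh`: the function name `cycle_stack`, the stack name, and the sleep intervals are all assumptions chosen for illustration. Only `docker stack deploy`, `docker stack rm`, and the `docker-stack.yml` file name come from the setup described here.

```shell
#!/bin/sh
# Hypothetical stand-in for up-and-down.sh; the real script is in files.tar.gz.
# STACK_NAME, UP_SECONDS and DOWN_SECONDS are illustrative, not from the issue.
STACK_NAME="${STACK_NAME:-pingtest}"
UP_SECONDS="${UP_SECONDS:-30}"
DOWN_SECONDS="${DOWN_SECONDS:-15}"

# Deploy and remove the stack the given number of times.
# Stops early and returns 1 if a docker command fails.
cycle_stack() {
  cycles="$1"
  i=1
  while [ "$i" -le "$cycles" ]; do
    echo "cycle $i: deploying $STACK_NAME"
    docker stack deploy --compose-file docker-stack.yml "$STACK_NAME" || return 1
    sleep "$UP_SECONDS"    # give the services time to start and ping each other
    echo "cycle $i: removing $STACK_NAME"
    docker stack rm "$STACK_NAME" || return 1
    sleep "$DOWN_SECONDS"  # give the nodes time to release the overlay network
    i=$((i + 1))
  done
}
```

With the manager's environment loaded (`eval $(docker-machine env manager)`), something like `cycle_stack 20` would run twenty cycles while the log greps stay open in the other terminals.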



