
Lots of “failed to allocate gateway: Address already in use” in Docker CE 17.09.0 #35204

Open
kinghuang opened this issue Oct 15, 2017 · 13 comments


@kinghuang

kinghuang commented Oct 15, 2017

Description

I recently upgraded a 10-node cluster from Docker CE 17.07.0 to 17.09.0. Since the upgrade, services frequently fail to start on random nodes (seemingly different ones each time). Typically, the service's tasks show a status like “failed to allocate gateway (10.0.24.1): Address already in use”.

Steps to reproduce the issue:

  1. Upgrade an existing 10-node Docker CE 17.07.0 swarm to Docker CE 17.09.0.
  2. Deploy a large stack.

Describe the results you received:

Some services fail to start. The tasks report errors along the lines of “failed to allocate gateway (10.0.24.1): Address already in use”.

Describe the results you expected:

I expected all the networks and services defined in the stack to be created and started.

Additional information you deem important (e.g. issue happens only occasionally):

Here's the output of `docker service ps` for one affected service, as an example. The task did launch on some nodes (and failed), but towards the end it kept getting rejected by nodes 06 and 09 in this swarm.

(rmsdev) →  rms git:(update-17_10_1) docker service ps rms_runner
ID                  NAME                   IMAGE                                          NODE                     DESIRED STATE       CURRENT STATE                ERROR                              PORTS
lh4wyejdi489        rms_runner.1           docker.ucalgary.ca/srv/engine:17.10.0          itrmsdev09.ucalgary.ca   Shutdown            Rejected 33 minutes ago      "failed to allocate gateway (1…"   
lx6dny80mj0e         \_ rms_runner.1       docker.ucalgary.ca/srv/engine:17.10.0          itrmsdev09.ucalgary.ca   Shutdown            Rejected 34 minutes ago      "failed to allocate gateway (1…"   
8khf2p5993o8         \_ rms_runner.1       docker.ucalgary.ca/srv/engine:17.10.0          itrmsdev06.ucalgary.ca   Shutdown            Rejected 35 minutes ago      "failed to allocate gateway (1…"   
65vhpa61z54k         \_ rms_runner.1       docker.ucalgary.ca/srv/engine:17.10.0          itrmsdev09.ucalgary.ca   Shutdown            Rejected 36 minutes ago      "failed to allocate gateway (1…"   
yw62fwl0birm         \_ rms_runner.1       docker.ucalgary.ca/srv/engine:17.10.0          itrmsdev09.ucalgary.ca   Shutdown            Rejected 37 minutes ago      "failed to allocate gateway (1…"   
qvxct38udcny         \_ rms_runner.1       docker.ucalgary.ca/srv/engine:17.10.0          itrmsdev06.ucalgary.ca   Shutdown            Rejected 38 minutes ago      "failed to allocate gateway (1…"   
afmtvch6my82         \_ rms_runner.1       docker.ucalgary.ca/srv/engine:17.10.0          itrmsdev09.ucalgary.ca   Shutdown            Rejected 39 minutes ago      "failed to allocate gateway (1…"   
ivexmxooj1zv         \_ rms_runner.1       docker.ucalgary.ca/srv/engine:17.10.0          itrmsdev09.ucalgary.ca   Shutdown            Rejected 40 minutes ago      "failed to allocate gateway (1…"   
lmc6rsut5h1w         \_ rms_runner.1       docker.ucalgary.ca/srv/engine:17.10.0          itrmsdev09.ucalgary.ca   Shutdown            Rejected 41 minutes ago      "failed to allocate gateway (1…"   
t6wqb4dvbmwz         \_ rms_runner.1       docker.ucalgary.ca/srv/engine:17.10.0          itrmsdev09.ucalgary.ca   Shutdown            Rejected 42 minutes ago      "failed to allocate gateway (1…"   
jin0415ftxur         \_ rms_runner.1       docker.ucalgary.ca/srv/engine:17.10.0          itrmsdev09.ucalgary.ca   Shutdown            Rejected 43 minutes ago      "failed to allocate gateway (1…"   
reo32u7m4pfm         \_ rms_runner.1       docker.ucalgary.ca/srv/engine:17.10.0          itrmsdev09.ucalgary.ca   Shutdown            Rejected 44 minutes ago      "failed to allocate gateway (1…"   
fpl8z2gguu16         \_ rms_runner.1       docker.ucalgary.ca/srv/engine:17.10.0          itrmsdev01.ucalgary.ca   Shutdown            Failed 44 minutes ago        "task: non-zero exit (1)"          
4yi24irzq9lc         \_ rms_runner.1       docker.ucalgary.ca/srv/engine:17.10.0          itrmsdev02.ucalgary.ca   Shutdown            Rejected about an hour ago   "failed to allocate gateway (1…"   
d08ca34hig9t         \_ rms_runner.1       docker.ucalgary.ca/srv/engine:17.10.0          itrmsdev09.ucalgary.ca   Shutdown            Rejected about an hour ago   "failed to allocate gateway (1…"   
s2q47k1l19b7         \_ rms_runner.1       docker.ucalgary.ca/srv/engine:17.10.0          itrmsdev01.ucalgary.ca   Shutdown            Failed about an hour ago     "task: non-zero exit (1)"          
r4zqgsov3gs3         \_ rms_runner.1       docker.ucalgary.ca/srv/engine:17.10.0          itrmsdev09.ucalgary.ca   Shutdown            Rejected about an hour ago   "failed to allocate gateway (1…"   
b9v8c15a1x5n         \_ rms_runner.1       docker.ucalgary.ca/srv/engine:17.10.0          itrmsdev05.ucalgary.ca   Shutdown            Failed about an hour ago     "task: non-zero exit (1)"          
zb7duugo1u35         \_ rms_runner.1       docker.ucalgary.ca/srv/engine:17.10.0          itrmsdev09.ucalgary.ca   Shutdown            Rejected about an hour ago   "failed to allocate gateway (1…"   
u0ovtotrgced         \_ rms_runner.1       docker.ucalgary.ca/srv/engine:17.10.0          itrmsdev02.ucalgary.ca   Shutdown            Rejected about an hour ago   "failed to allocate gateway (1…"   
y0q7gso5oy98         \_ rms_runner.1       docker.ucalgary.ca/srv/engine:17.10.0          itrmsdev02.ucalgary.ca   Shutdown            Rejected about an hour ago   "failed to allocate gateway (1…"                                         

If I inspect the most recent task (lh4wyejdi489), the status block shows the full error.

        "Status": {
            "Timestamp": "2017-10-15T03:44:16.086430116Z",
            "State": "rejected",
            "Message": "preparing",
            "Err": "failed to allocate gateway (10.0.24.1): Address already in use",
            "ContainerStatus": {},
            "PortStatus": {}
        },

The gateway address 10.0.24.1 corresponds to a network named rms_adjunct, which can be found in the task's NetworksAttachments.

            {
                "Network": {
                    "ID": "evrdhcqaty5wuucg4inmwqdqm",
                    "Version": {
                        "Index": 682782
                    },
                    "CreatedAt": "2017-10-15T03:24:39.088275308Z",
                    "UpdatedAt": "2017-10-15T03:24:39.092753187Z",
                    "Spec": {
                        "Name": "rms_adjunct",
                        "Labels": {
                            "com.docker.stack.namespace": "rms"
                        },
                        "DriverConfiguration": {
                            "Name": "overlay"
                        },
                        "Scope": "swarm"
                    },
                    "DriverState": {
                        "Name": "overlay",
                        "Options": {
                            "com.docker.network.driver.overlay.vxlanid_list": "4121"
                        }
                    },
                    "IPAMOptions": {
                        "Driver": {
                            "Name": "default"
                        },
                        "Configs": [
                            {
                                "Subnet": "10.0.24.0/24",
                                "Gateway": "10.0.24.1"
                            }
                        ]
                    }
                },
                "Addresses": [
                    "10.0.24.47/24"
                ]
            },

This task was repeatedly rejected by nodes 06 and 09. If I run `docker network inspect rms_adjunct` on all 10 nodes, those two nodes return “Error: No such network: rms_adjunct”, while the other nodes return the network.

Q: How do I debug why the “failed to allocate gateway” error occurs?

Across the 10 nodes, only the 3 manager nodes (01 to 03) consistently have all the overlay networks. The remaining 7 nodes each seem to have a different list of networks. I can't remember whether it was like this with Docker CE 17.07.0, but I didn't have this problem before.
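
Comparing network lists across nodes can be done mechanically rather than by eye. As a sketch (not from the issue; `stack_networks` is a hypothetical helper name), this filters captured `docker network ls` table output down to one stack's networks, so the per-node results can be diffed:

```shell
# Hypothetical helper: read `docker network ls` table output on stdin and
# print "<id> <name>" for every network whose name starts with the given
# stack namespace. Assumes the default columns: ID NAME DRIVER SCOPE.
stack_networks() {
  awk -v s="$1" 'NR > 1 && index($2, s "_") == 1 { print $1, $2 }'
}
```

Running `docker network ls | stack_networks rms` on each node and diffing the outputs would show exactly which nodes are missing (or still holding) a given stack's networks.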

Output of docker version:

Client:
 Version:      17.09.0-ce
 API version:  1.32
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:41:23 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.09.0-ce
 API version:  1.32 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:42:49 2017
 OS/Arch:      linux/amd64
 Experimental: true

Output of docker info:

Containers: 15
 Running: 12
 Paused: 0
 Stopped: 3
Images: 53
Server Version: 17.09.0-ce
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: gelf
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: sy4tk0fkmicikubo4nw9bvfap
 Is Manager: true
 ClusterID: rybcwsnikz7kvb1lpslbwyjeu
 Managers: 3
 Nodes: 10
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: true
 Root Rotation In Progress: false
 Node Address: 10.41.149.137
 Manager Addresses:
  10.41.149.137:2377
  10.41.149.138:2377
  10.41.149.139:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 06b9cb35161009dcb7123345749fef02f7cea8e0
runc version: 3f2f8b84a77f73d38244dd690525642a72156c64
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-693.2.2.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.4 (Maipo)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 31.26GiB
Name: itrmsdev01.ucalgary.ca
ID: MYIJ:6PFS:CZR5:QGU6:2ZBA:S3T7:3ER5:3QRM:5QYB:ZM22:SH6N:S2JP
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

10 node Docker CE 17.09.0 swarm on RHEL 7.3 VMs.

@thaJeztah
Member

Swarm-managed networks are dynamically extended to the nodes where a task is deployed, so that explains why the network was not found on those two nodes.

For the "unable to allocate" error; does that IP range possibly overlap with a physical network on those two nodes?

@kinghuang
Author

kinghuang commented Oct 15, 2017

Thanks for confirming that networks are dynamically extended to nodes, @thaJeztah. I've also confirmed that the IP range doesn't overlap with any physical network on those two nodes.
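
For anyone wanting to rule out such an overlap mechanically, a subnet-overlap check needs no Docker at all: two CIDR blocks overlap exactly when their network bits agree under the shorter of the two prefix lengths. A minimal sketch in shell (`ip2int` and `cidr_overlap` are hypothetical helper names, not part of any Docker tooling):

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip2int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed (exit 0) if two CIDR blocks overlap. Two blocks overlap exactly
# when they agree on the network bits of the shorter prefix.
cidr_overlap() {
  local a=$(ip2int "${1%/*}") alen=${1#*/}
  local b=$(ip2int "${2%/*}") blen=${2#*/}
  local len=$(( alen < blen ? alen : blen ))
  local mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  [ $(( a & mask )) -eq $(( b & mask )) ]
}

cidr_overlap 10.0.24.0/24 10.0.0.0/8 && echo overlap       # overlay subnet vs. a 10/8
cidr_overlap 10.0.24.0/24 10.41.149.0/24 || echo disjoint  # vs. the node network here
```

Checking the overlay subnet against each interface's subnet on nodes 06 and 09 would confirm (or rule out) the overlap thaJeztah asked about.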

I've been playing around with this some more, and I've narrowed down the problem to networks not being removed from all nodes when stacks are removed.

If I make a fresh deployment of a stack named rms-ci-config, it deploys and runs successfully with no “unable to allocate” errors. Then, after running `docker stack rm rms-ci-config`, if I check each node, some of them have networks from the stack lingering. Upon redeployment of rms-ci-config, the “unable to allocate” errors occur.

Output of `docker stack rm`:

$ docker stack rm $STACK_NAME
Removing service rms-ci-config_pcproj-read-status
Removing service rms-ci-config_converis-search
Removing service rms-ci-config_pcproj-create
Removing service rms-ci-config_engine
Removing service rms-ci-config_converis-web
Removing service rms-ci-config_ps-publisher
Removing service rms-ci-config_api-read
Removing service rms-ci-config_fscmproj-create
Removing service rms-ci-config_project-ethics-dates
Removing service rms-ci-config_cdbex-other
Removing service rms-ci-config_pcteam-create
Removing service rms-ci-config_stream-cdo
Removing service rms-ci-config_stream-ccg
Removing service rms-ci-config_schema-registry
Removing service rms-ci-config_psspeedtype-create
Removing service rms-ci-config_ps-collector
Removing service rms-ci-config_pcact-read
Removing service rms-ci-config_ucpcbolton-read
Removing service rms-ci-config_fscmproj-align-stat
Removing service rms-ci-config_stream-cdm
Removing service rms-ci-config_ftrigger-kafka
Removing service rms-ci-config_gateway
Removing service rms-ci-config_converis-dis
Removing service rms-ci-config_fscmproj-link
Removing service rms-ci-config_kafka
Removing service rms-ci-config_resolver
Removing service rms-ci-config_pstree-create-leaf
Removing service rms-ci-config_converis
Removing service rms-ci-config_cevents
Removing service rms-ci-config_runner
Removing service rms-ci-config_prometheus
Removing service rms-ci-config_zookeeper
Removing service rms-ci-config_converis-chemistry
Removing service rms-ci-config_ucpcbolton-create
Removing service rms-ci-config_pcteam-read-manager
Removing service rms-ci-config_pcproj-read
Removing service rms-ci-config_pcact-create
Removing service rms-ci-config_kafka-rest
Removing service rms-ci-config_converis-db
Removing service rms-ci-config_psspeedchart-create
Removing service rms-ci-config_cdbex-data
Removing network rms-ci-config_adjunct
Removing network rms-ci-config_actions
Removing network rms-ci-config_functions
Removing network rms-ci-config_converis
Removing network rms-ci-config_perimeter
Removing network rms-ci-config_streaming

Lingering networks on some nodes, 5 minutes after the stack was removed:

[kchuang@itrmsdev03 ~]$ docker network ls | grep rms
unbugkx6kcyg        rms-ci-config_streaming         overlay             swarm

[kchuang@itrmsdev04 ~]$ docker network ls | grep rms
ao9j9z9ciqq4        rms-ci-config_functions         overlay             swarm

[kchuang@itrmsdev08 ~]$ docker network ls | grep rms
gzpkv924rkib        rms-ci-config_converis          overlay             swarm
o9krirtdzka6        rms-ci-config_perimeter         overlay             swarm

[kchuang@itrmsdev10 ~]$ docker network ls | grep rms
ao9j9z9ciqq4        rms-ci-config_functions         overlay             swarm

The other nodes don't have any leftovers.

Redeploying rms-ci-config now shows “unable to allocate” errors (trimmed to a few examples):

(rmsdev) →  ~ docker stack ps rms-ci-config
ID                  NAME                                   IMAGE                                                 NODE                     DESIRED STATE       CURRENT STATE                ERROR                              PORTS
snnz1ddtwhlb        rms-ci-config_pcproj-read-status.1     docker.ucalgary.ca/rms/functions/fscm:17.10.0         itrmsdev03.ucalgary.ca   Shutdown            Rejected 2 minutes ago       "failed to allocate gateway (1…"   
z9lwxla3kauk        rms-ci-config_gateway.1                kinghuang/gateway:function-label-constraints          itrmsdev03.ucalgary.ca   Shutdown            Rejected 2 minutes ago       "failed to allocate gateway (1…"   
val0b2s9lfk8        rms-ci-config_ps-publisher.1           ucalgary/ps-stream:latest                             itrmsdev04.ucalgary.ca   Shutdown            Rejected 2 minutes ago       "failed to allocate gateway (1…"   
ojfrfrpk92h6        rms-ci-config_pcact-create.1           docker.ucalgary.ca/rms/functions/fscm:17.10.0         itrmsdev10.ucalgary.ca   Shutdown            Failed 2 minutes ago         "starting container failed: In…"   

Digging into task snnz1ddtwhlb, it reports "Err": "failed to allocate gateway (10.0.21.1): Address already in use", which corresponds to the following entry under NetworksAttachments.

{
	"Network": {
		"ID": "jenfq66g9r5jixysiypq473l5",
		"Version": {
			"Index": 686216
		},
		"CreatedAt": "2017-10-15T17:05:15.736500245Z",
		"UpdatedAt": "2017-10-15T17:05:15.865003454Z",
		"Spec": {
			"Name": "rms-ci-config_functions",
			"Labels": {
				"com.docker.stack.namespace": "rms-ci-config"
			},
			"DriverConfiguration": {
				"Name": "overlay"
			},
			"Scope": "swarm"
		},
		"DriverState": {
			"Name": "overlay",
			"Options": {
				"com.docker.network.driver.overlay.vxlanid_list": "4118"
			}
		},
		"IPAMOptions": {
			"Driver": {
				"Name": "default"
			},
			"Configs": [
				{
					"Subnet": "10.0.21.0/24",
					"Gateway": "10.0.21.1"
				}
			]
		}
	},
	"Addresses": [
		"10.0.21.30/24"
	]
}

If I remove the stack, manually remove any leftover networks on the nodes, and redeploy, then the deployment will succeed without any “unable to allocate” errors.

I'm really confused as to why this is happening.

@kinghuang
Author

I've ended up adding the following to all my CI deploy jobs after `docker stack rm $STACK_NAME`. It loops through all the nodes and removes any leftover networks carrying the stack's namespace label.

DOCKER_HOST_CI=$DOCKER_HOST
# Visit each node directly and remove any networks still labelled with this
# stack's namespace; errors (e.g. no matching networks on a node) are ignored.
for node in $(docker node ls --format '{{ .Hostname }}'); do
  export DOCKER_HOST=$node:2376
  docker network rm $(docker network ls --filter label=com.docker.stack.namespace=$STACK_NAME -q) 2> /dev/null || true
done
export DOCKER_HOST=$DOCKER_HOST_CI

I'd really like to find out why networks are left lingering. But this avoids the issue in the meantime so that CI deployments work reliably.

One other thing I've noticed: the larger the stack, the more likely it is that networks get left behind. The smallest stack has 3 services and never seems to run into this; the biggest stack has 43 services and runs into this every single time.

@Vacant0mens

Is there a way to exclude a specific subnet or address range? (I know there's a way to configure a new subnet explicitly, but that's not what I'm asking about.)

I'm seeing this problem on 17.06-ee as well, using stacks with an automatically-configured stack network.

Why doesn't it exclude the host's addresses/subnets from the automatic subnet allocation pool? Is this by design, or has it just not been added yet?

@kinghuang
Author

I've just completed a clean install of Docker 17.09.1 on the same 10-node cluster. The problem persists: after removing a stack, some of the stack's networks linger on the nodes, which causes anomalies on subsequent deploys of the same stack.

For example, the stack rms-master was deployed once and then removed. The removal left behind a rms-master_functions network (cvookqk1py4d) on node 02. When the stack was deployed again, a second network named rms-master_functions (51m12clr4g1z) was created.

-bash-4.2$ docker network ls
NETWORK ID          NAME                   DRIVER              SCOPE
1a4a951b6013        bridge                 bridge              local
b4bcbc4109bd        docker_gwbridge        bridge              local
fb3b61bf25c4        host                   host                local
st2gsav5p3t8        ingress                overlay             swarm
96b46da1f1a4        none                   null                local
t7rim5msj8nk        rms-master_actions     overlay             swarm
ohk9eqhpnm8s        rms-master_adjunct     overlay             swarm
lb9yhs0643xu        rms-master_converis    overlay             swarm
cvookqk1py4d        rms-master_functions   overlay             swarm
51m12clr4g1z        rms-master_functions   overlay             swarm
eog5seqb6tay        rms-master_perimeter   overlay             swarm
0xqp1roc60e1        rms-master_streaming   overlay             swarm
16ovr8kbq147        traefik                overlay             swarm
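
Duplicate entries like the two rms-master_functions rows above can be spotted mechanically. A sketch (`dup_network_names` is a hypothetical helper, reading captured `docker network ls` table output on stdin):

```shell
# Print any network NAME that appears more than once in `docker network ls`
# output. Skips the header row; assumes the default columns ID NAME DRIVER SCOPE.
dup_network_names() {
  awk 'NR > 1 { print $2 }' | sort | uniq -d
}
```

Running `docker network ls | dup_network_names` on a node right after a redeploy would flag any stack networks that now exist twice under different IDs.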

Services that use the rms-master_functions network won't start on the second deployment, as before. The task error says “failed to allocate gateway (10.0.4.1): Address already in use”. Here are the details of one of the tasks.

(rmsdev) →  ~ docker inspect ys84bd776ftd
[
    {
        "ID": "ys84bd776ftdiq189w88y2g6o",
        "Version": {
            "Index": 1957
        },
        "CreatedAt": "2017-12-23T03:41:04.39480548Z",
        "UpdatedAt": "2017-12-23T03:41:04.863693152Z",
        "Labels": {},
        "Spec": {
            "ContainerSpec": {
                "Image": "docker.ucalgary.ca/rms/functions/fscm:master@sha256:b3d2d46b90da38d45bed329ec5123eea9b7958668086ec8c72ac7f77d3f90f09",
                "Labels": {
                    "com.docker.stack.namespace": "rms-master",
                    "function": "true"
                },
                "Env": [
                    "fprocess=pcteam read-manager"
                ],
                "Privileges": {
                    "CredentialSpec": null,
                    "SELinuxContext": null
                },
                "Secrets": [
                    {
                        "File": {
                            "Name": "converis-api-password",
                            "UID": "0",
                            "GID": "0",
                            "Mode": 292
                        },
                        "SecretID": "9aqy6beg7iud0q3rpq8oo55f3",
                        "SecretName": "converis-api-password"
                    },
                    {
                        "File": {
                            "Name": "converis-api-username",
                            "UID": "0",
                            "GID": "0",
                            "Mode": 292
                        },
                        "SecretID": "i4dk7eatl9p575v7d4svrwv85",
                        "SecretName": "converis-api-username"
                    }
                ]
            },
            "Resources": {},
            "Placement": {
                "Platforms": [
                    {
                        "Architecture": "amd64",
                        "OS": "linux"
                    }
                ]
            },
            "Networks": [
                {
                    "Target": "ohk9eqhpnm8swtm267en3392t",
                    "Aliases": [
                        "pcteam-read-manager"
                    ]
                },
                {
                    "Target": "51m12clr4g1zlxr2qnlll9vdz",
                    "Aliases": [
                        "pcteam-read-manager"
                    ]
                }
            ],
            "ForceUpdate": 0
        },
        "ServiceID": "0kfalrblv3jek037ajjc807o2",
        "Slot": 1,
        "NodeID": "yilhvkyfwph36d7ld8mqy3god",
        "Status": {
            "Timestamp": "2017-12-23T03:41:04.585391076Z",
            "State": "rejected",
            "Message": "preparing",
            "Err": "failed to allocate gateway (10.0.4.1): Address already in use",
            "ContainerStatus": {},
            "PortStatus": {}
        },
        "DesiredState": "shutdown",
        "NetworksAttachments": [
            {
                "Network": {
                    "ID": "ohk9eqhpnm8swtm267en3392t",
                    "Version": {
                        "Index": 1753
                    },
                    "CreatedAt": "2017-12-23T03:40:47.642481435Z",
                    "UpdatedAt": "2017-12-23T03:40:47.648152915Z",
                    "Spec": {
                        "Name": "rms-master_adjunct",
                        "Labels": {
                            "com.docker.stack.namespace": "rms-master"
                        },
                        "DriverConfiguration": {
                            "Name": "overlay"
                        },
                        "Scope": "swarm"
                    },
                    "DriverState": {
                        "Name": "overlay",
                        "Options": {
                            "com.docker.network.driver.overlay.vxlanid_list": "4101"
                        }
                    },
                    "IPAMOptions": {
                        "Driver": {
                            "Name": "default"
                        },
                        "Configs": [
                            {
                                "Subnet": "10.0.4.0/24",
                                "Gateway": "10.0.4.1"
                            }
                        ]
                    }
                },
                "Addresses": [
                    "10.0.4.32/24"
                ]
            },
            {
                "Network": {
                    "ID": "51m12clr4g1zlxr2qnlll9vdz",
                    "Version": {
                        "Index": 1755
                    },
                    "CreatedAt": "2017-12-23T03:40:47.652693582Z",
                    "UpdatedAt": "2017-12-23T03:40:47.65853957Z",
                    "Spec": {
                        "Name": "rms-master_functions",
                        "Labels": {
                            "com.docker.stack.namespace": "rms-master"
                        },
                        "DriverConfiguration": {
                            "Name": "overlay"
                        },
                        "Scope": "swarm"
                    },
                    "DriverState": {
                        "Name": "overlay",
                        "Options": {
                            "com.docker.network.driver.overlay.vxlanid_list": "4102"
                        }
                    },
                    "IPAMOptions": {
                        "Driver": {
                            "Name": "default"
                        },
                        "Configs": [
                            {
                                "Subnet": "10.0.5.0/24",
                                "Gateway": "10.0.5.1"
                            }
                        ]
                    }
                },
                "Addresses": [
                    "10.0.5.29/24"
                ]
            }
        ]
    }
]

Here are the details of the two copies of the rms-master_functions network.

-bash-4.2$ docker network inspect cvookqk1py4d
[
    {
        "Name": "rms-master_functions",
        "Id": "cvookqk1py4dn32sext9xobex",
        "Created": "2017-12-22T20:22:14.962488619-07:00",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.5.0/24",
                    "Gateway": "10.0.5.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "23fd441f231a5a3faaa1c36266d4c261763bc85f3db0c159d3485bf517b741ac": {
                "Name": "rms-master_pcproj-create.1.6ja8jhcu1ojwcwflpizpjcxzj",
                "EndpointID": "d05a170b9ec88498fb6c35b0125761dbaf8d0841d070e11881127a19c252a92c",
                "MacAddress": "02:42:0a:00:05:1f",
                "IPv4Address": "10.0.5.31/24",
                "IPv6Address": ""
            },
            "a9b655022426553b3436d1aff932668eff6770288dfb3c9f177832b38a86dc39": {
                "Name": "rms-master_fscmproj-link.1.hmaf4stgzgzbiekzsj4t30fcb",
                "EndpointID": "923d843018873d850df1190a67e9f2bc4c111242e520649646b2ffe219c825bf",
                "MacAddress": "02:42:0a:00:05:27",
                "IPv4Address": "10.0.5.39/24",
                "IPv6Address": ""
            },
            "b7b6260fb5995c3a38f571f16405ba4e2eedd7dab9cf81ad9fe1e68b1ace9d44": {
                "Name": "rms-master_ucpcbolton-create.1.mb4v0oyeq8ym0r3vw68ov6csc",
                "EndpointID": "312719e543e8aa8446fc851dcb97e79e7d9bc6fb871d38e8afdaac3722209dc7",
                "MacAddress": "02:42:0a:00:05:15",
                "IPv4Address": "10.0.5.21/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4102"
        },
        "Labels": {
            "com.docker.stack.namespace": "rms-master"
        },
        "Peers": [
            {
                "Name": "itrmsdev02.ucalgary.ca-3b781904ab4c",
                "IP": "10.41.149.139"
            }
        ]
    }
]
-bash-4.2$ docker network inspect 51m12clr4g1z
[
    {
        "Name": "rms-master_functions",
        "Id": "51m12clr4g1zlxr2qnlll9vdz",
        "Created": "0001-01-01T00:00:00Z",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "",
            "Options": null,
            "Config": null
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": null,
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4102"
        },
        "Labels": {
            "com.docker.stack.namespace": "rms-master"
        }
    }
]

As before, visiting each node after removing a stack and removing dangling networks by hand seems to avoid errors on subsequent deployments.

@kinghuang
Author

I've tried a clean install of Docker CE 17.12.0-rc4 on the 10-node swarm and still get the same behaviour: removing stacks sometimes leaves behind stray networks.

I'm really not sure how to debug this. I'll be in San Francisco this Monday to Thursday (Dec 25 to 28), in case anyone from Docker wants to look at this interactively.

@mavenugo
Copy link
Contributor

mavenugo commented Dec 26, 2017

@kinghuang yes, these dangling dynamic networks are the cause of the issue. To understand why the networks are left uncleaned, can you please enable daemon debug logs and capture the state when a network is left behind? To do that, clean up the dangling networks, enable debug on the nodes where the issue is seen, and then try to reproduce the unclean network state.

Please share the debug logs once you are able to reproduce the issue, so we can find the root cause.
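
For reference, the debug setting being requested is the standard daemon configuration option, set in /etc/docker/daemon.json on a packaged install (a config fragment; the daemon must be restarted, or sent SIGHUP, to pick it up):

```json
{
  "debug": true
}
```

With this in place, dockerd writes debug-level output to its usual log destination (journald under systemd, readable via `journalctl -u docker.service`).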

@kinghuang
Author

kinghuang commented Dec 28, 2017

@mavenugo I cleared all deployed stacks (including dangling networks) on the swarm, set "debug": true in daemon.json, and restarted all ten VMs (itrmsdev01 to 10). Then I deployed one stack (rms-master), waited 10 minutes for all the services to start, and removed it with `docker stack rm rms-master`. This time, only rms-master_streaming was left behind, on itrmsdev03.

Engine and swarm info (from itrmsdev01):

(rmsdev) →  ~ docker info
Containers: 18
 Running: 2
 Paused: 0
 Stopped: 16
Images: 18
Server Version: 17.09.1-ce
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: gelf
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: 4z1lg52f0gbf00jpbjmkwrae8
 Is Manager: true
 ClusterID: bp06yifdgqdsesyq53s82012t
 Managers: 3
 Nodes: 10
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 10.41.149.137
 Manager Addresses:
  10.41.149.137:2377
  10.41.149.138:2377
  10.41.149.139:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 06b9cb35161009dcb7123345749fef02f7cea8e0
runc version: 3f2f8b84a77f73d38244dd690525642a72156c64
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-693.11.1.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.4 (Maipo)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 31.26GiB
Name: itrmsdev01.ucalgary.ca
ID: MYIJ:6PFS:CZR5:QGU6:2ZBA:S3T7:3ER5:3QRM:5QYB:ZM22:SH6N:S2JP
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 61
 Goroutines: 197
 System Time: 2017-12-27T17:24:52.739398103-07:00
 EventsListeners: 2
Registry: https://index.docker.io/v1/
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

(rmsdev) →  ~ docker node ls
ID                            HOSTNAME                 STATUS              AVAILABILITY        MANAGER STATUS
4z1lg52f0gbf00jpbjmkwrae8 *   itrmsdev01.ucalgary.ca   Ready               Active              Reachable
qint6611qscpfl1ld8yf4hz0d     itrmsdev02.ucalgary.ca   Ready               Active              Leader
as4y36wpcr7x79d4zb1temofv     itrmsdev03.ucalgary.ca   Ready               Active              Reachable
wvqix1z8axdblnka5k20yyq40     itrmsdev04.ucalgary.ca   Ready               Active              
awzmjucpnfjv0hbq9f9a0zbwl     itrmsdev05.ucalgary.ca   Ready               Active              
k54wv6eh92z3jf4wblndwpafr     itrmsdev06.ucalgary.ca   Ready               Active              
uyhtr0pk2igblc9tiis335o6g     itrmsdev07.ucalgary.ca   Ready               Active              
ljx2d4bembio74ivhdb6vx91t     itrmsdev08.ucalgary.ca   Ready               Active              
xn32pr7ud21zlqf2y917rq00t     itrmsdev09.ucalgary.ca   Ready               Active              
wv8myvr47guoajb2qo0yy7t2g     itrmsdev10.ucalgary.ca   Ready               Active
(rmsdev) →  ~ docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE                                 PORTS
5033xwo7u8dt        gitlab-runner       global              10/10               gitlab/gitlab-runner:alpine-v10.2.0   
0chcuz046apu        traefik             global              3/3                 traefik:1.4.4-alpine                  
(rmsdev) →  ~ docker stack ls
NAME                SERVICES

On itrmsdev01, 10 minutes after stack deployed:

(rmsdev) →  ~ docker stack ps rms-master
ID                  NAME                                IMAGE                                                   NODE                     DESIRED STATE       CURRENT STATE               ERROR                       PORTS
rw24ho1acw9c        rms-master_runner.1                 docker.ucalgary.ca/rms/srv/engine:master                itrmsdev04.ucalgary.ca   Ready               Ready about a minute ago                                
9l6kiv47i6x3        rms-master_engine.1                 docker.ucalgary.ca/rms/srv/engine:master                itrmsdev07.ucalgary.ca   Ready               Ready about a minute ago                                
g9rho03a7pac        rms-master_resolver.1               docker.ucalgary.ca/rms/srv/engine:master                itrmsdev09.ucalgary.ca   Ready               Ready about a minute ago                                
ugzg34l59tbb        rms-master_app.1                    docker.ucalgary.ca/rms/srv/app:master                   itrmsdev08.ucalgary.ca   Running             Running 39 seconds ago                                  
sanrs6q5oper        rms-master_runner.1                 docker.ucalgary.ca/rms/srv/engine:master                itrmsdev01.ucalgary.ca   Shutdown            Failed about a minute ago   "task: non-zero exit (1)"   
vzu03pldiqm1        rms-master_engine.1                 docker.ucalgary.ca/rms/srv/engine:master                itrmsdev05.ucalgary.ca   Shutdown            Failed about a minute ago   "task: non-zero exit (1)"   
lswzdg61m2cv        rms-master_resolver.1               docker.ucalgary.ca/rms/srv/engine:master                itrmsdev03.ucalgary.ca   Shutdown            Failed about a minute ago   "task: non-zero exit (1)"   
tskr592fl1b6        rms-master_app.1                    docker.ucalgary.ca/rms/srv/app:master                   itrmsdev02.ucalgary.ca   Shutdown            Failed 2 minutes ago        "task: non-zero exit (1)"   
k2kj0y26bh3b        rms-master_engine.1                 docker.ucalgary.ca/rms/srv/engine:master                itrmsdev08.ucalgary.ca   Shutdown            Failed 3 minutes ago        "task: non-zero exit (1)"   
t1ixskgzmokp        rms-master_runner.1                 docker.ucalgary.ca/rms/srv/engine:master                itrmsdev01.ucalgary.ca   Shutdown            Failed 3 minutes ago        "task: non-zero exit (1)"   
p1haq32i1rmk        rms-master_resolver.1               docker.ucalgary.ca/rms/srv/engine:master                itrmsdev05.ucalgary.ca   Shutdown            Failed 3 minutes ago        "task: non-zero exit (1)"   
115dmnk4qi0s        rms-master_app.1                    docker.ucalgary.ca/rms/srv/app:master                   itrmsdev03.ucalgary.ca   Shutdown            Failed 4 minutes ago        "task: non-zero exit (1)"   
6rg4x9i01qn0        rms-master_engine.1                 docker.ucalgary.ca/rms/srv/engine:master                itrmsdev07.ucalgary.ca   Shutdown            Failed 5 minutes ago        "task: non-zero exit (1)"   
4ueim5hb88qr        rms-master_resolver.1               docker.ucalgary.ca/rms/srv/engine:master                itrmsdev08.ucalgary.ca   Shutdown            Failed 6 minutes ago        "task: non-zero exit (1)"   
gc90tkcm17jc        rms-master_runner.1                 docker.ucalgary.ca/rms/srv/engine:master                itrmsdev01.ucalgary.ca   Shutdown            Failed 5 minutes ago        "task: non-zero exit (1)"   
o3231hpa1r38        rms-master_app.1                    docker.ucalgary.ca/rms/srv/app:master                   itrmsdev02.ucalgary.ca   Shutdown            Failed 6 minutes ago        "task: non-zero exit (1)"   
u4cg4d6odt6k        rms-master_runner.1                 docker.ucalgary.ca/rms/srv/engine:master                itrmsdev10.ucalgary.ca   Shutdown            Failed 8 minutes ago        "task: non-zero exit (1)"   
9w9czb3hym23        rms-master_pcact-read.1             docker.ucalgary.ca/rms/functions/fscm:master            itrmsdev07.ucalgary.ca   Running             Running 9 minutes ago                                   
l7a1y2j6din7        rms-master_converis.1               docker.ucalgary.ca/tr/converis/app:5.10.5               itrmsdev06.ucalgary.ca   Running             Running 9 minutes ago                                   
ii8cmtuoy0oj        rms-master_kafka.1                  confluentinc/cp-kafka:3.2.1                             itrmsdev09.ucalgary.ca   Running             Running 9 minutes ago                                   
p3dlkurzczei        rms-master_app.1                    docker.ucalgary.ca/rms/srv/app:master                   itrmsdev03.ucalgary.ca   Shutdown            Failed 8 minutes ago        "task: non-zero exit (1)"   
pqf2riv1c1m4        rms-master_engine.1                 docker.ucalgary.ca/rms/srv/engine:master                itrmsdev09.ucalgary.ca   Shutdown            Failed 8 minutes ago        "task: non-zero exit (1)"   
7j2kjpxcivrg        rms-master_pcact-create.1           docker.ucalgary.ca/rms/functions/fscm:master            itrmsdev02.ucalgary.ca   Running             Running 9 minutes ago                                   
i7fklbj54rp3        rms-master_converis-dis.1           docker.ucalgary.ca/rms/converis-dis-configs:master      itrmsdev05.ucalgary.ca   Running             Running 9 minutes ago                                   
gvu4qmg3myqo        rms-master_pstree-create-leaf.1     docker.ucalgary.ca/rms/functions/fscm:master            itrmsdev04.ucalgary.ca   Running             Running 9 minutes ago                                   
5mcvaxh4bquq        rms-master_schema-registry.1        confluentinc/cp-schema-registry:3.2.1                   itrmsdev07.ucalgary.ca   Running             Running 9 minutes ago                                   
mba844wgjs0j        rms-master_pcteam-read-manager.1    docker.ucalgary.ca/rms/functions/fscm:master            itrmsdev03.ucalgary.ca   Running             Running 9 minutes ago                                   
g7xamra7llgm        rms-master_pcproj-read.1            docker.ucalgary.ca/rms/functions/fscm:master            itrmsdev08.ucalgary.ca   Running             Running 9 minutes ago                                   
2l3kfq6n1qly        rms-master_zookeeper.1              confluentinc/cp-zookeeper:3.2.1                         itrmsdev09.ucalgary.ca   Running             Running 9 minutes ago                                   
izesi7fyocmd        rms-master_project-ethics-dates.1   docker.ucalgary.ca/rms/functions/fscm:master            itrmsdev06.ucalgary.ca   Running             Running 9 minutes ago                                   
r76gpxdcu7xj        rms-master_fscmproj-link.1          docker.ucalgary.ca/rms/functions/fscm:master            itrmsdev10.ucalgary.ca   Running             Running 9 minutes ago                                   
4pa8ondmuegy        rms-master_fscmproj-align-stat.1    docker.ucalgary.ca/rms/functions/fscm:master            itrmsdev01.ucalgary.ca   Running             Running 9 minutes ago                                   
vrv3wqccc0dr        rms-master_cbex-actionlog.1         ucalgary/bottledwater:latest                            itrmsdev09.ucalgary.ca   Running             Running 9 minutes ago                                   
ktg11qcsm0jt        rms-master_converis-web.1           docker.ucalgary.ca/tr/converis-web:1.1.1                itrmsdev04.ucalgary.ca   Running             Running 9 minutes ago                                   
44r7z7pn3yzi        rms-master_psspeedchart-create.1    docker.ucalgary.ca/rms/functions/fscm:master            itrmsdev10.ucalgary.ca   Running             Running 9 minutes ago                                   
bsqqtm6v9gbp        rms-master_pcproj-create.1          docker.ucalgary.ca/rms/functions/fscm:master            itrmsdev02.ucalgary.ca   Running             Running 9 minutes ago                                   
o6rl1oyagng6        rms-master_fscmproj-create.1        docker.ucalgary.ca/rms/functions/fscm:master            itrmsdev08.ucalgary.ca   Running             Running 9 minutes ago                                   
sy5j41okeydj        rms-master_stream-cdm.1             docker.ucalgary.ca/rms/streams/converis-stream:master   itrmsdev07.ucalgary.ca   Running             Running 9 minutes ago                                   
vpnm4uoabs14        rms-master_pcproj-read-status.1     docker.ucalgary.ca/rms/functions/fscm:master            itrmsdev06.ucalgary.ca   Running             Running 9 minutes ago                                   
p84ml6e3nkzx        rms-master_cevents.1                docker.ucalgary.ca/rms/streams/rms-events:master        itrmsdev05.ucalgary.ca   Running             Running 9 minutes ago                                   
9tl1wrn9pw7q        rms-master_pcteam-create.1          docker.ucalgary.ca/rms/functions/fscm:master            itrmsdev03.ucalgary.ca   Running             Running 9 minutes ago                                   
3afjnyr57i7b        rms-master_stream-ccg.1             docker.ucalgary.ca/rms/streams/converis-stream:master   itrmsdev08.ucalgary.ca   Running             Running 9 minutes ago                                   
sap86vve2esi        rms-master_ftrigger-kafka.1         ucalgary/ftrigger:latest                                itrmsdev01.ucalgary.ca   Running             Running 9 minutes ago                                   
bkeug6ufa3xe        rms-master_gateway.1                kinghuang/gateway:function-label-constraints            itrmsdev02.ucalgary.ca   Running             Running 9 minutes ago                                   
zik6r8jinhga        rms-master_ps-collector.1           ucalgary/ps-stream:latest                               itrmsdev01.ucalgary.ca   Running             Running 9 minutes ago                                   
ivywqwmbvryx        rms-master_ps-publisher.1           ucalgary/ps-stream:latest                               itrmsdev04.ucalgary.ca   Running             Running 9 minutes ago                                   
i8a6ddr11pzf        rms-master_api-read.1               docker.ucalgary.ca/rms/interfaces/api-read:master       itrmsdev05.ucalgary.ca   Running             Running 9 minutes ago                                   
p1ty08gl3bnv        rms-master_converis-chemistry.1     docker.ucalgary.ca/tr/converis/dms:0.13.0-5.10.5        itrmsdev06.ucalgary.ca   Running             Running 9 minutes ago                                   
y9kck355u37j        rms-master_stream-cdo.1             docker.ucalgary.ca/rms/streams/converis-stream:master   itrmsdev10.ucalgary.ca   Running             Running 9 minutes ago                                   
wbpenhkwrwzc        rms-master_resolver.1               docker.ucalgary.ca/rms/srv/engine:master                itrmsdev07.ucalgary.ca   Shutdown            Failed 8 minutes ago        "task: non-zero exit (1)"   
4dxjfy1bm7sw        rms-master_ucpcbolton-create.1      docker.ucalgary.ca/rms/functions/fscm:master            itrmsdev09.ucalgary.ca   Running             Running 9 minutes ago                                   
o735y8fih7nl        rms-master_runner.1                 docker.ucalgary.ca/rms/srv/engine:master                itrmsdev09.ucalgary.ca   Shutdown            Shutdown 9 minutes ago                                  
971z9zwtpdl2        rms-master_cdbex-other.1            ucalgary/bottledwater:latest                            itrmsdev05.ucalgary.ca   Running             Running 9 minutes ago                                   
loh6pdbug1cw        rms-master_cdbex-data.1             ucalgary/bottledwater:latest                            itrmsdev04.ucalgary.ca   Running             Running 9 minutes ago                                   
vcx4enk4u6qd        rms-master_converis-search.1        docker.ucalgary.ca/tr/converis/search:2.8.2             itrmsdev10.ucalgary.ca   Running             Running 9 minutes ago                                   
zcr538mfr1tt        rms-master_ucpcbolton-read.1        docker.ucalgary.ca/rms/functions/fscm:master            itrmsdev07.ucalgary.ca   Running             Running 9 minutes ago                                   
v0he253d9llp        rms-master_prometheus.1             functions/prometheus:latest                             itrmsdev03.ucalgary.ca   Running             Running 9 minutes ago                                   
i17dkrjcwcuq        rms-master_converis-db.1            docker.ucalgary.ca/rms/converis-db-configs:master       itrmsdev08.ucalgary.ca   Running             Running 9 minutes ago                                   
rdmqy7eo5txd        rms-master_psspeedtype-create.1     docker.ucalgary.ca/rms/functions/fscm:master            itrmsdev06.ucalgary.ca   Running             Running 9 minutes ago   
(rmsdev) →  ~ docker stack rm rms-master
Removing service rms-master_api-read
Removing service rms-master_cbex-actionlog
Removing service rms-master_cdbex-data
Removing service rms-master_cdbex-other
Removing service rms-master_cevents
Removing service rms-master_converis
Removing service rms-master_converis-chemistry
Removing service rms-master_converis-db
Removing service rms-master_converis-dis
Removing service rms-master_converis-search
Removing service rms-master_converis-web
Removing service rms-master_fscmproj-align-stat
Removing service rms-master_fscmproj-create
Removing service rms-master_fscmproj-link
Removing service rms-master_ftrigger-kafka
Removing service rms-master_gateway
Removing service rms-master_kafka
Removing service rms-master_kafka-rest
Removing service rms-master_pcact-create
Removing service rms-master_pcact-read
Removing service rms-master_pcproj-create
Removing service rms-master_pcproj-read
Removing service rms-master_pcproj-read-status
Removing service rms-master_pcteam-create
Removing service rms-master_pcteam-read-manager
Removing service rms-master_project-ethics-dates
Removing service rms-master_prometheus
Removing service rms-master_ps-collector
Removing service rms-master_ps-publisher
Removing service rms-master_psspeedchart-create
Removing service rms-master_psspeedtype-create
Removing service rms-master_pstree-create-leaf
Removing service rms-master_schema-registry
Removing service rms-master_stream-ccg
Removing service rms-master_stream-cdm
Removing service rms-master_stream-cdo
Removing service rms-master_ucpcbolton-create
Removing service rms-master_ucpcbolton-read
Removing service rms-master_app
Removing service rms-master_engine
Removing service rms-master_resolver
Removing service rms-master_runner
Removing service rms-master_zookeeper
Removing network rms-master_adjunct
Removing network rms-master_functions
Removing network rms-master_streaming
Removing network rms-master_actions
Removing network rms-master_perimeter
Removing network rms-master_converis

On itrmsdev03, after stack removed:

-bash-4.2$ docker network ls
NETWORK ID          NAME                   DRIVER              SCOPE
6c667f53ecf0        bridge                 bridge              local
f75b2c153ae8        docker_gwbridge        bridge              local
b6b296956cf3        host                   host                local
jxz5om1adoly        ingress                overlay             swarm
b0864e4cf4ad        none                   null                local
dg0x2qpdxvem        rms-master_streaming   overlay             swarm
mktl71vfk2d6        traefik                overlay             swarm
-bash-4.2$ docker network inspect -v dg0x2qpdxvem
[
    {
        "Name": "rms-master_streaming",
        "Id": "dg0x2qpdxvemk63g2gpba7yrt",
        "Created": "2017-12-27T16:55:25.683341305-07:00",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.3.0/24",
                    "Gateway": "10.0.3.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {},
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4100"
        },
        "Labels": {
            "com.docker.stack.namespace": "rms-master"
        },
        "Peers": [
            {
                "Name": "itrmsdev03.ucalgary.ca-c7ea027edd7e",
                "IP": "10.41.149.138"
            }
        ]
    }
]

Attached are the daemon logs from itrmsdev01 (where the stack commands were issued) and itrmsdev03 (the only node with a dangling network after stack rm in this example). The stack was deployed around 16:55 and removed around 17:05. Stack file (minus a few passwords) also attached.

docker-itrmsdev01.ucalgary.ca.log
docker-itrmsdev03.ucalgary.ca.log
rms-master.txt

I can repeat this again, if that helps. I also have the daemon logs from the other 8 nodes available.
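
To spot such leftovers without checking each node by hand, the `docker network ls` output can be filtered for swarm-scope networks that still carry the stack's name prefix (a sketch; `find_stack_networks` is a hypothetical helper, not a Docker command):

```shell
# Hypothetical helper: read `docker network ls` output on stdin and print
# the IDs of swarm-scope networks whose name starts with "<stack>_".
# Columns are: NETWORK ID ($1), NAME ($2), DRIVER ($3), SCOPE ($4).
find_stack_networks() {
  awk -v stack="$1" '$2 ~ ("^" stack "_") && $4 == "swarm" { print $1 }'
}

# Usage on a node (prints nothing when no leftovers remain):
#   docker network ls | find_stack_networks rms-master
```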

@kinghuang

I repeated it again. Cleaned and restarted all the nodes, then deployed and removed the same stack. This time, there are two leftover networks: rms-master_streaming on itrmsdev03 and rms-master_actions on itrmsdev04.

itrmsdev03:

-bash-4.2$ docker network ls
NETWORK ID          NAME                   DRIVER              SCOPE
4b22bcc21529        bridge                 bridge              local
f75b2c153ae8        docker_gwbridge        bridge              local
b6b296956cf3        host                   host                local
jxz5om1adoly        ingress                overlay             swarm
b0864e4cf4ad        none                   null                local
izt2vzy1pta8        rms-master_streaming   overlay             swarm
mktl71vfk2d6        traefik                overlay             swarm
-bash-4.2$ docker network inspect -v izt2vzy1pta8
[
    {
        "Name": "rms-master_streaming",
        "Id": "izt2vzy1pta8dxto1x4wrd74m",
        "Created": "2017-12-27T17:46:26.503620156-07:00",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.1.0/24",
                    "Gateway": "10.0.1.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {},
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4098"
        },
        "Labels": {
            "com.docker.stack.namespace": "rms-master"
        },
        "Peers": [
            {
                "Name": "itrmsdev03.ucalgary.ca-b8600eac8bc7",
                "IP": "10.41.149.138"
            }
        ]
    }
]

itrmsdev04:

[kchuang@itrmsdev04 ~]$ docker network ls
NETWORK ID          NAME                 DRIVER              SCOPE
8c475c15e0c1        bridge               bridge              local
59e24f9e9ddb        docker_gwbridge      bridge              local
a2c221f5e20c        host                 host                local
jxz5om1adoly        ingress              overlay             swarm
c7b55057c03f        none                 null                local
kh7zuz4e97cj        rms-master_actions   overlay             swarm
[kchuang@itrmsdev04 ~]$ docker network inspect -v kh7zuz4e97cj
[
    {
        "Name": "rms-master_actions",
        "Id": "kh7zuz4e97cjeygmurvgw5e0j",
        "Created": "2017-12-27T17:44:23.180828772-07:00",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.2.0/24",
                    "Gateway": "10.0.2.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": true,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {},
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4099"
        },
        "Labels": {
            "com.docker.stack.namespace": "rms-master"
        },
        "Peers": [
            {
                "Name": "itrmsdev04.ucalgary.ca-74efb3098426",
                "IP": "10.41.149.148"
            }
        ]
    }
]

Logs for nodes 01 (where stack commands were issued), 03, and 04 attached.

docker2-itrmsdev01.ucalgary.ca.log
docker2-itrmsdev03.ucalgary.ca.log
docker2-itrmsdev04.ucalgary.ca.log

@juliusakula

For me, I just needed to run `docker network prune`.

@taragurung

@juliusakula that cleaned up the networks but didn't solve the problem in my case.


caleblloyd commented Feb 26, 2018

I just hit this as well. For me, it turned out to be an overlay network that had been deleted from Docker Swarm but was still present on a swarm worker for some reason. It appears that networks are not always properly cleaned up on workers when `docker network rm` is run on a manager.

I am on 17.09.1-ce. Running `docker network prune` on the worker with the network that failed to clean up fixed the problem.
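
The per-node cleanup can be scripted; a sketch assuming passwordless SSH to each worker (`prune_nodes` is a hypothetical helper, and the node names below are examples from this thread):

```shell
# Hypothetical helper: run `docker network prune` on each named node.
# --force skips the interactive confirmation prompt.
prune_nodes() {
  for node in "$@"; do
    ssh "$node" docker network prune --force
  done
}

# Example:
#   prune_nodes itrmsdev03.ucalgary.ca itrmsdev04.ucalgary.ca
```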


raarts commented Mar 2, 2018

I'm seeing leftover networks on Docker 18.01.0-ce as well.
