Some veths created by Docker are not assigned a master bridge #26492

Closed

edevil opened this issue Sep 12, 2016 · 19 comments

@edevil commented Sep 12, 2016

Output of docker version:

Client:
 Version:      1.11.2
 API version:  1.23
 Go version:   go1.6.3
 Git commit:   82a3ad7
 Built:
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.2
 API version:  1.23
 Go version:   go1.6.3
 Git commit:   82a3ad7
 Built:
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 25
 Running: 13
 Paused: 0
 Stopped: 12
Images: 56
Server Version: 1.11.2
Storage Driver: overlay
 Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: null host bridge
Kernel Version: 4.7.1-coreos
Operating System: CoreOS 1153.3.0 (MoreOS)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 1.641 GiB
Name: node-0-vm
ID: WYRU:33T3:U4UF:3MY3:PAK6:NKYR:ZLYV:HHOP:RF7Z:UPW3:SJQP:6IRL
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/

Additional environment details (AWS, VirtualBox, physical, etc.):

Azure VM

Steps to reproduce the issue:

  1. Start containers

Describe the results you received:

Some veths don't get assigned a master bridge and so are unreachable:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:0d:3a:25:22:b8 brd ff:ff:ff:ff:ff:ff
3: cbr0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc htb state UP mode DEFAULT group default qlen 1000
    link/ether 12:ad:84:d9:76:4e brd ff:ff:ff:ff:ff:ff
5: veth1df4092@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default <----
    link/ether 62:69:01:c6:05:4e brd ff:ff:ff:ff:ff:ff
11: veth8ffd52d@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cbr0 state UP mode DEFAULT group default
    link/ether a2:26:fb:1c:ff:b5 brd ff:ff:ff:ff:ff:ff
37: veth1b79130@if36: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cbr0 state UP mode DEFAULT group default
    link/ether 7a:3b:72:4a:85:1c brd ff:ff:ff:ff:ff:ff
47: veth7c344d8@if46: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cbr0 state UP mode DEFAULT group default
    link/ether 5e:41:45:cc:89:b8 brd ff:ff:ff:ff:ff:ff
49: veth700962c@if48: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cbr0 state UP mode DEFAULT group default
    link/ether aa:4f:31:92:ad:a1 brd ff:ff:ff:ff:ff:ff
59: veth267617a@if58: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default <----
    link/ether 3a:1d:3f:1a:c0:04 brd ff:ff:ff:ff:ff:ff
75: veth4a9e42d@if74: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cbr0 state UP mode DEFAULT group default
    link/ether 12:ad:84:d9:76:4e brd ff:ff:ff:ff:ff:ff

Notice veth1df4092 and veth267617a.

Describe the results you expected:

All the veths should have the master bridge defined.

Additional information you deem important (e.g. issue happens only occasionally):

The issue happens occasionally; I only noticed it after the latest CoreOS beta update. Here are the logs of the affected veths:

Sep 09 08:58:28 node-0-vm kernel: cbr0: port 1(veth1df4092) entered blocking state
Sep 09 08:58:28 node-0-vm kernel: cbr0: port 1(veth1df4092) entered disabled state
Sep 09 08:58:28 node-0-vm kernel: device veth1df4092 entered promiscuous mode
Sep 09 08:58:28 node-0-vm systemd-udevd[1770]: Could not generate persistent MAC address for veth1df4092: No such file or directory
Sep 09 08:58:28 node-0-vm systemd-networkd[1177]: veth1df4092: IPv6 enabled for interface: Success
Sep 09 08:58:28 node-0-vm kernel: IPv6: ADDRCONF(NETDEV_UP): veth1df4092: link is not ready
Sep 09 08:58:28 node-0-vm kernel: device veth1df4092 left promiscuous mode
Sep 09 08:58:28 node-0-vm kernel: cbr0: port 1(veth1df4092) entered disabled state
Sep 09 08:58:28 node-0-vm systemd-networkd[1177]: veth1df4092: Link readded
Sep 09 08:58:28 node-0-vm kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth1df4092: link becomes ready
Sep 09 08:58:28 node-0-vm systemd-networkd[1177]: veth1df4092: Gained carrier
Sep 09 08:58:29 node-0-vm systemd-networkd[1177]: veth1df4092: Lost carrier
Sep 09 08:58:29 node-0-vm systemd-networkd[1177]: veth1df4092: Gained IPv6LL
Sep 09 08:58:29 node-0-vm systemd-networkd[1177]: veth1df4092: Gained carrier
Sep 09 08:58:42 node-0-vm systemd-networkd[1177]: veth1df4092: Configured
Sep 12 12:57:27 node-0-vm kernel: cbr0: port 4(veth267617a) entered blocking state
Sep 12 12:57:27 node-0-vm kernel: cbr0: port 4(veth267617a) entered disabled state
Sep 12 12:57:27 node-0-vm kernel: device veth267617a entered promiscuous mode
Sep 12 12:57:27 node-0-vm systemd-udevd[929]: Could not generate persistent MAC address for veth267617a: No such file or directory
Sep 12 12:57:27 node-0-vm kernel: IPv6: ADDRCONF(NETDEV_UP): veth267617a: link is not ready
Sep 12 12:57:27 node-0-vm systemd-networkd[1177]: veth267617a: IPv6 enabled for interface: Success
Sep 12 12:57:27 node-0-vm kernel: device veth267617a left promiscuous mode
Sep 12 12:57:27 node-0-vm kernel: cbr0: port 4(veth267617a) entered disabled state
Sep 12 12:57:27 node-0-vm systemd-networkd[1177]: veth267617a: Link readded
Sep 12 12:57:27 node-0-vm kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth267617a: link becomes ready
Sep 12 12:57:27 node-0-vm systemd-networkd[1177]: veth267617a: Gained carrier
Sep 12 12:57:27 node-0-vm systemd-networkd[1177]: veth267617a: Lost carrier
Sep 12 12:57:27 node-0-vm systemd-networkd[1177]: veth267617a: Gained carrier
Sep 12 12:57:29 node-0-vm systemd-networkd[1177]: veth267617a: Gained IPv6LL
Sep 12 12:57:41 node-0-vm systemd-networkd[1177]: veth267617a: Configured

Here are logs of correctly operating veths:

Sep 10 20:05:10 node-0-vm kernel: cbr0: port 2(veth1b79130) entered blocking state
Sep 10 20:05:10 node-0-vm kernel: cbr0: port 2(veth1b79130) entered disabled state
Sep 10 20:05:10 node-0-vm kernel: device veth1b79130 entered promiscuous mode
Sep 10 20:05:10 node-0-vm systemd-udevd[49019]: Could not generate persistent MAC address for veth1b79130: No such file or directory
Sep 10 20:05:10 node-0-vm kernel: IPv6: ADDRCONF(NETDEV_UP): veth1b79130: link is not ready
Sep 10 20:05:10 node-0-vm kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth1b79130: link becomes ready
Sep 10 20:05:10 node-0-vm kernel: cbr0: port 2(veth1b79130) entered blocking state
Sep 10 20:05:10 node-0-vm kernel: cbr0: port 2(veth1b79130) entered forwarding state
Sep 10 20:05:10 node-0-vm systemd-networkd[1177]: veth1b79130: Gained carrier
Sep 10 20:05:10 node-0-vm systemd-networkd[1177]: veth1b79130: Lost carrier
Sep 10 20:05:10 node-0-vm kernel: cbr0: port 2(veth1b79130) entered disabled state
Sep 10 20:05:11 node-0-vm kernel: cbr0: port 2(veth1b79130) entered blocking state
Sep 10 20:05:11 node-0-vm kernel: cbr0: port 2(veth1b79130) entered forwarding state
Sep 10 20:05:11 node-0-vm systemd-networkd[1177]: veth1b79130: Gained carrier
Sep 10 20:05:11 node-0-vm systemd-networkd[1177]: veth1b79130: Gained IPv6LL
Sep 10 20:05:24 node-0-vm systemd-networkd[1177]: veth1b79130: Configured
Sep 12 13:23:54 node-0-vm kernel: cbr0: port 4(veth4a9e42d) entered blocking state
Sep 12 13:23:54 node-0-vm kernel: cbr0: port 4(veth4a9e42d) entered disabled state
Sep 12 13:23:54 node-0-vm kernel: device veth4a9e42d entered promiscuous mode
Sep 12 13:23:54 node-0-vm systemd-udevd[27087]: Could not generate persistent MAC address for veth4a9e42d: No such file or directory
Sep 12 13:23:54 node-0-vm kernel: IPv6: ADDRCONF(NETDEV_UP): veth4a9e42d: link is not ready
Sep 12 13:23:54 node-0-vm kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth4a9e42d: link becomes ready
Sep 12 13:23:54 node-0-vm kernel: cbr0: port 4(veth4a9e42d) entered blocking state
Sep 12 13:23:54 node-0-vm kernel: cbr0: port 4(veth4a9e42d) entered forwarding state
Sep 12 13:23:54 node-0-vm systemd-networkd[1177]: veth4a9e42d: Gained carrier
Sep 12 13:23:54 node-0-vm kernel: cbr0: port 4(veth4a9e42d) entered disabled state
Sep 12 13:23:54 node-0-vm systemd-networkd[1177]: veth4a9e42d: Lost carrier
Sep 12 13:23:54 node-0-vm kernel: cbr0: port 4(veth4a9e42d) entered blocking state
Sep 12 13:23:54 node-0-vm kernel: cbr0: port 4(veth4a9e42d) entered forwarding state
Sep 12 13:23:54 node-0-vm systemd-networkd[1177]: veth4a9e42d: Gained carrier
Sep 12 13:23:55 node-0-vm systemd-networkd[1177]: veth4a9e42d: Gained IPv6LL
Sep 12 13:24:08 node-0-vm systemd-networkd[1177]: veth4a9e42d: Configured

Notice that the logs of the broken veths contain the line "Link readded", which is not present in the logs of the working veths. Conversely, "entered forwarding state" appears only for the working veths. Maybe some kind of race condition?

@mlaventure (Contributor)

ping @mavenugo

@Quentin-M commented Sep 13, 2016

I was able to reproduce this as well. I ran 500 containers with sleep inf and used ip link | grep veth | grep -v docker0 to count/identify the failures (sketch below).
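For anyone reproducing it the same way, a sketch of that approach (the image, container count, and bridge name docker0 are taken from this comment and may differ on your host):

for i in $(seq 1 500); do docker run -d busybox sleep inf; done
# veths that were never attached to the bridge show no "master docker0"
# field in their ip link line:
ip link | grep veth | grep -v docker0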

@dm0- commented Sep 13, 2016

This is reproducible by running a bunch of containers in sequence on the current CoreOS beta or alpha; eventually one will fail. When the master is not set, the container can't ping its own gateway interface, so this will do it:

for ((i=0; i<100; i++)) ; do docker run --rm busybox ping -c 1 172.17.0.1 || break ; done ; echo $i

So far, I've seen that the netlink requests set up in LinkSetMasterByIndex appear to be essentially identical for both failing and working containers. Manually running the equivalent ip link command from the function's documentation successfully sets the master and makes the container's networking functional.
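For reference, the manual recovery described above looks roughly like this (LinkSetMasterByIndex is documented as the equivalent of ip link set ... master; the interface and bridge names are the ones from this report):

ip link show veth1df4092                  # no "master cbr0" in the output
sudo ip link set veth1df4092 master cbr0  # re-attach the veth to the bridge
ip link show veth1df4092                  # should now report "master cbr0"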

@dm0- commented Sep 15, 2016

I looked into this a bit more, and it appears to be due to a race condition between creating the veth interfaces and adding the host-side interface to the bridge. I wrote a proof-of-concept fix that waits for the kernel to notify that the veth interfaces are running. See dm0-/libnetwork@4343ba4c21f1a121f9e867efda3231a61dc5565e. I've run a couple thousand containers with it and did not have any network problems. Can someone else verify this?
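In shell terms, the idea of the fix is roughly the following (a minimal sketch, not the actual Go code; the interface and bridge names are hypothetical):

# wait until the host-side veth reports that it is up before
# attaching it to the bridge
until [ "$(cat /sys/class/net/veth1df4092/operstate)" = "up" ]; do
  sleep 0.1
done
ip link set veth1df4092 master cbr0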

@thaJeztah (Member)

ping @mavenugo @aboch PTAL! ^^

@dm0- commented Sep 23, 2016

Sorry, I updated the cause on the pull request but not on this issue. Here's what reproduces it:

This error is triggered by a race condition on systems using systemd-networkd with a network file that matches the veths. When a network file matches an interface, networkd brings the interface up, even if the file contains no configuration settings. Network files don't currently support marking links unmanaged, so every networkd configuration file shipped by distros would need to be written so it doesn't match Docker's veths, and end users would also need to be aware of this for any custom configuration files they add (unless an Unmanaged option is added to networkd to match the veths first).

Bridge networking works fine in two scenarios: either systemd-networkd is not running (or doesn't hijack the veths), or Docker waits for the veth interfaces to be fully up before adding them to the bridge.
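A quick way to check whether networkd has claimed the veths on a given host (networkctl ships with systemd; veths reported as "unmanaged" in the SETUP column are being left alone):

networkctl list | grep veth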

@aboch (Contributor) commented Sep 23, 2016

Thanks @dm0- for the analysis.

I have not been able to reproduce the issue on an Ubuntu machine, but that is consistent with your explanation, because in my case the systemd-networkd service is not running.

It seems it is possible to control via configuration files whether networkd should manage a link, based on the link type.

I would rather document your finding in the Docker documentation than modify the Docker code to be resilient to networkd interference. That way users can decide whether it is worth configuring their networkd or shutting it down, depending on their setup.

@dm0- commented Sep 23, 2016

I looked into networkd earlier today, and for the record, this happens when it brings up an interface:

https://github.com/systemd/systemd/blob/38b383d9fe0f5c4e987c1e01136ae6073076fee3/src/network/networkd-link.c#L1602

Delete that block, and there is no issue.

@mavenugo (Contributor)

@dm0- Thanks for the info. It seems this is not a Docker issue, so I'm closing it now.
@edevil, please let us know if you think this is a Docker issue.

@galindro

@dm0-
I'm using Docker 1.12.3 on Ubuntu 16.04 LTS and I've noticed this error. Reading the thread, I see that you proposed a PR to the systemd project to solve this problem. Am I right?

Will I need to update systemd through the package manager to solve this on my current system? Is there any workaround I can apply to avoid this issue? If so, what is it? Which file do I need to edit, and what content needs to be replaced?

In my case, I installed Docker 1.12.3 from scratch (a few minutes ago) on 4 nodes and executed the following commands:

# manager1: 
  docker swarm init
# manager2:
  docker swarm join --token MYTOKEN-MANAGER 10.0.1.100:2377
# manager3: 
  docker swarm join --token MYTOKEN-MANAGER 10.0.1.100:2377
# worker1: 
  docker swarm join --token MYTOKEN-WORKER 10.0.1.100:2377
# manager1:
  docker network create \
    --driver overlay \
    infra

  docker service create \
    --name=viz \
    --publish=8080:8080/tcp \
    --constraint=node.role==manager \
    --network=infra \
    --mount=type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock \
    manomarks/visualizer

  docker service create \
    --name=busybox \
    --network=infra \
    busybox

The viz service starts successfully, but the busybox service enters a restart loop:

root@ip-10-0-1-100:~# docker service ps busybox
ID                         NAME           IMAGE    NODE           DESIRED STATE  CURRENT STATE                 ERROR
13ico75d3ia92kzdzdaww0yj5  busybox.1      busybox  ip-10-0-3-100  Ready          Ready less than a second ago  
7desvrxblwv5je0cfz3l3gw43   \_ busybox.1  busybox  ip-10-0-20-48  Shutdown       Complete 2 seconds ago        
5csbpwxhf89ssxtuw03al1gyx   \_ busybox.1  busybox  ip-10-0-2-100  Shutdown       Complete 8 seconds ago        
127vnlr64kxrffhmbz5gne3uj   \_ busybox.1  busybox  ip-10-0-3-100  Shutdown       Complete 14 seconds ago       
2c0pzhq9sva3ld4g71hpywfsb   \_ busybox.1  busybox  ip-10-0-20-48  Shutdown       Complete 20 seconds ago       

root@ip-10-0-1-100:~# docker service ls
ID            NAME     REPLICAS  IMAGE                 COMMAND
bnamhbjl00az  viz      1/1       manomarks/visualizer  
dnghsob7orek  busybox  0/1       busybox               

root@ip-10-0-1-100:~# docker service ps viz
ID                         NAME   IMAGE                 NODE           DESIRED STATE  CURRENT STATE          ERROR
4idjeh5mzs26u7eo6esvse8kf  viz.1  manomarks/visualizer  ip-10-0-1-100  Running        Running 6 minutes ago  

This is a piece of the Docker log from one of the nodes, generated by the docker-engine daemon when swarm tries to start the service:

Dec 13 23:30:08 ip-10-0-20-48 kernel: [ 1653.611994] aufs au_opts_verify:1597:dockerd[13364]: dirperm1 breaks the protection by the permission bits on the lower branch
Dec 13 23:30:08 ip-10-0-20-48 kernel: [ 1653.639362] aufs au_opts_verify:1597:dockerd[13364]: dirperm1 breaks the protection by the permission bits on the lower branch
Dec 13 23:30:10 ip-10-0-20-48 kernel: [ 1656.113789] aufs au_opts_verify:1597:dockerd[14025]: dirperm1 breaks the protection by the permission bits on the lower branch
Dec 13 23:30:10 ip-10-0-20-48 kernel: [ 1656.120385] IPVS: Creating netns size=2192 id=63
Dec 13 23:30:10 ip-10-0-20-48 kernel: [ 1656.136840] br0: renamed from ov-000101-b29gr
Dec 13 23:30:10 ip-10-0-20-48 systemd-udevd[14514]: Could not generate persistent MAC address for vx-000101-b29gr: No such file or directory
Dec 13 23:30:10 ip-10-0-20-48 kernel: [ 1656.168396] vxlan1: renamed from vx-000101-b29gr
Dec 13 23:30:11 ip-10-0-20-48 systemd-udevd[14541]: Could not generate persistent MAC address for veth21f8b53: No such file or directory
Dec 13 23:30:11 ip-10-0-20-48 systemd-udevd[14540]: Could not generate persistent MAC address for veth34c132c: No such file or directory
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.184273] device vxlan1 entered promiscuous mode
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.184432] br0: port 1(vxlan1) entered forwarding state
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.184437] br0: port 1(vxlan1) entered forwarding state
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.200380] veth2: renamed from veth21f8b53
Dec 13 23:30:11 ip-10-0-20-48 systemd-udevd[14585]: Could not generate persistent MAC address for vethd539d43: No such file or directory
Dec 13 23:30:11 ip-10-0-20-48 systemd-udevd[14586]: Could not generate persistent MAC address for vethc117fcb: No such file or directory
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.208307] device veth2 entered promiscuous mode
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.208418] IPv6: ADDRCONF(NETDEV_UP): veth2: link is not ready
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.208421] br0: port 2(veth2) entered forwarding state
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.208426] br0: port 2(veth2) entered forwarding state
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.209891] device vethc117fcb entered promiscuous mode
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.209956] IPv6: ADDRCONF(NETDEV_UP): vethc117fcb: link is not ready
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.209960] docker_gwbridge: port 2(vethc117fcb) entered forwarding state
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.209967] docker_gwbridge: port 2(vethc117fcb) entered forwarding state
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.242273] IPVS: Creating netns size=2192 id=64
Dec 13 23:30:11 ip-10-0-20-48 dockerd[6616]: time="2016-12-13T23:30:11Z" level=info msg="Firewalld running: false"
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.316499] eth0: renamed from veth34c132c
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.336338] IPv6: ADDRCONF(NETDEV_CHANGE): veth2: link becomes ready
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.336379] docker_gwbridge: port 2(vethc117fcb) entered disabled state
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.396454] eth1: renamed from vethd539d43
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.416302] IPv6: ADDRCONF(NETDEV_CHANGE): vethc117fcb: link becomes ready
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.416324] docker_gwbridge: port 2(vethc117fcb) entered forwarding state
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.416333] docker_gwbridge: port 2(vethc117fcb) entered forwarding state
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.505537] br0: port 2(veth2) entered disabled state
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.505567] br0: port 1(vxlan1) entered disabled state
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.507886] ov-000101-b29gr: renamed from br0
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.516216] device veth2 left promiscuous mode
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.516237] ov-000101-b29gr: port 2(veth2) entered disabled state
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.528139] device vxlan1 left promiscuous mode
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.528154] ov-000101-b29gr: port 1(vxlan1) entered disabled state
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.566518] vx-000101-b29gr: renamed from vxlan1
Dec 13 23:30:11 ip-10-0-20-48 dockerd[6616]: message repeated 2 times: [ time="2016-12-13T23:30:11Z" level=info msg="Firewalld running: false"]
Dec 13 23:30:11 ip-10-0-20-48 systemd-udevd[14722]: Could not generate persistent MAC address for vx-000101-b29gr: No such file or directory
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.595248] veth21f8b53: renamed from veth2
Dec 13 23:30:11 ip-10-0-20-48 systemd-udevd[14743]: Could not generate persistent MAC address for veth21f8b53: No such file or directory
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.666459] veth34c132c: renamed from eth0
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.701051] docker_gwbridge: port 2(vethc117fcb) entered disabled state
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.701105] vethd539d43: renamed from eth1
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.746335] docker_gwbridge: port 2(vethc117fcb) entered disabled state
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.749061] device vethc117fcb left promiscuous mode
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.749065] docker_gwbridge: port 2(vethc117fcb) entered disabled state
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.856097] IPVS: __ip_vs_del_service: enter
Dec 13 23:30:11 ip-10-0-20-48 kernel: [ 1656.856100] IPVS: __ip_vs_del_service: enter

I've tested this with Docker 1.12.4 too and can confirm that it still occurs, so I don't think it's related to the Docker engine itself.

@edevil @mavenugo @aboch @thaJeztah

@dm0- commented Dec 13, 2016

@galindro Yes, I have sent two different methods of addressing the issue to upstream systemd: systemd/systemd#4228 and systemd/systemd#4809. Both have been merged, but they are not in a release yet, so you would have to build a patched package manually if you want to use them. See coreos/coreos-overlay#2300 for example configuration files using the first method.
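For reference, with a systemd build that includes systemd/systemd#4809, a drop-in along these lines (the filename is illustrative) should make networkd match the veths but leave them unmanaged:

# /etc/systemd/network/05-docker-veth.network (hypothetical path)
[Match]
Driver=veth

[Link]
Unmanaged=yes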

If your problem is caused by the same issue (networkd matching Docker's veths), then the workaround is to rewrite your .network files' [Match] sections to be more restrictive so they don't match Driver=veth, bridge, etc. I am not familiar with Ubuntu, so I don't know of anything more specific for you.
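As an illustration of the [Match] rewrite: a catch-all file like

[Match]
Name=*

could be narrowed so it only matches the physical uplink and never Docker's veths (the interface name is an assumption; adjust it to your host):

[Match]
Name=eth0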

@thaJeztah (Member)

@galindro the busybox service restarting is expected; the entrypoint/command for busybox is sh, which exits immediately when no stdin and tty are attached. You need an image with a process that doesn't daemonize and keeps running; otherwise the container exits immediately, and swarm starts a new container to replace it (see the sketch below).
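For example, galindro's busybox service could be kept alive by giving it a long-running command (same flags as above; busybox's sleep may not accept "inf", so a plain number is used here):

docker service create \
  --name=busybox \
  --network=infra \
  busybox sleep 3600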

@galindro

Guys, thank you very much for the quick reply!

@ArseniiPetrovich

Hi, guys. Is there any quick solution for this in 2018?
I'm facing the same issue and, unfortunately, don't clearly understand how to fix it.

@rvernica

Could this cause an AWS instance to no longer be accessible over SSH?

These are the last few lines I see in the log, and I can't SSH to the instance anymore:

[   29.832213] br-9bacab3c71dd: port 6(veth05e21c1) entered blocking state
[   29.839355] br-9bacab3c71dd: port 6(veth05e21c1) entered disabled state
[   29.875175] device veth05e21c1 entered promiscuous mode
[   29.904463] IPv6: ADDRCONF(NETDEV_UP): veth05e21c1: link is not ready
[   30.778724] IPv6: ADDRCONF(NETDEV_CHANGE): veth05e21c1: link becomes ready
[   30.786407] br-9bacab3c71dd: port 6(veth05e21c1) entered blocking state
[   30.793603] br-9bacab3c71dd: port 6(veth05e21c1) entered forwarding state 

@rvernica

Just an update on my case: after a restart, these entries are no longer present in the log and the instance works.

@CrashLaker

Could this cause an AWS instance to no longer be accessible over SSH?

These are the last few lines I see in the log, and I can't SSH to the instance anymore:

[   29.832213] br-9bacab3c71dd: port 6(veth05e21c1) entered blocking state
[   29.839355] br-9bacab3c71dd: port 6(veth05e21c1) entered disabled state
[   29.875175] device veth05e21c1 entered promiscuous mode
[   29.904463] IPv6: ADDRCONF(NETDEV_UP): veth05e21c1: link is not ready
[   30.778724] IPv6: ADDRCONF(NETDEV_CHANGE): veth05e21c1: link becomes ready
[   30.786407] br-9bacab3c71dd: port 6(veth05e21c1) entered blocking state
[   30.793603] br-9bacab3c71dd: port 6(veth05e21c1) entered forwarding state 

I'm having the exact same issue. Is there a way to solve this without restarting the instance?

@jaydhary14

Having the same issue: the SSH server refuses to start, and I'm seeing errors like "port 4 entered blocking state".

@tiagofrancafernandes commented Jun 2, 2022

This works for me (force-kill the dockerd main process, then restart the service):

sudo kill -9 $(sudo service docker status|grep 'Main PID'|grep '(dockerd)'|grep -o -E "[0-9]+") && sudo service docker start &

Then:

docker-compose up

# OR
docker-compose down && docker-compose up
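A gentler variant of the same restart, assuming a systemd-managed host, would be:

sudo systemctl restart docker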
