
Docker 1.10.3 fails to create network bridge on start up #20312

Closed
ascheman opened this issue Feb 14, 2016 · 30 comments · Fixed by #21261

@ascheman

I seem to have the #18113 issue with Docker 1.10.1!

But it also occurs mostly when using a virtual machine (VirtualBox) with more than 3 GB of memory. Startup of Docker then fails most of the time and leaves a stale /var/lib/docker/network/files/local-kv.db behind. As long as the file exists, I cannot start Docker. As soon as I remove it, everything is fine. I have some more hints for reproducing the problem below.

I have the following setup:

  • Host:
  • Create a Vagrant virtual machine with VirtualBox:
  • Inside of the VM
    • Install Puppet (3.4.3) with some modules (apt, docker, epel, etckeeper, stdlib)
    • Start Docker via Puppet manifest: latest Puppet-Docker (garethr-docker) sets up Docker version 1.10.1, build 9e83765

I have a Jenkins starting some of those virtual machines. Most of them require only 2048 MB of assigned RAM. But since yesterday I needed some larger instances and set the memory size for one of them to 4096 MB. This one almost always fails to start Docker and leaves the mentioned local-kv.db. Why only "almost", why not always?

I was trying around with some ideas:

  • At first I was running the Puppet-based setup with strace, something like strace -f -o /tmp/docker-problem -ttt puppet apply docker-setup.pp. When run with strace the problem never occurred, though I was also never able to see access to the local-kv.db (I guess this happens in the container?).
  • Then I was trying with different memory sizes:
    • 3072 MB: always works
    • 3584 MB: worked sometimes (1 out of 3)
    • 3800 MB: failed (only one try)
    • 4095 MB (trying to stay slightly below 4 GB): worked (only one try)
    • 4096 MB: always fails

Maybe it's something completely different and has nothing to do with memory sizes?

I attach one of the local-kv.db files (a bzip2 inside a zip) for further investigation:
local-kv.db.zip
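The symptom above can be checked mechanically. A minimal sketch (the path comes from this report; `kv_store_state` is a name of my own, not a Docker tool):

```shell
#!/bin/bash
# Report whether the leftover libnetwork key-value store is present.
# The reporter observes: Docker starts only when this file is absent.
kv_store_state() {
    # default path taken from the report; pass another path for testing
    local kv="${1:-/var/lib/docker/network/files/local-kv.db}"
    if [ -e "$kv" ]; then
        echo "stale"
    else
        echo "clean"
    fi
}
```

On an affected VM, seeing `stale` before `service docker start` matches the failure mode described here; removing the file brings it back to `clean`.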

@ascheman
Author

docker version:

Client:
 Version:      1.10.1
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   9e83765
 Built:        Thu Feb 11 19:27:08 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.10.1
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   9e83765
 Built:        Thu Feb 11 19:27:08 2016
 OS/Arch:      linux/amd64

docker info:

Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 1.10.1
Storage Driver: devicemapper
 Pool Name: docker-8:1-267070-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: ext4
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 305.7 MB
 Data Space Total: 107.4 GB
 Data Space Available: 40.35 GB
 Metadata Space Used: 733.2 kB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.147 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Either use `--storage-opt dm.thinpooldev` or use `--storage-opt dm.no_warn_on_loop_devices=true` to suppress this warning.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.77 (2012-10-15)
Execution Driver: native-0.2
Logging Driver: json-file
Plugins: 
 Volume: local
 Network: null host bridge
Kernel Version: 3.13.0-77-generic
Operating System: Ubuntu 14.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 3.861 GiB
Name: devopssquare-full-strace
ID: 45XZ:LNXX:D4SZ:TNGL:B3IF:NCWI:USNB:WFJF:7JSQ:ZCTP:7C5C:HWTD
WARNING: No swap limit support

@mavenugo
Contributor

@ascheman Just to confirm if this is related to #18113, do you see this error in the daemon log : failed to allocate gateway (172.17.0.1): No available addresses on this pool ?
Thanks for sharing the local-kv.db file. Can you also share the daemon bootup logs ?
FYI there can be other cases (especially when the db is put in an inconsistent state). Hence we cannot assume it is #18113 without knowing all the details.

@ascheman
Author

I added the reference to #18113 because the problem was caused by the leftover local-kv.db file. In fact, the problem occurs during the installation of Docker and the first startup of the daemon through the Puppet module.

During that time I get the following log in /var/log/syslog:

Feb 14 17:21:47 vagrant-ubuntu-trusty-64 kernel: [  109.602706] bio: create slab <bio-1> at 1
Feb 14 17:21:47 vagrant-ubuntu-trusty-64 kernel: [  109.610923] bio: create slab <bio-2> at 2
Feb 14 17:21:48 vagrant-ubuntu-trusty-64 kernel: [  110.780651] Bridge firewalling registered
Feb 14 17:21:48 vagrant-ubuntu-trusty-64 kernel: [  110.789882] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
Feb 14 17:21:48 vagrant-ubuntu-trusty-64 kernel: [  110.804419] ip_tables: (C) 2000-2006 Netfilter Core Team
Feb 14 17:21:49 vagrant-ubuntu-trusty-64 kernel: [  110.972998] init: docker main process (12420) terminated with status 2
Feb 14 17:21:49 vagrant-ubuntu-trusty-64 kernel: [  111.188388] init: docker main process (12642) terminated with status 1
Feb 14 17:21:49 vagrant-ubuntu-trusty-64 kernel: [  111.188394] init: docker main process ended, respawning
Feb 14 17:21:49 vagrant-ubuntu-trusty-64 kernel: [  111.284999] bio: create slab <bio-1> at 1
Feb 14 17:21:49 vagrant-ubuntu-trusty-64 kernel: [  111.292451] bio: create slab <bio-2> at 2
Feb 14 17:21:49 vagrant-ubuntu-trusty-64 kernel: [  111.359650] init: docker main process (12717) terminated with status 1
Feb 14 17:21:49 vagrant-ubuntu-trusty-64 kernel: [  111.359656] init: docker main process ended, respawning
Feb 14 17:21:49 vagrant-ubuntu-trusty-64 kernel: [  111.462584] bio: create slab <bio-1> at 1
Feb 14 17:21:49 vagrant-ubuntu-trusty-64 kernel: [  111.481806] bio: create slab <bio-2> at 2
Feb 14 17:21:49 vagrant-ubuntu-trusty-64 kernel: [  111.539345] init: docker main process (12841) terminated with status 1
Feb 14 17:21:49 vagrant-ubuntu-trusty-64 kernel: [  111.539351] init: docker main process ended, respawning
Feb 14 17:21:49 vagrant-ubuntu-trusty-64 kernel: [  111.611057] bio: create slab <bio-1> at 1
Feb 14 17:21:49 vagrant-ubuntu-trusty-64 kernel: [  111.616451] bio: create slab <bio-2> at 2
Feb 14 17:21:49 vagrant-ubuntu-trusty-64 kernel: [  111.659566] init: docker main process (12924) terminated with status 1
Feb 14 17:21:49 vagrant-ubuntu-trusty-64 kernel: [  111.659583] init: docker main process ended, respawning
Feb 14 17:21:49 vagrant-ubuntu-trusty-64 kernel: [  111.705675] bio: create slab <bio-1> at 1
Feb 14 17:21:49 vagrant-ubuntu-trusty-64 kernel: [  111.711428] bio: create slab <bio-2> at 2
Feb 14 17:21:49 vagrant-ubuntu-trusty-64 kernel: [  111.754647] init: docker main process (13002) terminated with status 1
Feb 14 17:21:49 vagrant-ubuntu-trusty-64 kernel: [  111.754654] init: docker main process ended, respawning
Feb 14 17:21:49 vagrant-ubuntu-trusty-64 kernel: [  111.799799] bio: create slab <bio-1> at 1
Feb 14 17:21:49 vagrant-ubuntu-trusty-64 kernel: [  111.805659] bio: create slab <bio-2> at 2
Feb 14 17:21:49 vagrant-ubuntu-trusty-64 kernel: [  111.849109] init: docker main process (13079) terminated with status 1
Feb 14 17:21:49 vagrant-ubuntu-trusty-64 kernel: [  111.849128] init: docker main process ended, respawning
Feb 14 17:21:50 vagrant-ubuntu-trusty-64 kernel: [  111.899541] bio: create slab <bio-1> at 1
Feb 14 17:21:50 vagrant-ubuntu-trusty-64 kernel: [  111.904846] bio: create slab <bio-2> at 2
Feb 14 17:21:50 vagrant-ubuntu-trusty-64 kernel: [  111.948785] init: docker main process (13156) terminated with status 1
Feb 14 17:21:50 vagrant-ubuntu-trusty-64 kernel: [  111.948791] init: docker main process ended, respawning
Feb 14 17:21:50 vagrant-ubuntu-trusty-64 kernel: [  111.996065] bio: create slab <bio-1> at 1
Feb 14 17:21:50 vagrant-ubuntu-trusty-64 kernel: [  112.002065] bio: create slab <bio-2> at 2
Feb 14 17:21:50 vagrant-ubuntu-trusty-64 kernel: [  112.044233] init: docker main process (13233) terminated with status 1
Feb 14 17:21:50 vagrant-ubuntu-trusty-64 kernel: [  112.044255] init: docker main process ended, respawning
Feb 14 17:21:50 vagrant-ubuntu-trusty-64 kernel: [  112.092092] bio: create slab <bio-1> at 1
Feb 14 17:21:50 vagrant-ubuntu-trusty-64 kernel: [  112.098328] bio: create slab <bio-2> at 2
Feb 14 17:21:50 vagrant-ubuntu-trusty-64 kernel: [  112.151326] init: docker main process (13310) terminated with status 1
Feb 14 17:21:50 vagrant-ubuntu-trusty-64 kernel: [  112.151333] init: docker main process ended, respawning
Feb 14 17:21:50 vagrant-ubuntu-trusty-64 kernel: [  112.196786] bio: create slab <bio-1> at 1
Feb 14 17:21:50 vagrant-ubuntu-trusty-64 kernel: [  112.202399] bio: create slab <bio-2> at 2
Feb 14 17:21:50 vagrant-ubuntu-trusty-64 kernel: [  112.245816] init: docker main process (13387) terminated with status 1
Feb 14 17:21:50 vagrant-ubuntu-trusty-64 kernel: [  112.245837] init: docker main process ended, respawning
Feb 14 17:21:50 vagrant-ubuntu-trusty-64 kernel: [  112.293325] bio: create slab <bio-1> at 1
Feb 14 17:21:50 vagrant-ubuntu-trusty-64 kernel: [  112.298044] bio: create slab <bio-2> at 2
Feb 14 17:21:50 vagrant-ubuntu-trusty-64 kernel: [  112.341466] init: docker main process (13464) terminated with status 1
Feb 14 17:21:50 vagrant-ubuntu-trusty-64 kernel: [  112.341472] init: docker respawning too fast, stopped

BTW: If I suspend (save state in VBox) and relaunch the VM (Start) after this problem occurs, everything seems to be OK (Docker is up and running after restart).

@ascheman
Author

Just calling service docker start (before restart of the VM) always shows the mentioned logs and docker aborts.
Starting docker manually with docker daemon prints the following:

WARN[0000] devmapper: Usage of loopback devices is strongly discouraged for production use. Please use `--storage-opt dm.thinpooldev` or use `man docker` to refer to dm.thinpooldev section. 
WARN[0000] devmapper: Base device already exists and has filesystem ext4 on it. User specified filesystem  will be ignored. 
INFO[0000] [graphdriver] using prior storage driver "devicemapper" 
INFO[0000] Graph migration to content-addressability took 0.00 seconds 
INFO[0000] Firewalld running: false                     
INFO[0000] Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address 
FATA[0000] Error starting daemon: Error initializing network controller: Error creating default "bridge" network: failed to allocate gateway (172.17.0.1): Address already in use 

@thaJeztah thaJeztah added this to the 1.10.2 milestone Feb 14, 2016
@ascheman
Author

Last night the variant with 4096 MB for the VM also built successfully once. So my thesis that it has something to do with the VM's memory has been proven wrong ... Maybe it also depends on the load of the hosting server (at that time it was the only build job on the physical server)?

@mavenugo
Contributor

@ascheman yes. could you pls share the full daemon log during bootup ?
Can you also make sure there aren't multiple docker daemons running at the same time, sharing the graph folder (-g) and corrupting the db?
Please read my earlier comment on this topic : #17939 (comment)

Also, can you share the docker daemon configuration ?

@ascheman
Author

@mavenugo I will attach a ZIP with two tars, one of a non-working instance and one of a working one. Both contain the docker.log, some other logs and the docker configuration.

As I pointed out earlier, I am setting up Docker on fresh VM instances. Sometimes the initial installation fails due to the mentioned problem. Then I stop the complete setup of the VM (the state you will get from the one tar in the ZIP). If the installation works well, other steps are performed afterwards (so the logs become much longer, this is the case for the other tar).

docker-differences.zip

@ascheman
Author

BTW: During the mentioned builds the one which was working had a VM RAM size of 4GB while the non-working one had only 2 GB.

@tiborvass tiborvass modified the milestones: 1.10.3, 1.10.2 Feb 19, 2016
@claytono

I've experienced the same problem with Docker 1.10 and 1.10.1. We're doing hourly and triggered CI runs and we see about a 30-40% failure rate due to this issue since upgrading to Docker 1.10.

docker version:

Client:
 Version:      1.10.1
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   9e83765
 Built:        Thu Feb 11 19:27:08 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.10.1
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   9e83765
 Built:        Thu Feb 11 19:27:08 2016
 OS/Arch:      linux/amd64
docker info:
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 1.10.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 0
 Dirperm1 Supported: false
Execution Driver: native-0.2
Logging Driver: json-file
Plugins:
 Volume: local
 Network: bridge null host
Kernel Version: 3.13.0-67-generic
Operating System: Ubuntu 14.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.548 GiB
Name: dev01-build-001
ID: YGVH:QXXZ:YE7E:BBBO:HMOA:HD6Z:6HJB:DYWN:DGN3:DNLX:5HWA:FNTC
WARNING: No swap limit support

/var/log/upstart/docker.log:

Waiting for /var/run/docker.sock
INFO[0000] Graph migration to content-addressability took 0.00 seconds 
WARN[0000] Running modprobe bridge br_netfilter failed with message: modprobe: WARNING: Module br_netfilter not found.
insmod /lib/modules/3.13.0-67-generic/kernel/net/llc/llc.ko 
insmod /lib/modules/3.13.0-67-generic/kernel/net/802/stp.ko 
insmod /lib/modules/3.13.0-67-generic/kernel/net/bridge/bridge.ko 
, error: exit status 1 
/var/run/docker.sock is up
INFO[0000] Firewalld running: false                     
INFO[0000] Default bridge (docker0) is assigned with an IP address 172.18.0.0/16. Daemon option --bip can be used to set a preferred IP address 
INFO[0000] [graphdriver] using prior storage driver "auks" 
INFO[0000] Graph migration to content-addressability took 0.00 seconds 
INFO[0000] Firewalld running: false                     
WARN[0000] Could not find endpoint count key docker/network/v1.0/endpoint_count/3b8db9905f9ebe4fba7733b6b29b2e2a61deb5c990d4314362442c9f9ac5ef50/ for network bridge while listing: Key not found in store 
WARN[0000] could not find endpoint count key docker/network/v1.0/endpoint_count/3b8db9905f9ebe4fba7733b6b29b2e2a61deb5c990d4314362442c9f9ac5ef50/ for network bridge while listing: Key not found in store 
INFO[0000] Default bridge (docker0) is assigned with an IP address 172.18.0.0/16. Daemon option --bip can be used to set a preferred IP address 
FATA[0000] Error starting daemon: Error initializing network controller: Error creating default "bridge" network: failed to allocate gateway (172.18.0.1): Address already in use 
INFO[0000] [graphdriver] using prior storage driver "auks" 
INFO[0000] Graph migration to content-addressability took 0.00 seconds 
INFO[0000] Firewalld running: false                     
WARN[0000] Could not find endpoint count key docker/network/v1.0/endpoint_count/3b8db9905f9ebe4fba7733b6b29b2e2a61deb5c990d4314362442c9f9ac5ef50/ for network bridge while listing: Key not found in store 
WARN[0000] could not find endpoint count key docker/network/v1.0/endpoint_count/3b8db9905f9ebe4fba7733b6b29b2e2a61deb5c990d4314362442c9f9ac5ef50/ for network bridge while listing: Key not found in store 
INFO[0000] Default bridge (docker0) is assigned with an IP address 172.18.0.0/16. Daemon option --bip can be used to set a preferred IP address 
FATA[0000] Error starting daemon: Error initializing network controller: Error creating default "bridge" network: failed to allocate gateway (172.18.0.1): Address already in use 
INFO[0000] [graphdriver] using prior storage driver "auks" 
INFO[0000] Graph migration to content-addressability took 0.00 seconds 
INFO[0000] Firewalld running: false

@tiborvass
Contributor

@Dvorak Probably unrelated to the issue, but something is really weird with your setup: there is no auks storage driver...

@claytono

That was auto-correct being stupid on cut and paste, it actually says aufs in the logs.

@aboch
Contributor

aboch commented Feb 22, 2016

Thanks @ascheman @Dvorak for the logs, but one request: In order to debug this further, we need the debug level logs (start daemon with the -D option).

@aboch
Contributor

aboch commented Feb 22, 2016

@ascheman
I checked the local-kv.db file you provided. I could not find any anomalies.
The bitsequence structure backing the allocated IP addresses is correct.
The network object referring to the default network has the correct IP information (including network gateway IP).
Therefore on daemon start, when the default network is being deleted, the network gateway and the 172.17.0.0/16 pool should be released and later reallocated.

If you check your debug-level logs on daemon start, you should see a line like:

releasing IPv4 pools from network <name> (<ID>)

before this one is printed:

Allocating IPv4 pools for network <name> (<ID>)

Can you please confirm you see the above two logs. Thanks.
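A quick way to answer this from a captured debug log (a sketch; `count_pool_logs` is my own helper name, and the log location depends on where the `-D` output was captured — on Ubuntu 14.04 with upstart it is typically /var/log/upstart/docker.log):

```shell
#!/bin/bash
# Count the release/allocate pool log lines in a saved daemon debug log.
# On a healthy restart both appear, with the release logged first.
count_pool_logs() {
    local log="$1"
    printf 'release: %s\n' "$(grep -c 'releasing IPv4 pools from network' "$log")"
    printf 'allocate: %s\n' "$(grep -c 'Allocating IPv4 pools for network' "$log")"
}
```

A `release: 0` next to `allocate: 1` would match the missing-release behavior under discussion.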

@claytono

We actually use the default pools 172.16.0.0/16 and 172.17.0.0/17 in our development environments.

Debug startup logs:

DEBU[0000] docker group found. gid: 999
DEBU[0000] Server created for HTTP on unix (/var/run/docker.sock)
DEBU[0000] Using default logging driver json-file
INFO[0000] [graphdriver] using prior storage driver "aufs"
DEBU[0000] Using graph driver aufs
INFO[0000] Graph migration to content-addressability took 0.00 seconds
DEBU[0000] Option DefaultDriver: bridge
DEBU[0000] Option DefaultNetwork: bridge
INFO[0000] Firewalld running: false
DEBU[0000] /sbin/iptables, [--wait -t nat -D PREROUTING -m addrtype --dst-type LOCAL -j DOCKER]
DEBU[0000] /sbin/iptables, [--wait -t nat -D OUTPUT -m addrtype --dst-type LOCAL ! --dst 127.0.0.0/8 -j DOCKER]
DEBU[0000] /sbin/iptables, [--wait -t nat -D OUTPUT -m addrtype --dst-type LOCAL -j DOCKER]
DEBU[0000] /sbin/iptables, [--wait -t nat -D PREROUTING]
DEBU[0000] /sbin/iptables, [--wait -t nat -D OUTPUT]
DEBU[0000] /sbin/iptables, [--wait -t nat -F DOCKER]
DEBU[0000] /sbin/iptables, [--wait -t nat -X DOCKER]
DEBU[0000] /sbin/iptables, [--wait -t filter -F DOCKER]
DEBU[0000] /sbin/iptables, [--wait -t filter -X DOCKER]
DEBU[0000] /sbin/iptables, [--wait -t filter -F DOCKER-ISOLATION]
DEBU[0000] /sbin/iptables, [--wait -t filter -X DOCKER-ISOLATION]
DEBU[0000] /sbin/iptables, [--wait -t nat -n -L DOCKER]
DEBU[0000] /sbin/iptables, [--wait -t nat -N DOCKER]
DEBU[0000] /sbin/iptables, [--wait -t filter -n -L DOCKER]
DEBU[0000] /sbin/iptables, [--wait -t filter -N DOCKER]
DEBU[0000] /sbin/iptables, [--wait -t filter -n -L DOCKER-ISOLATION]
DEBU[0000] /sbin/iptables, [--wait -t filter -N DOCKER-ISOLATION]
DEBU[0000] /sbin/iptables, [--wait -t filter -C DOCKER-ISOLATION -j RETURN]
DEBU[0000] /sbin/iptables, [--wait -I DOCKER-ISOLATION -j RETURN]
DEBU[0000] Registering ipam driver: "default"
INFO[0000] Default bridge (docker0) is assigned with an IP address 172.18.0.0/16. Daemon option --bip can be used to set a preferred IP address
DEBU[0000] Allocating IPv4 pools for network bridge (52cad9c52af1fe0ee72089bbd2a5fe10055084c14d47d524159c59051312739f)
DEBU[0000] RequestPool(LocalDefault, 172.18.0.0/16, , map[], false)
DEBU[0000] RequestAddress(LocalDefault/172.18.0.0/16, 172.18.0.1, map[RequestAddressType:com.docker.network.gateway])
DEBU[0000] ReleasePool(LocalDefault/172.18.0.0/16)
DEBU[0000] Cleaning up old shm/mqueue mounts: start.
FATA[0000] Error starting daemon: Error initializing network controller: Error creating default "bridge" network: failed to allocate gateway (172.18.0.1): Address already in use

@claytono

Also, if I delete the docker0 bridge (ip link del docker0), the daemon starts up fine afterwards. We have a workaround for the issue, but that doesn't help in CI environments where this is happening unpredictably.

@ascheman
Author

I will look into this as soon as I have access to my test environment again during the next days.

The problem will be that I just run a lot of CI jobs which build up a Vagrant/VirtualBox environment with Ubuntu 14.04 LTS. These boxes set up Docker via a Puppet module (https://github.com/garethr/garethr-docker by @garethr) which simply calls a dpkg install as far as I can see. Making any change to this setup with Docker debugging switched on would require preparing a patched version of the Puppet module and the respective infrastructure to inject it into my provisioning chain. I am not that deep into Puppet and do not have the time to prepare this on short notice. Maybe @garethr can assist here?

In general it seems to be the same problem as @Dvorak pointed out.

Gerd Aschemann --- Veröffentlichen heißt Verändern (Carmen Thomas)
+49/173/3264070 -- gerd@aschemann.net -- http://www.aschemann.net

@ascheman
Author

@Dvorak: As I pointed out, my CI setup is driven by a Puppet module. However, my workaround is the following, maybe it also works in your case?

# Workaround for https://github.com/docker/docker/issues/20312
sleeptime=10
maxtries=5
try=0
echo "Sleeping ${sleeptime}s until Docker is up and running (first)"
sleep $sleeptime
until $dir/test.sh || test $try -ge $maxtries
do  sudo /bin/rm -f /var/lib/docker/network/files/local-kv.db
    sudo service docker start
    try=`expr $try + 1`
    echo "Sleeping ${sleeptime}s until Docker is up and running (try #$try)"
    sleep $sleeptime
done
echo "Sleeping ${sleeptime}s until Docker is up and running (final)"
sleep $sleeptime
$dir/test.sh

The mentioned test.sh in $dir is simply

#!/bin/bash

set -eu

if status docker 2>/dev/null | grep -q 'process '; then
    echo "Docker is up and running"
else
    echo "Docker is not running!" >&2
    exit 1
fi

If it does not work after n (=5) retries I give up and stop the overall CI box setup. Usually this works (sometimes with 3 or 4 retries, but it works).
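The retry pattern in the script above can be factored into a small helper (a sketch; `retry` and its argument order are my own, and on a real host the kv-file removal and service restart would still live in the loop body):

```shell
#!/bin/bash
# Run a command until it succeeds or the attempt budget is spent.
# Usage: retry <max-tries> <sleep-seconds> <command> [args...]
retry() {
    local max="$1" delay="$2"
    shift 2
    local try=0
    until "$@"; do
        try=$((try + 1))
        if [ "$try" -ge "$max" ]; then
            return 1
        fi
        sleep "$delay"
    done
    return 0
}
```

With the script's values, `retry 5 10 "$dir/test.sh"` corresponds to maxtries=5 / sleeptime=10.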


@garethr
Contributor

garethr commented Feb 23, 2016

@ascheman you can pass arbitrary options to the docker daemon on startup using the extra_parameters param on the docker class, mentioned here: https://github.com/garethr/garethr-docker/blob/master/manifests/init.pp#L158-L160. These end up in the docker engine config here: https://github.com/garethr/garethr-docker/blob/master/templates/etc/conf.d/docker.erb#L43. That should allow you to enable debugging with the -D flag.
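For reference, on Ubuntu 14.04 (upstart) those extra parameters end up in /etc/default/docker; a minimal fragment that turns on debug logging would look like this (a sketch of the effect, not the module's exact template output):

```shell
# /etc/default/docker -- read by the upstart job before starting the daemon.
# -D enables the debug-level daemon logs requested in this thread.
DOCKER_OPTS="-D"
```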

@ascheman
Author

ascheman commented Mar 4, 2016

@garethr: Thanks for the hint

@aboch: Added two docker debug logs, one from an instance that failed and one from a working one.
docker-logs-20160304.zip

@tiborvass tiborvass modified the milestones: 1.11.0, 1.10.3 Mar 4, 2016
@aboch
Contributor

aboch commented Mar 5, 2016

Thanks @ascheman for the debug logs.
They confirm you are hitting the same issue reported by @Dvorak.

The fact that the following logs are missing when docker reboots is a hint to the issue:

DEBU[0001] releasing IPv4 pools from network bridge (<id>) 
DEBU[0001] ReleaseAddress(LocalDefault/172.17.0.0/16, 172.17.0.1) 
DEBU[0001] ReleasePool(LocalDefault/172.17.0.0/16)

The default bridge network's gateway IP allocated during the previous daemon instance is not being released, hence the subsequent allocation failure.

Thanks for now,
will let you know when we find something or if we need more info.

@aboch
Contributor

aboch commented Mar 5, 2016

Running docker with the latest libnetwork using @ascheman's local-kv.db file (thanks @ascheman for providing the file and for the extra logs) explains why the daemon does not find the bridge network, and therefore does not delete it at https://github.com/docker/docker/blob/master/daemon/daemon_unix.go#L596

DEBU[0000] Registering ipam driver: "default"           
DEBU[0000] Registering ipam driver: "null"              
WARN[0000] Could not find endpoint count key docker/network/v1.0/endpoint_count/a18d34c84f16467a2555bb3c834f192a567b4a6a14692a11a728e482cd4bdcbf/ for network bridge while listing: Key not found in store 
WARN[0000] could not find endpoint count key docker/network/v1.0/endpoint_count/a18d34c84f16467a2555bb3c834f192a567b4a6a14692a11a728e482cd4bdcbf/ for network bridge while listing: Key not found in store 

(This log was not present in docker/docker 1.10.1 code)

Not sure how we ended up in this state where the network object is present but the corresponding endpoint_count object is not. (Maybe because of a well-timed ungraceful shutdown of the daemon?)

We'll check if some code change can be done to avoid this scenario, or at least limit the chances of getting into it.
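One rough way to compare those two key families in a copy of local-kv.db (an inspection heuristic of my own, assuming the keys are stored as plain byte strings in the BoltDB file so a binary-tolerant grep can find them; not a real parser):

```shell
#!/bin/bash
# Count network vs endpoint_count keys in a copied local-kv.db.
# A network entry without a matching endpoint_count entry reproduces
# the inconsistent state described above.
count_kv_keys() {
    local db="$1" nets counts
    nets=$(grep -a -o 'docker/network/v1.0/network/' "$db" | wc -l)
    counts=$(grep -a -o 'docker/network/v1.0/endpoint_count/' "$db" | wc -l)
    printf 'networks: %d\nendpoint_counts: %d\n' "$nets" "$counts"
}
```

On a healthy store the two counts should match; a surplus of network keys points at the orphaned-network state.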

@claytono

claytono commented Mar 5, 2016

In my case the sequence is such:

  1. Package is installed by puppet.
  2. Docker engine started by packaging
  3. Puppet changes /etc/default/docker
  4. Puppet stops docker engine
  5. Puppet starts docker engine
  6. Start fails.

There are no manual steps for me. This doesn't happen 100% of the time.

@ascheman
Author

ascheman commented Mar 5, 2016

Same here, sometimes it works, sometimes not ...


@mavenugo
Contributor

mavenugo commented Mar 5, 2016

@Dvorak @ascheman for those of us who are not familiar with Puppet, it would be good if you could make it easier by providing the scripts and steps to reproduce it (with the most bare-minimal steps). The above steps are useful, but we don't know everything it takes to reproduce this exact same scenario.

@unclejack unclejack changed the title Re-Open #18113 for Docker 1.10.1 Docker 1.10.1 fails to create network bridge on start up Mar 7, 2016
ascheman added a commit to ascheman/minimal4dockerproblem that referenced this issue Mar 18, 2016
@ascheman
Author

The problem still exists with Docker 1.10.3.
I have provided a minimal setup in https://github.com/ascheman/minimal4dockerproblem

Just run vagrant up (maybe several times to reproduce)

@ascheman ascheman changed the title Docker 1.10.1 fails to create network bridge on start up Docker 1.10.3 fails to create network bridge on start up Mar 18, 2016
@thaJeztah
Member

@ascheman I see a PR was just merged to resolve this issue, and will be included in docker 1.11

@ascheman
Author

Thanks for the info, I was thinking it was resolved with 1.10.3 ... So I will wait for the next minor release.
BTW: is there a release roadmap somewhere?


@thaJeztah
Member

We usually use the wiki for that; https://github.com/docker/docker/wiki. But the project is always moving fast, so sometimes it may be out of date, or not yet include longer term goals

@mavenugo
Contributor

@ascheman could you please try a master build to make sure this is in fact fixed?

@ascheman
Author

It is not a question of running the test once. I have set up a Jenkins job building a Vagrant machine with Docker on Ubuntu every few minutes. ~15% of the builds fail due to this bug.

The install uses the following chain/nested setup: Vagrant->Ubuntu->Puppet->Docker Debian package (see my mentioned minimal Puppet install setup). If you could provide me with a current master build as an Ubuntu/Debian package, I could easily fold it into my regular test build. On the other hand, you could easily run this yourselves if you have some continuous integration set up for Docker. This would allow for permanent testing of the minimal Docker setup as well as more sophisticated tests (compositions of the different Docker tools, etc.). I wonder whether a project like Docker doesn't use CI?


7 participants