Docker fails to create network bridge on start up #18113

Closed
Chili-Man opened this Issue Nov 20, 2015 · 70 comments

@Chili-Man

Chili-Man commented Nov 20, 2015

Description of problem:
When freshly installing the docker 1.9.0 daemon, it sometimes fails to create the network bridge at startup and thus fails to start the daemon. It seems to fail about 50% of the time and I'm not sure why. Here's some of the log output:

time="2015-11-20T05:32:35.395996380Z" level=info msg="API listen on /var/run/docker.sock" 
time="2015-11-20T05:32:35.415570660Z" level=info msg="Firewalld running: false" 
time="2015-11-20T05:32:35.441312772Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.1/16. Daemon option --bip can be used to set a preferred IP address" 
time="2015-11-20T05:32:35.445369505Z" level=fatal msg="Error starting daemon: Error initializing network controller: Error creating default \"bridge\" network: failed to allocate gateway (172.17.0.1): No available addresses on this pool" 
time="2015-11-20T05:32:35.562979814Z" level=info msg="API listen on /var/run/docker.sock" 
time="2015-11-20T05:32:35.581794400Z" level=info msg="Firewalld running: false" 
time="2015-11-20T05:32:35.672030288Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.1/16. Daemon option --bip can be used to set a preferred IP address" 
time="2015-11-20T05:32:35.677536533Z" level=fatal msg="Error starting daemon: Error initializing network controller: Error creating default \"bridge\" network: failed to allocate gateway (172.17.0.1): No available addresses on this pool" 

docker version:

Client:
 Version:      1.9.0
 API version:  1.21
 Go version:   go1.4.2
 Git commit:   76d6bc9
 Built:        Tue Nov  3 17:43:42 UTC 2015
 OS/Arch:      linux/amd64
Cannot connect to the Docker daemon. Is the docker daemon running on this host?

docker info:

Containers: 0
Images: 0
Server Version: 1.9.0
Storage Driver: overlay
 Backing Filesystem: extfs
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.2.0-18-generic
Operating System: Ubuntu 14.04.3 LTS
CPUs: 1
Total Memory: 2.045 GiB
Name: vagrant
ID: 253M:O5XT:BJQK:AHRB:U5MI:OQZL:UUK4:YZAR:6XIH:XHV2:62QQ:3BET
WARNING: No swap limit support

Cannot connect to the Docker daemon. Is the docker daemon running on this host?

uname -a:
Linux default-ubuntu-1404 4.2.0-18-generic #22~14.04.1-Ubuntu SMP Fri Nov 6 22:20:11 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Environment details (AWS, VirtualBox, physical, etc.):
I've run into this issue on both AWS and VirtualBox with the latest Ubuntu 14.04 images with the linux-virtual-lts-wily kernels installed (4.2.0)

How reproducible:
There seems to be about a 50% chance that the install will fail because it cannot create the network bridge interface at start up

Steps to Reproduce:

  1. Install the linux-virtual-lts-wily package onto an Ubuntu 14.04 VirtualBox or AWS server
  2. Install Docker 1.9.0, and enable the daemon with -s overlay
  3. Sometimes it will fail, sometimes it will succeed

Actual Results:
Docker does not run

Expected Results:
Docker daemon should be running

Additional info:
I don't get this error if I use the linux-virtual-lts-vivid kernel (3.19) instead.

@mavenugo

Contributor

mavenugo commented Nov 20, 2015

@Chili-Man this seems like a dupe of #17939. Can you please check whether there are multiple docker daemons running (either natively on the host or via dind using the same root directory, as indicated in #17939)?

@Chili-Man

Chili-Man commented Nov 20, 2015

@mavenugo At first I thought that the issue was that there were multiple docker daemons running, but that is not the case. I've checked all the processes running on the host machine, and there was no docker daemon running at all.

@mavenugo

Contributor

mavenugo commented Nov 20, 2015

@Chili-Man do you know why the log indicates the daemon was launched multiple times within the same second (2015-11-20T05:32:35)?
Also, it is interesting to see this is specific to kernel v4.2.

@Chili-Man

Chili-Man commented Nov 20, 2015

Yeah, it's within the same second because docker fails fast and upstart attempts to restart it a couple of times after it fails to start. I also manually tried starting the docker daemon after upstart gave up, with:

sudo docker daemon -s overlay

but it still produced the same error. I made sure that no other docker daemon was running when I issued the above command.

It is strange that I haven’t been able to reproduce that issue with the 3.19 kernel, only on the 4.2.0 kernel.

@thaJeztah

Member

thaJeztah commented Nov 25, 2015

@Chili-Man are you still having this issue? 1.9.1 was released recently; wondering if that release solved the issue for you, or if it's still unresolved.

@Chili-Man

Chili-Man commented Nov 25, 2015

I haven't had a chance to try it out with 1.9.1, I'll try it over this weekend and update back here when I do.

@thaJeztah

Member

thaJeztah commented Nov 25, 2015

@Chili-Man thanks! Not sure if anything changed in 1.9.1 for this particular case, but there were quite a few changes, so I'm interested to hear if it's still an issue.

@ghost

ghost commented Dec 1, 2015

I'm having the same issue with 1.9.1. It's intermittent across my organization. I've put some information below, along the lines of what Chili-Man supplied.

Linux 3.19.0-33-generic #38~14.04.1-Ubuntu SMP Fri Nov 6 18:17:28 UTC 2015 x86_64         x86_64 x86_64 GNU/Linux
$ docker version
Client:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.3
 Git commit:   a34a1d5
 Built:        Fri Nov 20 17:56:04 UTC 2015
 OS/Arch:      linux/amd64
Cannot connect to the Docker daemon. Is the docker daemon running on this host?

time="2015-12-01T17:30:05.109681982Z" level=debug msg="vagrant group found. gid: 1000" 
time="2015-12-01T17:30:05.109774478Z" level=debug msg="Server created for HTTP on unix (/var/run/docker.sock)" 
time="2015-12-01T17:30:05.110093715Z" level=debug msg="[graphdriver] trying provided driver \"btrfs\"" 
time="2015-12-01T17:30:05.112235021Z" level=debug msg="Using graph driver btrfs" 
time="2015-12-01T17:30:05.112264108Z" level=debug msg="Using default logging driver json-file" 
time="2015-12-01T17:30:05.112334572Z" level=debug msg="Creating images graph" 
time="2015-12-01T17:30:05.113769728Z" level=debug msg="Restored 12 elements" 
time="2015-12-01T17:30:05.113899603Z" level=debug msg="Creating repository list" 
time="2015-12-01T17:30:05.113971500Z" level=debug msg="Option DefaultDriver: bridge" 
time="2015-12-01T17:30:05.114014208Z" level=debug msg="Option DefaultNetwork: bridge" 
time="2015-12-01T17:30:05.115306982Z" level=info msg="API listen on /var/run/docker.sock" 
time="2015-12-01T17:30:05.118348751Z" level=info msg="Firewalld running: false" 
time="2015-12-01T17:30:05.119903980Z" level=debug msg="/sbin/iptables, [--wait -t nat -D PREROUTING -m addrtype --dst-type LOCAL -j DOCKER]" 
time="2015-12-01T17:30:05.120908447Z" level=debug msg="/sbin/iptables, [--wait -t nat -D OUTPUT -m addrtype --dst-type LOCAL ! --dst 127.0.0.0/8 -j DOCKER]" 
time="2015-12-01T17:30:05.121839720Z" level=debug msg="/sbin/iptables, [--wait -t nat -D OUTPUT -m addrtype --dst-type LOCAL -j DOCKER]" 
time="2015-12-01T17:30:05.122916651Z" level=debug msg="/sbin/iptables, [--wait -t nat -D PREROUTING]" 
time="2015-12-01T17:30:05.124121042Z" level=debug msg="/sbin/iptables, [--wait -t nat -D OUTPUT]" 
time="2015-12-01T17:30:05.125459750Z" level=debug msg="/sbin/iptables, [--wait -t nat -F DOCKER]" 
time="2015-12-01T17:30:05.126623895Z" level=debug msg="/sbin/iptables, [--wait -t nat -X DOCKER]" 
time="2015-12-01T17:30:05.128540151Z" level=debug msg="/sbin/iptables, [--wait -t nat -n -L DOCKER]" 
time="2015-12-01T17:30:05.129601103Z" level=debug msg="/sbin/iptables, [--wait -t nat -N DOCKER]" 
time="2015-12-01T17:30:05.130786984Z" level=debug msg="/sbin/iptables, [--wait -t filter -n -L DOCKER]" 
time="2015-12-01T17:30:05.132208778Z" level=debug msg="Registering ipam provider: default" 
time="2015-12-01T17:30:05.132674076Z" level=warning msg="Could not get list of networks during endpoint cleanup: could not find endpoint count key docker/network/v1.0/endpoint_count/c13b96014f7f75ddc5e7aa856acafbadcaeb585fd0e9b0c5b598cc41c67f3367/ for network bridge while listing: Key not found in store" 
time="2015-12-01T17:30:05.133080964Z" level=error msg="could not find endpoint count key docker/network/v1.0/endpoint_count/c13b96014f7f75ddc5e7aa856acafbadcaeb585fd0e9b0c5b598cc41c67f3367/ for network bridge while listing: Key not found in store" 
time="2015-12-01T17:30:05.133284564Z" level=debug msg="Allocating IPv4 pools for network bridge (21cccc31f30fb5767a9011101dfac808fdc32b55304eb40543a496390bbbae13)" 
time="2015-12-01T17:30:05.133331905Z" level=debug msg="RequestPool(LocalDefault, 172.17.42.1/24, , map[], false)" 
time="2015-12-01T17:30:05.136452128Z" level=debug msg="RequestAddress(LocalDefault/172.17.42.0/24, 172.17.42.1, map[])" 
time="2015-12-01T17:30:05.136523345Z" level=debug msg="Retrieving bitmask (LocalDefault/172.17.42.0/24, 172.17.42.0/24)" 
time="2015-12-01T17:30:05.136669772Z" level=debug msg="ReleasePool(LocalDefault/172.17.42.0/24)" 
time="2015-12-01T17:30:05.152949577Z" level=debug msg="Cleaning up old shm/mqueue mounts: start." 
time="2015-12-01T17:30:05.153170368Z" level=fatal msg="Error starting daemon: Error initializing network controller: Error creating default \"bridge\" network: failed to allocate gateway (172.17.42.1): No available addresses on this pool" 
@sheldonkwok

sheldonkwok commented Dec 1, 2015

I am also experiencing this issue

Client:
Version: 1.9.1
API version: 1.21
Go version: go1.4.2
Git commit: a34a1d5
Built: Fri Nov 20 13:12:04 UTC 2015
OS/Arch: linux/amd64

INFO[0000] API listen on /var/run/docker.sock
INFO[0000] [graphdriver] using prior storage driver "overlay"
INFO[0000] Firewalld running: false
INFO[0000] Default bridge (docker0) is assigned with an IP address 172.17.0.1/16. Daemon option --bip can be used to set a preferred IP address
FATA[0000] Error starting daemon: Error initializing network controller: Error creating default "bridge" network: failed to allocate gateway (172.17.0.1): No available addresses on this pool

@mavenugo

Contributor

mavenugo commented Dec 1, 2015

ping @aboch

@aboch

Contributor

aboch commented Dec 1, 2015

Most likely something is messing up the content of the /var/lib/docker/network/files/local-kv.db file.

@mavenugo docker/libnetwork#687 would help give some clarity on where the failure is, as I have said in #17939 (comment)

@aboch

Contributor

aboch commented Dec 1, 2015

@Chili-Man @dvanbuskirk @sheldonkwok
When you experience the issue, can you please copy your
/var/lib/docker/network/files/local-kv.db to some place where I can download it?
I want to take a look at it.

A possible workaround (please check) could be to delete that file before starting the daemon.
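The suggested workaround can be sketched as a small shell helper. The default path is an assumption for 1.9.x installs, and the `.bak` copy is my own addition so the file can still be shared for debugging; run it with the daemon stopped:

```shell
# Sketch of the workaround above: back up and remove the libnetwork
# key-value store so the daemon recreates it cleanly on next start.
reset_docker_kv() {
    kv="${1:-/var/lib/docker/network/files/local-kv.db}"
    [ -f "$kv" ] || return 0   # nothing to clean up
    cp "$kv" "$kv.bak"         # keep a copy so it can still be inspected
    rm -f "$kv"                # the daemon recreates the file on start
}
```

After running it (e.g. `sudo service docker stop`, then the helper, then `sudo service docker start`), the daemon should rebuild the store from scratch.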

@sheldonkwok

sheldonkwok commented Dec 1, 2015

Deleting the file worked... Thanks!
Where can I email it?

@aboch

Contributor

aboch commented Dec 1, 2015

@sheldonkwok Thanks for confirming.
To share the file, I think dropbox or any similar service would do it.

@sheldonkwok

sheldonkwok commented Dec 1, 2015

I have it on S3. There's nothing sensitive in it, right? I glanced through it and it looked fine.

@aboch

Contributor

aboch commented Dec 1, 2015

Yeah, nothing sensitive in it. Mostly Network, Endpoint and Sandbox ids and their configurations.
If you do not feel comfortable sharing it publicly it is perfectly fine. I will contact you via email for that.

Thanks.

@sheldonkwok

sheldonkwok commented Dec 1, 2015

@aboch

Contributor

aboch commented Dec 3, 2015

Thanks @sheldonkwok for sending the file, it was very helpful.

In your case the issue was that the local store file had some inconsistent state, due to a sequence of ungraceful daemon shutdowns.
We were able to reproduce this and end up with our local store file in the same state.

docker/libnetwork#794 was pushed to restrict the chances of ending in such inconsistent file state.

Next libnetwork vendoring will bring it to docker/docker.

@sheldonkwok

sheldonkwok commented Dec 3, 2015

Great, thanks for the quick response and the temporary solution on my end.

@brianbianco

brianbianco commented Dec 9, 2015

I have this problem intermittently as well.

@aboch

Contributor

aboch commented Dec 15, 2015

Now that docker/libnetwork#687 is merged on master, at least for some cases, the "No available addresses on this pool" error will now look more like "internal failure while setting the bit: boltdb bucket doesn't exist" (I just hit this on master).

This means the local-kv.db file somehow got overwritten, probably by another daemon instance.

A way to tell you are hitting this issue is to check the output of
strings /var/lib/docker/network/files/local-kv.db; it will be empty.

jvodan added a commit to EnginesOS/EnginesInstaller that referenced this issue Dec 21, 2015

@ablojh

ablojh commented Dec 23, 2015

@aboch, I just hit this issue when upgrading our 1.9.0 machines to 1.9.1 on Ubuntu 14.04.

The "fix" was to delete the /var/lib/docker/network/files/local-kv.db file.

If you guys need another local-kv.db file to look at, I can send it to you.

@DrPaulBrewer

DrPaulBrewer commented Jan 5, 2016

Had the same problem with Ubuntu 15.04 and docker version 1.9.1.

Deleting local-kv.db fixed it.

@mafrosis

mafrosis commented Aug 2, 2016

Likewise with docker 1.11.1 on a Raspberry Pi. I am unable to build a container, having tried many of the options in this thread.

EDIT: rebooting the Raspberry Pi mended this, as mentioned in another similar issue. YMMV.

@zeitos

zeitos commented Aug 23, 2016

Same error on macOS, version 1.12.1-rc1-beta23 (build: 11375):

time="2016-08-23T21:09:06.368055633Z" level=debug msg="Revoking external connectivity on endpoint my_docker_image (ac7becabec56e68b949038d740b6668121a3c09e75d03379c128ccb12826e5c9)"
time="2016-08-23T21:09:06.371134549Z" level=warning msg="driver failed revoking external connectivity on endpoint my_docker_image (ac7becabec56e68b949038d740b6668121a3c09e75d03379c128ccb12826e5c9): network not found: 05807ea9c6a73c24112f92b6795d18994c85ca90a885cdaa248046c590e7127b"
time="2016-08-23T21:09:06.465919398Z" level=debug msg="libcontainerd: received containerd event: &types.Event{Type:\"exit\", Id:\"e3d97a98ffb0a6571c855082d80b0b474925a64a43b6773558f7104d15437911\", Status:0x89, Pid:\"init\", Timestamp:(*timestamp.Timestamp)(0xc8219b34d0)}"
time="2016-08-23T21:09:06.467110450Z" level=debug msg="Revoking external connectivity on endpoint my_docker_image (0f10417e3f4dcce2dd5e8264ee2e6ec2606874b0600f992315e205c21ee13fbf)"
time="2016-08-23T21:09:06.467798571Z" level=warning msg="driver failed revoking external connectivity on endpoint my_docker_image (0f10417e3f4dcce2dd5e8264ee2e6ec2606874b0600f992315e205c21ee13fbf): network not found: 05807ea9c6a73c24112f92b6795d18994c85ca90a885cdaa248046c590e7127b"

I couldn't try deleting local-kv.db because I don't know where that file is on macOS.
My only "solution" was to reset to factory defaults :S

@martin-sweeny

martin-sweeny commented Sep 3, 2016

Deleting /var/lib/docker/network/files/local-kv.db worked for me

@dorflex150mg

dorflex150mg commented Sep 23, 2016

I was experiencing the same problem. Removing /var/lib/docker/network/files/local-kv.db didn't help. Turns out that I had already created a network with that IP range, so docker network rm <my-network> solved it for me.
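A quick way to spot this kind of collision is to compare each network's subnet against the default bridge pool. The helper below is a crude prefix check of my own, assuming the stock 172.17.0.0/16 range (i.e. `--bip` was left at its default):

```shell
# Crude check: does a subnet string fall inside the default 172.17.0.0/16
# bridge pool? (Assumes --bip/--fixed-cidr were not changed.)
in_default_bridge_range() {
    case "$1" in
        172.17.*) return 0 ;;   # starts inside 172.17.0.0/16
        *)        return 1 ;;
    esac
}
```

Feed it the Subnet values shown by `docker network inspect <name>`; if a user-defined network matches, `docker network rm <name>` should clear the conflict as described above.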

@caneraydinbey

caneraydinbey commented Sep 28, 2016

Removing /var/lib/docker/network/files/local-kv.db as root did not work for me for these errors:

ERROR: for jhipster-registry Cannot start service jhipster-registry: service endpoint with name dev_jhipster-registry already exists

ERROR: for gateway-app Cannot start service gateway-app: failed to update store for object type *libnetwork.endpointCnt: Key not found in store

ERROR: for onboarding-app Cannot start service onboarding-app: failed to update store for object type *libnetwork.endpointCnt: Key not found in store

ERROR: for hb-rabbitmq Cannot start service hb-rabbitmq: service endpoint with name dev_hb-rabbitmq_1 already exists

ERROR: for hb-mongodb Cannot start service hb-mongodb: failed to update store for object type *libnetwork.endpointCnt: Key not found in

@LawAbidingNinja

LawAbidingNinja commented Oct 26, 2016

Just ran into the same issue with docker 1.11.2

sudo systemctl status docker gave me:

docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Wed 2016-10-26 11:52:30 UTC; 6s ago
     Docs: https://docs.docker.com
  Process: 22946 ExecStart=/usr/bin/docker daemon -s overlay --registry-mirror=https://mirror.gcr.io --host=fd:// $DOCKER_OPTS (code=exited, status=1/FAILURE)
  Process: 22940 ExecStartPre=/bin/sh -x -c if [ ! -s /var/lib/docker/repositories-overlay ]; then rm -f /var/lib/docker/repositories-overlay; fi (code=exited, status=0/SUCCESS)
 Main PID: 22946 (code=exited, status=1/FAILURE)
   Memory: 16.5M
      CPU: 142ms
   CGroup: /system.slice/docker.service

Oct 26 11:52:29 gke-tk-uat1-uat-pool-13756537-7eu5 sh[22940]: + [ ! -s /var/lib/docker/repositories-overlay ]
Oct 26 11:52:29 gke-tk-uat1-uat-pool-13756537-7eu5 sh[22940]: + rm -f /var/lib/docker/repositories-overlay
Oct 26 11:52:30 gke-tk-uat1-uat-pool-13756537-7eu5 docker[22946]: time="2016-10-26T11:52:30.215930217Z" level=fatal msg="Error starting daemon: Error initializing network controller: Error creating default \"bridge\" network: failed to allocate gatewa...ss already in use"

Deleting /var/lib/docker/network/files/local-kv.db fixed the problem and allowed me to start up docker successfully.

This is quite a serious issue; it's taking down production machines on our Google Container Engine.

Detailed Version Info:

Client:
 Version:      1.11.2
 API version:  1.23
 Go version:   go1.6.3
 Git commit:   4dc5990
 Built:
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.2
 API version:  1.23
 Go version:   go1.6.3
 Git commit:   4dc5990
 Built:
 OS/Arch:      linux/amd64
@reidrac

reidrac commented Oct 28, 2016

Had this problem too; Debian Jessie and 1.11.2, exactly as described by the previous comment.

Removing /var/lib/docker/network/files/local-kv.db allowed me to start docker again.

jjungnickel added a commit to sgm-media/kubespray that referenced this issue Nov 14, 2016

jjungnickel added a commit to sgm-media/kubespray that referenced this issue Nov 17, 2016

@sic-z

sic-z commented Nov 24, 2016

Encountered the same problem with docker 1.9.1 and CentOS 7 with Linux kernel 3.18.27.

Deleting /var/lib/docker/network/files/local-kv.db worked for me.

[root@slave36 ~]# docker info
Containers: 0
Images: 0
Server Version: 1.9.1
Storage Driver: devicemapper
 Pool Name: docker-253:1-2097182-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 107.4 GB
 Backing Filesystem: 
 Data file: /dev/vg-docker/data
 Metadata file: /dev/vg-docker/metadata
 Data Space Used: 12.41 GB
 Data Space Total: 104.2 GB
 Data Space Available: 91.74 GB
 Metadata Space Used: 21.18 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.126 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Library Version: 1.02.107-RHEL7 (2016-06-09)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.18.27
Operating System: CentOS Linux 7 (Core)
CPUs: 4
Total Memory: 15.67 GiB
@charneykaye

charneykaye commented Jan 11, 2017

+1, deleting /var/lib/docker/network/files/local-kv.db solved this for me.

@tyhunt99

tyhunt99 commented Jan 21, 2017

Had this problem too; Ubuntu 14.04 and 1.12.6, exactly as described by the previous comment.

Removing /var/lib/docker/network/files/local-kv.db allowed me to start docker containers again.

@pazoozooCH

pazoozooCH commented Feb 2, 2017

Same with Ubuntu 16.04.1 and Docker 1.11.2 -> deleting local-kv.db saved the day...

@kuche1991

kuche1991 commented Feb 8, 2017

This fixed it for me:

head -n -1 /etc/resolv.conf | sudo tee /etc/resolv.conf

(https://sdujancourt.me/2017/02/07/dockerd-error-initializing-network-controller-list-bridge-addresses-failed-no-available-network/)
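For what it's worth, `head -n -1` only drops the last line of the file, so it happens to work when the offending nameserver is last. A more targeted sketch (my own, not from the linked post) filters just the IPv6-looking nameserver entries, treating any nameserver line containing a colon as IPv6:

```shell
# Filter stdin, dropping nameserver lines that look IPv6 (contain a colon).
strip_ipv6_ns() {
    grep -v '^nameserver .*:' || true   # grep exits 1 if every line is dropped
}
# Usage (write a filtered copy first, then move it into place):
#   strip_ipv6_ns < /etc/resolv.conf > /tmp/resolv.conf.new
#   sudo mv /tmp/resolv.conf.new /etc/resolv.conf
```

Writing to a temporary file first avoids truncating /etc/resolv.conf while it is still being read.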

@tgulacsi

tgulacsi commented Feb 26, 2017

docker 1.13.0 cannot start if I have an IPv6 nameserver in /etc/resolv.conf!?

@clockzhong

clockzhong commented Mar 3, 2017

I tried all the solutions and still face this error:
Error starting daemon: Error initializing network controller: list bridge addresses failed: no available network

@clockzhong

clockzhong commented Mar 3, 2017

Yes, I found programster's conclusion is right. If the VPN is active, the Docker installation process fails; after disabling the VPN, it works. So when the VPN is up, Docker's installation logic can't correctly handle the network state.

@tgulacsi

tgulacsi commented Mar 3, 2017

@tgulacsi

tgulacsi commented Mar 3, 2017

@clockzhong

clockzhong commented Mar 3, 2017

@tgulacsi, my failure isn't related to /etc/resolv.conf. Even though this file doesn't have an IPv6 nameserver in my environment, it still fails when my VPN is enabled. It seems this bug in Docker's installation process has multiple root causes.

@thaJeztah

Member

thaJeztah commented Mar 3, 2017

What I've found is that if there's an IPv6 nameserver in /etc/resolv.conf, then docker won't start

@tgulacsi If the nameserver contains a zone id (i.e., the %ens33 in nameserver fe80::e23f:49ff:fe09:fd38%ens33), you're probably running into #30295 (comment), which was fixed in docker 1.13.1 and up
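That zone-id case can be checked for directly. A sketch (my own; the pattern just looks for a `%` in a nameserver line, like the `%ens33` suffix mentioned above):

```shell
# Succeeds if any nameserver line in the file at $1 carries a zone id
# suffix such as %ens33 (the pattern from #30295, fixed in 1.13.1+).
has_zone_id_nameserver() {
    grep -q '^nameserver .*%' "$1"
}
```

For example, `has_zone_id_nameserver /etc/resolv.conf && echo "affected by #30295"`.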

@osman-masood

This comment has been minimized.

osman-masood commented Nov 1, 2017

I experienced the same problem with Docker 17.06, Ubuntu 16 (Linux kernel 4.4.0), on a Docker swarm worker node. Removing /var/lib/docker/network/files/local-kv.db and restarting the Docker daemon fixed the problem. Yay!

me@myhost:~$ sudo docker version
Client:
 Version:      17.06.2-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   cec0b72
 Built:        Tue Sep  5 20:00:17 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.06.2-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   cec0b72
 Built:        Tue Sep  5 19:59:11 2017
 OS/Arch:      linux/amd64
 Experimental: false
me@myhost:~$ uname -a
Linux docker.myhost.com 4.4.0-93-generic #116-Ubuntu SMP Fri Aug 11 21:17:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

This is happening with a worker node. The container fails to start with this inspect log:

me@myhost:~$ sudo docker inspect xqdw4ne0z4gp
[
    {
        "ID": "xqdw4ne0z4gpakhdpz5ybfo07",
        "Version": {
            "Index": 10537402
        },
        "CreatedAt": "2017-11-01T21:47:45.031782214Z",
        "UpdatedAt": "2017-11-01T21:48:52.028210273Z",
        "Labels": {},
        "Spec": {
            "ContainerSpec": {
                "Image": "my-amazon-image-path",
                "Labels": {
                    "com.docker.stack.namespace": "my-stack"
                },
                "Command": [
                    "pypy",
                    "/app/main.py"
                ],
                "Env": [
                    "log_level=INFO"
                ],
                "Privileges": {
                    "CredentialSpec": null,
                    "SELinuxContext": null
                },
                "Hosts": [
                    "172.31.1.185 myhost1",
                    "172.31.7.6 myhost2"
                ]
            },
            "Resources": {
                "Limits": {
                    "MemoryBytes": 1073741824
                }
            },
            "RestartPolicy": {
                "Condition": "any",
                "Delay": 60000000000,
                "MaxAttempts": 0
            },
            "Placement": {
                "Platforms": [
                    {
                        "Architecture": "amd64",
                        "OS": "linux"
                    }
                ]
            },
            "Networks": [
                {
                    "Target": "c0jvj7s8yaiu5soc1sv1hsq04",
                    "Aliases": [
                        "my-alias"
                    ]
                }
            ],
            "LogDriver": {
                "Name": "awslogs",
                "Options": {
                    "awslogs-group": "my-group",
                    "awslogs-region": "us-west-1",
                    "awslogs-stream": "my-stream"
                }
            },
            "ForceUpdate": 0
        },
        "ServiceID": "rmifpooavorrgxxr4oh7wb8nq",
        "Slot": 1,
        "NodeID": "ubkmi3ua2ri97qfq7l8thh2f7",
        "Status": {
            "Timestamp": "2017-11-01T21:48:51.33548237Z",
            "State": "failed",
            "Message": "starting",
            "Err": "starting container failed: failed to get network during CreateEndpoint: network c0jvj7s8yaiu5soc1sv1hsq04 not found",
            "ContainerStatus": {
                "ContainerID": "e9da66fdf0094003932fea61eb9b1821c6a2a831cf31d56b7b449508797b814e",
                "ExitCode": 128
            },
            "PortStatus": {}
        },
        "DesiredState": "shutdown",
        "NetworksAttachments": [
            {
                "Network": {
                    "ID": "c0jvj7s8yaiu5soc1sv1hsq04",
                    "Version": {
                        "Index": 10252930
                    },
                    "CreatedAt": "2017-09-24T23:31:48.64028447Z",
                    "UpdatedAt": "2017-11-01T01:15:49.580929009Z",
                    "Spec": {
                        "Name": "cjj-stack_default",
                        "Labels": {
                            "com.docker.stack.namespace": "my-stack"
                        },
                        "DriverConfiguration": {
                            "Name": "overlay"
                        },
                        "Scope": "swarm"
                    },
                    "DriverState": {
                        "Name": "overlay",
                        "Options": {
                            "com.docker.network.driver.overlay.vxlanid_list": "4097"
                        }
                    },
                    "IPAMOptions": {
                        "Driver": {
                            "Name": "default"
                        },
                        "Configs": [
                            {
                                "Subnet": "10.0.0.0/24",
                                "Gateway": "10.0.0.1"
                            }
                        ]
                    }
                },
                "Addresses": [
                    "10.0.0.71/24"
                ]
            }
        ]
    }
]

And I got a bunch of error logs like this:

Nov 01 21:29:37 ip-172-31-10-148 dockerd[30751]: time="2017-11-01T21:29:37.956166048Z" level=warning msg="rmServiceBinding handleEpTableEvent my-stack_my-container-1 0e6d99add6ba784544353bc22ac50c19d97586f8cae47a5a2b50edb3fcb52e8e aborted c.serviceBindings[skey] !ok"
Nov 01 21:29:37 ip-172-31-10-148 dockerd[30751]: time="2017-11-01T21:29:37.956430618Z" level=warning msg="rmServiceBinding handleEpTableEvent my-stack_my-container-2 0e8935545795010a03da388a241804516f0b4adf69e9cdf2b607ec4987d45e99 aborted lb.backEnds[eid] !ok"
Nov 01 21:29:37 ip-172-31-10-148 dockerd[30751]: time="2017-11-01T21:29:37.956705647Z" level=warning msg="rmServiceBinding handleEpTableEvent my-stack_my-container-3 0e927be8af4658f0f00d27ddf144ade27741719f95f03f45f9e6c74af305babd aborted c.serviceBindings[skey] !ok"
Nov 01 21:29:37 ip-172-31-10-148 dockerd[30751]: time="2017-11-01T21:29:37.974170738Z" level=warning msg="rmServiceBinding handleEpTableEvent my-stack_my-container-4 0ec4874eca5a1a055776cb27abd96591ed9b55fb7647708bca124bde084bb398 aborted c.serviceBindings[skey] !ok"
Nov 01 21:29:37 ip-172-31-10-148 dockerd[30751]: time="2017-11-01T21:29:37.982465612Z" level=warning msg="rmServiceBinding handleEpTableEvent my-stack_my-container-5 0ef0f5ff7110d5cce2b97d6ced27eefaac32057c701bc016bc3232df4d650604 aborted c.serviceBindings[skey] !ok"

@Chili-Man Would you mind re-opening this issue please? Thank you.

@thaJeztah

Member

thaJeztah commented Nov 1, 2017

@osman-masood please open a new issue with details; the original issue is from 2015, at which time the Swarm-managed overlay networks didn't even exist, so your issue may well be unrelated (but similar outcome)
