"No route to host" after reboot , docker overlay network issue #23270

Open
michaljrk opened this issue Jun 5, 2016 · 13 comments

Comments

@michaljrk

michaljrk commented Jun 5, 2016

Three virtual machines run Docker, each hosted on a separate physical box.
A Docker overlay network is set up between them so that three mongo containers (mongo, mongo2, mongo3, forming a replica set) can communicate.

Everything worked as expected for a while, but after the physical host for mongo3 was restarted, that container lost communication with mongo and mongo2 over the Docker overlay network.

I can still see the network when running docker network ls, and mongo3 is still connected to it. I've tried reconnecting the container to the network, restarting it, restarting the Docker daemon, and restarting the host server; nothing helps.

The iptables rules look fine; traffic on ports 7946 and 4789 is permitted.
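
Overlay networking between Docker hosts relies on port 7946 over both TCP and UDP (control plane) and port 4789 over UDP (VXLAN data plane), so a rule check has to look at the protocol as well as the port number. The following is only a rough sketch of such a check: the function name `check_overlay_rules` is hypothetical, it reads `iptables -S` output on stdin, and it only recognizes simple single-port `--dport ... -j ACCEPT` rules. A missing match does not prove the traffic is blocked (the chain policy may be ACCEPT, or a multiport rule may cover it).

```shell
#!/bin/sh
# Hypothetical helper, not part of Docker or iptables.
# Reads `iptables -S` output on stdin and reports, for each protocol/port
# pair used by overlay networking, whether an explicit ACCEPT rule names it.
check_overlay_rules() {
    rules=$(cat)
    for spec in tcp:7946 udp:7946 udp:4789; do
        proto=${spec%%:*}   # part before the colon
        port=${spec##*:}    # part after the colon
        if printf '%s\n' "$rules" |
            grep -E -- "-p $proto .*--dport $port .*-j ACCEPT" >/dev/null; then
            echo "$proto/$port accept-rule-found"
        else
            echo "$proto/$port no-explicit-rule"
        fi
    done
}
```

On a host it could be fed with `iptables -S | check_overlay_rules` (as root); any `no-explicit-rule` line is a prompt to look closer, not a verdict.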

Any ideas?

Environment:
docker version:

Client:
Version: 1.10.2
API version: 1.22
Go version: go1.5.3
Git commit: c3959b1
Built: Mon Feb 22 21:37:01 2016
OS/Arch: linux/amd64

Server:
Version: 1.10.2
API version: 1.22
Go version: go1.5.3
Git commit: c3959b1
Built: Mon Feb 22 21:37:01 2016
OS/Arch: linux/amd64

docker info:

Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 1
Server Version: 1.10.2
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 21
Dirperm1 Supported: false
Execution Driver: native-0.2
Logging Driver: json-file
Plugins:
Volume: local
Network: null host overlay bridge
Kernel Version: 3.13.0-86-generic
Operating System: Ubuntu 14.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.861 GiB
Name: mongo3host
ID: I5GA:KJFZ:27OF:GG2C:QYLO:UX5J:ZCKU:IWFC:5LAH:K6IL:JXAS:A5SM
WARNING: No swap limit support
Cluster store: etcd://20.X.X.180:12379
Cluster advertise: 20.X.X.186:2375

@michaljrk michaljrk changed the title No route to host after reboot , docker overlay network issue "No route to host" after reboot , docker overlay network issue Jun 5, 2016
@michaljrk
Author

Some more information after further debugging:

  1. When I create another container on the same network and the same host, I still cannot reach the mongo and mongo2 containers (hosted on the separate boxes), BUT the new container can be reached from mongo / mongo2.
  2. When I create 2 fresh Docker containers on the same network on different physical boxes, they are able to communicate.

So I assume the communication problem is limited to the existing containers.

@garthk

garthk commented Oct 5, 2016

Another #25266 relative?

@anpieber

anpieber commented Dec 1, 2016

Hey,

I'm experiencing the same problems as @michaljrk. I can confirm that it's NOT related to #25266. If I connect to container A and ping container B, I get "Host not reachable". After I restart container B, the ping reaches it, but only for about 2-5 minutes or until I send anything other than a ping over the connection. Some additional information on this problem:

  • I have 4 connected hosts with 10 overlay networks, and every overlay network spans all hosts.
  • There is no clear pattern as to which overlay network goes down when I reboot any of these servers. I also cannot predict which container in the overlay network won't be reachable.
  • The only way to cure the network is to (1) stop all containers in the overlay network, (2) remove the overlay network, (3) re-create the network, and (4) reattach all containers.
  • It doesn't happen on every reboot, but roughly on every third or fourth.
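
The four recovery steps in the list above could be scripted roughly as follows. This is a sketch, not the commenter's actual setup: the helper names `run` and `recreate_overlay` and the `DRY_RUN` switch are hypothetical, the network is assumed to have been created with default options, and the container names must be supplied by the caller. It has not been tested against the affected Docker versions.

```shell
#!/bin/sh
# Hypothetical helpers sketching the stop / remove / re-create / reattach
# sequence for one overlay network on the local host.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# usage: recreate_overlay NETWORK CONTAINER...
recreate_overlay() {
    net=$1
    shift
    # 1. stop all containers attached to the network
    for c in "$@"; do
        run docker stop "$c"
    done
    # 2. remove the overlay network
    run docker network rm "$net"
    # 3. re-create it (assumption: no custom subnet or other options)
    run docker network create --driver overlay "$net"
    # 4. reattach each container and start it again
    for c in "$@"; do
        run docker network connect "$net" "$c"
        run docker start "$c"
    done
}
```

For example, `DRY_RUN=1 recreate_overlay mynet mongo3` (with `mynet` as a placeholder name) prints the planned commands without touching Docker. Because step 1 covers all containers in the network, on a multi-host setup the stop loop would have to run on every host before the removal.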

All servers have similar specs and the same etcd, Docker and OS versions and update state.

docker version
Client:
Version: 1.12.3
API version: 1.24
Go version: go1.6.3
Git commit: 6b644ec
Built: Wed Oct 26 22:01:48 2016
OS/Arch: linux/amd64

Server:
Version: 1.12.3
API version: 1.24
Go version: go1.6.3
Git commit: 6b644ec
Built: Wed Oct 26 22:01:48 2016
OS/Arch: linux/amd64

docker info
Containers: 10
Running: 10
Paused: 0
Stopped: 0
Images: 4
Server Version: 1.12.3
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 65
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: systemd
Plugins:
Volume: local
Network: null overlay bridge host
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.8.0-28-generic
Operating System: Ubuntu 16.10
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 62.75 GiB
Name: XXX.YYY.ZZ
ID: P7WW:R7NC:WP63:IKTK:G75N:W7EG:DQFE:AZJO:PLMQ:OXQW:J4FA:O5TK
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Cluster Store: etcd://138.XXX.XXX.97:2379
Cluster Advertise: 138.XXX.XXX.97:2376
Insecure Registries:
127.0.0.0/8

My current workaround is to disable auto-reboot. If there's a kernel update, I stop all containers, remove all overlay networks and set the entire thing up again. Ansible is great for such a job, but this is still a fairly big problem and not really scalable. Any ideas what could cause the issue? Is there any additional information I can provide?

Thank you very much!

@michaljrk
Author

I was hoping that it would be resolved in 1.12, but apparently it is not.
Any chance to prioritize it?

@thaJeztah
Member

/cc @sanimej

@anpieber

I had hoped that the update to Docker 1.12.5 might have fixed the problem, but I've just run into the same trouble with the new version.

Kind regards,
Andreas

@sanimej

sanimej commented Feb 7, 2017

@michaljrk @anpieber If you have been seeing this issue randomly, but only after a host or daemon restart on one of the nodes, this might address it:

moby/libnetwork#1639

@anpieber

Thanks for the hint, @sanimej. How can I find out which Docker version contains that libnetwork change, so that I can test it? Thank you very much in advance!

@akatiyar

I am having the same issue with a rebooted host. Has this been fixed?

@baldurmen

I have the same problem. A fix would be very nice indeed.

@alexpirine

same here

@baldurmen

It's a stupid hotfix, but since I keep forgetting about this bug when I reboot my Docker machine, I'm using this cron job:

@reboot root /bin/sleep 30 && /bin/systemctl restart docker

Seems to work well enough.
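
A less blunt variant of this workaround would restart the daemon only when the overlay is actually broken. The following is a minimal sketch under assumptions that do not come from this thread: the function name `restart_if_unreachable` is hypothetical, the local container's image is assumed to ship a `ping` binary, and the container and peer names are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: probe an overlay peer from inside a local container
# and restart the Docker daemon only if the probe fails.
# usage: restart_if_unreachable LOCAL_CONTAINER PEER
restart_if_unreachable() {
    container=$1
    peer=$2
    # One ping with a 2-second reply timeout, run inside the container,
    # because overlay addresses are not reachable from the host namespace.
    if docker exec "$container" ping -c 1 -W 2 "$peer" >/dev/null 2>&1; then
        echo "ok: $peer reachable from $container"
    else
        echo "unreachable: restarting docker"
        systemctl restart docker
    fi
}
```

It could be saved as a script and called from the same @reboot entry after the delay, for example as `restart_if_unreachable mongo3 mongo`. Note that the probe also fails if the local container is not running yet, which would trigger a restart as well.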

@emilm

emilm commented Feb 13, 2020

Have the same problem. Only stable workaround is restarting docker.
