Worker node not coming back after reboot #23828
Comments
I just found out it does work when I initialize the swarm with:
Is that intended behavior? The documentation says:
Which made me believe it would just listen on all addresses. Indeed, joining just worked the first time, using the docker machine
@robbertkl After the worker has established a connection, it maintains a list of all known managers' advertise addresses and uses this list to find a new manager on restart. It doesn't use the address specified on join because that address is only known to exist at the time the join request was made and may no longer be active. @nathanleclaire Is there a way to make the defaults work for docker-machine? @abronan Seems something similar to moby/swarmkit#957 could be useful for workers as well.
Thank you, that makes sense. So actually the listen address you set is more like "the advertised listen address", and the default for this is really your "default IP" (based on default route). The same happens when I promote a worker to manager. It gets the IP from the default route. The additional thing here is:
Related to #23877
I don't understand what you want
Isn't the issue here that Swarm is trying to divine an IP address to advertise with from
Take a look at
docker@sw1:~$ ifconfig
docker0 Link encap:Ethernet HWaddr 02:42:31:3C:10:2F
inet addr:172.17.0.1 Bcast:0.0.0.0 Mask:255.255.0.0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
docker_gwbridge Link encap:Ethernet HWaddr 02:42:AE:88:2F:BC
inet addr:172.18.0.1 Bcast:0.0.0.0 Mask:255.255.0.0
inet6 addr: fe80::42:aeff:fe88:2fbc/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:8 errors:0 dropped:0 overruns:0 frame:0
TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:536 (536.0 B) TX bytes:648 (648.0 B)
eth0 Link encap:Ethernet HWaddr 08:00:27:97:61:7F
inet addr:10.0.2.15 Bcast:10.0.2.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe97:617f/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:966 errors:0 dropped:0 overruns:0 frame:0
TX packets:604 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:164634 (160.7 KiB) TX bytes:171644 (167.6 KiB)
eth1 Link encap:Ethernet HWaddr 08:00:27:FC:64:01
inet addr:192.168.99.101 Bcast:192.168.99.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fefc:6401/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:471 errors:0 dropped:0 overruns:0 frame:0
TX packets:358 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:62195 (60.7 KiB) TX bytes:100518 (98.1 KiB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:219 errors:0 dropped:0 overruns:0 frame:0
TX packets:219 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1
RX bytes:23450 (22.9 KiB) TX bytes:23450 (22.9 KiB)
veth88fa3ce Link encap:Ethernet HWaddr 66:2B:A3:3B:74:C8
inet6 addr: fe80::642b:a3ff:fe3b:74c8/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:8 errors:0 dropped:0 overruns:0 frame:0
TX packets:15 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:648 (648.0 B) TX bytes:1206 (1.1 KiB)
I kind of liked how in the libnetwork stuff you could specify an interface to advertise on. We should be careful about making assumptions WRT advertising IP addresses. There's no way to know if
EDIT: Oops, didn't realize this is a bit on the older side.
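To make the "pick an interface instead of guessing from the default route" idea concrete, here is a minimal sketch that extracts the IPv4 address of a named interface from ifconfig output. It assumes the old-style `inet addr:` format shown in the output above. The helper name `iface_ipv4` is hypothetical and is not part of Docker or docker-machine.

```shell
# Hypothetical helper (not a Docker command): print the IPv4 address of one
# interface, reading old-style ifconfig output ("inet addr:a.b.c.d") on stdin.
iface_ipv4() {
  awk -v ifc="$1" '
    /^[^ \t]/ { cur = $1 }              # a non-indented line starts a new interface block
    cur == ifc && /inet addr:/ {
      sub(/.*inet addr:/, "")           # drop everything up to the address
      print $1                          # first remaining field is the address
      exit
    }'
}

# Demo on a trimmed copy of the output above: eth0 is the VirtualBox NAT
# interface, eth1 is the host-only interface the other machine can reach.
iface_ipv4 eth1 <<'EOF'
eth0      Link encap:Ethernet  HWaddr 08:00:27:97:61:7F
          inet addr:10.0.2.15  Bcast:10.0.2.255  Mask:255.255.255.0
eth1      Link encap:Ethernet  HWaddr 08:00:27:FC:64:01
          inet addr:192.168.99.101  Bcast:192.168.99.255  Mask:255.255.255.0
EOF
```

On a real machine the input could presumably come from something like `docker-machine ssh sw1 ifconfig`, and the result could then be passed as an explicit address when initializing the swarm rather than relying on the default-route guess. Whether that fully avoids the reconnect problem described in this issue is an assumption, not something the thread confirms.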
#24237 seems to be headed in the right direction 👍
@nathanleclaire If we add a daemon command line argument for the default interface, that would probably work for machine as well?
@tonistiigi Sure, we could add a daemon flag for that.
Output of docker version:
Output of docker info:
Additional environment details (AWS, VirtualBox, physical, etc.):
VirtualBox
Steps to reproduce the issue:
docker-machine create -d virtualbox sw1
docker-machine create -d virtualbox sw2
docker $(docker-machine config sw1) swarm init
docker $(docker-machine config sw2) swarm join $(docker-machine ip sw1):2377
docker-machine restart sw2
Describe the results you received:
docker $(docker-machine config sw1) node ls showing sw2 status Down, even after the restart was completed. The node does not come back.
Describe the results you expected:
docker $(docker-machine config sw1) node ls showing sw2 status Down during the restart, but changing back to status Ready soon after the restart completed.
Additional information you deem important (e.g. issue happens only occasionally):
After the restart, manually (re)joining won't fix it:
Accepting the node on the manager sw1 doesn't help either.