Swarm mesh routing doesn't load balance between nodes (Wireguard setup) #37985
Comments
GordonTheTurtle added the area/swarm label on Oct 6, 2018
I've been able to reproduce the issue on Vultr with WireGuard too. Here is some network information from one node, in case it helps:

```
root@node1:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq state UP group default qlen 1000
    link/ether 56:00:01:b5:94:b1 brd ff:ff:ff:ff:ff:ff
    inet 217.69.xxx.xxx/23 brd 217.69.xxx.255 scope global ens3
       valid_lft forever preferred_lft forever
    inet6 2001:19f0:6801:d6c:5400:1ff:feb5:94b1/64 scope global mngtmpaddr dynamic
       valid_lft 2591730sec preferred_lft 604530sec
    inet6 fe80::5400:1ff:feb5:94b1/64 scope link
       valid_lft forever preferred_lft forever
3: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default qlen 1
    link/none
    inet 10.0.0.1/32 scope global wg0
       valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:c5:f5:66:4f brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
9: docker_gwbridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:17:2a:59:de brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.1/16 brd 172.18.255.255 scope global docker_gwbridge
       valid_lft forever preferred_lft forever
    inet6 fe80::42:17ff:fe2a:59de/64 scope link
       valid_lft forever preferred_lft forever
11: vethad870c8@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP group default
    link/ether 22:ed:0a:82:44:16 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::20ed:aff:fe82:4416/64 scope link
       valid_lft forever preferred_lft forever
15: vethfa2ee72@if14: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP group default
    link/ether 0e:2d:83:44:9a:8b brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::c2d:83ff:fe44:9a8b/64 scope link
       valid_lft forever preferred_lft forever

root@node1:~# ip r
default via 217.69.xxx.1 dev ens3
10.0.0.2 dev wg0 scope link
10.0.0.3 dev wg0 scope link
169.254.169.254 via 217.69.xxx.1 dev ens3
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
172.18.0.0/16 dev docker_gwbridge proto kernel scope link src 172.18.0.1
217.69.xxx.0/23 dev ens3 proto kernel scope link src 217.69.xxx.xxx

root@node1:~# wg
interface: wg0
  public key: QgS5sC4pRT4MwUM5YXgElfRc0O/NvHWUXwfX7LDiJmg=
  private key: (hidden)
  listening port: 1194

peer: 1W5TiygX6nYpH7gSPYNmeWLi/1dQ28zJXJYORsPufng=
  endpoint: public_ip:1194
  allowed ips: 10.0.0.3/32
  latest handshake: 1 minute, 44 seconds ago
  transfer: 445.74 KiB received, 472.49 KiB sent

peer: 78NWEbQ6XF/wZ3d3kzUCKWf8kKDajH3YfpDpUVFzYVM=
  endpoint: public_ip:1194
  allowed ips: 10.0.0.2/32
  latest handshake: 1 minute, 50 seconds ago
  transfer: 467.85 KiB received, 492.89 KiB sent
```
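Given the `wg0` MTU of 1420 in the output above, one thing worth ruling out is encapsulation overhead: Swarm's overlay data path uses VXLAN, which adds roughly 50 bytes per packet, so overlay frames sized for the default 1500-byte MTU will not fit inside the tunnel. A minimal sketch of recreating the service on an overlay network with an explicitly lowered MTU (the network name `web_net` is made up for this example, and this is one possible mitigation, not a confirmed fix for this issue):

```shell
# wg0 MTU is 1420 and VXLAN adds ~50 bytes of encapsulation,
# so cap the overlay MTU at 1420 - 50 = 1370 bytes.
overlay_mtu=$((1420 - 50))

# Hypothetical network/service names; must run on a Swarm manager.
docker network create \
  --driver overlay \
  --opt com.docker.network.driver.mtu="$overlay_mtu" \
  web_net
docker service create --network web_net -p 80:80 --name web nginx:latest
```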
angristan changed the title from "Mesh doesn't load balance between nodes (Wireguard setup)" to "Swarm mesh routing doesn't load balance between nodes (Wireguard setup)" on Oct 8, 2018
Containers on different hosts can access each other through the […]
Is there any technical documentation on mesh?
It seems that @cecchisandrone has the exact same issue as me!
Why are you running the swarm listener through WireGuard? Also, have you tried tracing this through something like tcpdump?
Because that's the only way the nodes can communicate? As I said, on Hetzner Cloud we don't have private networking. (Did I miss something?) Yes, I have done a ton of tcpdump captures while comparing against a functional non-WireGuard cluster. I didn't find anything, but I also don't really know what I should be searching for.
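For reference, one way to narrow this down with tcpdump is to watch Swarm's data path and gossip traffic on the WireGuard interface while hitting the published port from another node. A sketch, assuming the interface and addresses shown earlier in this thread (UDP 4789 is Swarm's default VXLAN data-path port; gossip uses port 7946):

```shell
# On the node that should receive the request, watch for VXLAN-encapsulated
# overlay traffic arriving over the tunnel:
tcpdump -ni wg0 'udp port 4789'

# In another terminal, check the gossip traffic used between nodes:
tcpdump -ni wg0 'port 7946'

# Then, from a different node, hit the published port and see whether
# the VXLAN packets actually cross the tunnel (10.0.0.1 is node1's
# wg0 address from the output above):
curl -sv http://10.0.0.1/
```

If requests to a node not running the container produce no VXLAN traffic on `wg0`, the routing mesh is likely not forwarding over the tunnel at all.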
thaJeztah added the area/networking label on Nov 8, 2018
bagbag commented Nov 16, 2018
The same happens for me with WireGuard 0.0.20181018 and docker-ce 18.09.0.
Hello @bagbag, no, I'm still searching for a solution.
bagbag commented Nov 17, 2018
I gave up and use the public network with […]
agrrh referenced this issue on Nov 18, 2018: Docker swarm load balancing not working over private network #36689 (open)
klipitkas commented Dec 8, 2018 (edited)
Also experiencing this issue on Hetzner Cloud.
angristan commented Oct 6, 2018
Description

I am running a swarm cluster with 3 workers that are also managers.
I am running them on Hetzner Cloud, and since it doesn't offer private networking, I use WireGuard to create a VPN between the servers.
I previously ran the same steps on DigitalOcean, without WireGuard and with "real" private networking, and had no issues, so I assume the problem comes from my current setup, but I can't understand why.
All the nodes are running Debian 9.
Of course, they can reach each other, and the 3 ports required for Swarm are open. I'm even running a GlusterFS cluster without issue.
I have seen quite a number of issues in this repo regarding this exact problem, but since my setup is different I have opened a new one.
Steps to reproduce the issue (on erina):

```
docker swarm init --advertise-addr 10.0.42.5
docker swarm join --token xxxx 10.0.42.5:237
docker service create -p 80:80 --name web nginx:latest
```

Describe the results you received:
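A common suggestion for VPN-only setups like this one is to pin the data-path (VXLAN) traffic to the tunnel address as well, not just the management traffic; the `--data-path-addr` flag has been available since Docker 17.06. A sketch using the 10.0.42.5 address from the steps above (this is a suggestion to try, not a confirmed fix):

```shell
# Pin both Swarm management traffic and the overlay (VXLAN) data path
# to the WireGuard address, so neither falls back to the public interface:
docker swarm init \
  --advertise-addr 10.0.42.5 \
  --data-path-addr 10.0.42.5
```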
The nginx container is only accessible via the node it's running on.
Here, I will be able to access nginx on utaha's localhost, private IP, and public IP, but not via the other nodes.

Describe the results you expected:
Thanks to Swarm's mesh network, I should be able to access any service from any node.
Since the nodes can reach each other and no ports are closed, I wonder what's wrong. Maybe something with the overlay network?
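To check whether the overlay/ingress network is the culprit, a few quick inspection commands can help (run on a manager; `web` is the service name from the reproduction steps, and the peer address below is just one example from this setup):

```shell
# Show the options and subnet of the ingress network backing the routing mesh:
docker network inspect ingress

# See which node each task of the service landed on:
docker service ps web

# Confirm the management and gossip ports are reachable between nodes
# over the VPN:
nc -zv 10.0.42.5 2377   # cluster management (TCP)
nc -zv 10.0.42.5 7946   # gossip (TCP; UDP 7946 and the VXLAN port
                        # UDP 4789 can't be probed reliably with nc)
```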
Output of `docker version`:

Output of `docker info`:

Additional environment details (AWS, VirtualBox, physical, etc.):
Here are some logs:
I'll be happy to provide more information if needed.