Description
The problem is probably similar to #25325: Docker can't reach containers running on hostB when I query hostA's public address.
I'm using Docker swarm with 2 hosts connected via a WireGuard tunnel; they are reachable from each other, and I can ping each host from the other using the internal tunnel addresses.
I then initialize swarm mode with the --advertise-addr, --data-path-addr and --listen-addr options, all set to the internal addresses. Both hosts show up as active in docker node ls, and there are no errors in syslog.
But when I create a service with 2 replicas, I see strange behavior: accessing the service via one of the public IPs, I can only reach the containers running on that particular node. Requests that should land on the other node's containers fail with a timeout.
Steps to reproduce the issue:
- Setup wireguard tunnel, check that it works fine.
- Setup docker in swarm mode.
- Run a service. I'm using agrrh/dummy-service-py, which serves HTTP on port 80 and answers with the container's hostname plus a random UUID:
  docker service create --name dummy --replicas 2 --publish 8080:80 agrrh/dummy-service-py
- Scale the service to at least 2 replicas.
- Try to cycle through the replicas by querying hostA's address.
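For completeness, the swarm bootstrap I'm describing looks roughly like this (a sketch, assuming the tunnel addresses 10.0.5.1/10.0.5.2 from the WireGuard configs at the end of this report; the join token placeholder is of course environment-specific):

```shell
# On hostA (manager): bind all swarm control and data traffic to the tunnel address
docker swarm init \
  --advertise-addr 10.0.5.1 \
  --listen-addr 10.0.5.1:2377 \
  --data-path-addr 10.0.5.1

# On hostB (worker): join over the tunnel using the token printed above
docker swarm join --token <worker token> 10.0.5.1:2377

# Back on hostA: both nodes should be listed as Ready / Active
docker node ls
```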
Describe the results you received:
As I said, requests to containers on other nodes fail:
$ http host1:port
{ "hostname": "containerA" } # this container running at host1
$ http host1:port
http: error: Request timed out (30.0s).
$ http host2:port
http: error: Request timed out (30.0s).
$ http host2:port
{ "hostname": "containerB" } # this container running at host2
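Since the routing mesh load-balances published ports across replicas, repeated requests to a single node should alternate between both containers. A minimal loop to demonstrate the failure (using curl instead of httpie; host1 and port 8080 as published above):

```shell
# Every request goes to the same node; the routing mesh should
# round-robin across both replicas, but only the replica local
# to that node ever answers -- the rest time out.
for i in 1 2 3 4; do
  curl --max-time 5 http://host1:8080/ || echo "timed out"
done
```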
Describe the results you expected:
I expect to be able to reach all of running containers by querying public address of any single node.
Additional information you deem important (e.g. issue happens only occasionally):
It seems to me that the WireGuard tunnel itself is not the cause, as I'm still able to ping between containers. For example, containerB can reach these containerA addresses:
- 10.255.0.4 via lo, ~0.050 ms (looks like this doesn't actually leave host2)
- 10.255.0.5 via eth0, ~0.700 ms (I can see this with tcpdump on the other end, so it's reachable!)
- 172.18.0.3 via eth1, ~0.050 ms (this probably doesn't leave host2 either)
Thanks to --advertise-addr, I can see packets flowing between the hosts over the private interface.
I tried installing ntp and syncing the clocks, but that didn't help.
I also attempted various fixes (e.g. turning off masquerading, re-creating the default bridge with a lower MTU, setting the default bind IP, etc.), but had no luck.
I've already reproduced the issue 3 times with a clean setup, and I'm ready to give collaborators access to my test hosts if you would like to investigate on-site.
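One diagnostic worth recording here: swarm's documented port requirements are TCP 2377 for cluster management, TCP+UDP 7946 for node gossip, and UDP 4789 for the VXLAN data plane, and all of them must pass through the tunnel. A sketch of how to verify this from hostB toward hostA (netcat's UDP checks are only indicative, since UDP gives no positive acknowledgement):

```shell
# Control plane and gossip across the tunnel
nc -zv 10.0.5.1 2377
nc -zv 10.0.5.1 7946
nc -zvu 10.0.5.1 7946

# VXLAN carries the overlay (and routing-mesh) traffic
nc -zvu 10.0.5.1 4789

# On hostA, watch whether VXLAN packets actually arrive over wg0
tcpdump -ni wg0 udp port 4789
```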
Output of docker version:
Same on both hosts:
Client:
Version: 18.03.0-ce
API version: 1.37
Go version: go1.9.4
Git commit: 0520e24
Built: Wed Mar 21 23:10:01 2018
OS/Arch: linux/amd64
Experimental: false
Orchestrator: swarm
Server:
Engine:
Version: 18.03.0-ce
API version: 1.37 (minimum version 1.12)
Go version: go1.9.4
Git commit: 0520e24
Built: Wed Mar 21 23:08:31 2018
OS/Arch: linux/amd64
Experimental: false
Output of docker info:
Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 1
Server Version: 18.03.0-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
NodeID: rdwi6u922eb93s3z3cq1vuih1
Is Manager: true
ClusterID: g8urrtm78sc68oro86k3wvjzf
Managers: 1
Nodes: 2
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 3
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: 10.0.5.1
Manager Addresses:
10.0.5.1:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfd04396dc68220d1cecbe686a6cc3aa5ce3667c
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.13.0-37-generic
Operating System: Ubuntu 16.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 481.8MiB
Name: test1
ID: IS5W:2U5W:XDAE:UXIF:KXRR:FQSU:PI7K:UXEQ:OOHK:HC4O:TLZR:P4UU
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
Additional environment details (AWS, VirtualBox, physical, etc.):
WireGuard setup guide (assuming it is already installed):
### Server
cd /etc/wireguard
umask 077
wg genkey | tee server_private_key | wg pubkey > server_public_key
# /etc/wireguard/wg0.conf
[Interface]
Address = 10.0.5.1/32
SaveConfig = true
PrivateKey = <paste server private key here>
ListenPort = 51820
[Peer]
PublicKey = <paste client public key here>
AllowedIPs = 10.0.5.2/32
wg-quick up wg0
### Client
cd /etc/wireguard
umask 077
wg genkey | tee client_private_key | wg pubkey > client_public_key
# /etc/wireguard/wg0.conf
[Interface]
Address = 10.0.5.2/32
PrivateKey = <paste client private key here>
[Peer]
PublicKey = <paste server public key here>
Endpoint = <paste server IP here>:51820
AllowedIPs = 10.0.5.0/24
wg-quick up wg0
The servers should be reachable via their internal addresses moments after these steps.
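One interaction between this setup and the overlay network that may be relevant: wg-quick typically brings wg0 up with an MTU of 1420, and VXLAN encapsulation adds roughly 50 bytes, so overlay traffic crossing the tunnel can exceed the tunnel MTU. A sketch of how one might check and work around this (assuming the overlay driver honors the com.docker.network.driver.mtu option in this Docker version; the network name is illustrative):

```shell
# Check the tunnel MTU (commonly 1420 with wg-quick defaults)
ip link show wg0

# Create an overlay network with an MTU at least ~50 bytes below
# the tunnel MTU, and attach the service to it
docker network create -d overlay \
  --opt com.docker.network.driver.mtu=1370 \
  dummy-net
```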