[SWARM] Docker swarm using closest ip, instead of given ip -- cannot join swarm #37046

Open
withinboredom opened this issue May 12, 2018 · 4 comments


@withinboredom

Description

When Docker for Windows is connected to multiple networks, a node cannot join a swarm through NAT: Docker Swarm uses the closest IP address instead of the IP address given in the join command. This leaves the node in a corrupted state.

My network topology looks like:

    (INET)
      ^
      |
 [ ROUTER : 192.168.2.1 ]
      |           inet          swarm_net     docker
[ MY COMPUTER : 192.168.2.11, 192.168.137.1, 10.0.75.1 ]
      |
      |                               hvint0       eth0        docker0     docker_gwbridge
      |----- [ DOCKER FOR WINDOWS : 10.0.75.2, 192.168.65.3, 172.17.0.1, 172.18.0.1 ]
      |
      |                             eth0          docker0     docker_gwbridge
      |----- [ SWARM MANAGER : 192.168.137.177, 172.17.0.1, 172.18.0.1 ]

On the 192.168.137.X subnet, I have a swarm cluster that is functioning properly.

I then run the following command:

docker swarm join --advertise-addr 192.168.2.11 --listen-addr 192.168.2.11 --data-path-addr 192.168.2.11 --token SWMTKN-1-62mgsensh897tvspfqqv1zc8ewsy64xjkt2igwn54yoa5i5n3o-6rnc98aekio4ghmbwumgq6a3e 192.168.137.177:2377

It connects properly, but after a few moments, the swarm manager reports it as down.

On the swarm manager I run docker node inspect linuxkit-00155dcca13d and get the following output:

[
    {
        "ID": "08qhacagruv9f4nd4u8wnluij",
        "Version": {
            "Index": 83
        },
        "CreatedAt": "2018-05-12T02:43:11.095200046Z",
        "UpdatedAt": "2018-05-12T02:43:37.457088867Z",
        "Spec": {
            "Labels": {},
            "Role": "manager",
            "Availability": "active"
        },
        "Description": {
            "Hostname": "linuxkit-00155dcca13d",
            "Platform": {
                "Architecture": "x86_64",
                "OS": "linux"
            },
            "Resources": {
                "NanoCPUs": 2000000000,
                "MemoryBytes": 1021038592
            },
            "Engine": {
                "EngineVersion": "18.03.1-ce",
                "Plugins": [
                    {
                        "Type": "Log",
                        "Name": "awslogs"
                    },
                    {
                        "Type": "Log",
                        "Name": "fluentd"
                    },
                    {
                        "Type": "Log",
                        "Name": "gcplogs"
                    },
                    {
                        "Type": "Log",
                        "Name": "gelf"
                    },
                    {
                        "Type": "Log",
                        "Name": "journald"
                    },
                    {
                        "Type": "Log",
                        "Name": "json-file"
                    },
                    {
                        "Type": "Log",
                        "Name": "logentries"
                    },
                    {
                        "Type": "Log",
                        "Name": "splunk"
                    },
                    {
                        "Type": "Log",
                        "Name": "syslog"
                    },
                    {
                        "Type": "Network",
                        "Name": "bridge"
                    },
                    {
                        "Type": "Network",
                        "Name": "host"
                    },
                    {
                        "Type": "Network",
                        "Name": "macvlan"
                    },
                    {
                        "Type": "Network",
                        "Name": "null"
                    },
                    {
                        "Type": "Network",
                        "Name": "overlay"
                    },
                    {
                        "Type": "Volume",
                        "Name": "local"
                    }
                ]
            },
            "TLSInfo": {
                "TrustRoot": "-----BEGIN CERTIFICATE-----\nMIIBazCCARCgAwIBAgIUQRFTeeafYRK/2Ol6qc3vFRTm4towCgYIKoZIzj0EAwIw\nEzERMA8GA1UEAxMIc3dhcm0tY2EwHhcNMTgwNTEyMDE1MjAwWhcNMzgwNTA3MDE1\nMjAwWjATMREwDwYDVQQDEwhzd2FybS1jYTBZMBMGByqGSM49AgEGCCqGSM49AwEH\nA0IABGJEAP20lH+j0OiGO8OtRjecXlyStopoeXbvbJnUvvhiQSp3ATvfNlc9PwJR\n+s+fbNjwXqWP8da7SjWQIthdvtmjQjBAMA4GA1UdDwEB/wQEAwIBBjAPBgNVHRMB\nAf8EBTADAQH/MB0GA1UdDgQWBBRCnUqlTHkSVFOq9lzdidCdZwnLiTAKBggqhkjO\nPQQDAgNJADBGAiEAwhKRc9Gy78PdC0N8v/y5ZEyhflRMhVK15GufgkXgEXoCIQCr\nGVpo241wleVAXjqCuB9loJYhLgKdDq8yZf7KoJzl2Q==\n-----END CERTIFICATE-----\n",
                "CertIssuerSubject": "MBMxETAPBgNVBAMTCHN3YXJtLWNh",
                "CertIssuerPublicKey": "MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEYkQA/bSUf6PQ6IY7w61GN5xeXJK2imh5du9smdS++GJBKncBO982Vz0/AlH6z59s2PBepY/x1rtKNZAi2F2+2Q=="
            }
        },
        "Status": {
            "State": "down",
            "Message": "heartbeat failure",
            "Addr": "192.168.137.1"
        }
    }
]
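
The mismatch in the output above can be checked mechanically. This is a minimal Python sketch, assuming the JSON has been captured from docker node inspect; the helper name addr_mismatch and the trimmed SAMPLE are illustrative only and are not part of Docker.

```python
import json


def addr_mismatch(inspect_json: str, expected_addr: str) -> list:
    """Return (hostname, reported_addr, state) for every node whose
    Status.Addr differs from the address we expected it to advertise."""
    nodes = json.loads(inspect_json)
    mismatches = []
    for node in nodes:
        status = node.get("Status", {})
        reported = status.get("Addr")
        if reported != expected_addr:
            hostname = node.get("Description", {}).get("Hostname")
            mismatches.append((hostname, reported, status.get("State")))
    return mismatches


# Trimmed copy of the inspect output from this report (hypothetical fixture).
SAMPLE = """
[{"Description": {"Hostname": "linuxkit-00155dcca13d"},
  "Status": {"State": "down",
             "Message": "heartbeat failure",
             "Addr": "192.168.137.1"}}]
"""

if __name__ == "__main__":
    for hostname, reported, state in addr_mismatch(SAMPLE, "192.168.2.11"):
        print(f"{hostname}: manager sees {reported} (state: {state}), "
              "expected 192.168.2.11")
```

On a live manager the same function could be fed the real output of docker node inspect instead of SAMPLE.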

As you can see, the manager records the wrong IP address (192.168.137.1), even though the join command specified the correct one (192.168.2.11). The correct IP address is fully routable from the swarm subnet:

withinboredom@manager:~$ ping 192.168.2.11
PING 192.168.2.11 (192.168.2.11) 56(84) bytes of data.
64 bytes from 192.168.2.11: icmp_seq=1 ttl=127 time=0.581 ms
64 bytes from 192.168.2.11: icmp_seq=2 ttl=127 time=0.340 ms
64 bytes from 192.168.2.11: icmp_seq=3 ttl=127 time=0.281 ms
^C
--- 192.168.2.11 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1998ms
rtt min/avg/max/mdev = 0.281/0.400/0.581/0.131 ms
withinboredom@manager:~$ sudo traceroute -T 192.168.2.11
traceroute to 192.168.2.11 (192.168.2.11), 30 hops max, 60 byte packets
 1  DESKTOP-F9LFL53.mshome.net (192.168.137.1)  0.753 ms * *
 2  * * *
 3  DESKTOP-F9LFL53 (192.168.2.11)  0.944 ms  0.938 ms  1.014 ms
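
One possible reading of "closest" (an assumption, not something confirmed in this thread) is that the manager records the address of the host interface that sits on its own subnet, which is also the first hop in the traceroute above. The Python sketch below only illustrates that relationship using the addresses from the topology diagram. The /24 prefix length is assumed, since the report only says 192.168.137.X, and on_manager_subnet is a hypothetical helper name.

```python
import ipaddress

# Manager's subnet; the /24 prefix is an assumption.
MANAGER_SUBNET = ipaddress.ip_network("192.168.137.0/24")

# Addresses of "MY COMPUTER" as labelled in the topology diagram.
HOST_ADDRS = {
    "inet": "192.168.2.11",
    "swarm_net": "192.168.137.1",
    "docker": "10.0.75.1",
}


def on_manager_subnet(addr: str) -> bool:
    """True if addr belongs to the manager's (assumed) subnet."""
    return ipaddress.ip_address(addr) in MANAGER_SUBNET


if __name__ == "__main__":
    for name, addr in HOST_ADDRS.items():
        print(f"{name:10s} {addr:15s} on manager subnet: {on_manager_subnet(addr)}")
```

Only the swarm_net address is on the manager's subnet, and it is the one that appears as Status.Addr in the inspect output.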

Describe the results you received:

The swarm manager tries to communicate with the node at the wrong IP address.

Describe the results you expected:

I expected it to use the IP address given in the join command.

Additional information you deem important (e.g. issue happens only occasionally):

I have not found any way to force it to use the correct IP address.

Output of docker version:

withinboredom@manager:~$ docker version
Client:
 Version:      18.03.1-ce
 API version:  1.37
 Go version:   go1.9.5
 Git commit:   9ee9f40
 Built:        Thu Apr 26 07:17:20 2018
 OS/Arch:      linux/amd64
 Experimental: false
 Orchestrator: swarm

Server:
 Engine:
  Version:      18.03.1-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.5
  Git commit:   9ee9f40
  Built:        Thu Apr 26 07:15:30 2018
  OS/Arch:      linux/amd64
  Experimental: false

Output of docker info:

withinboredom@manager:~$ docker info
Containers: 21
 Running: 21
 Paused: 0
 Stopped: 0
Images: 17
Server Version: 18.03.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: rou7ro70jbyb3ea84f6gsx5to
 Is Manager: true
 ClusterID: ufv7947vmngrdjrxk0w1ey2dg
 Managers: 1
 Nodes: 4
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 10
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 192.168.137.177
 Manager Addresses:
  192.168.137.177:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-124-generic
Operating System: Ubuntu 16.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 748.4MiB
Name: manager
ID: MA4Z:2SIL:4YYN:HWFV:AGR2:6MUU:OWJW:HKAE:Q22N:YRIW:4JN4:ROUZ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.):
Hyper-v machines + Docker For Windows

@msupino

msupino commented Nov 25, 2018

I am facing the same issue in a different setup: I am trying to build a swarm on loopback addresses, with routing in place.

The swarm init works. Running docker node inspect self on the node where I ran swarm init shows:

    "Status": {
        "State": "ready",
        "Addr": "10.254.0.1"
    },
    "ManagerStatus": {
        "Leader": true,
        "Reachability": "reachable",
        "Addr": "10.254.0.1:2377"
    }

But when I add nodes with the same settings, Status.Addr is not the loopback address; it is the address of the physical interface:

    "Status": {
        "State": "ready",
        "Addr": "10.2.0.3"
    },
    "ManagerStatus": {
        "Reachability": "reachable",
        "Addr": "10.254.222.217:2377"
    }

For the join command, I used a loopback interface for all parameters:

docker swarm join --token SWMTKN-1 --advertise-addr lo10 --data-path-addr lo10 --listen-addr lo10 10.254.0.1:2377
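
The difference between the two inspect fragments above can be expressed as a small check: on the init node Status.Addr matches the host part of ManagerStatus.Addr, while on the joined node it does not. This is a rough Python sketch; split_host and addrs_agree are hypothetical helper names, the dictionaries are copied from the fragments above, and the port stripping assumes IPv4 host:port strings.

```python
def split_host(addr: str) -> str:
    """Strip a trailing ':port' from an IPv4 'host:port' string.
    Addresses without a port are returned unchanged."""
    host, sep, _port = addr.rpartition(":")
    return host if sep else addr


def addrs_agree(node: dict) -> bool:
    """True if Status.Addr equals the host part of ManagerStatus.Addr."""
    status_addr = node["Status"]["Addr"]
    manager_addr = split_host(node["ManagerStatus"]["Addr"])
    return status_addr == manager_addr


# Fragment reported for the node that ran swarm init.
INIT_NODE = {
    "Status": {"State": "ready", "Addr": "10.254.0.1"},
    "ManagerStatus": {"Leader": True, "Reachability": "reachable",
                      "Addr": "10.254.0.1:2377"},
}

# Fragment reported for a node that joined with lo10 for all parameters.
JOINED_NODE = {
    "Status": {"State": "ready", "Addr": "10.2.0.3"},
    "ManagerStatus": {"Reachability": "reachable",
                      "Addr": "10.254.222.217:2377"},
}

if __name__ == "__main__":
    print("init node agrees:  ", addrs_agree(INIT_NODE))
    print("joined node agrees:", addrs_agree(JOINED_NODE))
```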

Any idea? Any clue where in the code this might happen? I'll compile it myself if needed.

Thanks.

@hrmoradi

hrmoradi commented Apr 2, 2020

cannot join a swarm through NAT as Docker Swarm

Hello,

I am experiencing the same problem. Have you been able to solve this issue?

thanks

@Florentin68

It's exactly the same for me

@Florentin68

I think this is caused by running docker swarm init without the --advertise-addr option. The manager then uses the IP address shown in the join-token output, which is not the one we want to use on the local network but the one from the docker0 network.
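
One way to check this hypothesis on a given manager is to look at the addresses it reports under Node Address and Manager Addresses in docker info. The following is a rough Python sketch, assuming the plain-text layout of the docker info output shown earlier in this report (other Docker versions may format it differently) and IPv4 addresses; swarm_addresses is a hypothetical helper name.

```python
def swarm_addresses(docker_info_text: str) -> dict:
    """Extract 'Node Address' and the 'Manager Addresses' list from
    plain-text docker info output (layout assumed from this report)."""
    node_addr = None
    managers = []
    in_managers = False
    for raw in docker_info_text.splitlines():
        line = raw.strip()
        if line.startswith("Node Address:"):
            node_addr = line.split(":", 1)[1].strip()
            in_managers = False
        elif line == "Manager Addresses:":
            in_managers = True
        elif in_managers and line and line[0].isdigit():
            # Entries under the heading look like '192.168.137.177:2377'.
            managers.append(line)
        else:
            in_managers = False
    return {"node_address": node_addr, "manager_addresses": managers}


# Excerpt of the docker info output from the original report.
SAMPLE = """\
Swarm: active
 Node Address: 192.168.137.177
 Manager Addresses:
  192.168.137.177:2377
Runtimes: runc
"""

if __name__ == "__main__":
    print(swarm_addresses(SAMPLE))
```

If the reported address belongs to docker0 rather than to the intended interface, that would be consistent with the explanation above; for the manager in the original report, the excerpt shows the eth0 address.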
