Swarm Service - IP resolves differently from within container vs other containers (same overlay) #30963

Open · ventz opened this Issue Feb 13, 2017 · 13 comments

ventz commented Feb 13, 2017

Description

Service containers seem to get 2 IP addresses (let's say x.y.z.2 and x.y.z.3) -- one is what the DNS name resolves to, and the other is inserted in /etc/hosts. The problem this causes is that the container resolves its own name to x.y.z.3, while other containers resolve it (by name) to x.y.z.2. For services like MariaDB Galera, where you need to specify an IP or a name, this breaks the cluster, since a node advertises one IP by its name while the other nodes actually reach it at a different one.

Ex - Starting a simple service (only 1 container) with something like:

docker service create \
--name apache \
--hostname apache \ <-- NOTE: this seems to cause the issue!
--network some-overlay \
$image

Seems to assign 2 IP addresses to the container. Let's say the network is 10.0.0.0/24, it gives it:
10.0.0.2 and 10.0.0.3

Everything resolves "apache" as "10.0.0.2", except that /etc/hosts inside the apache container maps "apache" to "10.0.0.3", so if you attach to the apache container and resolve "apache", it thinks it is 10.0.0.3.
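
Inside the apache container the divergence looks roughly like this (an illustrative sketch; the addresses are the ones from the example above):

cat /etc/hosts      # contains an entry like: 10.0.0.3  apache
nslookup apache     # the embedded DNS answers with 10.0.0.2
ping -c 1 apache    # libc consults /etc/hosts first, so it pings 10.0.0.3

The hosts-file entry shadows the DNS answer for normal name resolution, which is why the container disagrees with its peers.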

Steps to reproduce the issue:

  1. Run a service; let's say the container is "$hostname"
  2. Find the host it's running on, and attach to it: "docker exec -it $container /bin/sh"
  3. "ip addr" to find the 2 IPs
  4. ping $hostname, and note the IP
  5. Run another service, or simply attach to the overlay (make sure it's attachable when created) and start an alpine container to test with: "docker run -it --rm --net=some-overlay alpine /bin/ash"
  6. ping $hostname again, and note the IP -- it will NOT match the IP from step 4 (a full command sequence is sketched below)
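
For reference, a minimal sequence along these lines should reproduce it (the network and image names below are placeholders; any image with ping and ip available will do):

docker network create --driver overlay --attachable some-overlay

docker service create \
  --name apache \
  --hostname apache \
  --network some-overlay \
  httpd:alpine

# on the node running the task, resolve "apache" from inside the container
docker exec -it $(docker ps -q -f name=apache) sh -c "ip addr; ping -c 1 apache"

# from any other container on the same overlay
docker run -it --rm --net=some-overlay alpine ping -c 1 apache

The two pings report different addresses for "apache".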

Describe the results you received:
The DNS name within the swarm/overlay and the container's own hostname-to-IP mapping (via /etc/hosts) do not match.

Describe the results you expected:
A single IP, or at least an /etc/hosts entry that agrees with the DNS name other containers see.

Additional information you deem important (e.g. issue happens only occasionally):
N/A

Output of docker version:

Client:
 Version:      1.13.1
 API version:  1.26
 Go version:   go1.7.5
 Git commit:   092cba3
 Built:        Wed Feb  8 06:50:14 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.13.1
 API version:  1.26 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   092cba3
 Built:        Wed Feb  8 06:50:14 2017
 OS/Arch:      linux/amd64
 Experimental: false

Output of docker info:

Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 12
Server Version: 1.13.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 50
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: active
 NodeID: mwxekr4tr4jhbhiv3s9at6ozp
 Is Manager: true
 ClusterID: vcwzg0mebqw4kp58pz8ynm0cn
 Managers: 3
 Nodes: 3
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: PUBIP#1
 Manager Addresses:
  PUBIP#1:2377
  PUBIP#2:2377
  PUBIP#3:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1
runc version: 9df8b306d01f59d3a8029be411de015b7304dd8f
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-62-generic
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 6
Total Memory: 31.42 GiB
Name: docker01
ID: SZUT:WA2N:TMPP:MAYQ:ZBZ3:VQS6:7N35:QTGU:NOCB:GJNQ:BUEO:UIQ2
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
 nfs=no
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):
Swarm cluster with 3 managers, over 3 public IPs (1-1 NAT). Encrypted overlay network.
Update: tested with a non-encrypted overlay, and same issue.
Update #2: it seems that having --hostname when creating the service causes the issue, so maybe there is some sort of a bug around that? If you don't have it, the container resolves the same IP internally and externally.

ventz commented Feb 13, 2017

Tested with non-encrypted overlay -- same issue.

ventz commented Feb 13, 2017

UPDATE: It seems that having "--hostname" when creating the service causes the issue, so maybe there is some sort of a bug around that? If you don't have it, the container resolves the same IP from internally and externally.

aboch (Contributor) commented Feb 13, 2017

@ventz

10.0.0.2 is the Virtual IP (VIP) for the apache service; each apache backend task will be programmed with it. 10.0.0.3 is the real IP address of the backend apache task.

--hostname option should not make a difference.
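
If it helps to confirm which address is which, something like the following should show it (a sketch, assuming the service is named "apache" and the overlay is attachable):

# look at the "VirtualIPs" entries under "Endpoint" for the service's VIP(s)
docker service inspect apache

# from another container on the overlay: "tasks.apache" resolves to the
# real task IP(s), while "apache" resolves to the VIP
docker run --rm --net=some-overlay alpine nslookup tasks.apache
docker run --rm --net=some-overlay alpine nslookup apache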

ventz commented Feb 13, 2017

@aboch It seems that --hostname makes the difference because it matches the name of the service (--name).

I think the bug is around which IP gets inserted into /etc/hosts, though. When --hostname matches --name, the IP in /etc/hosts is what resolves from the inside, instead of the 2nd IP (the one the DNS name points to).

So essentially you get:
  1. (from the container) "ping $hostname" => 10.0.0.3
  2. (from the outside) "ping $hostname" => 10.0.0.2

aboch (Contributor) commented Feb 13, 2017

Yes, the --hostname you could set when running containers has never been populated in the internal DNS. It has only local meaning inside the container.
So now that it has been made available for services, it ends up set inside all the service backend tasks, but it is not visible to other containers on the network. I am not sure what the intended use is.

ventz commented Feb 13, 2017

I guess the issue is that it causes the collision. (You are right - with the dynamic --name to DNS mapping, it almost seems redundant, other than the nice benefit of setting the container name.) Basically, in any cluster where the nodes have to agree on the IPs by name, this now causes a break. MariaDB's Galera cluster is actually a really good example: you start the nodes with gcomm://node1,node2,node3 -- and with the --hostname flag, node2 + node3 will see a different IP for node1 than what node1 claims its own IP is.

aboch (Contributor) commented Feb 13, 2017

It looks like we support templates for the newly introduced --hostname on the service cli.

docker service create ... --replicas <N> --hostname {{.Service.Name}}.{{.Task.Slot}} ...

This will make it match the task's name, which the internal DNS reports.

^ Hmm no. This one instead:

--hostname {{.Service.Name}}.{{.Task.Slot}}.{{.Task.ID}} ...
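
Spelled out, that workaround would look something like this (an untested sketch; the service, network, and image names are placeholders):

docker service create \
  --name apache \
  --hostname "{{.Service.Name}}.{{.Task.Slot}}.{{.Task.ID}}" \
  --network some-overlay \
  --replicas 3 \
  $image

Each task's hostname then lines up with the per-task name the internal DNS reports, so a task resolving its own hostname gets its real task IP rather than an address that other containers never see.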

jmzwcn (Contributor) commented Feb 13, 2017

Reproduced too; strange that there are two IPs for a container.

ventz commented Feb 13, 2017

@aboch thanks. Could there possibly be a warning if --hostname is set to the same value as --name, since it produces undesirable results?

@jmzwcn - @aboch mentioned the reason ^

10.0.0.2 is the Virtual IP for the apache server, each apache backend task will be programmed with it. 10.0.0.3 is the real IP address of the backend apache task.

Having the two IPs is ok, but the container/task not resolving itself to the same address that external nodes see is the issue.

Basically, this breaks any circumstance where multiple nodes need to agree on a "pool" based on name/DNS, since it can't be done by --ip. (@thaJeztah - this is a great example of the need for a static IP: #29816, which is the sub-issue for this: #25303 (comment).)

ventz commented Feb 13, 2017

Another issue this causes (assuming you remove --hostname, or make it different from --name, to eliminate that part): the traffic that leaves the container uses the 2nd IP, but the advertised/by-name address is the first IP.

So in a simple example, let's say you have 2 nodes:
A (10.0.0.2, and 10.0.0.3)
B (10.0.0.4 and 10.0.0.5)

from B: "ping A" will tell you 10.0.0.2
from A: "ping B" will tell you 10.0.0.4

But the nodes will communicate using 10.0.0.3 and 10.0.0.5

For things that require pre-configuration/exchange of IPs or hostnames -- this breaks it. You are now starting both nodes with something like "cluster-members: A, B" and they communicate using neither of those addresses.

And since there is no static --ip option, and you can't rely on name/DNS, this basically becomes impossible to do with a "docker service" setup, whereas it works perfectly with "docker run".

I think this really needs to be re-examined, because there are many clustered applications that require pre-distribution of IPs or hostnames in order to form a cluster.
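
One other knob that might be relevant here (not discussed above, so treat it as an untested suggestion): creating the service with DNS round-robin endpoint mode skips the VIP, so the service name resolves directly to the task IP(s):

docker service create \
  --name node1 \
  --endpoint-mode dnsrr \
  --network some-overlay \
  $image

With a single replica per name, the address other containers resolve and the address the task itself uses should then be the same, at the cost of losing the VIP-based load balancing.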

rahulpalamuttam commented Aug 9, 2017

Have there been any updates to this issue, or workarounds?
It's been a blocker for me when setting up my clustered application.

ventz commented Aug 9, 2017

@rahulpalamuttam My current "solution" (more like a limited fix) is NOT to use --hostname, but instead to just use --name, and then use those names everywhere. But yes, the real solution would be for this to be fixed.
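
For the Galera case mentioned earlier, that approach looks roughly like this (a sketch only; the image, and how the gcomm:// address is passed to it, are placeholders that depend on the image you use):

docker network create --driver overlay galera-net

docker service create --name node1 --network galera-net $galera_image
docker service create --name node2 --network galera-net $galera_image
docker service create --name node3 --network galera-net $galera_image

# each node is then pointed (via whatever mechanism the image provides) at
# wsrep_cluster_address = gcomm://node1,node2,node3

Because --hostname is omitted, each container resolves its own service name to the same address the other nodes see, so the addresses the cluster members exchange at least agree.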

jdelamar commented Aug 25, 2017

Oh, well that is a bit of a party pooper I guess :(

I am deploying a Cassandra cluster with docker-compose. I can't really find a way to hack around this with docker swarm, and we can't seem to be able to leverage docker service scale. It almost works.

This issue will likely break any clustering mechanism that relies on node identity to provide a seed or contact point, and it makes docker swarm unusable for all but the simplest use cases (scaling NGINX, I guess :).

Has anyone found a workaround that could work in conjunction with docker-compose v3? I will continue trying stuff, but having this issue prioritized would be a great enabler for a lot of use cases!

Of course, I could always revert to spelling out my deployment precisely, one entry per service.
