
Running two instances of the same stack, DNS to service name is resolving to both stacks #37348

Open
noonowl71 opened this issue Jun 26, 2018 · 2 comments

Comments

@noonowl71

Description

When deploying two instances of the same stack on the same network, DNS queries to services inside the stack resolve to IPs in both stacks.

This behaviour does not match what the "Service discovery and links" docs say, so I think this may be a bug:

If the container making the query is part of a stack, and there is a local match on the same stack, the local match takes precedence over the service or container that is outside the stack.

Steps to reproduce the issue:

  1. Define test stack:
version: '3.6'

services:

  app1:
    image: nginx:alpine
    networks:
      - testnet
      
  app2:
    image: nginx:alpine
    networks:
      - testnet
  
networks:
  testnet:
    external: true

  2. Create test network:
docker network create -d overlay testnet

  3. Deploy two instances of the stack:
docker stack deploy -c stack.yml stack1
docker stack deploy -c stack.yml stack2

Describe the results you received:

Resolving app1 from app2 in stack1 returns IPs for app1 in both stacks

# docker container exec stack1_app2.1.o3gfau7ghhztfk5bg71g35wyj nslookup app1

nslookup: can't resolve '(null)': Name does not resolve
Name:      app1
Address 1: 10.0.2.9
Address 2: 10.0.2.3

Describe the results you expected:

Resolving app1 from app2 in stack1 should return only the IP for app1 in stack1

# docker container exec stack1_app2.1.o3gfau7ghhztfk5bg71g35wyj nslookup app1

nslookup: can't resolve '(null)': Name does not resolve
Name:      app1
Address 1: 10.0.2.3

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

# docker version
Client:
 Version:      18.05.0-ce
 API version:  1.37
 Go version:   go1.9.5
 Git commit:   f150324
 Built:        Wed May  9 22:16:13 2018
 OS/Arch:      linux/amd64
 Experimental: false
 Orchestrator: swarm

Server:
 Engine:
  Version:      18.05.0-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.5
  Git commit:   f150324
  Built:        Wed May  9 22:14:23 2018
  OS/Arch:      linux/amd64
  Experimental: false

Output of docker info:

# docker info
Containers: 16
 Running: 16
 Paused: 0
 Stopped: 0
Images: 72
Server Version: 18.05.0-ce
Storage Driver: btrfs
 Build Version: Btrfs v4.15.1
 Library Version: 102
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: su5d932twbn7hsnkgrwzo02yb
 Is Manager: true
 ClusterID: oquf4auqar7agici1ruou6aoe
 Managers: 1
 Nodes: 1
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 10
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 10.2.1.70
 Manager Addresses:
  10.2.1.70:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.15.0-23-generic
Operating System: Ubuntu 18.04 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.38GiB
Name: server
ID: OK6N:6ZLQ:OJQM:AAK7:WIDP:F3HC:H4PH:5WMT:LV2Z:LCUN:HMQ3:IJE5
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.):

Physical

@markwylde

markwylde commented Jul 3, 2019

I think this is the same issue that's affecting me too, except it isn't only when deploying the same stack again: it also happens when deploying a completely separate stack that just so happens to have a service with the same name.

  • I create two completely different stacks, but have them default to the same network.
  • I exec into service one_web-one and ping web-two. I will always connect to one_web-two.
  • I exec into service one_web-two and ping web-one. I will sometimes connect to one_web-one but sometimes get two_web-one.

I would have expected that if I haven't referenced another stack, it would always default to the service inside my stack.

The images below use Alpine Linux. You can install dig by running apk update && apk add bind-tools.

Stack One:

version: "3.7"

services:
  web-one:
    image: tutum/hello-world
  web-two:
    image: tutum/hello-world

networks:
  default:
    external: true
    name: someoverlaynetwork

Stack Two:

version: "3.7"

services:
  web-one:
    image: tutum/hello-world

networks:
  default:
    external: true
    name: someoverlaynetwork
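
Roughly, the steps behind the lookup below are as follows (the stack file names and the <container-id> placeholder are just examples; use whatever docker ps reports for the one_web-two task):

docker stack deploy -c stack-one.yml one
docker stack deploy -c stack-two.yml two
docker ps --filter name=one_web-two
docker exec -it <container-id> dig web-one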

When I dig web-one from the web-two service in stack one I see:

/ # dig web-one

; <<>> DiG 9.10.4-P8 <<>> web-one
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33978
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;web-one.			IN	A

;; ANSWER SECTION:
web-one.		600	IN	A	10.0.0.50
web-one.		600	IN	A	10.0.0.55

;; Query time: 0 msec
;; SERVER: 127.0.0.11#53(127.0.0.11)
;; WHEN: Sun Jun 23 13:02:07 UTC 2019
;; MSG SIZE  rcvd: 71

10.0.0.50 is actually two_web-one
10.0.0.55 is actually one_web-one (or what I would expect web-one to resolve to)

@noonowl71 did you ever find a way around this? My thoughts are around namespacing all my services. Something like:

services:
  com.mycompany.stackname.servicename:
    image: whatever...
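
Spelled out a bit more, that idea might look like this (an untested sketch; the reverse-DNS style names are purely illustrative, and I'd still need to verify the embedded DNS handles dotted service names):

version: "3.7"

services:
  com.mycompany.one.web-one:
    image: tutum/hello-world
  com.mycompany.one.web-two:
    image: tutum/hello-world

networks:
  default:
    external: true
    name: someoverlaynetwork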

@agunnarsson

I think the problem in both your setups is that both stacks are connected to the same external network. If you define the network to be created when the stack is deployed, each stack will have its own network named stackname_networkname.

Change:

networks:
  testnet:
    external: true

To:

networks:
  testnet:
    driver: overlay
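
With that change, each deploy creates its own network, so the service names no longer collide across stacks. For example, reusing the deploy commands from the original report:

docker stack deploy -c stack.yml stack1
docker stack deploy -c stack.yml stack2
docker network ls --filter name=testnet

docker network ls should then list stack1_testnet and stack2_testnet instead of a single shared testnet.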

The question is still whether services in other stacks should be discoverable without adding the stack namespace as a prefix. I just filed a related issue which might clarify that: #40213
