ingress-sbox containers with the same ip addresses blocking ingress traffic #36949

Open
bloo opened this issue Apr 25, 2018 · 3 comments

bloo commented Apr 25, 2018

Over time, ingress requests on our Swarm cluster start timing out when one host node tries to route traffic to a container on another host node. We've found that the ingress-sbox containers on the ingress network on those two hosts have the same private IP address.

Steps to reproduce the issue:

  1. Run a swarm with multiple managers on a self-updating, self-rebooting OS (Container Linux)
  2. Wait
  3. Observe intermittent timeouts
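To put a number on the intermittent timeouts, a small probe can repeat a request against a published port and report the fraction of attempts that time out. This is only a sketch: `timeout_rate` and `tcp_probe` are hypothetical helpers, and the host and port passed to `tcp_probe` would be one manager's external address and the service's published port.

```python
import socket
import time


def tcp_probe(host, port, timeout=2.0):
    """Return a callable that opens and closes one TCP connection."""
    def fetch():
        with socket.create_connection((host, port), timeout=timeout):
            pass
    return fetch


def timeout_rate(fetch, attempts=30, pause=0.0):
    """Call fetch() repeatedly and return the fraction of calls that timed out.

    Only timeouts are counted; other errors (e.g. connection refused)
    propagate so they are not mistaken for the routing problem.
    """
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    timeouts = 0
    for _ in range(attempts):
        try:
            fetch()
        except (TimeoutError, socket.timeout):
            timeouts += 1
        time.sleep(pause)
    return timeouts / attempts


if __name__ == "__main__":
    # Simulated target so the example runs without a cluster:
    # every third call times out.
    state = {"calls": 0}

    def simulated_fetch():
        state["calls"] += 1
        if state["calls"] % 3 == 0:
            raise TimeoutError("simulated")

    print(f"timeout rate: {timeout_rate(simulated_fetch):.2f}")
```

Against a real cluster, `simulated_fetch` would be replaced by something like `tcp_probe(manager_address, published_port)`, run once per manager node so the rates can be compared.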

Describe the results you received:

If the service that's supposed to handle ingress traffic is in global mode, for example, and constrained to only the manager nodes (i.e. there are 3 containers spread across 3 host nodes), 1 out of 3 ingress requests to the external address of one of the manager host nodes times out.

Describe the results you expected:

Perfect routing.

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

Client:
 Version:	17.12.1-ce
 API version:	1.35
 Go version:	go1.9.4
 Git commit:	7390fc6
 Built:	Tue Feb 27 22:10:31 2018
 OS/Arch:	linux/amd64

Server:
 Engine:
  Version:	17.12.1-ce
  API version:	1.35 (minimum version 1.12)
  Go version:	go1.9.4
  Git commit:	7390fc6
  Built:	Tue Feb 27 22:10:31 2018
  OS/Arch:	linux/amd64
  Experimental:	true

Output of docker info:

nodeA

core@ip-10-255-2-125 ~ $ docker info
Containers: 14
 Running: 7
 Paused: 0
 Stopped: 7
Images: 11
Server Version: 17.12.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: z9yytapt4r8tbu48epze2z22r
 Is Manager: true
 ClusterID: 9ydjpwkzcqadjachlc42w5yz0
 Managers: 3
 Nodes: 9
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 12 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 10.255.2.125
 Manager Addresses:
  10.255.1.242:2377
  10.255.2.125:2377
  10.255.3.162:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: v0.13.2 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  Profile: default
 selinux
Kernel Version: 4.14.32-coreos
Operating System: Container Linux by CoreOS 1688.5.3 (Rhyolite)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.791GiB
Name: ip-10-255-2-125
ID: GVNR:74L4:JMGJ:UNPB:RB55:7OTB:HSGS:G3PR:YHEU:QC3T:2PSR:6O74
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
 instance.type=m4.large
 instance.region=us-east-1
 instance.role=manager
 instance.role.type=manager
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

nodeB

core@ip-10-255-1-242 ~ $ docker info
Containers: 8
 Running: 8
 Paused: 0
 Stopped: 0
Images: 8
Server Version: 17.12.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: 7vlbs4n5s3tm3b0qvld2t3exr
 Is Manager: true
 ClusterID: 9ydjpwkzcqadjachlc42w5yz0
 Managers: 3
 Nodes: 9
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 12 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 10.255.1.242
 Manager Addresses:
  10.255.1.242:2377
  10.255.2.125:2377
  10.255.3.162:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: v0.13.2 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  Profile: default
 selinux
Kernel Version: 4.14.32-coreos
Operating System: Container Linux by CoreOS 1688.5.3 (Rhyolite)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.791GiB
Name: ip-10-255-1-242
ID: KAQS:KOWT:IOII:GUTQ:BLU7:SNLK:4VLH:JRM2:PMGG:RZZM:R6YV:AS6P
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
 instance.region=us-east-1
 instance.role=manager
 instance.role.type=manager
 instance.type=m4.large
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

nodeC

core@ip-10-255-3-162 ~ $ docker info
Containers: 9
 Running: 7
 Paused: 0
 Stopped: 2
Images: 8
Server Version: 17.12.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: yrjmuu83zhqc1b95kf3s2fx8s
 Is Manager: true
 ClusterID: 9ydjpwkzcqadjachlc42w5yz0
 Managers: 3
 Nodes: 9
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 12 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 10.255.3.162
 Manager Addresses:
  10.255.1.242:2377
  10.255.2.125:2377
  10.255.3.162:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: v0.13.2 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  Profile: default
 selinux
Kernel Version: 4.14.32-coreos
Operating System: Container Linux by CoreOS 1688.5.3 (Rhyolite)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.791GiB
Name: ip-10-255-3-162
ID: JZX3:DCZT:S7W6:E43Y:4MRZ:NOTU:Y3XB:ZX7C:EZ3J:OYM7:WZIU:GCX6
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
 instance.region=us-east-1
 instance.role=manager
 instance.role.type=manager
 instance.type=m4.large
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

AWS across 3 AZs using CoreOS Container Linux AMIs and identical Launch Configurations.

This is a duplicate and simplified explanation of #36871.

@thaJeztah
Member

ping @ctelfer could you have a look if this is one of the things fixed in 18.03.x?

@bloo
Author

bloo commented Jun 25, 2018

@thaJeztah @ctelfer any luck? Our clusters have since upgraded to 18.03.1-ce and it would be nice to close out our internal issue. Thanks!

@ctelfer
Contributor

ctelfer commented Jul 3, 2018

I haven't seen a particular signature of duplicate IP addresses on the ingress networks. However, there were definitely general duplicate IP address issues fixed in the 18.03 CE release. See moby/libnetwork#2105 in particular.
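For anyone checking a cluster against that release, the `Server Version` line from `docker info` on each node can be compared with 18.03. The sketch below is a plain version-number comparison with hypothetical helper names; it does not verify that any specific fix is present in a given build.

```python
import re


def parse_docker_version(version):
    """Parse strings like '17.12.1-ce' into a tuple of ints, e.g. (17, 12, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if not match:
        raise ValueError(f"unrecognized version: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def nodes_below(versions_by_node, minimum=(18, 3, 0)):
    """Return the sorted names of nodes whose version is older than `minimum`."""
    return sorted(
        node
        for node, version in versions_by_node.items()
        if parse_docker_version(version) < minimum
    )


if __name__ == "__main__":
    # Versions taken from this thread; the node labels are illustrative.
    versions = {"before-upgrade": "17.12.1-ce", "after-upgrade": "18.03.1-ce"}
    print(nodes_below(versions))
```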
