ingress-sbox containers with the same ip addresses blocking ingress traffic #36949

bloo · 2018-04-25T18:27:52Z

Over time, ingress requests on our Swarm cluster start timing out when one host node tries to route traffic to a container on another host node. We've found that the ingress-sbox containers on the ingress network on those two hosts have the same private IP address.
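One way to check for this condition is to collect the output of `docker network inspect ingress` from each host and compare the ingress-sbox addresses. The following is only a minimal sketch: the helper names `ingress_sbox_ip` and `find_duplicate_ips` are hypothetical, and it assumes the inspect JSON lists the sandbox under a `Containers` entry keyed `ingress-sbox` with an `IPv4Address` field, which may differ between Docker versions.

```python
import json
from collections import defaultdict


def ingress_sbox_ip(inspect_json):
    """Return the ingress-sbox IPv4 address (prefix length stripped) from the
    JSON text of `docker network inspect ingress`, or None if it is absent.

    Assumption: the sandbox appears under Containers["ingress-sbox"].
    """
    for network in json.loads(inspect_json):
        containers = network.get("Containers") or {}
        entry = containers.get("ingress-sbox")
        if entry and entry.get("IPv4Address"):
            return entry["IPv4Address"].split("/")[0]
    return None


def find_duplicate_ips(inspect_by_node):
    """Map each ingress-sbox IP used by more than one node to those nodes.

    `inspect_by_node` maps a node name (illustrative) to the inspect JSON
    text collected on that node.
    """
    nodes_by_ip = defaultdict(list)
    for node, text in inspect_by_node.items():
        ip = ingress_sbox_ip(text)
        if ip is not None:
            nodes_by_ip[ip].append(node)
    return {ip: sorted(nodes) for ip, nodes in nodes_by_ip.items() if len(nodes) > 1}
```

An empty result means no two hosts reported the same ingress-sbox address; any entry in the result names the hosts that collide.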

Steps to reproduce the issue:

1. Run a swarm with multiple managers on a self-updating, self-rebooting OS (Container Linux)
2. Wait
3. Observe intermittent timeouts

Describe the results you received:

If the container that's supposed to handle ingress traffic is deployed in global mode, for example, and constrained to only the manager nodes (i.e., there are 3 containers spread across 3 host nodes), 1 out of 3 ingress requests to the external address of one of the manager host nodes times out.
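The 1-in-3 ratio is what you would expect if the receiving node spreads requests evenly over the three containers and exactly one of them sits on the host with the colliding ingress-sbox address. The sketch below illustrates that arithmetic only. It assumes round-robin distribution, which this report does not confirm, and the function name `timeout_fraction` and the node labels are made up for the example.

```python
from itertools import cycle


def timeout_fraction(backends, unreachable, requests):
    """Fraction of requests that land on an unreachable backend when the
    requests are handed out round-robin (an assumption, not confirmed here)."""
    rotation = cycle(backends)
    timed_out = sum(1 for _ in range(requests) if next(rotation) in unreachable)
    return timed_out / requests


# Three containers, one on the host whose ingress-sbox IP is duplicated.
fraction = timeout_fraction(["nodeA", "nodeB", "nodeC"], {"nodeB"}, 300)
```

With three backends and one unreachable, `fraction` comes out to one third, matching the observed failure rate.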

Describe the results you expected:

Perfect routing.

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

Client:
 Version:	17.12.1-ce
 API version:	1.35
 Go version:	go1.9.4
 Git commit:	7390fc6
 Built:	Tue Feb 27 22:10:31 2018
 OS/Arch:	linux/amd64

Server:
 Engine:
  Version:	17.12.1-ce
  API version:	1.35 (minimum version 1.12)
  Go version:	go1.9.4
  Git commit:	7390fc6
  Built:	Tue Feb 27 22:10:31 2018
  OS/Arch:	linux/amd64
  Experimental:	true

Output of docker info:

nodeA

core@ip-10-255-2-125 ~ $ docker info
Containers: 14
 Running: 7
 Paused: 0
 Stopped: 7
Images: 11
Server Version: 17.12.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: z9yytapt4r8tbu48epze2z22r
 Is Manager: true
 ClusterID: 9ydjpwkzcqadjachlc42w5yz0
 Managers: 3
 Nodes: 9
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 12 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 10.255.2.125
 Manager Addresses:
  10.255.1.242:2377
  10.255.2.125:2377
  10.255.3.162:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: v0.13.2 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  Profile: default
 selinux
Kernel Version: 4.14.32-coreos
Operating System: Container Linux by CoreOS 1688.5.3 (Rhyolite)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.791GiB
Name: ip-10-255-2-125
ID: GVNR:74L4:JMGJ:UNPB:RB55:7OTB:HSGS:G3PR:YHEU:QC3T:2PSR:6O74
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
 instance.type=m4.large
 instance.region=us-east-1
 instance.role=manager
 instance.role.type=manager
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

nodeB

core@ip-10-255-1-242 ~ $ docker info
Containers: 8
 Running: 8
 Paused: 0
 Stopped: 0
Images: 8
Server Version: 17.12.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: 7vlbs4n5s3tm3b0qvld2t3exr
 Is Manager: true
 ClusterID: 9ydjpwkzcqadjachlc42w5yz0
 Managers: 3
 Nodes: 9
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 12 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 10.255.1.242
 Manager Addresses:
  10.255.1.242:2377
  10.255.2.125:2377
  10.255.3.162:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: v0.13.2 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  Profile: default
 selinux
Kernel Version: 4.14.32-coreos
Operating System: Container Linux by CoreOS 1688.5.3 (Rhyolite)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.791GiB
Name: ip-10-255-1-242
ID: KAQS:KOWT:IOII:GUTQ:BLU7:SNLK:4VLH:JRM2:PMGG:RZZM:R6YV:AS6P
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
 instance.region=us-east-1
 instance.role=manager
 instance.role.type=manager
 instance.type=m4.large
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

nodeC

core@ip-10-255-3-162 ~ $ docker info
Containers: 9
 Running: 7
 Paused: 0
 Stopped: 2
Images: 8
Server Version: 17.12.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: yrjmuu83zhqc1b95kf3s2fx8s
 Is Manager: true
 ClusterID: 9ydjpwkzcqadjachlc42w5yz0
 Managers: 3
 Nodes: 9
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 12 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 10.255.3.162
 Manager Addresses:
  10.255.1.242:2377
  10.255.2.125:2377
  10.255.3.162:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: v0.13.2 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  Profile: default
 selinux
Kernel Version: 4.14.32-coreos
Operating System: Container Linux by CoreOS 1688.5.3 (Rhyolite)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.791GiB
Name: ip-10-255-3-162
ID: JZX3:DCZT:S7W6:E43Y:4MRZ:NOTU:Y3XB:ZX7C:EZ3J:OYM7:WZIU:GCX6
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
 instance.region=us-east-1
 instance.role=manager
 instance.role.type=manager
 instance.type=m4.large
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

AWS across 3 AZs using CoreOS Container Linux AMIs and identical Launch Configurations.

This is a duplicate and simplified explanation of #36871.


thaJeztah · 2018-05-14T22:39:17Z

ping @ctelfer could you have a look if this is one of the things fixed in 18.03.x?

bloo · 2018-06-25T13:56:20Z

@thaJeztah @ctelfer any luck? Our clusters have since upgraded to 18.03.1-ce and it would be nice to close out our internal issue. Thanks!

ctelfer · 2018-07-03T19:54:50Z

I haven't seen a particular signature of duplicate IP addresses on the ingress networks. However, there were definitely general duplicate IP address issues fixed in the 18.03 CE release. See moby/libnetwork#2105 in particular.

thaJeztah added the area/networking label May 14, 2018
