SWARM IP overlapping during a new container creation in overlay network #41576

Open
IMMORTALxJO opened this issue Oct 21, 2020 · 6 comments

IMMORTALxJO commented Oct 21, 2020

Description

Good day!

  • 20 nodes ( 3 managers + 17 workers )
  • 115 overlay networks
  • all networks 'precreated' on all nodes via a global container, as a hack to avoid the "Network not found" issue during deployments

My Swarm cluster produces IP overlaps while allocating addresses for new containers. The worst case is when a container takes the address of an lb endpoint, which makes the container unreachable from the other containers in the network.
[Screenshot: 2020-10-21 at 11:00:11]

I've investigated the issue for a while and found that my managers have a corrupted libnetwork bitseq. The cluster thinks that 1-2 addresses per network are unallocated, but they should be marked as allocated since they are attached to lb endpoints.
[Screenshot: 2020-10-21 at 10:49:11]

What is the right and safe way to fix a corrupted bitseq? And how can I avoid corruption in the future?

Describe the results you received:
Sometimes a new container receives an IP address that has already been allocated to another swarm component (usually an overlay lb endpoint). As a result, the new container is unreachable from the other containers in the network.

Describe the results you expected:
The cluster has correct information about allocated IP addresses, so IP overlaps are not possible.

Output of docker version:

Client: Docker Engine - Community
 Version:           19.03.12
 API version:       1.40
 Go version:        go1.13.10
 Git commit:        48a66213fe
 Built:             Mon Jun 22 15:45:52 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.12
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.10
  Git commit:       48a66213fe
  Built:            Mon Jun 22 15:44:23 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Output of docker info:

Client:
 Debug Mode: false

Server:
 Containers: 190
  Running: 127
  Paused: 0
  Stopped: 63
 Images: 113
 Server Version: 19.03.12
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: m2x88kblpabwax3acwhs18o7v
  Is Manager: false
  Node Address: 10.254.11.131
  Manager Addresses:
   10.254.10.58:2377
   10.254.10.59:2377
   10.254.10.60:2377
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.19.0-0.bpo.8-amd64
 Operating System: Debian GNU/Linux 9 (stretch)
 OSType: linux
 Architecture: x86_64
 CPUs: 36
 Total Memory: 125.6GiB
 Name: swarm-worker-15
 ID: INQT:TGDK:EIRX:YOLQ:5FXU:LVGS:XQAF:MMS3:JPBB:DO7M:RNVS:3MTZ
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
  provider=generic
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):
physical (3 managers + 17 workers)

@xinfengliu
Contributor

Well, that's a fancy visualization for bitseq.

> all networks 'precreated' on all nodes via a global container, as a hack to avoid the "Network not found" issue during deployments

Do you mean you have created a swarm global service attached to all the pre-created networks, to avoid IP duplication due to the node LB not being cleaned up in some corner cases?

I have an enhanced swarmctl in moby/swarmkit#2977. swarmctl node inspect <node> can output the node attachments, which contain the node LB IP allocated by swarmkit. If it is different from the real node LB IP observed by running docker network inspect on the node, that indicates the problem.
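
For example, roughly like this (a sketch only: the exact output format depends on the #2977 build, and the control-socket path of the engine's embedded swarmkit as well as the lb-<network> entry name are assumptions on my side):

# On a manager, with swarmctl built from moby/swarmkit#2977:
swarmctl --socket /var/run/docker/swarm/control.sock node inspect <node>
# note the IP listed in the attachment for <network>

# On <node> itself, the real node LB IP shows up in the Containers section
# of the inspect output as the lb-<network> entry:
docker network inspect <network>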

@IMMORTALxJO
Author

> Well, that's a fancy visualization for bitseq.

Yeah, I wrote it to monitor changes in the bitseq without pain. It's a bash script which prints the state based on the last log message: https://gist.github.com/IMMORTALxJO/22784991ad3011f6ac0fc0eb687faeb0

> Do you mean you have created a swarm global service attached to all the pre-created networks, to avoid IP duplication due to the node LB not being cleaned up in some corner cases?

Yes, I have created swarm global services with all networks attached, as an easy workaround to avoid the Network not found issue during deployments (which I hit periodically) and to make the IP overlap mentioned in #40989 impossible.
With these containers I'm sure that each network has at least one container on each node.
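
Roughly like this, one service per network (a minimal sketch, not my exact setup; the service name, the alpine image and the sleep command are arbitrary placeholders, and the overlay network must already exist):

docker service create \
  --name placeholder-test-app_default \
  --mode global \
  --network test-app_default \
  --update-order start-first \
  alpine sleep 2147483647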

> I have an enhanced swarmctl in moby/swarmkit#2977

Nice improvements! By the way, I've already checked the node attachments in raft via swarm-rafttool dump-object node and realized that some IP addresses have not been committed to storage and are missing.
I had no success finding errors or warnings related to the missing addresses in the managers' logs, so I don't know the root cause yet. As a fast workaround I ran a script which kept recreating the lb-endpoints of the corrupted worker+network pairs until their allocated IPs were committed to raft. This dirty hack fixed the issue for all networks in my case, but I have no idea why an lb-endpoint IP sometimes ends up uncommitted to the datastore.
It looks quite similar to #40989, but in my case the overlay networks always have an attached container on every node (with a start-first update policy).
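
The detection half of that script looked roughly like this (a sketch, not the actual script; it assumes the node LB appears in docker network inspect as an lb-<network> entry, and that you have a swarm-rafttool dump-object node dump from a manager to grep the printed IPs against):

# On a worker: print "<network> <node LB IP>" for every overlay network
for net in $(docker network ls --filter driver=overlay --format '{{.Name}}'); do
  docker network inspect \
    -f '{{range $k, $v := .Containers}}{{$k}} {{$v.IPv4Address}}{{"\n"}}{{end}}' \
    "$net" 2>/dev/null | awk -v n="$net" '/^lb-/ {print n, $2}'
done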

@xinfengliu
Contributor

@IMMORTALxJO
Thanks much for sharing the information.

> Nice improvements! By the way, I've already checked the node attachments in raft via swarm-rafttool dump-object node and realized that some IP addresses have not been committed to storage and are missing.

Do you mean the node LB IP in the node attachments is shown as a released (unallocated) IP in the libnetwork bitseq?

@IMMORTALxJO
Author

> Do you mean the node LB IP in the node attachments is shown as a released (unallocated) IP in the libnetwork bitseq?

Yes, that's what I meant. For example:

  • we have a network test-app_default with CIDR 10.0.28.0/24
  • IP address 10.0.28.170 is attached to the lb-endpoint on the swarm-worker-03 node
  • the IP address cannot be found in the output of swarm-rafttool dump-object node on any manager node. Even though the lb-endpoint for test-app_default on swarm-worker-03 has been created and is online, swarm-worker-03 has no attachments for the test-app_default network at all (see the commands below)
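
Something like the following reproduces the check (a sketch, not my exact commands; the --state-dir flag and running swarm-rafttool against a copy of /var/lib/docker/swarm while docker is stopped are assumptions, see swarm-rafttool --help):

# On each manager: dump the node objects from a copy of the swarm state and search for the IP
swarm-rafttool dump-object --state-dir /tmp/swarm-state-copy node | grep '10.0.28.170'
# -> no matches on any of the three managers

# On swarm-worker-03: the address is in use by the node LB (lb-test-app_default)
docker network inspect test-app_default | grep '10.0.28.170'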

@tartemov

tartemov commented Nov 5, 2020

@IMMORTALxJO

> As a fast workaround I ran a script which kept recreating the lb-endpoints of the corrupted worker+network pairs until their allocated IPs were committed to raft. This dirty hack fixed the issue for all networks in my case, but I have no idea why an lb-endpoint IP sometimes ends up uncommitted to the datastore.

I'm hitting a similar issue with overlapping IPs between a service and an LB. Can you share your script that fixes and recreates the LB-endpoints?

@xinfengliu
Contributor

@IMMORTALxJO

> • we have a network test-app_default with CIDR 10.0.28.0/24
> • IP address 10.0.28.170 is attached to the lb-endpoint on the swarm-worker-03 node
> • the IP address cannot be found in the output of swarm-rafttool dump-object node on any manager node. Even though the lb-endpoint for test-app_default on swarm-worker-03 has been created and is online, swarm-worker-03 has no attachments for the test-app_default network at all

In your case, I guess if you run docker network inspect test-app_default on swarm-worker-03, the only container in the output will be the node LB lb-test-app_default. If 10.0.28.170 is shown as an unallocated IP in libnetwork bitseq, that is expected (correct) because from swarmkit's view this IP is not allocated.
