
Broken communication in Kubernetes cluster #1321

Closed

wkruse opened this issue Jul 17, 2020 · 5 comments

Comments


wkruse commented Jul 17, 2020

Expected Behavior

After a node reboot, flannel should work without manual intervention.

Current Behavior

After a node reboot:

iptables:

-P INPUT ACCEPT -c 791 191464
-P FORWARD DROP -c 0 0
-P OUTPUT ACCEPT -c 810 199303
-N DOCKER
-N DOCKER-ISOLATION-STAGE-1
-N DOCKER-ISOLATION-STAGE-2
-N DOCKER-USER
-N KUBE-FIREWALL
-N KUBE-FORWARD
-N KUBE-KUBELET-CANARY
-A INPUT -c 16296 5152241 -j KUBE-FIREWALL
-A FORWARD -m comment --comment "kubernetes forwarding rules" -c 0 0 -j KUBE-FORWARD
-A FORWARD -c 0 0 -j DOCKER-USER
-A FORWARD -c 0 0 -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -c 0 0 -j ACCEPT
-A FORWARD -o docker0 -c 0 0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -c 0 0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -c 0 0 -j ACCEPT
-A OUTPUT -c 16467 5798180 -j KUBE-FIREWALL
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -c 0 0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -c 0 0 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -c 0 0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -c 0 0 -j RETURN
-A DOCKER-USER -c 0 0 -j RETURN
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -c 0 0 -j DROP
-A KUBE-FIREWALL ! -s 127.0.0.0/8 -d 127.0.0.0/8 -m comment --comment "block incoming localnet connections" -m conntrack ! --ctstate RELATED,ESTABLISHED,DNAT -c 0 0 -j DROP
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -c 0 0 -j ACCEPT
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -c 0 0 -j ACCEPT
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -c 0 0 -j ACCEPT

flannel container logs:

kubectl -n kube-system logs flannel-68knm flannel
I0717 14:18:01.607556       1 main.go:454] Searching for interface using 172.17.4.12
I0717 14:18:01.608016       1 main.go:531] Using interface with name enp0s3 and address 172.17.4.12
I0717 14:18:01.608038       1 main.go:548] Defaulting external address to interface address (172.17.4.12)
W0717 14:18:01.608059       1 client_config.go:517] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0717 14:18:01.667620       1 kube.go:119] Waiting 10m0s for node controller to sync
I0717 14:18:01.667677       1 kube.go:306] Starting kube subnet manager
I0717 14:18:02.667958       1 kube.go:126] Node controller sync successful
I0717 14:18:02.668051       1 main.go:246] Created subnet manager: Kubernetes Subnet Manager - xxx.xx
I0717 14:18:02.668060       1 main.go:249] Installing signal handlers
I0717 14:18:02.668121       1 main.go:390] Found network config - Backend type: vxlan
I0717 14:18:02.668178       1 vxlan.go:121] VXLAN config: VNI=1 Port=4789 GBP=false Learning=false DirectRouting=false
I0717 14:18:02.748994       1 main.go:355] Current network or subnet (10.2.0.0/16, 10.2.0.0/24) is not equal to previous one (0.0.0.0/0, 0.0.0.0/0), trying to recycle old iptables rules
I0717 14:18:02.793500       1 iptables.go:167] Deleting iptables rule: -s 0.0.0.0/0 -d 0.0.0.0/0 -j RETURN
I0717 14:18:02.858287       1 iptables.go:167] Deleting iptables rule: -s 0.0.0.0/0 ! -d 224.0.0.0/4 -j MASQUERADE --random-fully
I0717 14:18:02.888682       1 iptables.go:167] Deleting iptables rule: ! -s 0.0.0.0/0 -d 0.0.0.0/0 -j RETURN
I0717 14:18:02.892268       1 iptables.go:167] Deleting iptables rule: ! -s 0.0.0.0/0 -d 0.0.0.0/0 -j MASQUERADE --random-fully
I0717 14:18:02.893989       1 main.go:305] Setting up masking rules
I0717 14:18:02.897844       1 main.go:313] Changing default FORWARD chain policy to ACCEPT
I0717 14:18:02.898190       1 main.go:321] Wrote subnet file to /run/flannel/subnet.env
I0717 14:18:02.899742       1 main.go:325] Running backend.
I0717 14:18:02.903394       1 main.go:343] Waiting for all goroutines to exit
I0717 14:18:02.903460       1 vxlan_network.go:60] watching for new subnet leases
I0717 14:18:02.918124       1 iptables.go:145] Some iptables rules are missing; deleting and recreating rules
I0717 14:18:02.919143       1 iptables.go:167] Deleting iptables rule: -s 10.2.0.0/16 -d 10.2.0.0/16 -j RETURN
I0717 14:18:02.929375       1 iptables.go:167] Deleting iptables rule: -s 10.2.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE --random-fully
I0717 14:18:02.966513       1 iptables.go:167] Deleting iptables rule: ! -s 10.2.0.0/16 -d 10.2.0.0/24 -j RETURN
I0717 14:18:03.036071       1 iptables.go:167] Deleting iptables rule: ! -s 10.2.0.0/16 -d 10.2.0.0/16 -j MASQUERADE --random-fully
I0717 14:18:03.038955       1 iptables.go:145] Some iptables rules are missing; deleting and recreating rules
I0717 14:18:03.039002       1 iptables.go:167] Deleting iptables rule: -s 10.2.0.0/16 -j ACCEPT
I0717 14:18:03.042125       1 iptables.go:155] Adding iptables rule: -s 10.2.0.0/16 -d 10.2.0.0/16 -j RETURN
I0717 14:18:03.047967       1 iptables.go:167] Deleting iptables rule: -d 10.2.0.0/16 -j ACCEPT
I0717 14:18:03.083728       1 iptables.go:155] Adding iptables rule: -s 10.2.0.0/16 -j ACCEPT
I0717 14:18:03.085302       1 iptables.go:155] Adding iptables rule: -s 10.2.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE --random-fully
I0717 14:18:03.095075       1 iptables.go:155] Adding iptables rule: -d 10.2.0.0/16 -j ACCEPT
I0717 14:18:03.138615       1 iptables.go:155] Adding iptables rule: ! -s 10.2.0.0/16 -d 10.2.0.0/24 -j RETURN
I0717 14:18:03.169260       1 iptables.go:155] Adding iptables rule: ! -s 10.2.0.0/16 -d 10.2.0.0/16 -j MASQUERADE --random-fully

Communication in the Kubernetes cluster is broken and Kubernetes services are not reachable. Trying to access the Kubernetes dashboard results in

Error trying to reach service: 'dial tcp 10.2.2.2:8443: i/o timeout'
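
For reference, a quick way to confirm that cross-node pod traffic is broken (and not just the dashboard) is to probe the pod IP directly. This is only a sketch: the pod IP comes from the error above, and flannel.1 assumes VNI=1 as in the VXLAN config in the logs.

# The VXLAN device flannel created should be up on every node
ip -d link show flannel.1

# Probing the dashboard pod IP from the error above times out while the issue is present
curl -k -m 5 https://10.2.2.2:8443/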

Possible Solution

Killing the flannel container or triggering a rollout of the flannel DaemonSet fixes the issue; a sketch of the commands is below.
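
For reference, the workaround amounts to something like the following (the app=flannel pod label and the DaemonSet name are assumptions and may differ in your manifests):

# Delete the flannel pods so the DaemonSet recreates them
kubectl -n kube-system delete pods -l app=flannel

# ...or trigger a rolling restart of the DaemonSet (name assumed)
kubectl -n kube-system rollout restart daemonset/flannel

After the restart, the recovered state looks like this: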

iptables:

-P INPUT ACCEPT -c 7288 1103002
-P FORWARD DROP -c 0 0
-P OUTPUT ACCEPT -c 7381 1278666
-N DOCKER
-N DOCKER-ISOLATION-STAGE-1
-N DOCKER-ISOLATION-STAGE-2
-N DOCKER-USER
-N KUBE-FIREWALL
-N KUBE-FORWARD
-N KUBE-KUBELET-CANARY
-A INPUT -c 560996 131612110 -j KUBE-FIREWALL
-A FORWARD -m comment --comment "kubernetes forwarding rules" -c 0 0 -j KUBE-FORWARD
-A FORWARD -c 0 0 -j DOCKER-USER
-A FORWARD -c 0 0 -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -c 0 0 -j ACCEPT
-A FORWARD -o docker0 -c 0 0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -c 0 0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -c 0 0 -j ACCEPT
-A FORWARD -s 10.2.0.0/16 -c 0 0 -j ACCEPT
-A FORWARD -d 10.2.0.0/16 -c 0 0 -j ACCEPT
-A OUTPUT -c 552157 146779456 -j KUBE-FIREWALL
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -c 0 0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -c 0 0 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -c 0 0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -c 0 0 -j RETURN
-A DOCKER-USER -c 0 0 -j RETURN
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -c 0 0 -j DROP
-A KUBE-FIREWALL ! -s 127.0.0.0/8 -d 127.0.0.0/8 -m comment --comment "block incoming localnet connections" -m conntrack ! --ctstate RELATED,ESTABLISHED,DNAT -c 0 0 -j DROP
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -c 0 0 -j ACCEPT
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -c 0 0 -j ACCEPT
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -c 0 0 -j ACCEPT

flannel container logs:

kubectl -n kube-system logs flannel-68knm flannel
I0717 14:07:17.182309       1 main.go:454] Searching for interface using 172.17.4.12
I0717 14:07:17.182754       1 main.go:531] Using interface with name enp0s3 and address 172.17.4.12
I0717 14:07:17.182779       1 main.go:548] Defaulting external address to interface address (172.17.4.12)
W0717 14:07:17.182837       1 client_config.go:517] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0717 14:07:17.207268       1 kube.go:119] Waiting 10m0s for node controller to sync
I0717 14:07:17.210264       1 kube.go:306] Starting kube subnet manager
I0717 14:07:18.213835       1 kube.go:126] Node controller sync successful
I0717 14:07:18.214204       1 main.go:246] Created subnet manager: Kubernetes Subnet Manager - xxx.xx
I0717 14:07:18.214476       1 main.go:249] Installing signal handlers
I0717 14:07:18.214961       1 main.go:390] Found network config - Backend type: vxlan
I0717 14:07:18.215134       1 vxlan.go:121] VXLAN config: VNI=1 Port=4789 GBP=false Learning=false DirectRouting=false
I0717 14:07:18.237839       1 main.go:305] Setting up masking rules
I0717 14:07:18.240675       1 main.go:313] Changing default FORWARD chain policy to ACCEPT
I0717 14:07:18.241220       1 main.go:321] Wrote subnet file to /run/flannel/subnet.env
I0717 14:07:18.241456       1 main.go:325] Running backend.
I0717 14:07:18.242213       1 main.go:343] Waiting for all goroutines to exit
I0717 14:07:18.242244       1 vxlan_network.go:60] watching for new subnet leases

Context

We are using Typhoon to provision a bare-metal Fedora CoreOS cluster: https://github.com/poseidon/terraform-render-bootstrap/tree/master/resources/flannel

It feels like a race condition and seems to be related to iptables.
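
A minimal post-reboot check (a sketch; the 10.2.0.0/16 pod CIDR and the rule shapes are taken from the dumps and logs above) is to look for flannel's ACCEPT rules in the filter FORWARD chain and its MASQUERADE rules in nat POSTROUTING:

# In the broken state these come back empty and the FORWARD policy is still DROP
iptables -S FORWARD | grep '10.2.0.0/16'

# Masquerade rules flannel writes for the pod CIDR
iptables -t nat -S POSTROUTING | grep '10.2.0.0/16'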

Your Environment

  • Flannel version: v0.12.0-amd64
  • Backend used (e.g. vxlan or udp): vxlan
  • Etcd version: v3.4.9
  • Kubernetes version (if used): v1.18.5
  • Operating System and version: Fedora CoreOS 32.20200715.2.2
  • iptables version: v1.8.4 (legacy)

vlastv commented Jul 29, 2020

I get the same error after updating to 32.20200715.3.0.


masaeedu commented Aug 9, 2020

Same issue here, with quay.io/coreos/flannel:v0.12.0-amd64. I'm unable to ping the tunnel device of a node from the master and vice versa. After running kubectl delete pods --selector=app=flannel -n=kube-system and waiting for the pods to be redeployed, everything works. I spent so much time trying to debug the issue, I'm almost disappointed all it took to fix it was turning it off and then on again.


masaeedu commented Aug 9, 2020

Apparently this also comes back if you restart one of the nodes. The fix is the same each time it happens.

wkruse (Author) commented Aug 21, 2020

Adding

    - path: /etc/systemd/network/50-flannel.link
      contents:
        inline: |
          [Match]
          OriginalName=flannel*
          [Link]
          MACAddressPolicy=none

fixed it for me.

For more details see poseidon/typhoon@78e6409#diff-149624b2865445eeec93184c0504e405
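
For anyone else hitting this: the underlying problem appears to be that systemd-udev's default MACAddressPolicy=persistent reassigns the MAC of the flannel.1 VXLAN device, so it no longer matches the VTEP MAC flannel registered for the node, and VXLAN traffic from other nodes gets dropped. A rough way to check (a sketch; <node-name> is a placeholder, and the backend-data annotation is, as far as I know, where flannel's vxlan backend stores the VTEP MAC):

# MAC currently on the local VXLAN device
ip link show flannel.1

# VTEP MAC flannel announced for this node (look for "VtepMAC")
kubectl get node <node-name> -o yaml | grep backend-data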

zhangguanzhang (Contributor) commented:

Same issue on v0.22.1:

I0816 09:13:35.586652       1 main.go:482] Current network or subnet (172.27.0.0/16, 172.27.2.0/24) is not equal to previous one (0.0.0.0/0, 0.0.0.0/0), trying to recycle old iptables rules

If I restart all the flannel pods, the k8s network works fine again.
