
NodePort accessible only from host where POD is running #89632

Closed
cfabio opened this issue Mar 29, 2020 · 34 comments

Labels
kind/bug Categorizes issue or PR as related to a bug. sig/network Categorizes an issue or PR as relevant to SIG Network. triage/duplicate Indicates an issue is a duplicate of other open issue.

Comments

cfabio commented Mar 29, 2020

What happened:
I have a Kubernetes cluster with 1 master and 2 slave nodes; the node IP addresses are 192.168.122.110, .111, and .112/24.
Deployment and Service defined as follows:

apiVersion: v1
kind: Service
metadata:
  name: test
  labels:
    app: test
spec:
  type: NodePort
  ports:
    - port: 1883
      name: eclipse-mosquitto
      protocol: TCP
      targetPort: 1883
      nodePort: 31883
  selector:
    app: test

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
        - name: eclipse-mosquitto
          image: eclipse-mosquitto
          ports:
          - containerPort: 1883

What you expected to happen:
I would expect to be able to reach the service exposed on port 31883 from every node IP of the cluster; this worked fine in Kubernetes 1.15 and 1.16.
Since Kubernetes 1.17.0 this is no longer the case: the service exposed on port 31883 is only accessible on the IP address of the physical node where the pod is actually running.

How to reproduce it (as minimally and precisely as possible):
The Kubernetes cluster is created with the kubeadm init --pod-network-cidr=10.244.0.0/16 command. The network plugin is flannel, installed from the YAML definition file provided here: https://github.com/coreos/flannel#deploying-flannel-manually
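
Condensed, the reproduction looks roughly like this (a sketch; kube-flannel.yml and test.yaml are assumed filenames for the flannel manifest linked above and the Service/Deployment shown above):

# on the master, create the cluster and install flannel
kubeadm init --pod-network-cidr=10.244.0.0/16
kubectl apply -f kube-flannel.yml

# deploy the test Service and Deployment
kubectl apply -f test.yaml

# from a machine outside the cluster, probe the NodePort on every node
nmap 192.168.122.110,111,112 -p 31883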

Anything else we need to know?:
I tried going back to Kubernetes versions 1.15.4 and 1.16.8, and the same service/deployment definition magically starts to work again.
With Kubernetes 1.18.0, running nmap against all 3 nodes gives this output:

$ nmap 192.168.122.110,111,112 -p 31883
Starting Nmap 7.80 ( https://nmap.org ) at 2020-03-29 17:01 CEST
Nmap scan report for 192.168.122.110
Host is up (0.00017s latency).

PORT      STATE    SERVICE
31883/tcp filtered unknown

Nmap scan report for 192.168.122.111
Host is up (0.00036s latency).

PORT      STATE SERVICE
31883/tcp open  unknown

Nmap scan report for 192.168.122.112
Host is up (0.00021s latency).

PORT      STATE    SERVICE
31883/tcp filtered unknown

Nmap done: 3 IP addresses (3 hosts up) scanned in 0.23 second

I am not really sure if this is actually a bug in Kubernetes, flannel or some kind of incompatibility between some other component of the system.

Environment:

  • Kubernetes version (use kubectl version): 1.18.0 (same issue also on version 1.17.4)
  • Cloud provider or hardware configuration: on-premise virtual machines
  • OS (e.g: cat /etc/os-release): CentOS Linux release 7.7.1908
  • Kernel (e.g. uname -a): Linux centos 3.10.0-1062.18.1.el7.x86_64 #1 SMP Tue Mar 17 23:49:17 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: kubeadm
  • Network plugin and version (if this is a network-related bug): flannel
  • Others:
@cfabio cfabio added the kind/bug Categorizes issue or PR as related to a bug. label Mar 29, 2020
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Mar 29, 2020
neolit123 (Member):

/sig network

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 29, 2020
uablrek (Contributor) commented Mar 31, 2020

Proxy-mode ipvs or iptables?
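
One way to check on a kubeadm cluster (a sketch; the pod name is a placeholder, and the "Using ... Proxier" log line is an assumption about kube-proxy's startup output):

# what the ConfigMap says (an empty mode defaults to iptables)
kubectl -n kube-system get configmap kube-proxy -o yaml | grep "mode:"

# what a running kube-proxy pod actually chose
kubectl -n kube-system logs kube-proxy-xxxxx | grep -i proxier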

uablrek (Contributor) commented Mar 31, 2020

It seems I can reproduce this problem, but not consistently: I can reach the NodePort on some nodes, not only the one where the pod is executing, while on other nodes it doesn't work.

I never get this problem with proxy-mode=ipvs, only with iptables.

@cfabio If possible try with proxy-mode=ipvs.

/assign

uablrek (Contributor) commented Mar 31, 2020

Weirdest finding today? Well, here it is:

On nodes where no pod has ever executed, forwarding to the NodePort doesn't work (?!)

Accessing the NodePort from within a node where no pod has executed works, but traffic from an external source to the NodePort via such a node does not.

Tested on K8s v1.16.7, v1.17.3, v1.18.0.

So it is not bound to K8s > v1.16.x

@cfabio Could it be that when you tested on k8s < v1.17, pods had already been executing on the nodes?

Story

I accidentally loaded a deployment with many replicas for a test and thought I would just scale it down to 1, but then ... it worked! Access to the NodePort worked via all nodes. I then loaded an alpine daemonset completely unrelated to the test app, and access to the NodePort worked via all nodes. Finally I added the alpine daemonset, removed it again, waited until no alpine pods were running (and 5 extra seconds after that), then loaded the test app, and it still worked fine!

The alpine manifest;

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: alpine-daemonset
spec:
  selector:
    matchLabels:
      app: alpine
  template:
    metadata:
      labels:
        app: alpine
    spec:
      containers:
      - name: alpine
        image: library/alpine:latest
        command: ["tail", "-f", "/dev/null"]

@cfabio When you have the problem can you load this manifest and see if it helps?
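
The corresponding commands would be something like this (a sketch; alpine-daemonset.yaml is an assumed filename):

kubectl apply -f alpine-daemonset.yaml
# wait until one alpine pod is running on every node
kubectl get pods -o wide -l app=alpine
# then probe the NodePort again from outside the cluster
nmap 192.168.122.110,111,112 -p 31883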

By coincidence (I think) the header for this issue is very accurate 😄

uablrek (Contributor) commented Mar 31, 2020

My first idea was that some kernel module was not loaded, but the lsmod output was identical.

cfabio (Author) commented Mar 31, 2020

I am currently playing with IPVS to see if it makes any difference.

@cfabio Can it be that when you tested on k8s < v1.17 pods had been executing on the nodes?

Before rolling back to v1.15.* and v1.16.* I killed the cluster (drain, delete nodes, remove-etcd-member, cleaned up iptables, etc) and started again from scratch.

I'll run your Alpine daemonset and let you know about that.

cfabio (Author) commented Mar 31, 2020

@uablrek here is what I did:

  • configured a cluster with kube-proxy in iptables mode and ran my usual MQTT broker pod; as expected, the NodePort was open only on the host where the pod is running.
    Your Alpine DaemonSet runs just fine, but the issue with the MQTT broker is still there.
  • killed the cluster and rebuilt it with kube-proxy in ipvs mode; even in this case the NodePort is accessible only on the IP of the host where the pod is running.
    The kube-proxy logs are full of this error though:
E0331 13:11:06.676253       1 proxier.go:1950] Failed to list IPVS destinations, error: parseIP Error ip=[10 244 2 90 0 0 0 0 0 0 0 0 0 0 0 0]
E0331 13:11:06.676265       1 proxier.go:1533] Failed to sync endpoint for service: 192.168.122.112:31883/TCP, err: parseIP Error ip=[10 244 2 90 0 0 0 0 0 0 0 0 0 0 0 0]

This might have to do with me doing something wrong when changing KubeProxy config from iptables to ipvs though.

uablrek (Contributor) commented Mar 31, 2020

The error printouts are caused by this problem:
#89520

It is not present in k8s v1.17.x.

uablrek (Contributor) commented Mar 31, 2020

An advantage of ipvs is that it is easier to troubleshoot. When you use proxy-mode=ipvs, run:

ipvsadm -Ln

on nodes where forwarding doesn't work.

uablrek (Contributor) commented Mar 31, 2020

This is how it looks on my test system. My test server uses port 5001 instead of 1883; otherwise I took your manifests more or less as-is:

vm-002 ~ # ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.0.2:31883 rr
  -> 11.0.3.2:5001                Masq    1      0          0         
TCP  192.168.1.2:31883 rr
  -> 11.0.3.2:5001                Masq    1      0          1         
TCP  12.0.0.1:443 rr
  -> 192.168.1.1:6443             Masq    1      0          0         
TCP  12.0.144.99:1883 rr
  -> 11.0.3.2:5001                Masq    1      0          0         
TCP  12.0.162.229:443 rr
  -> 11.0.2.2:4443                Masq    1      0          0         
TCP  127.0.0.1:31883 rr
  -> 11.0.3.2:5001                Masq    1      0          0         

cfabio (Author) commented Mar 31, 2020

Here is the output of the ipvsadm -Ln command executed on a node where the NodePort is filtered:

Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  127.0.0.1:31883 rr
  -> 10.244.2.3:1883              Masq    1      0          0         
TCP  172.17.0.1:31883 rr
  -> 10.244.2.3:1883              Masq    1      0          0         
TCP  172.18.0.1:31883 rr
  -> 10.244.2.3:1883              Masq    1      0          0         
TCP  172.19.0.1:31883 rr
  -> 10.244.2.3:1883              Masq    1      0          0         
TCP  192.168.122.111:31883 rr
  -> 10.244.2.3:1883              Masq    1      0          0         
TCP  10.96.0.1:443 rr
  -> 192.168.122.110:6443         Masq    1      1          0         
TCP  10.96.0.10:53 rr
  -> 10.244.0.2:53                Masq    1      0          0         
  -> 10.244.0.3:53                Masq    1      0          0         
TCP  10.96.0.10:9153 rr
  -> 10.244.0.2:9153              Masq    1      0          0         
  -> 10.244.0.3:9153              Masq    1      0          0         
TCP  10.107.158.203:1883 rr
  -> 10.244.2.3:1883              Masq    1      0          0         
TCP  10.244.1.0:31883 rr
  -> 10.244.2.3:1883              Masq    1      0          0         
UDP  10.96.0.10:53 rr
  -> 10.244.0.2:53                Masq    1      0          0         
  -> 10.244.0.3:53                Masq    1      0          0         

I am using CentOS 7, which has a very old 3.10.* kernel; that also seems to be the case for bug #89520.

uablrek (Contributor) commented Mar 31, 2020

Looks good actually. The entry in ipvs is there with 2 inactive connections. Your problem seems to be different from the one I found 😞

cfabio (Author) commented Mar 31, 2020

The way I have set up kube-proxy to work with IPVS is the following:

  • kubectl edit configmap kube-proxy -n kube-system and changed mode: "" to mode: ipvs.
  • kubectl delete po -n kube-system kube-proxy-*** and deleted every kube-proxy pod.
  • kubectl logs kube-proxy-*** | grep -i "using ipvs" checked that the freshly started kube-proxy pods actually use ipvs.
  • rebooted the machines of the cluster just to be safe.

Does this procedure make sense to you?

uablrek (Contributor) commented Mar 31, 2020

Yes, looks OK, but I might have missed something. With kubeadm, use a config file with the --config param and add at the end:

---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
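
That is, append the block above to the kubeadm configuration file and pass it at init time (a sketch; kubeadm-config.yaml is an assumed filename):

# generate a starting point, append the KubeProxyConfiguration block above, then init
kubeadm config print init-defaults > kubeadm-config.yaml
kubeadm init --config kubeadm-config.yaml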

cfabio (Author) commented Mar 31, 2020

@uablrek manually changing the kube-proxy configuration by editing the configmap, or rebuilding the cluster from scratch using kubeadm --config, doesn't seem to make any difference; in both cases the kube-proxy pod logs are spammed with the error:

E0331 14:38:40.106113       1 proxier.go:1950] Failed to list IPVS destinations, error: parseIP Error ip=[10 244 2 2 0 0 0 0 0 0 0 0 0 0 0 0]
E0331 14:38:40.106142       1 proxier.go:1533] Failed to sync endpoint for service: 172.18.0.1:31883/TCP, err: parseIP Error ip=[10 244 2 2 0 0 0 0 0 0 0 0 0 0 0 0]

This is the yml file used to build the cluster with kubeadm --config:

apiServer:
  extraArgs:
    authorization-mode: Node,RBAC
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.18.0
networking:
  dnsDomain: cluster.local
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
scheduler: {}
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs

At this point the only thing I can think of is trying an OS with a more up-to-date Linux kernel, or staying with Kubernetes v1.16.8 until these issues are ironed out.

uablrek (Contributor) commented Mar 31, 2020

The "parseIP Error" is in k8s v1.18.0 only. It should work for v1.17.3 for instance.

uablrek (Contributor) commented Apr 1, 2020

@cfabio Since the problem I had when reproducing this issue does not seem to be the same as yours, I can't take this much further. Since the problem does not appear in other clusters, it must be something in your environment. I don't have time to duplicate your setup, but I can give some advice for troubleshooting.

At least the ipvs setup seems OK, so the next step may be some traffic monitoring with tcpdump. On a node where forwarding does not work, it is interesting to see whether the request is forwarded. The port should be NAT'ed, so trace with:

tcpdump -ni <outgoing-if> port 1883

Where the "outgoing-if" is the interface your CNI-plugin (flannel) uses to forward packets to other nodes. Compare with a trace where it works.

If traffic never leaves the receiving node the problem is there. However there are multiple steps that can fail. Packets may reach the POD but return traffic may fail, etc
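
With flannel's vxlan backend the outgoing interface is usually flannel.1 (an assumption about this setup), so the comparison could look like:

# node where forwarding fails: is the DNAT'ed request sent towards the pod's node?
tcpdump -ni flannel.1 port 1883

# node where the pod runs: does that request ever arrive?
tcpdump -ni any port 1883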

cfabio (Author) commented Apr 1, 2020

@uablrek thanks for the suggestion.
If I may ask, what operating system did you use for your tests?
The issue I am facing with NodePort (iptables and ipvs) might be related to CentOS 7.

uablrek (Contributor) commented Apr 1, 2020

I use an image I built myself, bare-minimum and BusyBox based. It removes interference from other things and gives control over tool versions; e.g. my kernel is linux-5.6. By the way, I tried to go back down to linux-3.10, but then cri-o (my CRI plugin, which replaces docker) stopped working because some fs-overlay feature was missing, so I dropped that track.

uablrek (Contributor) commented Apr 1, 2020

But the 3.10 kernel in CentOS 7 is very old. So IMHO it is worth upgrading just to rule out kernel-version problems at least.

cfabio (Author) commented Apr 2, 2020

I set up a brand new cluster of 3 CentOS 8 nodes (1 master, 2 slaves) and tried every combination of Kubernetes versions 1.18.0, 1.17.4, and 1.17.3 with ipvs and iptables; the situation is exactly the same as on CentOS 7: same issues, same errors.
Network configuration is:

  • master 192.168.122.120
  • slave1 192.168.122.121
  • slave2 192.168.122.122

The chosen backend is ipvs.
I configured a pod with the Eclipse Mosquitto MQTT broker as described in the OP; it gets executed on node slave1 and gets IP address 10.244.1.2.
The physical nodes (master and slave2) can ping 10.244.1.2, and nmap 10.244.1.2 -p 1883 confirms that the MQTT broker is listening on port 1883.
When checking port 31883 from outside the cluster with nmap, port 31883 is reported open only on the physical node where the pod is executed (slave1).
I did what you suggested and ran tcpdump on a physical node (slave2) where port 31883 is reported closed; the output is the following:

# tcpdump -ni flannel.1 port 1883
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on flannel.1, link-type EN10MB (Ethernet), capture size 262144 bytes
11:35:37.955774 IP 10.244.2.0.58212 > 10.244.1.2.mqtt: Flags [S], seq 4005778623, win 64240, options [mss 1460,sackOK,TS val 2731514499 ecr 0,nop,wscale 7], length 0

which seems correct, as traffic is being routed to the IP address and port of the MQTT broker pod (10.244.1.2.mqtt).
What is interesting is that, if I open a shell into the MQTT broker pod and run tcpdump inside it, nothing shows up, not a single packet.
Same result if I run tcpdump -ni any port 1883 on the physical host where the pod is in execution (slave1).
slave1 ipvs rules:

# ipvsadm -L
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  centos8-k8s-slave1:31883 rr
  -> 10.244.1.2:mqtt              Masq    1      0          0         
TCP  centos8-k8s-slave1:https rr
  -> 192.168.122.120:sun-sr-https Masq    1      1          0         
TCP  centos8-k8s-slave1:domain rr
  -> 10.244.0.2:domain            Masq    1      0          0         
  -> 10.244.0.3:domain            Masq    1      0          0         
TCP  centos8-k8s-slave1:9153 rr
  -> 10.244.0.2:9153              Masq    1      0          0         
  -> 10.244.0.3:9153              Masq    1      0          0         
TCP  centos8-k8s-slave1:mqtt rr
  -> 10.244.1.2:mqtt              Masq    1      0          0         
TCP  centos8-k8s-slave1:31883 rr
  -> 10.244.1.2:mqtt              Masq    1      0          0         
TCP  centos8-k8s-slave1:31883 rr
  -> 10.244.1.2:mqtt              Masq    1      0          0         
TCP  localhost:31883 rr
  -> 10.244.1.2:mqtt              Masq    1      0          0         
UDP  centos8-k8s-slave1:domain rr
  -> 10.244.0.2:domain            Masq    1      0          0         
  -> 10.244.0.3:domain            Masq    1      0          0         

slave2 ipvs rules:

# ipvsadm -L
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  centos8-k8s-slave2:31883 rr
  -> 10.244.1.2:mqtt              Masq    1      2          0         
TCP  centos8-k8s-slave2:https rr
  -> 192.168.122.120:sun-sr-https Masq    1      1          0         
TCP  centos8-k8s-slave2:domain rr
  -> 10.244.0.2:domain            Masq    1      0          0         
  -> 10.244.0.3:domain            Masq    1      0          0         
TCP  centos8-k8s-slave2:9153 rr
  -> 10.244.0.2:9153              Masq    1      0          0         
  -> 10.244.0.3:9153              Masq    1      0          0         
TCP  centos8-k8s-slave2:mqtt rr
  -> 10.244.1.2:mqtt              Masq    1      0          0         
TCP  centos8-k8s-slave2:31883 rr
  -> 10.244.1.2:mqtt              Masq    1      0          0         
TCP  localhost:31883 rr
  -> 10.244.1.2:mqtt              Masq    1      0          0         
UDP  centos8-k8s-slave2:domain rr
  -> 10.244.0.2:domain            Masq    1      0          0         
  -> 10.244.0.3:domain            Masq    1      0          0 
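
Since the SYN leaves flannel.1 on slave2 but never shows up on slave1, one more check (a sketch; flannel's default vxlan UDP port 8472 and eth0 as the node's physical interface are assumptions) is to capture the encapsulated packets on the sending node's physical NIC and look for checksum problems:

# on slave2, watch the vxlan-encapsulated traffic leaving the node
tcpdump -ni eth0 -vv udp port 8472
# "bad udp cksum" on the outer packets would point at a checksum/TX-offload problem on the vxlan device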

thockin (Member) commented Apr 2, 2020

Both IPVS and iptables modes use iptables for NodePort forwarding, I believe. But there are no other reports of this happening, so I don't buy that it is broken IN GENERAL.

I am (obviously) more inclined to blame flannel, but before I do that maybe we can rule out the iptables parts.

Can you provide a full iptables-save (feel free to redact IPs if you need, just replace them with unique identifiers :) from a working node and a failing node?

@thockin thockin added the triage/unresolved Indicates an issue that can not or will not be resolved. label Apr 2, 2020
@thockin thockin self-assigned this Apr 2, 2020
rikatz (Contributor) commented Apr 3, 2020

@cfabio please take a look into #88986

So far, if this is the same case as I'm seeing (CentOS 7.7, kernel 3.10.0-1062.*, flannel + vxlan), you should disable TX offloading at the source of the communication and it will work.

I've been looking at ClusterIP cases, but I can confirm that this also happens with NodePort (I ran a test with your case).

@thockin do you mind if I take this issue and aggregate it into #88986?

whites11 commented Apr 4, 2020

I think I'm facing this same issue on two different clusters.
Not using flannel (we use AWS CNI and Azure CNI).
kube-proxy in iptables mode.

Playing around with tcpdump, it seems that the TCP handshake is not answered on a broken node:

# tcpdump -i any "port 80 or port 30010"
06:52:32.994186 IP worker2.58138 > worker1.30010: Flags [S], seq 3335691952, win 64240, options [mss 1418,nop,nop,sackOK,nop,wscale 7], length 0
06:52:34.056953 IP worker2.58138 > worker1.30010: Flags [S], seq 3335691952, win 64240, options [mss 1418,nop,nop,sackOK,nop,wscale 7], length 0

worker1 is a node without the pod running.
worker2 is another node in the cluster.
30010 is my nodeport
80 is the containerPort.

The TCP SYN packet arrives at worker1 but never gets an answer, nor does it get sent elsewhere, AFAIU.

I also tried to tcpdump on all interfaces on the node where the only instance of the pod is running, and I get zero packets on either port (30010 or 80).
I compared the iptables-save output from a node that has the pod and one that does not.
On the broken node, I see these rules:

-A KUBE-NODEPORTS -p tcp -m comment --comment "kube-system/nginx-ingress-controller:http" -m tcp --dport 30010 -j KUBE-XLB-5URCD7LMTHSEGXBZ
-A KUBE-XLB-5URCD7LMTHSEGXBZ -m comment --comment "kube-system/nginx-ingress-controller:http has no local endpoints" -j KUBE-MARK-DROP

While in the working one I see:

-A KUBE-NODEPORTS -p tcp -m comment --comment "kube-system/nginx-ingress-controller:http" -m tcp --dport 30010 -j KUBE-XLB-5URCD7LMTHSEGXBZ
-A KUBE-XLB-5URCD7LMTHSEGXBZ -m comment --comment "Balancing rule 0 for kube-system/nginx-ingress-controller:http" -j KUBE-SEP-WGDPEU6D5NJVWF7U
-A KUBE-SEP-WGDPEU6D5NJVWF7U -s 10.0.132.147/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-WGDPEU6D5NJVWF7U -p tcp -m tcp -j DNAT --to-destination 10.0.132.147:80

My only concern is about the KUBE-MARK-DROP from the first output. I'd expect the packet to be forwarded to one of the other nodes, not to be dropped.
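
The KUBE-XLB-* chains appear to be what kube-proxy installs for Services with externalTrafficPolicy: Local, and with that policy a node that has no local endpoint drops NodePort traffic by design instead of forwarding it to another node. A quick check (the service name and namespace are taken from the rules above):

kubectl -n kube-system get svc nginx-ingress-controller -o jsonpath='{.spec.externalTrafficPolicy}'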

whites11 commented Apr 4, 2020

For what it's worth, I can't reproduce this issue with a multi-node kind cluster on Kubernetes v1.16.3 or v1.17.0.

cfabio (Author) commented Apr 4, 2020

@thockin iptables-save from slave1 (the one where the POD is in execution):

# Generated by iptables-save v1.4.21 on Sat Apr  4 16:01:29 2020
*mangle
:PREROUTING ACCEPT [34057:20971889]
:INPUT ACCEPT [34043:20970013]
:FORWARD ACCEPT [1:60]
:OUTPUT ACCEPT [30134:2322377]
:POSTROUTING ACCEPT [29688:2292643]
:KUBE-KUBELET-CANARY - [0:0]
COMMIT
# Completed on Sat Apr  4 16:01:29 2020
# Generated by iptables-save v1.4.21 on Sat Apr  4 16:01:29 2020
*filter
:INPUT ACCEPT [182:52274]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [137:15752]
:DOCKER - [0:0]
:DOCKER-ISOLATION-STAGE-1 - [0:0]
:DOCKER-ISOLATION-STAGE-2 - [0:0]
:DOCKER-USER - [0:0]
:KUBE-FIREWALL - [0:0]
:KUBE-FORWARD - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
-A INPUT -j KUBE-FIREWALL
-A FORWARD -m comment --comment "kubernetes forwarding rules" -j KUBE-FORWARD
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A FORWARD -o br-968244e70a0e -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o br-968244e70a0e -j DOCKER
-A FORWARD -i br-968244e70a0e ! -o br-968244e70a0e -j ACCEPT
-A FORWARD -i br-968244e70a0e -o br-968244e70a0e -j ACCEPT
-A FORWARD -o docker_gwbridge -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker_gwbridge -j DOCKER
-A FORWARD -i docker_gwbridge ! -o docker_gwbridge -j ACCEPT
-A FORWARD -i docker_gwbridge -o docker_gwbridge -j DROP
-A FORWARD -s 10.244.0.0/16 -j ACCEPT
-A FORWARD -d 10.244.0.0/16 -j ACCEPT
-A OUTPUT -j KUBE-FIREWALL
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-968244e70a0e ! -o br-968244e70a0e -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i docker_gwbridge ! -o docker_gwbridge -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -o br-968244e70a0e -j DROP
-A DOCKER-ISOLATION-STAGE-2 -o docker_gwbridge -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -j ACCEPT
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
COMMIT
# Completed on Sat Apr  4 16:01:29 2020
# Generated by iptables-save v1.4.21 on Sat Apr  4 16:01:29 2020
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:DOCKER - [0:0]
:KUBE-FIREWALL - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
:KUBE-LOAD-BALANCER - [0:0]
:KUBE-MARK-DROP - [0:0]
:KUBE-MARK-MASQ - [0:0]
:KUBE-NODE-PORT - [0:0]
:KUBE-POSTROUTING - [0:0]
:KUBE-SERVICES - [0:0]
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A POSTROUTING -s 172.19.0.0/16 ! -o br-968244e70a0e -j MASQUERADE
-A POSTROUTING -s 172.18.0.0/16 ! -o docker_gwbridge -j MASQUERADE
-A POSTROUTING -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
-A POSTROUTING -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
-A POSTROUTING ! -s 10.244.0.0/16 -d 10.244.1.0/24 -j RETURN
-A POSTROUTING ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE
-A DOCKER -i docker0 -j RETURN
-A DOCKER -i br-968244e70a0e -j RETURN
-A DOCKER -i docker_gwbridge -j RETURN
-A KUBE-FIREWALL -j KUBE-MARK-DROP
-A KUBE-LOAD-BALANCER -j KUBE-MARK-MASQ
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-NODE-PORT -p tcp -m comment --comment "Kubernetes nodeport TCP port for masquerade purpose" -m set --match-set KUBE-NODE-PORT-TCP dst -j KUBE-MARK-MASQ
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE
-A KUBE-POSTROUTING -m comment --comment "Kubernetes endpoints dst ip:port, source ip for solving hairpin purpose" -m set --match-set KUBE-LOOP-BACK dst,dst,src -j MASQUERADE
-A KUBE-SERVICES ! -s 10.244.0.0/16 -m comment --comment "Kubernetes service cluster ip + port for masquerade purpose" -m set --match-set KUBE-CLUSTER-IP dst,dst -j KUBE-MARK-MASQ
-A KUBE-SERVICES -m addrtype --dst-type LOCAL -j KUBE-NODE-PORT
-A KUBE-SERVICES -m set --match-set KUBE-CLUSTER-IP dst,dst -j ACCEPT
COMMIT
# Completed on Sat Apr  4 16:01:29 2020

iptables-save from slave2:

# Generated by iptables-save v1.4.21 on Sat Apr  4 16:02:41 2020
*mangle
:PREROUTING ACCEPT [34529:21104949]
:INPUT ACCEPT [34516:21103133]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [30632:2366423]
:POSTROUTING ACCEPT [30174:2336057]
:KUBE-KUBELET-CANARY - [0:0]
COMMIT
# Completed on Sat Apr  4 16:02:41 2020
# Generated by iptables-save v1.4.21 on Sat Apr  4 16:02:41 2020
*filter
:INPUT ACCEPT [92:18439]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [63:10736]
:DOCKER - [0:0]
:DOCKER-ISOLATION-STAGE-1 - [0:0]
:DOCKER-ISOLATION-STAGE-2 - [0:0]
:DOCKER-USER - [0:0]
:KUBE-FIREWALL - [0:0]
:KUBE-FORWARD - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
-A INPUT -j KUBE-FIREWALL
-A FORWARD -m comment --comment "kubernetes forwarding rules" -j KUBE-FORWARD
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A FORWARD -o docker_gwbridge -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker_gwbridge -j DOCKER
-A FORWARD -i docker_gwbridge ! -o docker_gwbridge -j ACCEPT
-A FORWARD -o br-968244e70a0e -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o br-968244e70a0e -j DOCKER
-A FORWARD -i br-968244e70a0e ! -o br-968244e70a0e -j ACCEPT
-A FORWARD -i br-968244e70a0e -o br-968244e70a0e -j ACCEPT
-A FORWARD -i docker_gwbridge -o docker_gwbridge -j DROP
-A FORWARD -s 10.244.0.0/16 -j ACCEPT
-A FORWARD -d 10.244.0.0/16 -j ACCEPT
-A OUTPUT -j KUBE-FIREWALL
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i docker_gwbridge ! -o docker_gwbridge -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-968244e70a0e ! -o br-968244e70a0e -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -o docker_gwbridge -j DROP
-A DOCKER-ISOLATION-STAGE-2 -o br-968244e70a0e -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -j ACCEPT
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
COMMIT
# Completed on Sat Apr  4 16:02:41 2020
# Generated by iptables-save v1.4.21 on Sat Apr  4 16:02:41 2020
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:DOCKER - [0:0]
:KUBE-FIREWALL - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
:KUBE-LOAD-BALANCER - [0:0]
:KUBE-MARK-DROP - [0:0]
:KUBE-MARK-MASQ - [0:0]
:KUBE-NODE-PORT - [0:0]
:KUBE-POSTROUTING - [0:0]
:KUBE-SERVICES - [0:0]
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A POSTROUTING -s 172.18.0.0/16 ! -o docker_gwbridge -j MASQUERADE
-A POSTROUTING -s 172.19.0.0/16 ! -o br-968244e70a0e -j MASQUERADE
-A POSTROUTING -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
-A POSTROUTING -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
-A POSTROUTING ! -s 10.244.0.0/16 -d 10.244.2.0/24 -j RETURN
-A POSTROUTING ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE
-A DOCKER -i docker0 -j RETURN
-A DOCKER -i docker_gwbridge -j RETURN
-A DOCKER -i br-968244e70a0e -j RETURN
-A KUBE-FIREWALL -j KUBE-MARK-DROP
-A KUBE-LOAD-BALANCER -j KUBE-MARK-MASQ
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-NODE-PORT -p tcp -m comment --comment "Kubernetes nodeport TCP port for masquerade purpose" -m set --match-set KUBE-NODE-PORT-TCP dst -j KUBE-MARK-MASQ
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE
-A KUBE-SERVICES ! -s 10.244.0.0/16 -m comment --comment "Kubernetes service cluster ip + port for masquerade purpose" -m set --match-set KUBE-CLUSTER-IP dst,dst -j KUBE-MARK-MASQ
-A KUBE-SERVICES -m addrtype --dst-type LOCAL -j KUBE-NODE-PORT
-A KUBE-SERVICES -m set --match-set KUBE-CLUSTER-IP dst,dst -j ACCEPT
COMMIT
# Completed on Sat Apr  4 16:02:41 2020

@rikatz disabling TX and RX offload for the flannel.1 interface on every node of the cluster (ansible centos7-k8s-local -u root -a "ethtool --offload flannel.1 rx off tx off") seems to be an effective workaround on CentOS 7 with Kubernetes 1.18.0, flannel and IPVS.
I wonder what the implications of disabling TX offload are.
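
To confirm the change took effect on a node, something like this should do (exact feature names vary slightly between ethtool versions):

ethtool -k flannel.1 | grep -E 'rx-checksumming|tx-checksumming'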

whites11 commented Apr 4, 2020

It would be awesome to have an iptables-save example from a cluster not facing this issue, to compare the iptables rules.

whites11 commented Apr 5, 2020

I realised that changing the Service's externalTrafficPolicy from Local to Cluster makes the networking work again. It is not really a solution in my case, but it might help explain the behaviour.
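
The switch itself is a one-liner (a sketch; the service name and namespace are placeholders):

kubectl -n <namespace> patch svc <service-name> -p '{"spec":{"externalTrafficPolicy":"Cluster"}}'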

rikatz (Contributor) commented Apr 6, 2020

@rikatz disabling TX and RX offload for the flannel.1 interface on every node of the cluster (ansible centos7-k8s-local -u root -a "ethtool --offload flannel.1 rx off tx off") seems to be an effective workaround on CentOS 7 with Kubernetes 1.18.0, flannel and IPVS.
I wonder what the implications of disabling TX offload are.

Yes, this is a known issue with the RH/CentOS 7 kernel (3.10.0-1062) with vxlan and TX offload, so the only workaround right now is to disable TX offload, wait for a kernel update, or use another OS.

If this solves the problem, then it is a known problem. I'll mark this issue as a duplicate of #88986 and close it; if you want, you can also follow that other issue, and as soon as an updated kernel version is released we can test and post there.

But to reinforce: in this case it is not a Kubernetes problem, but a CNI problem with the current CentOS 7.7 kernel.

Thank you!

/remove-triage unresolved
/triage duplicate
/close

k8s-ci-robot (Contributor):

@rikatz: The label(s) triage/ cannot be applied, because the repository doesn't have them

@k8s-ci-robot k8s-ci-robot added triage/duplicate Indicates an issue is a duplicate of other open issue. and removed triage/unresolved Indicates an issue that can not or will not be resolved. labels Apr 6, 2020
k8s-ci-robot (Contributor):

@rikatz: Closing this issue.

rikatz (Contributor) commented Apr 6, 2020

Oh, and please feel free to reopen if you don't think it's the same case :)

cfabio (Author) commented Apr 6, 2020

I am not sure this can really be considered "closed"; there are two issues here, present on both CentOS 7 and CentOS 8:

  • the kube-proxy iptables backend is broken in all Kubernetes versions from 1.17.0 up to 1.18.0.
  • the kube-proxy ipvs backend also has issues, considering it only works with TX offload disabled.

For the first one there isn't even a workaround, aside from downgrading to Kubernetes 1.16.x.

In the next few days I will try Debian or Fedora with an up-to-date kernel.

baokiemanh:

Maybe this issue is related to the firewall settings of the cloud provider.

In my case, I set up a cluster (with kubeadm) on AWS EC2 instances and got the same issue. After I changed the Security Group (firewall) settings to allow all traffic, everything went back to normal.

I never got this issue when setting up a cluster on bare metal.

Hope it helps!
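
For clusters on a cloud provider, the node firewall has to allow the NodePort range (30000-32767 by default). On AWS that could look like this (a sketch; the security group ID and source CIDR are placeholders):

aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 30000-32767 --cidr 10.0.0.0/8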
