NodePort only responding on node where pod is running #58908

Closed
wittlesouth opened this issue Jan 27, 2018 · 36 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/network Categorizes an issue or PR as relevant to SIG Network.

Comments

@wittlesouth

wittlesouth commented Jan 27, 2018

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug

/kind feature

What happened:
Running a 3-node cluster, a NodePort service with one pod only responds on one node: the node where the pod is running. Attempts to access the service on the NodePort via other nodes time out with no response.

What you expected to happen:
The service should respond on the NodePort from any node.

How to reproduce it (as minimally and precisely as possible):
Base OS Ubuntu 16.04. Install with Calico as the network provider, with the pod CIDR set to a non-default value (in my case, 10.5.0.0/16).

I created my cluster with the following kubeadm init statement:

kubeadm init --pod-network-cidr=10.5.0.0/16 --apiserver-cert-extra-sans ['kubemaster.wittlesouth.com','192.168.5.10']

I deployed Calico 3.0.1 with a one-line change to the deployment YAML file to set the environment variable CALICO_IPV4POOL_CIDR to the value 10.5.0.0/16 to match the CIDR set for Kubernetes.

Anything else we need to know?:
If you run "iptables -P FORWARD ACCEPT" on all of the nodes, the service works as expected, and responds from each node in the cluster.
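
For reference, checking and applying that workaround boils down to something like this (a sketch; it only changes the default policy and is not a proper fix):

# show the default policy and per-rule packet counters on the filter FORWARD chain
sudo iptables -L FORWARD -n -v --line-numbers

# the workaround: switch the default policy to ACCEPT on every node
sudo iptables -P FORWARD ACCEPT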

Environment:

  • Kubernetes version (use kubectl version):
Eric:kubernetes eric$ kubectl version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.1", GitCommit:"3a1c9449a956b6026f075fa3134ff92f7d55f812", GitTreeState:"clean", BuildDate:"2018-01-04T11:52:23Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.2", GitCommit:"5fa2db2bd46ac79e5e00a4e6ed24191080aa463b", GitTreeState:"clean", BuildDate:"2018-01-18T09:42:01Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:

Three Intel NUC computers with 32GB RAM.

  • OS (e.g. from /etc/os-release):
eric@nuc1:~$ more /etc/os-release
NAME="Ubuntu"
VERSION="16.04.3 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.3 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
  • Kernel (e.g. uname -a):
eric@nuc1:~$ uname -a
Linux nuc1 4.13.0-31-generic #34~16.04.1-Ubuntu SMP Fri Jan 19 17:11:01 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:

kubeadm

  • Others:

Calico 3.0.1

The symptoms are similar to issue #39823. I tried the resolution suggested in one of the comments (iptables -P FORWARD ACCEPT), although that seems like a hack. I'm wondering if I missed a configuration option somewhere, as it seems like the pods in my environment are not getting IP addresses from the pod CIDR range provided to kubeadm init. Here is a pod list:

eric@nuc1:~$ kubectl get pods --all-namespaces -o wide
NAMESPACE     NAME                                      READY     STATUS    RESTARTS   AGE       IP                NODE
default       mysql-6f499c6f58-gvx58                    1/1       Running   2          4d        192.168.43.56     nuc1
default       ols-server-5959cf6cdb-54p65               1/1       Running   2          4d        192.168.43.52     nuc1
default       ols-web-54b68bff86-d6kj4                  1/1       Running   3          4d        192.168.43.50     nuc1
kube-system   calico-etcd-6vspc                         1/1       Running   3          4d        192.168.5.10      nuc1
kube-system   calico-kube-controllers-d669cc78f-b67rc   1/1       Running   5          5d        192.168.5.10      nuc1
kube-system   calico-node-526md                         2/2       Running   9          5d        192.168.5.10      nuc1
kube-system   calico-node-5trgt                         2/2       Running   3          3d        192.168.5.12      nuc3
kube-system   calico-node-r9ww4                         2/2       Running   3          3d        192.168.5.11      nuc2
kube-system   etcd-nuc1                                 1/1       Running   6          5d        192.168.5.10      nuc1
kube-system   kube-apiserver-nuc1                       1/1       Running   7          5d        192.168.5.10      nuc1
kube-system   kube-controller-manager-nuc1              1/1       Running   6          5d        192.168.5.10      nuc1
kube-system   kube-dns-6f4fd4bdf-dt5fp                  3/3       Running   12         5d        192.168.43.49     nuc1
kube-system   kube-proxy-8xf4r                          1/1       Running   1          3d        192.168.5.11      nuc2
kube-system   kube-proxy-tq4wk                          1/1       Running   4          5d        192.168.5.10      nuc1
kube-system   kube-proxy-wcsxt                          1/1       Running   1          3d        192.168.5.12      nuc3
kube-system   kube-registry-proxy-cv8x9                 1/1       Running   4          5d        192.168.43.55     nuc1
kube-system   kube-registry-proxy-khpdx                 1/1       Running   1          3d        192.168.124.203   nuc3
kube-system   kube-registry-proxy-r5qcv                 1/1       Running   1          3d        192.168.40.201    nuc2
kube-system   kube-registry-v0-wcs5w                    1/1       Running   2          5d        192.168.43.51     nuc1
kube-system   kube-scheduler-nuc1                       1/1       Running   6          5d        192.168.5.10      nuc1
kube-system   kubernetes-dashboard-845747bdd4-dp7gg     1/1       Running   4          5d        192.168.43.47     nuc1
wittlesouth   jenkins-deployment-7df556c549-mmnsl       1/1       Running   2          5d        192.168.43.46     nuc1
wittlesouth   jira-d54949568-bzfh5                      1/1       Running   0          1d        192.168.124.204   nuc3

I noticed the issue with the Jira service and tried the kube-proxy diagnostic steps in the Kubernetes documentation. Here are the iptables entries for the Jira service:

eric@nuc1:/var/lib$ sudo iptables-save | grep jira
-A KUBE-NODEPORTS -p tcp -m comment --comment "wittlesouth/jira:" -m tcp --dport 30007 -j KUBE-MARK-MASQ
-A KUBE-NODEPORTS -p tcp -m comment --comment "wittlesouth/jira:" -m tcp --dport 30007 -j KUBE-SVC-MO7XZ6ASHGM5BOPI
-A KUBE-SEP-RDEBPTMQORZ6CWA2 -s 192.168.124.204/32 -m comment --comment "wittlesouth/jira:" -j KUBE-MARK-MASQ
-A KUBE-SEP-RDEBPTMQORZ6CWA2 -p tcp -m comment --comment "wittlesouth/jira:" -m tcp -j DNAT --to-destination 192.168.124.204:8082
-A KUBE-SERVICES ! -s 10.5.0.0/16 -d 10.105.201.170/32 -p tcp -m comment --comment "wittlesouth/jira: cluster IP" -m tcp --dport 8082 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.105.201.170/32 -p tcp -m comment --comment "wittlesouth/jira: cluster IP" -m tcp --dport 8082 -j KUBE-SVC-MO7XZ6ASHGM5BOPI
-A KUBE-SVC-MO7XZ6ASHGM5BOPI -m comment --comment "wittlesouth/jira:" -j KUBE-SEP-RDEBPTMQORZ6CWA2

Here are the KUBE-FORWARD rules:

Chain KUBE-FORWARD (1 references)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere             /* kubernetes forwarding rules */ mark match 0x4000/0x4000
ACCEPT     all  --  10.5.0.0/16          anywhere             /* kubernetes forwarding conntrack pod source rule */ ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             10.5.0.0/16          /* kubernetes forwarding conntrack pod destination rule */ ctstate RELATED,ESTABLISHED

So I'm not sure whether the issue is that the KUBE-FORWARD rules would work if the pod and service IPs were actually in the range provided in the pod-network-cidr setting, or whether something else is happening with the forwarding rules. Please let me know if there is any additional information that would be useful.

/sig network

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. sig/network Categorizes an issue or PR as relevant to SIG Network. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 27, 2018
@wittlesouth
Author

/sig network

@bgeesaman

I have the exact same issue, and the iptables -P FORWARD ACCEPT "hack" alleviates it, but I agree that it doesn't feel correct. I can reach the pod via the node it's running on, but not via the other nodes unless I add that forward accept policy.

Ubuntu 16.04.3 LTS
docker 1.13.1
DOCKER_OPTS=--ip-masq=false --iptables=false --log-driver=json-file --log-level=warn --log-opt=max-file=5 --log-opt=max-size=10m --storage-driver=overlay
k8s: v1.9.2

Here's my service definition "nginx"

apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2018-02-01T00:41:03Z
  labels:
    run: nginx
  name: nginx
  namespace: default
  resourceVersion: "867"
  selfLink: /api/v1/namespaces/default/services/nginx
  uid: 95014ccf-06e8-11e8-afc3-02acbd38e3f6
spec:
  clusterIP: 10.3.68.58
  externalTrafficPolicy: Cluster
  ports:
  - nodePort: 31256
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    run: nginx
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}

@timothysc
Member

timothysc commented Feb 9, 2018

@dcbw @thockin - this is pretty bad. We are seeing this on 1.9
@kubernetes/sig-network-bugs

@timothysc timothysc added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. cherrypick-candidate labels Feb 9, 2018
@caseydavenport
Member

So I'm not sure whether the issue is that the KUBE-FORWARD rules would work if the pod and service IPs were actually in the range provided in the pod-network-cidr setting, or whether something else is happening with the forwarding rules

yeah, you should have the pod IP range within the --cluster-cidr that you give to Kubernetes.

From above, it looks like Calico is configured to use 192.168.0.0/16 for Pod IPs and Kubernetes is configured to expect 10.5.0.0/16. I think you need to either change the pool that Calico is using, or change the cluster-cidr provided to k8s such that the two match.
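
A quick way to compare the two (a sketch; assumes calicoctl is installed and pointed at the same datastore as the cluster):

# cluster CIDR Kubernetes was configured with
kubectl cluster-info dump | grep -m 1 -- --cluster-cidr

# IP pool Calico is actually handing out pod IPs from
calicoctl get ippool -o wide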

@bgeesaman

In my case, I'm not using Calico (I'm purposefully using kubenet): https://github.com/hardening-kubernetes/from-scratch/blob/master/docs/launch-configure-master.md and https://github.com/hardening-kubernetes/from-scratch/blob/master/docs/launch-configure-workers.md. Unsure if that means my issue is then identical to @wittlesouth's. Thanks for looking at this!

@yue9944882
Member

AFAIK kube-proxy doesn't check whether FORWARD ACCEPT is in place on the host in the first place. Maybe kube-proxy should at least throw an error before applying the NodePort rules if the FORWARD policy blocks forwarding.

@caseydavenport
Member

Unsure if that means my issue is then identical to @wittlesouth's. Thanks for looking at this!

@bgeesaman while still possible, I think it's less likely you'll hit the same issue, since the --cluster-cidr arg on the controller manager is used to dish out the subnets used by kubenet.

Could you check to see if you're passing --cluster-cidr to the kube-proxy, and if so double check it matches the value passed to the controller manager?
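
Something like this should surface both values on a kubeadm-style cluster (a sketch; ConfigMap names and manifest paths assume the kubeadm defaults):

# clusterCIDR kube-proxy was configured with
kubectl -n kube-system get cm kube-proxy -o yaml | grep -i clusterCIDR

# --cluster-cidr the controller manager was started with
sudo grep cluster-cidr /etc/kubernetes/manifests/kube-controller-manager.yaml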

@wittlesouth
Author

Unfortunately, I can't do more to help triage/explain this. In addition to this problem, I later discovered that I was unable to pass traffic on the cluster network in some cases when using DNS names in the cluster namespace. This seemed like a separate bug. Also, annoyingly, I was unable to send traffic to my local network, which seemed actually to be a "feature" of Calico's default configuration. While investigating that, I found some docs on how to fix that by changing the definition of the default IP pool, but decided not to follow up on that. I ended up re-initializing my cluster entirely using Flannel instead of Calico. The revised cluster shows none of the issues I had with Calico. Specifically:

  • NodePort services are available from any node without host-specific route changes (this bug)
  • Pods can reach each other via the internal DNS names (separate issue)
  • Pods can reach other services on the local network (separate issue)

On the plus side, I found that re-building the cluster was pretty easy. I hope the Kubernetes team at some point fleshes out the description of the various available networking options on the main site. I picked Calico because it sounded like it might be better performing, and perhaps a bit because it was the first tab (in what I assume is an alphabetical order).

My revised cluster is running the same set of services as the original, except they're now working. It is possible that my bad experience with Calico was due to using a non-default value for the cluster CIDR; perhaps if you stick with the default, you might have a better experience than I did.

Based on my limited experience, I wouldn't recommend that Kubernetes novices (which I certainly am) start with Calico. So far, Flannel is working well for me.

@wittlesouth
Author

One last comment responding to @caseydavenport's comment above. I agree that Kubernetes and Calico should have the same definition for the pool of pod network IPs. I thought I had addressed this by modifying the default Calico pool in the Calico 3.0.1 YAML when I deployed it. Specifically, I copied the Calico 3.0.1 YAML and changed the following:

            # Configure the IP Pool from which Pod IPs will be chosen.
            - name: CALICO_IPV4POOL_CIDR
              value: "10.5.0.0/16"

I expected from the Calico docs that this would ensure that Calico and Kubernetes were sharing the same definition of the desired cluster CIDR. It is certainly possible, if not likely, that I missed something.

@timothysc
Member

So I think this is worthy of closing; all signs point to a configuration problem.

We fixed it on our end a while ago, and the last comment was 2 months ago.

@geertschuring

I've been pulling my hair out over the last couple of days trying to work out why my cluster does not behave like the documentation says it should. Very frustrating if you're trying to learn about Kubernetes... now it turns out it's a bug??? Sigh.....

@geertschuring

"iptables -P FORWARD ACCEPT" didn't work in my setup either.

@Bessonov

@timothysc

I'm experiencing the same problem with this setup. I checked that CALICO_IPV4POOL_CIDR: 172.16.0.0/16 and --pod-network-cidr=172.16.0.0/16 match. This happened after some days with multiple apply/delete cycles, and I can test it.

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.4", GitCommit:"f49fa022dbe63faafd0da106ef7e05a29721d3f1", GitTreeState:"clean", BuildDate:"2018-12-14T07:10:00Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.2", GitCommit:"cff46ab41ff0bb44d8584413b598ad8360ec1def", GitTreeState:"clean", BuildDate:"2019-01-10T23:28:14Z", GoVersion:"go1.11.4", Compiler:"gc", Platform:"linux/amd64"}

@Bessonov

Bessonov commented Jan 23, 2019

Same result with weave net:

kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

I discovered this comment. Indeed, it looks like it works with externalTrafficPolicy: Cluster instead of Local, which is fine for me, although it's still a bug from my point of view.
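
For anyone who wants to flip an existing service over, a patch like this works (nginx is the example service name from the earlier comment):

# switch the service to the default Cluster policy so every node forwards NodePort traffic
kubectl patch svc nginx -p '{"spec":{"externalTrafficPolicy":"Cluster"}}'

# confirm the change
kubectl get svc nginx -o jsonpath='{.spec.externalTrafficPolicy}'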

@fabifrank

I solved it by assigning deployments to nodes via a nodeSelector (for example by hostname), so the pod is guaranteed to be available on a specific IP. If you're using externalTrafficPolicy: Cluster then obviously you're not able to see the client source IP address in pods, which may be critical in some cases (allow/deny rules, etc.).
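
For reference, that pinning can be done with a patch as well (the deployment and node names here are placeholders):

# pin the pods to one node so the NodePort on that node's IP always has a local endpoint
kubectl patch deployment my-app -p '{"spec":{"template":{"spec":{"nodeSelector":{"kubernetes.io/hostname":"node-1"}}}}}'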

@Seljuke

Seljuke commented Aug 9, 2019

Tried it with flannel and used --pod-network-cidr=10.244.10.0/16 as the CIDR pool, but it didn't work.
I suspected the iptables rules and ran iptables -P FORWARD ACCEPT, but it still didn't work. I also tried installing a new cluster with kubespray using ipvs; this didn't work either. So I think the bug fixed in Kubernetes v1.9 has come back.

@mvukadinoff

mvukadinoff commented Sep 18, 2019

I can confirm, we're hitting the same thing with:
Kubernetes: 1.15.1
Calico: v3.7.3
In ipvs mode.

NodePort service definition

  clusterIP: 10.233.8.51
  externalTrafficPolicy: Cluster
  ports:
  - nodePort: 30072
    port: 5600
    protocol: TCP
    targetPort: 5600
  selector:
    app: myapp
  sessionAffinity: None
  type: NodePort

Listen port

root@xxxxnode02:~# netstat -nlp  | grep 30072
tcp6       0      0 :::30072                :::*                    LISTEN      21639/kube-proxy

Works from node itself:

root@xxxxnode02:~# telnet 10.xx.0.4 30072
Trying 10.xx.0.4...
Connected to 10.xx.0.4.
Escape character is '^]'.
### From another machine in the same subnet
~# telnet 10.134.0.4 30072
Trying 10.134.0.4...
^C

But it's not working from anywhere else. There are no rules for that port in the iptables filter or nat tables.
hostPort, for example, works.

/reopen

We tried the FORWARD policy change; it didn't work.
Based on the Calico docs we also tried setting FELIX_KUBENODEPORTRANGES, but that didn't help either: https://docs.projectcalico.org/v3.5/usage/enabling-ipvs
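
In ipvs mode the virtual server table is also worth inspecting directly (a sketch; assumes ipvsadm is installed on the node):

# the NodePort should show up as a virtual server on each node address, with the pod IP as a destination
sudo ipvsadm -Ln | grep -A 2 30072

# cluster/service IPs should be bound to the kube-ipvs0 dummy interface that kube-proxy creates
ip addr show kube-ipvs0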

@k8s-ci-robot
Contributor

@mvukadinoff: You can't reopen an issue/PR unless you authored it or you are a collaborator.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@aneesinaec

I hit the same issue in an EKS 2 node cluster.
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.4", GitCommit:"c96aede7b5205121079932896c4ad89bb93260af", GitTreeState:"clean", BuildDate:"2020-06-17T11:41:22Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

Any quick help?

@dignajar

dignajar commented Aug 24, 2020

I had the same problem, but then I remembered that I had network policies enabled for my namespace, and traffic from outside was not allowed :D

$ kubectl get networkpolicy

@nurhun

nurhun commented Apr 29, 2021

Any updates here ?

@jdwnetapp

jdwnetapp commented Aug 13, 2021

I ran into the same issue and wanted to provide some notes on how I resolved this.

  1. I used kubeadm to set up my cluster and had passed in a pod-network-cidr 192.168.0.0/16

kubeadm init --pod-network-cidr=192.168.0.0/16

  2. However...I use Weave for my pod network, which by default uses 10.32.0.0/12 for the pod network and sticks with that default regardless of what you set pod-network-cidr to. The Weave documentation does discuss changing the default pod network CIDR...but I didn't do this.

  3. So in my environment, kube-proxy's configmap referenced a clusterCIDR of 192.168.0.0/16. My controller manager referenced 192.168.0.0/16 for --cluster-cidr in its manifest YAML. And interestingly enough, my nodes also referenced this network:

kubectl get node k8worker01 -o yaml | grep -A 1 -i podCIDR
  podCIDR: 192.168.1.0/24
  podCIDRs:
  - 192.168.1.0/24

  4. In the Weave Works documentation (https://www.weave.works/docs/net/latest/kubernetes/kube-addon/#pod-network) they mention that:

"If you do set the --cluster-cidr option on kube-proxy, make sure it matches the IPALLOC_RANGE given to Weave Net (see below)."

So basically my pods were using IP addresses from 10.32.0.0/12, while my cluster-cidr / podCIDR was all set to 192.168.0.0/16. So, a configuration issue according to Weave Works.
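
To compare the two values directly, something like this works (a sketch; the DaemonSet name assumes the stock Weave manifest):

# range Weave is allocating pod IPs from (no output means the 10.32.0.0/12 default)
kubectl -n kube-system get ds weave-net -o yaml | grep -A 1 IPALLOC_RANGE

# range kube-proxy thinks pods live in
kubectl -n kube-system get cm kube-proxy -o yaml | grep -i clusterCIDR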

Now I didn't notice a problem with this configuration in my environment...except when I was testing out ingress and NodePorts. Essentially, if there wasn't a pod local to the NodePort, then traffic from the NodePort was timing out trying to get to the pod (according to the logs from my nginx ingress controller).

So given that this environment is a test lab...and I could kubeadm reset if I really messed things up...I tried the following:

  1. Changed podSubnet in kubeadm's configmap to match Weave's default CIDR
  2. Changed clusterCIDR in kube-proxy's configmap to match Weave's default CIDR
  3. Changed --cluster-cidr in the controller manager's manifest YAML to match Weave's default CIDR

It's at this point that the controller manager pod crashed. Reviewing the logs showed me:

controllermanager.go:235] error starting controllers: failed to mark cidr[192.168.0.0/24] at idx [0] as occupied for node: : cidr 192.168.0.0/24 is out the range of cluster cidr 10.32.0.0/12

So that led me to review the node's podCIDR...which I saw was still set to a 192.168.x.x cidr.

  4. So I exported each node's configuration into a YAML file and edited podCIDR and podCIDRs:

kubectl get no worker01 -o yaml > worker01.yaml

Updated to:

spec:
  podCIDR: 10.32.1.0/12
  podCIDRs:
  - 10.32.1.0/12

NOTE Each node has a different podCIDR!! So pay attention to this. It'll look something like:

worker01 - podCIDR: 192.168.1.0/24
worker02 - podCIDR: 192.168.2.0/24
worker03 - podCIDR: 192.168.3.0/24

NOTE2 I'm not sure if I should have stuck with a subnet mask of /24 when making this change. I noticed on the nodes the subnet mask is different for podCIDR. In hindsight I probably should have stuck with /24 instead of using /12 like I did. Again...this is my lab environment where I commonly break things.

So once I edited each node's YAML, I ran the following:

kubectl delete no worker02 && kubectl create -f worker02.yaml
kubectl delete no worker01 && kubectl create -f worker01.yaml
kubectl delete no k8master && kubectl create -f master.yaml

  5. I then had to delete the pods in my ingress deployment and allow them to be recreated. Once recreated, no more issue! Even if a node didn't have my application running on it, traffic would still be routed over to another node where the application pod lived.

So the takeaway for me is that if your clusterCIDR settings on kube-proxy, the controller manager, and (I think most importantly) on your nodes don't overlap with the IPs your pods are using, you'll run into this issue.

Again...I did the above on a lab environment that I could easily blow away and recreate. So if you find this issue in a production environment you might want to be careful doing what I did above. But I wanted to try and flesh out the configuration issue others have mentioned, where clusterCIDR / podCIDR don't match / overlap the actual IPs your pods are using. This configuration issue definitely can cause the routing issue with NodePort from what I've found.

Edit: Just a brief follow-up note. After making the above changes I had to delete my CoreDNS pods and allow them to be recreated. When I was running nslookup, I noticed DNS queries seemed to be timing out, so pods existing before the change will likely need to be recreated. But outside of this, so far I'm not noticing any issues and my ingress and NodePorts are working as expected.

@baokiemanh

This is NOT a bug at all. In my case, I set up a cluster (kubeadm) on AWS EC2 instances and got the same issue. After I changed the Security Group (firewall) settings to allow ALL traffic, everything went back to normal.

I never got this issue when setting up a cluster on bare metal.

The root cause should be the firewall settings on the cloud provider side.

Hope it helps!

@winkee01

winkee01 commented Dec 14, 2021

This issue should not be closed because it wasn't solved. Simply put: although the NodePort is open and listening on every node, I can only access it on one node. If this is the normal behavior, I don't see much sense in opening the same port on every node.

@winkee01

My setup is simple: a 2-node cluster on a local LAN (both nodes can be assigned pods):

  • master: 10.10.0.91
  • worker1: 10.10.0.92

flannel is the CNI plugin. The deployment is a 2-pod Ghost website with a NodePort service on port 31480; one pod runs on master and the other on worker1. On the master node I can run curl localhost:31480, curl 10.10.0.91:31480, or curl 10.10.0.92:31480 with success. On worker1, neither curl localhost:31480 nor curl 10.10.0.92:31480 works right away: I have to wait about 60 seconds, then it works once, then I wait another ~60 seconds and it works again, and so on.

@AntoineNGUYEN31

AntoineNGUYEN31 commented Dec 14, 2021

Hello,
@dignajar was right that this issue is related to the network policy.
In my case, the cluster (Rancher RKE) has pod isolation disabled, but the network policies created previously remained.
Simply removing all network policies in the namespace where the pod is deployed can solve the issue.
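
For reference, that clean-up is just (the namespace name is a placeholder):

# list the policies still in effect in the namespace
kubectl -n my-namespace get networkpolicy

# remove them all if pod isolation is no longer wanted there
kubectl -n my-namespace delete networkpolicy --all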

@Abdullah-AlSawalmeh

I fixed the issue by:
1- changing from Docker to containerd
2- allowing port 179 (sudo ufw allow 179/tcp), as mentioned here: https://projectcalico.docs.tigera.io/getting-started/kubernetes/requirements
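
On a ufw host that roughly translates to (a sketch; 179/tcp is Calico's BGP port, 30000:32767/tcp is the default NodePort range):

# allow Calico BGP peering between nodes
sudo ufw allow 179/tcp

# allow the default Kubernetes NodePort range from clients
sudo ufw allow 30000:32767/tcp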

@Meppo

Meppo commented Mar 2, 2022

I had the same problem; with "iptables -P FORWARD ACCEPT" on all of the nodes, the service works as expected ...

@stanfordpeng

Still not fixed I guess

@castellanoj

Same here! Not fixed on 29 April 2022, using Kubeadm + Docker.

@prtkdave

prtkdave commented Sep 7, 2022

Facing the same issue. Not able to resolve it even after applying any of the above solutions.
k8s v1.23.5
calico v3.22.2

@Moody-san

I was having the same problem (my nginx deployment was only accessible on the worker node where it was deployed).
Having a look at the iptables FORWARD chain, I found this:

Chain FORWARD (policy ACCEPT)
target                  prot opt source    destination
KUBE-PROXY-FIREWALL     all  --  anywhere  anywhere     ctstate NEW /* kubernetes load balancer firewall */
KUBE-FORWARD            all  --  anywhere  anywhere     /* kubernetes forwarding rules */
KUBE-SERVICES           all  --  anywhere  anywhere     ctstate NEW /* kubernetes service portals */
KUBE-EXTERNAL-SERVICES  all  --  anywhere  anywhere     ctstate NEW /* kubernetes externally-visible service portals */
REJECT                  all  --  anywhere  anywhere     reject-with icmp-host-prohibited
FLANNEL-FWD             all  --  anywhere  anywhere     /* flanneld forward */

What I did was move the REJECT rule to the bottom using these two commands:

sudo iptables -D FORWARD -j REJECT --reject-with icmp-host-prohibited
sudo iptables -A FORWARD -j REJECT --reject-with icmp-host-prohibited

Thankfully now everything is working as intended and the NodePort service is responding on all nodes.
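
To verify the ordering afterwards, something like this should show the REJECT rule as the last entry in the chain:

sudo iptables -L FORWARD -n --line-numbers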

@mbrekhov

In my case, I configured a Kubernetes cluster on the Vultr cloud. The instances on Vultr have 2 NICs: private and public-facing.
The problem was with the Flannel DaemonSet, which by default used the public-facing NIC, where UDP traffic was blocked on the firewall side.
So I simply added '--iface=enp0s8' (the name of my private-facing NIC) to Flannel, as described here: https://stackoverflow.com/a/48755233

@dlapianosipos

It could be useful to check this documentation:
https://kubernetes.io/docs/setup/production-environment/container-runtimes/#forwarding-ipv4-and-letting-iptables-see-bridged-traffic

I faced this issue after a reboot because I had forgotten to persist the IP forwarding settings.
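
The persistent version of that setting looks roughly like this (the file name under /etc/sysctl.d/ is a convention, not a requirement):

# persist IPv4 forwarding across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
EOF

# apply without rebooting
sudo sysctl --system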

@Pepzik

Pepzik commented May 9, 2024

Hello,

I have faced the same issue in AWS (EKS). In my case, I needed to configure the firewall (Security Group) to allow traffic between nodes on the port specified by the targetPort parameter. So, with a configuration like this:

apiVersion: v1
kind: Service
metadata:
  name: postfix-node-port
  namespace: default
spec:
  type: NodePort
  selector:
    app: postfix
  ports:
    - name: mail
      protocol: TCP
      port: 50
      targetPort: 25
      nodePort: 32125

I needed to allow node -> node traffic on port 25 and client -> node traffic on port 32125.
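
With the AWS CLI those two rules look roughly like this (the security group IDs and the client CIDR are placeholders):

# node -> node traffic on the targetPort (25), referencing the nodes' own security group
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 25 --source-group sg-0123456789abcdef0

# client -> node traffic on the NodePort (32125)
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 32125 --cidr 203.0.113.0/24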

@Meppo

Meppo commented May 10, 2024

I had the same problem; with "iptables -P FORWARD ACCEPT" on all of the nodes, the service works as expected ...

I have fixed the issue. In my case, I found that the kube-proxy pod was dead on some nodes, so I restarted kube-proxy, and now the NodePort service is responding on all nodes.

Command to check: kubectl -n kube-system get all
