NodePort only responding on node where pod is running #58908
Comments
/sig network |
I have the exact same issue, on Ubuntu 16.04.3 LTS. Here's my service definition for "nginx":
|
Yeah, you should have the pod IP range within the cluster CIDR. From the above, it looks like Calico is configured to use a different pool. |
In my case, I'm not using Calico (purposefully using kubenet): https://github.com/hardening-kubernetes/from-scratch/blob/master/docs/launch-configure-master.md and https://github.com/hardening-kubernetes/from-scratch/blob/master/docs/launch-configure-workers.md Unsure if that means my issue is identical to @wittlesouth's. Thanks for looking at this! |
AFAIK kube-proxy doesn't make sure if |
@bgeesaman while still possible, I think it's less likely you'll hit the same issue, since the kubenet setup is different. Could you check to see if you're passing the cluster CIDR to kube-proxy? |
Unfortunately, I can't do more to help triage/explain this. In addition to this problem, I later discovered that I was unable to pass traffic on the cluster network in some cases when using DNS names in the cluster namespace. This seemed like a separate bug. Also, annoyingly, I was unable to send traffic to my local network, which seemed actually to be a "feature" of Calico's default configuration. While investigating that, I found some docs on how to fix that by changing the definition of the default IP pool, but decided not to follow up on that. I ended up re-initializing my cluster entirely using Flannel instead of Calico. The revised cluster shows none of the issues I had with Calico. Specifically:
On the plus side, I found that re-building the cluster was pretty easy. I hope the Kubernetes team at some point fleshes out the description of the various available networking options on the main site. I picked Calico because it sounded like it might be better performing, and perhaps a bit because it was the first tab (in what I assume is an alphabetical order). My revised cluster is running the same set of services as the original, except they're now working. It is possible that my bad experience with Calico was due to using a different value for the cluster cidr; perhaps if you follow only the default, you might have a better experience than I did. Based on my limited experience, I wouldn't recommend that Kubernetes novices (which I certainly am) start with Calico. So far, Flannel is working well for me. |
One last comment responding to @caseydavenport's comment above. I agree that Kubernetes and Calico should have the same definition for the pool of pod network IPs. I thought I had addressed this by modifying the default Calico pool in the Calico 3.0.1 yaml when I deployed it. Specifically, I copied the Calico 3.0.1 yaml and changed the following:
I expected from the Calico docs that this would ensure that Calico and Kubernetes were sharing the same definition of the desired cluster CIDR. It is certainly possible, if not likely, that I missed something. |
So I think this is worthy of closing; all signs point to a configuration problem. We fixed it on our end a while ago, and the last comment was 2 months ago. |
I've been pulling my hair out over the last couple of days trying to figure out why my cluster does not behave like the documentation says it should. Very frustrating if you're trying to learn Kubernetes... now it turns out it's a bug??? Sigh..... |
"iptables -P FORWARD ACCEPT" didn't work in my setup either. |
Experiencing the same problem with this setup. Checked that CALICO_IPV4POOL_CIDR: 172.16.0.0/16 and --pod-network-cidr=172.16.0.0/16 match. This happened after some days of multiple apply/delete cycles, and I can test it (a verification sketch follows below).
|
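Not something confirmed in the thread, but a quick way to double-check that the two values really line up on a kubeadm cluster (a sketch; assumes Calico is deployed as the usual calico-node DaemonSet in kube-system):

```sh
# CIDR that kube-proxy believes the pods use (kubeadm stores it in the kube-proxy ConfigMap)
kubectl -n kube-system get cm kube-proxy -o yaml | grep -i clustercidr

# Pool that Calico is actually configured with
kubectl -n kube-system get ds calico-node -o yaml | grep -A1 CALICO_IPV4POOL_CIDR
```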
Same result with Weave Net:
Discovered this comment. Indeed, it looks like it's working with |
Solved it by assigning deployments to nodes via |
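The exact mechanism isn't shown in the comment above; a minimal sketch of pinning a deployment to one node with a nodeSelector (node, deployment, and label names are placeholders), which works around the symptom rather than fixing the underlying routing:

```sh
# Label the node the pods should stay on (placeholder names)
kubectl label node worker01 nodeport-app=true

# Constrain the deployment to that label
kubectl patch deployment nginx -p '{"spec":{"template":{"spec":{"nodeSelector":{"nodeport-app":"true"}}}}}'
```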
Tried it with flannel and used |
I can confirm, we're hitting the same thing with: NodePort service definition
Listen port
Works from node itself:
But not working from anywhere else. There are no rules for that port in iptables filter or nat. /reopen We tried the FORWARD policy change; it didn't work. |
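For anyone comparing against a healthy node, a sketch of the two checks that should pass on every node once kube-proxy has programmed the service (the NodePort 30080 and the node IP are placeholders, not values from this comment):

```sh
# kube-proxy (iptables mode) should install NAT rules for the NodePort on every node
sudo iptables-save -t nat | grep 30080

# and the port should answer from any node, not only the one running the pod
NODE_IP=192.0.2.10   # placeholder: any other node's address
curl -sv --max-time 5 http://$NODE_IP:30080/
```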
@mvukadinoff: You can't reopen an issue/PR unless you authored it or you are a collaborator. |
I hit the same issue in a 2-node EKS cluster. Any quick help? |
I had the same problem, but then I remembered that I have network policies enabled for my namespace, and traffic from outside was not allowed :D (a sketch of the fix is below)
|
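If a default-deny NetworkPolicy is in place, NodePort traffic is dropped before it ever reaches the pod. A minimal sketch of a policy that re-allows ingress to the backing pods (namespace, label, and port are placeholders, not taken from the comment):

```sh
cat <<'EOF' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-nodeport-ingress
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: nginx        # placeholder: label of the pods behind the NodePort service
  ingress:
    - ports:
        - protocol: TCP
          port: 80      # placeholder: the service's targetPort
EOF
```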
Any updates here ? |
I ran into the same issue and wanted to provide some notes on how I resolved this.
"If you do set the --cluster-cidr option on kube-proxy, make sure it matches the IPALLOC_RANGE given to Weave Net (see below)." So basically my pods were using IP addresses from 10.32.0.0/12. While my cluster-cidr / podCIDR was all set to 192.168.0.0/16. So a configuration issue according to weave works. Now I didn't notice a problem with this configuration in my environment...except when I was testing out ingress and nodePorts. Essentially if there wasn't a pod local to the nodePort, than traffic from the NodePort was timing out trying to get to the Pod (according to the logs from my nginx ingress controller). So given that this environment is a test lab...and I could kubeadm reset if I really messed things up...I tried the following:
It's at this point that the controller manager pod crashed. Reviewing the logs showed me:
So that led me to review the node's podCIDR... which I saw was still set to a 192.168.x.x CIDR.
kubectl get no worker01 -o yaml > worker01.yaml
Updated to:
NOTE: Each node has a different podCIDR, so pay attention to this. It'll look something like: worker01 - podCIDR: 192.168.1.0/24. NOTE 2: I'm not sure if I should have stuck with a subnet mask of /24 when making this change; I noticed that on the nodes the subnet mask is different for podCIDR. In hindsight I probably should have stuck with /24 instead of using /12 like I did. Again... this is my lab environment where I commonly break things. So once I edited each node's YAML, I ran the following:
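The exact commands didn't survive in the comment; since spec.podCIDR can't be edited in place on an existing Node object, one plausible sequence (a sketch, not necessarily what was actually run) is to recreate each Node from the edited file and then bounce kube-proxy:

```sh
# Delete and recreate the Node object from the edited manifest
kubectl replace --force -f worker01.yaml

# Restart kube-proxy after the change
kubectl -n kube-system rollout restart daemonset kube-proxy
```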
So the takeaway for me is that if the clusterCIDR settings on kube-proxy, the controller manager and, I think most importantly, on your nodes don't overlap with the IPs your pods are using, you'll run into this issue. Again... I did the above in a lab environment that I could easily blow away and recreate, so if you find this issue in a production environment you might want to be careful doing what I did above. But I wanted to try and flesh out the configuration issue others have mentioned, where clusterCIDR / podCIDR don't match or overlap the actual IPs your pods are using. This configuration issue can definitely cause the routing problem with NodePort from what I've found (a quick check is sketched below). Edit: Just a brief follow-up note. After making the above changes I had to delete my CoreDNS pods and allow them to be recreated; when running nslookup, I noticed DNS queries seemed to be timing out, so pods existing before the change will likely need to be recreated. But outside of this, so far I'm not noticing any issues and my ingress and NodePorts are working as expected. |
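Building on the takeaway above, a quick way to check whether pod IPs actually fall inside the node podCIDRs / cluster CIDR (a sketch; works on any cluster):

```sh
# CIDR each node was allocated
kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR

# Addresses the pods actually received; these should fall inside the CIDRs above
kubectl get pods --all-namespaces -o wide
```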
This is NOT a bug at all. In my case, I set up a cluster (kubeadm) on AWS EC2 instances and got the same issue. After I changed the Security Group (firewall) settings to allow ALL traffic, everything went back to normal. I never got this issue when setting up a cluster on bare metal. The root cause should be the firewall settings on the cloud provider. Hope it helps! |
This issue should not be closed because it wasn't solved. Simply put, although the NodePort is open and listening on every node, I can only access it on one node. If this is the normal behavior, I don't see much sense in opening the same port on every node. |
My setup is simple: a 2-node cluster on a local LAN (both nodes can be assigned pods),
Flannel as the CNI plugin; the deployment is |
Hello, |
I fixed the issue by: |
I had the same problem: " |
Still not fixed I guess |
Same here! Not fixed on 29 April 2022, using Kubeadm + Docker. |
Facing the same issue. Not able to resolve it even after applying any of the above solutions. |
I was having the same problem (my nginx deployment was only accessible on the worker node where it was deployed). iptables showed Chain FORWARD (policy ACCEPT) with a reject rule in the chain. What I did was move the reject rule to the bottom, starting with: sudo iptables -D FORWARD -j REJECT --reject-with icmp-host-prohibited (the full delete/re-append pair is sketched below). Thankfully everything is now working as intended and the NodePort service is responding on all nodes
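Only the delete command appears in the comment above; the usual pattern for moving a rule to the end of a chain is delete-then-append, so the pair presumably looked something like this (a sketch; the second command is an assumption):

```sh
# Remove the REJECT rule from wherever it currently sits in FORWARD...
sudo iptables -D FORWARD -j REJECT --reject-with icmp-host-prohibited

# ...and re-add it at the bottom, after the Kubernetes/CNI forwarding rules (assumed second command)
sudo iptables -A FORWARD -j REJECT --reject-with icmp-host-prohibited
```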
In my case, I configured a Kubernetes cluster on Vultr cloud. The instances on Vultr have 2 NICs - private and public-facing. |
Could be useful to check this documentation: I faced this issue after a reboot because I had forgotten to persist the IP forwarding settings. |
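For reference, a sketch of persisting the forwarding setting so it survives a reboot:

```sh
# Enable IP forwarding now
sudo sysctl -w net.ipv4.ip_forward=1

# Persist it across reboots
echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/99-kubernetes-ip-forward.conf
sudo sysctl --system
```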
Hello, I have faced the same issue in AWS (EKS). In my case, I needed to configure the firewall (Security Group) to allow traffic between nodes on the port specified by the service's nodePort.
I needed to allow Node -> Node traffic on the port |
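With the AWS CLI, that amounts to a self-referencing ingress rule on the worker node security group for the NodePort range (the group ID below is a placeholder; 30000-32767 is the default NodePort range):

```sh
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 30000-32767 \
  --source-group sg-0123456789abcdef0
```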
I have fixed the issue. In my case, I found that the kube-proxy pod was dead on some nodes, so I restarted kube-proxy, and now the NodePort service is responding on all nodes. get all command: |
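A sketch of how to spot and bounce a dead kube-proxy (assumes kube-proxy runs as the usual DaemonSet in kube-system; the "get all" output itself wasn't included above):

```sh
# Look for kube-proxy pods that are not Running on every node
kubectl -n kube-system get pods -o wide | grep kube-proxy

# Restart them all (or delete the individual broken pod and let the DaemonSet recreate it)
kubectl -n kube-system rollout restart daemonset kube-proxy
```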
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
Running a 3-node cluster, a NodePort service with one pod only responds on one node, the node where the pod is running. Attempts to access the service on the NodePort via other nodes time out with no response.
What you expected to happen:
Service should respond on the service port from any node.
How to reproduce it (as minimally and precisely as possible):
Base OS Ubuntu 16.04. Install with Calico as the network provider, with the pod CIDR set to a non-default value (in my case, 10.5.0.0/16).
I created my cluster with the following kubeadm init statement:
kubeadm init --pod-network-cidr=10.5.0.0/16 --apiserver-cert-extra-sans ['kubemaster.wittlesouth.com','192.168.5.10']
I deployed Calico 3.0.1 with a one-line change to the deployment YAML file to set the environment variable CALICO_IPV4POOL_CIDR to the value 10.5.0.0/16 to match the CIDR set for Kubernetes.
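For reference, the check before applying boils down to making sure the manifest carries the same pool that kubeadm was given (a sketch):

```sh
# kubeadm was initialized with --pod-network-cidr=10.5.0.0/16 (see above);
# the downloaded Calico manifest should carry the same value
grep -A1 CALICO_IPV4POOL_CIDR calico.yaml
```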
Anything else we need to know?:
If you run "iptables -P FORWARD ACCEPT" on all of the nodes, the service works as expected, and responds from each node in the cluster.
Environment:
- Hardware: Three Intel NUC computers with 32GB RAM
- OS: Ubuntu 16.04
- Install tool: kubeadm
- Network plugin: Calico 3.0.1
The symptoms are similar to issue #39823. I tried the resolution suggested in one of the comments (iptables -P FORWARD ACCEPT), although that seems like a hack. I'm wondering if I missed a configuration option somewhere, as it seems like the pods in my environment are not getting IP addresses from the pod CIDR range provided to kubeadm init. Here is a pod list:
I noticed the issue with the Jira service, and tried the kube-proxy diagnostic steps in the Kubernetes documentation. Here are the iptables entries for the Jira service:
Here are the KUBE-FORWARD rules:
So I'm not sure whether the problem is that the KUBE-FORWARD rules would only work if the pod and service IPs were actually in the range provided in the pod-network-cidr setting, or whether something else is happening with the forwarding rules. Please let me know if there is any additional information that would be useful.
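For context on why the CIDR matters here: when kube-proxy is started with --cluster-cidr, the KUBE-FORWARD chain it installs only accepts forwarded traffic whose source or destination falls inside that CIDR, roughly as sketched below (illustrative rules, not copied from this cluster, shown with the 10.5.0.0/16 value from this report). Pods numbered out of a different range never match these ACCEPT rules and fall through to the default FORWARD policy, which is why flipping that policy to ACCEPT papers over the problem.

```sh
# Illustrative KUBE-FORWARD contents with --cluster-cidr=10.5.0.0/16
iptables -t filter -S KUBE-FORWARD
# -N KUBE-FORWARD
# -A KUBE-FORWARD -m mark --mark 0x4000/0x4000 -j ACCEPT
# -A KUBE-FORWARD -s 10.5.0.0/16 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
# -A KUBE-FORWARD -d 10.5.0.0/16 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
```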
/sig network