Packet loss with NodePort port forwarding, no SYN-ACK being sent by worker node #93801
Comments
/sig network
/triage unresolved

🤖 I am a bot run by vllry. 👩🔬

This seems to be a problem with the data path. Kubernetes does not handle the data path directly; that is handled by the CNI plugin, in your case kube-router. So this is likely a kube-router issue. If you install kube-router with […] If possible, please make an installation with some other CNI plugin, e.g. Flannel, and check whether that works.
/assign @eth0xFEED |
@jayunit100: GitHub didn't allow me to assign the following users: eth0xfeed. Note that only kubernetes members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I can take a look, thanks @jayunit100!
/assign |
Per the above comment, the best way to fix this is probably […]
If you want to actually follow up on the specifics of […]
@eth0xFEED If you aren't able to handle this issue, consider unassigning yourself and/or adding the […]

🤖 I am a bot run by vllry. 👩🔬
/remove-triage unresolved

Think we found the issue, but not much more we can do. @Frondall do you feel this is correct?
@eth0xFEED: Those labels are not set on the issue: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What happened:
When trying to access a port exposed through a NodePort service, we see heavy packet loss (upwards of 80%).
This occurs on a freshly installed Kubernetes cluster running on bare metal and using kube-router as the CNI plugin. It happens both locally on the node itself (i.e. curl http://localhost:<node_port>) and remotely (i.e. curl http://<node_ip>:<node_port>).
It affects pods running on any node: I added 5 replicas and tried to reach the service on all 4 worker nodes in the cluster, and every worker node shows the same problem.
The machines are reachable from the network and standard ports (e.g. SSH) have no issues; only access on the forwarded NodePort times out intermittently.
A Wireshark dump of the network shows that SYN packets arrive on the worker node but nothing further happens. The worker node also receives the SYN retransmissions yet never sends a SYN-ACK back, which indicates the problem is not on the physical networking side.
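For anyone reproducing this, the missing SYN-ACK can also be observed directly on the worker node with tcpdump instead of a full Wireshark capture. This is a sketch, not taken from the report: the NodePort 30080 is an assumed value, and the command needs root.

```shell
# Pcap filter matching only SYN and SYN-ACK segments on the assumed
# NodePort (30080 is a placeholder; substitute the real NodePort).
FILTER='tcp port 30080 and (tcp[tcpflags] & (tcp-syn|tcp-ack) != 0)'
# Capture at most 20 handshake packets and give up after 15 seconds.
# `|| true` so the sketch does not abort on a box without privileges.
timeout 15 tcpdump -ni any -c 20 "$FILTER" || true
# A healthy handshake shows the SYN followed by a SYN-ACK from the node;
# in the failing case described above, only the SYN and its
# retransmissions appear.
```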
Logs and information about the infrastructure setup and the issue are below.
Curl timing out when trying to reach the worker node (worked once in 6 attempts):
hping3 results to node-3:
Wireshark dump from node-3, one unsuccessful connection followed by a successful one:
iptables shows that all packets are being accepted by the KUBE-FORWARD rule. After sending 64 requests to node-3, the KUBE-FORWARD iptables rule shows as:
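The KUBE-FORWARD counters referenced above can be read back (and reset between test runs) with standard iptables commands. A sketch, run as root on the worker node; the chain name comes from the report:

```shell
# Chain installed by kube-proxy, as named in the report above.
CHAIN=KUBE-FORWARD
# Show per-rule packet/byte counters (-v) with numeric addresses (-n).
# `|| true` tolerates running without root privileges.
iptables -t filter -L "$CHAIN" -v -n || true
# Zero the counters, replay the 64 test requests, then list again to see
# exactly how many packets each rule matched during the test window.
iptables -t filter -Z "$CHAIN" || true
```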
What you expected to happen:
No packet loss.
How to reproduce it (as minimally and precisely as possible):
Fresh install of a Kubernetes cluster on XenServer (XCP-ng) with the Kubernetes XOA recipe.
Install nginx with:
kubectl create deployment nginx --image=nginx
Expose it as a NodePort service with:
kubectl create service nodeport nginx --tcp=80:80
Try to reach the port either locally on the node or remotely, with curl, by accessing nginx through a browser, or with hping3.
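The repro steps above can be scripted so the loss rate is measured rather than eyeballed. A sketch, assuming the node address and NodePort below (both placeholders; substitute the values from kubectl get svc nginx):

```shell
#!/usr/bin/env bash
# Hammer the NodePort and report the loss rate.
# NODE and PORT are assumed placeholders, not values from the report.
NODE=${NODE:-127.0.0.1}
PORT=${PORT:-30080}
ATTEMPTS=${ATTEMPTS:-64}

# loss_pct FAILED TOTAL -> integer percentage of failed attempts
loss_pct() {
  echo $(( $1 * 100 / $2 ))
}

failed=0
for _ in $(seq "$ATTEMPTS"); do
  # --max-time 2: count anything slower than 2 seconds as a lost connection
  curl -s --max-time 2 -o /dev/null "http://$NODE:$PORT" || failed=$((failed + 1))
done
echo "lost $failed/$ATTEMPTS requests ($(loss_pct "$failed" "$ATTEMPTS")% loss)"
```

On a healthy cluster this should report close to 0% loss; the symptom described in this issue would show up as a high percentage instead.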
Anything else we need to know?:
I have tried multiple fixes, including changing the pod CIDR range so that it does not overlap any other IP range on the network; I have tried ranges inside 10.0.0.0/8 and 192.168.0.0/16. The problem has appeared on three different cluster setups.
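The pod CIDR actually assigned to each node can be checked against the LAN's addressing with a one-liner; a sketch, run wherever kubectl is configured for the cluster:

```shell
# Print each node's name and its assigned pod CIDR, one per line,
# so the ranges can be compared against the physical network.
# `|| true` tolerates running on a machine without cluster access.
JSONPATH='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'
kubectl get nodes -o jsonpath="$JSONPATH" || true
```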
I have read through the following blog posts and verified that the mitigations they present are already in place (i.e. iptables already drops INVALID packets, and --random-fully is already enabled):
https://kubernetes.io/blog/2019/03/29/kube-proxy-subtleties-debugging-an-intermittent-connection-reset/
https://github.blog/2019-11-21-debugging-network-stalls-on-kubernetes/
https://tech.xing.com/a-reason-for-unexplained-connection-timeouts-on-kubernetes-docker-abd041cf7e02
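Those two mitigations can be re-verified quickly from the ruleset dump. A sketch, run as root on the worker node; the rule fragments greped for are the ones the posts above describe, not rules quoted from this cluster:

```shell
# 1. Is there a rule matching conntrack-INVALID packets in the ruleset
#    (the kube-proxy mitigation from the first post above)?
iptables-save 2>/dev/null | grep -- '--ctstate INVALID' \
  || echo "no INVALID-state rule found (or insufficient privileges)"

# 2. Do the NAT masquerade rules use fully randomized source ports
#    (the conntrack-race mitigation from the third post)?
iptables-save -t nat 2>/dev/null | grep -- '--random-fully' \
  || echo "no --random-fully rules found (or insufficient privileges)"
```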
Environment:
- Kubernetes version (use kubectl version):
- Cloud provider or hardware configuration: 1 master node and 4 worker nodes running on xcp-ng on an AMD Epyc 7401p
- OS (e.g. cat /etc/os-release):
- Kernel (e.g. uname -a): Linux master 4.19.0-10-amd64 #1 SMP Debian 4.19.132-1 (2020-07-24) x86_64 GNU/Linux