Pod service endpoint unreachable from same host #87426
Comments
/sig network |
/triage unresolved 🤖 I am a bot run by vllry. 👩‍🔬 |
This is likely not a k8s fault but an effect of how DNS resolution works. I have had this problem too: DNS queries fail if the DNS server happens to be on the same node. The reply comes back, but it does not have the ClusterIP as source, which was the destination of the query. The local resolver discards the reply since it has an "invalid" source. I think you can fix the problem by specifying
but I have not verified this. BTW I solved my problem by setting up a local coredns in the main netns on all nodes. |
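(For reference, the setting being alluded to here is presumably kube-proxy's masquerade-all, which the rest of this thread discusses; a minimal sketch, not verified against this cluster:)

```sh
# Hedged sketch of the flag form (other kube-proxy flags omitted):
kube-proxy --masquerade-all=true
# The equivalent field in the KubeProxyConfiguration file / ConfigMap (v1alpha1):
#   iptables:
#     masqueradeAll: true
```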
Now I have tested and yes, |
/assign @satyasm |
@danwinship: GitHub didn't allow me to assign the following users: satyasm. Note that only kubernetes members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
/assign @satyasm |
This problem occurs when a CNI plugin that provides L2 connectivity between pods on the same node is used, like the "bridge" CNI plugin used in this issue (which I am also using). When I switch to Calico, which uses L3 and routes traffic back through the node, it works fine. The bridge CNI plugin sends the reply directly to the client pod on the same node (L2) and thus bypasses the connection tracker in the main netns. |
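(A quick way to see which of the two cases applies, sketched with an assumed pod name "client" and the addresses from this issue; the route output is only indicative:)

```sh
# With an L2 plugin such as bridge, other pods on the node are on-link, so a
# reply can go pod-to-pod over the bridge without entering the host's conntrack:
kubectl exec client -- ip route
#   default via 10.200.1.1 dev eth0
#   10.200.1.0/24 dev eth0 scope link   <- same-node pods reached directly (L2)

# With an L3 plugin such as Calico, everything is routed via the node, so the
# reply always passes the main netns and gets un-DNAT'ed:
kubectl exec client -- ip route
#   default via 169.254.1.1 dev eth0
#   169.254.1.1 dev eth0 scope link
```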
Thanks for the reply @uablrek. Unfortunately
What do you think of my solution to assign a |
@satyasm If you aren't able to handle this issue, consider unassigning yourself and/or adding the 🤖 I am a bot run by vllry. 👩🔬 |
Here is a sequence-diagram to illustrate the problem (plantuml source masquerade-all.puml.txt). Without masqueradeAll;
With masqueradeAll;
For CNI-plugins that do not provide L2 connectivity within the k8s node there is no problem. Here is the setup inside a POD with "Calico";
|
ping @satyasm |
/unassign @satyasm |
@satyasm: The label(s) In response to this:
I have not had a chance to look at this or add any more details on top of what is already in the discussion above. Unassigning myself to remove confusion so others can help. Thanks! |
@eraclitux There's something weird going on. The podCIDR with /24 is per-node. You do not want to change that to /32. Your tcpdump shows the SERVFAIL from the other pod, so it does seem to be making a connection. I also reject the assertion that DNS on the same node doesn't work. It works just fine for me, and has worked forever. Running an interactive pod on the same node as the only DNS replica:
So there's SOMETHING ELSE happening. The conntrack record seems correct - why is it not reversing the tracking? |
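(For anyone digging into the same question, a sketch of how to inspect the node's conntrack table; requires conntrack-tools, and the addresses are examples, with 10.32.0.10 standing in for the DNS ClusterIP:)

```sh
# List UDP conntrack entries for DNS in the node's main netns:
sudo conntrack -L -p udp --dport 53
# A healthy DNAT'ed exchange shows a reply tuple pointing back at the client:
#   udp 17 ... src=10.200.1.36 dst=10.32.0.10 sport=44123 dport=53 \
#       src=10.200.1.37 dst=10.200.1.36 sport=53 dport=44123
# If the entry stays [UNREPLIED], the reply never traversed the main netns:
# it took the L2 shortcut, so conntrack never got a chance to reverse the NAT.
```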
DNS queries to a server on the same node do not work for the same reason that TCP doesn't work: the query is sent to the DNS service address, but the reply arrives with the pod address as source (because it is not NAT'ed back) and is rejected by the local resolver. But the reply does arrive. |
I have observed something; Even with an L2 network I do not see this problem when I install with |
Here is a tcpdump for a DNS query from a POD when the DNS-server POD is on the same node with L2 networking (cni-bridge) and masquerade-all=false. The DNS ClusterIP is 12.0.0.5;
And the corresponding conntrack entry in main netns on the same node;
|
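(The captures themselves are not reproduced here; a sketch of how to take an equivalent one, with the bridge name cnio0 and pod name "client" as assumptions:)

```sh
# In one terminal on the worker node, watch DNS traffic on the CNI bridge:
sudo tcpdump -ni cnio0 port 53
# In another terminal, query the DNS service from a pod on the same node:
kubectl exec client -- nslookup kubernetes.default 12.0.0.5
# With masquerade-all=false the query is DNAT'ed to the DNS pod's address, but
# the reply leaves the DNS pod with its pod IP as source and reaches the client
# over the bridge without being rewritten back to 12.0.0.5, so the client's
# resolver discards it.
sudo conntrack -L -p udp --dport 53    # the matching entry in the main netns
```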
Exactly @uablrek, as shown by my
Can you elaborate on this, @thockin? It seems that assigning a |
The same with masquerade-all=true;
Corresponding conntrack entry in main netns on the same node;
Where |
@eraclitux You can't assign a /32 address in an L2 network. Packets will likely go out from the POD if you set a default route to eth0, which in this case is a veth, i.e. point-to-point. But nothing will find its way back. You must then set corresponding routes in the main netns to the "other side" of the veth. What you have then done is transform the L2 network (using arp) into your own version of an L3 network. Then IMHO you should switch to a maintained CNI-plugin that uses L3, e.g. Calico. BTW, please make another try with masquerade-all and verify that the
in the kube-proxy config file (or ConfigMap if kube-proxy is in a POD). |
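(One way to verify that, assuming kube-proxy runs as a DaemonSet with its configuration in a "kube-proxy" ConfigMap; in a kubernetes-the-hard-way style install the same field lives in the KubeProxyConfiguration file passed via --config on each worker:)

```sh
kubectl -n kube-system get configmap kube-proxy -o yaml | grep -i masquerade
# Expected when the option is actually in effect:
#   masqueradeAll: true
# Also confirm kube-proxy picked it up by checking its flags/config on the node:
ps aux | grep [k]ube-proxy
```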
I believe you that you're seeing this, but I am asserting that this is not what we want and not "acceptable". As you said you are seeing it in some installs and not others, there's clearly SOMETHING misconfigured - it's taking a shortcut and bypassing conntrack, which is not what we need. Looking at kubenet code, I see that we set /proc/sys/net/bridge/bridge-nf-call-iptables - can you check that? I can't reproduce it, so I'm just shooting in the dark...
"podCIDR" is a per-node field, not a per-pod field. If you set the node's podCIDR to /32 it only has 1 IP to use. |
@thockin Bullseye! Pretty good for shooting in the dark 😃 I removed "masquerade-all" and added;
on node start-up, and both TCP to a local POD and DNS queries to a local server work perfectly. I will raise an issue on https://github.com/kelseyhightower/kubernetes-the-hard-way referring to this issue. @eraclitux Please try the sysctl above and forget all about "masquerade-all". If it works, please close this issue. |
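(For completeness, a sketch of the node start-up change being described, inferred from the sysctl named earlier in the thread:)

```sh
sudo modprobe br_netfilter
sudo sysctl -w net.bridge.bridge-nf-call-iptables=1
# Persist across reboots:
echo br_netfilter | sudo tee /etc/modules-load.d/br_netfilter.conf
echo 'net.bridge.bridge-nf-call-iptables = 1' | sudo tee /etc/sysctl.d/99-bridge-nf.conf
```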
When contacting a service IP that maps back to the same pod (hairpin), --masquerade-all is needed anyway. Environment:
POC: Create a nginx pod, then create a Service. |
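(A sketch of that PoC; names are illustrative, and the client may need to be pinned to the nginx pod's node, e.g. with a nodeName override, to exercise the failing same-node path:)

```sh
kubectl run nginx --image=nginx
kubectl expose pod nginx --port=80
CLUSTER_IP=$(kubectl get svc nginx -o jsonpath='{.spec.clusterIP}')
# On an affected cluster this times out when the client lands on the same node
# as nginx, and succeeds from any other node:
kubectl run -it --rm client --image=busybox --restart=Never -- \
  wget -qO- -T 5 "http://$CLUSTER_IP"
```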
What happened:
Establishing TCP/UDP traffic to a ClusterIP fails when connection is load balanced via iptables to a pod on the same host.
What you expected to happen:
conntrack shows that the udp datagram is DNATted to 10.200.1.37 from 10.200.1.36 (the host has podCIDR: "10.200.1.0/24"). From my understanding, because pods have a /24 mask, the reply from .37 doesn't go back through cnio0 but directly to .36, breaking the DNAT. This is the tcpdump that shows that:
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
I'm not able to assign a /32 subnet to pods; both:
don't work. I ended up manually changing the subnet to see if my hypothesis was right. This fixed the problem because it forced the packets to flow back through cnio0 to the DNAT tracked by conntrack:
Am I doing something wrong?
Why is it not possible to assign a /32 subnet to pods?
Is there a cleaner solution?
Even if the conditions are different, the problem could be similar to #87263
Environment:
- Kubernetes version (kubectl version):
- Cloud provider or hardware configuration: Virtualbox VMs
- OS (cat /etc/os-release): Ubuntu 18.04.3 LTS
- Kernel (uname -a): Linux worker-0 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: Manual installation following https://github.com/kelseyhightower/kubernetes-the-hard-way
- Network plugin: L2 networks and linux bridging
- CNI conf:
- iptables-save.txt
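(The actual conf was attached to the issue and is not reproduced here; for readers, a kubernetes-the-hard-way style bridge conf looks roughly like the sketch below, using this node's podCIDR:)

```sh
cat <<'EOF' | sudo tee /etc/cni/net.d/10-bridge.conf
{
  "cniVersion": "0.3.1",
  "name": "bridge",
  "type": "bridge",
  "bridge": "cnio0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "ranges": [[{"subnet": "10.200.1.0/24"}]],
    "routes": [{"dst": "0.0.0.0/0"}]
  }
}
EOF
```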