dnsPolicy in hostNetwork not working as expected #87852
Comments
/sig network
/triage unresolved
🤖 I am a bot run by vllry. 👩‍🔬
/assign @aojea
@rdxmb Based on your description, it seems the Kubernetes DNS Service is not able to resolve or forward the external domain queries.
The fact that this is happening on clusters upgraded from 1.13 makes me wonder if there could be some issue with the cluster DNS upgrade. Which DNS are you using, kube-dns or CoreDNS? Can you check whether there are any errors in the DNS pods? If there are no errors, we should debug the DNS queries as explained here.
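(A sketch of that debugging flow, following the upstream DNS debugging guide; the test image and pod label are the ones the guide used around that time, so treat them as assumptions:)

```sh
# Run a throwaway pod with DNS tools (image from the k8s DNS debugging docs):
kubectl run dnsutils --image=gcr.io/kubernetes-e2e-test-images/dnsutils:1.3 \
  --restart=Never -- sleep 3600

# Can it resolve cluster-internal and external names?
kubectl exec -ti dnsutils -- nslookup kubernetes.default
kubectl exec -ti dnsutils -- nslookup example.com

# Check the DNS pods for errors (CoreDNS pods keep the k8s-app=kube-dns label):
kubectl logs -n kube-system -l k8s-app=kube-dns
```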
I have kind of the same issue.
The logs are pretty clear: using the service IP I can't curl it; using the pod IP I can curl it directly.
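(For concreteness, a minimal sketch of that comparison; `my-svc`, `my-pod`, and port 8080 are placeholders, not values from this report:)

```sh
SVC_IP=$(kubectl get svc my-svc -o jsonpath='{.spec.clusterIP}')
POD_IP=$(kubectl get pod my-pod -o jsonpath='{.status.podIP}')

curl -sv --max-time 5 "http://$POD_IP:8080/"   # succeeds
curl -sv --max-time 5 "http://$SVC_IP:8080/"   # hangs/times out from the node
```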
That sounds more like a connectivity problem, maybe related to the CNI, maybe to iptables... you are not able to access Services from the nodes 🤷‍♂️
If I do a kubectl run and it ends up on the same node: zero issues. I will try pointing the pod's DNS at a pod IP directly and see if that connects.
Didn't work.
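(What "pointing the pod's DNS at a pod IP" could look like; `dnsPolicy: None` plus `dnsConfig` is the stock Kubernetes mechanism for this, and the IP is a placeholder:)

```sh
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: dns-test
spec:
  hostNetwork: true
  dnsPolicy: None
  dnsConfig:
    nameservers:
      - 10.244.0.7   # placeholder: a CoreDNS pod IP, bypassing the Service VIP
  containers:
    - name: test
      image: busybox:1.31
      command: ["sleep", "3600"]
EOF
```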
@aojea If you aren't able to handle this issue, consider unassigning yourself.
🤖 I am a bot run by vllry. 👩‍🔬
Seeing the same thing when trying to run kiam on 1.17; I've seen the issue from at least rc.2 through 1.17.3, but wasn't sure at the time where the issue was. Ticket I logged with the kiam folks: uswitch/kiam#378. As this seems networking-related: we are running the Canal CNI and kube-proxy in IPVS mode. (Correction: we were on Canal, not Calico, which means the mentioned flannel issue is likely at the root of it.)
I did find a workaround: switching flannel to host-gw instead of vxlan.
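(A sketch of that switch, assuming a stock kube-flannel manifest; the ConfigMap name and pod label vary by install:)

```sh
# Change the backend type in flannel's net-conf.json ...
kubectl -n kube-system edit configmap kube-flannel-cfg
#   "Backend": { "Type": "vxlan" }   ->   "Backend": { "Type": "host-gw" }

# ... then restart the flannel pods so they pick up the new config:
kubectl -n kube-system delete pod -l app=flannel
```

Note that host-gw requires all nodes to share an L2 segment, so it is not always a drop-in replacement for vxlan.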
No. The problem seems to be the routing via the kubernetes Service, not DNS itself: querying CoreDNS directly by pod IP answers, BUT the same query via the Service IP does not. See the comparison sketched below.
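(A reconstruction of that kind of test, not the original output; 10.96.0.10 stands in for the coredns Service IP and 10.244.0.7 for a CoreDNS pod IP:)

```sh
dig +short +time=2 kubernetes.default.svc.cluster.local @10.244.0.7   # answers
dig +short +time=2 kubernetes.default.svc.cluster.local @10.96.0.10   # times out
```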
Then... it is not a problem in Kubernetes... it has to be something external or related to the CNI plugin, right?
Maybe, though it would be good to understand what changed in 1.17 that's causing issues with flannel and vxlan, so we get to a root cause.
I agree with you, but having two issues open in parallel will not help focus the investigation, and since this seems a flannel-specific issue I will close this one in favor of flannel-io/flannel#1243. Please feel free to reopen if more CNIs are affected or there is any evidence that this is a Kubernetes issue.
/close
@aojea: Closing this issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I see exactly the same issue on GCP with Calico VXLAN for networking; I had to switch to IPIP to solve it. I didn't have much time to figure out the VXLAN issue, so expect to bump into it in virtual environments that might use VXLAN under the hood. Not sure if it is related to Kubernetes 1.17.0, as I didn't test earlier versions in that environment.
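(For reference, a sketch of that switch using the Calico v3 IPPool fields; the pool name and the availability of calicoctl are assumptions:)

```sh
# Disable VXLAN encapsulation and enable IPIP on the default pool:
calicoctl patch ippool default-ipv4-ippool \
  --patch '{"spec": {"vxlanMode": "Never", "ipipMode": "Always"}}'
```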
I noticed that packets directed to the Service address receive the iptables 0x4000 mark.
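(That 0x4000 mark is, by default, kube-proxy's masquerade mark, applied by the KUBE-MARK-MASQ chain; a sketch for inspecting it on a node:)

```sh
# Where the mark is set:
iptables-save -t nat | grep -E 'KUBE-MARK-MASQ|0x4000'

# Marked packets are SNATed here on their way out:
iptables -t nat -L KUBE-POSTROUTING -n -v
```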
The workaround from issue #88986 also helps.
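(If I recall that thread correctly, the workaround circulating there disables TX checksum offload on flannel's VXLAN interface; a sketch, to be run on every node, with the interface name assumed:)

```sh
# flannel.1 is flannel's VXLAN interface on a stock install.
ethtool -K flannel.1 tx-checksum-ip-generic off
```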
What happened:
In Kubernetes 1.17, pods running with hostNetwork: true are not able to get DNS answers from the coredns Service, especially when using the strongly recommended dnsPolicy: ClusterFirstWithHostNet.
Also, I noticed that the coredns Service does not always seem to be reachable from the host itself.
What you expected to happen:
The coredns Service is reachable from within the pod in the host network, especially when using dnsPolicy: ClusterFirstWithHostNet. Also, the coredns Service is reachable from the host, as it is in Kubernetes 1.15.
How to reproduce it (as minimally and precisely as possible):
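(A minimal sketch, assuming a 1.17 cluster with flannel/vxlan; the pod name, image, and the Service IP 10.96.0.10 are placeholders:)

```sh
# Start a host-network pod with the recommended DNS policy:
kubectl run hostnet-test --image=busybox:1.31 --restart=Never \
  --overrides='{"spec":{"hostNetwork":true,"dnsPolicy":"ClusterFirstWithHostNet"}}' \
  -- sleep 3600

# DNS via the coredns Service times out from inside the pod:
kubectl exec hostnet-test -- nslookup kubernetes.default

# ... and the Service is not reachable from the node either:
nslookup kubernetes.default.svc.cluster.local 10.96.0.10
```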
Anything else we need to know?:
I noticed this on three small clusters running Kubernetes 1.17, each with 1 master and 2 or 3 nodes. Most of them were upgraded from lower Kubernetes versions (e.g. starting from 1.13 -> 1.14 -> 1.15 -> 1.16 -> 1.17).
Environment:
- Kubernetes version (use kubectl version): 1.17
- OS (e.g: cat /etc/os-release):
- Kernel (e.g. uname -a): Linux eins 4.15.0-74-generic #83~16.04.1-Ubuntu SMP Wed Dec 18 04:56:23 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux