
kube-dns 1.14.7 does not resolve cluster services without external dns #169

Closed
tuminoid opened this issue Nov 29, 2017 · 16 comments
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@tuminoid

If the host does not have a working DNS nameserver in /etc/resolv.conf, kube-dns fails to resolve cluster services.

I created a single-machine k8s cluster (1.7.4, kube-dns 1.14.7) in a VM, for a hack lab on a machine that has no internet/intranet connectivity. The host therefore has no working DNS: /etc/resolv.conf lists a DNS IP, but it is blocked by a firewall.

In this case, kube-dns fails to resolve any cluster service in any namespace, including kube-dns itself, unless it is queried by FQDN (kube-dns.kube-system.svc.cluster.local), even though /etc/resolv.conf in the container points correctly to kube-dns and contains the correct search options (kube-system.svc.cluster.local svc.cluster.local cluster.local).

@tuminoid
Author

This is 100% reproducible on a vagrant box too. Just change the nameserver to an IP that doesn't point anywhere.
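
For illustration, a host /etc/resolv.conf like this reproduces the no-route case (192.0.2.1 is an example value from the RFC 5737 TEST-NET-1 documentation range, so queries to it should go nowhere):

# /etc/resolv.conf on the host
# 192.0.2.1 is a TEST-NET-1 documentation address (RFC 5737); queries to it just time out
nameserver 192.0.2.1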

As a side note, if /etc/resolv.conf is missing from the VM, kube-dns won't even start. Does that warrant a separate issue?

@bowei
Member

bowei commented Nov 29, 2017

Can you try creating a pod with an unbound server that serves NXDOMAIN (see this gist) and setting it as the upstream nameserver using the kube-dns configmap?
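
Not the gist itself, but a minimal sketch of that setup, assuming unbound's static local-zone type (names with no local data get NXDOMAIN) and an example Service IP:

# unbound.conf — answer NXDOMAIN for every name
server:
  interface: 0.0.0.0
  access-control: 0.0.0.0/0 allow
  local-zone: "." static

kube-dns would then be pointed at it through its ConfigMap (upstreamNameservers is a documented kube-dns key; the value is a JSON list inside a string):

apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  upstreamNameservers: |
    ["10.254.0.3"]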

@tuminoid
Author

Using the configurations in this gist (please correct me if something is wrong), this does not help. It also doesn't make a difference if nameserver 192.168.200.7 is used in the host's /etc/resolv.conf. I also tried without hostNetwork, no difference. Removing unbound's access-control makes no difference either.

If I use kube-dns 1.9, which we had prior to upgrading to 1.14.7, it just works, no matter what the host's resolv.conf contains.

@tuminoid
Author

Made it work with unbound and upstreamNameservers. It appears that local-zone "." is not valid (or doesn't trigger the right response), but adding local-zone "local." and local-zone "cluster.local." did the trick. I'm now running unbound in a container with clusterIP: 10.254.0.3 and pointing kube-dns upstreamNameservers to that IP.
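
A minimal sketch of that unbound configuration as described above (the exact file is not in this thread, and the static zone type is an assumption):

# unbound.conf
server:
  interface: 0.0.0.0
  access-control: 0.0.0.0/0 allow
  # local-zone "." did not trigger the right response here;
  # declaring the cluster suffixes explicitly did
  local-zone: "local." static
  local-zone: "cluster.local." static

with the kube-dns ConfigMap's upstreamNameservers set to ["10.254.0.3"], the clusterIP of the unbound Service.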

That said, I'd consider needing such tricks a bug, especially since it's a regression from kube-dns 1.9.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 12, 2018
@tuminoid
Author

/remove-lifecycle stale

Still very much a valid issue.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 12, 2018
@mikehollinger

Hitting this as well!

@bowei bowei added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label May 3, 2018
@jjustinwhite

I had a similar issue, but removing hostNetwork fixed it for me, so mine was not the same as above. Figured I'd mention it though, since this post helped me realize that having hostNetwork set to true was what was causing my pods to be unable to resolve the FQDNs.
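
For context: a pod with hostNetwork: true inherits the node's /etc/resolv.conf unless its dnsPolicy is set to ClusterFirstWithHostNet. A minimal sketch of a pod that keeps hostNetwork but still queries kube-dns (pod name and image are example values):

apiVersion: v1
kind: Pod
metadata:
  name: dnstest
spec:
  hostNetwork: true
  # without this, a hostNetwork pod uses the node's resolv.conf
  # and never talks to kube-dns
  dnsPolicy: ClusterFirstWithHostNet
  containers:
  - name: busybox
    image: busybox:1.28
    command: ["sleep", "3600"]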

@tuminoid
Author

Thanks @MrHohn for the ping, but unfortunately kubernetes/kubernetes#67302 does not resolve this issue. With a non-responding DNS server, kube-dns still fails to resolve cluster-local DNS names and just times out. Tested with k8s 1.10.4 and kube-dns 1.14.10.

@chrisohaver
Contributor

chrisohaver commented Aug 15, 2018

kube-dns fails to resolve any cluster service in any namespace, including kube-dns itself, unless queried using FQDN

@tuminoid, how are you executing the queries? For example, if using dig, I recall it does not follow the search path unless you specify +search ...

(edit) Also, recent busybox builds have a "broken" nslookup that does not follow the search path.
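
For example, from inside a pod (illustrative commands):

# dig treats the name as absolute and skips the search path by default
dig kubernetes.default
# +search applies the search list from the pod's /etc/resolv.conf
dig +search kubernetes.default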

@MrHohn
Member

MrHohn commented Aug 15, 2018

With non-responding DNS server, kube-dns still fails to resolve cluster-local DNS names and just times out. Tested with k8s 1.10.4 and kube-dns 1.14.10.

Hmm, I got a different result from what you described; probably our setups are different.

I basically set upstreamNameservers in the kube-dns configmap to [127.0.0.1] so that any external name should be unresolvable. With kubernetes/kubernetes#67302, below is what I got within the cluster.

:~# nslookup kubernetes.default.svc.cluster.local.
Server:         10.0.0.10
Address:        10.0.0.10#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.0.0.1

:~# nslookup google.com
Server:         10.0.0.10
Address:        10.0.0.10#53

** server can't find google.com: REFUSED

cc @hedayat

@tuminoid
Author

tuminoid commented Aug 16, 2018

@chrisohaver I'm using busybox with nslookup, and it works as it should in both the positive and negative tests. I'm aware of dig behaving differently and won't use it for testing.

@MrHohn:
The test case is whether kubernetes and kubernetes.default resolve correctly (test commands are sketched after the list below). kubernetes.default.svc.cluster.local always resolves; there is never an issue with that.

The result is the same whether you set upstreamNameservers in the kube-dns config or put the same directly in /etc/resolv.conf:

  1. An IP with no route to it: each DNS request hangs until timeout, and nothing gets resolved.

  2. 127.0.0.1 (something has a route, but it refuses the connection): instant lookup failure for everything, no timeouts.

  3. No IP address at all: same as above. Note that when upstreamNameservers is empty, it falls back to resolv.conf, which then also has to be empty.

  4. The IP of any DNS server that accepts connections, regardless of query response: everything resolves as it should.
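
A sketch of the positive/negative tests referenced above, from a throwaway pod (busybox:1.28, since newer busybox images ship the broken nslookup noted earlier; the pod name and bogus hostname are example values):

# positive test: a short name should resolve via the search path
kubectl run -it --rm dnstest --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default
# negative test: a bogus external name should fail fast, not hang until timeout
kubectl run -it --rm dnstest --image=busybox:1.28 --restart=Never -- nslookup no-such-host.invalid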

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 14, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 14, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
