
stubDomains problem - upstream server is tcp only thanks to aws elb #81

Closed
chrislovecnm opened this issue May 4, 2017 · 13 comments
Assignees
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@chrislovecnm

chrislovecnm commented May 4, 2017

My configMap for configuring kube-dns is loading, with a stubDomain. Because I am working in EC2, I have to use an ELB, which does not support UDP, only TCP.

Layout

  • k8s 1.6.1, two clusters
  • one in us-east-1 and one in us-west-2
  • a separate dnsmasq server exposes the internal kube-dns on an ELB
  • each cluster has a "dig" container pod set up for testing
  • clusters are set up with different domains based on the region
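For reference, a stubDomains ConfigMap matching this layout might look like the following (a hedged sketch; the domain and ELB address are taken from the logs below, so verify against your own cluster):

```yaml
# Sketch of the kube-dns ConfigMap with a stubDomain pointing at the
# ELB fronting dnsmasq in the other region.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  stubDomains: |
    {"us-east-1-cluster.local": ["52.20.35.216"]}
```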

Logs

I0504 00:03:32.944847       1 sync.go:167] Updated stubDomains to map[us-east-1-cluster.local:[52.20.35.216]]
I0504 00:03:32.945039       1 nanny.go:186] Restarting dnsmasq with new configuration
I0504 00:03:32.945063       1 nanny.go:135] Killing dnsmasq
I0504 00:03:32.945092       1 nanny.go:86] Starting dnsmasq [-k --cache-size=1000 --log-facility=- --server=/us-west-2-cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/in6.arpa/127.0.0.1#10053 --server /us-east-1-cluster.local/52.20.35.216]
I0504 00:03:32.945491       1 nanny.go:111]
W0504 00:03:32.945534       1 nanny.go:112] Got EOF from stderr
I0504 00:03:33.082404       1 nanny.go:111]
W0504 00:03:33.082500       1 nanny.go:112] Got EOF from stdout
I0504 00:03:33.082561       1 nanny.go:108] dnsmasq[11]: started, version 2.76 cachesize 1000
I0504 00:03:33.082626       1 nanny.go:108] dnsmasq[11]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
I0504 00:03:33.082662       1 nanny.go:108] dnsmasq[11]: using nameserver 52.20.35.216#53 for domain us-east-1-cluster.local
I0504 00:03:33.082699       1 nanny.go:108] dnsmasq[11]: using nameserver 127.0.0.1#10053 for domain in6.arpa
I0504 00:03:33.082725       1 nanny.go:108] dnsmasq[11]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0504 00:03:33.082749       1 nanny.go:108] dnsmasq[11]: using nameserver 127.0.0.1#10053 for domain us-west-2-cluster.local
I0504 00:03:33.082822       1 nanny.go:108] dnsmasq[11]: reading /etc/resolv.conf
I0504 00:03:33.082851       1 nanny.go:108] dnsmasq[11]: using nameserver 52.20.35.216#53 for domain us-east-1-cluster.local
I0504 00:03:33.082885       1 nanny.go:108] dnsmasq[11]: using nameserver 127.0.0.1#10053 for domain in6.arpa
I0504 00:03:33.082910       1 nanny.go:108] dnsmasq[11]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0504 00:03:33.082958       1 nanny.go:108] dnsmasq[11]: using nameserver 127.0.0.1#10053 for domain us-west-2-cluster.local
I0504 00:03:33.082985       1 nanny.go:108] dnsmasq[11]: using nameserver 172.60.0.205#53
I0504 00:03:33.083012       1 nanny.go:108] dnsmasq[11]: using nameserver 172.60.0.2#53
I0504 00:03:33.083073       1 nanny.go:108] dnsmasq[11]: read /etc/hosts - 7 addresses

Diagnostics

  1. login to the dig pod in west
  2. dig us-east-1-cluster.local - fails
  3. dig +tcp @52.20.35.216 elb - success (52.20.35.216 is an elb / LoadBalancer service in east fronting dnsmasq)
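The diagnostics above suggest a transport mismatch rather than a malformed query: dnsmasq forwards stub-domain queries over UDP, while the ELB only accepts TCP. A minimal sketch of why the two transports are interchangeable at the message level but not at the socket level, per RFC 1035 (DNS over TCP carries the exact same message, just prefixed with a two-byte length):

```python
import struct

def build_dns_query(name, qtype=1):
    # Minimal DNS query (RFC 1035): 12-byte header + QNAME + QTYPE + QCLASS.
    header = struct.pack(">HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode() for label in name.split(".")
    ) + b"\x00"
    return header + qname + struct.pack(">HH", qtype, 1)  # QTYPE=A, QCLASS=IN

def frame_for_tcp(message):
    # DNS over TCP prefixes the identical message with a 2-byte length
    # field (RFC 1035 section 4.2.2); the payload itself is unchanged.
    return struct.pack(">H", len(message)) + message

udp_payload = build_dns_query("us-east-1-cluster.local")
tcp_payload = frame_for_tcp(udp_payload)
print(len(udp_payload), len(tcp_payload))  # 41 43
```

So `dig +tcp @52.20.35.216` succeeds because the client chose the transport the ELB forwards, while dnsmasq's UDP forwards never reach the upstream at all.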

Ideas?

@chrislovecnm chrislovecnm changed the title stubDomains connection timed out; no servers could be reached stubDomains problem - error: connection timed out; no servers could be reached May 4, 2017
@chrislovecnm
Author

I am assuming this is a TCP vs. UDP problem.

  1. Changed the configMap to use a pod IP, which allows for UDP.
  2. Logged in to the dig pod in west.
  3. dig us-east-1-cluster.local - succeeds

@chrislovecnm chrislovecnm changed the title stubDomains problem - error: connection timed out; no servers could be reached stubDomains problem - upstream server is tcp only thanks to aws elb May 4, 2017
@cmluciano

/sig aws

@k8s-ci-robot
Contributor

@cmluciano: These labels do not exist in this repository: sig/aws.

In response to this:

/sig aws

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@cmluciano

Any ideas on this @justinsb

@bowei
Member

bowei commented May 16, 2017

Do normal resolvers try TCP if UDP fails? I know the truncation code path tries TCP if the reply is too big, but not sure what happens when there is no reply at all...
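For what it's worth, the fallback bowei describes is driven by the TC (truncation) bit in a received UDP response, so a stub resolver that gets no reply at all has nothing to trigger the TCP retry. A small sketch of that check (hypothetical helper, not resolver source):

```python
import struct

def should_retry_over_tcp(response):
    # Stub resolvers fall back to TCP only when the TC (truncation) bit,
    # bit 9 of the flags word, is set in a UDP response (RFC 1035 4.1.1).
    # A timed-out query produces no response, so no TCP retry happens.
    flags = struct.unpack(">H", response[2:4])[0]
    return bool(flags & 0x0200)

truncated = struct.pack(">HHHHHH", 0x1234, 0x8200, 1, 0, 0, 0)  # QR=1, TC=1
normal = struct.pack(">HHHHHH", 0x1234, 0x8000, 1, 0, 0, 0)     # QR=1, TC=0
print(should_retry_over_tcp(truncated), should_retry_over_tcp(normal))  # True False
```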

@chrislovecnm
Author

@cmluciano I am one of the sig-aws maintainers, btw :)

@bowei it is not working with the ELB, which is TCP only, so I would say no.

@bowei bowei self-assigned this May 25, 2017
@chrislovecnm
Author

chrislovecnm commented May 26, 2017

@bowei, to add a bit more clarity:

  • ELB in place, TCP only
  • Outside client executes a dig command, forcing dig to use only TCP
  • The dig command fails

From this workflow, I would say that the DNS pod is not reachable through the ELB. I am able to use the pod locally just fine. Any ideas on debugging?

If the client communicates with the internal cluster DNS IP address it works fine, but the pod IP address is not guaranteed to be stable, so I need an external ELB service.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 25, 2017
@chrislovecnm
Author

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Jan 3, 2018
@prameshj
Contributor

prameshj commented Sep 18, 2019

You can try using nodelocaldns, which connects to upstream servers via TCP by design. It now supports stubDomains configured in kube-dns configmap. Updated image - k8s.gcr.io/k8s-dns-node-cache:1.15.6

You can use a YAML similar to https://github.com/kubernetes/kubernetes/pull/82845/files; you don't need all those changes, just the part that mounts the kube-dns configmap.
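A hedged sketch of just that fragment, for a node-local-dns Pod spec (the volume name and mount path follow the linked PR; verify against the current nodelocaldns manifest):

```yaml
# Mount the kube-dns ConfigMap so node-local-dns picks up stubDomains.
# The ConfigMap is marked optional so the pod starts even without it.
volumes:
  - name: kube-dns-config
    configMap:
      name: kube-dns
      optional: true
# ...and in the container spec:
volumeMounts:
  - name: kube-dns-config
    mountPath: /etc/kube-dns
```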

@prameshj
Contributor

/assign @chrislovecnm

anything more needed here?

@chrislovecnm
Author

I have not tested this, @prameshj, or touched this in a while. I would say close it unless someone wants to recreate it.

@prameshj
Contributor

/close
