
dns can't resolve kubernetes.default and/or cluster.local #66924

Closed
cparjaszewski opened this issue Aug 2, 2018 · 18 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/cli Categorizes an issue or PR as relevant to SIG CLI. sig/network Categorizes an issue or PR as relevant to SIG Network.


@cparjaszewski

cparjaszewski commented Aug 2, 2018

/kind bug
What happened:
I've set up a Kubernetes cluster (v1.11.1) on Ubuntu 18.04:

KubeDNS:

$ kubectl get svc -n kube-system
NAME                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)         AGE
kube-dns               ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP   10d
kubernetes-dashboard   ClusterIP   10.99.230.158    <none>        443/TCP         4d
tiller-deploy          ClusterIP   10.111.190.156   <none>        44134/TCP       8d

Version:

$ kubectl version --short
Client Version: v1.11.1
Server Version: v1.11.1

When I run a busybox pod for testing:

kubectl create -f https://k8s.io/examples/admin/dns/busybox.yaml

I am getting this:

$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server:		10.96.0.10
Address:	10.96.0.10:53

** server can't find kubernetes.default: NXDOMAIN

*** Can't find kubernetes.default: No answer

$ kubectl exec -ti busybox -- nslookup cluster.local
Server:		10.96.0.10
Address:	10.96.0.10:53

** server can't find cluster.local: NXDOMAIN

*** Can't find cluster.local: No answer
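A useful first check at this point is the pod's resolver configuration; a minimal sketch, assuming the same busybox pod as above:

```shell
# Verify the busybox pod's resolver config points at the cluster DNS
# service (10.96.0.10 here) and carries the cluster search domains.
kubectl exec -ti busybox -- cat /etc/resolv.conf
# A healthy pod would typically show something like:
#   nameserver 10.96.0.10
#   search default.svc.cluster.local svc.cluster.local cluster.local
```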

What you expected to happen:

I expect kubernetes.default and cluster.local to be resolved.

How to reproduce it (as minimally and precisely as possible):
Install a new k8s cluster on Ubuntu 18.04 following the official instructions.

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:53:20Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:43:26Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
    Bare metal, OVH, Ubuntu 18.04
  • OS (e.g. from /etc/os-release):
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
  • Kernel (e.g. uname -a):
$ uname -a
Linux kubernetes-slave 4.15.0-29-generic #31-Ubuntu SMP Tue Jul 17 15:39:52 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
  • Others:
    These are my pods:
$ kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name
pod/coredns-78fcdf6894-c4sk8
pod/coredns-78fcdf6894-mzv9t
pod/kube-dns-569b8c4c96-bwwvm

Here are pod logs:

$ kubectl logs --namespace=kube-system kube-dns-569b8c4c96-bwwvm -c sidecar
ERROR: logging before flag.Parse: W0802 17:51:49.028526       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:59054->127.0.0.1:53: read: connection refused
ERROR: logging before flag.Parse: W0802 17:51:54.029062       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:51343->127.0.0.1:53: read: connection refused
ERROR: logging before flag.Parse: W0802 17:51:59.029389       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:58205->127.0.0.1:53: read: connection refused
ERROR: logging before flag.Parse: W0802 17:52:04.029922       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:37475->127.0.0.1:53: read: connection refused
ERROR: logging before flag.Parse: W0802 17:52:09.030484       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:39067->127.0.0.1:53: read: connection refused
ERROR: logging before flag.Parse: W0802 17:52:14.030962       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:38175->127.0.0.1:53: read: connection refused
ERROR: logging before flag.Parse: W0802 17:52:19.031436       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:56535->127.0.0.1:53: read: connection refused
ERROR: logging before flag.Parse: W0802 17:52:24.031820       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:57310->127.0.0.1:53: read: connection refused
ERROR: logging before flag.Parse: W0802 17:52:29.032374       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:37181->127.0.0.1:53: read: connection refused
ERROR: logging before flag.Parse: W0802 17:52:34.032952       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:37284->127.0.0.1:53: read: connection refused
ERROR: logging before flag.Parse: W0802 17:52:39.033511       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:51098->127.0.0.1:53: read: connection refused
ERROR: logging before flag.Parse: W0802 17:52:44.034022       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:36836->127.0.0.1:53: read: connection refused
ERROR: logging before flag.Parse: W0802 17:52:49.034444       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:57543->127.0.0.1:53: read: connection refused
ERROR: logging before flag.Parse: W0802 17:52:54.034865       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:38068->127.0.0.1:53: read: connection refused
ERROR: logging before flag.Parse: W0802 17:52:59.035304       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:59394->127.0.0.1:53: read: connection refused
ERROR: logging before flag.Parse: W0802 17:53:04.035717       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:36127->127.0.0.1:53: read: connection refused
ERROR: logging before flag.Parse: W0802 17:53:09.036246       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:42850->127.0.0.1:53: read: connection refused
ERROR: logging before flag.Parse: W0802 17:53:14.036602       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:43571->127.0.0.1:53: read: connection refused
ERROR: logging before flag.Parse: W0802 17:53:19.037163       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:45439->127.0.0.1:53: read: connection refused
ERROR: logging before flag.Parse: W0802 17:53:24.037654       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:35007->127.0.0.1:53: read: connection refused
ERROR: logging before flag.Parse: W0802 17:53:29.038002       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:46336->127.0.0.1:53: read: connection refused
ERROR: logging before flag.Parse: W0802 17:53:34.038500       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:50540->127.0.0.1:53: read: connection refused
$ kubectl logs --namespace=kube-system kube-dns-569b8c4c96-bwwvm -c dnsmasq
I0802 17:53:35.100942       1 main.go:76] opts: {{/usr/sbin/dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053] true} /etc/k8s/dns/dnsmasq-nanny 10000000000}
I0802 17:53:35.101079       1 nanny.go:86] Starting dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053]
I0802 17:53:35.336808       1 nanny.go:111]
W0802 17:53:35.336832       1 nanny.go:112] Got EOF from stdout
I0802 17:53:35.336849       1 nanny.go:108] dnsmasq[18]: started, version 2.78-security-prerelease cachesize 1000
I0802 17:53:35.336870       1 nanny.go:108] dnsmasq[18]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
I0802 17:53:35.336877       1 nanny.go:108] dnsmasq[18]: using nameserver 127.0.0.1#10053 for domain ip6.arpa
I0802 17:53:35.336880       1 nanny.go:108] dnsmasq[18]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0802 17:53:35.336883       1 nanny.go:108] dnsmasq[18]: using nameserver 127.0.0.1#10053 for domain cluster.local
I0802 17:53:35.336887       1 nanny.go:108] dnsmasq[18]: reading /etc/resolv.conf
I0802 17:53:35.336895       1 nanny.go:108] dnsmasq[18]: using nameserver 127.0.0.1#10053 for domain ip6.arpa
I0802 17:53:35.336901       1 nanny.go:108] dnsmasq[18]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0802 17:53:35.336907       1 nanny.go:108] dnsmasq[18]: using nameserver 127.0.0.1#10053 for domain cluster.local
I0802 17:53:35.336912       1 nanny.go:108] dnsmasq[18]: using nameserver 10.125.211.1#53
I0802 17:53:35.336917       1 nanny.go:108] dnsmasq[18]: using nameserver 10.96.0.10#53
I0802 17:53:35.336922       1 nanny.go:108] dnsmasq[18]: using nameserver 213.186.33.99#53
I0802 17:53:35.336939       1 nanny.go:108] dnsmasq[18]: read /etc/hosts - 7 addresses
$ kubectl logs --namespace=kube-system kube-dns-569b8c4c96-bwwvm -c kubedns
I0802 17:49:38.070785       1 dns.go:48] version: 1.14.4-2-g5584e04
I0802 17:49:38.071345       1 server.go:66] Using configuration read from ConfigMap: kube-system:kube-dns
I0802 17:49:38.071371       1 server.go:113] FLAG: --alsologtostderr="false"
I0802 17:49:38.071379       1 server.go:113] FLAG: --config-dir=""
I0802 17:49:38.071383       1 server.go:113] FLAG: --config-map="kube-dns"
I0802 17:49:38.071387       1 server.go:113] FLAG: --config-map-namespace="kube-system"
I0802 17:49:38.071390       1 server.go:113] FLAG: --config-period="10s"
I0802 17:49:38.071394       1 server.go:113] FLAG: --dns-bind-address="0.0.0.0"
I0802 17:49:38.071397       1 server.go:113] FLAG: --dns-port="10053"
I0802 17:49:38.071402       1 server.go:113] FLAG: --domain="cluster.local."
I0802 17:49:38.071406       1 server.go:113] FLAG: --federations=""
I0802 17:49:38.071410       1 server.go:113] FLAG: --healthz-port="8081"
I0802 17:49:38.071413       1 server.go:113] FLAG: --initial-sync-timeout="1m0s"
I0802 17:49:38.071416       1 server.go:113] FLAG: --kube-master-url=""
I0802 17:49:38.071420       1 server.go:113] FLAG: --kubecfg-file=""
I0802 17:49:38.071422       1 server.go:113] FLAG: --log-backtrace-at=":0"
I0802 17:49:38.071428       1 server.go:113] FLAG: --log-dir=""
I0802 17:49:38.071433       1 server.go:113] FLAG: --log-flush-frequency="5s"
I0802 17:49:38.071440       1 server.go:113] FLAG: --logtostderr="true"
I0802 17:49:38.071445       1 server.go:113] FLAG: --nameservers=""
I0802 17:49:38.071452       1 server.go:113] FLAG: --stderrthreshold="2"
I0802 17:49:38.071457       1 server.go:113] FLAG: --v="2"
I0802 17:49:38.071464       1 server.go:113] FLAG: --version="false"
I0802 17:49:38.071474       1 server.go:113] FLAG: --vmodule=""
I0802 17:49:38.071525       1 server.go:176] Starting SkyDNS server (0.0.0.0:10053)
I0802 17:49:38.071749       1 server.go:198] Skydns metrics enabled (/metrics:10055)
I0802 17:49:38.071757       1 dns.go:147] Starting endpointsController
I0802 17:49:38.071761       1 dns.go:150] Starting serviceController
I0802 17:49:38.071836       1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0802 17:49:38.071855       1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0802 17:49:38.082493       1 sync_configmap.go:107] ConfigMap kube-system:kube-dns was created
I0802 17:49:38.581981       1 dns.go:171] Initialized services and endpoints from apiserver
I0802 17:49:38.582016       1 server.go:129] Setting up Healthz Handler (/readiness)
I0802 17:49:38.582031       1 server.go:134] Setting up cache handler (/cache)
I0802 17:49:38.582045       1 server.go:120] Status HTTP port 8081
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. kind/bug Categorizes issue or PR as related to a bug. labels Aug 2, 2018
@neolit123
Member

/sig cli
/sig network

@k8s-ci-robot k8s-ci-robot added sig/cli Categorizes an issue or PR as relevant to SIG CLI. sig/network Categorizes an issue or PR as relevant to SIG Network. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 3, 2018
@cparjaszewski
Author

Actually, the same problem occurs on the newest macOS (docker-for-desktop) when I run a busybox pod and nslookup the domains:

** Can't find kubernetes.default: ***
** Can't find cluster.local: ***

@chrisohaver
Contributor

chrisohaver commented Aug 3, 2018

ERROR: logging before flag.Parse: W0802 17:51:49.028526 1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:59054->127.0.0.1:53: read: connection refused

This means that the health (sidecar) container is getting no response from the dnsmasq container. Since that traffic all goes via local addresses within the same pod, it shouldn't be a k8s-level networking issue; it implies something is broken with the dnsmasq container.

Edit: Actually, looking at the timestamps on the logs, it's not so clear: all of the connection-refused messages were logged before dnsmasq was listening, so those messages are expected. Presumably they stopped after 17:53:35?
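One way to confirm that (a sketch, reusing the pod name from this thread) is to re-check the sidecar's most recent log lines now that dnsmasq has started:

```shell
# If the connection-refused warnings were only startup noise, the last
# few sidecar log lines should be free of them once dnsmasq is up.
kubectl -n kube-system logs kube-dns-569b8c4c96-bwwvm -c sidecar --tail=20
```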

@cparjaszewski
Author

cparjaszewski commented Aug 3, 2018

How about my macOS (docker-for-desktop) issue? Do you know why I'm getting the Can't find: [..] No answer response?

$ kubectl -n default exec -ti busybox nslookup kubernetes.default

Server:		10.96.0.10
Address:	10.96.0.10:53

** server can't find kubernetes.default: NXDOMAIN

*** Can't find kubernetes.default: No answer

$ kubectl -n default exec -ti busybox nslookup svc.cluster.local

Address:	10.96.0.10:53

** server can't find svc.cluster.local: NXDOMAIN

*** Can't find svc.cluster.local: No answer

$ kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name:

pod/kube-dns-86f4d74b45-b4dd8

$ kubectl logs --namespace=kube-system kube-dns-86f4d74b45-b4dd8 -c kubedns:

I0728 05:57:29.240435       1 dns.go:48] version: 1.14.8
I0728 05:57:29.242195       1 server.go:71] Using configuration read from directory: /kube-dns-config with period 10s
I0728 05:57:29.242734       1 server.go:119] FLAG: --alsologtostderr="false"
I0728 05:57:29.243178       1 server.go:119] FLAG: --config-dir="/kube-dns-config"
I0728 05:57:29.243466       1 server.go:119] FLAG: --config-map=""
I0728 05:57:29.243638       1 server.go:119] FLAG: --config-map-namespace="kube-system"
I0728 05:57:29.243803       1 server.go:119] FLAG: --config-period="10s"
I0728 05:57:29.243970       1 server.go:119] FLAG: --dns-bind-address="0.0.0.0"
I0728 05:57:29.244136       1 server.go:119] FLAG: --dns-port="10053"
I0728 05:57:29.244309       1 server.go:119] FLAG: --domain="cluster.local."
I0728 05:57:29.244450       1 server.go:119] FLAG: --federations=""
I0728 05:57:29.244620       1 server.go:119] FLAG: --healthz-port="8081"
I0728 05:57:29.244774       1 server.go:119] FLAG: --initial-sync-timeout="1m0s"
I0728 05:57:29.244988       1 server.go:119] FLAG: --kube-master-url=""
I0728 05:57:29.245240       1 server.go:119] FLAG: --kubecfg-file=""
I0728 05:57:29.245382       1 server.go:119] FLAG: --log-backtrace-at=":0"
I0728 05:57:29.245435       1 server.go:119] FLAG: --log-dir=""
I0728 05:57:29.245774       1 server.go:119] FLAG: --log-flush-frequency="5s"
I0728 05:57:29.245935       1 server.go:119] FLAG: --logtostderr="true"
I0728 05:57:29.246061       1 server.go:119] FLAG: --nameservers=""
I0728 05:57:29.246217       1 server.go:119] FLAG: --stderrthreshold="2"
I0728 05:57:29.246346       1 server.go:119] FLAG: --v="2"
I0728 05:57:29.246561       1 server.go:119] FLAG: --version="false"
I0728 05:57:29.246724       1 server.go:119] FLAG: --vmodule=""
I0728 05:57:29.246916       1 server.go:201] Starting SkyDNS server (0.0.0.0:10053)
I0728 05:57:29.247653       1 server.go:220] Skydns metrics enabled (/metrics:10055)
I0728 05:57:29.247906       1 dns.go:146] Starting endpointsController
I0728 05:57:29.248149       1 dns.go:149] Starting serviceController
I0728 05:57:29.248673       1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0728 05:57:29.248849       1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0728 05:57:29.796767       1 dns.go:170] Initialized services and endpoints from apiserver
I0728 05:57:29.797167       1 server.go:135] Setting up Healthz Handler (/readiness)
I0728 05:57:29.797257       1 server.go:140] Setting up cache handler (/cache)
I0728 05:57:29.798125       1 server.go:126] Status HTTP port 8081

$ kubectl logs --namespace=kube-system kube-dns-86f4d74b45-b4dd8 -c dnsmasq:

I0728 05:58:10.959380       1 main.go:76] opts: {{/usr/sbin/dnsmasq [-k --cache-size=1000 --no-negcache --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053] true} /etc/k8s/dns/dnsmasq-nanny 10000000000}
I0728 05:58:10.959543       1 nanny.go:94] Starting dnsmasq [-k --cache-size=1000 --no-negcache --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053]
I0728 05:58:11.198627       1 nanny.go:116] dnsmasq[10]: started, version 2.78 cachesize 1000
I0728 05:58:11.198810       1 nanny.go:116] dnsmasq[10]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
I0728 05:58:11.198975       1 nanny.go:116] dnsmasq[10]: using nameserver 127.0.0.1#10053 for domain ip6.arpa
I0728 05:58:11.199114       1 nanny.go:116] dnsmasq[10]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0728 05:58:11.199239       1 nanny.go:116] dnsmasq[10]: using nameserver 127.0.0.1#10053 for domain cluster.local
I0728 05:58:11.199295       1 nanny.go:116] dnsmasq[10]: reading /etc/resolv.conf
I0728 05:58:11.199407       1 nanny.go:116] dnsmasq[10]: using nameserver 127.0.0.1#10053 for domain ip6.arpa
I0728 05:58:11.199528       1 nanny.go:116] dnsmasq[10]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0728 05:58:11.199623       1 nanny.go:116] dnsmasq[10]: using nameserver 127.0.0.1#10053 for domain cluster.local
I0728 05:58:11.199706       1 nanny.go:116] dnsmasq[10]: using nameserver 192.168.65.1#53
I0728 05:58:11.199748       1 nanny.go:116] dnsmasq[10]: read /etc/hosts - 7 addresses
I0728 05:58:11.199876       1 nanny.go:119]
W0728 05:58:11.200030       1 nanny.go:120] Got EOF from stdout

$ kubectl logs --namespace=kube-system kube-dns-86f4d74b45-b4dd8 -c sidecar:

I0728 05:57:51.268168       1 main.go:51] Version v1.14.8
I0728 05:57:51.268542       1 server.go:45] Starting server (options {DnsMasqPort:53 DnsMasqAddr:127.0.0.1 DnsMasqPollIntervalMs:5000 Probes:[{Label:kubedns Server:127.0.0.1:10053 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:33} {Label:dnsmasq Server:127.0.0.1:53 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:33}] PrometheusAddr:0.0.0.0 PrometheusPort:10054 PrometheusPath:/metrics PrometheusNamespace:kubedns})
I0728 05:57:51.268645       1 dnsprobe.go:75] Starting dnsProbe {Label:kubedns Server:127.0.0.1:10053 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:33}
I0728 05:57:51.269120       1 dnsprobe.go:75] Starting dnsProbe {Label:dnsmasq Server:127.0.0.1:53 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:33}
W0728 05:57:51.270089       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:55056->127.0.0.1:53: read: connection refused
W0728 05:57:56.270841       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:45051->127.0.0.1:53: read: connection refused
W0728 05:58:01.238587       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:50466->127.0.0.1:53: read: connection refused
W0728 05:58:06.239288       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:47205->127.0.0.1:53: read: connection refused
W0802 05:48:17.102974       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:56156->127.0.0.1:53: i/o timeout

@chrisohaver
Contributor

chrisohaver commented Aug 3, 2018

Try querying one of the kube-dns pods directly, to see if it's a network-layer issue, e.g.

kubectl -n default exec -ti busybox nslookup kubernetes.default <ip-address-of-pod>
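The pod IPs can be looked up first; a minimal sketch, assuming the same `k8s-app=kube-dns` label used earlier in this thread:

```shell
# List the DNS pods with their IPs.
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide

# Grab the first pod's IP and query it directly from the busybox pod,
# bypassing the kube-dns service VIP.
POD_IP=$(kubectl -n kube-system get pods -l k8s-app=kube-dns \
  -o jsonpath='{.items[0].status.podIP}')
kubectl -n default exec -ti busybox -- nslookup kubernetes.default "$POD_IP"
```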

@cparjaszewski
Author

cparjaszewski commented Aug 3, 2018

$ kubectl -n kube-system get pods | grep dns:

kube-dns-86f4d74b45-b4dd8                    3/3       Running   0          6d

$ kubectl -n default exec -ti busybox nslookup kubernetes.default 10.1.0.3:

Server:		10.1.0.3
Address:	10.1.0.3:53

** server can't find kubernetes.default: NXDOMAIN

*** Can't find kubernetes.default: No answer

$ kubectl -n default exec -ti busybox nslookup cluster.local 10.1.0.3:

Server:		10.1.0.3
Address:	10.1.0.3:53

** server can't find cluster.local: NXDOMAIN

*** Can't find cluster.local: No answer

$ kubectl -n kube-system describe pod kube-dns-86f4d74b45-b4dd8:

$ kubectl -n kube-system describe pod kube-dns-86f4d74b45-b4dd8
Name:           kube-dns-86f4d74b45-b4dd8
Namespace:      kube-system
Node:           docker-for-desktop/192.168.65.3
Start Time:     Sat, 28 Jul 2018 07:56:47 +0200
Labels:         k8s-app=kube-dns
                pod-template-hash=4290830601
Annotations:    <none>
Status:         Running
IP:             10.1.0.3
Controlled By:  ReplicaSet/kube-dns-86f4d74b45
Containers:
  kubedns:
    Container ID:  docker://579234d28e9a514f721654bd618998b6a519e3da338a5aaacf7e2b187ce5d1fb
    Image:         k8s.gcr.io/k8s-dns-kube-dns-amd64:1.14.8
    Image ID:      docker-pullable://k8s.gcr.io/k8s-dns-kube-dns-amd64@sha256:6d8e0da4fb46e9ea2034a3f4cab0e095618a2ead78720c12e791342738e5f85d
    Ports:         10053/UDP, 10053/TCP, 10055/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      --domain=cluster.local.
      --dns-port=10053
      --config-dir=/kube-dns-config
      --v=2
    State:          Running
      Started:      Sat, 28 Jul 2018 07:57:28 +0200
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  170Mi
    Requests:
      cpu:      100m
      memory:   70Mi
    Liveness:   http-get http://:10054/healthcheck/kubedns delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:  http-get http://:8081/readiness delay=3s timeout=5s period=10s #success=1 #failure=3
    Environment:
      PROMETHEUS_PORT:  10055
    Mounts:
      /kube-dns-config from kube-dns-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-nhjhs (ro)
  dnsmasq:
    Container ID:  docker://c4a8664085b6e3a6c80c77cf6066198e5b75582c3abdb5845fdb4e71ccf4f70e
    Image:         k8s.gcr.io/k8s-dns-dnsmasq-nanny-amd64:1.14.8
    Image ID:      docker-pullable://k8s.gcr.io/k8s-dns-dnsmasq-nanny-amd64@sha256:93c827f018cf3322f1ff2aa80324a0306048b0a69bc274e423071fb0d2d29d8b
    Ports:         53/UDP, 53/TCP
    Host Ports:    0/UDP, 0/TCP
    Args:
      -v=2
      -logtostderr
      -configDir=/etc/k8s/dns/dnsmasq-nanny
      -restartDnsmasq=true
      --
      -k
      --cache-size=1000
      --no-negcache
      --log-facility=-
      --server=/cluster.local/127.0.0.1#10053
      --server=/in-addr.arpa/127.0.0.1#10053
      --server=/ip6.arpa/127.0.0.1#10053
    State:          Running
      Started:      Sat, 28 Jul 2018 07:58:10 +0200
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        150m
      memory:     20Mi
    Liveness:     http-get http://:10054/healthcheck/dnsmasq delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:  <none>
    Mounts:
      /etc/k8s/dns/dnsmasq-nanny from kube-dns-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-nhjhs (ro)
  sidecar:
    Container ID:  docker://096dd9329bac9a7bbbc5c60e45cdc436a87a60f999a07e69ee99a358d6e3c97f
    Image:         k8s.gcr.io/k8s-dns-sidecar-amd64:1.14.8
    Image ID:      docker-pullable://k8s.gcr.io/k8s-dns-sidecar-amd64@sha256:23df717980b4aa08d2da6c4cfa327f1b730d92ec9cf740959d2d5911830d82fb
    Port:          10054/TCP
    Host Port:     0/TCP
    Args:
      --v=2
      --logtostderr
      --probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,SRV
      --probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,SRV
    State:          Running
      Started:      Sat, 28 Jul 2018 07:57:51 +0200
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        10m
      memory:     20Mi
    Liveness:     http-get http://:10054/metrics delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-nhjhs (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          True
  PodScheduled   True
Volumes:
  kube-dns-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-dns
    Optional:  true
  kube-dns-token-nhjhs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kube-dns-token-nhjhs
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>

@chrisohaver
Contributor

I just noticed you are running coredns and kube-dns in parallel...

$ kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name
pod/coredns-78fcdf6894-c4sk8
pod/coredns-78fcdf6894-mzv9t
pod/kube-dns-569b8c4c96-bwwvm

Can you query the coredns pods directly via pod IP?

@cparjaszewski
Author

cparjaszewski commented Aug 3, 2018

On my macOS machine I have only one kube-dns pod:

$ kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name:

pod/kube-dns-86f4d74b45-b4dd8

On my server (Ubuntu 18.04) I have 3 pods:
$ kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name:

pod/coredns-78fcdf6894-c4sk8
pod/coredns-78fcdf6894-mzv9t
pod/kube-dns-569b8c4c96-bwwvm

$ kubectl -n default exec -ti busybox nslookup kubernetes.default 10.244.0.84:

Server:     10.244.0.84
Address:    10.244.0.84:53

** server can't find kubernetes.default: NXDOMAIN

*** Can't find kubernetes.default: No answer

$ kubectl -n default exec -ti busybox nslookup cluster.local 10.244.0.84:

Server:     10.244.0.84
Address:    10.244.0.84:53


*** Can't find cluster.local: No answer

$ kubectl -n default exec -ti busybox nslookup kubernetes.default 10.244.0.82:

Server:     10.244.0.82
Address:    10.244.0.82:53

** server can't find kubernetes.default: NXDOMAIN

*** Can't find kubernetes.default: No answer

$ kubectl -n default exec -ti busybox nslookup cluster.local 10.244.0.82:

Server:     10.244.0.82
Address:    10.244.0.82:53


*** Can't find cluster.local: No answer

More details on these 2 coredns pods on Ubuntu:

$ kubectl -n kube-system describe pod coredns-78fcdf6894-c4sk8:

Name:               coredns-78fcdf6894-c4sk8
Namespace:          kube-system
Priority:           0
PriorityClassName:  <none>
Node:               kubernetes-slave/37.59.16.40
Start Time:         Mon, 23 Jul 2018 13:56:35 +0000
Labels:             k8s-app=kube-dns
                    pod-template-hash=3497892450
Annotations:        <none>
Status:             Running
IP:                 10.244.0.84
Controlled By:      ReplicaSet/coredns-78fcdf6894
Containers:
  coredns:
    Container ID:  docker://e76a47934a878a44158d8fa90bc3c0077fa3f11f8c82eb3c62f9615f24e76337
    Image:         k8s.gcr.io/coredns:1.1.3
    Image ID:      docker-pullable://k8s.gcr.io/coredns@sha256:db2bf53126ed1c761d5a41f24a1b82a461c85f736ff6e90542e9522be4757848
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Running
      Started:      Wed, 01 Aug 2018 14:25:51 +0000
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Wed, 01 Aug 2018 14:24:59 +0000
      Finished:     Wed, 01 Aug 2018 14:25:50 +0000
    Ready:          True
    Restart Count:  2
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-ch7j7 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-ch7j7:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-ch7j7
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                  From                       Message
  ----     ------            ----                 ----                       -------
  Warning  DNSConfigForming  59s (x9409 over 8d)  kubelet, kubernetes-slave  Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.125.211.1 213.186.33.99 127.0.0.1

$ kubectl -n kube-system describe pod coredns-78fcdf6894-mzv9t:

Name:               coredns-78fcdf6894-mzv9t
Namespace:          kube-system
Priority:           0
PriorityClassName:  <none>
Node:               kubernetes-slave/37.59.16.40
Start Time:         Mon, 23 Jul 2018 13:56:35 +0000
Labels:             k8s-app=kube-dns
                    pod-template-hash=3497892450
Annotations:        <none>
Status:             Running
IP:                 10.244.0.82
Controlled By:      ReplicaSet/coredns-78fcdf6894
Containers:
  coredns:
    Container ID:  docker://03aee6a3ae008ddbe14ec6ad14190ab9bc6e6e20ff492d4d3cd903cd55d89bf1
    Image:         k8s.gcr.io/coredns:1.1.3
    Image ID:      docker-pullable://k8s.gcr.io/coredns@sha256:db2bf53126ed1c761d5a41f24a1b82a461c85f736ff6e90542e9522be4757848
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Running
      Started:      Wed, 01 Aug 2018 14:25:23 +0000
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 23 Jul 2018 13:56:38 +0000
      Finished:     Wed, 01 Aug 2018 14:24:56 +0000
    Ready:          True
    Restart Count:  1
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-ch7j7 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-ch7j7:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-ch7j7
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                 From                       Message
  ----     ------            ----                ----                       -------
  Warning  DNSConfigForming  2m (x9382 over 8d)  kubelet, kubernetes-slave  Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.125.211.1 213.186.33.99 127.0.0.1
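As an aside, the DNSConfigForming warning above fires because glibc honours at most three `nameserver` entries, so the kubelet drops the extras (and a `127.0.0.1` entry can make CoreDNS loop back on itself). One possible fix, sketched here with hypothetical paths and the upstream addresses from the warning, is to hand the kubelet a trimmed resolv.conf:

```
# /etc/kubernetes/resolv.conf (hypothetical path): at most three nameservers,
# and no 127.0.0.1 loopback entry that CoreDNS would forward to itself
nameserver 10.125.211.1
nameserver 213.186.33.99
```

and point the kubelet at it via its `--resolv-conf=/etc/kubernetes/resolv.conf` flag, then restart the kubelet.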

@qgicup

qgicup commented Aug 7, 2018

Any solution on this? Also having this problem

@cgebe

cgebe commented Aug 9, 2018

I am facing the same issue, running core-dns in a kubeadm cluster. v1.11.1
This makes StatefulSets difficult to deploy correctly.

@gogene

gogene commented Aug 9, 2018

It looks like DNS inside busybox does not work properly.
At least it works for me with busybox images <= 1.28.4
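For anyone wanting to pin that working image, a minimal test pod might look like this (a sketch; `busybox:1.28.4` is simply the last tag reported to work in this thread):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: busybox
spec:
  containers:
  - name: busybox
    image: busybox:1.28.4  # later tags ship an nslookup that misparses answers
    command: ["sleep", "3600"]
  restartPolicy: Never
```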

@cparjaszewski
Author

cparjaszewski commented Aug 9, 2018

@gogene Ok - version 1.28.4 solves it, works like a charm, thank you. I think we can close this issue.

$ kubectl -n default exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local

Btw. do you know why it can't resolve svc.cluster.local?

$ kubectl -n default exec -ti busybox -- nslookup svc.cluster.local
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

nslookup: can't resolve 'svc.cluster.local'
command terminated with exit code 1

@vicykie

vicykie commented May 31, 2019

It looks like DNS inside busybox does not work properly.
At least it works for me with busybox images <= 1.28.4

THX

reshordling pushed a commit to reshordling/kubescaler that referenced this issue Jun 25, 2019
keshavpeswani added a commit to ExpediaDotCom/haystack that referenced this issue Oct 24, 2019
Pinning busybox version, as DNS in busybox doesn't work in versions > 1.28.4
For more details refer here: kubernetes/kubernetes#66924 (comment)
kapilra pushed a commit to ExpediaDotCom/haystack that referenced this issue Oct 24, 2019
Pinning busybox version, as DNS in busybox doesn't work in versions > 1.28.4
For more details refer here: kubernetes/kubernetes#66924 (comment)
@cgebe

cgebe commented May 27, 2020

In my case it was a missing IP tables rule on a dedicated server. Resolved by executing on the server:

iptables -w -P FORWARD ACCEPT

@fzyzcjy

fzyzcjy commented Aug 9, 2020

@gogene P.S. As of August 2020, busybox 1.32.0 still has the nslookup problem. (2 years have passed...)

@plantegg

plantegg commented Sep 17, 2020

There are two reasons behind this issue:

  1. the nslookup applet in busybox:latest cannot extract the record from the DNS response UDP packet (the DNS server does return the correct IP)
  2. `options ndots:5` in /etc/resolv.conf causes further problems when the domain contains many dots. After I changed ndots from 5 to 7, I get:
/ # nslookup -debug -timeout=2 mysql-0.mysql.default.svc.cluster.local. 
Server:		10.68.0.2
Address:	10.68.0.2:53

Query #0 completed in 1ms:
Name:	mysql-0.mysql.default.svc.cluster.local
Address: 172.20.185.197

*** Can't find mysql-0.mysql.default.svc.cluster.local.: No answer

/ # cat /etc/resolv.conf 
nameserver 10.68.0.2
search default.svc.cluster.local. svc.cluster.local. cluster.local.
options ndots:7
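Note that the FQDN queried above ends in a trailing dot, which makes the name absolute and bypasses the search list regardless of ndots. The mechanics can be sketched as follows (an assumption: simplified glibc-style resolver logic, ignoring timeouts, caching, and NXDOMAIN handling):

```python
def candidate_queries(name, search_domains, ndots):
    """Return the DNS names a resolver would try, in order, for `name`."""
    if name.endswith("."):            # fully qualified: search list is skipped
        return [name]
    if name.count(".") >= ndots:      # "enough" dots: try the name as-is first
        return [name] + [f"{name}.{d}" for d in search_domains]
    # few dots: walk the search list first, then fall back to the bare name
    return [f"{name}.{d}" for d in search_domains] + [name]

search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

# The trailing dot makes this absolute, so ndots is irrelevant here:
print(candidate_queries("mysql-0.mysql.default.svc.cluster.local.", search, 5))

# A short name like "kubernetes" (fewer than 5 dots) walks the search list:
print(candidate_queries("kubernetes", search, 5))
```

This is why raising ndots mostly changes the order and number of queries sent, not whether a correct answer exists.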

@gaganyaan2

gaganyaan2 commented Mar 7, 2022

I tried @plantegg's solution of adding `options ndots:7` to /etc/resolv.conf, but it had no effect. It seems hit or miss: it gave me a result the first time, but then started returning errors. I retried many times and occasionally got an nslookup result, but only infrequently.

I also ran nslookup with ndots:5, ndots:7, and ndots:10 in a while loop approx. 200 times each, with timeout=2 seconds. Below are the results.

  • ndots:5 = 39/200 nslookup queries succeeded
  • ndots:7 = 22/200 nslookup queries succeeded
  • ndots:10 = 16/200 nslookup queries succeeded

Below is the shell script I used to calculate this result.

echo 'while true; do
nslookup -timeout=2 kubernetes > /dev/null 2>&1
result=$?
if [ "$result" == "0" ]; then
	echo "$(date +%s) : $result : pass" >> /tmp/nslookup_status
else
	echo "$(date +%s) : $result : fail" >> /tmp/nslookup_status
fi
done' > nslookup_status.sh

chmod +x nslookup_status.sh
./nslookup_status.sh &
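Since the pass/fail counts above are tallied from the log by hand, a small helper can summarise it (hypothetical; it assumes the `epoch : code : pass|fail` line format the script writes to /tmp/nslookup_status):

```python
def pass_rate(lines):
    """Count (passed, total) from 'epoch : code : pass|fail' log lines."""
    results = [line.rsplit(":", 1)[-1].strip() for line in lines if line.strip()]
    return results.count("pass"), len(results)

# Example with inline sample lines; in practice read /tmp/nslookup_status
sample = [
    "1646600000 : 0 : pass",
    "1646600001 : 1 : fail",
    "1646600002 : 0 : pass",
]
print(pass_rate(sample))  # -> (2, 3)
```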

busybox-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: "busybox1"
spec:
  containers:
  - image: busybox
    name: busybox
    command: [ "sleep","6000"]
  dnsConfig:
    options:
      - name: ndots
        value: "7"

busybox Image hash : busybox:latest@sha256:34c3559bbdedefd67195e766e38cfbb0fcabff4241dbee3f390fd6e3310f5ebc

@guettli
Contributor

guettli commented Mar 16, 2022

Just for the records, I opened a new issue at the bugtracker of busybox: https://bugs.busybox.net/show_bug.cgi?id=14671
