dnsmasq pod CrashLoopBackOff #2027

Open
Siilwyn opened this Issue Oct 3, 2017 · 22 comments

Siilwyn commented Oct 3, 2017

BUG REPORT:

Environment:

Minikube version: v0.22.2

  • OS: Ubuntu 16.04
  • VM Driver: none
  • Install tools: ???
  • Others:
    • Kubernetes version: v1.7.5

What happened:
DNS resolution from inside pods doesn't work; for example, connecting to the GitHub API from a pod returns: Error: getaddrinfo EAI_AGAIN api.github.com.

What you expected to happen:
Hostname resolution to work.

How to reproduce it (as minimally and precisely as possible):

  1. sudo minikube start --vm-driver=none
  2. kubectl create -f busybox.yaml (the busybox pod from the k8s docs; see the sketch below)
  3. kubectl exec -ti busybox -- nslookup kubernetes.default
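
For reference, busybox.yaml from the Kubernetes DNS debugging docs looks roughly like this (a sketch; the exact manifest I used may differ slightly):

apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - name: busybox
    image: busybox
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always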

Step 3 returns:

Server:    10.0.0.10
Address 1: 10.0.0.10

nslookup: can't resolve 'kubernetes.default'

Output of minikube logs (if applicable):
⚠️ It looks like dnsmasq is failing to start; tail from minikube logs:

Oct 03 16:34:18 glooming-asteroid localkube[26499]: I1003 16:34:18.653793   26499 kuberuntime_manager.go:457] Container {Name:dnsmasq Image:gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.4 Command:[] Args:[-v=2 -logtostderr -configDir=/etc/k8s/dns/dnsmasq-nanny -restartDnsmasq=true -- -k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053] WorkingDir: Ports:[{Name:dns HostPort:0 ContainerPort:53 Protocol:UDP HostIP:} {Name:dns-tcp HostPort:0 ContainerPort:53 Protocol:TCP HostIP:}] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:150 scale:-3} d:{Dec:<nil>} s:150m Format:DecimalSI} memory:{i:{value:20971520 scale:0} d:{Dec:<nil>} s:20Mi Format:BinarySI}]} VolumeMounts:[{Name:kube-dns-config ReadOnly:false MountPath:/etc/k8s/dns/dnsmasq-nanny SubPath:} {Name:default-token-dkjg2 ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/healthcheck/dnsmasq,Port:10054,Host:,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:60,TimeoutSeconds:5,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:5,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Oct 03 16:34:18 glooming-asteroid localkube[26499]: I1003 16:34:18.653979   26499 kuberuntime_manager.go:741] checking backoff for container "dnsmasq" in pod "kube-dns-910330662-r7x4d_kube-system(7b11e42b-a79a-11e7-b83c-0090f5ed1486)"
Oct 03 16:34:18 glooming-asteroid localkube[26499]: I1003 16:34:18.654088   26499 kuberuntime_manager.go:751] Back-off 5m0s restarting failed container=dnsmasq pod=kube-dns-910330662-r7x4d_kube-system(7b11e42b-a79a-11e7-b83c-0090f5ed1486)
Oct 03 16:34:18 glooming-asteroid localkube[26499]: E1003 16:34:18.654121   26499 pod_workers.go:182] Error syncing pod 7b11e42b-a79a-11e7-b83c-0090f5ed1486 ("kube-dns-910330662-r7x4d_kube-system(7b11e42b-a79a-11e7-b83c-0090f5ed1486)"), skipping: failed to "StartContainer" for "dnsmasq" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=dnsmasq pod=kube-dns-910330662-r7x4d_kube-system(7b11e42b-a79a-11e7-b83c-0090f5ed1486)"
Oct 03 16:34:32 glooming-asteroid localkube[26499]: I1003 16:34:32.653745   26499 kuberuntime_manager.go:457] Container {Name:dnsmasq Image:gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.4 Command:[] Args:[-v=2 -logtostderr -configDir=/etc/k8s/dns/dnsmasq-nanny -restartDnsmasq=true -- -k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053] WorkingDir: Ports:[{Name:dns HostPort:0 ContainerPort:53 Protocol:UDP HostIP:} {Name:dns-tcp HostPort:0 ContainerPort:53 Protocol:TCP HostIP:}] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:150 scale:-3} d:{Dec:<nil>} s:150m Format:DecimalSI} memory:{i:{value:20971520 scale:0} d:{Dec:<nil>} s:20Mi Format:BinarySI}]} VolumeMounts:[{Name:kube-dns-config ReadOnly:false MountPath:/etc/k8s/dns/dnsmasq-nanny SubPath:} {Name:default-token-dkjg2 ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/healthcheck/dnsmasq,Port:10054,Host:,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:60,TimeoutSeconds:5,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:5,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Oct 03 16:34:32 glooming-asteroid localkube[26499]: I1003 16:34:32.653993   26499 kuberuntime_manager.go:741] checking backoff for container "dnsmasq" in pod "kube-dns-910330662-r7x4d_kube-system(7b11e42b-a79a-11e7-b83c-0090f5ed1486)"
Oct 03 16:34:32 glooming-asteroid localkube[26499]: I1003 16:34:32.654136   26499 kuberuntime_manager.go:751] Back-off 5m0s restarting failed container=dnsmasq pod=kube-dns-910330662-r7x4d_kube-system(7b11e42b-a79a-11e7-b83c-0090f5ed1486)
Oct 03 16:34:32 glooming-asteroid localkube[26499]: E1003 16:34:32.654174   26499 pod_workers.go:182] Error syncing pod 7b11e42b-a79a-11e7-b83c-0090f5ed1486 ("kube-dns-910330662-r7x4d_kube-system(7b11e42b-a79a-11e7-b83c-0090f5ed1486)"), skipping: failed to "StartContainer" for "dnsmasq" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=dnsmasq pod=kube-dns-910330662-r7x4d_kube-system(7b11e42b-a79a-11e7-b83c-0090f5ed1486)"
Oct 03 16:34:44 glooming-asteroid localkube[26499]: I1003 16:34:44.654035   26499 kuberuntime_manager.go:457] Container {Name:dnsmasq Image:gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.4 Command:[] Args:[-v=2 -logtostderr -configDir=/etc/k8s/dns/dnsmasq-nanny -restartDnsmasq=true -- -k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053] WorkingDir: Ports:[{Name:dns HostPort:0 ContainerPort:53 Protocol:UDP HostIP:} {Name:dns-tcp HostPort:0 ContainerPort:53 Protocol:TCP HostIP:}] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:150 scale:-3} d:{Dec:<nil>} s:150m Format:DecimalSI} memory:{i:{value:20971520 scale:0} d:{Dec:<nil>} s:20Mi Format:BinarySI}]} VolumeMounts:[{Name:kube-dns-config ReadOnly:false MountPath:/etc/k8s/dns/dnsmasq-nanny SubPath:} {Name:default-token-dkjg2 ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/healthcheck/dnsmasq,Port:10054,Host:,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:60,TimeoutSeconds:5,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:5,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Oct 03 16:34:44 glooming-asteroid localkube[26499]: I1003 16:34:44.654621   26499 kuberuntime_manager.go:741] checking backoff for container "dnsmasq" in pod "kube-dns-910330662-r7x4d_kube-system(7b11e42b-a79a-11e7-b83c-0090f5ed1486)"
Oct 03 16:34:44 glooming-asteroid localkube[26499]: I1003 16:34:44.655048   26499 kuberuntime_manager.go:751] Back-off 5m0s restarting failed container=dnsmasq pod=kube-dns-910330662-r7x4d_kube-system(7b11e42b-a79a-11e7-b83c-0090f5ed1486)

Anything else we need to know:
Some troubleshooting commands with output:
kubectl get pods --namespace=kube-system -l k8s-app=kube-dns

NAME                       READY     STATUS    RESTARTS   AGE
kube-dns-910330662-r7x4d   3/3       Running   11         20h

kubectl get svc --namespace=kube-system

NAME                   CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
kube-dns               10.0.0.10    <none>        53/UDP,53/TCP   20h
kubernetes-dashboard   10.0.0.193   <nodes>       80:30000/TCP    20h

⚠️ The endpoints list is empty: kubectl get ep kube-dns --namespace=kube-system

NAME       ENDPOINTS   AGE
kube-dns               20h

Tail from kubectl logs --namespace=kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name) -c dnsmasq

I1003 14:43:04.930205     263 nanny.go:108] dnsmasq[280]: Maximum number of concurrent DNS queries reached (max: 150)
I1003 14:43:14.948913     263 nanny.go:108] dnsmasq[280]: Maximum number of concurrent DNS queries reached (max: 150)

Siilwyn commented Oct 3, 2017

I would be very happy with a workaround so I'm not stuck on this issue.


chrisohaver commented Oct 4, 2017

As a workaround, you may try deploying coredns instead of kube-dns. If you do, take care to disable kube-dns in the add-on manager ("minikube addons disable kube-dns").

Deployment: https://github.com/coredns/deployment/tree/master/kubernetes
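
Roughly, assuming the deploy.sh script in that repo (check its README for the exact arguments, which have changed over time):

minikube addons disable kube-dns
git clone https://github.com/coredns/deployment.git
cd deployment/kubernetes
# deploy.sh prints a CoreDNS manifest to stdout
./deploy.sh | kubectl apply -f -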


Siilwyn commented Oct 5, 2017

👋 Thanks for your suggestion @chrisohaver; sadly it didn't solve my issue.

r2d4 added the drv/none label Oct 5, 2017


dlorenc commented Oct 10, 2017

Could you provide any more information about the host you're running the none driver on?


Siilwyn commented Oct 10, 2017

@dlorenc yes, what kind of information would you like?


dlorenc commented Oct 10, 2017

Is it just a stock Ubuntu 16.04 installation? Have you done anything special with the network settings? Is it running on a VM or a physical machine?


Siilwyn commented Oct 10, 2017

Yes, stock Ubuntu 16.04, running on a physical machine (development laptop). Nothing special with the network settings as far as I know...

`ifconfig` output while minikube is running:
docker0   Link encap:Ethernet  HWaddr 02:42:05:86:af:80  
          inet addr:172.17.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::42:5ff:fe86:af80/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:53165 errors:0 dropped:0 overruns:0 frame:0
          TX packets:58600 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:5366119 (5.3 MB)  TX bytes:20857731 (20.8 MB)

enp0s25   Link encap:Ethernet  HWaddr 00:90:f5:ed:14:86  
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:20 Memory:f7e00000-f7e20000 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:1591921 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1591921 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:705999303 (705.9 MB)  TX bytes:705999303 (705.9 MB)

veth4502c39 Link encap:Ethernet  HWaddr e2:e3:46:ed:6d:3a  
          inet6 addr: fe80::e0e3:46ff:feed:6d3a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:26 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 B)  TX bytes:4865 (4.8 KB)

veth5232af3 Link encap:Ethernet  HWaddr 06:2c:3d:85:50:ee  
          inet6 addr: fe80::42c:3dff:fe85:50ee/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:68 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1027 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:22908 (22.9 KB)  TX bytes:128739 (128.7 KB)

vethebfe5aa Link encap:Ethernet  HWaddr ae:77:ef:4f:52:53  
          inet6 addr: fe80::ac77:efff:fe4f:5253/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:42 errors:0 dropped:0 overruns:0 frame:0
          TX packets:63 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:5046 (5.0 KB)  TX bytes:25277 (25.2 KB)

vethf27d244 Link encap:Ethernet  HWaddr 9e:ae:23:6e:04:81  
          inet6 addr: fe80::9cae:23ff:fe6e:481/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:14 errors:0 dropped:0 overruns:0 frame:0
          TX packets:38 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:2257 (2.2 KB)  TX bytes:7789 (7.7 KB)

vethf83f683 Link encap:Ethernet  HWaddr 66:8c:af:9f:6f:08  
          inet6 addr: fe80::648c:afff:fe9f:6f08/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3 errors:0 dropped:0 overruns:0 frame:0
          TX packets:26 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:250 (250.0 B)  TX bytes:4865 (4.8 KB)

vethf85c67f Link encap:Ethernet  HWaddr fa:55:14:f3:72:81  
          inet6 addr: fe80::f855:14ff:fef3:7281/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3 errors:0 dropped:0 overruns:0 frame:0
          TX packets:26 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:250 (250.0 B)  TX bytes:4865 (4.8 KB)

wlp3s0    Link encap:Ethernet  HWaddr b4:b6:76:a2:13:b3  
          inet addr:192.168.3.192  Bcast:192.168.3.255  Mask:255.255.254.0
          inet6 addr: fe80::4166:368b:55ff:fbd2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1300  Metric:1
          RX packets:4290172 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2467188 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:5145628303 (5.1 GB)  TX bytes:270743860 (270.7 MB)

jgoclawski commented Oct 20, 2017

This affects me too: DNS works when using the VirtualBox driver and doesn't when using the none driver.

System:

  • Ubuntu 17.04
  • Kernel 4.10.0-37-generic
  • Docker 17.04.0-ce

Nothing fancy with networking or DNS as far as I know.
cat /etc/resolv.conf

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "systemd-resolve --status" to see details about the actual nameservers.

nameserver 127.0.0.53

I'm using the newest minikube from GitHub.

kubectl exec -ti busybox -- nslookup kubernetes.default hangs with:

Server:    10.0.0.10
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local

The kubedns container seems to start correctly; the logs are the same as with the VirtualBox driver.

dnsmasq also seems to start correctly, but after a while I see messages about the concurrent query limit being reached:

I1020 12:24:30.649214       1 main.go:76] opts: {{/usr/sbin/dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053] true} /etc/k8s/dns/dnsmasq-nanny 10000000000}
I1020 12:24:30.649315       1 nanny.go:86] Starting dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053]
I1020 12:24:30.705616       1 nanny.go:111] 
W1020 12:24:30.705637       1 nanny.go:112] Got EOF from stdout
I1020 12:24:30.705687       1 nanny.go:108] dnsmasq[14]: started, version 2.78-security-prerelease cachesize 1000
I1020 12:24:30.705700       1 nanny.go:108] dnsmasq[14]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
I1020 12:24:30.705702       1 nanny.go:108] dnsmasq[14]: using nameserver 127.0.0.1#10053 for domain ip6.arpa 
I1020 12:24:30.705705       1 nanny.go:108] dnsmasq[14]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa 
I1020 12:24:30.705707       1 nanny.go:108] dnsmasq[14]: using nameserver 127.0.0.1#10053 for domain cluster.local 
I1020 12:24:30.705709       1 nanny.go:108] dnsmasq[14]: reading /etc/resolv.conf
I1020 12:24:30.705711       1 nanny.go:108] dnsmasq[14]: using nameserver 127.0.0.1#10053 for domain ip6.arpa 
I1020 12:24:30.705713       1 nanny.go:108] dnsmasq[14]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa 
I1020 12:24:30.705715       1 nanny.go:108] dnsmasq[14]: using nameserver 127.0.0.1#10053 for domain cluster.local 
I1020 12:24:30.705717       1 nanny.go:108] dnsmasq[14]: using nameserver 127.0.0.53#53
I1020 12:24:30.705720       1 nanny.go:108] dnsmasq[14]: read /etc/hosts - 7 addresses
I1020 12:33:07.139512       1 nanny.go:108] dnsmasq[14]: Maximum number of concurrent DNS queries reached (max: 150)
I1020 12:33:17.149489       1 nanny.go:108] dnsmasq[14]: Maximum number of concurrent DNS queries reached (max: 150)
I1020 12:33:27.158616       1 nanny.go:108] dnsmasq[14]: Maximum number of concurrent DNS queries reached (max: 150)
I1020 12:33:37.164004       1 nanny.go:108] dnsmasq[14]: Maximum number of concurrent DNS queries reached (max: 150)
I1020 12:33:47.176527       1 nanny.go:108] dnsmasq[14]: Maximum number of concurrent DNS queries reached (max: 150)
I1020 12:33:57.188216       1 nanny.go:108] dnsmasq[14]: Maximum number of concurrent DNS queries reached (max: 150)

Logs from the sidecar container:

ERROR: logging before flag.Parse: I1020 11:43:05.394883       1 main.go:48] Version v1.14.4-2-g5584e04
ERROR: logging before flag.Parse: I1020 11:43:05.394935       1 server.go:45] Starting server (options {DnsMasqPort:53 DnsMasqAddr:127.0.0.1 DnsMasqPollIntervalMs:5000 Probes:[{Label:kubedns Server:127.0.0.1:10053 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1} {Label:dnsmasq Server:127.0.0.1:53 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1}] PrometheusAddr:0.0.0.0 PrometheusPort:10054 PrometheusPath:/metrics PrometheusNamespace:kubedns})
ERROR: logging before flag.Parse: I1020 11:43:05.394965       1 dnsprobe.go:75] Starting dnsProbe {Label:kubedns Server:127.0.0.1:10053 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1}
ERROR: logging before flag.Parse: I1020 11:43:05.394995       1 dnsprobe.go:75] Starting dnsProbe {Label:dnsmasq Server:127.0.0.1:53 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1}
ERROR: logging before flag.Parse: W1020 11:43:22.399309       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:50271->127.0.0.1:53: i/o timeout
ERROR: logging before flag.Parse: W1020 11:43:29.399631       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:50260->127.0.0.1:53: i/o timeout
ERROR: logging before flag.Parse: W1020 11:43:36.399957       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:43683->127.0.0.1:53: i/o timeout
ERROR: logging before flag.Parse: W1020 11:43:43.400251       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:38300->127.0.0.1:53: i/o timeout
ERROR: logging before flag.Parse: W1020 11:43:50.400500       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:45071->127.0.0.1:53: i/o timeout
ERROR: logging before flag.Parse: W1020 11:44:04.187002       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:60537->127.0.0.1:53: i/o timeout
...

What's interesting is that switching to coredns helps: it works, but with errors!

kubectl exec -ti busybox -- nslookup kubernetes.default

Server:    10.0.0.10
Address 1: 10.0.0.10 coredns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.0.0.1 kubernetes.default.svc.cluster.local

Logs from kubectl logs --namespace=kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=coredns -o name):

.:53
CoreDNS-011
2017/10/20 12:41:11 [INFO] CoreDNS-011
2017/10/20 12:41:11 [INFO] linux/amd64, go1.9, 1b60688d
linux/amd64, go1.9, 1b60688d
172.17.0.5 - [20/Oct/2017:12:44:25 +0000] "PTR IN 10.0.0.10.in-addr.arpa. udp 40 false 512" NOERROR qr,aa,rd,ra 91 192.043µs
127.0.0.1 - [20/Oct/2017:12:44:27 +0000] "AAAA IN kubernetes.default. udp 36 false 512" SERVFAIL qr,rd 36 2.649373ms
127.0.0.1 - [20/Oct/2017:12:44:27 +0000] "AAAA IN kubernetes.default. udp 36 false 512" SERVFAIL qr,rd 36 4.830272ms
127.0.0.1 - [20/Oct/2017:12:44:27 +0000] "AAAA IN kubernetes.default. udp 36 false 512" SERVFAIL qr,rd 36 8.396219ms
127.0.0.1 - [20/Oct/2017:12:44:27 +0000] "AAAA IN kubernetes.default. udp 36 false 512" SERVFAIL qr,rd 36 8.569993ms
127.0.0.1 - [20/Oct/2017:12:44:27 +0000] "AAAA IN kubernetes.default. udp 36 false 512" SERVFAIL qr,rd 36 6.769748ms
127.0.0.1 - [20/Oct/2017:12:44:27 +0000] "AAAA IN kubernetes.default. udp 36 false 512" SERVFAIL qr,rd 36 9.991105ms
20/Oct/2017:12:44:27 +0000 [ERROR 0 kubernetes.default. AAAA] unreachable backend: no upstream host

[ loooong spam of the same messages with SERVFAIL ]

127.0.0.1 - [20/Oct/2017:12:44:53 +0000] "AAAA IN kubernetes.default. udp 36 false 512" SERVFAIL qr,rd 36 2.990799414s
172.17.0.5 - [20/Oct/2017:12:44:53 +0000] "AAAA IN kubernetes.default. udp 36 false 512" SERVFAIL qr,rd 36 2.990947158s
172.17.0.5 - [20/Oct/2017:12:44:53 +0000] "AAAA IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,rd,ra 115 39.579µs
172.17.0.5 - [20/Oct/2017:12:44:53 +0000] "AAAA IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,rd,ra 107 51.747µs
172.17.0.5 - [20/Oct/2017:12:44:53 +0000] "A IN kubernetes.default. udp 36 false 512" SERVFAIL qr,rd 36 33.742µs
20/Oct/2017:12:44:53 +0000 [ERROR 0 kubernetes.default. A] unreachable backend: no upstream host
172.17.0.5 - [20/Oct/2017:12:44:53 +0000] "A IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,rd,ra 115 54.224µs
172.17.0.5 - [20/Oct/2017:12:44:53 +0000] "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 70 84.575µs
172.17.0.5 - [20/Oct/2017:12:44:53 +0000] "PTR IN 1.0.0.10.in-addr.arpa. udp 39 false 512" NOERROR qr,aa,rd,ra 89 84.99µs
172.17.0.4 - [20/Oct/2017:12:53:07 +0000] "AAAA IN grafana.com.kube-system.svc.cluster.local. udp 59 false 512" NXDOMAIN qr,aa,rd,ra 112 155.98µs
172.17.0.4 - [20/Oct/2017:12:53:07 +0000] "A IN grafana.com.kube-system.svc.cluster.local. udp 59 false 512" NXDOMAIN qr,aa,rd,ra 112 60.314µs
172.17.0.4 - [20/Oct/2017:12:53:07 +0000] "A IN grafana.com.svc.cluster.local. udp 47 false 512" NXDOMAIN qr,aa,rd,ra 100 78.454µs
172.17.0.4 - [20/Oct/2017:12:53:07 +0000] "AAAA IN grafana.com.svc.cluster.local. udp 47 false 512" NXDOMAIN qr,aa,rd,ra 100 49.168µs
172.17.0.4 - [20/Oct/2017:12:53:07 +0000] "A IN grafana.com.cluster.local. udp 43 false 512" NXDOMAIN qr,aa,rd,ra 96 54.093µs
172.17.0.4 - [20/Oct/2017:12:53:07 +0000] "AAAA IN grafana.com.cluster.local. udp 43 false 512" NXDOMAIN qr,aa,rd,ra 96 53.235µs
127.0.0.1 - [20/Oct/2017:12:53:08 +0000] "A IN grafana.com. udp 29 false 512" SERVFAIL qr,rd 29 3.110907ms
20/Oct/2017:12:53:08 +0000 [ERROR 0 grafana.com. A] unreachable backend: no upstream host
127.0.0.1 - [20/Oct/2017:12:53:08 +0000] "A IN grafana.com. udp 29 false 512" SERVFAIL qr,rd 29 4.576001ms
127.0.0.1 - [20/Oct/2017:12:53:08 +0000] "A IN grafana.com. udp 29 false 512" SERVFAIL qr,rd 29 6.993853ms
127.0.0.1 - [20/Oct/2017:12:53:08 +0000] "AAAA IN grafana.com. udp 29 false 512" SERVFAIL qr,rd 29 107.552µs
127.0.0.1 - [20/Oct/2017:12:53:08 +0000] "AAAA IN grafana.com. udp 29 false 512" SERVFAIL qr,rd 29 2.241218ms
20/Oct/2017:12:53:08 +0000 [ERROR 0 grafana.com. AAAA] unreachable backend: no upstream host
127.0.0.1 - [20/Oct/2017:12:53:08 +0000] "A IN grafana.com. udp 29 false 512" SERVFAIL qr,rd 29 8.727964ms
127.0.0.1 - [20/Oct/2017:12:53:08 +0000] "A IN grafana.com. udp 29 false 512" SERVFAIL qr,rd 29 11.217801ms
127.0.0.1 - [20/Oct/2017:12:53:08 +0000] "A IN grafana.com. udp 29 false 512" SERVFAIL qr,rd 29 9.548173ms
127.0.0.1 - [20/Oct/2017:12:53:08 +0000] "AAAA IN grafana.com. udp 29 false 512" SERVFAIL qr,rd 29 2.931659ms
127.0.0.1 - [20/Oct/2017:12:53:08 +0000] "AAAA IN grafana.com. udp 29 false 512" SERVFAIL qr,rd 29 4.744248ms
127.0.0.1 - [20/Oct/2017:12:53:08 +0000] "A IN grafana.com. udp 29 false 512" SERVFAIL qr,rd 29 12.459702ms
127.0.0.1 - [20/Oct/2017:12:53:08 +0000] "A IN grafana.com. udp 29 false 512" SERVFAIL qr,rd 29 14.463761ms
127.0.0.1 - [20/Oct/2017:12:53:08 +0000] "A IN grafana.com. udp 29 false 512" SERVFAIL qr,rd 29 15.324802ms
127.0.0.1 - [20/Oct/2017:12:53:09 +0000] "A IN grafana.com. udp 29 false 512" SERVFAIL qr,rd 29 2.368374381s
127.0.0.1 - [20/Oct/2017:12:53:09 +0000] "A IN grafana.com. udp 29 false 512" SERVFAIL qr,rd 29 2.368445804s
172.17.0.4 - [20/Oct/2017:12:53:09 +0000] "A IN grafana.com. udp 29 false 512" SERVFAIL qr,rd 29 2.368663172s
172.17.0.4 - [20/Oct/2017:12:53:09 +0000] "A IN grafana.com. udp 29 false 512" SERVFAIL qr,rd 29 45.614µs
20/Oct/2017:12:53:09 +0000 [ERROR 0 grafana.com. A] unreachable backend: no upstream host

While those errors are being logged, nslookup takes longer, but it returns the correct result.

kubectl exec -ti busybox -- nslookup monitoring-grafana.kube-system

Server:    10.0.0.10
Address 1: 10.0.0.10 coredns.kube-system.svc.cluster.local

Name:      monitoring-grafana.kube-system
Address 1: 10.0.0.133 monitoring-grafana.kube-system.svc.cluster.local

So kube-dns doesn't work at all, and coredns works but isn't stable.
I'll test with the kubeadm bootstrapper instead of localkube to see how things go.

Edit:
kubeadm with the none driver doesn't seem to work; the cluster doesn't start. I guess that's too many experimental features activated together :) kubeadm generates certificates for 127.0.0.1 and 10.0.0.1, but the components try to use my eth0 interface's IP, 192.168.42.13.

sudo minikube logs

Oct 20 15:43:36 my-host kubelet[20915]: W1020 15:43:36.577232   20915 status_manager.go:431] Failed to get status for pod "kube-controller-manager-my-host_kube-system(653e629f9f8a6d3380c54427dbc4d941)": Get https://192.168.42.13:8443/api/v1/namespaces/kube-system/pods/kube-controller-manager-my-host: x509: certificate is valid for 127.0.0.1, 10.0.0.1, not 192.168.42.13
Oct 20 15:43:36 my-host kubelet[20915]: W1020 15:43:36.581473   20915 status_manager.go:431] Failed to get status for pod "kube-apiserver-my-host_kube-system(81edacb80dc81e85783254aa3d65d40a)": Get https://192.168.42.13:8443/api/v1/namespaces/kube-system/pods/kube-apiserver-my-host: x509: certificate is valid for 127.0.0.1, 10.0.0.1, not 192.168.42.13
Oct 20 15:43:36 my-host kubelet[20915]: E1020 15:43:36.644770   20915 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://192.168.42.13:8443/api/v1/pods?fieldSelector=spec.nodeName%3Dmy-host&resourceVersion=0: x509: certificate is valid for 127.0.0.1, 10.0.0.1, not 192.168.42.13
Oct 20 15:43:37 my-host kubelet[20915]: E1020 15:43:37.056110   20915 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:422: Failed to list *v1.Node: Get https://192.168.42.13:8443/api/v1/nodes?fieldSelector=metadata.name%3Dmy-host&resourceVersion=0: x509: certificate is valid for 127.0.0.1, 10.0.0.1, not 192.168.42.13

jgoclawski commented Oct 20, 2017

OK, I think I've found the reason and the solution.

It's my /etc/resolv.conf; in Ubuntu 17.04 it contains:

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "systemd-resolve --status" to see details about the actual nameservers.

nameserver 127.0.0.53

The kube-dns pod inherits this file from the host (its dnsPolicy is Default), so dnsmasq presumably ends up forwarding to 127.0.0.53, which inside the pod loops straight back to dnsmasq itself; that would explain the "Maximum number of concurrent DNS queries reached" spam. I ran:

sudo systemctl stop systemd-resolved
sudo systemctl disable systemd-resolved

and edited /etc/resolv.conf to contain only:

nameserver 8.8.8.8

After that, the cluster and DNS work!
kubectl get pods --namespace=kube-system -l k8s-app=kube-dns

NAME                        READY     STATUS    RESTARTS   AGE
kube-dns-6fc954457d-p7sd9   3/3       Running   0          3m

kubectl exec -ti busybox -- nslookup kubernetes.default

Server:    10.0.0.10
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.0.0.1 kubernetes.default.svc.cluster.local

I'm not sure whether it's a workaround or a solution, though. This /etc/resolv.conf setup might be the default for Ubuntu. Should the none driver work with the original configuration, or should the configuration be changed?


jgoclawski commented Oct 20, 2017

It's the same issue as kubernetes/kubernetes#45828, so it's not minikube-specific. Unless minikube can implement a workaround so that the none driver works on Ubuntu Desktop out of the box?
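
One possible avenue (untested, and assuming minikube plumbs kubelet flags through its --extra-config option): kubelet has a --resolv-conf flag, so pointing it at systemd-resolved's real upstream file might avoid touching the host:

sudo minikube start --vm-driver=none --extra-config=kubelet.resolv-conf=/run/systemd/resolve/resolv.conf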

@Siilwyn can you check if this is the case for you as well?

r2d4 added the kind/bug label Oct 20, 2017


Siilwyn commented Oct 23, 2017

I can confirm @jgoclawski found the underlying issue and workaround! I think it would be nice if minikube solved this, since editing the system resolver config is not so nice.


John-Lin commented Nov 26, 2017

Same issue here. DNS is not working with the none driver but works when using the VirtualBox driver.

kubectl logs --namespace=kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name) -c dnsmasq

I1126 06:18:35.615019       1 nanny.go:108] dnsmasq[27]: Maximum number of concurrent DNS queries reached (max: 150)
I1126 06:18:45.630824       1 nanny.go:108] dnsmasq[27]: Maximum number of concurrent DNS queries reached (max: 150)
I1126 06:18:55.646374       1 nanny.go:108] dnsmasq[27]: Maximum number of concurrent DNS queries reached (max: 150)
I1126 06:19:05.662111       1 nanny.go:108] dnsmasq[27]: Maximum number of concurrent DNS queries reached (max: 150)
I1126 06:19:15.677589       1 nanny.go:108] dnsmasq[27]: Maximum number of concurrent DNS queries reached (max: 150)
I1126 06:19:25.693250       1 nanny.go:108] dnsmasq[27]: Maximum number of concurrent DNS queries reached (max: 150)
I1126 06:19:35.708908       1 nanny.go:108] dnsmasq[27]: Maximum number of concurrent DNS queries reached (max: 150)
I1126 06:19:45.725092       1 nanny.go:108] dnsmasq[27]: Maximum number of concurrent DNS queries reached (max: 150)
I1126 06:19:55.740858       1 nanny.go:108] dnsmasq[27]: Maximum number of concurrent DNS queries reached (max: 150)
I1126 06:20:05.756598       1 nanny.go:108] dnsmasq[27]: Maximum number of concurrent DNS queries reached (max: 150)
I1126 06:20:15.772095       1 nanny.go:108] dnsmasq[27]: Maximum number of concurrent DNS queries reached (max: 150)
I1126 06:20:25.788192       1 nanny.go:108] dnsmasq[27]: Maximum number of concurrent DNS queries reached (max: 150)
I1126 06:20:35.803835       1 nanny.go:108] dnsmasq[27]: Maximum number of concurrent DNS queries reached (max: 150)
I1126 06:20:45.819531       1 nanny.go:108] dnsmasq[27]: Maximum number of concurrent DNS queries reached (max: 150)
I1126 06:20:55.838375       1 nanny.go:108] dnsmasq[27]: Maximum number of concurrent DNS queries reached (max: 150)

fejta-bot commented Feb 24, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale


a4abhishek commented Mar 19, 2018

I am also facing this issue with --vm-driver=none: a pod fails to establish a connection with a server on the web. Exact error string: Get https://<the-server-name>.io: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers).
I'm waiting for any solution, suggestion, or workaround.

CoreDNS didn't help here.

minikube version: v0.24.0
kubectl version: Client: 1.9.3 Server: 1.8.0

/remove-lifecycle stale


fikin commented Apr 16, 2018

Following up on @jgoclawski's suggestion of disabling systemd-resolved:

When the systemd-networkd service is in use, the only steps needed are:

$ sudo rm /etc/resolv.conf
$ sudo ln -s /run/systemd/resolve/resolv.conf /etc/resolv.conf

With these, the DNS servers injected into minikube are the ones from DHCP.
For reference, take a look at http://xmodulo.com/switch-from-networkmanager-to-systemd-networkd.html


jgoclawski commented Apr 16, 2018

There's also a solution that doesn't involve changing the host system. Instead, you can override the upstream nameservers kube-dns would otherwise take from the host's resolv.conf by applying the following ConfigMap (details: https://kubernetes.io/docs/tasks/administer-cluster/dns-custom-nameservers/#configmap-options):

apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  upstreamNameservers: |-
    ["8.8.8.8", "8.8.4.4"]

aclowkey commented Jun 27, 2018

@jgoclawski This fixed internet access, but now my nodes can't access the services.


altairpearl commented Jul 5, 2018

Hmm, I'm facing the same issue, and unfortunately none of the solutions presented here works for me!

Environment:

Minikube version: v0.25

OS: Ubuntu 18.04
VM Driver: none


bw2 commented Sep 9, 2018

Finally fixed this in my instance:
Google Cloud VM - Ubuntu 18.04
Minikube: v0.28.1
Started with:

sudo -E minikube start --vm-driver=none --kubernetes-version=v1.11.3

The answers above and reading https://zwischenzugs.com/2018/08/06/anatomy-of-a-linux-dns-lookup-part-iv/ really helped.

In my case I needed to run:

sudo rm /etc/resolv.conf && sudo ln -s /run/systemd/resolve/resolv.conf /etc/resolv.conf

and then stop/start minikube.
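
That is, something like:

sudo minikube stop
sudo -E minikube start --vm-driver=none --kubernetes-version=v1.11.3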

For reasons I'm still trying to figure out, setting the coredns upstream to 8.8.8.8 using the config below didn't help:

kind: ConfigMap
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           upstream 8.8.8.8
           fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        proxy . /etc/resolv.conf
        cache 30
        reload
    }
metadata:
  creationTimestamp: 2018-09-09T18:24:22Z
  name: coredns
  namespace: kube-system
  resourceVersion: "198"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns

If anyone knows why ^ doesn't fix DNS resolution issues, I'd appreciate some pointers.

bw2 referenced this issue Sep 9, 2018: setting upstream #2089 (Closed)


johnbelamaric commented Sep 10, 2018

Change upstream 8.8.8.8 back to upstream

Change proxy . /etc/resolv.conf to proxy . 8.8.8.8
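
With both changes, the Corefile block above would read roughly:

.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       upstream
       fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    proxy . 8.8.8.8
    cache 30
    reload
}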

tstromberg changed the title from "DNS not working" to "dnsmasq pod CrashLoopBackOff" Sep 20, 2018


fejta-bot commented Dec 19, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale


fejta-bot commented Jan 18, 2019

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
