Coredns service patch no longer works #72826

Open
GregSilverman opened this Issue Jan 11, 2019 · 35 comments

GregSilverman commented Jan 11, 2019

What happened:

Previously, running kubeadm/kubelet 1.12.1, I had to apply the CoreDNS patch to fix an issue diagnosed as per dns-debugging-resolution:

[gms@thalia0 ~]$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10

nslookup: can't resolve 'kubernetes.default'
command terminated with exit code 1

I applied the fix to CoreDNS, as per "selector of kube-dns svc does not match coredns pod".

Unfortunately, this fix no longer works after upgrading to kubeadm/kubelet 1.13.1.
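
For reference, the fix amounts to patching the kube-dns Service's selector so it matches whatever label the coredns pods actually carry. A sketch (the selector value k8s-app=coredns is an assumption; use the label shown on your pods):

kubectl -n kube-system patch svc kube-dns -p '{"spec":{"selector":{"k8s-app":"coredns"}}}'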

What you expected to happen:

I expect cluster DNS (CoreDNS) to resolve service names such as kubernetes.default reliably.

How to reproduce it (as minimally and precisely as possible):

Noted above.

Anything else we need to know?:

[gms@thalia0 ~]$ kubectl get deployment --namespace=kube-system
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
calico-typha           0/0     0            0           77d
coredns                2/2     2            2           5s
kubernetes-dashboard   1/1     1            1           21d

[gms@thalia0 ~]$ kubectl get services --namespace=kube-system
NAME                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)         AGE
calico-typha           ClusterIP   10.101.212.32    <none>        5473/TCP        77d
kube-dns               ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP   35s
kubernetes-dashboard   ClusterIP   10.106.105.232   <none>        443/TCP         21d

[gms@thalia0 ~]$ kubectl describe svc kube-dns --namespace=kube-system
Name:              kube-dns
Namespace:         kube-system
Labels:            k8s-app=kube-dns
                   kubernetes.io/cluster-service=true
                   kubernetes.io/name=CoreDNS
Annotations:       prometheus.io/port: 9153
                   prometheus.io/scrape: true
Selector:          k8s-app=kube-dns
Type:              ClusterIP
IP:                10.96.0.10
Port:              dns  53/UDP
TargetPort:        53/UDP
Endpoints:         192.168.2.20:53,192.168.3.44:53
Port:              dns-tcp  53/TCP
TargetPort:        53/TCP
Endpoints:         192.168.2.20:53,192.168.3.44:53
Session Affinity:  None
Events:            <none>

[gms@thalia0 ~]$ kubectl describe deployment coredns  --namespace=kube-system
Name:                   coredns
Namespace:              kube-system
CreationTimestamp:      Fri, 11 Jan 2019 09:41:10 -0600
Labels:                 k8s-app=kube-dns
                        kubernetes.io/name=CoreDNS
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               k8s-app=kube-dns
Replicas:               2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  1 max unavailable, 1 max surge
Pod Template:
  Labels:           k8s-app=kube-dns
  Service Account:  coredns
  Containers:
   coredns:
    Image:       coredns/coredns:1.2.2
    Ports:       53/UDP, 53/TCP, 9153/TCP
    Host Ports:  0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
  Volumes:
   config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
OldReplicaSets:  <none>
NewReplicaSet:   coredns-69cbb76ff8 (2/2 replicas created)
Events:
  Type    Reason             Age    From                   Message
  ----    ------             ----   ----                   -------
  Normal  ScalingReplicaSet  2m27s  deployment-controller  Scaled up replica set coredns-69cbb76ff8 to 2

Pods are these:

[gms@thalia0 ~]$ kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME                       READY   STATUS    RESTARTS   AGE
coredns-69cbb76ff8-2px6h   1/1     Running   0          2m
coredns-69cbb76ff8-g4wbd   1/1     Running   0          2m

Logs are as follows (and look problematic, since nothing is being logged, even after adding the log plugin to the Corefile in the coredns ConfigMap):

[gms@thalia0 ~]$ for p in $(kubectl get pods --namespace=kube-system -l k8s-app=coredns -o name); do kubectl logs --namespace=kube-system $p; done
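
(Aside: the loop above selects on k8s-app=coredns, but the pods listed above carry k8s-app=kube-dns, so that selector matches nothing and the loop prints no output. A loop matching the deployment's actual labels would be:

for p in $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name); do kubectl logs --namespace=kube-system $p; done)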

The endpoint is up and running:

[gms@thalia0 ~]$ kubectl get ep kube-dns --namespace=kube-system
NAME       ENDPOINTS                                                       AGE
kube-dns   192.168.1.198:53,192.168.4.57:53,192.168.1.198:53 + 1 more...   6m32s

I tried the section "Are DNS queries being received/processed", but the logs seem off, since nothing is being logged.

Also of note: if I run ./force-update-deployment coredns -n kube-system and delete and recreate the coredns service as outlined above, the network functions fine for a few minutes, but then it fails again.
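
(force-update-deployment is an external helper script; the same effect, a rolling restart of the deployment, can be approximated by patching a throwaway annotation into the pod template. A sketch, with an arbitrary annotation name:

kubectl -n kube-system patch deployment coredns --patch '{"spec":{"template":{"metadata":{"annotations":{"force-restart":"1"}}}}}'   # bump the value on each run to trigger a new rollout)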

How can I get to the logs, given the problem noted above with log output?

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1", GitCommit:"eec55b9ba98609a46fee712359c7b5b365bdd920", GitTreeState:"clean", BuildDate:"2018-12-13T10:39:04Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1", GitCommit:"eec55b9ba98609a46fee712359c7b5b365bdd920", GitTreeState:"clean", BuildDate:"2018-12-13T10:31:33Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}

Also using kubeadm 1.13.1 and the latest Calico CNI.

  • Cloud provider or hardware configuration:

vSphere virtualized machines in a local server farm.

  • OS (e.g. from /etc/os-release):

RHEL 7.6 VM

  • Kernel (e.g. uname -a):

Linux thalia0.domain 3.10.0-957.1.3.el7.x86_64 #1 SMP Thu Nov 15 17:36:42 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

  • Install tools:
  • Others:

/sig network

GregSilverman commented Jan 11, 2019

Okay, got the coredns logs:

[gms@thalia0 ~]$ kubectl logs --namespace=kube-system coredns-69cbb76ff8-2px6h
.:53
2019/01/11 17:30:17 [INFO] CoreDNS-1.2.2
2019/01/11 17:30:17 [INFO] linux/amd64, go1.11, eb51e8b
CoreDNS-1.2.2
linux/amd64, go1.11, eb51e8b
2019/01/11 17:30:17 [INFO] plugin/reload: Running configuration MD5 = 2e2180a5eeb3ebf92a5100ab081a6381
2019/01/11 17:38:53 [INFO] Reloading
2019/01/11 17:38:53 [INFO] plugin/reload: Running configuration MD5 = 0f335252e2f71668b22f1bf6b3ef9c83
2019/01/11 17:38:53 [INFO] Reloading complete
127.0.0.1:40114 - [11/Jan/2019:17:38:53 +0000] 21440 "HINFO IN 1672163436722873690.3303266828883004461. udp 57 false 512" NXDOMAIN qr,rd,ra 133 0.002981497s
[gms@thalia0 ~]$ kubectl logs --namespace=kube-system coredns-69cbb76ff8-g4wbd
.:53
2019/01/11 17:30:16 [INFO] CoreDNS-1.2.2
2019/01/11 17:30:16 [INFO] linux/amd64, go1.11, eb51e8b
CoreDNS-1.2.2
linux/amd64, go1.11, eb51e8b
2019/01/11 17:30:16 [INFO] plugin/reload: Running configuration MD5 = 2e2180a5eeb3ebf92a5100ab081a6381
2019/01/11 17:38:53 [INFO] Reloading
2019/01/11 17:38:53 [INFO] plugin/reload: Running configuration MD5 = 0f335252e2f71668b22f1bf6b3ef9c83
2019/01/11 17:38:53 [INFO] Reloading complete
127.0.0.1:49829 - [11/Jan/2019:17:38:53 +0000] 32182 "HINFO IN 8948521493344047710.4981664226999757184. udp 57 false 512" NXDOMAIN qr,rd,ra 133 0.002764032s
[gms@thalia0 ~]$

GregSilverman commented Jan 11, 2019

/area dns

chrisohaver commented Jan 11, 2019

CoreDNS is logging the loop-probe tests it sends itself (the HINFO requests)... so it does seem to be logging queries. According to the logs, then, it did not receive any other queries during that time.

GregSilverman commented Jan 11, 2019

Strange, since I've run kubectl exec -ti busybox -- nslookup kubernetes.default a boatload of times.

chrisohaver commented Jan 11, 2019

Try sending queries directly to one of the coredns pods, e.g. assuming one is still 192.168.1.198:

kubectl exec -ti busybox -- nslookup kubernetes.default 192.168.1.198

GregSilverman commented Jan 11, 2019

Also, I looked at journalctl and did not see any outstanding errors being thrown.

GregSilverman commented Jan 11, 2019

@chrisohaver, that returns nothing in the logs, and I'm still getting

nslookup: can't resolve 'kubernetes.default'
command terminated with exit code 1

when I do it.

GregSilverman commented Jan 11, 2019

Okay, deleted and recreated coredns and now getting the following:

[gms@thalia0 ~]$ kubectl run -it --rm --image=infoblox/dnstools dns-client
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
If you don't see a command prompt, try pressing enter.
dnstools# dig kubernetes.default.svc.cluster.local

; <<>> DiG 9.11.3 <<>> kubernetes.default.svc.cluster.local
;; global options: +cmd
;; connection timed out; no servers could be reached
dnstools# exit
Session ended, resume using 'kubectl attach dns-client-86c6d59f7-pftmc -c dns-client -i -t' command when the pod is running
deployment.apps "dns-client" deleted
[gms@thalia0 ~]$ kubectl exec -ti busybox -- nslookup kubernetes.default 192.168.2.36
Server:    192.168.2.36
Address 1: 192.168.2.36 192-168-2-36.kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
[gms@thalia0 ~]$ kubectl run -it --rm --image=infoblox/dnstools dns-client
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
If you don't see a command prompt, try pressing enter.
dnstools# dig kubernetes.default.svc.cluster.local

; <<>> DiG 9.11.3 <<>> kubernetes.default.svc.cluster.local
;; global options: +cmd
;; connection timed out; no servers could be reached
dnstools# dig 192.168.2.36

; <<>> DiG 9.11.3 <<>> 192.168.2.36
;; global options: +cmd
;; connection timed out; no servers could be reached
dnstools# exit
Session ended, resume using 'kubectl attach dns-client-86c6d59f7-6t8qp -c dns-client -i -t' command when the pod is running
deployment.apps "dns-client" deleted
[gms@thalia0 ~]$ kubectl exec -ti busybox -- nslookup kubernetes.default 192.168.2.36
Server:    192.168.2.36
Address 1: 192.168.2.36 192-168-2-36.kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local

nothing in logs:

[gms@thalia0 ~]$ kubectl logs --namespace=kube-system coredns-6889c8cfd9-52f2f
.:53
2019/01/11 19:24:03 [INFO] CoreDNS-1.2.2
2019/01/11 19:24:03 [INFO] linux/amd64, go1.11, eb51e8b
CoreDNS-1.2.2
linux/amd64, go1.11, eb51e8b
2019/01/11 19:24:03 [INFO] plugin/reload: Running configuration MD5 = 2e2180a5eeb3ebf92a5100ab081a6381
[gms@thalia0 ~]$ kubectl logs --namespace=kube-system coredns-6889c8cfd9-q4pll
.:53
2019/01/11 19:24:02 [INFO] CoreDNS-1.2.2
2019/01/11 19:24:02 [INFO] linux/amd64, go1.11, eb51e8b
CoreDNS-1.2.2
linux/amd64, go1.11, eb51e8b
2019/01/11 19:24:02 [INFO] plugin/reload: Running configuration MD5 = 2e2180a5eeb3ebf92a5100ab081a6381

Why is it failing inside infoblox/dnstools?

Very strange.

GregSilverman commented Jan 11, 2019

[gms@thalia0 ~]$ !1164
kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

nslookup: can't resolve 'kubernetes.default'
command terminated with exit code 1
[gms@thalia0 ~]$ kubectl exec -ti busybox -- nslookup kubernetes.default 192.168.2.36
Server:    192.168.2.36
Address 1: 192.168.2.36 192-168-2-36.kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local

GregSilverman commented Jan 11, 2019

Okay, here it succeeds and then fails again on kubernetes.default, this time starting with a query directly against a coredns pod (still nothing in the logs):

[gms@thalia0 ~]$ kubectl exec -ti busybox -- nslookup kubernetes.default 192.168.2.36
Server:    192.168.2.36
Address 1: 192.168.2.36 192-168-2-36.kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
[gms@thalia0 ~]$ kubectl exec -ti busybox -- nslookup kubernetes.default 192.168.2.36
Server:    192.168.2.36
Address 1: 192.168.2.36 192-168-2-36.kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
[gms@thalia0 ~]$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
[gms@thalia0 ~]$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

nslookup: can't resolve 'kubernetes.default'
command terminated with exit code 1

chrisohaver commented Jan 11, 2019

Why is it failing inside infoblox/dnstools?

In the output you showed, the infoblox/dnstools query is using the DNS server's cluster IP (10.96.0.10), while the busybox query is specifying one of the pod IPs directly. That could indicate problems with cluster IP routing...
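
A quick check for that, a sketch assuming kube-proxy is in its default iptables mode: on the node running the client pod, verify that NAT rules exist for the DNS cluster IP.

sudo iptables-save -t nat | grep 10.96.0.10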

Regarding the lack of logging, I don't see the loop checks in the last set of logs you sent. Can you share your coredns configmap?

GregSilverman commented Jan 11, 2019

Finally: a series of successful queries on kubernetes.default with a failure at the end, but this time the output made it into the logs.

[gms@thalia0 ~]$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
[gms@thalia0 ~]$ kubectl exec -ti busybox -- nslookup kubernetes.default 192.168.2.36
Server:    192.168.2.36
Address 1: 192.168.2.36 192-168-2-36.kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
[gms@thalia0 ~]$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
[gms@thalia0 ~]$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

nslookup: can't resolve 'kubernetes.default'
command terminated with exit code 1
[gms@thalia0 ~]$ kubectl exec -ti busybox -- nslookup kubernetes.default 192.168.2.36
Server:    192.168.2.36
Address 1: 192.168.2.36 192-168-2-36.kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
[gms@thalia0 ~]$ kubectl logs --namespace=kube-system coredns-6889c8cfd9-q4pll
.:53
2019/01/11 19:24:02 [INFO] CoreDNS-1.2.2
2019/01/11 19:24:02 [INFO] linux/amd64, go1.11, eb51e8b
CoreDNS-1.2.2
linux/amd64, go1.11, eb51e8b
2019/01/11 19:24:02 [INFO] plugin/reload: Running configuration MD5 = 2e2180a5eeb3ebf92a5100ab081a6381
2019/01/11 19:55:03 [INFO] Reloading
2019/01/11 19:55:03 [INFO] plugin/reload: Running configuration MD5 = 0f335252e2f71668b22f1bf6b3ef9c83
2019/01/11 19:55:03 [INFO] Reloading complete
[gms@thalia0 ~]$ kubectl logs --namespace=kube-system coredns-6889c8cfd9-52f2f
.:53
2019/01/11 19:24:03 [INFO] CoreDNS-1.2.2
2019/01/11 19:24:03 [INFO] linux/amd64, go1.11, eb51e8b
CoreDNS-1.2.2
linux/amd64, go1.11, eb51e8b
2019/01/11 19:24:03 [INFO] plugin/reload: Running configuration MD5 = 2e2180a5eeb3ebf92a5100ab081a6381
2019/01/11 19:55:03 [INFO] Reloading
2019/01/11 19:55:03 [INFO] plugin/reload: Running configuration MD5 = 0f335252e2f71668b22f1bf6b3ef9c83
2019/01/11 19:55:03 [INFO] Reloading complete
127.0.0.1:60462 - [11/Jan/2019:19:55:03 +0000] 49704 "HINFO IN 4270709671997394360.5245860839346936077. udp 57 false 512" NXDOMAIN qr,rd,ra 133 0.002886679s
192.168.2.14:47855 - [11/Jan/2019:19:55:06 +0000] 3 "PTR IN 10.0.96.10.in-addr.arpa. udp 41 false 512" NOERROR qr,aa,rd,ra 116 0.000209888s
192.168.2.14:56200 - [11/Jan/2019:19:55:36 +0000] 10 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.002435387s
192.168.2.14:54276 - [11/Jan/2019:19:55:36 +0000] 11 "A IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.000164791s
192.168.2.14:49390 - [11/Jan/2019:19:55:36 +0000] 12 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 106 0.000131921s
192.168.2.14:37727 - [11/Jan/2019:19:55:36 +0000] 13 "PTR IN 1.0.96.10.in-addr.arpa. udp 40 false 512" NOERROR qr,aa,rd,ra 112 0.000112218s
192.168.2.14:36380 - [11/Jan/2019:19:57:02 +0000] 3 "PTR IN 10.0.96.10.in-addr.arpa. udp 41 false 512" NOERROR qr,aa,rd,ra 116 0.000315532s
192.168.2.14:58573 - [11/Jan/2019:19:57:22 +0000] 8 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.00146996s
192.168.2.14:58422 - [11/Jan/2019:19:57:22 +0000] 9 "A IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.000174214s
192.168.2.14:57224 - [11/Jan/2019:19:57:22 +0000] 10 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 106 0.000132463s
192.168.2.14:54443 - [11/Jan/2019:19:57:22 +0000] 11 "PTR IN 1.0.96.10.in-addr.arpa. udp 40 false 512" NOERROR qr,aa,rd,ra 112 0.000115189s
192.168.2.14:54583 - [11/Jan/2019:19:57:56 +0000] 2 "PTR IN 36.2.168.192.in-addr.arpa. udp 43 false 512" NOERROR qr,aa,rd,ra 133 0.000235264s
192.168.2.14:42828 - [11/Jan/2019:19:57:56 +0000] 3 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.00151142s
192.168.2.14:45572 - [11/Jan/2019:19:57:56 +0000] 4 "AAAA IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.000105042s
192.168.2.14:44668 - [11/Jan/2019:19:57:56 +0000] 5 "AAAA IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 147 0.000128659s
192.168.2.14:36378 - [11/Jan/2019:19:57:57 +0000] 6 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.001275711s
192.168.2.14:56084 - [11/Jan/2019:19:57:57 +0000] 7 "A IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.000134637s
192.168.2.14:44698 - [11/Jan/2019:19:57:57 +0000] 8 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 106 0.000131808s
192.168.2.14:54349 - [11/Jan/2019:19:57:57 +0000] 9 "PTR IN 1.0.96.10.in-addr.arpa. udp 40 false 512" NOERROR qr,aa,rd,ra 112 0.000117907s
192.168.2.14:32837 - [11/Jan/2019:19:58:26 +0000] 3 "PTR IN 10.0.96.10.in-addr.arpa. udp 41 false 512" NOERROR qr,aa,rd,ra 116 0.000264864s
192.168.2.14:52285 - [11/Jan/2019:19:58:26 +0000] 4 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.00009506s
192.168.2.14:50404 - [11/Jan/2019:19:58:26 +0000] 5 "AAAA IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,rd,ra 155 0.000073029s
192.168.2.14:58859 - [11/Jan/2019:19:58:36 +0000] 8 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.001742389s
192.168.2.14:33615 - [11/Jan/2019:19:59:01 +0000] 14 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.001532826s
192.168.2.14:44879 - [11/Jan/2019:19:59:01 +0000] 15 "A IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.000205726s
192.168.2.14:42272 - [11/Jan/2019:19:59:01 +0000] 16 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 106 0.000139515s
192.168.2.14:52774 - [11/Jan/2019:19:59:01 +0000] 17 "PTR IN 1.0.96.10.in-addr.arpa. udp 40 false 512" NOERROR qr,aa,rd,ra 112 0.00014s
192.168.2.14:43434 - [11/Jan/2019:20:00:41 +0000] 2 "PTR IN 10.0.96.10.in-addr.arpa. udp 41 false 512" NOERROR qr,aa,rd,ra 116 0.000231209s
192.168.2.14:54509 - [11/Jan/2019:20:00:46 +0000] 4 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.001414658s
192.168.2.14:50005 - [11/Jan/2019:20:01:01 +0000] 8 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.001526107s
192.168.2.14:42234 - [11/Jan/2019:20:01:06 +0000] 10 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.00015308s
192.168.2.14:58073 - [11/Jan/2019:20:01:16 +0000] 13 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.000183647s
192.168.2.14:44840 - [11/Jan/2019:20:01:26 +0000] 2 "PTR IN 36.2.168.192.in-addr.arpa. udp 43 false 512" NOERROR qr,aa,rd,ra 133 0.000248318s
192.168.2.14:51095 - [11/Jan/2019:20:01:26 +0000] 3 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.001661274s
192.168.2.14:60299 - [11/Jan/2019:20:01:26 +0000] 4 "AAAA IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.000223807s
192.168.2.14:52968 - [11/Jan/2019:20:01:26 +0000] 5 "AAAA IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 147 0.000135078s
192.168.2.14:50590 - [11/Jan/2019:20:01:26 +0000] 6 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.000125131s
192.168.2.14:38573 - [11/Jan/2019:20:01:26 +0000] 7 "A IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.000156165s
192.168.2.14:33552 - [11/Jan/2019:20:01:26 +0000] 8 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 106 0.000123416s
192.168.2.14:53852 - [11/Jan/2019:20:01:26 +0000] 9 "PTR IN 1.0.96.10.in-addr.arpa. udp 40 false 512" NOERROR qr,aa,rd,ra 112 0.000129028s
[gms@thalia0 ~]$

GregSilverman commented Jan 11, 2019

Another series of tests with odd results (several successful busybox -- nslookup kubernetes.default lookups, followed by a failure, followed by a success in infoblox/dnstools followed by another failure on busybox):

[gms@thalia0 ~]$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
[gms@thalia0 ~]$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
[gms@thalia0 ~]$ kubectl exec -ti busybox -- nslookup kubernetes.default 192.168.2.36
Server:    192.168.2.36
Address 1: 192.168.2.36 192-168-2-36.kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
[gms@thalia0 ~]$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
[gms@thalia0 ~]$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

nslookup: can't resolve 'kubernetes.default'
command terminated with exit code 1
[gms@thalia0 ~]$ kubectl run -it --rm --image=infoblox/dnstools dns-client
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
If you don't see a command prompt, try pressing enter.
dnstools# dig kubernetes.default.svc.cluster.local

; <<>> DiG 9.11.3 <<>> kubernetes.default.svc.cluster.local
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 16697
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: e7ff9985742946ea (echoed)
;; QUESTION SECTION:
;kubernetes.default.svc.cluster.local. IN A

;; ANSWER SECTION:
kubernetes.default.svc.cluster.local. 5	IN A	10.96.0.1

;; Query time: 52 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Fri Jan 11 20:09:33 UTC 2019
;; MSG SIZE  rcvd: 129

dnstools# exit
Session ended, resume using 'kubectl attach dns-client-86c6d59f7-rwfjb -c dns-client -i -t' command when the pod is running
deployment.apps "dns-client" deleted
[gms@thalia0 ~]$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

nslookup: can't resolve 'kubernetes.default'
command terminated with exit code 1
[gms@thalia0 ~]$ kubectl logs --namespace=kube-system coredns-6889c8cfd9-52f2f
.:53
2019/01/11 19:24:03 [INFO] CoreDNS-1.2.2
2019/01/11 19:24:03 [INFO] linux/amd64, go1.11, eb51e8b
CoreDNS-1.2.2
linux/amd64, go1.11, eb51e8b
2019/01/11 19:24:03 [INFO] plugin/reload: Running configuration MD5 = 2e2180a5eeb3ebf92a5100ab081a6381
2019/01/11 19:55:03 [INFO] Reloading
2019/01/11 19:55:03 [INFO] plugin/reload: Running configuration MD5 = 0f335252e2f71668b22f1bf6b3ef9c83
2019/01/11 19:55:03 [INFO] Reloading complete
127.0.0.1:60462 - [11/Jan/2019:19:55:03 +0000] 49704 "HINFO IN 4270709671997394360.5245860839346936077. udp 57 false 512" NXDOMAIN qr,rd,ra 133 0.002886679s
192.168.2.14:47855 - [11/Jan/2019:19:55:06 +0000] 3 "PTR IN 10.0.96.10.in-addr.arpa. udp 41 false 512" NOERROR qr,aa,rd,ra 116 0.000209888s
192.168.2.14:56200 - [11/Jan/2019:19:55:36 +0000] 10 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.002435387s
192.168.2.14:54276 - [11/Jan/2019:19:55:36 +0000] 11 "A IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.000164791s
192.168.2.14:49390 - [11/Jan/2019:19:55:36 +0000] 12 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 106 0.000131921s
192.168.2.14:37727 - [11/Jan/2019:19:55:36 +0000] 13 "PTR IN 1.0.96.10.in-addr.arpa. udp 40 false 512" NOERROR qr,aa,rd,ra 112 0.000112218s
192.168.2.14:36380 - [11/Jan/2019:19:57:02 +0000] 3 "PTR IN 10.0.96.10.in-addr.arpa. udp 41 false 512" NOERROR qr,aa,rd,ra 116 0.000315532s
192.168.2.14:58573 - [11/Jan/2019:19:57:22 +0000] 8 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.00146996s
192.168.2.14:58422 - [11/Jan/2019:19:57:22 +0000] 9 "A IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.000174214s
192.168.2.14:57224 - [11/Jan/2019:19:57:22 +0000] 10 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 106 0.000132463s
192.168.2.14:54443 - [11/Jan/2019:19:57:22 +0000] 11 "PTR IN 1.0.96.10.in-addr.arpa. udp 40 false 512" NOERROR qr,aa,rd,ra 112 0.000115189s
192.168.2.14:54583 - [11/Jan/2019:19:57:56 +0000] 2 "PTR IN 36.2.168.192.in-addr.arpa. udp 43 false 512" NOERROR qr,aa,rd,ra 133 0.000235264s
192.168.2.14:42828 - [11/Jan/2019:19:57:56 +0000] 3 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.00151142s
192.168.2.14:45572 - [11/Jan/2019:19:57:56 +0000] 4 "AAAA IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.000105042s
192.168.2.14:44668 - [11/Jan/2019:19:57:56 +0000] 5 "AAAA IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 147 0.000128659s
192.168.2.14:36378 - [11/Jan/2019:19:57:57 +0000] 6 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.001275711s
192.168.2.14:56084 - [11/Jan/2019:19:57:57 +0000] 7 "A IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.000134637s
192.168.2.14:44698 - [11/Jan/2019:19:57:57 +0000] 8 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 106 0.000131808s
192.168.2.14:54349 - [11/Jan/2019:19:57:57 +0000] 9 "PTR IN 1.0.96.10.in-addr.arpa. udp 40 false 512" NOERROR qr,aa,rd,ra 112 0.000117907s
192.168.2.14:32837 - [11/Jan/2019:19:58:26 +0000] 3 "PTR IN 10.0.96.10.in-addr.arpa. udp 41 false 512" NOERROR qr,aa,rd,ra 116 0.000264864s
192.168.2.14:52285 - [11/Jan/2019:19:58:26 +0000] 4 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.00009506s
192.168.2.14:50404 - [11/Jan/2019:19:58:26 +0000] 5 "AAAA IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,rd,ra 155 0.000073029s
192.168.2.14:58859 - [11/Jan/2019:19:58:36 +0000] 8 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.001742389s
192.168.2.14:33615 - [11/Jan/2019:19:59:01 +0000] 14 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.001532826s
192.168.2.14:44879 - [11/Jan/2019:19:59:01 +0000] 15 "A IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.000205726s
192.168.2.14:42272 - [11/Jan/2019:19:59:01 +0000] 16 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 106 0.000139515s
192.168.2.14:52774 - [11/Jan/2019:19:59:01 +0000] 17 "PTR IN 1.0.96.10.in-addr.arpa. udp 40 false 512" NOERROR qr,aa,rd,ra 112 0.00014s
192.168.2.14:43434 - [11/Jan/2019:20:00:41 +0000] 2 "PTR IN 10.0.96.10.in-addr.arpa. udp 41 false 512" NOERROR qr,aa,rd,ra 116 0.000231209s
192.168.2.14:54509 - [11/Jan/2019:20:00:46 +0000] 4 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.001414658s
192.168.2.14:50005 - [11/Jan/2019:20:01:01 +0000] 8 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.001526107s
192.168.2.14:42234 - [11/Jan/2019:20:01:06 +0000] 10 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.00015308s
192.168.2.14:58073 - [11/Jan/2019:20:01:16 +0000] 13 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.000183647s
192.168.2.14:44840 - [11/Jan/2019:20:01:26 +0000] 2 "PTR IN 36.2.168.192.in-addr.arpa. udp 43 false 512" NOERROR qr,aa,rd,ra 133 0.000248318s
192.168.2.14:51095 - [11/Jan/2019:20:01:26 +0000] 3 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.001661274s
192.168.2.14:60299 - [11/Jan/2019:20:01:26 +0000] 4 "AAAA IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.000223807s
192.168.2.14:52968 - [11/Jan/2019:20:01:26 +0000] 5 "AAAA IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 147 0.000135078s
192.168.2.14:50590 - [11/Jan/2019:20:01:26 +0000] 6 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.000125131s
192.168.2.14:38573 - [11/Jan/2019:20:01:26 +0000] 7 "A IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.000156165s
192.168.2.14:33552 - [11/Jan/2019:20:01:26 +0000] 8 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 106 0.000123416s
192.168.2.14:53852 - [11/Jan/2019:20:01:26 +0000] 9 "PTR IN 1.0.96.10.in-addr.arpa. udp 40 false 512" NOERROR qr,aa,rd,ra 112 0.000129028s
192.168.2.14:43468 - [11/Jan/2019:20:05:35 +0000] 2 "PTR IN 10.0.96.10.in-addr.arpa. udp 41 false 512" NOERROR qr,aa,rd,ra 116 0.00029125s
192.168.2.14:47525 - [11/Jan/2019:20:05:35 +0000] 3 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.001429584s
192.168.2.14:49790 - [11/Jan/2019:20:05:55 +0000] 8 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.001478868s
192.168.2.14:60787 - [11/Jan/2019:20:05:55 +0000] 9 "A IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.000154265s
192.168.2.14:44994 - [11/Jan/2019:20:06:00 +0000] 11 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.000164742s
192.168.2.14:53444 - [11/Jan/2019:20:06:05 +0000] 13 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.0001582s
192.168.2.14:40238 - [11/Jan/2019:20:06:05 +0000] 14 "A IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,rd,ra 155 0.000156233s
192.168.2.14:53873 - [11/Jan/2019:20:06:05 +0000] 15 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 106 0.00024s
192.168.2.14:44172 - [11/Jan/2019:20:06:05 +0000] 16 "PTR IN 1.0.96.10.in-addr.arpa. udp 40 false 512" NOERROR qr,aa,rd,ra 112 0.000139795s
192.168.2.14:53909 - [11/Jan/2019:20:06:27 +0000] 4 "PTR IN 10.0.96.10.in-addr.arpa. udp 41 false 512" NOERROR qr,aa,rd,ra 116 0.000250797s
192.168.2.14:39336 - [11/Jan/2019:20:06:27 +0000] 5 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.001567874s
192.168.2.14:60861 - [11/Jan/2019:20:06:32 +0000] 7 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.000164834s
192.168.2.14:35427 - [11/Jan/2019:20:06:37 +0000] 9 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.000119195s
192.168.2.14:54100 - [11/Jan/2019:20:06:37 +0000] 10 "AAAA IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.000121608s
192.168.2.14:57368 - [11/Jan/2019:20:06:37 +0000] 11 "AAAA IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 147 0.000106026s
192.168.2.14:50120 - [11/Jan/2019:20:06:37 +0000] 12 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.001266445s
192.168.2.14:59756 - [11/Jan/2019:20:06:37 +0000] 13 "A IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.000124698s
192.168.2.14:47529 - [11/Jan/2019:20:06:42 +0000] 15 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.000123233s
192.168.2.14:44369 - [11/Jan/2019:20:06:42 +0000] 16 "A IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,rd,ra 155 0.000071767s
192.168.2.14:56731 - [11/Jan/2019:20:06:52 +0000] 19 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.000220947s
192.168.2.14:51886 - [11/Jan/2019:20:06:52 +0000] 20 "A IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,rd,ra 155 0.000121301s
192.168.2.14:57990 - [11/Jan/2019:20:06:52 +0000] 21 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 106 0.000161499s
192.168.2.14:55812 - [11/Jan/2019:20:06:57 +0000] 23 "PTR IN 1.0.96.10.in-addr.arpa. udp 40 false 512" NOERROR qr,aa,rd,ra 112 0.000305519s
192.168.2.14:42228 - [11/Jan/2019:20:07:02 +0000] 2 "PTR IN 36.2.168.192.in-addr.arpa. udp 43 false 512" NOERROR qr,aa,rd,ra 133 0.000301294s
192.168.2.14:41158 - [11/Jan/2019:20:07:02 +0000] 3 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.001878896s
192.168.2.14:40857 - [11/Jan/2019:20:07:02 +0000] 4 "AAAA IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,rd,ra 155 0.000112098s
192.168.2.14:55953 - [11/Jan/2019:20:07:02 +0000] 5 "AAAA IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,rd,ra 147 0.000107708s
192.168.2.14:48482 - [11/Jan/2019:20:07:02 +0000] 6 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.000095162s
192.168.2.14:43019 - [11/Jan/2019:20:07:02 +0000] 7 "A IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,rd,ra 155 0.000116078s
192.168.2.14:60028 - [11/Jan/2019:20:07:02 +0000] 8 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 106 0.000191406s
192.168.2.14:40950 - [11/Jan/2019:20:07:02 +0000] 9 "PTR IN 1.0.96.10.in-addr.arpa. udp 40 false 512" NOERROR qr,aa,rd,ra 112 0.000195527s
192.168.2.14:45593 - [11/Jan/2019:20:07:12 +0000] 3 "PTR IN 10.0.96.10.in-addr.arpa. udp 41 false 512" NOERROR qr,aa,rd,ra 116 0.000262842s
192.168.2.14:54143 - [11/Jan/2019:20:07:22 +0000] 6 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.000124749s
192.168.2.14:33393 - [11/Jan/2019:20:07:22 +0000] 7 "AAAA IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.0001475s
192.168.2.14:39729 - [11/Jan/2019:20:07:47 +0000] 13 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.001429512s
192.168.2.14:56638 - [11/Jan/2019:20:07:47 +0000] 14 "A IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.000168166s
192.168.2.14:60282 - [11/Jan/2019:20:07:47 +0000] 15 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 106 0.000138791s
192.168.2.14:45276 - [11/Jan/2019:20:07:47 +0000] 16 "PTR IN 1.0.96.10.in-addr.arpa. udp 40 false 512" NOERROR qr,aa,rd,ra 112 0.000108677s
192.168.2.14:45619 - [11/Jan/2019:20:08:40 +0000] 3 "PTR IN 10.0.96.10.in-addr.arpa. udp 41 false 512" NOERROR qr,aa,rd,ra 116 0.000204904s
192.168.2.14:34659 - [11/Jan/2019:20:08:45 +0000] 5 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.001605689s
192.168.2.14:38615 - [11/Jan/2019:20:08:45 +0000] 6 "AAAA IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.000175882s
192.168.2.14:57820 - [11/Jan/2019:20:08:55 +0000] 9 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.00015803s
192.168.2.14:55546 - [11/Jan/2019:20:09:00 +0000] 11 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.001585091s
192.168.2.37:48975 - [11/Jan/2019:20:09:33 +0000] 16697 "A IN kubernetes.default.svc.cluster.local. udp 78 false 4096" NOERROR qr,aa,rd,ra 130 0.00044174s
192.168.2.14:35740 - [11/Jan/2019:20:09:49 +0000] 3 "PTR IN 10.0.96.10.in-addr.arpa. udp 41 false 512" NOERROR qr,aa,rd,ra 116 0.000237054s
192.168.2.14:59796 - [11/Jan/2019:20:09:59 +0000] 6 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.001512927s
192.168.2.14:41626 - [11/Jan/2019:20:09:59 +0000] 7 "AAAA IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.000132716s
192.168.2.14:48518 - [11/Jan/2019:20:09:59 +0000] 8 "AAAA IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 147 0.000158525s
192.168.2.14:36314 - [11/Jan/2019:20:09:59 +0000] 9 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.001368413s
192.168.2.14:49143 - [11/Jan/2019:20:09:59 +0000] 10 "A IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.000170507s
192.168.2.14:52629 - [11/Jan/2019:20:10:09 +0000] 13 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.000137391s
[gms@thalia0 ~]$

GregSilverman commented Jan 11, 2019

@chrisohaver: when I deleted and recreated coredns the last time, I forgot to add the log option to the config map. In any case, hopefully the above tests are sufficient.

GregSilverman commented Jan 11, 2019

One last set of tests, tail end of logs (success in both busybox and dnstools, then failure in both):

192.168.2.14:47252 - [11/Jan/2019:20:19:42 +0000] 3 "PTR IN 10.0.96.10.in-addr.arpa. udp 41 false 512" NOERROR qr,aa,rd,ra 116 0.000407966s
192.168.2.14:37208 - [11/Jan/2019:20:19:47 +0000] 5 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.001776637s
192.168.2.14:45598 - [11/Jan/2019:20:19:52 +0000] 7 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.000155462s
192.168.2.14:36262 - [11/Jan/2019:20:19:52 +0000] 8 "AAAA IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.000174415s
192.168.2.14:37797 - [11/Jan/2019:20:19:52 +0000] 9 "AAAA IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 147 0.000146563s
192.168.2.14:54337 - [11/Jan/2019:20:19:57 +0000] 11 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.001426681s
192.168.2.14:51082 - [11/Jan/2019:20:19:57 +0000] 12 "A IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.000133651s
192.168.2.14:58944 - [11/Jan/2019:20:20:02 +0000] 14 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.000165492s
192.168.2.14:47210 - [11/Jan/2019:20:20:02 +0000] 15 "A IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,rd,ra 155 0.000193583s
192.168.2.14:52630 - [11/Jan/2019:20:20:02 +0000] 16 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 106 0.000205056s
192.168.2.14:59095 - [11/Jan/2019:20:20:07 +0000] 18 "PTR IN 1.0.96.10.in-addr.arpa. udp 40 false 512" NOERROR qr,aa,rd,ra 112 0.000241136s
192.168.2.14:48118 - [11/Jan/2019:20:21:28 +0000] 3 "PTR IN 10.0.96.10.in-addr.arpa. udp 41 false 512" NOERROR qr,aa,rd,ra 116 0.000225607s
192.168.2.14:39754 - [11/Jan/2019:20:21:28 +0000] 4 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.00126334s
192.168.2.14:46424 - [11/Jan/2019:20:21:33 +0000] 6 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.00013366s
192.168.2.14:55198 - [11/Jan/2019:20:21:38 +0000] 8 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.000152765s
192.168.2.14:54052 - [11/Jan/2019:20:21:43 +0000] 10 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.00010726s
192.168.2.14:54118 - [11/Jan/2019:20:21:53 +0000] 13 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.001830805s
192.168.2.14:56919 - [11/Jan/2019:20:21:58 +0000] 15 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.000171903s
192.168.2.14:60790 - [11/Jan/2019:20:22:13 +0000] 2 "PTR IN 10.0.96.10.in-addr.arpa. udp 41 false 512" NOERROR qr,aa,rd,ra 116 0.000176781s

GregSilverman commented Jan 11, 2019

Here's the ConfigMap, unchanged from before:

apiVersion: v1
data:
  Corefile: |
    .:53 {
        log
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          upstream
          fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        proxy . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2019-01-11T19:05:02Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "8531225"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: cc7cdb48-15d3-11e9-93c2-00505689c0e9
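
Since reload is enabled in this Corefile, edits to the ConfigMap should be picked up by the running pods without a restart (that's what the "Reloading" lines in the logs above are), e.g.:

kubectl -n kube-system edit configmap coredns
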
chrisohaver commented Jan 11, 2019

The logs look sane. That is, every query that coredns logs appears to receive a correct answer. It is as if some queries are occasionally being dropped by something in between.

GregSilverman commented Jan 11, 2019

Yep, that was my impression, too. Any ideas would be most welcome. This is rather odd, and quite frustrating.

GregSilverman commented Jan 14, 2019

Hi,
One thing I've noticed that is consistent in the logs across the success/failure cycles is the pattern in the chunk below (this could just be my imagination, of course!):

192.168.4.79:38828 - [14/Jan/2019:16:05:51 +0000] 10 "A IN google.com. udp 28 false 512" NOERROR qr,rd,ra 494 0.018963613s
192.168.4.79:39181 - [14/Jan/2019:16:05:51 +0000] 11 "PTR IN 142.192.58.216.in-addr.arpa. udp 45 false 512" NOERROR qr,rd,ra 373 0.021713776s
192.168.4.79:59494 - [14/Jan/2019:16:07:02 +0000] 5 "PTR IN 10.0.96.10.in-addr.arpa. udp 41 false 512" NOERROR qr,aa,rd,ra 116 0.000258788s
192.168.4.79:33623 - [14/Jan/2019:16:07:22 +0000] 10 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.002741917s
192.168.4.79:43456 - [14/Jan/2019:16:07:22 +0000] 11 "A IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.000199224s
192.168.4.79:44650 - [14/Jan/2019:16:07:22 +0000] 12 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 106 0.000201779s
192.168.4.79:37345 - [14/Jan/2019:16:07:22 +0000] 13 "PTR IN 1.0.96.10.in-addr.arpa. udp 40 false 512" NOERROR qr,aa,rd,ra 112 0.000198628s
192.168.4.79:38355 - [14/Jan/2019:16:08:10 +0000] 3 "PTR IN 10.0.96.10.in-addr.arpa. udp 41 false 512" NOERROR qr,aa,rd,ra 116 0.000318891s
192.168.4.79:38896 - [14/Jan/2019:16:08:10 +0000] 4 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.002506539s
192.168.4.79:48501 - [14/Jan/2019:16:08:10 +0000] 5 "AAAA IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.000178561s
192.168.4.79:40590 - [14/Jan/2019:16:08:10 +0000] 6 "AAAA IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 147 0.000302279s
192.168.4.79:39280 - [14/Jan/2019:16:08:15 +0000] 8 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.001564481s
192.168.4.79:58012 - [14/Jan/2019:16:08:15 +0000] 9 "A IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.000195199s
192.168.4.79:48918 - [14/Jan/2019:16:08:15 +0000] 10 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 106 0.000229766s
192.168.4.79:38788 - [14/Jan/2019:16:08:15 +0000] 11 "PTR IN 1.0.96.10.in-addr.arpa. udp 40 false 512" NOERROR qr,aa,rd,ra 112 0.000177701s
192.168.4.79:53421 - [14/Jan/2019:17:03:09 +0000] 2 "PTR IN 10.0.96.10.in-addr.arpa. udp 41 false 512" NOERROR qr,aa,rd,ra 116 0.000551386s
192.168.4.79:45552 - [14/Jan/2019:17:03:29 +0000] 7 "A IN google.com. udp 28 false 512" NOERROR qr,rd,ra 494 0.001572149s
192.168.4.79:56198 - [14/Jan/2019:17:03:29 +0000] 8 "PTR IN 142.192.58.216.in-addr.arpa. udp 45 false 512" NOERROR qr,rd,ra 373 0.001702642s
192.168.4.79:59566 - [14/Jan/2019:17:04:00 +0000] 3 "PTR IN 10.0.96.10.in-addr.arpa. udp 41 false 512" NOERROR qr,aa,rd,ra 116 0.000395578s
192.168.4.79:50373 - [14/Jan/2019:17:04:00 +0000] 4 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.00142514s
192.168.4.79:53560 - [14/Jan/2019:17:04:00 +0000] 5 "AAAA IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.00034735s
192.168.4.79:51952 - [14/Jan/2019:17:04:05 +0000] 7 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.000270913s
192.168.4.79:56045 - [14/Jan/2019:17:04:10 +0000] 9 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.000212269s
192.168.4.79:50599 - [14/Jan/2019:17:04:15 +0000] 11 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.000136018s
192.168.4.79:34197 - [14/Jan/2019:17:04:20 +0000] 13 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.00150507s
192.168.4.79:50852 - [14/Jan/2019:17:04:20 +0000] 14 "A IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.00021623s
192.168.4.79:33514 - [14/Jan/2019:17:04:20 +0000] 15 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 106 0.000236457s
192.168.4.79:37923 - [14/Jan/2019:17:04:20 +0000] 16 "PTR IN 1.0.96.10.in-addr.arpa. udp 40 false 512" NOERROR qr,aa,rd,ra 112 0.000180532s
192.168.4.79:44800 - [14/Jan/2019:17:16:52 +0000] 3 "PTR IN 10.0.96.10.in-addr.arpa. udp 41 false 512" NOERROR qr,aa,rd,ra 116 0.000668782s
192.168.4.79:58125 - [14/Jan/2019:17:16:52 +0000] 4 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.001679949s
192.168.4.79:60192 - [14/Jan/2019:17:17:02 +0000] 7 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.000166459s
192.168.4.79:43872 - [14/Jan/2019:17:17:02 +0000] 8 "AAAA IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.000276446s
192.168.4.79:55649 - [14/Jan/2019:17:17:51 +0000] 2 "PTR IN 10.0.96.10.in-addr.arpa. udp 41 false 512" NOERROR qr,aa,rd,ra 116 0.00021697s
192.168.4.79:38598 - [14/Jan/2019:17:17:51 +0000] 3 "AAAA IN google.com. udp 28 false 512" NOERROR qr,rd,ra 506 0.001487202s
192.168.4.79:56232 - [14/Jan/2019:17:18:06 +0000] 7 "A IN google.com. udp 28 false 512" NOERROR qr,rd,ra 494 0.00175414s
192.168.4.79:38977 - [14/Jan/2019:17:18:06 +0000] 8 "PTR IN e.0.0.2.0.0.0.0.0.0.0.0.0.0.0.0.c.0.8.0.9.0.0.4.0.b.8.f.7.0.6.2.ip6.arpa. udp 90 false 512" NOERROR qr,rd,ra 387 0.023146506s
192.168.4.79:60763 - [14/Jan/2019:17:18:06 +0000] 9 "PTR IN 46.4.217.172.in-addr.arpa. udp 43 false 512" NOERROR qr,rd,ra 400 0.001736294s
192.168.4.79:57393 - [14/Jan/2019:17:23:33 +0000] 10 "A IN google.com. udp 28 false 512" NOERROR qr,rd,ra 494 0.001697884s
192.168.4.79:47826 - [14/Jan/2019:17:23:38 +0000] 12 "PTR IN 78.9.217.172.in-addr.arpa. udp 43 false 512" NOERROR qr,rd,ra 340 0.001958985s
192.168.4.79:57576 - [14/Jan/2019:17:24:12 +0000] 2 "PTR IN 10.0.96.10.in-addr.arpa. udp 41 false 512" NOERROR qr,aa,rd,ra 116 0.00042196s
192.168.4.79:49082 - [14/Jan/2019:17:24:12 +0000] 3 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.001420902s
192.168.4.79:40461 - [14/Jan/2019:17:24:17 +0000] 5 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.000243563s
192.168.4.79:51345 - [14/Jan/2019:17:24:42 +0000] 11 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 112 0.001443778s
192.168.4.79:49850 - [14/Jan/2019:17:24:42 +0000] 12 "A IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd,ra 155 0.000139859s

In the first two sets of queries to google.com and kubernetes.default, both succeed. But in the third set, only the google.com query succeeds. What makes this look like a pattern is that when things start going south, I get a logged response like this:

192.168.4.79:38977 - [14/Jan/2019:17:18:06 +0000] 8 "PTR IN e.0.0.2.0.0.0.0.0.0.0.0.0.0.0.0.c.0.8.0.9.0.0.4.0.b.8.f.7.0.6.2.ip6.arpa. udp 90 false 512" NOERROR qr,rd,ra 387 0.023146506s

Is this a possible routing table error/issue?

chrisohaver commented Jan 14, 2019

nslookup also does a reverse lookup of the nameserver IP used in the query, so it can display it in the output (afaik for no reason other than pretty output).

In the log entry you highlighted, nslookup is for some reason doing a reverse lookup on an IPv6 address, as if it were directed to use an upstream IPv6 address as the nameserver.
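
A quick way to see which nameservers and search domains the client pod is actually using:

kubectl exec -ti busybox -- cat /etc/resolv.conf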

GregSilverman commented Jan 14, 2019

Everything is default-out-of-the-box, so I'm not sure why that would be happening.

Things are still flaky. I'm not really sure how to proceed with troubleshooting this.

chrisohaver commented Jan 14, 2019

I would recommend trying to troubleshoot with dig... I think nslookup does "extra stuff" in various situations, which could be obfuscating the issue. It may be more obvious what's going on with dig.

GregSilverman commented Jan 15, 2019

Got it.

Ran this a couple times, and got:

[gms@thalia0 ~]$ kubectl logs --namespace=kube-system coredns-6c84688bd7-7zjf5
.:53
2019/01/14 17:57:42 [INFO] CoreDNS-1.2.2
2019/01/14 17:57:42 [INFO] linux/amd64, go1.11, eb51e8b
CoreDNS-1.2.2
linux/amd64, go1.11, eb51e8b
2019/01/14 17:57:42 [INFO] plugin/reload: Running configuration MD5 = 0f335252e2f71668b22f1bf6b3ef9c83
127.0.0.1:56592 - [14/Jan/2019:17:57:42 +0000] 49579 "HINFO IN 8082436276235276212.2364019416020434515. udp 57 false 512" NXDOMAIN qr,rd,ra 133 0.003246461s
192.168.2.61:48431 - [14/Jan/2019:22:57:27 +0000] 1171 "A IN google.com. udp 52 false 4096" NOERROR qr,rd,ra 506 0.001882405s
192.168.2.61:40684 - [14/Jan/2019:22:57:53 +0000] 13275 "A IN google.com. udp 52 false 4096" NOERROR qr,rd,ra 518 0.000164405s
192.168.2.61:44905 - [14/Jan/2019:22:58:53 +0000] 8230 "A IN kubernetes.default. udp 60 false 4096" NXDOMAIN qr,rd,ra 124 0.001362051s
192.168.2.61:56864 - [14/Jan/2019:22:59:21 +0000] 22386 "A IN kubernetes.default. udp 60 false 4096" NXDOMAIN qr,rd,ra 136 0.000186676s

The interesting thing is that this was logged on the other coredns pod, which until this last test had not been logging anything. Everything before this test was going into the first coredns pod's log.

This seems to be up and down: I just tried it and it was timing out, then tried again 2 minutes later and got:

[gms@thalia0 ~]$ kubectl exec -ti dnstools -- dig google.com

; <<>> DiG 9.11.3 <<>> google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39607
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 9

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com.			IN	A

;; ANSWER SECTION:
google.com.		10	IN	A	216.58.192.174

;; AUTHORITY SECTION:
google.com.		10	IN	NS	ns4.google.com.
google.com.		10	IN	NS	ns2.google.com.
google.com.		10	IN	NS	ns1.google.com.
google.com.		10	IN	NS	ns3.google.com.

;; ADDITIONAL SECTION:
ns2.google.com.		10	IN	A	216.239.34.10
ns1.google.com.		10	IN	AAAA	2001:4860:4802:32::a
ns4.google.com.		10	IN	AAAA	2001:4860:4802:38::a
ns4.google.com.		10	IN	A	216.239.38.10
ns3.google.com.		10	IN	A	216.239.36.10
ns1.google.com.		10	IN	A	216.239.32.10
ns2.google.com.		10	IN	AAAA	2001:4860:4802:34::a
ns3.google.com.		10	IN	AAAA	2001:4860:4802:36::a

;; Query time: 2 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Tue Jan 15 01:27:46 UTC 2019
;; MSG SIZE  rcvd: 505

[gms@thalia0 ~]$ kubectl exec -ti dnstools -- dig kubernetes.default

; <<>> DiG 9.11.3 <<>> kubernetes.default
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 9982
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;kubernetes.default.		IN	A

;; AUTHORITY SECTION:
.			30	IN	SOA	a.root-servers.net. nstld.verisign-grs.com. 2019011401 1800 900 604800 86400

;; Query time: 1 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Tue Jan 15 01:27:52 UTC 2019
;; MSG SIZE  rcvd: 122

And now again, one minute later:

[gms@thalia0 ~]$ kubectl exec -ti dnstools -- dig kubernetes.default

; <<>> DiG 9.11.3 <<>> kubernetes.default
;; global options: +cmd
;; connection timed out; no servers could be reached
command terminated with exit code 9

Very strange. We are looking into potential networking problems here, though between the nodes we haven't been able to identify anything.

chrisohaver commented Jan 15, 2019

You could drop the replicas to 1 during troubleshooting so you don't have to deal with 2 sets of logs.
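
For example:

kubectl -n kube-system scale deployment coredns --replicas=1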

One thing about dig: it does not use search domains by default, so a query for "kubernetes.default" will return an NXDOMAIN... because it's searching literally for "kubernetes.default". The +search option tells dig to use the search domains, e.g. dig kubernetes.default +search.
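
For instance, both of these should return the service's cluster IP, the first via the pod's search path and the second fully qualified:

kubectl exec -ti dnstools -- dig kubernetes.default +search
kubectl exec -ti dnstools -- dig kubernetes.default.svc.cluster.local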

Do you have the kind of periodic timeouts when querying the upstream DNS directly? e.g.

kubectl exec -ti dnstools -- dig @your-upstream-ip google.com 

If you schedule dnstools pod on the same node as the coredns pod, do you still see the timeouts? (would help to have just 1 coredns replica in this case)
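
A sketch of pinning the dnstools pod to a particular node (the nodeName is a placeholder; take it from kubectl get pods -n kube-system -o wide):

apiVersion: v1
kind: Pod
metadata:
  name: dnstools
spec:
  nodeName: <node-running-coredns>   # placeholder: the node hosting a coredns pod
  containers:
  - name: dnstools
    image: infoblox/dnstools
    command: ["sleep", "3600"]   # keep the pod alive for interactive debugging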

GregSilverman commented Jan 15, 2019

Thanks, I'll try this later today.

By upstream IP, do you mean that of my CoreDNS server @10.96.0.10, or the company DNS service?

@chrisohaver (Contributor) commented Jan 15, 2019

By upstream, I mean upstream of CoreDNS, i.e. the server to which CoreDNS forwards requests that are not in "cluster.local". In kubeadm-created clusters, this defaults to the same nameserver that is in /etc/resolv.conf on the nodes.
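
For reference, you can check both ends with something like:

cat /etc/resolv.conf   # run on the node itself; the nameserver here is the default upstream

kubectl -n kube-system get configmap coredns -o yaml   # the Corefile shows what CoreDNS actually forwards to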

@GregSilverman commented Jan 15, 2019

Hi, this works consistently:

[gms@thalia0 ~]$ kubectl exec -ti dnstools -- dig @128.101.101.101 google.com

; <<>> DiG 9.11.3 <<>> @128.101.101.101 google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64058
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 9

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com.			IN	A

;; ANSWER SECTION:
google.com.		12	IN	A	172.217.8.174

;; AUTHORITY SECTION:
google.com.		154085	IN	NS	ns1.google.com.
google.com.		154085	IN	NS	ns2.google.com.
google.com.		154085	IN	NS	ns3.google.com.
google.com.		154085	IN	NS	ns4.google.com.

;; ADDITIONAL SECTION:
ns1.google.com.		326909	IN	A	216.239.32.10
ns1.google.com.		154085	IN	AAAA	2001:4860:4802:32::a
ns2.google.com.		326909	IN	A	216.239.34.10
ns2.google.com.		154085	IN	AAAA	2001:4860:4802:34::a
ns3.google.com.		326909	IN	A	216.239.36.10
ns3.google.com.		154085	IN	AAAA	2001:4860:4802:36::a
ns4.google.com.		326909	IN	A	216.239.38.10
ns4.google.com.		154085	IN	AAAA	2001:4860:4802:38::a

;; Query time: 1 msec
;; SERVER: 128.101.101.101#53(128.101.101.101)
;; WHEN: Tue Jan 15 19:53:55 UTC 2019
;; MSG SIZE  rcvd: 303

These queries are not being logged in the coredns pods, but I assume that is because I am querying the upstream server directly and bypassing CoreDNS entirely.

If there were a way to have my queries bypass the CoreDNS upstream, that would be ideal. In any case, I'll try moving dnstools to the same node the coredns pods are on next.

@GregSilverman commented Jan 16, 2019

I put the dnstools pod on the same node as one of the coredns pods last night. It was working fine then and is still working now:

[gms@thalia0 ~]$ kubectl exec -ti dnstools -- dig @10.96.0.10 kubernetes.default

; <<>> DiG 9.11.3 <<>> @10.96.0.10 kubernetes.default
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 41240
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;kubernetes.default.		IN	A

;; AUTHORITY SECTION:
.			30	IN	SOA	a.root-servers.net. nstld.verisign-grs.com. 2019011600 1800 900 604800 86400

;; Query time: 1 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Wed Jan 16 14:54:42 UTC 2019
;; MSG SIZE  rcvd: 122


[gms@thalia0 ~]$ kubectl exec -ti dnstools -- dig @10.96.0.10 google.com


; <<>> DiG 9.11.3 <<>> @10.96.0.10 google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7417
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 9

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com.			IN	A

;; ANSWER SECTION:
google.com.		30	IN	A	172.217.4.46

;; AUTHORITY SECTION:
google.com.		30	IN	NS	ns3.google.com.
google.com.		30	IN	NS	ns1.google.com.
google.com.		30	IN	NS	ns2.google.com.
google.com.		30	IN	NS	ns4.google.com.

;; ADDITIONAL SECTION:
ns4.google.com.		30	IN	AAAA	2001:4860:4802:38::a
ns1.google.com.		30	IN	A	216.239.32.10
ns2.google.com.		30	IN	A	216.239.34.10
ns1.google.com.		30	IN	AAAA	2001:4860:4802:32::a
ns2.google.com.		30	IN	AAAA	2001:4860:4802:34::a
ns4.google.com.		30	IN	A	216.239.38.10
ns3.google.com.		30	IN	A	216.239.36.10
ns3.google.com.		30	IN	AAAA	2001:4860:4802:36::a

;; Query time: 2 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Wed Jan 16 14:50:20 UTC 2019
;; MSG SIZE  rcvd: 505

Any suggestions for next steps in isolating where the problem may be? Thanks!

@chrisohaver (Contributor) commented Jan 16, 2019

Could be this long-standing issue ... #56903.
There are various workarounds in that issue; some work for some people, some do not.
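
For what it's worth, one commonly cited mitigation from that thread is the glibc resolver option single-request-reopen (it won't help musl/Alpine-based images); a sketch of setting it per pod via dnsConfig:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: dnstools-sro
spec:
  dnsConfig:
    options:
    - name: single-request-reopen   # reopen the socket on a missed reply; works around the conntrack race for some users
  containers:
  - name: dnstools
    image: infoblox/dnstools
    command: ["sleep", "3600"]
EOF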

@GregSilverman commented Jan 16, 2019

The symptoms return when I move dnstools to a node that is not running a coredns pod:

kubectl exec -ti dnstools -- time dig @10.96.0.10 kubernetes.default

; <<>> DiG 9.11.3 <<>> @10.96.0.10 kubernetes.default
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached
Command exited with non-zero status 9
real	0m 15.01s
user	0m 0.00s
sys	0m 0.00s
command terminated with exit code 9

I'll poke around as per the recommendations on the linked issue. Let's hope for the best!

@vainu-arto (Contributor) commented Jan 18, 2019

@GregSilverman Make sure you are running a version of Calico which contains this PR: projectcalico/felix#1901; it applies a workaround/mitigation for the kernel NAT race conditions (these have been diagnosed as the cause of the DNS problems in our clusters).
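
If you are not sure which Calico build you are running, the image tag is usually enough to tell; a sketch assuming the usual calico-node DaemonSet in kube-system:

kubectl -n kube-system get daemonset calico-node -o jsonpath='{.spec.template.spec.containers[0].image}'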

@GregSilverman commented Jan 18, 2019

@vainu-arto, as per support for NF_NAT_RANGE_PROTO_RANDOM_FULLY in SNAT rules "There is a race condition in a netfilter code where one of simultaneous connections to the same dest host/port get rejected. This is visible, when DNS requests are made and searchdomains are configured, then multiple simultaneous DNS requests are done, some of which got rejected and have to be retried."

Do the rejected requests get logged? How exactly did you diagnose that this was the issue? Thanks!

@chrisohaver (Contributor) commented Jan 18, 2019

IIUC, that fix is related to NAT source-port collisions, which only occur when there are multiple simultaneous requests to the service. The colliding requests would not get logged in CoreDNS because they never get that far, and the chance of a collision increases with query volume. In this issue, however, the logs are pretty quiet, which suggests the only queries occurring during the failures are manual digs from a single pod. I don't think that would be enough volume for the race to occur at the frequency you're seeing, but it is still worth trying to see if it helps.
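
If you want to check for it anyway, the colliding inserts show up in kernel conntrack statistics rather than in CoreDNS logs; on a node (requires conntrack-tools):

conntrack -S   # a climbing insert_failed counter is the usual signature of the NAT race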

@chrisohaver (Contributor) commented Jan 18, 2019

"This is visible when DNS requests are made and searchdomains are configured, then multiple simultaneous DNS requests are done..."

Search domains are queried serially, in the order they appear in /etc/resolv.conf, one at a time: for each domain, the client waits for a response or a timeout before trying the next. They are not queried in parallel.
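
You can see the search list a given pod's resolver actually walks through with:

kubectl exec -ti dnstools -- cat /etc/resolv.conf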
