
Fresh deploy with CoreDNS not resolving any dns lookup #1776

Closed · bramvdklinkenberg opened this issue Sep 6, 2019 · 12 comments

@bramvdklinkenberg commented Sep 6, 2019

#1247

Versions

kubeadm version (use kubeadm version): 1.14.6

Environment:

  • Kubernetes version (use kubectl version): 1.14.6
  • Cloud provider or hardware configuration: Google
  • OS (e.g. from /etc/os-release): ubuntu 16.04
  • Kernel (e.g. uname -a): 4.15.0-1041-gcp
  • Others:

What happened?

I deployed a cluster with kubeadm, including Calico, on Google Cloud VMs, and I cannot resolve any DNS names.

What you expected to happen?

That service names get resolved.

How to reproduce it (as minimally and precisely as possible)?

The installation:

kubeadm init --pod-network-cidr=192.168.0.0/16
kubectl apply -f https://docs.projectcalico.org/v3.8/manifests/calico.yaml

Trying to resolve the kubernetes svc in the default namespace:

$ kubectl exec -it busybox nslookup kubernetes.default.svc.cluster.local
Server:    10.96.0.10
Address 1: 10.96.0.10

nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
command terminated with exit code 1
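
One way to narrow this down is to query a CoreDNS pod IP directly, bypassing the kube-dns service VIP; a sketch, using a pod IP from the resource listing further down:

$ kubectl exec -it busybox nslookup kubernetes.default.svc.cluster.local 192.168.170.1

If this answers while the query against 10.96.0.10 fails, the problem is more likely kube-proxy/service routing than CoreDNS itself.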

Anything else we need to know?

resolv.conf:

$ kubectl exec -it busybox cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local fullstaq c.xifeo-cka-training-fullstaq.internal google.internal
options ndots:5

Logs of coredns pods:

.:53
2019-09-06T20:58:31.168Z [INFO] CoreDNS-1.3.1
2019-09-06T20:58:31.168Z [INFO] linux/amd64, go1.11.4, 6b56a9c
CoreDNS-1.3.1
linux/amd64, go1.11.4, 6b56a9c
2019-09-06T20:58:31.168Z [INFO] plugin/reload: Running configuration MD5 = 599b9eb76b8c147408aed6a0bbe0f669
.:53
2019-09-06T20:58:34.109Z [INFO] CoreDNS-1.3.1
2019-09-06T20:58:34.109Z [INFO] linux/amd64, go1.11.4, 6b56a9c
CoreDNS-1.3.1
linux/amd64, go1.11.4, 6b56a9c
2019-09-06T20:58:34.109Z [INFO] plugin/reload: Running configuration MD5 = 599b9eb76b8c147408aed6a0bbe0f669

resources:

$ kubectl -n kube-system get pod,svc,ep -o wide
NAME                                                    READY   STATUS    RESTARTS   AGE     IP                NODE                         NOMINATED NODE   READINESS GATES
pod/calico-kube-controllers-f9dbcb664-zgvd7             1/1     Running   0          8m28s   192.168.184.129   bvandenklinkenberg-worker1   <none>           <none>
pod/calico-node-64vk9                                   1/1     Running   0          8m28s   10.0.0.74         bvandenklinkenberg-worker1   <none>           <none>
pod/calico-node-n79cr                                   1/1     Running   0          8m28s   10.0.0.106        bvandenklinkenberg-worker2   <none>           <none>
pod/calico-node-pfzzm                                   1/1     Running   0          8m28s   10.0.0.93         bvandenklinkenberg-master    <none>           <none>
pod/coredns-584795fc57-r2gvj                            1/1     Running   0          13m     192.168.170.1     bvandenklinkenberg-master    <none>           <none>
pod/coredns-584795fc57-stf42                            1/1     Running   0          13m     192.168.184.130   bvandenklinkenberg-worker1   <none>           <none>
pod/etcd-bvandenklinkenberg-master                      1/1     Running   0          12m     10.0.0.93         bvandenklinkenberg-master    <none>           <none>
pod/kube-apiserver-bvandenklinkenberg-master            1/1     Running   0          12m     10.0.0.93         bvandenklinkenberg-master    <none>           <none>
pod/kube-controller-manager-bvandenklinkenberg-master   1/1     Running   0          12m     10.0.0.93         bvandenklinkenberg-master    <none>           <none>
pod/kube-proxy-lprbs                                    1/1     Running   0          13m     10.0.0.93         bvandenklinkenberg-master    <none>           <none>
pod/kube-proxy-t5qqf                                    1/1     Running   0          10m     10.0.0.106        bvandenklinkenberg-worker2   <none>           <none>
pod/kube-proxy-xtc89                                    1/1     Running   0          10m     10.0.0.74         bvandenklinkenberg-worker1   <none>           <none>
pod/kube-scheduler-bvandenklinkenberg-master            1/1     Running   0          12m     10.0.0.93         bvandenklinkenberg-master    <none>           <none>

NAME               TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE   SELECTOR
service/kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   13m   k8s-app=kube-dns

NAME                                ENDPOINTS                                                          AGE
endpoints/kube-controller-manager   <none>                                                             13m
endpoints/kube-dns                  192.168.170.1:53,192.168.184.130:53,192.168.170.1:53 + 3 more...   13m
endpoints/kube-scheduler            <none>                                                             13m
@bramvdklinkenberg commented Sep 6, 2019

I also can't reach the internet from within a pod.

@bramvdklinkenberg commented Sep 6, 2019

The busybox image I test with is version 1.28.4.

@neolit123 commented Sep 6, 2019

try a different image - e.g. debian:stretch; busybox is known to have DNS issues.
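
For example, a throwaway pod from a fuller image can be used for the test; a sketch, where the dnsutils image name is the one from the Kubernetes DNS debugging docs and pulling it needs outbound registry access:

kubectl run dnsutils --image=gcr.io/kubernetes-e2e-test-images/dnsutils:1.3 --restart=Never -- sleep 3600
kubectl exec -it dnsutils -- dig @10.96.0.10 kubernetes.default.svc.cluster.local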

@bramvdklinkenberg commented Sep 7, 2019

I also tried debian, but I cannot reach the internet from it, so I cannot install dnsutils. And busybox v1.28.4 works fine on an AKS cluster.

@bramvdklinkenberg commented Sep 7, 2019

For 15 trainees we have 3 VMs per trainee, but they are all in the same subnet. Could that be causing these resolution issues?

@neolit123 commented Sep 7, 2019

the same subnet for N VMs should not be an issue. it seems to me that this is GCE specific. what happens if you delete the coredns pods on the problem GCE cluster and try nslookup again?
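
For reference, deleting them by label lets the Deployment recreate them with fresh pod IPs; a sketch, using the label kubeadm's CoreDNS Deployment normally carries:

kubectl -n kube-system delete pod -l k8s-app=kube-dns
kubectl -n kube-system get pod -l k8s-app=kube-dns -o wide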

a large chunk of the issues we get are actually caused by networking.
i don't think this is a coredns or a kubeadm issue, but if you confirm otherwise, please either open a new issue in the coredns repository or re-open this one for kubeadm.

there are a lot of really helpful people on stack overflow, reddit and in the #kubeadm k8s slack channel that probably already run this setup!

i'm going to close this, but we can continue the discussion.
thanks.

@neolit123 closed this Sep 7, 2019
@bramvdklinkenberg commented Sep 7, 2019

Hi @neolit123, I created 2 kubeadm clusters on 6 VMs in the same subnet on Azure, the same way as I did on gcloud. Same issue.

The first cluster has no issue resolving DNS; only the clusters created after the first one have DNS resolution issues.

@neolit123 commented Sep 7, 2019

please try:

> what happens if you delete the coredns pods on the problem GCE cluster and try nslookup again?

> The first cluster has no issue resolving DNS; only the clusters created after the first one have DNS resolution issues.

the test clusters i create always have nodes in the same subnetwork, so i'm not sure how subnetworks are a problem here.

@bramvdklinkenberg commented Sep 7, 2019

Removing the coredns pods doesn't solve the issue.

So I have nodes in a 10.0.0.0/8 network.
I create a master with

kubeadm init --pod-network-cidr=192.168.0.0/16
kubectl apply -f https://docs.projectcalico.org/v3.8/manifests/calico.yaml

I join the 2 workers and everything works fine.

Then I create another master and workers the exact same way on 3 other nodes, but DNS resolution doesn't work anymore.

NAME                                                    READY   STATUS    RESTARTS   AGE     IP                NODE                         NOMINATED NODE   READINESS GATES
pod/calico-kube-controllers-f9dbcb664-zgvd7             1/1     Running   0          14h     192.168.184.129   bvandenklinkenberg-worker1   <none>           <none>
pod/calico-node-64vk9                                   1/1     Running   0          14h     10.0.0.74         bvandenklinkenberg-worker1   <none>           <none>
pod/calico-node-n79cr                                   1/1     Running   0          14h     10.0.0.106        bvandenklinkenberg-worker2   <none>           <none>
pod/calico-node-pfzzm                                   1/1     Running   0          14h     10.0.0.93         bvandenklinkenberg-master    <none>           <none>
pod/coredns-584795fc57-8gccx                            1/1     Running   0          3m39s   192.168.170.2     bvandenklinkenberg-master    <none>           <none>
pod/coredns-584795fc57-tc5wb                            1/1     Running   0          3m39s   192.168.184.131   bvandenklinkenberg-worker1   <none>           <none>
pod/etcd-bvandenklinkenberg-master                      1/1     Running   0          15h     10.0.0.93         bvandenklinkenberg-master    <none>           <none>
pod/kube-apiserver-bvandenklinkenberg-master            1/1     Running   0          15h     10.0.0.93         bvandenklinkenberg-master    <none>           <none>
pod/kube-controller-manager-bvandenklinkenberg-master   1/1     Running   0          15h     10.0.0.93         bvandenklinkenberg-master    <none>           <none>
pod/kube-proxy-lprbs                                    1/1     Running   0          15h     10.0.0.93         bvandenklinkenberg-master    <none>           <none>
pod/kube-proxy-t5qqf                                    1/1     Running   0          15h     10.0.0.106        bvandenklinkenberg-worker2   <none>           <none>
pod/kube-proxy-xtc89                                    1/1     Running   0          15h     10.0.0.74         bvandenklinkenberg-worker1   <none>           <none>
pod/kube-scheduler-bvandenklinkenberg-master            1/1     Running   0          15h     10.0.0.93         bvandenklinkenberg-master    <none>           <none>

NAME               TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE   SELECTOR
service/kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   15h   k8s-app=kube-dns

NAME                                ENDPOINTS                                                          AGE
endpoints/kube-controller-manager   <none>                                                             15h
endpoints/kube-dns                  192.168.170.2:53,192.168.184.131:53,192.168.170.2:53 + 3 more...   15h
endpoints/kube-scheduler            <none>                                                             15h
@bramvdklinkenberg commented Sep 7, 2019

So my kube-system pods have an IP in the 10.x range, but everything that gets deployed after cluster creation gets an IP from the Calico pod range.
Although this doesn't seem to bother the first cluster, I would think my pods should all be in the same range, or not?
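
For what it's worth, the control-plane, kube-proxy and calico-node pods run with hostNetwork, so showing the node's 10.x address is expected; only pods on the pod network (like CoreDNS) get a Calico address. A quick check, using a pod name from the listing above, should print true for those pods:

$ kubectl -n kube-system get pod etcd-bvandenklinkenberg-master -o jsonpath='{.spec.hostNetwork}'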

@neolit123 commented Sep 7, 2019

> So my kube-system pods have an IP in the 10.x range, but everything that gets deployed after cluster creation gets an IP from the Calico pod range.

a CIDR conflict?

you could try the weave-net or Cilium pod network plugins instead of Calico:
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#pod-network
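
One way to check for such a conflict is to compare the node, pod, and service ranges side by side; a sketch using standard kubectl queries (the kubeadm-config ConfigMap name is the kubeadm default):

kubectl -n kube-system get cm kubeadm-config -o yaml | grep -i subnet
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'
kubectl get nodes -o wide

If the node network (10.0.0.0/8 per the earlier comment) overlaps the pod subnet or the default service subnet (10.96.0.0/12), that would be the kind of conflict meant above.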

@bramvdklinkenberg commented Sep 7, 2019

Hi @neolit123, with flannel it works fine with multiple clusters in one subnet! For now I will just use that and try to figure out later why Calico isn't working.
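
For anyone following along, a typical flannel install with kubeadm looks roughly like this (a sketch; the manifest URL is the one the kubeadm docs referenced at the time, and flannel's default pool expects the matching --pod-network-cidr):

kubeadm init --pod-network-cidr=10.244.0.0/16
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml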
