
Ignored --resolv-conf - coreDNS fails #76569

Open
thzois opened this Issue Apr 14, 2019 · 11 comments

thzois commented Apr 14, 2019

What happened:
CoreDNS is failing with the following messages:

2019-04-14T22:31:56.340Z [INFO] CoreDNS-1.3.1
2019-04-14T22:31:56.340Z [INFO] linux/amd64, go1.11.4, 6b56a9c
CoreDNS-1.3.1
linux/amd64, go1.11.4, 6b56a9c
2019-04-14T22:31:56.340Z [INFO] plugin/reload: Running configuration MD5 = 599b9eb76b8c147408aed6a0bbe0f669
2019-04-14T22:32:01.343Z [ERROR] plugin/errors: 2 8626528133061726072.6199488701963047542. HINFO: dial udp [2a01:7c8:7000:195:0:8:195:8]:53: connect: network is unreachable
2019-04-14T22:32:02.342Z [ERROR] plugin/errors: 2 8626528133061726072.6199488701963047542. HINFO: read udp 10.244.0.150:34269->195.135.195.135:53: read: no route to host
2019-04-14T22:32:03.342Z [ERROR] plugin/errors: 2 8626528133061726072.6199488701963047542. HINFO: dial udp [2a01:7c8:7000:195:0:8:195:8]:53: connect: network is unreachable

I added the --resolv-conf flag to the kubelet (in /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf), but it seems to be ignored. I reloaded the systemd daemon and restarted the cluster.

[Service]
Environment="KUBELET_EXTRA_ARGS=--resolv-conf"
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
EnvironmentFile=-/etc/sysconfig/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS

The kubelet command line is missing the flag:

● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Mon 2019-04-15 00:43:56 CEST; 1s ago
     Docs: https://kubernetes.io/docs/
 Main PID: 1795 (kubelet)
    Tasks: 14
   Memory: 26.1M
   CGroup: /system.slice/kubelet.service
           └─1795 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:11:31Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:02:58Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: bare-metal
  • OS (e.g: cat /etc/os-release):
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
  • Kernel (e.g. uname -a): 3.10.0-957.el7.x86_64
  • Others:
    Host /etc/resolv.conf
search colo.transip.net
nameserver 195.8.195.8
nameserver 195.135.195.135
nameserver 2a01:7c8:7000:195:0:8:195:8
# NOTE: the libc resolver may not support more than 3 nameservers.
# The nameservers listed below may not be recognized.
nameserver 2a01:7c8:7000:195:0:135:195:135

kubectl exec busybox cat /etc/resolv.conf

nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local colo.transip.net
options ndots:5

kubectl exec -ti busybox -- nslookup kubernetes.default

Server:    10.96.0.10
Address 1: 10.96.0.10

nslookup: can't resolve 'kubernetes.default'
command terminated with exit code 1

Also, /etc/sysconfig/kubelet contains:

KUBELET_EXTRA_ARGS=

And /var/lib/kubelet/config.yaml contains:

resolvConf: /etc/resolv.conf

If I hardcode the flag (without a value), the kubelet fails to start:

Apr 15 02:07:45 136-144-186-218.colo.transip.net kubelet[29223]: F0415 02:07:45.879299   29223 server.go:151] flag needs an argument: --resolv-conf
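A possible explanation, based only on the snippets above: the flag requires a path argument, and the empty KUBELET_EXTRA_ARGS= in /etc/sysconfig/kubelet (loaded via EnvironmentFile=) overrides the Environment= line in the drop-in, so the bare flag never reaches the kubelet command line. A minimal sketch of a fix, assuming /etc/resolv.conf is the intended file:

# /etc/sysconfig/kubelet (sketch; pass the flag as a single key=value token)
KUBELET_EXTRA_ARGS=--resolv-conf=/etc/resolv.conf

# then reload and restart
systemctl daemon-reload && systemctl restart kubelet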
chrisohaver commented Apr 15, 2019

2019-04-14T22:32:02.342Z [ERROR] plugin/errors: [...] read udp 10.244.0.150:34269->195.135.195.135:53: read: no route to host

The kubelet defaults to using the /etc/resolv.conf of the node it's running on. Are you trying to set it to something else?
Is 195.135.195.135 one of the upstream DNS servers you want to use? If so, it appears not to be reachable from your pod network.
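One way to check that from inside the pod network (a sketch, reusing the busybox pod from the report; busybox's nslookup accepts an explicit server as the second argument):

kubectl exec -ti busybox -- nslookup kubernetes.io 195.135.195.135

If the pod network cannot reach 195.135.195.135, this should fail with the same "no route to host" / "network is unreachable" errors that CoreDNS logs.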

neolit123 commented Apr 15, 2019

/sig network
/triage needs-information

athenabot commented Apr 15, 2019

/triage unresolved

Comment /remove-triage unresolved when the issue is assessed and confirmed.

🤖 I am a bot run by @vllry. 👩‍🔬

thzois commented Apr 15, 2019

@chrisohaver I am not trying to set it to anything. The /etc/resolv.conf is the default from the VPS and is generated by NetworkManager. I also tried to resolve other running services, but with no luck; everything fails. CoreDNS after some restarts:

2019-04-15T20:16:18.431Z [INFO] CoreDNS-1.3.1
2019-04-15T20:16:18.431Z [INFO] linux/amd64, go1.11.4, 6b56a9c
CoreDNS-1.3.1
linux/amd64, go1.11.4, 6b56a9c
2019-04-15T20:16:18.431Z [INFO] plugin/reload: Running configuration MD5 = 599b9eb76b8c147408aed6a0bbe0f669

But busybox still can't resolve kubernetes.default, or any other service.

chrisohaver commented Apr 15, 2019

There are possibly multiple things going on here:

  1. Presumably 195.135.195.135 is one of the upstream DNS servers in your /etc/resolv.conf.
    It appears not to be reachable from your pod network, hence the error message. So we need to figure out why your pod network cannot reach 195.135.195.135.

  2. Recent versions of busybox have a bug that prevents DNS short-name resolution. Try the FQDN kubernetes.default.svc.cluster.local (see the sketch below).
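A sketch of that check, reusing the busybox pod from the report:

kubectl exec -ti busybox -- nslookup kubernetes.default.svc.cluster.local

If the image's nslookup itself is the problem, a busybox tag commonly used for DNS testing, e.g. busybox:1.28, can be tried instead of the latest image.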

thzois commented Apr 15, 2019

I did try the second one; it still cannot be resolved. For the first one, where should I start?

-- UPDATE --
After restarting everything, it suddenly works. Sometimes I also get: Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 195.135.195.135 195.8.195.8 2a01:7c8:7000:195:0:135:195:135

-- UPDATE --
Restarting the cluster again brought the problem back. Kubernetes cannot resolve internal hostnames and CoreDNS cannot find a route to the host.

chrisohaver commented Apr 16, 2019

For the first one, where should I start?

I'll have to pass the mic to someone more versed in troubleshooting cluster network problems.

thzois commented Apr 17, 2019

Could it be a similar issue to kubernetes/kubeadm#1056 (comment)?

chrisohaver commented Apr 17, 2019

I don't think the source of the problem is in CoreDNS. CoreDNS displays those errors because the network cannot route requests to your upstream DNS server. Something is not right with the Pod networking in the cluster.

Your Pod network needs to be able to route to 195.135.195.135, or DNS will not work correctly.
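One way to narrow that down (a sketch, assuming dig from bind-utils is available on the node): check whether the node itself can reach the upstream and compare it with the in-pod query shown earlier:

dig @195.135.195.135 kubernetes.io +time=2

If the node can query 195.135.195.135 but pods cannot, the problem is in the pod network (CNI plugin, node forwarding or firewall rules) rather than in the upstream server.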

thzois commented Apr 18, 2019

I changed the nameservers to Google's and I get the same error.
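One detail worth checking (an assumption, not confirmed from the thread): CoreDNS pods pick up the node's /etc/resolv.conf only when they are created, so after changing the node's nameservers the CoreDNS pods have to be recreated to see the new upstreams. A sketch, assuming the default kubeadm labels:

kubectl -n kube-system delete pod -l k8s-app=kube-dns

The Deployment recreates the pods, which then inherit the updated resolv.conf.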

chrisohaver commented Apr 18, 2019

Can your Pods route to anything outside of your cluster?
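For example (a sketch, reusing the busybox pod; ICMP may be filtered separately, so treat a ping failure as a hint rather than proof):

kubectl exec -ti busybox -- ping -c 2 8.8.8.8
kubectl exec -ti busybox -- nslookup kubernetes.io 8.8.8.8

If neither works, the pods have no route out of the cluster at all, which points at the pod-network/CNI setup rather than at DNS specifically.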
