
Ubuntu 18.04 dns problems in pods #448

Closed
giovannicandido opened this issue Jun 24, 2018 · 3 comments · Fixed by #450
Labels: bug (Something isn't working)
Milestone: 1.2.0

Comments

giovannicandido commented Jun 24, 2018

Symptoms:

kubectl port-forward doesn't work. Example:

kubectl -n kube-system port-forward service/tiller-deploy 44134:44134

Results in

E0624 12:06:20.664091 34312 portforward.go:331] an error occurred forwarding 42399 -> 44134: error forwarding port 44134 to pod 255e06439c2da94a4b6a8b1ad2d3d7f4d6d1ba1f82ab6eb2ae519133b1f2bc58, uid : exit status 1: 2018/06/24 15:06:20 socat[22114] E getaddrinfo("localhost", "NULL", {1,2,1,6}, {}): Temporary failure in name resolution

Getting the kubelet log on the worker node

journalctl -u kubelet

Results in something like:

7797 httpstream.go:251] error forwarding port 44134 to pod 255e06439c2da94a4b6a8b1ad2d3d7f4d6d1ba1f82ab6eb2ae519133b1f2bc58, uid : exit status 1: 2018/06/24 15:06:21 socat[22130] E getaddrinfo("localhost", "NULL", {1,2,1,6}, {}): Temporary failure in name resolution
Jun 24 15:09:53 worker0 kubelet[7797]: E0624 15:09:53.133330 7797 httpstream.go:251] error forwarding port 44134 to pod 255e06439c2da94a4b6a8b1ad2d3d7f4d6d1ba1f82ab6eb2ae519133b1f2bc58, uid : exit status 1: 2018/06/24 15:09:53 socat[22418] E getaddrinfo("localhost", "NULL", {1,2,1,6}, {}): Temporary failure in name resolution
Jun 24 15:15:56 worker0 kubelet[7797]: E0624 15:15:56.303891 7797 httpstream.go:251] error forwarding port 44134 to pod

This points to getaddrinfo("localhost") failing, which means the pod is not able to resolve localhost.

Other commands that use port-forward, like helm version and other helm commands (helm interacts with the tiller server using port forwards), have the same symptoms.
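
A quick way to check whether cluster DNS is affected is a throwaway lookup pod; a sketch, where the pod name and image are arbitrary choices:

# run a one-off busybox pod and try a cluster-internal lookup
kubectl run dns-test --rm -it --restart=Never --image=busybox -- nslookup kubernetes.default
# on an affected cluster the lookup fails or times out instead of
# returning the kubernetes service IP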

Cause

Ubuntu 18.04 uses systemd-resolved, which changes /etc/resolv.conf to point at a local stub DNS resolver (127.0.0.53). Kubelet needs to be started with the flag --resolv-conf=/run/systemd/resolve/resolv.conf on these systems.
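
For illustration, on a stock Ubuntu 18.04 host the two files differ roughly like this (the upstream address below is a placeholder):

$ cat /etc/resolv.conf
nameserver 127.0.0.53        # local systemd-resolved stub, unreachable from pod network namespaces
$ cat /run/systemd/resolve/resolv.conf
nameserver 192.0.2.1         # placeholder: the real (e.g. DHCP-provided) resolver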

Possible Solutions

This has been addressed by kubernetes/kubeadm#787 and will probably land in Kubernetes 1.11. As a workaround, there are two easy fixes:

  1. Create a systemd drop-in to override the kubelet unit (see the sketch below this list). Example:
[Service]
Environment='KUBELET_DNS_ARGS=--cluster-dns=172.31.0.10 --cluster-domain=cluster.local --resolv-conf=/run/systemd/resolve/resolv.conf'

Restart kubelet:

systemctl daemon-reload
systemctl restart kubelet
  2. Replace /etc/resolv.conf with a symlink to /run/systemd/resolve/resolv.conf (back up the original first)

Do the same on all machines.
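
A minimal sketch of both workarounds, run as root on each node. The drop-in directory /etc/systemd/system/kubelet.service.d/ and the file name 20-resolv-conf.conf are assumptions (kubeadm installs its own drop-in there), and the --cluster-dns address must match your cluster's DNS service IP (172.31.0.10 in the example above):

# Workaround 1: systemd drop-in overriding KUBELET_DNS_ARGS
mkdir -p /etc/systemd/system/kubelet.service.d
cat > /etc/systemd/system/kubelet.service.d/20-resolv-conf.conf <<'EOF'
[Service]
Environment='KUBELET_DNS_ARGS=--cluster-dns=172.31.0.10 --cluster-domain=cluster.local --resolv-conf=/run/systemd/resolve/resolv.conf'
EOF
systemctl daemon-reload
systemctl restart kubelet

# Workaround 2: point /etc/resolv.conf at the real resolver list (back up first)
cp /etc/resolv.conf /etc/resolv.conf.bak
ln -sf /run/systemd/resolve/resolv.conf /etc/resolv.conf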

Pharos Installer

I suggest that pharos-cluster check for the existence of the file on Ubuntu 18.04 and perform one of the fixes above. We need to re-check after the kubeadm fix is released to make sure they do not conflict (possibly resulting in the flag being added twice). Check the pull request to see how it was fixed on the kubeadm side.

giovannicandido (Author) commented:

Update: after restarting kubelet you must also recreate the kube-dns pods, otherwise DNS queries will keep failing:

kubectl -n kube-system get pods -l k8s-app=kube-dns

Delete all of them. You may run kubectl -n kube-system delete pods -l k8s-app=kube-dns, or delete them one by one.
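
A minimal sketch of that step, waiting for the replacements to come back up:

# delete the kube-dns pods so they are recreated with the corrected resolv.conf
kubectl -n kube-system delete pods -l k8s-app=kube-dns
# the Deployment recreates them; watch until they are Running again
kubectl -n kube-system get pods -l k8s-app=kube-dns -w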

@jakolehm jakolehm added this to the 1.2.0 milestone Jun 25, 2018
@jakolehm jakolehm added the bug Something isn't working label Jun 25, 2018
SpComb (Contributor) commented Jun 25, 2018

Seems like the issue isn't necessarily specific to Ubuntu 18.04 and systemd-resolved, but any configuration where /etc/resolv.conf contains localhost as a resolver will cause the kube-dns pod to use localhost as the upstream? Because kube-dns runs in a pod network namespace, the kube-dns upstream queries will loop back to itself and fail...

The fix will however need to be specific to the local resolver in use... for the systemd-resolved case we can assume that the real upstream resolvers are available at /run/systemd/resolve/resolv.conf... but on e.g. an Ubuntu xenial desktop with NetworkManager, /etc/resolv.conf also contains 127.0.0.1, while the upstream nameservers are only available inside dnsmasq, set dynamically via DBus... they are not available anywhere on the filesystem:

Jun 25 09:43:29 tehobari dnsmasq[2015]: setting upstream servers from DBus
Jun 25 09:44:03 tehobari dnsmasq[2015]: setting upstream servers from DBus
Jun 25 09:44:03 tehobari dnsmasq[2015]: using nameserver 172.28.0.1#53(via wlp4s0)

The upstream kubernetes fix for 1.11 seems to have kubeadm init/join conditionally generate the systemd dropin with a --resolv-conf=/run/systemd/resolve/resolv.conf flag for the kubelet: kubernetes/kubernetes#64665

Our fix for this in pharos 1.3 would be to upgrade to kube 1.11, which would fix this for new installs... to fix this for pharos 1.2, as well as existing 1.2 -> 1.3 upgrades, we will need to detect this configuration and set the flag ourselves.
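
A sketch of what that detection could look like on a host; this is illustrative shell logic, not actual pharos code:

# detect the systemd-resolved stub configuration described in this thread
if grep -q '^nameserver 127\.0\.0\.53$' /etc/resolv.conf \
    && [ -f /run/systemd/resolve/resolv.conf ]; then
  echo "systemd-resolved stub detected; kubelet needs --resolv-conf=/run/systemd/resolve/resolv.conf"
fi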

@SpComb SpComb mentioned this issue Jun 25, 2018
SpComb (Contributor) commented Jun 25, 2018

Confirmed that dnsPolicy: Default pods are broken on Ubuntu bionic / 18.04.

Normal dnsPolicy: ClusterFirst pods work, but the cluster DNS will be broken if the kube-dns pod lands on a bionic node.

I0625 12:18:27.906305       1 main.go:76] opts: {{/usr/sbin/dnsmasq [-k --cache-size=1000 --no-negcache --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053] true} /etc/k8s/dns/dnsmasq-nanny 10000000000}
I0625 12:18:27.907288       1 nanny.go:94] Starting dnsmasq [-k --cache-size=1000 --no-negcache --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053]
I0625 12:18:28.281314       1 nanny.go:119] 
W0625 12:18:28.281524       1 nanny.go:120] Got EOF from stdout
I0625 12:18:28.283464       1 nanny.go:116] dnsmasq[9]: started, version 2.78 cachesize 1000
I0625 12:18:28.283637       1 nanny.go:116] dnsmasq[9]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
I0625 12:18:28.283771       1 nanny.go:116] dnsmasq[9]: using nameserver 127.0.0.1#10053 for domain ip6.arpa 
I0625 12:18:28.283872       1 nanny.go:116] dnsmasq[9]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa 
I0625 12:18:28.284017       1 nanny.go:116] dnsmasq[9]: using nameserver 127.0.0.1#10053 for domain cluster.local 
I0625 12:18:28.284217       1 nanny.go:116] dnsmasq[9]: reading /etc/resolv.conf
I0625 12:18:28.284353       1 nanny.go:116] dnsmasq[9]: using nameserver 127.0.0.1#10053 for domain ip6.arpa 
I0625 12:18:28.284477       1 nanny.go:116] dnsmasq[9]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa 
I0625 12:18:28.284595       1 nanny.go:116] dnsmasq[9]: using nameserver 127.0.0.1#10053 for domain cluster.local 
I0625 12:18:28.284684       1 nanny.go:116] dnsmasq[9]: using nameserver 127.0.0.53#53
I0625 12:18:28.284882       1 nanny.go:116] dnsmasq[9]: read /etc/hosts - 7 addresses
I0625 12:18:48.681942       1 nanny.go:116] dnsmasq[9]: Maximum number of concurrent DNS queries reached (max: 150)
I0625 12:18:58.688477       1 nanny.go:116] dnsmasq[9]: Maximum number of concurrent DNS queries reached (max: 150)
I0625 12:19:08.702231       1 nanny.go:116] dnsmasq[9]: Maximum number of concurrent DNS queries reached (max: 150)
I0625 12:19:18.708944       1 nanny.go:116] dnsmasq[9]: Maximum number of concurrent DNS queries reached (max: 150)
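
To reproduce the dnsPolicy: Default breakage described above, a throwaway pod like the following can be used; the pod/image names are arbitrary, and in a mixed cluster it must be pinned to a bionic node (e.g. with a nodeSelector):

# create a pod that inherits the node's resolv.conf (dnsPolicy: Default)
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: dnspolicy-default-test
spec:
  dnsPolicy: Default
  restartPolicy: Never
  containers:
  - name: test
    image: busybox
    command: ["nslookup", "kubernetes.io"]
EOF
# on an affected node the lookup fails, since the pod inherited the
# node's 127.0.0.53 stub resolver
kubectl logs dnspolicy-default-test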
