Install on a system using `systemd-resolved` leads to broken DNS #273

Closed
gjcarneiro opened this Issue May 19, 2017 · 18 comments

gjcarneiro commented May 19, 2017

What keywords did you search in kubeadm issues before filing this one?

systemd resolved dns

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version (use kubeadm version): v1.6.3
Environment:

  • Kubernetes version (use kubectl version): v1.6.3
  • Cloud provider or hardware configuration: bare metal
  • OS (e.g. from /etc/os-release): Ubuntu 17.04
  • Kernel (e.g. uname -a): Linux gjc-XPS-8500 4.10.0-21-generic #23-Ubuntu SMP Fri Apr 28 16:14:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
  • Others:

What happened?

Installed Kubernetes on bare metal using kubeadm. DNS inside pods did not work.

What you expected to happen?

Would expect DNS inside pods to work.

How to reproduce it (as minimally and precisely as possible)?

Anything else we need to know?

As noted in kubernetes/kubernetes#45828, the problem is that on a normal Ubuntu desktop (and maybe other desktop Linux OSes), /etc/resolv.conf contains 127.0.0.35, which doesn't work inside Pods.

The correct thing to do is to add --resolv-conf=/run/systemd/resolve/resolv.conf to the kubelet config in case systemd-resolved is running with DNSStubListener and /etc/resolv.conf is configured with the local resolver (solution suggested by @antoineco and @thockin).
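The condition described above (the host's /etc/resolv.conf lists only the local stub resolver) can be sketched as a small check. This is a minimal illustration, not what kubeadm or kubelet actually runs: the helper names `nameservers`, `only_loopback` and `kubelet_resolv_conf_flag` are hypothetical, and the only detail taken from the issue is the path `/run/systemd/resolve/resolv.conf`.

```python
import ipaddress

# Path named in the issue; systemd-resolved lists the real upstream servers here.
RESOLVED_UPSTREAM_CONF = "/run/systemd/resolve/resolv.conf"


def nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Comment lines start with "#", so they never match "nameserver".
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def _is_loopback(address):
    """True if the string parses as an IP address in a loopback range."""
    try:
        return ipaddress.ip_address(address).is_loopback
    except ValueError:
        return False


def only_loopback(resolv_conf_text):
    """True if there is at least one nameserver and all of them are loopback."""
    servers = nameservers(resolv_conf_text)
    return bool(servers) and all(_is_loopback(s) for s in servers)


def kubelet_resolv_conf_flag(etc_resolv_conf_text):
    """Hypothetical helper: suggest the kubelet flag, or None if not needed."""
    if only_loopback(etc_resolv_conf_text):
        return "--resolv-conf=" + RESOLVED_UPSTREAM_CONF
    return None
```

Given the text of a stub-resolver file such as `nameserver 127.0.0.53`, `kubelet_resolv_conf_flag` returns the flag suggested in this issue; for a file that lists real upstream servers it returns `None`.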

Member

timothysc commented May 25, 2017

So kubeadm doesn't lay down the kubelet startup; that's done in the systemd unit file, which is done here: https://github.com/kubernetes/release

/cc @marcoceppi @castrojo - this appears to be an ubuntu default for desktop setups.

Member

luxas commented May 29, 2017

@timothysc @marcoceppi @castrojo Critical for v1.7?

Member

timothysc commented Jun 6, 2017

@luxas no.


erikbgithub commented Jun 7, 2017

Sorry, not sure if anybody will still look at closed issues. #272 is not resolved by the solution suggested here.


erikbgithub commented Jun 20, 2017

Please reopen #272 or start working on this issue considering the other context as well.


fasaxc commented Dec 11, 2017

I'm hitting this when I try to use kubeadm with GCE's ubuntu-1710 image so it looks like it's not limited to the desktop install.


mt-inside commented Jan 9, 2018

As an FYI: as I commented on kubernetes/kubernetes#45828, I don't believe that over-riding kubelet's resolv.conf reference will work anyway. This will just dump a broken (referencing 127.0.0.53) resolv.conf into all the pods and bypass cluster-local resolution. The current state of affairs is that just external resolution is broken because kube-dns has a broken upstream, but it is able to stub the cluster-local zones off to k8s. The only fix I can see is adding / editing config to kube-dns / CoreDNS.

NB

  • It's not just ubuntu desktop, this isn't a NetworkManager thing, this is systemd-resolved, which is used on server version 17.10 at least.
  • It's 127.0.0.53 (as in the DNS port), not 35

antoineco commented Jan 9, 2018

@mt-inside that's why pointing kubelet to /run/systemd/resolve/resolv.conf makes sense because in an environment running systemd-resolved

  1. /etc/resolv.conf contains only one entry: localhost
  2. /run/systemd/resolve/resolv.conf contains your actual DNS servers

kube-dns merely uses whatever nameservers kubelet provides as its forwarders, so if kubeadm configures kubelet to use 2) instead of 1) you're all set.


mt-inside commented Jan 9, 2018

@antoineco I agree that'll get kube-dns forwarding correctly, but won't every other user-level Pod in the system then go straight to your upstream servers and not query kube-dns at all? When I tried the --resolv-conf option, it just used that file verbatim and didn't inject the kube-dns Service ClusterIP (the --resolv-conf option was ignored until I removed the --cluster-dns option)


antoineco commented Jan 9, 2018

By default, if --cluster-dns is set (should be!), all user workloads send DNS requests to kube-dns, which in turn does the forwarding job for you.

What you described is the behaviour of ClusterFirst.

ref https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pods-dns-policy


mt-inside commented Jan 9, 2018

@antoineco Ah, you're right. I was confused about dnsPolicy. I was confused about what coredns is running as, because Default isn't the default. I also confused myself by looking at a ClusterFirst Pod that was failing back to Default when I didn't specify --cluster-dns in some of my tests. Also the scope of --resolv-conf (not applying to ClusterFirst) and --cluster-dns (not applying to Default) isn't documented, and I didn't think of it until I really grokked the different dns modes.

I agree this fix is perfectly sensible.
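The scoping worked out in this exchange (`--cluster-dns` governs ClusterFirst pods, `--resolv-conf` governs Default pods, and ClusterFirst falls back to Default when `--cluster-dns` is unset) can be written down as a simplified model. This is a sketch of the behaviour as described in the thread, not kubelet's implementation; the function name `pod_nameservers` and the address `10.96.0.10` used below are illustrative assumptions.

```python
def pod_nameservers(dns_policy, cluster_dns, node_nameservers):
    """Simplified model of the nameservers a pod ends up with.

    dns_policy        -- "ClusterFirst" or "Default"
    cluster_dns       -- value of kubelet's --cluster-dns, or None if unset
    node_nameservers  -- nameservers in the file kubelet's --resolv-conf points at
    """
    if dns_policy not in ("ClusterFirst", "Default"):
        raise ValueError("policy not covered by this sketch: %r" % dns_policy)
    if dns_policy == "ClusterFirst" and cluster_dns:
        # User workloads talk to the cluster DNS Service only;
        # --resolv-conf has no direct effect on them.
        return [cluster_dns]
    # "Default", or ClusterFirst with no --cluster-dns (the fallback
    # observed in the thread): inherit the node's resolver list.
    return list(node_nameservers)


# kube-dns itself runs with the Default policy, so it inherits the node list.
broken_upstream = pod_nameservers("Default", "10.96.0.10", ["127.0.0.53"])
fixed_upstream = pod_nameservers("Default", "10.96.0.10", ["10.105.16.2"])
workload = pod_nameservers("ClusterFirst", "10.96.0.10", ["10.105.16.2"])
```

In this model only the Default-policy pod (kube-dns) changes when kubelet is pointed at a different resolv.conf, which is why the flag repairs external resolution without bypassing cluster-local lookups for ordinary workloads.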

Member

timothysc commented Jan 31, 2018

So what is the consensus?


mt-inside commented Feb 1, 2018

@timothysc Sorry, it's not spelt out. A combination of what @antoineco says here and @thockin says on kubernetes/kubernetes#45828
Kubelet needs the argument --resolv-conf=/run/systemd/resolve/resolv.conf
My kubeadm wrapper script adds that to KUBELET_DNS_ARGS in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

However (deferring to the kubeadm authors here):

  • I don't know what kubelet's behaviour is wrt non-existent files. If it doesn't like them, this should only be done on systems running systemd-resolved
  • You seem to think the kubelet's args file isn't laid down by kubeadm, but by https://github.com/kubernetes/release ? I take it .../10-kubeadm.conf comes from this project at least and could be used?
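A wrapper step like the one described in this comment could look roughly as follows. This is a hedged sketch, not the commenter's script: the function name `add_resolv_conf_flag` is hypothetical, and the assumed line format `Environment="KUBELET_DNS_ARGS=..."` should be checked against the actual drop-in on the node before relying on it.

```python
FLAG = "--resolv-conf=/run/systemd/resolve/resolv.conf"
# Assumed format of the relevant line in 10-kubeadm.conf.
PREFIX = 'Environment="KUBELET_DNS_ARGS='


def add_resolv_conf_flag(dropin_text):
    """Append FLAG to the KUBELET_DNS_ARGS line unless a --resolv-conf is set."""
    out = []
    for line in dropin_text.splitlines():
        stripped = line.rstrip()
        if (
            stripped.startswith(PREFIX)
            and stripped.endswith('"')
            and "--resolv-conf=" not in stripped
        ):
            # Insert the flag just before the closing quote.
            line = stripped[:-1] + " " + FLAG + '"'
        out.append(line)
    return "\n".join(out) + "\n"


# Illustrative input; the --cluster-dns value is an example, not from the issue.
sample = (
    "[Service]\n"
    'Environment="KUBELET_DNS_ARGS=--cluster-dns=10.96.0.10 '
    '--cluster-domain=cluster.local"\n'
)
patched = add_resolv_conf_flag(sample)
```

The function is idempotent, so running it twice leaves the file unchanged. After writing the result back, systemd has to re-read its units (`systemctl daemon-reload`) and kubelet has to be restarted for the flag to take effect.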

codepainters commented Apr 1, 2018

I've hit the very same issue with kubeadm 1.10.0 and CoreDNS - with even worse results, as CoreDNS asked to resolve any external name starts looping to itself, consuming all allowed RAM and getting OOM-killed.

Obviously it can be fixed either by kubelet --resolv-conf param (as mentioned above), or by editing config map with Corefile, but it takes a moment to realise what's failing and why. It's unfortunate that default setup fails so miserably.

I've raised an issue in CoreDNS tracker for better handling of such a misconfiguration on CoreDNS side: coredns/coredns#1647

Member

timothysc commented Apr 7, 2018

Member

neolit123 commented May 10, 2018

seems like a duplicate of #787
which is being worked on.

k8s-merge-robot added a commit to kubernetes/kubernetes that referenced this issue May 11, 2018

Merge pull request #63691 from detiber/warn_systemd-resolved
Automatic merge from submit-queue (batch tested with PRs 63673, 63712, 63691, 63684). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

kubeadm - add preflight warning when using systemd-resolved

**What this PR does / why we need it**:

This PR adds a preflight warning when the host is running systemd-resolved.

Newer Ubuntu releases (artful and bionic in particular) run systemd-resolved by default and, in the default configuration, have an /etc/resolv.conf file that references 127.0.0.53, which is not accessible from containers running on the host. We will now provide a warning to the user to tell them that the kubelet args should include `--resolv-conf=/run/systemd/resolve/resolv.conf`.

**Which issue(s) this PR fixes**:
This does not resolve the following issues, but it does provide better output to the users affected by the issues: kubernetes/kubeadm#273 kubernetes/kubeadm#787

**Release note**:
```release-note
NONE
```
Member

luxas commented May 11, 2018

Yes, this one and #787 are duplicates. I'll close #787 as this one is older.

@luxas luxas changed the title Install on system with systemd-resolved with DNSStubListener leads to broken kube-dns Install on system with systemd-resolved leads to broken DNS May 14, 2018

@luxas luxas changed the title Install on system with systemd-resolved leads to broken DNS Install on a system using `systemd-resolved` leads to broken DNS May 14, 2018

@timothysc timothysc assigned timothysc and unassigned detiber May 15, 2018

Member

luxas commented May 29, 2018

As we have the preflight check (added in kubernetes/kubernetes#63691), I'm gonna close this
To make this work automatically, we have filed #845

Thanks a lot to everyone who has contributed to fixing this!

@luxas luxas closed this May 29, 2018

asksven added a commit to asksven/kubernetes-the-hard-way-vagrant that referenced this issue Sep 12, 2018

Fixed #1
Make sure kubelets use `/run/systemd/resolve/resolv.conf` and not `/etc/resolv.conf`, so that any dnsmasq / resolved installed on the workers does not interfere with the cluster's DNS resolution

Refs:
kubernetes/kubeadm#273
https://blog.sophaskins.net/blog/misadventures-with-kube-dns/

vannrt added a commit to platform9/nodeadm that referenced this issue Dec 7, 2018

Fix for systemd-resolved DNS incompatibility
This problem occurs because systems using systemd-resolved copy
127.0.0.53 from the host's /etc/resolv.conf.

More discussion here: kubernetes/kubernetes#45828

Related issues:
kubernetes/kubeadm#787
kubernetes/kubeadm#273
kubernetes/kubeadm#845

The upstream fix is now in v1.11.

vannrt added a commit to platform9/nodeadm that referenced this issue Dec 7, 2018

Fix for systemd-resolved DNS incompatibility
This problem occurs because kube-dns on systems using systemd-resolved
copies 127.0.0.53 from the host's /etc/resolv.conf.

Since 127.0.0.53 is a loopback address, dns queries never get past
kube-dns causing our conformance tests to fail on DNS related issues.

More discussion here: kubernetes/kubernetes#45828

Related issues:
kubernetes/kubeadm#787
kubernetes/kubeadm#273
kubernetes/kubeadm#845

The upstream fix is now in v1.11.

Without the fix, the kubedns and dnsmasq containers would copy the host's `/etc/resolv.conf`:
```
# This file is managed by man:systemd-resolved(8). Do not edit.
#
# This is a dynamic resolv.conf file for connecting local clients to the
# internal DNS stub resolver of systemd-resolved. This file lists all
# configured search domains.
#
# Run "systemd-resolve --status" to see details about the uplink DNS servers
# currently in use.
#
# Third party programs must not access this file directly, but only through the
# symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a different way,
# replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 127.0.0.53
search platform9.sys
```

After the fix:
```
# This file is managed by man:systemd-resolved(8). Do not edit.
#
# This is a dynamic resolv.conf file for connecting local clients directly to
# all known uplink DNS servers. This file lists all configured search domains.
#
# Third party programs must not access this file directly, but only through the
# symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a different way,
# replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 10.105.16.2
nameserver 10.105.16.4
search platform9.sys
```