localhost.<dns-search-path> DNS records trick etcd into attempting to bind to unavailable, non-loopback IP address #57709

Closed
andremarianiello opened this issue Dec 29, 2017 · 9 comments

Comments

@andremarianiello

commented Dec 29, 2017

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

I set up a cluster using kubeadm.

etcd in the image gcr.io/google_containers/etcd-amd64 fails to start, producing the following output:

# docker logs -f f61d7c0d57fa
2017-12-29 14:48:45.066564 I | etcdmain: etcd Version: 3.1.10
2017-12-29 14:48:45.066619 I | etcdmain: Git SHA: 0520cb9
2017-12-29 14:48:45.066622 I | etcdmain: Go Version: go1.8.3
2017-12-29 14:48:45.066625 I | etcdmain: Go OS/Arch: linux/amd64
2017-12-29 14:48:45.066632 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2017-12-29 14:48:45.067597 C | etcdmain: listen tcp 10.168.91.198:2380: bind: cannot assign requested address

What you expected to happen:

I expected the etcd pod to correctly listen on a loopback IP address, not some external IP address.

How to reproduce it (as minimally and precisely as possible):

Add a DNS record to your local DNS server mapping the domain name "localhost.<dns-search-path>" to a non-loopback IP.

For example, if you have "example.local" as a search path, then add a DNS A record for "localhost.example.local" pointing to some IP (10.168.91.198 in this example).

$ ping localhost.example.local
PING localhost.example.local (10.168.91.198) 56(84) bytes of data.
64 bytes from localhost.example.local (10.168.91.198): icmp_seq=1 ttl=63 time=0.858 ms
^C
--- localhost.example.local ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.858/0.858/0.858/0.000 ms
$ docker run -it --rm gcr.io/google_containers/etcd-amd64:3.1.11 etcd
2017-12-29 14:44:24.912939 I | etcdmain: etcd Version: 3.1.11
2017-12-29 14:44:24.913064 I | etcdmain: Git SHA: 960f4604b
2017-12-29 14:44:24.913074 I | etcdmain: Go Version: go1.8.5
2017-12-29 14:44:24.913083 I | etcdmain: Go OS/Arch: linux/amd64
2017-12-29 14:44:24.913109 I | etcdmain: setting maximum number of CPUs to 16, total number of available CPUs is 16
2017-12-29 14:44:24.913127 W | etcdmain: no data-dir provided, using default data-dir ./default.etcd
2017-12-29 14:44:24.914747 C | etcdmain: listen tcp 10.168.91.198:2380: bind: cannot assign requested address

Anything else we need to know?:

This is caused by a combination of contributing factors. Firstly, etcd uses "localhost" for default URLs, not loopback addresses, and this is not going to change (etcd-io/etcd#9070) so "localhost" needs to be resolved via /etc/hosts.

Secondly, DNS resolution is prioritized over /etc/hosts due to the way that Go handles hostname resolution in GODEBUG=netdns=go mode.
https://groups.google.com/forum/#!topic/golang-nuts/G-faJ0bthz0
https://golang.org/src/net/conf.go#L203
Non-Go-based methods of resolving "localhost" like ping in busybox containers work correctly. Setting GODEBUG=netdns=cgo or creating /etc/nsswitch.conf (influxdata/influxdata-docker#76, sgerrand/alpine-pkg-glibc#4) makes etcd resolve localhost correctly.
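
To illustrate why a record for "localhost.example.local" matters at all, here is a minimal sketch of the search-list expansion that resolv.conf-style stub resolvers apply to short names. It is a model of the documented search/ndots convention, not Go's actual resolver code, and the function name search_candidates is hypothetical.

```python
def search_candidates(name, search, ndots=1):
    """Return the fully qualified names a stub resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute: no search list is applied.
        return [name]
    expanded = [f"{name}.{domain.rstrip('.')}." for domain in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, then the search list.
        return [absolute] + expanded
    # Too few dots (e.g. "localhost"): the search list is tried first.
    return expanded + [absolute]


print(search_candidates("localhost", ["example.local"]))
# prints ['localhost.example.local.', 'localhost.']
```

With the default ndots of 1, the bare name "localhost" has no dots, so the search-path variant is queried first. If DNS answers for it and /etc/hosts is not consulted beforehand, that answer wins.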

Environment:

  • Kubernetes version: 1.9.0
  • Cloud provider or hardware configuration: vSphere virtualized amd64
  • OS (e.g. from /etc/os-release):
# cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
  • Kernel (e.g. uname -a): 3.10.0-693.2.2.el7.x86_64
  • Install tools: kubeadm-1.9.0
  • Others: etcd version 3.1.10, 3.1.11

Workaround: For anyone running into this issue, you can set etcd.extraArgs.listen-peer-urls=http://127.0.0.1:2380 in your kubeadm config file to force etcd to use the correct IP. Or, you can edit /etc/kubernetes/manifests/etcd.yaml directly.
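
The workaround helps because an IP literal in the listen URL needs no name resolution at all. As a rough sketch, assuming you want to sanity-check listen URLs before putting them in a config, the hypothetical helper below (is_loopback_literal is not part of etcd or kubeadm) reports whether a URL's host is a loopback IP literal:

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_literal(url):
    """True if the URL's host is an IP literal in a loopback range."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. "localhost"), so it would need resolving.
        return False


for url in ("http://localhost:2380", "http://127.0.0.1:2380", "http://10.168.91.198:2380"):
    print(url, is_loopback_literal(url))
```

Only http://127.0.0.1:2380 passes the check. "localhost" fails because it is a name whose result depends on the resolver, which is exactly the dependency this issue is about.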

@andremarianiello changed the title from "localhost.<dns-search-path> DNS records can interfere with correct functioning of etcd" to "localhost.<dns-search-path> DNS records trick etcd into attempting to bind to unavailable, non-loopback IP address" on Dec 29, 2017

@andremarianiello

Author

commented Dec 29, 2017

@kubernetes/sig-cluster-lifecycle-bugs

@k8s-ci-robot

Contributor

commented Dec 29, 2017

@andremarianiello: Reiterating the mentions to trigger a notification:
@kubernetes/sig-cluster-lifecycle-bugs

In response to this:

@kubernetes/sig-cluster-lifecycle-bugs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Arnavion


commented Jan 6, 2018

To clarify the workaround, the minimal config file is:

apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
etcd:
  extraArgs:
    'listen-peer-urls': 'http://127.0.0.1:2380'

and used as kubeadm init --config /path/to/kubeadm.yaml

@sunriax


commented Jan 16, 2018

Maybe this issue happens because etcd tries to resolve localhost via DNS rather than via the /etc/hosts file. We hit the same issue and tried your config, which works fine. Later we tried resolving localhost via DNS and found a DNS entry pointing to another server ;-). So etcd tried to bind to that address. We have not re-checked whether this was the cause, but we will next time and report back...

Try

nslookup localhost

and you should get the address of the kubernetes master server...

Happy Kuberneting...

@rayterrill


commented Jan 24, 2018

This bit me so hard - been struggling with this for weeks. Your note about 'nslookup localhost' clued me in, @sunriax. Up and going now. Cheers.

@gslightmage


commented Feb 21, 2018

nslookup ignores /etc/hosts on Ubuntu 16.

This is the only workaround that works. I bashed my head against this for two days.

mistio-gitlab pushed a commit to mistio/kubernetes-blueprint that referenced this issue Apr 27, 2018


Run `kubeadm init` with `--config` instead of providing all arguments in the command line

All arguments used upon initialization are meant to be included in /etc/kubernetes/admin.yaml

This is also a work-around for kubernetes/kubernetes#57709
@rongou


commented May 3, 2018

Doing this before kubeadm init worked for me:

sudo sed -e 's/^search/#search/' -i /etc/resolv.conf
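
For anyone who wants to see what that sed expression does before running it, here is a small sketch that applies the same substitution to a string instead of the real file. The sample contents (including the nameserver address) are made up for illustration, and comment_out_search is a hypothetical name.

```python
import re


def comment_out_search(resolv_conf_text):
    """Prefix every line that starts with 'search' with '#'."""
    return re.sub(r"^search", "#search", resolv_conf_text, flags=re.MULTILINE)


# Hypothetical resolv.conf contents, not taken from a real host.
sample = "search example.local\nnameserver 10.0.0.2\n"
print(comment_out_search(sample), end="")
```

This removes the explicit search list for every program on the host, not just etcd, and the file may be regenerated by whatever manages network configuration, so treat it as a blunt, possibly temporary fix.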
@luxas

Member

commented Jun 11, 2018

Closing this in favor of kubernetes/kubeadm#909 (which should have a better title), that is in the kubeadm repo. We track all our issues in the kubeadm repo, hence I'm closing this one as an issue move.

@oz123

Contributor

commented Mar 6, 2019

@luxas the issue you linked has nothing to do with the DNS problems discussed here.

hswong3i added a commit to alvistack/ansible-role-minikube that referenced this issue May 2, 2019

hswong3i added a commit to alvistack/ansible-role-kubernetes that referenced this issue May 2, 2019
