CRITICAL: Use Debian as a base container instead of Alpine since Alpine causes DNS issues. #1161

Closed
cdrage opened this issue Dec 21, 2018 · 5 comments

@cdrage commented Dec 21, 2018

So here's the issue...

In some clusters, DNS lookups fail because Alpine does not handle DNS resolution correctly. Alpine is used as the base image for cert-manager.

This is a critical problem, as I'm unable to get cert-manager working with Let's Encrypt in my large Kubernetes cluster.

There is a HUGE chain of issues that describes what's happening. Essentially, Alpine does not resolve DNS queries correctly and, depending on the setup (for example, whether the provider uses Cloudflare), returns incorrect results.

I'm unfamiliar with Bazel, but it'd be good to change it from Alpine to Debian here:

tag = "3.7-v20180822-0201cfb11",

How to replicate the bug and what happens:

Deploy cert-manager:

helm install \
  --name cert-manager \
  --namespace kube-system \
  --version v0.5.2 \
  stable/cert-manager

Now try to do an nslookup within the cert-manager container:

▶ kubectl exec -it cert-manager-5d5bc6cd7f-fw7dx -n kube-system -- /bin/sh
/ $ nslookup letsencrypt.org
nslookup: can't resolve '(null)': Name does not resolve

Name:      letsencrypt.org
Address 1: 23.23.86.44 ec2-23-23-86-44.compute-1.amazonaws.com

This returns an INCORRECT DNS entry. The cause is discussed in multiple issues: kubernetes/kubernetes#30215, gliderlabs/docker-alpine#8, JiscSD/rdss-arkivum-nextcloud#24, kubernetes/dns#119

Larger projects have also switched from Alpine to Debian because of numerous DNS issues: apache/openwhisk#4052

This is due to Alpine not handling the /etc/resolv.conf file correctly:

/ $ cat /etc/resolv.conf
nameserver 10.96.0.10
search kube-system.svc.cluster.local svc.cluster.local cluster.local net
options ndots:5
/ $

After removing "net" (provided by Kubernetes) from /etc/resolv.conf, DNS now resolves correctly:

/ $ nslookup letsencrypt.org
nslookup: can't resolve '(null)': Name does not resolve
         
Name:      letsencrypt.org
Address 1: 23.195.219.207 a23-195-219-207.deploy.static.akamaitechnologies.com
Address 2: 2600:140a:0:384::ce0 g2600-140a-0000-0384-0000-0000-0000-0ce0.deploy.static.akamaitechnologies.com
Address 3: 2600:140a:0:3b0::ce0 g2600-140a-0000-03b0-0000-0000-0000-0ce0.deploy.static.akamaitechnologies.com
/ $ ^C       
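A likely mechanism behind the wrong answer, inferred from the `search` and `ndots:5` lines above rather than confirmed in this thread: a name with fewer than `ndots` dots, such as `letsencrypt.org`, is tried with every search suffix before it is tried as-is, so the trailing `net` entry makes the resolver ask for `letsencrypt.org.net.` before `letsencrypt.org.`. If that name happens to resolve, its address is returned instead of the real one. The Go sketch below prints the candidate order. `parseResolvConf` and `candidates` are hypothetical helpers, and the ordering follows the common glibc/Go convention; individual resolvers (including musl) differ in details such as error handling.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// resolvConf holds the two settings that drive search-list expansion.
type resolvConf struct {
	search []string
	ndots  int
}

// parseResolvConf (hypothetical helper) extracts "search" domains and
// "options ndots:N" from resolv.conf text; other directives are ignored.
func parseResolvConf(text string) resolvConf {
	conf := resolvConf{ndots: 1} // ndots defaults to 1 when not set
	for _, line := range strings.Split(text, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue
		}
		switch fields[0] {
		case "search":
			conf.search = fields[1:] // a later "search" line replaces an earlier one
		case "options":
			for _, opt := range fields[1:] {
				if strings.HasPrefix(opt, "ndots:") {
					if n, err := strconv.Atoi(strings.TrimPrefix(opt, "ndots:")); err == nil {
						conf.ndots = n
					}
				}
			}
		}
	}
	return conf
}

// candidates (hypothetical helper) lists the absolute names tried, in order.
// Fewer than ndots dots: search suffixes first, bare name last.
// Otherwise: bare name first, then the search suffixes.
func candidates(name string, conf resolvConf) []string {
	if strings.HasSuffix(name, ".") {
		return []string{name} // already absolute: the search list is skipped
	}
	var out []string
	hasNdots := strings.Count(name, ".") >= conf.ndots
	if hasNdots {
		out = append(out, name+".")
	}
	for _, s := range conf.search {
		out = append(out, name+"."+strings.TrimSuffix(s, ".")+".")
	}
	if !hasNdots {
		out = append(out, name+".")
	}
	return out
}

func main() {
	conf := parseResolvConf(`nameserver 10.96.0.10
search kube-system.svc.cluster.local svc.cluster.local cluster.local net
options ndots:5`)
	for _, c := range candidates("letsencrypt.org", conf) {
		fmt.Println(c)
	}
}
```

Run as-is, it lists the four search-suffixed `letsencrypt.org.*` names (the last being `letsencrypt.org.net.`) before the bare `letsencrypt.org.`; dropping `net` from the search line removes the `letsencrypt.org.net.` candidate, which is consistent with the fix shown above.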

I highly suggest changing the base image from Alpine to Debian to resolve these DNS issues: at the moment, cert-manager cannot work with Let's Encrypt because DNS does not resolve correctly in the current Alpine image.

I'd honestly open a PR, but it looks like the Alpine image is built elsewhere and pushed to gcr.io.

Ping @munnerz @kragniz

@cdrage commented Dec 21, 2018

Here's an article that outlines the issue with Alpine:
https://www.weave.works/blog/racy-conntrack-and-dns-lookup-timeouts

Here's an open issue about running Alpine on Kubernetes clusters:
kubernetes/kubernetes#56903

Another open issue with Alpine + Go + DNS dropping:
golang/go#29358

An open issue with Rancher:
rancher/rancher#16018

Another open issue on the Alpine repo, which even involved editing an AWS ECS AMI:
gliderlabs/docker-alpine#255

I found six more projects with the exact same issue, but I'm not going to post any more, haha.

@cdrage commented Dec 21, 2018

Actually, it was an issue on my host. Regardless, Alpine will not use multiple DNS servers and will not fall back to another "search" entry in /etc/resolv.conf.

I ended up removing "net" from my host's /etc/resolv.conf, which fixed the issue. But regardless, I think we should still switch to Debian 😄

@Vonor commented Jan 13, 2019

Can't reproduce your issue.

kubectl run -it --rm alpine --image=alpine --overrides='{ "apiVersion": "apps/v1beta1", "spec": { "template": { "spec": { "nodeSelector": { "beta.kubernetes.io/arch": "arm64" } } } } }'
/ # nslookup letsencrypt.org
nslookup: can't resolve '(null)': Name does not resolve

Name:      letsencrypt.org
Address 1: 104.111.245.93 a104-111-245-93.deploy.static.akamaitechnologies.com
Address 2: 2a02:26f0:6c00:186::ce0 g2a02-26f0-6c00-0186-0000-0000-0000-0ce0.deploy.static.akamaitechnologies.com
Address 3: 2a02:26f0:6c00:18f::ce0 g2a02-26f0-6c00-018f-0000-0000-0000-0ce0.deploy.static.akamaitechnologies.com

/ # cat /etc/resolv.conf 
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local fritz.box
options ndots:5
/ # 

The only thing that doesn't work for me is resolving hosts in the LAN (*.fritz.box), but that seems to be a CoreDNS configuration issue rather than an Alpine one.

@lorenz commented Jan 20, 2019

Just FYI: cert-manager won't use Alpine's (musl) DNS resolver; it's fully statically compiled and uses Go's built-in resolver. So nslookup will tell you absolutely nothing about how cert-manager looks up DNS. You technically don't even need an OS to run it, and I'm not sure why they include one.
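To see what a Go binary like cert-manager actually resolves, the lookup has to go through Go's resolver rather than `nslookup`. A minimal sketch, assuming it is built and run inside the same pod: setting `PreferGo: true` on a `net.Resolver` selects Go's built-in DNS client, which is also what a statically linked build (for example one built with `CGO_ENABLED=0`) uses. The host name and five-second timeout are illustrative choices.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// newGoResolver returns a resolver that uses Go's built-in DNS client,
// which reads /etc/resolv.conf itself instead of calling into libc.
func newGoResolver() *net.Resolver {
	return &net.Resolver{PreferGo: true}
}

func main() {
	r := newGoResolver()
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	addrs, err := r.LookupHost(ctx, "letsencrypt.org")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println(addrs)
}
```

The same choice can be forced for an existing Go binary with the environment variable `GODEBUG=netdns=go` (`GODEBUG=netdns=go+1` also prints which resolver was selected). Go's resolver still reads `search` and `ndots` from /etc/resolv.conf, so a stray search domain can affect it as well.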

@munnerz commented Feb 7, 2019

I'm going to close this, as it doesn't seem to be an issue given that we don't use Alpine's own DNS resolution.

If I'm wrong, or there's some other issue with the way we handle networking then please feel free to re-open 😄

@munnerz closed this Feb 7, 2019
4 participants