This issue was moved to a discussion.

Disable CoreDNS host forwarding when DNS is not configured #6787

Closed
apcheamitru opened this issue Jan 19, 2023 · 7 comments

@apcheamitru

Is your feature request related to a problem? Please describe.

When there are no DNS servers defined in /etc/resolv.conf, CoreDNS will not start properly -- it remains in CrashLoopBackOff with the following error:

# cat /etc/resolv.conf
# kubectl -n kube-system get pods | grep coredns
coredns-7b5bbc6644-znxrp                               0/1     CrashLoopBackOff   5 (47s ago)   4m22s
# kubectl -n kube-system logs coredns-7b5bbc6644-znxrp
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
plugin/forward: no nameservers found

The error from plugin/forward is encountered because CoreDNS is configured to forward DNS requests to the host's resolv.conf nameservers:

# grep forward /var/lib/rancher/k3s/server/manifests/coredns.yaml
        forward . /etc/resolv.conf

Describe the solution you'd like

I don't expect this to be a common scenario for most, but the default Corefile prevents K3s from starting properly in an air-gapped environment where DNS is not available (and nameservers are intentionally omitted from /etc/resolv.conf).

I think it would be reasonable for CoreDNS forwarding to be configured dynamically, based on whether nameservers have been defined on the host -- i.e., if nameservers are present in resolv.conf, add the forward . /etc/resolv.conf line; otherwise, do not. Then, if the ConfigMap changes, trigger a rollout restart of the coredns deployment.
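
The condition proposed above comes down to one test: does the host's resolv.conf contain an uncommented nameserver line? A minimal shell sketch of that decision follows. The function names `has_nameservers` and `forward_directive` are hypothetical and are not part of K3s; this only illustrates the check, not how K3s renders its Corefile.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of K3s.

# Succeeds (exit 0) if the given resolv.conf has at least one
# uncommented "nameserver <address>" line. A missing file counts as "none".
has_nameservers() {
    grep -Eq '^[[:space:]]*nameserver[[:space:]]+[^[:space:]]+' \
        "${1:-/etc/resolv.conf}" 2>/dev/null
}

# Prints the Corefile forward directive only when nameservers are defined;
# prints nothing otherwise.
forward_directive() {
    if has_nameservers "${1:-/etc/resolv.conf}"; then
        printf 'forward . /etc/resolv.conf\n'
    fi
}
```

Calling `forward_directive` with no argument inspects /etc/resolv.conf; passing a path makes the check easy to try against a scratch file.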

Describe alternatives you've considered

As a very (VERY) crude workaround, I was able to implement the desired behavior in an ExecStartPost scriptlet in k3s.service. The logic is as follows:

  1. Wait for /run/k3s/containerd/containerd.sock to be created – this tells me that K3s has started.
  2. Wait for /var/lib/rancher/k3s/server/manifests/coredns.yaml to be created – this file has to exist before we can change it.
  3. Wait for coredns.yaml to be newer than containerd.sock (as determined by mtime) – when the manifest is newer than the containerd socket, assume K3s has touched it on startup and will not touch it again. (Probably a bad assumption?)
  4. If no DNS nameservers are defined in /etc/resolv.conf, comment out the “forward” line in the Corefile to prevent DNS requests from being forwarded to the host’s nameservers.
  5. If the CoreDNS manifest was changed in step 4, restart the deployment: kubectl -n kube-system rollout restart deployment coredns.

Unfortunately, I do not know Go well enough to implement this change properly 😝

Additional context

I am running K3s v1.24 in an "air-gapped" environment with an empty /etc/resolv.conf.

# k3s --version
k3s version v1.24.8+k3s1 (648004e4)
go version go1.18.8

Airgap images have been dropped onto the filesystem:

# tar -xOf /var/lib/rancher/k3s/agent/images/k3s-airgap-images-amd64.tar manifest.json  | jq '[.[].RepoTags] | add'
[
  "rancher/klipper-helm:v0.7.3-build20220613",
  "rancher/klipper-lb:v0.3.5",
  "rancher/local-path-provisioner:v0.0.23",
  "rancher/mirrored-coredns-coredns:1.9.4",
  "rancher/mirrored-library-busybox:1.34.1",
  "rancher/mirrored-library-traefik:2.9.4",
  "rancher/mirrored-metrics-server:v0.6.1",
  "rancher/mirrored-pause:3.6"
]
@brandond
Member

brandond commented Jan 19, 2023

Most airgap environments I've worked with have some sort of stub resolver available. I'm not sure that adding code to actively rewrite the CoreDNS manifest to handle this corner case is worth it? Kubernetes also requires a default route (even if it is just a black-hole default route) to function properly, so I would probably lean towards leaving this as something that needs to be set up properly for airgap scenarios.

@apcheamitru
Author

@brandond Very good points. I did in fact run into the default route issue and solved that as you described with a blackhole default route. The DNS configuration proved to be trickier.

I had initially configured a dnsmasq caching-only nameserver listening on my primary IP address. This seemed to work well and ran without issues for months; however, in some environments we started to observe DNS timeouts. Something like:

# kubectl -n kube-system logs coredns-84c56f7bfb-qdfzk --timestamps=true --since=5m | tail -n5
2022-12-07T19:04:56.094951284Z [ERROR] plugin/errors: 2 34.0.42.10.in-addr.arpa. PTR: read udp 10.42.0.2:57772->10.10.251.101:53: i/o timeout
2022-12-07T19:04:57.901449219Z [ERROR] plugin/errors: 2 <SERVICE_NAME>. A: read udp 10.42.0.2:57168->10.10.251.101:53: i/o timeout
2022-12-07T19:04:57.901485398Z [ERROR] plugin/errors: 2 <SERVICE_NAME>. AAAA: read udp 10.42.0.2:53732->10.10.251.101:53: i/o timeout
2022-12-07T19:04:58.096333515Z [ERROR] plugin/errors: 2 34.0.42.10.in-addr.arpa. PTR: read udp 10.42.0.2:42494->10.10.251.101:53: i/o timeout
2022-12-07T19:05:02.030090661Z [ERROR] plugin/errors: 2 34.0.42.10.in-addr.arpa. PTR: read udp 10.42.0.2:53321->10.10.251.101:53: i/o timeout

These timeouts caused poor performance and ultimately prevented readiness probes from executing successfully. Unfortunately, the issue was intermittent, and I was never able to determine the root cause. In an effort to bring the system back to a working state, I ended up removing dnsmasq and configuring CoreDNS as described above.
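
For intermittent timeouts like these, tallying the errors per upstream address can show whether a single resolver is responsible. A small sketch, assuming log lines shaped like the excerpt above; the function name `count_timeouts_by_upstream` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: reads CoreDNS log lines on stdin and prints
# "<count> <upstream ip:port>" for every upstream that produced an i/o timeout.
count_timeouts_by_upstream() {
    awk '/i\/o timeout/ {
        for (i = 1; i <= NF; i++) {
            n = index($i, "->")              # field looks like src:port->dst:port:
            if (n > 0) {
                upstream = substr($i, n + 2)
                sub(/:$/, "", upstream)      # drop the trailing colon
                count[upstream]++
            }
        }
    }
    END { for (u in count) print count[u], u }' | sort -rn
}
```

It can be fed from the same command used above, for example `kubectl -n kube-system logs <coredns-pod> --since=5m | count_timeouts_by_upstream`, where `<coredns-pod>` stands for the actual pod name.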

(IIRC, I ran into different issues with systemd-resolved, but that was so long ago that it would be worth revisiting.)

At the end of the day, the experience did not give me much confidence. I felt that I was hacking around things that I expected to work out-of-the-box. I never really wanted a stub resolver, I just wanted to make K3s (CoreDNS) happy 😂.

I'd like to leave this open to discussion and am happy to provide more information if needed. Hopefully others can chime in with their DNS resolver workarounds. If we feel that rewriting the CoreDNS manifest isn't worth it, maybe we can at least update the K3s Airgap documentation to address some of these requirements? Maybe with examples of supported resolver configurations?

@brandond
Member

brandond commented Jan 19, 2023

Yeah, I'm happy to leave it open.

I will also add that most of the air-gap environments we target are air-gapped in the traditional sense that they do not have a connection to the internet. They usually do have a local network environment with functioning routing/dns/dhcp. Think secure office, remote lab, airplane, or ship.

If your air-gap environment is more of a "this box isn't plugged into anything at all" type setup, then our docs probably don't cover your use case.

@dereknola
Member

I have added a section to the docs about needing a default route: https://docs.k3s.io/installation/airgap#prerequisites

@thooooooomas

These timeouts caused poor performance and ultimately prevented readiness probes from executing successfully. Unfortunately, the issue was intermittent, and I never was able to determine root cause. In an effort to bring the system back to a working state, I ended up removing dnsmasq and configuring CoreDNS as described above.

Could you explain what you did in "configuring CoreDNS as described above"?
Did you add an entry in /etc/resolv.conf? What was the entry?

@apcheamitru
Author

@thooooooomas Here's the hack I threw together to comment out the "forward" line in the Corefile after the CoreDNS manifest has been generated. This change is only made when there are no nameservers defined in /etc/resolv.conf.

I'm executing it as an ExecStartPost for the "k3s" systemd service:

[Service]
ExecStartPost=/usr/bin/timeout 600 /usr/libexec/k3s/update-coredns-forwarding.sh

Not very elegant, but it works 🤷‍♂️

@caroline-suse-rancher
Contributor

Just going to convert this to a discussion for now

@k3s-io k3s-io locked and limited conversation to collaborators Apr 18, 2023
@caroline-suse-rancher caroline-suse-rancher converted this issue into discussion #7306 Apr 18, 2023
@github-project-automation github-project-automation bot moved this from New to Done Issue in K3s Development Apr 18, 2023
