Support NodeLocal DNSCache #1024

Closed
davidnuzik opened this issue May 19, 2021 · 6 comments
Labels
kind/dev-validation (Dev will be validating this issue) · kind/rke1-feature-parity

Comments

@davidnuzik
Contributor

Node Local Cache work for Rancher 2.6.x

@davidnuzik davidnuzik added this to the Rancher 2.6.x milestone May 19, 2021
@davidnuzik davidnuzik added this to To Triage in Development [DEPRECATED] via automation May 19, 2021
@davidnuzik davidnuzik moved this from To Triage to Backlog in Development [DEPRECATED] May 19, 2021
@cjellick cjellick moved this from Backlog to Next Up in Development [DEPRECATED] Jul 14, 2021
@cjellick cjellick changed the title Node Local Cache Support NodeLocal DNSCache Jul 14, 2021
@cjellick
Contributor

RKE1 did this. @superseb can give more context. Or perhaps @Oats87 can.

If feasible, we should also do this in k3s.

@manuelbuil
Contributor

The mentions above are wrong; I mixed this issue up with the coreDNS autoscaler one. Sorry for the confusion.

@manuelbuil
Contributor

manuelbuil commented Jul 20, 2021

Tasks:

  • Add nodelocal dns manifests to coredns rke2-charts
  • Prepare logic to enable it
  • Prepare different logic for ipvs and normal iptables
  • Remove the nodelocaldns interface in the rke2-uninstall script
  • Remove the generated nodelocaldns iptables rules in the rke2-uninstall script (see the cleanup sketch after this list)
  • Add new images into rke2 airgap
  • Document everything
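
A minimal sketch of what the uninstall cleanup could look like, assuming the dummy interface is named nodelocaldns and binds 169.254.20.10 and 10.43.0.10 (as shown in the validation further down); the actual rke2-uninstall change may differ:

# Hypothetical cleanup sketch, not the actual rke2-uninstall change.
# Remove the dummy interface created by node-local-dns, if it exists.
ip link delete nodelocaldns 2>/dev/null || true
# Drop any iptables rules that reference the nodelocal bind addresses.
iptables-save | grep -v -e 169.254.20.10 -e 10.43.0.10 | iptables-restore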

@manuelbuil
Contributor

How to test:

  1. Enable it via nodelocal.enabled: true
  2. Verify there is a new node-local-dns daemonset and that its pods have no errors in the logs
  3. Verify there is a node-local-dns configMap with a Corefile binding to 169.254.20.10 and 10.43.0.10
  4. Check that there is a new nodelocaldns interface on the node
  5. Run a dummy pod and check that dns resolution works for an internal service (e.g. rke2-metrics-server.kube-system) and for an external service (www.google.com). A command-level sketch of these checks follows this list.
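
One possible set of commands to walk through these checks (the daemonset label selector used below is an assumption, not taken from the chart):

# Steps 2-4: daemonset, pod logs, configmap binds, and the node interface.
kubectl get ds -n kube-system node-local-dns
kubectl logs -n kube-system -l k8s-app=node-local-dns --tail=20   # label selector is assumed
kubectl get configmap -n kube-system node-local-dns -o yaml | grep bind
ip addr show nodelocaldns

# Step 5: resolve an internal and an external name from a throwaway pod.
kubectl run dnstest --rm -it --restart=Never --image=busybox:1.28 -- \
  nslookup rke2-metrics-server.kube-system
kubectl run dnstest --rm -it --restart=Never --image=busybox:1.28 -- \
  nslookup www.google.com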

@manuelbuil manuelbuil moved this from Next Up to Working in Development [DEPRECATED] Jul 26, 2021
@manuelbuil manuelbuil moved this from Working to Peer Review in Development [DEPRECATED] Jul 27, 2021
@manuelbuil manuelbuil moved this from Peer Review to To Test in Development [DEPRECATED] Aug 2, 2021
@rancher-max
Contributor

Here's an example HelmChartConfig for enabling nodelocal:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-coredns
  namespace: kube-system
spec:
  valuesContent: |
    nodelocal:
      enabled: true
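
One way to apply this, assuming a default RKE2 data dir and an arbitrary file name, is to drop it into the server's manifests directory, where rke2 should pick it up and redeploy the rke2-coredns chart:

# Adjust the path if rke2 was started with a non-default --data-dir.
cp rke2-coredns-config.yaml /var/lib/rancher/rke2/server/manifests/
kubectl get ds -n kube-system node-local-dns -w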

@bmdepesa bmdepesa added the kind/dev-validation (Dev will be validating this issue) label Aug 12, 2021
@galal-hussein
Contributor

Validated against master commit 09bb5c2

  1. Was able to successfully enable using HelmChartConfig:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-coredns
  namespace: kube-system
spec:
  valuesContent: |
    nodelocal:
      enabled: true
  2. There was a new daemonset and running pod with no errors.
# kubectl get ds -n kube-system
NAME                            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
node-local-dns                  1         1         1       1            1           kubernetes.io/os=linux   13m
  3. Confirmed there was a new configmap, which correctly has the data:
# kubectl get configmap -n kube-system node-local-dns -o yaml
apiVersion: v1
data:
  Corefile: |
    cluster.local:53 {
        errors
        cache {
                success 9984 30
                denial 9984 5
        }
        reload
        loop
        bind 169.254.20.10 10.43.0.10
        forward . __PILLAR__CLUSTER__DNS__ {
                force_tcp
        }
        prometheus :9253
        health 169.254.20.10:8080
        }
    in-addr.arpa:53 {
        errors
        cache 30
        reload
        loop
        bind 169.254.20.10 10.43.0.10
        forward . __PILLAR__CLUSTER__DNS__ {
                force_tcp
        }
        prometheus :9253
        }
    ip6.arpa:53 {
        errors
        cache 30
        reload
        loop
        bind 169.254.20.10 10.43.0.10
        forward . __PILLAR__CLUSTER__DNS__ {
                force_tcp
        }
        prometheus :9253
        }
    .:53 {
        errors
        cache 30
        reload
        loop
        bind 169.254.20.10 10.43.0.10
        forward . __PILLAR__UPSTREAM__SERVERS__ {
                force_tcp
        }
        prometheus :9253
        }
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: rke2-coredns
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2021-09-23T18:28:18Z"
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    app.kubernetes.io/managed-by: Helm
  name: node-local-dns
  namespace: kube-system
  resourceVersion: "642"
  uid: c5a6b6af-8d4f-4ff8-b694-e8747bfed77d
  4. There was a new interface on the node with the expected values:
$ ip addr
...
16: nodelocaldns: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default 
    link/ether c2:1c:2d:03:67:b1 brd ff:ff:ff:ff:ff:ff
    inet 169.254.20.10/32 scope global nodelocaldns
       valid_lft forever preferred_lft forever
    inet 10.43.0.10/32 scope global nodelocaldns
       valid_lft forever preferred_lft forever
  5. Ran a dummy pod: kubectl run tester --image=ranchertest/mytestcontainer and a simple deployment with service:
apiVersion: v1
kind: Service
metadata:
  name: busyb
spec:
  selector:
    app: busy
  clusterIP: None
  ports:
  - name: foo # Actually, no port is needed.
    port: 1234
    targetPort: 1234
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busydep
spec:
  replicas: 3
  selector:
    matchLabels:
      app: busy
  template:
    metadata:
      labels:
        app: busy
    spec:
      containers:
        - name: busybox
          image: busybox:1.28
          command:
            - sleep
            - "3600"
  6. From the dummy pod, dns resolution and pings are successful:
$ kubectl exec -it tester -- /bin/bash
# nslookup busyb.default.svc.cluster.local 
Server:		10.43.0.10
Address:	10.43.0.10#53

Name:	busyb.default.svc.cluster.local
Address: 10.42.0.12
Name:	busyb.default.svc.cluster.local
Address: 10.42.0.10
Name:	busyb.default.svc.cluster.local
Address: 10.42.0.11

#  ping busyb.default.svc.cluster.local
PING busyb.default.svc.cluster.local (10.42.0.12) 56(84) bytes of data.
64 bytes from 10-42-0-12.busyb.default.svc.cluster.local (10.42.0.12): icmp_seq=1 ttl=63 time=0.083 ms
64 bytes from 10-42-0-12.busyb.default.svc.cluster.local (10.42.0.12): icmp_seq=2 ttl=63 time=0.055 ms
64 bytes from 10-42-0-12.busyb.default.svc.cluster.local (10.42.0.12): icmp_seq=3 ttl=63 time=0.048 ms


# ping google.com
PING google.com (142.251.33.78) 56(84) bytes of data.
64 bytes from sea09s28-in-f14.1e100.net (142.251.33.78): icmp_seq=1 ttl=89 time=8.48 ms
64 bytes from sea09s28-in-f14.1e100.net (142.251.33.78): icmp_seq=2 ttl=89 time=8.33 ms
64 bytes from sea09s28-in-f14.1e100.net (142.251.33.78): icmp_seq=3 ttl=89 time=8.29 ms

# nslookup rke2-metrics-server.kube-system.svc.cluster.local
Server:		10.43.0.10
Address:	10.43.0.10#53

Name:	rke2-metrics-server.kube-system.svc.cluster.local
Address: 10.43.231.21
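
As an optional extra check (not part of the validation above): since the Corefile binds 169.254.20.10 and exposes metrics on :9253, the node-local instance can be queried directly from the node; metric names depend on the bundled CoreDNS version, so the grep below is only a sketch:

# Run on the node: query the local listener and peek at its metrics.
nslookup rke2-metrics-server.kube-system.svc.cluster.local 169.254.20.10
curl -s http://169.254.20.10:9253/metrics | grep -i '^coredns_' | head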

Development [DEPRECATED] automation moved this from To Test to Done Issue / Merged PR Sep 23, 2021