
dns-controller: allow it to run on CNI networking mode and remove dependency on kube-proxy #8131

Merged
k8s-ci-robot merged 1 commit into kubernetes:master from the cniless-dns-controller branch on Dec 28, 2019

Conversation

rochacon
Contributor

Problem

When booting a cluster with --networking=cni, dns-controller will not start because the master node is tainted as "network unreachable". This adds an extra step when managing your own CNI setup: you have to SSH into a master and publish the CNI manifests from there.

Desired behavior

An operator should expect kops to provide a manageable cluster by the end of the kops update cluster --yes step, even when managing the CNI externally, so that the natural next step is "kubectl apply -f my-cni.yaml".

Proposal

This PR adds a toleration and configuration that allow the dns-controller pod to start when running with --networking=cni, properly creating the DNS records so the operator can remotely publish the CNI and any extra manifests to get a fully working cluster.

This also removes the dependency on kube-proxy by setting the KUBERNETES_SERVICE_HOST environment variable, which bypasses kube-proxy when it is disabled.

Presumably, as a side effect, this change also allows "host network only" clusters to work.
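
For illustration, a minimal sketch of the kind of change described above as it would appear in the dns-controller pod spec; the toleration key is an assumption based on the "network unreachable" taint mentioned earlier, not a quote of the kops template:

tolerations:
# Assumed key: allow scheduling onto a master that is still NotReady because
# no CNI configuration has been published yet.
- key: node.kubernetes.io/network-unavailable
  operator: Exists
containers:
- name: dns-controller
  env:
  # Point client-go straight at the local kube-apiserver so neither kube-proxy
  # nor cluster DNS is needed to reach the API.
  - name: KUBERNETES_SERVICE_HOST
    value: "127.0.0.1"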

More context

I'm currently running a kube-proxyless Cilium setup, and because dns-controller failed to start I had to script around kops to SSH in and publish the CNI setup from the master node itself.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 17, 2019
@k8s-ci-robot
Contributor

Hi @rochacon. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Dec 17, 2019
@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Dec 17, 2019
@rochacon rochacon force-pushed the cniless-dns-controller branch 2 times, most recently from 660eefd to 67c9e03 Compare December 17, 2019 03:58
@olemarkus
Member

Kube-router also runs without kube-proxy. It could be an idea to look at the code paths that make that CNI work.

@rochacon
Contributor Author

@olemarkus the main difference is that with the kube-router integration kops manages the CNI, leading to a working cluster by the end of provisioning, so no extra steps are required.
The networking=cni option tells kops not to manage the CNI settings, which leads to the issues mentioned above.

I also used the networking=cni approach while maintaining a PCI-compliant cluster, to have tighter control over the security groups and the CNI version (at the time we used Calico 3.8).

@olemarkus
Member

We are also using Cilium with networking=cni, but still with kube-proxy. Just to confirm: the issue you mention above only occurs when using Cilium without kube-proxy, not when using networking=cni and deploying Cilium with the more default settings?

I want to add support for running kops-managed Cilium without kube-proxy, but haven't had much time to look at it yet.

@rochacon
Contributor Author

rochacon commented Dec 17, 2019

It is both, @olemarkus.

The tolerations change is so dns-controller starts on a master that is still in the NotReady state (caused by the lack of CNI configuration) [1].

The KUBERNETES_SERVICE_HOST override is so dns-controller's k8s.io/client-go skips the DNS resolution and connects to the local kube-apiserver (this supports the kube-proxyless approach, since at this stage there is no kube-proxy running and no CoreDNS, which also depends on the CNI).

[1] This is the node and pod list after launching the cluster with networking=cni; notice the dns-controller pod in the Pending state:

$ kubectl get --all-namespaces -owide no,po
NAME                              STATUS     ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP      OS-IMAGE                                        KERNEL-VERSION   CONTAINER-RUNTIME
node/ip-10-42-1-73.ec2.internal   NotReady   master   2m45s   v1.15.7   10.42.1.73    34.200.234.230   Container Linux by CoreOS 2303.3.0 (Rhyolite)   4.19.86-coreos   docker://18.6.3

NAMESPACE     NAME                                                     READY   STATUS    RESTARTS   AGE     IP           NODE                         NOMINATED NODE   READINESS GATES
kube-system   pod/coredns-86766bdccb-284jf                             0/1     Pending   0          2m45s   <none>       <none>                       <none>           <none>
kube-system   pod/coredns-autoscaler-c9cfff856-x6jn7                   0/1     Pending   0          2m44s   <none>       <none>                       <none>           <none>
kube-system   pod/dns-controller-597cb8d48d-ddp9t                      0/1     Pending   0          2m47s   <none>       <none>                       <none>           <none>
kube-system   pod/etcd-manager-events-ip-10-42-1-73.ec2.internal       1/1     Running   0          111s    10.42.1.73   ip-10-42-1-73.ec2.internal   <none>           <none>
kube-system   pod/etcd-manager-main-ip-10-42-1-73.ec2.internal         1/1     Running   0          112s    10.42.1.73   ip-10-42-1-73.ec2.internal   <none>           <none>
kube-system   pod/kube-apiserver-ip-10-42-1-73.ec2.internal            1/1     Running   3          2m11s   10.42.1.73   ip-10-42-1-73.ec2.internal   <none>           <none>
kube-system   pod/kube-controller-manager-ip-10-42-1-73.ec2.internal   1/1     Running   0          117s    10.42.1.73   ip-10-42-1-73.ec2.internal   <none>           <none>
kube-system   pod/kube-proxy-ip-10-42-1-73.ec2.internal                1/1     Running   0          113s    10.42.1.73   ip-10-42-1-73.ec2.internal   <none>           <none>
kube-system   pod/kube-scheduler-ip-10-42-1-73.ec2.internal            1/1     Running   0          99s     10.42.1.73   ip-10-42-1-73.ec2.internal   <none>           <none>

[2] Node not ready due to lack of CNI configuration:

$ kubectl describe no  -l node-role.kubernetes.io/master | grep Ready
  Ready            False   Tue, 17 Dec 2019 16:59:09 +0000   Tue, 17 Dec 2019 16:56:09 +0000   KubeletNotReady              runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

edit: added more info

@rochacon
Contributor Author

Extra note: k8s.io/client-go's in-cluster configuration depends on both the KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT environment variables. Both are injected automatically into the pod, but we override KUBERNETES_SERVICE_HOST to skip the DNS call.
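
As a small sketch of what that means in the pod spec (the port value in the comments is the usual in-cluster service port and is an assumption, not something stated in this thread):

env:
# Explicitly set the host; client-go then dials the local apiserver directly
# instead of resolving the in-cluster service name through DNS.
- name: KUBERNETES_SERVICE_HOST
  value: "127.0.0.1"
# KUBERNETES_SERVICE_PORT is left at the automatically injected value
# (typically 443), so only the host needs an override.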

@olemarkus
Member

Just out of curiosity, which AMI are you using with this? enable-node-port requires a newer image than the kops default.

@olemarkus
Member

When you have the three NotReady masters, the API is (obviously) available, so you can simply deploy the Cilium resources. Then the nodes become Ready and eventually the rest of the nodes join the cluster. There is no need to change taints or anything for that to work.

@rochacon
Contributor Author

I'm using CoreOS 2303.3.0, the latest stable.

The API is available, but only from one of the master servers, since without dns-controller the master endpoint still points to the placeholder address 203.0.113.123. I believe this is only the case when not using a load balancer in front of the Kubernetes API, which is also part of my use case (small cluster, single master, no master load balancer).

Since dns-controller also controls the etcd endpoints, I believe full etcd quorum won't be available in a multi-master setup until dns-controller starts (but I haven't tested this).

@olemarkus
Member

dns-controller only sets the api.internal domain. protokube handles etcd and the external API.
I can start a cluster with networking=cni, access the API locally, and dns-controller is still pending.
If you have a single master without a load balancer, I am not sure what happens. I didn't think that was optional.

I'll see if I can get enable-node-port working with kops-managed Cilium and CoreOS sometime soon, though.

@rochacon
Contributor Author

When running with an LB for the masters, kops creates the DNS record during the kops update cluster step [1]. When running without a master LB, dns-controller is the one that creates the record [2], following the kube-apiserver pod annotations [3], since the master runs in an ASG and may be replaced.

Without this patch the cluster is left with the public/external DNS record pointing to the placeholder [4], so you have to SSH into the master to publish the CNI settings.

[1]

% kops update cluster kopscluster.example.com
# ...truncated...
  DNSName/api.kopscluster.example.com
  	Zone                	name:ZAAAAAAAAAAAAA id:ZAAAAAAAAAAAAA
  	ResourceType        	A
  	TargetLoadBalancer  	name:api.kopscluster.example.com id:api.kopscluster.example.com
# ...truncated...

[2]

% ssh 10.0.1.225 -- kubectl logs -n kube-system -l k8s-app=dns-controller 
I1218 23:37:02.931582       1 dnscontroller.go:651] Update desired state: pod/kube-system/kube-apiserver-ip-10.0-1-225.ec2.internal: [{_alias api.kopscluster.example.com. node/ip-10.0-1-225.ec2.internal/external false}]
I1218 23:37:06.207122       1 dnscontroller.go:260] Using default TTL of 1m0s
I1218 23:37:06.207154       1 dnscontroller.go:462] Querying all dnsprovider records for zone "kopscluster.example.com."
I1218 23:37:06.242522       1 dnscontroller.go:611] Adding DNS changes to batch {A api.kopscluster.example.com.} [18.215.62.143]
I1218 23:37:06.242551       1 dnscontroller.go:305] applying DNS changeset for zone kopscluster.example.com.::ZAAAAAAAAAAAAA
I1218 23:37:07.936026       1 dnscontroller.go:651] Update desired state: pod/kube-system/kube-apiserver-ip-10.0-1-225.ec2.internal: [{_alias api.kopscluster.example.com. node/ip-10.0-1-225.ec2.internal/external false} {A api.internal.kopscluster.example.com. 10.0.1.225 false}]
I1218 23:37:11.285091       1 dnscontroller.go:260] Using default TTL of 1m0s
I1218 23:37:11.285129       1 dnscontroller.go:462] Querying all dnsprovider records for zone "kopscluster.example.com."
I1218 23:37:11.309563       1 dnscontroller.go:611] Adding DNS changes to batch {A api.internal.kopscluster.example.com.} [10.0.1.225]
I1218 23:37:11.309655       1 dnscontroller.go:305] applying DNS changeset for zone kopscluster.example.com.::ZAAAAAAAAAAAAA

[3]

$ kubectl describe -n kube-system po kube-apiserver-ip-10.0-1-225.ec2.internal | grep dns
Annotations:          dns.alpha.kubernetes.io/external: api.kopscluster.example.com
                      dns.alpha.kubernetes.io/internal: api.internal.kopscluster.example.com

[4]

λ dig +short api.kopscluster.example.com
203.0.113.123

@rochacon
Contributor Author

This is where we currently add the annotation for this kind of setup:

// When the API is exposed via DNS rather than a load balancer, annotate
// kube-apiserver with the cluster's MasterPublicName so dns-controller
// creates the external record.
if b.Cluster.Spec.API.DNS != nil {
	annotations["dns.alpha.kubernetes.io/external"] = b.Cluster.Spec.MasterPublicName
}

@johngmyers
Member

Looks like you need to run hack/update-expected.sh

When booting a cluster with `--networking=cni`, `dns-controller` will
not start due to the master node being _tainted_ as "network unreachable".
This adds an extra step when managing your own CNI setup, having to SSH
into a master and publish the CNI manifests from there.

This commit adds tolerance and configuration that allows `dns-controller`
pod to start when running with `--networking=cni`, properly creating the
DNS records so the operator can remotely publish the CNI and extra
manifests to have a full working cluster.

This also removes the dependency on `kube-proxy`, by adding the
`KUBERNETES_SERVICE_HOST` environment variable, bypassing `kube-proxy`
when disabled.

Presumably, as a side-effect, this change also allows for
"host network only" clusters to work.

Signed-off-by: Rodrigo Chacon <rochacon@gmail.com>
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Dec 19, 2019
@rochacon
Contributor Author

rochacon commented Dec 19, 2019

Thanks @johngmyers, I updated the test artifacts; tests should pass now.

@justinsb
Member

This looks great - thanks @rochacon

There is a risk that we start before something else is ready, but if we encounter such a case we should be able to tweak dns-controller to tolerate it.

The nodeSelector will continue to keep this running on the master. Moreover, this should help things start a little bit faster - dns-controller is very much in the critical path for cluster bringup.
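
(For reference, the nodeSelector in question is typically something like the following sketch of the standard master node label; this is an assumption, not a quote of the kops manifest.)

nodeSelector:
  # Keeps dns-controller on master nodes; the new toleration changes what it
  # tolerates, not where it is scheduled.
  node-role.kubernetes.io/master: ""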

/approve
/lgtm
/retest

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 28, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: justinsb, rochacon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 28, 2019
env:
- name: KUBERNETES_SERVICE_HOST
  value: "127.0.0.1"
{{- if .EgressProxy }}
{{ range $name, $value := ProxyEnv }}
@justinsb
Member

justinsb commented Dec 28, 2019

Now that you've refactored this, I'm wondering if we need the "if .EgressProxy" guard. For another PR though!

@rochacon
Contributor Author

I moved the if below so the new environment variable is added by default. There was actually a bug here: when using DigitalOcean and EgressProxy together, a duplicated env: key would be rendered.
Given the range over ProxyEnv, I'm not sure we need the if either, but I didn't want to change that right now since I'm not familiar with this feature.
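
Roughly, the duplication looked like this; a sketch only, with placeholder guard names and entries rather than the exact kops template:

# Before the refactor (sketch): each guarded block opened its own "env:" key,
# so enabling DigitalOcean and EgressProxy together rendered "env:" twice.
{{ if .DigitalOcean }}
env:
- name: DIGITALOCEAN_ACCESS_TOKEN    # placeholder entry
  value: "..."
{{ end }}
{{ if .EgressProxy }}
env:
{{ range $name, $value := ProxyEnv }}
- name: {{ $name }}
  value: "{{ $value }}"
{{ end }}
{{ end }}
# After (sketch): "env:" is emitted once, unconditionally, as in the diff
# above, and the guarded blocks only append entries to it.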

@rifelpet
Member

/test pull-kops-e2e-kubernetes-aws

@k8s-ci-robot k8s-ci-robot merged commit 423233c into kubernetes:master Dec 28, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Dec 28, 2019
@rochacon rochacon deleted the cniless-dns-controller branch December 28, 2019 08:41
@k8s-ci-robot
Contributor

@rochacon: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kops-e2e-kubernetes-aws e449467 link /test pull-kops-e2e-kubernetes-aws

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
