
dns-controller: allow it to run on CNI networking mode and remove dependency on kube-proxy #8131

Merged
k8s-ci-robot merged 1 commit into kubernetes:master from the cniless-dns-controller branch on Dec 28, 2019

Conversation

rochacon
Contributor

Problem

When booting a cluster with --networking=cni, dns-controller will not start because the master node is tainted as "network unreachable". This adds an extra step when managing your own CNI setup: you have to SSH into a master and publish the CNI manifests from there.

Desired behavior

An operator should expect kops to provide a manageable cluster by the end of the kops update cluster --yes step, even when managing the CNI externally, so that the natural next step is "kubectl apply -f my-cni.yaml".

Proposal

This PR adds a toleration and configuration that allow the dns-controller pod to start when running with --networking=cni, properly creating the DNS records so the operator can remotely publish the CNI and any extra manifests to get a fully working cluster.

This also removes the dependency on kube-proxy by setting the KUBERNETES_SERVICE_HOST environment variable, which bypasses kube-proxy when it is disabled.

Presumably, as a side effect, this change also allows "host network only" clusters to work.
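
For illustration, a minimal sketch of the kind of change described above as it would appear in the dns-controller pod spec; the toleration key is an assumption based on the "network unreachable" taint mentioned earlier, not a quote of the kops template:

tolerations:
# Assumed key: allow scheduling onto a master that is still NotReady because
# no CNI configuration has been published yet.
- key: node.kubernetes.io/network-unavailable
  operator: Exists
containers:
- name: dns-controller
  env:
  # Point client-go straight at the local kube-apiserver so neither kube-proxy
  # nor cluster DNS is needed to reach the API.
  - name: KUBERNETES_SERVICE_HOST
    value: "127.0.0.1"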

More context

I'm currently running a kube-proxyless Cilium setup, and because dns-controller failed to start I had to script around kops to SSH in and publish the CNI setup from the master node itself.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 17, 2019
@k8s-ci-robot
Contributor

Hi @rochacon. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Dec 17, 2019
@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Dec 17, 2019
@rochacon rochacon force-pushed the cniless-dns-controller branch 2 times, most recently from 660eefd to 67c9e03 Compare December 17, 2019 03:58
@olemarkus
Member

Kube-router also runs without kube-proxy. It could be an idea to look at the code paths that make that CNI work.

@rochacon
Contributor Author

@olemarkus the main difference is that with the kube-router integration kops manages the CNI, leading to a working cluster by the end of provisioning, so no extra steps are required.
The networking=cni option tells kops not to manage the CNI settings, which leads to the issues mentioned above.

I also used the networking=cni approach while maintaining a PCI-compliant cluster, to have tighter control over the security groups and the CNI version (at the time we used Calico 3.8).

@olemarkus
Member

We are also using Cilium with networking=cni, but still with kube-proxy. Just to confirm: the issue you mention above only occurs when using Cilium without kube-proxy, not when using networking=cni and deploying Cilium with the more default settings?

I want to add support for running kops-managed Cilium without kube-proxy, but haven't had much time to look at it yet.

@rochacon
Contributor Author

rochacon commented Dec 17, 2019

It is both, @olemarkus.

The tolerations change is so dns-controller starts on a master that is still in the NotReady state (caused by the lack of CNI configuration) [1].

The KUBERNETES_SERVICE_HOST override is so dns-controller's k8s.io/client-go skips the DNS resolution and connects to the local kube-apiserver (this supports the kube-proxyless approach, since at this stage there is no kube-proxy running and no CoreDNS, which also depends on the CNI).

[1] This is the node and pod list after launching the cluster with networking=cni; notice the dns-controller pod in the Pending state:

$ kubectl get --all-namespaces -owide no,po
NAME                              STATUS     ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP      OS-IMAGE                                        KERNEL-VERSION   CONTAINER-RUNTIME
node/ip-10-42-1-73.ec2.internal   NotReady   master   2m45s   v1.15.7   10.42.1.73    34.200.234.230   Container Linux by CoreOS 2303.3.0 (Rhyolite)   4.19.86-coreos   docker://18.6.3

NAMESPACE     NAME                                                     READY   STATUS    RESTARTS   AGE     IP           NODE                         NOMINATED NODE   READINESS GATES
kube-system   pod/coredns-86766bdccb-284jf                             0/1     Pending   0          2m45s   <none>       <none>                       <none>           <none>
kube-system   pod/coredns-autoscaler-c9cfff856-x6jn7                   0/1     Pending   0          2m44s   <none>       <none>                       <none>           <none>
kube-system   pod/dns-controller-597cb8d48d-ddp9t                      0/1     Pending   0          2m47s   <none>       <none>                       <none>           <none>
kube-system   pod/etcd-manager-events-ip-10-42-1-73.ec2.internal       1/1     Running   0          111s    10.42.1.73   ip-10-42-1-73.ec2.internal   <none>           <none>
kube-system   pod/etcd-manager-main-ip-10-42-1-73.ec2.internal         1/1     Running   0          112s    10.42.1.73   ip-10-42-1-73.ec2.internal   <none>           <none>
kube-system   pod/kube-apiserver-ip-10-42-1-73.ec2.internal            1/1     Running   3          2m11s   10.42.1.73   ip-10-42-1-73.ec2.internal   <none>           <none>
kube-system   pod/kube-controller-manager-ip-10-42-1-73.ec2.internal   1/1     Running   0          117s    10.42.1.73   ip-10-42-1-73.ec2.internal   <none>           <none>
kube-system   pod/kube-proxy-ip-10-42-1-73.ec2.internal                1/1     Running   0          113s    10.42.1.73   ip-10-42-1-73.ec2.internal   <none>           <none>
kube-system   pod/kube-scheduler-ip-10-42-1-73.ec2.internal            1/1     Running   0          99s     10.42.1.73   ip-10-42-1-73.ec2.internal   <none>           <none>

[2] Node not ready due to lack of CNI configuration:

$ kubectl describe no  -l node-role.kubernetes.io/master | grep Ready
  Ready            False   Tue, 17 Dec 2019 16:59:09 +0000   Tue, 17 Dec 2019 16:56:09 +0000   KubeletNotReady              runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

edit: added more info

@rochacon
Contributor Author

Extra note: k8s.io/client-go's in-cluster configuration depends on both the KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT environment variables. Both are injected automatically into the pod, but we override KUBERNETES_SERVICE_HOST to skip the DNS call.
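
As a small sketch of what that means in the pod spec (the port value in the comments is the usual in-cluster service port and is an assumption, not something stated in this thread):

env:
# Explicitly set the host; client-go then dials the local apiserver directly
# instead of resolving the in-cluster service name through DNS.
- name: KUBERNETES_SERVICE_HOST
  value: "127.0.0.1"
# KUBERNETES_SERVICE_PORT is left at the automatically injected value
# (typically 443), so only the host needs an override.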

@olemarkus
Member

Just out of curiosity, which AMI are you using with this? enable-node-port requires a newer image than the kops default.

@olemarkus
Member

When you have the three NotReady masters, the API is (obviously) available, so you can simply deploy the Cilium resources. Then the nodes become Ready and eventually the rest of the nodes join the cluster. There is no need to change taints or anything for that to work.

@rochacon
Contributor Author

I'm using CoreOS 2303.3.0, the latest stable.

The API is available, but only from one of the master servers, since without dns-controller the master endpoint still points to the placeholder address 203.0.113.123. I believe this is only the case when not using a load balancer in front of the Kubernetes API, which is also part of my use case (small cluster, single master, no master load balancer).

Since dns-controller also controls the etcd endpoints, I believe full etcd quorum won't be available in a multi-master setup until dns-controller starts (but I haven't tested this).

@olemarkus
Member

dns-controller only sets the api.internal domain. protokube handles etcd and the external API.
I can start a cluster with networking=cni, access the API locally, and dns-controller is still pending.
If you have a single master without a load balancer, I am not sure what happens. I didn't think that was optional.

I'll see if I can get enable-node-port working with kops-managed Cilium and CoreOS sometime soon, though.

@rochacon
Contributor Author

When running with an LB for the masters, kops creates the DNS record during the kops update cluster step [1]. When running without a master LB, dns-controller is the one that creates the record [2], following the kube-apiserver pod annotations [3], since the master runs in an ASG and may be replaced.

Without this patch the cluster is left with the public/external DNS record pointing to the placeholder [4], so you have to SSH into the master to publish the CNI settings.

[1]

% kops update cluster kopscluster.example.com
# ...truncated...
  DNSName/api.kopscluster.example.com
  	Zone                	name:ZAAAAAAAAAAAAA id:ZAAAAAAAAAAAAA
  	ResourceType        	A
  	TargetLoadBalancer  	name:api.kopscluster.example.com id:api.kopscluster.example.com
# ...truncated...

[2]

% ssh 10.0.1.225 -- kubectl logs -n kube-system -l k8s-app=dns-controller 
I1218 23:37:02.931582       1 dnscontroller.go:651] Update desired state: pod/kube-system/kube-apiserver-ip-10.0-1-225.ec2.internal: [{_alias api.kopscluster.example.com. node/ip-10.0-1-225.ec2.internal/external false}]
I1218 23:37:06.207122       1 dnscontroller.go:260] Using default TTL of 1m0s
I1218 23:37:06.207154       1 dnscontroller.go:462] Querying all dnsprovider records for zone "kopscluster.example.com."
I1218 23:37:06.242522       1 dnscontroller.go:611] Adding DNS changes to batch {A api.kopscluster.example.com.} [18.215.62.143]
I1218 23:37:06.242551       1 dnscontroller.go:305] applying DNS changeset for zone kopscluster.example.com.::ZAAAAAAAAAAAAA
I1218 23:37:07.936026       1 dnscontroller.go:651] Update desired state: pod/kube-system/kube-apiserver-ip-10.0-1-225.ec2.internal: [{_alias api.kopscluster.example.com. node/ip-10.0-1-225.ec2.internal/external false} {A api.internal.kopscluster.example.com. 10.0.1.225 false}]
I1218 23:37:11.285091       1 dnscontroller.go:260] Using default TTL of 1m0s
I1218 23:37:11.285129       1 dnscontroller.go:462] Querying all dnsprovider records for zone "kopscluster.example.com."
I1218 23:37:11.309563       1 dnscontroller.go:611] Adding DNS changes to batch {A api.internal.kopscluster.example.com.} [10.0.1.225]
I1218 23:37:11.309655       1 dnscontroller.go:305] applying DNS changeset for zone kopscluster.example.com.::ZAAAAAAAAAAAAA

[3]

$ kubectl describe -n kube-system po kube-apiserver-ip-10.0-1-225.ec2.internal | grep dns
Annotations:          dns.alpha.kubernetes.io/external: api.kopscluster.example.com
                      dns.alpha.kubernetes.io/internal: api.internal.kopscluster.example.com

[4]

λ dig +short api.kopscluster.example.com
203.0.113.123

@rochacon
Contributor Author

This is where we currently add the annotation for this kind of setup:

// When the API is exposed via DNS rather than a load balancer, annotate
// kube-apiserver with the cluster's MasterPublicName so dns-controller
// creates the external record.
if b.Cluster.Spec.API.DNS != nil {
	annotations["dns.alpha.kubernetes.io/external"] = b.Cluster.Spec.MasterPublicName
}

@johngmyers
Member

Looks like you need to run hack/update-expected.sh

When booting a cluster with `--networking=cni`, `dns-controller` will
not start due to the master node being _tainted_ as "network unreachable".
This adds an extra step when managing your own CNI setup, having to SSH
into a master and publish the CNI manifests from there.

This commit adds tolerance and configuration that allows `dns-controller`
pod to start when running with `--networking=cni`, properly creating the
DNS records so the operator can remotely publish the CNI and extra
manifests to have a full working cluster.

This also removes the dependency on `kube-proxy`, by adding the
`KUBERNETES_SERVICE_HOST` environment variable, bypassing `kube-proxy`
when disabled.

Presumably, as a side-effect, this change also allows for
"host network only" clusters to work.

Signed-off-by: Rodrigo Chacon <rochacon@gmail.com>
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Dec 19, 2019
@rochacon
Contributor Author

rochacon commented Dec 19, 2019

Thanks @johngmyers, I updated the test artifacts; tests should pass now.

@justinsb
Member

This looks great - thanks @rochacon

There is a risk that we start before something else is ready, but if we encounter such a case we should be able to tweak dns-controller to tolerate it.

The nodeSelector will continue to keep this running on the master. Moreover, this should help things start a little bit faster - dns-controller is very much in the critical path for cluster bringup.
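
(For reference, the nodeSelector in question is typically something like the following sketch of the standard master node label; this is an assumption, not a quote of the kops manifest.)

nodeSelector:
  # Keeps dns-controller on master nodes; the new toleration changes what it
  # tolerates, not where it is scheduled.
  node-role.kubernetes.io/master: ""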

/approve
/lgtm
/retest

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 28, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: justinsb, rochacon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 28, 2019
env:
- name: KUBERNETES_SERVICE_HOST
  value: "127.0.0.1"
{{- if .EgressProxy }}
{{ range $name, $value := ProxyEnv }}
@justinsb
Member

justinsb commented Dec 28, 2019

Now that you've refactored this, I'm wondering if we need the "if .EgressProxy" guard. For another PR though!

@rochacon
Contributor Author

I moved the if below so the new environment variable is added by default. There was actually a bug here: when using DigitalOcean and EgressProxy together, a duplicated env: key would be rendered.
Given the range over ProxyEnv, I'm not sure we need the if either, but I didn't want to change that right now since I'm not familiar with this feature.
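
Roughly, the duplication looked like this; a sketch only, with placeholder guard names and entries rather than the exact kops template:

# Before the refactor (sketch): each guarded block opened its own "env:" key,
# so enabling DigitalOcean and EgressProxy together rendered "env:" twice.
{{ if .DigitalOcean }}
env:
- name: DIGITALOCEAN_ACCESS_TOKEN    # placeholder entry
  value: "..."
{{ end }}
{{ if .EgressProxy }}
env:
{{ range $name, $value := ProxyEnv }}
- name: {{ $name }}
  value: "{{ $value }}"
{{ end }}
{{ end }}
# After (sketch): "env:" is emitted once, unconditionally, as in the diff
# above, and the guarded blocks only append entries to it.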

@rifelpet
Member

/test pull-kops-e2e-kubernetes-aws

@k8s-ci-robot k8s-ci-robot merged commit 423233c into kubernetes:master Dec 28, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Dec 28, 2019
@rochacon rochacon deleted the cniless-dns-controller branch December 28, 2019 08:41
@k8s-ci-robot
Contributor

@rochacon: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kops-e2e-kubernetes-aws e449467 link /test pull-kops-e2e-kubernetes-aws

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
