Add CoreDNS "soft" nodeAffinity for controller nodes #188

dghubble · 2020-05-21T03:41:20Z

Add nodeAffinity to the CoreDNS Deployment PodSpec to prefer running CoreDNS pods on controllers, while relying on podAntiAffinity for spreading.
For single-master clusters, running two CoreDNS pods on the master or running one pod on a worker is permissible.
Note: It's still possible to end up with all CoreDNS pods running on workers, since we only express a scheduling preference ("soft"), but it's unlikely. The motivating scenario (below) is also rare.
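
As a rough sketch of what such a "soft" preference can look like in a Deployment's pod template (this is not the exact diff from this PR): the label keys `node.kubernetes.io/controller` and `k8s-app: coredns` and the weights of 100 are assumptions for illustration, so substitute whatever labels your controller nodes and CoreDNS pods actually carry.

```yaml
# Fragment of a Deployment's spec.template.spec (illustrative only)
affinity:
  nodeAffinity:
    # "Soft" rule: the scheduler prefers matching nodes but may still
    # place pods elsewhere if no controller node can take them.
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      preference:
        matchExpressions:
        - key: node.kubernetes.io/controller  # assumed controller node label
          operator: Exists
  podAntiAffinity:
    # Also "soft": prefer not to co-locate two CoreDNS pods on one node.
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: k8s-app          # assumed CoreDNS pod label
            operator: In
            values:
            - coredns
        topologyKey: kubernetes.io/hostname
```

Because both rules use `preferredDuringSchedulingIgnoredDuringExecution` rather than the `required...` variant, a single-controller cluster can still schedule both replicas, either together on the controller or with one on a worker.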

Background:

CoreDNS replicas are set to the higher of 2 or the number of control plane nodes, to (at a minimum) support Deployment updates or pod restarts and to match the cluster size (e.g. 5 master/controller nodes likely means a larger cluster, so run 5 CoreDNS replicas).
In the past (before v1.14), we required kube-dns (the CoreDNS predecessor) pods to run on master nodes. With CoreDNS this node selection was relaxed. We'd like a gentler form of it now.

Motivation:

On clusters using 100% preemptible/spot workers, it is possible for CoreDNS pods to be scheduled onto workers that are all preempted at the same time, causing a loss of cluster-internal DNS service until a CoreDNS pod reschedules (1 min). We'd like CoreDNS to prefer controller/master nodes (which aren't preempted) to reduce the possibility of control plane disruption.

dghubble merged commit a83ddbb into master May 21, 2020

dghubble deleted the coredns-node-affinity branch May 21, 2020 05:07
