
kube-dns should support custom tolerations #57659

Closed
JeanMertz opened this issue Dec 27, 2017 · 21 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. sig/network Categorizes an issue or PR as relevant to SIG Network.

Comments

@JeanMertz

/kind feature

(note: we're not running the latest Kubernetes version yet, but I couldn't find any related changes in the latest releases, so please close this if simply updating will fix it for us)

Currently, when you edit the kube-dns deployment, some changes persist (e.g. adding a nodeAffinity block), while others are reverted (e.g. adding a second toleration next to the existing CriticalAddonsOnly toleration).

The latter is problematic for us (we run on GKE).

We have two node pools:

  • a regular node pool whose nodes are tainted so that only pods we explicitly approve (with a matching toleration) can run on them
  • a preemptible node pool without any taints

We want kube-dns (and any other critical pods) to run on the regular nodes for better uptime (see also: #41125), so we update its deployment with the following blocks:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: type
          operator: In
          values:
          - regular
tolerations:
- key: CriticalAddonsOnly # this one is already present by default
  operator: Exists
- key: dedicated # we add this one to allow scheduling on the tainted regular nodes
  operator: Equal
  value: regular
  effect: NoSchedule
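
For reference, here is a minimal sketch (not part of the original report) of a node in the "regular" pool that the affinity and toleration above are written against; the node name is hypothetical:

apiVersion: v1
kind: Node
metadata:
  name: regular-node-1        # hypothetical node name
  labels:
    type: regular             # matched by the nodeAffinity term above
spec:
  taints:
  - key: dedicated
    value: regular
    effect: NoSchedule        # tolerated only by pods that carry the added toleration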

After saving this change, the pods are briefly scheduled as expected, but moments later the change is reverted: the added toleration is removed, while the nodeAffinity block remains.

Can we make toleration changes persistent and not be reverted by the API itself?

Environment:

  • Kubernetes version (use kubectl version):

    Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.2", GitCommit:"bdaeafa71f6c7c04636251031f93464384d54963", GitTreeState:"clean", BuildDate:"2017-10-24T21:08:42Z", GoVersion:"go1.9.1", Compiler:"gc", Platform:"darwin/amd64"}
    Server Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.3-gke.0", GitCommit:"86d3ac5eaf57223302c95e7d9fc1aeff55fb0c15", GitTreeState:"clean", BuildDate:"2017-11-08T21:42:58Z", GoVersion:"go1.8.3b4", Compiler:"gc", Platform:"linux/amd64"}
    
  • Cloud provider or hardware configuration: GKE

@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. kind/feature Categorizes issue or PR as related to a new feature. labels Dec 27, 2017
@JeanMertz
Author

/sig network

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 27, 2017
@thockin
Member

thockin commented Jan 3, 2018

Unfortunately, there's not really a way to have a field that takes values from two different sources. In this case the addon's base YAML (managed by GKE) sets one toleration and you are trying to set another. The patch logic will periodically throw your changes out when the addon is re-asserted, as you are experiencing. Once we are using priority and preemption, we won't need this toleration; in fact, I cannot convince myself it is being used at all.

@bsalamat or @vishh do you have any context?

@bsalamat
Member

bsalamat commented Jan 3, 2018

I agree with @thockin that GKE and your deployment race against each other when setting tolerations on kube-dns. One solution that comes to mind is to give the nodes in each node pool a distinguishing label and give every pod you create node affinity to one of those labels, depending on where you want it to run. For example, label all the nodes in your first pool "pool: groupA" and all the nodes in the second pool "pool: groupB"; kube-dns would then get node affinity to "pool: groupA". This way you can get rid of the node taints and pod tolerations entirely.
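
A minimal sketch of that suggestion, using the hypothetical "pool: groupA" label and mirroring the affinity block from the original report:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: pool            # hypothetical label key from the example above
          operator: In
          values:
          - groupA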

As @thockin also mentioned, priority and preemption remove the need for critical pods and the CriticalAddonsOnly taint, but the feature is in alpha and is not enabled in GKE. Our plan is to move it to beta in the next release.
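
For illustration, a sketch of what a priority class definition looks like; the API group shown is the one that later went GA, and the name and value here are assumptions rather than the built-in system classes:

apiVersion: scheduling.k8s.io/v1      # was scheduling.k8s.io/v1alpha1 when this comment was written
kind: PriorityClass
metadata:
  name: cluster-dns-critical          # hypothetical name
value: 1000000                        # pods with higher values can preempt lower-priority pods
globalDefault: false
description: "Example priority class for cluster-critical add-ons such as kube-dns."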

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 3, 2018
@JeanMertz
Author

/lifecycle frozen
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 3, 2018
@fotinakis

Any updates on this issue? We'd love to apply tolerations/affinity to kube-dns in a proper and maintainable way.

@Glennvd

Glennvd commented Jun 27, 2018

+1 for this. At the moment it's not possible to add preemptible pools to your GKE cluster if you also want to run other services on it reliably, since you can give your essential pods a nodeAffinity but not the kube-system pods they rely on (like kube-dns).

@bgrant0607
Member

Ref #23233

@thockin
Member

thockin commented Jul 3, 2018

@Glennvd You can create a node pool with a taint - any pod that doesn't explicitly tolerate that taint will avoid those nodes.

@matti

matti commented Sep 25, 2018

@thockin but doing that causes a "Does not have minimum availability" warning for kube-dns, because apparently once GKE sees more nodes it scales the deployment to replicas: 2

@pajel

pajel commented Nov 12, 2018

Related issue: kubernetes/kops#6063

@bgrant0607
Member

cc @bowei @justinsb

@ahsan-storm

Any update on adding tolerations to kube-dns via kops?

@thockin thockin added the triage/unresolved Indicates an issue that can not or will not be resolved. label Mar 8, 2019
@cgsimmons

Also would like an update on this issue. We currently experience DNS latency because we have a large tainted node pool (relative to the untainted one) that we cannot run DNS pods on, so traffic for the DNS service is constantly routed to nodes with no kube-dns pods.

@freehan freehan removed the triage/unresolved Indicates an issue that can not or will not be resolved. label May 16, 2019
@jagadishg

Any update on this issue? All the nodes in our node pool are tainted, but because of this issue we have been forced to keep a node pool with untainted nodes; we can't justify an untainted node just to run this pod. Ideally we'd like to be able to add custom tolerations to kube-dns, kube-dns-autoscaler, and heapster so we can run them on tainted nodes.

@lutierigb

lutierigb commented Feb 5, 2020

Priority and preemption have been GA since Kubernetes 1.14, and in GKE (checked on 1.15) there are already two PriorityClass resources for critical workloads in the cluster, one of which is already set on the kube-dns deployment.
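
For context, a sketch (not taken from the actual GKE manifest) of how a deployment's pod template references a priority class via priorityClassName:

spec:
  priorityClassName: system-cluster-critical   # built-in class for cluster-critical workloads
  containers:
  - name: kubedns                               # container name is illustrative
    image: example.com/kube-dns:latest          # hypothetical image reference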

Isn't it time to get rid of the toleration in the kube-dns addon? @thockin @bsalamat

@BouweCeunen

would love to see this implemented so I can set some tolerations on those pods.

@VengefulAncient

Please look into this issue; it's been years. kube-dns is already quite problematic in smaller clusters because of its high CPU requests and pod anti-affinity that doesn't really work, which lets it choke out workloads on smaller nodes, and without custom tolerations it can't even be pinned to a node where it won't affect anything.

@aojea
Member

aojea commented Oct 25, 2021

As of Kubernetes v1.12, CoreDNS is the recommended DNS Server, replacing kube-dns.

@VengefulAncient

kube-dns is still used by GKE even on the latest Kubernetes version. This is not a solution.

@aojea
Member

aojea commented Oct 25, 2021

The point is that this is a GKE issue, not a Kubernetes or sig-network issue 😅
/remove sig-network

@thockin thockin closed this as completed Feb 2, 2023