
Improvement: fluentd-gcp to get same toleration as kube-proxy #44445

Closed
JorritSalverda opened this issue Apr 13, 2017 · 26 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@JorritSalverda

JorritSalverda commented Apr 13, 2017

When adding a NoExecute taint to a node, most kube-system pods get evicted, except for kube-proxy, because it has the toleration =:Exists:NoExecute.

It would be nice if fluentd-gcp and heapster got the same =:Exists:NoExecute toleration, to make sure they keep running on such a node.

Currently in GKE 1.6.0 they have the following tolerations:

fluentd-gcp

  Tolerations:  
    node.alpha.kubernetes.io/ismaster=:NoSchedule
    node.alpha.kubernetes.io/notReady=:Exists:NoExecute for 300s
    node.alpha.kubernetes.io/unreachable=:Exists:NoExecute for 300s

kube-proxy

  Tolerations:  
    =:Exists:NoExecute

heapster

  Tolerations:  
    CriticalAddonsOnly=:Exists
    node.alpha.kubernetes.io/notReady=:Exists:NoExecute for 300s
    node.alpha.kubernetes.io/unreachable=:Exists:NoExecute for 300s
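
In pod-spec form, that kube-proxy shorthand corresponds to roughly the following toleration (a sketch, not the actual kube-proxy manifest):

      tolerations:
      # Tolerate every NoExecute taint, regardless of key or value,
      # so the pod is never evicted by taint-based eviction.
      - operator: Exists
        effect: NoExecute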
@JorritSalverda
Author

Perhaps the even more liberal toleration below would be best for all three, although in that case there's no way to prevent those applications from being scheduled on your node.

      tolerations:
      - operator: Exists

@davidopp
Member

davidopp commented May 5, 2017

@kubernetes/sig-scheduling-feature-requests
@piosz @vishh @gmarek

@k8s-ci-robot k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label May 5, 2017
@davidopp
Member

davidopp commented May 5, 2017

I assume you mean "When adding a NoExecute taint to a node"

FYI the node.alpha.kubernetes.io/{notReady,unreachable} tolerations don't do anything in 1.6 unless you have the alpha feature enabled (I forgot the exact flag).
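
(Those "for 300s" entries in the describe output correspond to tolerations with a tolerationSeconds field; in spec form, roughly:)

      tolerations:
      - key: node.alpha.kubernetes.io/notReady
        operator: Exists
        effect: NoExecute
        tolerationSeconds: 300   # pod is evicted 300s after the taint appears
      - key: node.alpha.kubernetes.io/unreachable
        operator: Exists
        effect: NoExecute
        tolerationSeconds: 300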

But can you explain what behavior you want? Do you want fluentd and heapster to never be evicted due to taints under any circumstances? I think that makes sense for fluentd, since it runs on every node, but not for heapster.

@gmarek
Contributor

gmarek commented May 5, 2017

I'm not sure about heapster (@piosz), but the fact that fluentd doesn't have an infinite toleration is a bug that's being fixed in #45349 - I hope I'll manage to get it into the next patch release.

@JorritSalverda
Author

@davidopp I fixed the typo (taint instead of toleration).

When filing this ticket I was still under the impression that heapster was also a DaemonSet, but realized afterwards that it isn't. So yes, it only applies to fluentd.

@JorritSalverda JorritSalverda changed the title Improvement: fluentd-gcp and heapster to get same toleration as kube-proxy Improvement: fluentd-gcp to get same toleration as kube-proxy May 5, 2017
@gmarek
Contributor

gmarek commented May 5, 2017

All DaemonSets should have an infinite toleration, and we actually tried to do that, but failed. The fix is already merged at head and I'm working on a cherry-pick.

@gmarek
Contributor

gmarek commented May 5, 2017

Cherry-pick: #45401

@nimeshksingh

I think I'm missing something, but how does #45349 add the =:Exists:NoExecute toleration to fluentd-gcp?

@gmarek
Contributor

gmarek commented May 11, 2017

It's not for all NoExecutes, only for the ones created by the NodeController (i.e. when a Node is unresponsive). I don't think it's OK to add a general toleration for all NoExecute taints, even user-specified ones. We do want the user to be able to set a Taint that will remove all Pods from a given Node, even ones created by Daemons.

If you want your particular Daemon to tolerate all NoExecute Taints, you can add that toleration to its spec, as sketched below. I don't think the system should do it automatically, though. @kubernetes/sig-scheduling-feature-requests
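
For example (a minimal sketch; the DaemonSet name and image are hypothetical, and at the time of this issue DaemonSets lived in the extensions/v1beta1 API group):

      apiVersion: extensions/v1beta1
      kind: DaemonSet
      metadata:
        name: my-daemon                # hypothetical name
      spec:
        template:
          metadata:
            labels:
              app: my-daemon
          spec:
            tolerations:
            # Tolerate every NoExecute taint so taint-based eviction
            # never removes this daemon's pods.
            - operator: Exists
              effect: NoExecute
            containers:
            - name: my-daemon
              image: my-daemon:latest  # hypothetical image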

@nimeshksingh

Oh, my understanding of this issue was that currently, in GKE, it's not possible to use any user-defined NoExecute taints because fluentd-gcp gets evicted. fluentd-gcp is from a DaemonSet that users don't own, so we can't add arbitrary tolerations to it, but it is intended to be on every GKE node, just like kube-proxy.

@davidopp
Member

@nimeshksingh I guess your point is that user-added taints are not usable if users can't adjust the set of default tolerations on system-generated pods, which is the case on GKE. That seems like a valid point. We will eventually have a user-configurable admission controller (kubernetes/community#132 will enable this), but that won't happen soon.

I need to think about the best solution for this. For runs-on-every-node system pods we probably need to tolerate all taints by default. For runs-on-one-or-a-few-nodes system pods (like Heapster or DNS) I'm not sure what to do.

@davidopp
Member

cc/ @mikedanese

@aveshagarwal
Member

@davidopp I wonder why runs-on-one-or-a-few-nodes system pods would be different from runs-on-every-node system pods: those few nodes could be any nodes, which means those pods would have to tolerate all taints by default too, wouldn't they? Or, if those few nodes are some special nodes, then I don't think a user-defined taint would be allowed by default anyway.

@gmarek
Contributor

gmarek commented May 11, 2017

We certainly need to discuss this, but I think the only problem that's left is dealing with system pods that the user can't modify (e.g. fluentd on GKE @crassirostris). For this particular case I think we just need to add a 'tolerate all' Toleration to it and call it a day (we should probably do the same thing for kube-proxy and NPD if we haven't already - @bowei @dchen1107 @Random-Liu).

For "run one" things (like Heapster or DNS), we can't add them. System depends on them running reasonably well, and Taints will be used as a method for Pod evictions in case of Node problems. I.e. when Node dies you really want your DNS to die and be scheduled somewhere else. Tolerations currently don't support negations IIRC, so we can't specify to tolerate all Taints except ones created by NC. @kubernetes/sig-scheduling-misc

k8s-github-robot pushed a commit that referenced this issue May 12, 2017
Automatic merge from submit-queue (batch tested with PRs 45691, 45667, 45698, 45715)

Add general NoExecute Toleration to fluentd in gcp configuration

Ref #44445

Once merged I'll create a cherry-pick that will be picked up in GKE together with the next patch release.

cc @JorritSalverda @davidopp @aveshagarwal @nimeshksingh @piosz 

```release-note
fluentd will tolerate all NoExecute Taints when run in gcp configuration.
```
@dchen1107
Member

re: #44445 (comment)
PR #43116 applied NoExecute taint tolerations to all static pods, including kube-proxy.

k8s-github-robot pushed a commit that referenced this issue May 29, 2017
Automatic merge from submit-queue

Add generic NoExecute Toleration to NPD

Ref. #44445

cc @davidopp 

```release-note
Add generic Toleration for NoExecute Taints to NodeProblemDetector
```
@nimeshksingh

I hate to keep this going, but somehow, even after k8s 1.6.6, which has PR #45715 and therefore the NoExecute toleration on fluentd-gcp, I ran into the following behavior:

  1. Add a new node. kube-proxy, fluentd-gcp, and my own daemonset pod with a universal 'Exists' toleration start running.
  2. Taint the node with a NoSchedule effect:
    kubectl taint nodes my-node-name mykey=myvalue:NoSchedule
  3. The fluentd-gcp pod gets evicted, but kube-proxy and my own pod stay.
  4. Remove the taint with:
    kubectl taint nodes my-node-name mykey:NoSchedule-
  5. The fluentd-gcp pod comes back.

If I understand the effects correctly, the NoSchedule taint should not affect the fluentd-gcp pod, as it was already running on the node, but it did for some reason. But, given that fluentd should probably come back if it somehow dies anyway, should it have a 'NoSchedule' toleration in addition to the 'NoExecute' toleration?
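
Something like this pair of wildcard tolerations would cover both effects (a sketch using only standard toleration fields):

      tolerations:
      # Never evicted because of NoExecute taints.
      - operator: Exists
        effect: NoExecute
      # Can still be (re)scheduled onto nodes that carry NoSchedule taints.
      - operator: Exists
        effect: NoSchedule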

@nimeshksingh

Just to clarify, using a NoExecute effect is a reasonable workaround for me, but the current behavior for NoSchedule still strikes me as strange.

@davidopp
Member

davidopp commented Jun 28, 2017

If I understand the effects correctly, the NoSchedule taint should not affect the fluentd-gcp pod, as it was already running on the node,

Your understanding is correct. Is the behavior you described repeatable? What PodStatus does the fluentd-gcp pod show when it is evicted? It would be useful to see the kubelet and NodeController logs, I guess. Most likely the fluentd-gcp pod is getting evicted for some other reason. (Maybe you could try the same scenario as you described, but don't taint the node, and see if the fluentd-gcp pod gets evicted anyway.)
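
One way to capture that status before the pod object goes away (assuming the pods carry the usual k8s-app=fluentd-gcp label; adjust the selector if your cluster labels them differently):

    # Dump the full PodStatus of the fluentd-gcp pods, including any
    # reason/message recorded when they are terminated.
    kubectl get pods -n kube-system -l k8s-app=fluentd-gcp -o yaml

    # Recent events usually show which component deleted or evicted the pod.
    kubectl get events -n kube-system --sort-by=.lastTimestamp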

given that fluentd should probably come back if it somehow dies anyways, should it have a 'NoSchedule' toleration in addition to the 'NoExecute' toleration?

Yeah, it seems like fluentd-gcp should tolerate all taint effects, not just NoExecute.

cc/ @kubernetes/sig-cluster-lifecycle-bugs
cc/ @kubernetes/sig-scheduling-bugs

@k8s-ci-robot k8s-ci-robot added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. kind/bug Categorizes issue or PR as related to a bug. labels Jun 28, 2017
@gmarek
Contributor

gmarek commented Jun 28, 2017

This is actually strange - the TaintController isn't even looking at 'NoSchedule' Taints. If adding a 'NoSchedule' taint evicts a Daemon pod, it sounds like a DaemonSetController bug. @kubernetes/sig-apps-bugs

@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. kind/bug Categorizes issue or PR as related to a bug. labels Jun 28, 2017
@gmarek
Contributor

gmarek commented Jun 28, 2017

I looked into this problem, and indeed it looks like the DaemonSet controller is deleting the daemon pod:

    I0628 07:56:56.634089       5 wrap.go:75] DELETE /api/v1/namespaces/kube-system/pods/fluentd-gcp-v2.0-n6lm7: (3.09306ms) 200 [[kube-controller-manager/v1.6.6 (linux/amd64) kubernetes/7fa1c17/system:serviceaccount:kube-system:daemon-set-controller] [::1]:36844]

So it's certainly a DaemonSet bug - it shouldn't remove a Daemon pod from a Node even if the pod doesn't tolerate a NoSchedule taint that was put on that Node.

I'll write a quick fix that adds a general NoSchedule toleration for fluentd.

@mikedanese
Member

This is a regression in the daemonset controller. See #46577 (comment)

@gmarek
Contributor

gmarek commented Jun 28, 2017

Assigned @erictune as the @kubernetes/sig-apps-bugs lead, but it's very likely that @mikedanese will fix it before Eric wakes up :)

k8s-github-robot pushed a commit that referenced this issue Jun 28, 2017
Automatic merge from submit-queue (batch tested with PRs 48192, 48182)

Add generic NoSchedule toleration to fluentd in gcp config as a quick-fix for #44445
@davidopp
Member

Thanks for the debugging, @gmarek !

@nimeshksingh

Looks like this is already all figured out, so just let me know if anyone needs me to provide more info.

@davidopp
Member

Thanks a lot for reporting the problem, @nimeshksingh

@mikedanese
Member

The original issue was resolved, along with another one that got rolled up into it. What remains unresolved:

@gmarek

I don't think it's OK to add a general toleration for all NoExecute taints, even user-specified ones. We do want the user to be able to set a Taint that will remove all Pods from a given Node, even ones created by Daemons.

@davidopp

For runs-on-every-node system pods we probably need to tolerate all taints by default. For runs-on-one-or-a-few-nodes system pods (like Heapster or DNS) I'm not sure what to do.

But these were side conversations to the original issue. If you guys think these are important to continue discussing, can you break out a dedicated issue?

k8s-github-robot pushed a commit that referenced this issue Nov 30, 2017
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Add wildcard tolerations to kube-proxy

- Add wildcard tolerations to kube-proxy.
- Add `nvidia.com/gpu` toleration to nvidia-gpu-device-plugin.

Related to #55080 and #44445.

/kind bug
/priority critical-urgent
/sig scheduling

**Release note**:
```release-note
kube-proxy addon tolerates all NoExecute and NoSchedule taints by default.
```

/assign @davidopp @bsalamat @vishh @jiayingz