Extending taint-based-eviction schedule by lengthening `tolerationSeconds` is not possible. #102993

dbenque · 2021-06-18T13:28:53Z

What happened:

With taint-based eviction, the tolerationSeconds parameter is taken into account when the eviction schedule is created.
Once an eviction schedule has been created, updating tolerationSeconds only takes effect if it results in an earlier schedule.

This makes for a poor user experience because the behavior appears inconsistent. If tolerationSeconds is updated for a group of pods spread over N nodes, the outcome differs from node to node depending on whether an eviction schedule already exists there.

Note that a modification of tolerationSeconds in the pod toleration is always accepted by the API server.

Let's take an example:

Initial condition:
A fleet of 100 pods, each running on a different node, so we have 100 nodes. All pods have a tolerationSeconds of 1 hour for the taint node.kubernetes.io/not-ready:NoExecute.

State change (T0):
12 of the 100 nodes become not-ready and are tainted. The controller manager schedules 12 evictions to be triggered in 1 hour.

User decision:
The user decides that tolerationSeconds should be updated to 1 day on the whole fleet of pods. Let's imagine that this extension is meant to give the team more time for remediation and fixes on an unhealthy cluster (which is potentially the reason for the not-ready nodes).
All tolerationSeconds values are updated to 1 day. The API server accepts all the modifications. Behind the scenes, the controller manager observes the change but does not reschedule the 12 pending evictions.

State change (T1):
13 more nodes become not-ready. We now have 25 not-ready nodes. 13 new eviction schedules are created, and they will trigger in 24 hours.

What happens later:
At T0+1h: 12 pods are evicted (not what a user would expect).
At T1+24h: 13 pods are evicted (expected).

What you expected to happen:

If an eviction schedule is pending and tolerationSeconds is updated, the new value should be taken into account in all cases, whether the schedule moves later or earlier.

How to reproduce it (as minimally and precisely as possible):

1- Taint a node with a NoExecute taint.
2- Extend the tolerationSeconds value of an impacted pod by 1 day.

You will notice that the pod is still evicted after 5 minutes (which is the default tolerationSeconds value).

Anything else we need to know?:

The code is written to prevent such an extension, but there is no clear explanation for that in the documentation or in the code. The only thing that reflects that intention is this unit test:

kubernetes/pkg/controller/nodelifecycle/scheduler/taint_manager_test.go

Line 295 in fddb3ad

description: "lengthening toleration shouldn't work",

I will propose a simple PR that allows moving the schedule earlier or later depending on the modification made to tolerationSeconds, but of course that would break the unit test above.

Is there any reason for blocking the extension of an established eviction schedule and not respecting a tolerationSeconds update?

For more context: we are trying to extend eviction schedules because they get created under some catastrophic scenarios (many or all kubelets losing their connection to the API server). Of course we could adjust the eviction rate, but this is clearly not satisfactory. The main reasons are:
1- we cannot change it while the controller manager is running
2- thresholds are defined at the cluster level, while we would like to work per group of nodes, application, namespace, or any other dimension
3- extending the eviction schedule is the only way to give operations people time to deal with remediation without impacting the workload

Environment:

Kubernetes version (use kubectl version): 1.21
Cloud provider or hardware configuration: aws
OS (e.g: cat /etc/os-release): ubuntu
Kernel (e.g. uname -a): 5.8.0
Install tools: -
Network plugin and version (if this is a network-related bug): -
Others:


k8s-ci-robot · 2021-06-18T13:29:00Z

@dbenque: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

neolit123 · 2021-06-21T15:07:35Z

/sig scheduling

k8s-triage-robot · 2021-09-19T15:25:02Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · 2021-10-19T16:09:34Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot · 2021-11-18T16:15:31Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue or PR with /reopen
Mark this issue or PR as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-ci-robot · 2021-11-18T16:15:52Z

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied

After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied

After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue or PR with /reopen

Mark this issue or PR as fresh with /remove-lifecycle rotten

Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

dbenque added the kind/bug Categorizes issue or PR as related to a bug. label Jun 18, 2021

k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jun 18, 2021

k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jun 18, 2021

This was referenced Jun 21, 2021

allow extending taint based eviction schedule DataDog/kubernetes#41

Closed

taint based eviction: allow lengthening eviction schedule DataDog/kubernetes#42

Open

taint based eviction: allow lengthening eviction schedule #103055

Closed

k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 21, 2021

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 19, 2021

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 19, 2021

k8s-ci-robot closed this as completed Nov 18, 2021
