Throttle pod eviction when past evictions led to the same node being chosen by the scheduler #424
Comments
I think this would be related to our long-term goal of incorporating the actual scheduler framework into the descheduler to better inform eviction decisions (see related issues like #261, #238). That work is making some progress, but is currently at the step of breaking the scheduler framework out of core Kubernetes (see upstream issue kubernetes/kubernetes#89930). Like you mention, there are a lot of cases where an evicted pod will end up on the same node. The descheduler is pretty naïve and optimistic, so trying to cover all of these now would amount to rewriting existing scheduler logic. We could potentially add some sort of "throttling" (maybe an annotation like …
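For concreteness, here is a minimal sketch of what such an annotation-based throttle might look like. The annotation keys, the helper name, and the threshold parameter are all assumptions for illustration; none of them exist in the descheduler today.

```go
// Hypothetical sketch of the annotation-based throttle suggested above.
package evictions

import (
	"strconv"

	v1 "k8s.io/api/core/v1"
)

const (
	// Annotations the descheduler could write on eviction and read back
	// on a later run; both keys are made up for illustration.
	evictionCountAnnotation   = "descheduler.alpha.kubernetes.io/eviction-count"
	lastEvictedNodeAnnotation = "descheduler.alpha.kubernetes.io/last-evicted-node"
)

// shouldThrottleEviction reports whether the pod has bounced back to the
// node it was last evicted from at least maxSameNodeEvictions times, in
// which case the descheduler would skip it instead of evicting it again.
func shouldThrottleEviction(pod *v1.Pod, maxSameNodeEvictions int) bool {
	count, err := strconv.Atoi(pod.Annotations[evictionCountAnnotation])
	if err != nil {
		// No (or malformed) counter yet: never throttle.
		return false
	}
	cameBackToSameNode := pod.Annotations[lastEvictedNodeAnnotation] == pod.Spec.NodeName
	return cameBackToSameNode && count >= maxSameNodeEvictions
}
```

One open question with this approach: an evicted pod is deleted and re-created by its controller, so a counter stored on the pod itself would not survive eviction as written; in practice it would likely have to live on the owning workload or in descheduler-side state.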
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community.
/remove-lifecycle stale
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community.
Rotten issues close after 30d of inactivity. Send feedback to sig-contributor-experience at kubernetes/community.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Is your feature request related to a problem? Please describe.
Currently, several missing features in the descheduler cause pods to be re-scheduled onto the same nodes again and again. Two such examples are #335 and #335 (comment). The descheduler currently never gives up.
Describe the solution you'd like
Pod eviction should be throttled when a configurable number of past evictions has led to re-scheduling onto the same node. This should be configurable via options (default values) and via per-pod annotations.
I assume that there are more cases where pods are re-scheduled onto the same node. I already encountered two such cases on the first run of the descheduler. I also assume that it will take some time until all these cases are fixed, so I'd suggest implementing this feature first so that there is some sensible default/fallback behavior for such cases.
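To illustrate the options-plus-annotation scheme requested above, here is one possible way to resolve the threshold: a cluster-wide default from the descheduler's options, overridable per pod via an annotation. The annotation key, function name, and semantics are assumptions for illustration, not existing descheduler configuration.

```go
// Hypothetical resolution of the throttle threshold: per-pod annotation
// wins over the configured cluster-wide default.
package evictions

import (
	"strconv"

	v1 "k8s.io/api/core/v1"
)

// Made-up annotation key for a per-pod override of the default threshold.
const maxSameNodeEvictionsAnnotation = "descheduler.alpha.kubernetes.io/max-same-node-evictions"

// maxSameNodeEvictionsFor returns the per-pod annotation value when present
// and valid, and the configured default otherwise.
func maxSameNodeEvictionsFor(pod *v1.Pod, defaultMax int) int {
	if raw, ok := pod.Annotations[maxSameNodeEvictionsAnnotation]; ok {
		if n, err := strconv.Atoi(raw); err == nil && n >= 0 {
			return n
		}
	}
	return defaultMax
}
```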