Skip to content
This repository has been archived by the owner on Jul 30, 2021. It is now read-only.

Remove critical-pod annotations from scheduler & controller-manager #519

Closed
aaronlevy opened this issue May 17, 2017 · 6 comments
Closed
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. low hanging fruit priority/P0

Comments

@aaronlevy
Copy link
Contributor

If we mark the scheduler as critical, for example, then it happens to get in a state where a pod is unscheduled because the scheduler is down (e.g. #112) -- will the rescheduler start evicting other workloads?

This would be pretty dangerous as it would likely just keep evicting workloads even though it will never be scheduled (no scheduler to schedule it).

Ideally the rescheduler would only evict based on "known" scheduling failures (e.g. no more allocatable CPU), not just on "can't schedule" -- but I'm unsure of the exact behavior.

We don't currently deploy the rescheduler, but we should evaluate this behavior before doing so (possibly remove the critical pod annotation from controller-manager / scheduler).

xref: #474

@aaronlevy aaronlevy added help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/friction priority/P1 labels May 17, 2017
@luxas
Copy link

luxas commented Aug 16, 2017

cc @davidopp @bsalamat @timothysc

@bsalamat
Copy link

When a critical pod cannot be scheduled, rescheduler tries to find a node for it and removes enough pods from the node to allow scheduling of the critical pod, but once it does so, it adds the pod to a list of processed pods. Once the pod is added to the list, rescheduler will not remove more workloads from other nodes to make it schedule.
That said, after we add priority and preemption to Kubernetes, we will get rid of the way critical pods are annotated today and will probably retire rescheduler. In that world, scheduler and other critical system pods will get very high priority and won't get evicted/preempted.

@bsalamat
Copy link

Actually, when I looked at the code again, I noticed that if the processed pod is not scheduled within a certain amount of time, it is removed from the processed list and, therefore, is processed again, which means that more pods may be removed to let it schedule. So, the scenario described in the issue can happen and is dangerous.
As explained above, the priority and preemption work will allow us to retire the existing rescheduler.

@aaronlevy aaronlevy changed the title Marking controller-manager and scheduler as "critical" and using rescheduler might be dangerous Remove critical-pod annotations from scheduler & controller-manager Aug 16, 2017
@aaronlevy aaronlevy added kind/bug Categorizes issue or PR as related to a bug. low hanging fruit priority/P0 and removed priority/P1 labels Aug 16, 2017
squat added a commit to squat/tectonic-installer that referenced this issue Aug 16, 2017
Remove the critical pod annotation and toleration from the
controller-manager and scheduler, since this can result in dangerous
behavior as discussed in
kubernetes-retired/bootkube#519.
squat added a commit to squat/tectonic-installer that referenced this issue Oct 2, 2017
Remove the critical pod annotation and toleration from the
controller-manager and scheduler, since this can result in dangerous
behavior as discussed in
kubernetes-retired/bootkube#519.
squat added a commit to squat/tectonic-installer that referenced this issue Oct 2, 2017
Remove the critical pod annotation and toleration from the
controller-manager and scheduler, since this can result in dangerous
behavior as discussed in
kubernetes-retired/bootkube#519.
@dghubble
Copy link
Contributor

dghubble commented Oct 25, 2017

This is confusing. #728 (comment) states the annotations are only for the rescheduler. #474 (comment) states the annotations aren't safe if we do run the rescheduler. Since we don't run the rescheduler, it seems like these annotations are useless rather than dangerous currently? Upstream salt templates still includes the critical-pod annotation on kube-system components, Tectonic does not. @diegs @aaronlevy

Can we come to a decision to merge or close #728 (@squat) and plan whether we ever want the rescheduler or no if its being retired. #474

@diegs
Copy link
Contributor

diegs commented Oct 25, 2017

I just closed #474. We should merge #728. Upstream is out of date by the sounds of it. The rescheduler is deprecated in favor of priorities.

@diegs
Copy link
Contributor

diegs commented Dec 5, 2017

Fixed in #777

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. low hanging fruit priority/P0
Projects
None yet
Development

No branches or pull requests

5 participants