No proper scheduling retries could be made when Extender filters out some Nodes #122019
Labels
kind/bug
Categorizes issue or PR as related to a bug.
needs-triage
Indicates an issue or PR lacks a `triage/foo` label and requires one.
sig/scheduling
Categorizes an issue or PR as relevant to SIG Scheduling.
What happened?
When an Extender filters out some Nodes, we don't record any unschedulable plugins for the Pod at all. This means the Extender is completely ignored during the requeueing process.
So, what's happening is:
The latter case is serious because it could leave Pods stuck in the unschedulable Pod pool for 5 minutes.
What did you expect to happen?
We should have a short-term solution for the latter case.
We can requeue Pods that were rejected by an Extender on any kind of cluster event, because we cannot know which events would make those Pods schedulable.
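To make the short-term idea concrete, here is a minimal sketch in Python (a simplified model, not the scheduler's actual Go code). The `EXTENDER_REJECTED` marker, the `QueuedPod` type, the event names, and the plugin-to-event mapping are all hypothetical and exist only for illustration. The sketch assumes the scheduler records a marker when an Extender rejects Nodes and treats that marker as matching every cluster event.

```python
from dataclasses import dataclass, field

# Hypothetical marker recorded when an Extender filters out Nodes.
# This is NOT a real scheduler constant.
EXTENDER_REJECTED = "<extender>"

# Hypothetical mapping from plugin name to the cluster events it registered.
# The event names are illustrative only.
PLUGIN_EVENTS = {
    "NodeResourcesFit": {"Node/Add", "Pod/Delete"},
    "TaintToleration": {"Node/Add", "Node/UpdateTaint"},
}


@dataclass
class QueuedPod:
    name: str
    # Who rejected the Pod in its last scheduling cycle.
    unschedulable_by: set = field(default_factory=set)


def should_requeue(pod, event):
    """Return True if `event` should move `pod` out of the unschedulable pool."""
    if EXTENDER_REJECTED in pod.unschedulable_by:
        # We cannot know which event helps an Extender-rejected Pod,
        # so every cluster event triggers a retry.
        return True
    # Otherwise, only events registered by the rejecting plugins count.
    return any(event in PLUGIN_EVENTS.get(p, set()) for p in pod.unschedulable_by)
```

With this model, a Pod rejected only by the Extender is retried on any event, while a Pod rejected only by in-tree plugins keeps the narrower, event-based behavior.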
Ultimately, this issue raises the question of how Pods rejected by an Extender should be requeued. We cannot implement a QHint equivalent in Extenders because it would be too slow to call the Extender every time a cluster event happens. Perhaps we could somehow implement an EventsToRegister equivalent in the Extender?
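One possible shape for an EventsToRegister equivalent is sketched below in Python (again a simplified model, not a proposal for the real API). The `ExtenderConfig` type, its `events_to_register` field, the extender name, and the event names are assumptions made up for this example. The point it illustrates is that the Extender declares its events once, so the scheduler can decide on requeueing from a local index without calling the Extender per event.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtenderConfig:
    # Hypothetical config: the Extender declares up front which cluster
    # events may make a Pod it rejected schedulable again.
    name: str
    events_to_register: frozenset = frozenset()


def build_event_index(extenders):
    """Map each event name to the extenders that registered it (built once)."""
    index = {}
    for ext in extenders:
        for event in ext.events_to_register:
            index.setdefault(event, set()).add(ext.name)
    return index


def should_requeue(rejected_by, event, index):
    """Requeue if an extender that rejected the Pod registered this event.

    This is a local lookup only; no call to the Extender is made.
    """
    return bool(index.get(event, set()) & set(rejected_by))
```

Unlike a QHint-style callback, this trades precision for speed: the check is a set lookup, but it cannot inspect the event's contents the way a per-event hint function could.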
How can we reproduce it (as minimally and precisely as possible)?
Use an Extender that filters out some Nodes in its Filter call.
Anything else we need to know?
No response
Kubernetes version
master