"scheduler: non-compatible change in default topology spread constraints" #102136
A potential fix would look like this: in the default-constraints case, we do not ignore nodes in the PreScore phase, so nodes with no topology labels get grouped into an empty-value topology and pods are counted per topology normally. In the later Score phase, those empty-value topologies' counts are never added to the final score because of a label check like the one below, so the end result favors nodes that have all the required topology keys present and penalizes those that don't.
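A minimal Go sketch of that idea (the type names `topologyPair` and `preScoreState` mirror the plugin's internals but are illustrative, not the actual scheduler code):

```go
package main

import "fmt"

// topologyPair and preScoreState are simplified stand-ins for the
// plugin's internal state; they are illustrative, not the real API.
type topologyPair struct{ key, value string }

type preScoreState struct {
	topologyKeys []string
	podCounts    map[topologyPair]int64
}

// countPod tallies a pod for every topology the node belongs to. Nodes
// missing a label fall into the ""-value ("artificial") topology.
func (s *preScoreState) countPod(nodeLabels map[string]string) {
	for _, key := range s.topologyKeys {
		s.podCounts[topologyPair{key, nodeLabels[key]}]++
	}
}

// scoreNode applies the label check described above: only topology values
// actually present on the node contribute to its count, so the artificial
// empty-value group counted in PreScore never feeds into the final score.
func scoreNode(nodeLabels map[string]string, s *preScoreState) int64 {
	var score int64
	for _, key := range s.topologyKeys {
		if val, ok := nodeLabels[key]; ok {
			score += s.podCounts[topologyPair{key, val}]
		}
	}
	return score
}

func main() {
	s := &preScoreState{
		topologyKeys: []string{"topology.kubernetes.io/zone"},
		podCounts:    map[topologyPair]int64{},
	}
	s.countPod(map[string]string{"topology.kubernetes.io/zone": "zone-a"})
	s.countPod(map[string]string{}) // node without the zone label: counted under the ""-value topology
	fmt.Println(scoreNode(map[string]string{"topology.kubernetes.io/zone": "zone-a"}, s)) // 1
	fmt.Println(scoreNode(map[string]string{}, s))                                        // 0: empty-value group excluded
}
```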
Does this fix make sense? Feel free to leave comments.
There are two options:
@alculquicondor we faced/discussed this issue before; what was the conclusion?
My suggestion was to special-case the system defaulting to still provide some spreading if the zone label doesn't exist: #98480 (comment). Feel free to implement that suggestion. This should be eligible for backporting.
/triage accepted
cc @Huang-Wei, who was in the previous thread.
Btw, in the meantime, you can disable the feature gate.
@alculquicondor's suggestion looks good, but where can we achieve that? Not during init time, as
If at runtime, that implies we have to do it in every scoring cycle and iterate over the filtered nodes to know whether the zone label is present or not.
This may work, except for the unnecessary calculation on the "artificial" empty-value topology.
Yes, it should be at runtime. We already iterate over all the nodes in
Something like that, but it should be done just for the
This isn't pretty, but I guess we have to do it because it was a breaking change.
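One possible shape for that runtime special case, as a sketch; `shouldIgnoreNode`, `systemDefaulted`, and `requireAllTopologies` are hypothetical names, not necessarily what was merged:

```go
package podtopologyspread

// shouldIgnoreNode sketches the special case: topology keys are only
// mandatory when the constraints came from the pod spec; for the system
// defaults, keep the node and let missing keys simply contribute nothing.
// systemDefaulted is a hypothetical flag computed once at plugin init.
func shouldIgnoreNode(podHasConstraints, systemDefaulted bool,
	nodeLabels map[string]string, topologyKeys []string) bool {
	requireAllTopologies := podHasConstraints || !systemDefaulted
	if !requireAllTopologies {
		return false // system defaults: still provide some spreading
	}
	for _, key := range topologyKeys {
		if _, ok := nodeLabels[key]; !ok {
			return true // an explicitly requested key is missing: skip the node
		}
	}
	return false
}
```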
/retitle "scheduler: non-compatible change in default topology spread constraints"
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
We probably need to solve this before graduating to GA: kubernetes/enhancements#1258. @gaorong I noticed you still have a PR open. Are you still working on it?
/unassign @gaorong

I can take it over.
I'm not going to backport this, but I will support a cherry-pick if anybody needs it. 1.20 would be the oldest that can be fixed.
cc @damemi |
Re-opening for backporting.
/reopen |
@alculquicondor: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
We need to backport this.
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
/close |
@ahg-g: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What happened:
Since v1.19, the scheduler uses the `PodTopologySpread` plugin to do default spreading with the constraints shown below, and the original `SelectorSpread` plugin is disabled by default.
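For reference, a sketch of the documented system defaults from the KEP, expressed here as `k8s.io/api/core/v1` literals for illustration (the real defaults are applied in the scheduler's configuration defaulting code):

```go
package podtopologyspread

import corev1 "k8s.io/api/core/v1"

// Documented v1.19/v1.20 system-default constraints, written as core/v1
// literals for illustration only.
var defaultConstraints = []corev1.TopologySpreadConstraint{
	{MaxSkew: 3, TopologyKey: "kubernetes.io/hostname", WhenUnsatisfiable: corev1.ScheduleAnyway},
	{MaxSkew: 5, TopologyKey: "topology.kubernetes.io/zone", WhenUnsatisfiable: corev1.ScheduleAnyway},
}
```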
These default constraints assume nodes have a `topology.kubernetes.io/zone` label. When scoring, the algorithm ignores nodes that don't have all the required topology keys present:

kubernetes/pkg/scheduler/framework/plugins/podtopologyspread/scoring.go (lines 77 to 82 @ ee9f365)
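Roughly, the referenced snippet boils down to a check like this (a sketch; the real helper operates on the plugin's constraint type rather than bare keys):

```go
package podtopologyspread

// nodeLabelsMatchSpreadConstraints sketches the referenced check: if the
// node lacks any required topology key, PreScore skips the node entirely,
// so it never participates in spreading scores.
func nodeLabelsMatchSpreadConstraints(nodeLabels map[string]string, topologyKeys []string) bool {
	for _, key := range topologyKeys {
		if _, ok := nodeLabels[key]; !ok {
			return false
		}
	}
	return true
}
```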
However, in many on-premise clusters those labels are not added by default and are not guaranteed to exist, so `PodTopologySpread` does not take effect at all.

Worse, if we enable a plugin like `NodeResourcesMostAllocated` and `PodTopologySpread` does not take effect as described above, `NodeResourcesMostAllocated` will dominate the scheduling and is likely to bind all pods belonging to a ReplicaSet to the same node, which leads to low availability and is not acceptable to our users.

In the `SelectorSpread` plugin, we already took care of this case:

kubernetes/pkg/scheduler/framework/plugins/selectorspread/selector_spread.go (lines 153 to 154 @ ee9f365)
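The referenced lines amount to guarding the zone bookkeeping on the label being present, roughly as sketched below (the real code uses `utilnode.GetZoneKey`, which also considers the region and legacy failure-domain labels):

```go
package selectorspread

// getZoneKey is a simplified stand-in for utilnode.GetZoneKey, which
// returns "" when the node carries no zone/region labels.
func getZoneKey(nodeLabels map[string]string) string {
	return nodeLabels["topology.kubernetes.io/zone"]
}

// countByZone sketches the referenced lines: zone counts are accumulated
// only when a zone label is present, so clusters without zone labels fall
// back to per-node spreading instead of losing spreading entirely.
func countByZone(nodeLabels map[string]string, countsByZone map[string]int64) {
	if zoneID := getZoneKey(nodeLabels); zoneID != "" {
		countsByZone[zoneID]++
	}
}
```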
When we changed to `PodTopologySpread`, we no longer take care of the zone-less case, so the behavior is not compatible with the original.

What you expected to happen:
The default `PodTopologySpread` plugin should take care of the case in which nodes do not have zone labels and work properly.

How to reproduce it (as minimally and precisely as possible):
Create a Deployment in a cluster whose nodes do not have zone labels and observe the default spreading.
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`): master branch
/assign
/sig scheduling