First pod with affinity can schedule only on nodes with matching topology keys #91168
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: ahg-g
Thanks @ahg-g. This makes the Filter() logic pretty neat!
Some comments below.
```go
		return framework.NewStatus(framework.UnschedulableAndUnresolvable, ErrReasonAffinityNotMatch, ErrReasonAffinityRulesNotMatch)
	}

	if !satisfiesPodsAntiAffinity(state, nodeInfo) {
```
nit: could we evaluate which check is more expensive in the common case, satisfiesExistingPodsAntiAffinity or satisfiesPodsAntiAffinity, and then put the cheaper one first?
None of them is "expensive" per se, but satisfiesExistingPodsAntiAffinity will probably lead to the most iterations since it iterates over all of the node's labels, so I moved it to the end.
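To illustrate why the order matters: Filter() returns at the first failing check, so a check that walks every node label is never reached when an earlier, cheaper check has already rejected the node. This is a minimal sketch using hypothetical, simplified types (`nodeLabels`, `namedCheck`, `runChecks`), not the plugin's real `preFilterState` or `framework.NodeInfo`; the three checks only mimic the shape of the real ones.

```go
package main

import "fmt"

// nodeLabels is a hypothetical simplification of nodeInfo.Node().Labels.
type nodeLabels map[string]string

// namedCheck pairs a name for reporting with a predicate over the node.
type namedCheck struct {
	name string
	ok   func(labels nodeLabels) bool
}

// runChecks evaluates checks in order and stops at the first failure,
// mirroring the early returns in Filter(). It reports the failing check's
// name ("" if all pass) and how many checks were actually evaluated.
func runChecks(labels nodeLabels, checks []namedCheck) (failed string, evaluated int) {
	for _, c := range checks {
		evaluated++
		if !c.ok(labels) {
			return c.name, evaluated
		}
	}
	return "", evaluated
}

func main() {
	// Hypothetical per-term lookup: one map access per required term.
	podAffinity := namedCheck{"podAffinity", func(l nodeLabels) bool {
		_, ok := l["topology.kubernetes.io/zone"]
		return ok
	}}
	podAntiAffinity := namedCheck{"podAntiAffinity", func(l nodeLabels) bool { return true }}
	// Hypothetical check that walks every node label, like the existing-pods
	// anti-affinity check; placed last because it iterates the most.
	blocked := map[string]bool{"rack=r1": true}
	existingAntiAffinity := namedCheck{"existingPodsAntiAffinity", func(l nodeLabels) bool {
		for k, v := range l {
			if blocked[k+"="+v] {
				return false
			}
		}
		return true
	}}

	checks := []namedCheck{podAffinity, podAntiAffinity, existingAntiAffinity}
	failed, n := runChecks(nodeLabels{"rack": "r1"}, checks)
	fmt.Println(failed, n) // podAffinity 1
}
```

Here the node lacks the zone label, so the label-walking check is never evaluated.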
```go
func satisfiesPodsAffinity(state *preFilterState, nodeInfo *framework.NodeInfo) bool {
	toposExist := true
	podsExist := true
	for _, term := range state.podInfo.RequiredAffinityTerms {
```
I'd prefer to refactor the whole function to return early when there is a simple negative path:
```go
func satisfiesPodsAffinity(state *preFilterState, nodeInfo *framework.NodeInfo) bool {
	podsExist := true
	for _, term := range state.podInfo.RequiredAffinityTerms {
		topologyValue, ok := nodeInfo.Node().Labels[term.TopologyKey]
		if !ok {
			return false
		}
		pair := topologyPair{key: term.TopologyKey, value: topologyValue}
		if state.topologyToMatchedAffinityTerms[pair] <= 0 {
			podsExist = false
			break
		}
	}
	if podsExist {
		return true
	}
	// This pod may be the first pod in a series that have affinity to themselves. In order
	// to not leave such pods in pending state forever, we check that if no other pod
	// in the cluster matches the namespace and selector of this pod, the pod matches
	// its own terms, and the node has all the requested topologies, then we allow the pod
	// to pass the affinity check.
	podInfo := state.podInfo
	if len(state.topologyToMatchedAffinityTerms) == 0 && podMatchesAllAffinityTerms(podInfo.Pod, podInfo.RequiredAffinityTerms) {
		return true
	}
	return false
}
```
There should be only one or two affinity terms, so exiting early is not really beneficial and makes the code more complicated; I prefer simplicity in this case.
Or, we can just eliminate the variable toposExist: return false directly at L367, and the logic afterwards would only need to check podsExist.
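The suggestion to drop toposExist can be sketched as follows. This is a minimal, self-contained approximation, not the merged code: `preFilterState` is reduced to hypothetical fields (`requiredAffinityTerms`, `topologyToMatchedAffinityTerms`), node labels are a plain map, and the boolean `podMatchesOwnTerms` stands in for `podMatchesAllAffinityTerms(pod, terms)`. Note that the loop does not `break` once podsExist is false, so a missing topology key on any later term still causes a rejection before the first-pod fallback is considered.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the scheduler types.
type topologyPair struct{ key, value string }

type affinityTerm struct{ TopologyKey string }

type preFilterState struct {
	requiredAffinityTerms []affinityTerm
	// Count of existing pods matching the affinity terms, per topology pair.
	topologyToMatchedAffinityTerms map[topologyPair]int64
	// Stands in for podMatchesAllAffinityTerms(pod, terms).
	podMatchesOwnTerms bool
}

// satisfiesPodsAffinity returns false as soon as a topology label is missing;
// otherwise it records whether matching pods exist in every term's domain.
func satisfiesPodsAffinity(state *preFilterState, labels map[string]string) bool {
	podsExist := true
	for _, term := range state.requiredAffinityTerms {
		value, ok := labels[term.TopologyKey]
		if !ok {
			return false
		}
		if state.topologyToMatchedAffinityTerms[topologyPair{term.TopologyKey, value}] <= 0 {
			podsExist = false
		}
	}
	if podsExist {
		return true
	}
	// First pod of a self-affine group: allow it only if no existing pod
	// matches the terms anywhere and the pod matches its own terms.
	return len(state.topologyToMatchedAffinityTerms) == 0 && state.podMatchesOwnTerms
}

func main() {
	state := &preFilterState{
		requiredAffinityTerms:          []affinityTerm{{"zone"}, {"rack"}},
		topologyToMatchedAffinityTerms: map[topologyPair]int64{},
		podMatchesOwnTerms:             true,
	}
	fmt.Println(satisfiesPodsAffinity(state, map[string]string{"zone": "a", "rack": "r1"})) // true
	fmt.Println(satisfiesPodsAffinity(state, map[string]string{"zone": "a"}))              // false
}
```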
done.
Force-pushed from aa09d7d to a6a18d4.
/hold
Some nits. Good to squash if you resolve them.
```go
	for topologyKey, topologyValue := range nodeInfo.Node().Labels {
		if topologyMap[topologyPair{key: topologyKey, value: topologyValue}] > 0 {
			return false
func satisfyExistingPodsAntiAffinity(state *preFilterState, nodeInfo *framework.NodeInfo) bool {
```
s/Pods/Pod/ to be consistent with others?
This one should stay plural because we are checking existing pods (multiple pods); in the other two we are checking the incoming pod (a single pod).
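The plural reflects what the check aggregates: a precomputed map that counts, per topology pair, the existing pods whose required anti-affinity terms match the incoming pod. The node is rejected if any of its labels hits a nonzero count, which is the loop in the diff above. This is a minimal sketch; `existingAntiAffinityCounts` is a hypothetical name for that map, and node labels are passed as a plain map instead of `framework.NodeInfo`.

```go
package main

import "fmt"

type topologyPair struct{ key, value string }

// satisfyExistingPodsAntiAffinity walks every node label and rejects the node
// if some existing pod's anti-affinity term covers that topology pair.
func satisfyExistingPodsAntiAffinity(existingAntiAffinityCounts map[topologyPair]int64, labels map[string]string) bool {
	// Skip the label walk when no existing pod has a matching anti-affinity term.
	if len(existingAntiAffinityCounts) == 0 {
		return true
	}
	for k, v := range labels {
		if existingAntiAffinityCounts[topologyPair{k, v}] > 0 {
			return false
		}
	}
	return true
}

func main() {
	counts := map[topologyPair]int64{{"zone", "a"}: 1}
	fmt.Println(satisfyExistingPodsAntiAffinity(counts, map[string]string{"zone": "a"})) // false
	fmt.Println(satisfyExistingPodsAntiAffinity(counts, map[string]string{"zone": "b"})) // true
}
```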
Thanks @Huang-Wei, I squashed.
/lgtm
/hold cancel
What type of PR is this?
/kind bug
/kind cleanup
What this PR does / why we need it:
Currently, the first pod with required affinity can be scheduled on any node. This is not correct: if that node lacks the topology keys of the pod's affinity terms, no future pod with a matching selector/namespace can be placed on it. This PR fixes this by requiring that a node can accept a first pod only if it has all the topology keys.
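The rule this PR adds for the first pod reduces to a label check on each candidate node. This is a minimal sketch of that rule in isolation, assuming nodes are represented as a hypothetical map from node name to labels and that the pod's required terms have already been reduced to their topology keys; `hasAllTopologyKeys` and `feasibleForFirstPod` are illustrative names, not scheduler functions.

```go
package main

import (
	"fmt"
	"sort"
)

// hasAllTopologyKeys reports whether the node's labels contain every topology
// key used by the pod's required affinity terms.
func hasAllTopologyKeys(labels map[string]string, topologyKeys []string) bool {
	for _, key := range topologyKeys {
		if _, ok := labels[key]; !ok {
			return false
		}
	}
	return true
}

// feasibleForFirstPod returns, sorted by name, the nodes that may accept the
// first pod of a self-affine group under the new rule.
func feasibleForFirstPod(nodes map[string]map[string]string, topologyKeys []string) []string {
	var out []string
	for name, labels := range nodes {
		if hasAllTopologyKeys(labels, topologyKeys) {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	nodes := map[string]map[string]string{
		"node-a": {"topology.kubernetes.io/zone": "us-east-1a"},
		"node-b": {}, // no zone label
	}
	keys := []string{"topology.kubernetes.io/zone"}
	fmt.Println(feasibleForFirstPod(nodes, keys)) // [node-a]
}
```

Before the change, node-b would also have been accepted, leaving later pods in the group with no zone to join.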
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
/assign @Huang-Wei