First pod with affinity can schedule only on nodes with matching topology keys #91168

Merged
merged 1 commit into from May 18, 2020

@ahg-g ahg-g commented May 16, 2020

What type of PR is this?

/kind bug
/kind cleanup

What this PR does / why we need it:
Currently, the first pod with required affinity can schedule on any node. This is incorrect: if the node lacks the topology keys, no future pod with a matching selector/namespace can be placed on that node. This PR fixes this by mandating that a node can accept a first pod only if it has all the topology keys.
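
As a rough illustration of the rule described above, here is a minimal, self-contained sketch. The `affinityTerm` type and the `nodeHasAllTopologyKeys` helper are hypothetical simplifications written for this example; they are not the scheduler's actual types or functions.

```go
package main

import "fmt"

// affinityTerm is a hypothetical, simplified stand-in for a required
// pod-affinity term; only the topology key matters for this check.
type affinityTerm struct {
	TopologyKey string
}

// nodeHasAllTopologyKeys reports whether nodeLabels carries a label for
// every topology key named by the terms.
func nodeHasAllTopologyKeys(nodeLabels map[string]string, terms []affinityTerm) bool {
	for _, term := range terms {
		if _, ok := nodeLabels[term.TopologyKey]; !ok {
			return false
		}
	}
	return true
}

func main() {
	terms := []affinityTerm{{TopologyKey: "topology.kubernetes.io/zone"}}
	labeled := map[string]string{"topology.kubernetes.io/zone": "zone-a"}
	unlabeled := map[string]string{"kubernetes.io/hostname": "node-1"}

	fmt.Println(nodeHasAllTopologyKeys(labeled, terms))   // true
	fmt.Println(nodeHasAllTopologyKeys(unlabeled, terms)) // false
}
```

A node without the zone label is rejected for the first pod because pods that later match the same term could never be co-located with it through that topology key.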

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

First pod with required affinity terms can schedule only on nodes with matching topology keys.

/assign @Huang-Wei

@k8s-ci-robot k8s-ci-robot added the release-note, kind/bug, kind/cleanup, size/L, cncf-cla: yes, needs-sig, and needs-priority labels May 16, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved and sig/scheduling labels and removed the needs-sig label May 16, 2020
Member

@Huang-Wei Huang-Wei left a comment

Thanks @ahg-g. This makes the Filter() logic pretty neat!

Some comments below.

	return framework.NewStatus(framework.UnschedulableAndUnresolvable, ErrReasonAffinityNotMatch, ErrReasonAffinityRulesNotMatch)
}

if !satisfiesPodsAntiAffinity(state, nodeInfo) {
Member

nit: we could evaluate which one is more expensive in the common case, satisfiesExistingPodsAntiAffinity or satisfiesPodsAntiAffinity, and then put the cheaper one first.

Member Author

None of them is "expensive" per se, but satisfiesExistingPodsAntiAffinity will probably lead to the most iterations since it iterates over all the node's labels, so I moved it to last.
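
The ordering idea in this exchange can be sketched as follows: run the checks in sequence, stop at the first failure, and place the check expected to iterate the most at the end so it is skipped whenever an earlier one fails. The `check` type and `firstFailure` helper are hypothetical names used only for this illustration, and the predicates are stubs rather than the plugin's real logic.

```go
package main

import "fmt"

// check is a hypothetical named predicate used only for this sketch.
type check struct {
	name string
	run  func() bool
}

// firstFailure runs the checks in order and returns the name of the first
// one that fails, or "" if all pass. Checks after a failure never run.
func firstFailure(checks []check) string {
	for _, c := range checks {
		if !c.run() {
			return c.name
		}
	}
	return ""
}

func main() {
	lastRan := false
	checks := []check{
		{"satisfiesPodsAffinity", func() bool { return true }},
		{"satisfiesPodsAntiAffinity", func() bool { return false }},
		// Placed last because it is assumed to iterate the most.
		{"satisfiesExistingPodsAntiAffinity", func() bool { lastRan = true; return true }},
	}
	fmt.Println(firstFailure(checks), lastRan) // satisfiesPodsAntiAffinity false
}
```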

func satisfiesPodsAffinity(state *preFilterState, nodeInfo *framework.NodeInfo) bool {
	toposExist := true
	podsExist := true
	for _, term := range state.podInfo.RequiredAffinityTerms {
Member

I'd prefer to refactor the whole function to return early when there is a simple negative path:

func satisfiesPodsAffinity(state *preFilterState, nodeInfo *framework.NodeInfo) bool {
	podsExist := true
	for _, term := range state.podInfo.RequiredAffinityTerms {
		topologyValue, ok := nodeInfo.Node().Labels[term.TopologyKey]
		if !ok {
			return false
		}
		pair := topologyPair{key: term.TopologyKey, value: topologyValue}
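		if state.topologyToMatchedAffinityTerms[pair] <= 0 {
			// Keep iterating (no break) so that every remaining term
			// still has its topology key checked against the node.
			podsExist = false
		}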
	}

	if podsExist {
		return true
	}

	// This pod may be the first pod in a series that have affinity to themselves. In order
	// to not leave such pods in pending state forever, we check that if no other pod
	// in the cluster matches the namespace and selector of this pod, the pod matches
	// its own terms, and the node has all the requested topologies, then we allow the pod
	// to pass the affinity check.
	podInfo := state.podInfo
	if len(state.topologyToMatchedAffinityTerms) == 0 && podMatchesAllAffinityTerms(podInfo.Pod, podInfo.RequiredAffinityTerms) {
		return true
	}
	return false
}

Member Author

There should be only one or two affinity terms, so exiting early is not really beneficial and makes the code more complicated. I prefer simplicity in this case.

Member

Or we can just eliminate the variable toposExist: return false directly at L367, and the logic afterwards would only check podsExist.

Member Author

done.
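
For readers following the thread, the shape discussed here (return false as soon as a topology key is missing, track only podsExist) can be sketched in a self-contained form. This is an illustration under simplifying assumptions, not the merged code: `affinityTerm`, `topologyPair`, and `satisfiesAffinity` are reduced stand-ins, and the `selfMatch` parameter stands in for the result of `podMatchesAllAffinityTerms`.

```go
package main

import "fmt"

// topologyPair and affinityTerm are simplified stand-ins for this sketch.
type topologyPair struct{ key, value string }

type affinityTerm struct{ TopologyKey string }

// satisfiesAffinity sketches the affinity check. matched counts existing
// pods that match the incoming pod's terms, keyed by topology pair.
// selfMatch is an assumed input saying the pod matches its own terms.
func satisfiesAffinity(nodeLabels map[string]string, terms []affinityTerm,
	matched map[topologyPair]int, selfMatch bool) bool {
	podsExist := true
	for _, term := range terms {
		value, ok := nodeLabels[term.TopologyKey]
		if !ok {
			// Every topology key must exist on the node.
			return false
		}
		if matched[topologyPair{key: term.TopologyKey, value: value}] <= 0 {
			podsExist = false
		}
	}
	if podsExist {
		return true
	}
	// First-pod case: nothing in the cluster matches yet, and the pod
	// matches its own terms.
	return len(matched) == 0 && selfMatch
}

func main() {
	terms := []affinityTerm{{TopologyKey: "zone"}, {TopologyKey: "rack"}}
	full := map[string]string{"zone": "a", "rack": "r1"}
	partial := map[string]string{"zone": "a"}
	none := map[topologyPair]int{}
	elsewhere := map[topologyPair]int{{key: "zone", value: "b"}: 1}

	fmt.Println(satisfiesAffinity(full, terms, none, true))      // true
	fmt.Println(satisfiesAffinity(partial, terms, none, true))   // false
	fmt.Println(satisfiesAffinity(full, terms, elsewhere, true)) // false
}
```

The loop deliberately has no break after setting podsExist to false, so a missing topology key on a later term is still detected.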

@ahg-g ahg-g force-pushed the ahg-affinity5 branch 2 times, most recently from aa09d7d to a6a18d4 May 18, 2020 16:40
@ahg-g
Member Author

ahg-g commented May 18, 2020

/hold
for squash.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold label May 18, 2020
Member

@Huang-Wei Huang-Wei left a comment

Some nits. Good to squash if you resolve them.

for topologyKey, topologyValue := range nodeInfo.Node().Labels {
if topologyMap[topologyPair{key: topologyKey, value: topologyValue}] > 0 {
return false
func satisfyExistingPodsAntiAffinity(state *preFilterState, nodeInfo *framework.NodeInfo) bool {
Member

s/Pods/Pod/ to be consistent with others?

Member Author

This one should stay because we are checking existing pods (multiple pods); in the other two we are checking the incoming pod (a single pod).
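
To make the distinction concrete, here is a minimal sketch of the existing-pods check, which walks the node's labels and looks each label up as a topology pair. The types are simplified stand-ins for this example, and the early return on an empty map is an assumption added here, not something shown in the thread.

```go
package main

import "fmt"

// topologyPair is a simplified stand-in for this sketch.
type topologyPair struct{ key, value string }

// satisfiesExistingPodsAntiAffinity sketches the check against existing
// pods. topologyMap counts existing pods whose anti-affinity terms match
// the incoming pod, keyed by topology pair.
func satisfiesExistingPodsAntiAffinity(nodeLabels map[string]string, topologyMap map[topologyPair]int) bool {
	if len(topologyMap) == 0 {
		// Assumed shortcut: nothing can conflict, so skip the label scan.
		return true
	}
	// Iterates over all of the node's labels, which is why this check
	// tends to do the most iterations.
	for k, v := range nodeLabels {
		if topologyMap[topologyPair{key: k, value: v}] > 0 {
			return false
		}
	}
	return true
}

func main() {
	labels := map[string]string{"zone": "a", "kubernetes.io/hostname": "node-1"}
	conflict := map[topologyPair]int{{key: "zone", value: "a"}: 2}
	other := map[topologyPair]int{{key: "zone", value: "b"}: 1}

	fmt.Println(satisfiesExistingPodsAntiAffinity(labels, conflict)) // false
	fmt.Println(satisfiesExistingPodsAntiAffinity(labels, other))    // true
}
```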

@ahg-g
Member Author

ahg-g commented May 18, 2020

Thanks @Huang-Wei, I squashed.

@Huang-Wei
Member

/lgtm
/retest

@k8s-ci-robot k8s-ci-robot added the lgtm label May 18, 2020
@Huang-Wei
Member

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold label May 18, 2020
@k8s-ci-robot k8s-ci-robot merged commit 9eb097c into kubernetes:master May 18, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.19 milestone May 18, 2020
@ahg-g ahg-g deleted the ahg-affinity5 branch October 25, 2021 14:37