[Sample] Cross node preemption plugin #56

Merged

Huang-Wei
Contributor

The PostFilter extension point was introduced in the Kubernetes Scheduler in 1.19,
and the default upstream implementation preempts Pods on the same node
to make room for the unschedulable Pod.

In contrast to the "same-node-preemption" strategy, we can come up with a "cross-node-preemption"
strategy to preempt Pods across multiple nodes, which is useful when a Pod cannot be
scheduled due to "cross node" constraints such as PodTopologySpread and PodAntiAffinity.
This strategy was also mentioned in the original Preemption design document.
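To make the "cross node" problem concrete, here is a minimal sketch of a required PodAntiAffinity term scoped to a zone. It assumes simplified stand-in `Node` and `Pod` types and a hypothetical helper `antiAffinityVictims`; none of these are scheduler APIs. When node-1 is evaluated as the destination, the only conflicting Pod runs on node-2, so preempting Pods on node-1 alone can never make the Pod schedulable there.

```go
package main

import "fmt"

// Pod and Node are hypothetical, simplified stand-ins for the real
// Kubernetes types; only the fields needed for this example are modelled.
type Pod struct {
	Name   string
	Labels map[string]string
}

type Node struct {
	Name string
	Zone string
	Pods []Pod
}

// antiAffinityVictims returns, per node, the Pods that violate a required
// anti-affinity term "no Pod with label key=value in the same zone" for a
// Pod that would land on target. Because the term is scoped to the zone,
// the violating Pods can live on any node in that zone, not only on target.
func antiAffinityVictims(nodes []Node, target Node, key, value string) map[string][]string {
	victims := map[string][]string{}
	for _, n := range nodes {
		if n.Zone != target.Zone {
			continue // other zones do not affect this term
		}
		for _, p := range n.Pods {
			if p.Labels[key] == value {
				victims[n.Name] = append(victims[n.Name], p.Name)
			}
		}
	}
	return victims
}

func main() {
	nodes := []Node{
		{Name: "node-1", Zone: "zone-a", Pods: []Pod{{Name: "web-0", Labels: map[string]string{"app": "web"}}}},
		{Name: "node-2", Zone: "zone-a", Pods: []Pod{{Name: "db-0", Labels: map[string]string{"app": "db"}}}},
		{Name: "node-3", Zone: "zone-b", Pods: []Pod{{Name: "db-1", Labels: map[string]string{"app": "db"}}}},
	}
	// Evaluate node-1 for a Pod that must not share a zone with app=db.
	fmt.Println(antiAffinityVictims(nodes, nodes[0], "app", "db")) // map[node-2:[db-0]]
}
```

This is why the victims for a single candidate node may have to be collected from several nodes.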

This plugin is built as a sample to demonstrate how to use the PostFilter extension point,
and to inspire users to build their own innovative strategies, such as preempting
a group of Pods.

⚠️ CAVEAT: The current implementation doesn't prune any branches; it uses a DFS algorithm
to enumerate all possible preemption strategies. DO NOT use it in your production environment.
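The cost described in the caveat can be illustrated with a minimal sketch of exhaustive DFS enumeration. It assumes a hypothetical `Victim` type and a caller-supplied `feasible` check standing in for re-running the filters with the chosen Pods removed; it is not the plugin's actual code, and how the plugin then chooses among feasible sets is not shown. Because every potential victim is either kept or preempted with no pruning, n potential victims always produce 2^n leaves.

```go
package main

import "fmt"

// Victim is a hypothetical stand-in for a lower-priority Pod that could be
// preempted; Node records where it runs, since victims may span nodes.
type Victim struct {
	Name string
	Node string
}

// enumerate walks every subset of potential victims with a plain DFS
// (preempt or keep each Pod) and returns the subsets for which feasible
// reports that the preemptor would become schedulable. There is no
// pruning, so feasible is called exactly 2^len(pods) times.
func enumerate(pods []Victim, feasible func([]Victim) bool) [][]Victim {
	var results [][]Victim
	var chosen []Victim
	var dfs func(i int)
	dfs = func(i int) {
		if i == len(pods) {
			if feasible(chosen) {
				// Copy, because chosen's backing array is reused across branches.
				results = append(results, append([]Victim(nil), chosen...))
			}
			return
		}
		// Branch 1: preempt pods[i].
		chosen = append(chosen, pods[i])
		dfs(i + 1)
		chosen = chosen[:len(chosen)-1]
		// Branch 2: keep pods[i].
		dfs(i + 1)
	}
	dfs(0)
	return results
}

func main() {
	pods := []Victim{{Name: "a", Node: "node-1"}, {Name: "b", Node: "node-2"}, {Name: "c", Node: "node-2"}}
	// Toy rule: the preemptor fits only if Pod "b" is evicted.
	needsB := func(vs []Victim) bool {
		for _, v := range vs {
			if v.Name == "b" {
				return true
			}
		}
		return false
	}
	res := enumerate(pods, needsB)
	fmt.Println(len(res), "feasible victim sets out of", 1<<len(pods)) // 4 feasible victim sets out of 8
}
```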

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 22, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Huang-Wei

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 22, 2020
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Sep 22, 2020
@@ -50,7 +50,7 @@ update-vendor:
 	hack/update-vendor.sh
 
 .PHONY: unit-test
-unit-test: update-vendor
+unit-test: autogen
Contributor Author

This is now needed because the tests in crossnodepreemption require the OpenAPI definitions to be fully generated. Otherwise, the test build fails:

?   	sigs.k8s.io/scheduler-plugins/pkg/apis/config/scheme	[no test files]
?   	sigs.k8s.io/scheduler-plugins/pkg/apis/config/v1beta1	[no test files]
ok  	sigs.k8s.io/scheduler-plugins/pkg/coscheduling	0.756s
# k8s.io/kubernetes/cmd/kube-apiserver/app
vendor/k8s.io/kubernetes/cmd/kube-apiserver/app/server.go:467:70: undefined: "k8s.io/kubernetes/pkg/generated/openapi".GetOpenAPIDefinitions
FAIL	sigs.k8s.io/scheduler-plugins/pkg/crossnodepreemption [build failed]
ok  	sigs.k8s.io/scheduler-plugins/pkg/noderesources	0.187s
ok  	sigs.k8s.io/scheduler-plugins/pkg/qos	0.150s
FAIL
make: *** [Makefile:54: unit-test] Error 2

@Huang-Wei
Contributor Author

/cc @denkensk
cc/ @everpeace

Member

@denkensk denkensk left a comment

Thanks, some small comments.

pkg/crossnodepreemption/README.md Outdated
func (s *candidate) Victims() *extenderv1.Victims {
	return &extenderv1.Victims{
		Pods:             s.victims,
		NumPDBViolations: 0,
	}
}
Member

Why do we set NumPDBViolations here instead of storing a *extenderv1.Victims struct directly, as defaultPreempt does?

Contributor Author

The intention is to decouple candidate from the extenderv1 package, so that the preemption logic doesn't need to deal with extenderv1 at all. Returning extenderv1.Victims here does nothing special; it only ensures that candidate satisfies the Candidate interface.
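The decoupling described in this reply can be sketched as follows. This is a minimal sketch under stated assumptions: `Pod` stands in for `v1.Pod`, `Victims` mirrors the two fields of `extenderv1.Victims`, and the `Candidate` interface (`Victims()` plus `Name()`) is assumed to match the upstream default preemption interface; all three are simplified local copies so the example compiles without Kubernetes dependencies.

```go
package main

import "fmt"

// Pod is a stand-in for v1.Pod; Victims mirrors extenderv1.Victims.
// Both are hypothetical simplified copies for this sketch.
type Pod struct{ Name string }

type Victims struct {
	Pods             []*Pod
	NumPDBViolations int64
}

// Candidate is a stand-in for the scheduler's Candidate interface
// (method set assumed).
type Candidate interface {
	Victims() *Victims
	Name() string
}

// candidate keeps plain fields; the extender-style struct is only built on
// demand in Victims(), so the preemption logic itself never touches it.
type candidate struct {
	victims []*Pod
	name    string
}

// Compile-time check that *candidate satisfies Candidate.
var _ Candidate = &candidate{}

func (s *candidate) Victims() *Victims {
	return &Victims{
		Pods: s.victims,
		// This sketch does not track PDB violations, so it reports 0.
		NumPDBViolations: 0,
	}
}

func (s *candidate) Name() string { return s.name }

func main() {
	var c Candidate = &candidate{victims: []*Pod{{Name: "p1"}}, name: "node-1"}
	fmt.Println(c.Name(), len(c.Victims().Pods)) // node-1 1
}
```

The `var _ Candidate = &candidate{}` line is a common Go idiom for making the interface conformance explicit at compile time.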


// nodesWherePreemptionMightHelp returns a list of nodes with failed predicates
// that may be satisfied by removing pods from the node.
func nodesWherePreemptionMightHelp(nodes []*framework.NodeInfo, m framework.NodeToStatusMap) []*framework.NodeInfo {
Member

I also use this function in capacityScheduling.go. Could we make it public in defaultPreempt?

Contributor Author

SG.
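For context on the helper the reviewer wants to share, here is a minimal sketch of its filtering rule, assuming it follows the upstream default preemption behavior: nodes whose recorded filter status is UnschedulableAndUnresolvable are dropped, and every other node stays a preemption candidate. `Code` and its constants are simplified, hypothetical stand-ins for the framework's status codes, and nodes are plain names rather than `*framework.NodeInfo`.

```go
package main

import "fmt"

// Code is a hypothetical stand-in for the scheduler framework's status
// codes. Success is the zero value, so a node missing from the map is
// treated as if its filters succeeded.
type Code int

const (
	Success Code = iota
	Unschedulable
	UnschedulableAndUnresolvable
)

// nodesWherePreemptionMightHelp keeps every node whose filter status is not
// UnschedulableAndUnresolvable: removing Pods cannot fix a failure that a
// plugin has reported as unresolvable.
func nodesWherePreemptionMightHelp(nodes []string, m map[string]Code) []string {
	var potential []string
	for _, name := range nodes {
		if m[name] == UnschedulableAndUnresolvable {
			continue
		}
		potential = append(potential, name)
	}
	return potential
}

func main() {
	statuses := map[string]Code{
		"node-1": Unschedulable,
		"node-2": UnschedulableAndUnresolvable,
	}
	fmt.Println(nodesWherePreemptionMightHelp([]string{"node-1", "node-2", "node-3"}, statuses)) // [node-1 node-3]
}
```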

@denkensk
Member

Thanks
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 24, 2020
@k8s-ci-robot k8s-ci-robot merged commit 88c2dea into kubernetes-sigs:master Sep 24, 2020
3 participants