Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

An interface that allows pre-filter plugins to update their pre-calculated status #82912

Merged
merged 1 commit into from Sep 25, 2019

Conversation

ahg-g
Copy link
Member

@ahg-g ahg-g commented Sep 20, 2019

What type of PR is this?
/kind feature

What this PR does / why we need it:

This is needed to allow efficient preemption simulations: during preemption, we remove/add pods from each node before running the filter plugins again to evaluate whether removing/ad
ding specific pods will allow the incoming pod to be scheduled on the node. Instead of calling prefilter again, we should allow the plugin to do incremental update to its pre-computed state.

Which issue(s) this PR fixes:
Fixes #82897

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

New APIs to allow adding/removing pods from pre-calculated prefilter state in the scheduling framework

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/test sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 20, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 20, 2019
@ahg-g ahg-g force-pushed the ahg-prefilter-update branch 2 times, most recently from efc0728 to e5ad022 Compare September 20, 2019 05:15
Copy link
Contributor

@draveness draveness left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/label api-review
/hold feel free to cancel

Did we have a thorough discussion of whether or not exposing these API? Is this a required interface (AddPod, RemovePod) which we must provide to the developer, or just an approach to help with the performance.

Besides, it requires an API review if we change the interface of the scheduler extension point.

@k8s-ci-robot
Copy link
Contributor

@draveness: The label(s) /label api-review cannot be applied. These labels are supported: api-review, community/discussion, community/maintenance, community/question, cuj/build-train-deploy, cuj/multi-user, platform/aws, platform/azure, platform/gcp, platform/minikube, platform/other

In response to this:

/label api-review
/hold

Did we have a thorough discussion of whether or not exposing these API? Is this a required interface (AddPod, RemovePod) which we must provide to the developer, or just an approach to help with the performance.

Besides, it requires an API review if we change the interface of the scheduler extension point.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 20, 2019
@k8s-ci-robot
Copy link
Contributor

@draveness: The label(s) /label api-review cannot be applied. These labels are supported: api-review, community/discussion, community/maintenance, community/question, cuj/build-train-deploy, cuj/multi-user, platform/aws, platform/azure, platform/gcp, platform/minikube, platform/other

In response to this:

/label api-review
/hold feel free to cancel

Did we have a thorough discussion of whether or not exposing these API? Is this a required interface (AddPod, RemovePod) which we must provide to the developer, or just an approach to help with the performance.

Besides, it requires an API review if we change the interface of the scheduler extension point.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ahg-g
Copy link
Member Author

ahg-g commented Sep 20, 2019

/label api-review
/hold feel free to cancel

Did we have a thorough discussion of whether or not exposing these API? Is this a required interface (AddPod, RemovePod) which we must provide to the developer, or just an approach to help with the performance.

Certainly, I created the issue to discuss the motivation. Will keep the hold until the community is comfortable with this addition.

Besides, it requires an API review if we change the interface of the scheduler extension point.

This particular change does not actually.

@ahg-g
Copy link
Member Author

ahg-g commented Sep 20, 2019

/cc @alculquicondor

@ahg-g
Copy link
Member Author

ahg-g commented Sep 20, 2019

/test pull-kubernetes-bazel-build

Copy link
Member

@Huang-Wei Huang-Wei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments below.

pluginNameToWeightMap map[string]int
queueSortPlugins []QueueSortPlugin
preFilterPlugins []PreFilterPlugin
preFilterWithUpdatePlugins []PreFilterWithUpdatePlugin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I like the idea of separating the two plugins. In a user's perspective, what they know is just a phase called "PreFilter" which "PreFilterPlugins" should be responsible for. IMO it's better to keep update related logic into another interface.

I'd propose to keep preFilterPlugins only, and then add a MetaUpdater() method in PreFilterPlugin like this:

type PreFilterPlugin interface {
	Plugin
	PreFilter(pc *PluginContext, p *v1.Pod) *Status
	MetaUpdater() MetaUpdater
}

And MetaUpdater is an interface with AddPod() and RemovePod():

type MetaUpdater interface {
	AddPod(pc *PluginContext, podToSchedule *v1.Pod, podToAdd *v1.Pod, nodeInfo *schedulernodeinfo.NodeInfo) *Status
	RemovePod(pc *PluginContext, podToSchedule *v1.Pod, podToRemove *v1.Pod, nodeInfo *schedulernodeinfo.NodeInfo) *Status
}

So RunPreFilter[Add|Remove]Pod method can be updated:

func (f *framework) RunPreFilterAddPod(pc *PluginContext, podToSchedule *v1.Pod,
	podToAdd *v1.Pod, nodeInfo *schedulernodeinfo.NodeInfo) *Status {
	for _, pl := range f.preFilterPlugins {
		status := pl.MetaUpdater().AddPod(pc, podToSchedule, podToAdd, nodeInfo)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of the separate interfaces is that the extra methods are optional. And then we only iterate once to identify the plugins that implement the interface.
These private slices are not user (developer) facing.
And we have done the same for scorePlugins and scoreWithNormalizePlugins

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the idea is to make the methods optional, those slices are implementation details only.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sry to confuse you. I should have commented it on the interface definitions.

So basically, instead of defining two interfaces (PreFilterPlugin and PreFilterWithUpdatePlugin), I'm leaning towards to keep one interface PreFilterPlugin.

With the above extra interface funtion MetaUpdater(), implementations that returns nil will get simply bypassed without invoking its AddPod() and RemovePod() upon preemption. Right now, we're relying on typecasting to tell whether it's a PreFilterPlugin or PreFilterWithUpdatePlugin implementation, which I'm not sure I like the way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above suggestion is similar like what we did in GVK:

Basically an API object can choose not to omit GVK by implementing GroupVersionKind() to simply return nil, so that API machinery won't call further GVK related methods upon this object.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this sounds good to me, it will also solve the confusion when a plugin implements only AddPod but not RemovePod for example, which will result in a failure in the type cast, and so AddPod will not be called contrary to what the developer may think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be worth making the optional normalize for score plugins consistent with this suggestion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, but we can have it in another PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I named the interface "Updater" only, I think it is sufficient.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the above extra interface funtion MetaUpdater(), implementations that returns nil will get simply bypassed without invoking its AddPod() and RemovePod() upon preemption. Right now, we're relying on typecasting to tell whether it's a PreFilterPlugin or PreFilterWithUpdatePlugin implementation, which I'm not sure I like the way.

I feel the same way, it's better to implement it as a noop. I sent #83042 to remove the scoreWithNormalize interface.

pkg/scheduler/framework/v1alpha1/framework.go Outdated Show resolved Hide resolved
// RunPreFilterAddPod calls the AddPod interface for the set of configured
// PreFilter plugins. It returns directly if any of the plugins return any
// status other than Success.
func (f *framework) RunPreFilterAddPod(pc *PluginContext, podToSchedule *v1.Pod,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RunMetaAddPod sounds more clear to me so that it's possible in the future to enable other plugins to be called under the hood.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we make this generic, it should be specific to prefilter (i.e., the updater in the prefilter plugins). This is internal implementation in any case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG.

@Huang-Wei
Copy link
Member

/retest

@Huang-Wei
Copy link
Member

/lgtm

Thanks @ahg-g !

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 24, 2019
@Huang-Wei
Copy link
Member

/retest

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 24, 2019
@ahg-g ahg-g force-pushed the ahg-prefilter-update branch 3 times, most recently from 5859dc4 to afdb7f9 Compare September 24, 2019 12:06
@ahg-g
Copy link
Member Author

ahg-g commented Sep 24, 2019

/lgtm

Thanks @ahg-g !

Thanks, now that the context PR is in, I added logic to clone the context during preemption, PTAL at the last commit, if all looks good I will squash.

@@ -1023,7 +1023,9 @@ func (g *genericScheduler) selectNodesForPreemption(
if meta != nil {
metaCopy = meta.ShallowCopy()
}
pods, numPDBViolations, fits := g.selectVictimsOnNode(pluginContext, pod, metaCopy, nodeNameToInfo[nodeName], fitPredicates, queue, pdbs)
pluginContextClone := pluginContext.Clone()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not do the copy in selectVictimsNode, given that pluginContextClone isn't used outside of it?

Also, prefer shorter names for local variables. We could start using pCtx instead of pluginContext

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not do the copy in selectVictimsNode, given that pluginContextClone isn't used outside of it?

To be consistent with how "meta" is treated.

Also, prefer shorter names for local variables. We could start using pCtx instead of pluginContext

we already use pluginContext everywhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sg, they are just nits.

@@ -64,3 +64,11 @@ func TestPluginContextClone(t *testing.T) {
t.Errorf("cloned copy should not change, got %q, expected %q", v.data, data1)
}
}

func TestPluginContextCloneNil(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, this can be merged into TestPluginContextClone in a Golang subtest manner.

Copy link
Member Author

@ahg-g ahg-g Sep 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel the boilerplate for doing that will outweigh its benefit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it's just nits.

@Huang-Wei
Copy link
Member

Thanks, now that the context PR is in, I added logic to clone the context during preemption, PTAL at the last commit, if all looks good I will squash.

Please go squashing the commits.

@ahg-g
Copy link
Member Author

ahg-g commented Sep 24, 2019

Thanks, now that the context PR is in, I added logic to clone the context during preemption, PTAL at the last commit, if all looks good I will squash.

Please go squashing the commits.

Done.

…lated.

This is needed to allow efficient preemption simulations: during preemption, we remove/add pods from each node before running the filter plugins again to evaluate whether removing/adding specific pods will allow the incoming pod to be scheduled on the node. Instead of calling prefilter again, we should allow the plugin to do incremental update to its pre-computed state.
@Huang-Wei
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 24, 2019
@ahg-g
Copy link
Member Author

ahg-g commented Sep 24, 2019

/test pull-kubernetes-e2e-gce-device-plugin-gpu

@Huang-Wei
Copy link
Member

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 24, 2019
podToRemove *v1.Pod, nodeInfo *schedulernodeinfo.NodeInfo) *Status {
for _, pl := range f.preFilterPlugins {
if updater := pl.Updater(); updater != nil {
status := updater.RemovePod(pc, podToSchedule, podToRemove, nodeInfo)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this has same signature as that of updater.AddPod , it seems this func (RunPreFilterUpdaterRemovePod) can be unified with RunPreFilterUpdaterAddPod by passing AddPod / RemovePod as parameter.

@ahg-g
Copy link
Member Author

ahg-g commented Sep 25, 2019

/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Sep 25, 2019
@k8s-ci-robot k8s-ci-robot merged commit 0b4cccc into kubernetes:master Sep 25, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.17 milestone Sep 25, 2019
@ahg-g ahg-g deleted the ahg-prefilter-update branch January 10, 2020 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

A new API for updating prefilter calculations
6 participants