New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add plugin interfaces for reserve and prebind extension points of the scheduling framework #70227

Merged
merged 5 commits into from Dec 1, 2018

Conversation

@bsalamat
Copy link
Contributor

bsalamat commented Oct 25, 2018

What type of PR is this?
/kind feature

What this PR does / why we need it:
Add plugin interfaces for reserve and prebind extension points of the scheduling framework. It also includes code that shows how in-process plugins should be registered at "reserve" or "prebind" points.

Special notes for your reviewer:
This is an initial implementation and one of many possibilities. I look for feedback on how to design and implement our interfaces so that our plugins and the scheduling framework are easy to develop and maintain.

Does this PR introduce a user-facing change?:

Add an plugin interfaces for "reserve" and "prebind" extension points of the scheduling framework.

/sig scheduling
ref/ kubernetes/enhancements/issues/624

@bsalamat

This comment has been minimized.

Copy link
Contributor Author

bsalamat commented Oct 25, 2018

@bsalamat bsalamat force-pushed the bsalamat:reserve branch from b35e349 to 90569bc Oct 25, 2018

@yastij

This comment has been minimized.

Copy link
Member

yastij commented Oct 25, 2018

/assign

import (
"sync"

"github.com/pkg/errors"

This comment has been minimized.

@Huang-Wei

Huang-Wei Oct 29, 2018

Member

should be "errors"?

prebindPlugins []plugins.PrebindPlugin
}

var _ = plugins.PluginSet(&DefaultPluginSet{})

This comment has been minimized.

@Huang-Wei

Huang-Wei Oct 29, 2018

Member

IMO a golang idiomatic way is:

Suggested change Beta
var _ = plugins.PluginSet(&DefaultPluginSet{})
var _ plugins.PluginSet = &DefaultPluginSet{}

This comment has been minimized.

@bsalamat

bsalamat Oct 30, 2018

Author Contributor

I have seen both and I don't know which one is actually recommended, but I like your version better. I will change it.

@@ -0,0 +1,73 @@
/*

This comment has been minimized.

@Huang-Wei

Huang-Wei Oct 29, 2018

Member

A typo in filename: should be registrar.go.

@@ -20,6 +20,7 @@ package factory

import (
"fmt"
"k8s.io/kubernetes/pkg/scheduler/plugins"

This comment has been minimized.

@Huang-Wei

Huang-Wei Oct 29, 2018

Member

move this line down to the section importing "k8s.io/..."

@@ -536,6 +542,14 @@ func (sched *Scheduler) scheduleOne() {
return
}

// Run "reserve" plugins.

This comment has been minimized.

@Huang-Wei

Huang-Wei Oct 29, 2018

Member

I'm curious that are we going to replace current "assuming" logic as a ReservePlugin, and provided by default? Or, keep current logic as is.

This comment has been minimized.

@bsalamat

bsalamat Oct 30, 2018

Author Contributor

The assumption logic which updates the scheduler's cache remains as a part of core.

@yastij
Copy link
Member

yastij left a comment

I did a pass on the code structure. I’ll do deeper review tomorrow

// ContextData stored by one plugin can be read, altered, or deleted by another plugin.
// Context does not provide any data protection, as all plugins are assumed to be
// trusted.
type Context struct {

This comment has been minimized.

@yastij

yastij Oct 30, 2018

Member

This might confuse users with Go’s Context.TODO(). We might want to rename this into unstructuredCache

type PluginData struct {
Ctx *Context
SchedulerCache *cache.Cache
// We may want to add the scheduling queue here too.

This comment has been minimized.

@yastij

yastij Oct 30, 2018

Member

This might be a good idea especially for plugins that make decisions based on the state of the schedulingQueue

// PluginSet registers plugins used by the scheduling framework.
// The plugins registered are called at specified points in an scheduling cycle.
type PluginSet interface {
Data() *PluginData

This comment has been minimized.

@yastij

yastij Oct 30, 2018

Member

I think that a PluginSet shouldn’t leak its data. If we want to clean all the data just expose a clear() mthod rather than the whole data structure.

This comment has been minimized.

@bsalamat

bsalamat Nov 6, 2018

Author Contributor

A plugin can be stateless. The framework needs to have access to PluginData and pass it to plugins at invocations.

@@ -480,6 +480,12 @@ func (sched *Scheduler) bind(assumed *v1.Pod, b *v1.Binding) error {

// scheduleOne does the entire scheduling workflow for a single pod. It is serialized on the scheduling algorithm's host fitting.
func (sched *Scheduler) scheduleOne() {
plugins := sched.config.PluginSet
// Remove all plugin context data at the beginning of a scheduling cycle.
if plugins.Data().Ctx != nil {

This comment has been minimized.

@yastij

yastij Oct 30, 2018

Member

Same as the comment for the pluginSet: rely on a method that says if the it is empty or not.

@krmayankk

This comment has been minimized.

Copy link
Contributor

krmayankk commented Nov 2, 2018

/assign @krmayankk


import (
"fmt"
"k8s.io/api/core/v1"

This comment has been minimized.

@wgliang

wgliang Nov 5, 2018

Member

split with one space line.

Plugin
// Reserve is called by the scheduling framework when the scheduler cache is
// updated.
RunReserve(e *PluginData, p *v1.Pod) error

This comment has been minimized.

@mlmhl

mlmhl Nov 6, 2018

Contributor

I'm not sure if we need to pass the suggestedHost to ReservePlugins so that they can know which node should reserve resources for this pod.

BTW, maybe use Reserve as the method name is enough? RunReserve is somewhat redundant.

This comment has been minimized.

@wgliang

wgliang Nov 6, 2018

Member

It might be better to leave Run instead of Reserve. :)
ReservePlugin.Run(...)

This comment has been minimized.

@bsalamat

bsalamat Nov 8, 2018

Author Contributor

@wgliang a single plugin may have multiple functions called at various extension points. "Run()" is not going to work in those situations.

@bsalamat bsalamat force-pushed the bsalamat:reserve branch 2 times, most recently from 2b63c47 to 86efde2 Nov 7, 2018

@wgliang

This comment has been minimized.

Copy link
Member

wgliang commented Nov 10, 2018

/retest

1 similar comment
@yastij

This comment has been minimized.

Copy link
Member

yastij commented Nov 10, 2018

/retest

@@ -45,12 +45,15 @@ import (
"k8s.io/kubernetes/pkg/scheduler/metrics"
"k8s.io/kubernetes/pkg/scheduler/util"

"errors"

This comment has been minimized.

@mlmhl

mlmhl Nov 10, 2018

Contributor

I think move this line to the beginning of the import clause is more consistent with Golang's idiomatic.

@bsalamat bsalamat modified the milestones: v1.13, v1.14 Nov 12, 2018

@bsalamat

This comment has been minimized.

Copy link
Contributor Author

bsalamat commented Nov 13, 2018

I removed this PR from 1.13, despite the fact that I think it is ready, mainly because 1.13 is meant to be a stability release with no major features. Moreover, I don't think there are many users who are waiting for this feature in 1.13. So, we can merge this in 1.14 and add more plugin points as well in that release.

@bsalamat bsalamat force-pushed the bsalamat:reserve branch from e5c021e to 158f89f Nov 13, 2018

@bsalamat bsalamat force-pushed the bsalamat:reserve branch from 158f89f to d6c2f78 Nov 13, 2018

@yastij

This comment has been minimized.

Copy link
Member

yastij commented Nov 13, 2018

I agree
/hold

visibility = ["//visibility:public"],
deps = [
"//pkg/scheduler/internal/cache:go_default_library",
"//pkg/scheduler/plugins/v1alpha1:go_default_library",

This comment has been minimized.

@k82cn

k82cn Nov 24, 2018

Member

How will we plan export this plugins API? should it be part of APIs , e.g. under scheduler/apis dir.

klog.Errorf("error while running %v reserve plugin for pod %v: %v", pl.Name(), assumedPod.Name, err)
sched.recordSchedulingFailure(assumedPod, err, SchedulerError,
fmt.Sprintf("reserve plugin %v failed", pl.Name()))
metrics.PodScheduleErrors.Inc()

This comment has been minimized.

@k82cn

k82cn Nov 24, 2018

Member

What happened if first plugin successfully, but second one failed; the plugin may updates some data in it; similar to other phase. IMO, we may give a callback to plugin on failures.

This comment has been minimized.

@mlmhl

mlmhl Nov 25, 2018

Contributor

From the design doc, we can revert all reserve operations in the Reject extension point if any Reserve plugins failed.

For the Prebind extension point, we indeed don't have a revert mechanism. I think this problem also exists in current implementation: In the bind phase, scheduler invokes bindVolumes and bind in order. If bindVolumes succeeded but bind failed, we do not revert the volume binding operation.

BTW, if all Reserve plugins succeeded but the bind phase failed, need we to revert all reserve operations? In the current implementation, we invoke sched.config.SchedulerCache.ForgetPod to revert assumed pods in the cache. Maybe we need this mechanism for Reserve plugins too.

cc @bsalamat

This comment has been minimized.

@k82cn

k82cn Nov 26, 2018

Member

IMO, to simplify, just give an Event to plugins on failure at any point :) Plugins should handle those Event accordingly and scheduler handle failure on framework scope.

@bsalamat

This comment has been minimized.

Copy link
Contributor Author

bsalamat commented Nov 27, 2018

/retest

@bsalamat bsalamat force-pushed the bsalamat:reserve branch from d6c2f78 to 679d88c Dec 1, 2018

@bsalamat bsalamat added the lgtm label Dec 1, 2018

@fejta-bot

This comment has been minimized.

Copy link

fejta-bot commented Dec 1, 2018

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

2 similar comments
@fejta-bot

This comment has been minimized.

Copy link

fejta-bot commented Dec 1, 2018

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@fejta-bot

This comment has been minimized.

Copy link

fejta-bot commented Dec 1, 2018

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit 2c322a2 into kubernetes:master Dec 1, 2018

18 checks passed

cla/linuxfoundation bsalamat authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gke Skipped
pull-kubernetes-e2e-kops-aws Job succeeded.
Details
pull-kubernetes-e2e-kubeadm-gce Skipped
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped
pull-kubernetes-local-e2e-containerized Skipped
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
tide In merge pool.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment