Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Score plugin for the scheduling framework. #79109

Merged
merged 3 commits into from Jul 17, 2019

Conversation

@ahg-g
Copy link
Member

commented Jun 17, 2019

What type of PR is this?
/kind feature

What this PR does / why we need it:
Implementation of the scheduling framework scoring plugin.

Which issue(s) this PR fixes:
Fixes #78002

Does this PR introduce a user-facing change?:

Add Score extension point to the scheduling framework.
@ahg-g

This comment has been minimized.

Copy link
Member Author

commented Jun 17, 2019

/assign @bsalamat

@ahg-g

This comment has been minimized.

Copy link
Member Author

commented Jun 17, 2019

This is just the interface, and an initial implementation of the RunScorePlugins without hooking it to the scheduler just to provide more clarity into how the interface can be supported.

Copy link
Member

left a comment

/priority important-soon

@ahg-g

This comment has been minimized.

Copy link
Member Author

commented Jun 18, 2019

/retest

@ahg-g

This comment has been minimized.

Copy link
Member Author

commented Jun 18, 2019

/test pull-kubernetes-integration

1 similar comment
@ahg-g

This comment has been minimized.

Copy link
Member Author

commented Jun 21, 2019

/test pull-kubernetes-integration

// Score is called on each filtered node. It must return success and an integer
// indicating the rank of the node. All scoring plugins must return success or
// the pod will be rejected.
Score(pc *PluginContext, p *v1.Pod, nodeName string, meta interface{}) (*Status, int)

This comment has been minimized.

Copy link
@bsalamat

bsalamat Jun 21, 2019

Member

We should not have meta in the args. metadata can be stored in PluginContext and used by the plugins that may need it.

This comment has been minimized.

Copy link
@ahg-g

ahg-g Jun 26, 2019

Author Member

Done.

internalcache "k8s.io/kubernetes/pkg/scheduler/internal/cache"
)

// Code is the Status code/type which is returned from plugins.
type Code int

// PluginToHostPriorityMap declares a map from plugin name to its HostPriorityList.
type PluginToHostPriorityMap map[string]schedulerapi.HostPriorityList

This comment has been minimized.

Copy link
@bsalamat

bsalamat Jun 21, 2019

Member

I know that HostPriorityList already exists and we may want to use that, but that's not a great name. Let's create a similar struct in the framework API called NodeScoreList and rename PluginToHostPriorityMap to PluginToNodeScoreMap.

This comment has been minimized.

Copy link
@ahg-g

ahg-g Jun 26, 2019

Author Member

Done.

pluginToHostPriorityMap := make(PluginToHostPriorityMap, len(f.scorePlugins))
for _, pl := range f.scorePlugins {
plScores := make(schedulerapi.HostPriorityList, len(nodes))
// TODO: run the following loop in parallel.

This comment has been minimized.

Copy link
@bsalamat

bsalamat Jun 21, 2019

Member

yes, this is important

This comment has been minimized.

Copy link
@ahg-g

ahg-g Jun 26, 2019

Author Member

Done.

@bsalamat

This comment has been minimized.

Copy link
Member

commented Jun 21, 2019

Please ensure to put a release note in the first post of the PR.

@ahg-g ahg-g force-pushed the ahg-g:scoring branch from 8b1050f to c259426 Jun 26, 2019
@k8s-ci-robot k8s-ci-robot added size/L and removed size/M labels Jun 26, 2019
@ahg-g ahg-g changed the title Interface for the scheduling framework Score plugin. Score plugin for the scheduling framework. Jun 26, 2019
@draveness

This comment has been minimized.

Copy link
Member

commented Jul 1, 2019

The unreserve plugin flakes again, I'll open a PR to refactor the test with a separated scheduler by ginkgo tomorrow.

/retest

@ahg-g

This comment has been minimized.

Copy link
Member Author

commented Jul 1, 2019

The unreserve plugin flakes again, I'll open a PR to refactor the test with a separated scheduler by ginkgo tomorrow.

/retest

The reason for this flake isn't related to state reset, and so the planned refactoring wouldn't solve this failure. I explain the reason for the flake here: #79611

pkg/scheduler/framework/v1alpha1/framework.go Outdated Show resolved Hide resolved
pkg/scheduler/framework/v1alpha1/interface.go Outdated Show resolved Hide resolved
pkg/scheduler/core/generic_scheduler.go Show resolved Hide resolved
pkg/scheduler/framework/v1alpha1/framework.go Outdated Show resolved Hide resolved
pkg/scheduler/framework/v1alpha1/interface.go Outdated Show resolved Hide resolved
pkg/scheduler/util/utils.go Outdated Show resolved Hide resolved
@draveness

This comment has been minimized.

Copy link
Member

commented Jul 10, 2019

We should put a release note here

Add Score extension point to the scheduling framework.
@ahg-g

This comment has been minimized.

Copy link
Member Author

commented Jul 10, 2019

We should put a release note here

Add Score extension point to the scheduling framework.

Thanks, done.

@ahg-g ahg-g force-pushed the ahg-g:scoring branch from c227fda to 9844ce3 Jul 11, 2019
pkg/scheduler/util/utils.go Outdated Show resolved Hide resolved
// NonBlockingSendError sends val to the provided channel without blocking on the receiver.
// Note that this function should only be used for errors since type casting the interface
// adds overhead that is best avoided in performance critical paths.
func NonBlockingSendError(ch chan<- interface{}, val interface{}) {

This comment has been minimized.

Copy link
@bsalamat

bsalamat Jul 12, 2019

Member

I am fine with these functions, but please do not add them to this file. @misterikkit tries hard to remove many things from this file. We should try to shrink and eventually remove it. You can move these to a new file of its own.

I think you should also create a struct that has a channel member and these functions are its methods. A "New" method can be used to create the struct and these functions do not need to receive the channel every time they are called. Something like:

type ErrorChannel struct {
  ch chan<- interface{}
}

func (e *ErrorChannel) SendError(val interface{}) {
...
}

func (e *ErrorChannel) SendErrorWithCancel(val interface{}, cancel context.CancelFunc) {
...
}

func (e *ErrorChannel) ReceiveError() ...

func NewErrorChannel() *ErrorChannel {
...
}

This comment has been minimized.

Copy link
@ahg-g

ahg-g Jul 12, 2019

Author Member

ok, changed it back to simple lock, and will use the new interface once merged and when we do the NormalizeScore part.

pkg/scheduler/framework/v1alpha1/framework.go Outdated Show resolved Hide resolved
@ahg-g

This comment has been minimized.

Copy link
Member Author

commented Jul 12, 2019

/test pull-kubernetes-bazel-build

@ahg-g ahg-g force-pushed the ahg-g:scoring branch from e9e7aef to c54c4d1 Jul 16, 2019
ahg-g added 2 commits Jul 16, 2019
Copy link
Member

left a comment

/lgtm
/approve

Thanks, @ahg-g!

@k8s-ci-robot k8s-ci-robot added the lgtm label Jul 16, 2019
@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Jul 16, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, bsalamat

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit a3898dc into kubernetes:master Jul 17, 2019
23 checks passed
23 checks passed
cla/linuxfoundation ahg-g authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-conformance-image-test Skipped.
pull-kubernetes-cross Skipped.
pull-kubernetes-dependencies Job succeeded.
Details
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-csi-serial Skipped.
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gce-iscsi Skipped.
pull-kubernetes-e2e-gce-iscsi-serial Skipped.
pull-kubernetes-e2e-gce-storage-slow Skipped.
pull-kubernetes-godeps Skipped.
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped.
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-node-e2e-containerd Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
pull-publishing-bot-validate Skipped.
tide In merge pool.
Details
@tedyu

This comment has been minimized.

Copy link
Contributor

commented Jul 17, 2019

@ahg-g

This comment has been minimized.

Copy link
Member Author

commented Jul 17, 2019

https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/79465/pull-kubernetes-integration/1151429469014069250/
TestScorePlugin was among the failed tests

Thanks, I filed issue #80255, I think I know what the problem is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.