Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Schedule even if extender is not available when using extender #61445

Merged
merged 2 commits into from Apr 6, 2018

Conversation

@resouer
Copy link
Member

resouer commented Mar 21, 2018

What this PR does / why we need it:
When using scheduler extender, if the extender is not available scheduling of all pods fail.
We should let the scheduling happen but display error message that extender is failing.

IsIgnorable() is added to extender to indicate: if scheduling of all pods should fail when it's unavailable

Backward compabtiility:

We use IsIgnorable instead of IsCritical so that when this flag is not set, the default value will be false, i.e. not ignorable, which consistent with the current behavior in existing extenders.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes: #60616

Special notes for your reviewer:
kindly cc @ravisantoshgudimetla to see if this meets your expectation

TODO: update the examples in kubernetes/examples, but the strategy there is not clear to me for now

Release note:

Schedule even if extender is not available when using extender 

@resouer resouer removed the approved label Mar 21, 2018

@k8s-ci-robot k8s-ci-robot requested review from k82cn and wojtek-t Mar 21, 2018

@resouer resouer force-pushed the resouer:extender-priority branch 2 times, most recently from d3829fb to 345bbb1 Mar 21, 2018

@resouer resouer requested review from ravisantoshgudimetla and removed request for wojtek-t and k82cn Mar 21, 2018

@@ -272,7 +272,13 @@ func (g *genericScheduler) processPreemptionWithExtenders(
var err error
// Replace nodeToVictims with result after preemption from extender.
if nodeToVictims, err = extender.ProcessPreemption(pod, nodeToVictims, g.cachedNodeInfoMap); err != nil {
return nil, err
if extender.IsIgnorable() {
glog.Warningf("Skip: extender: %v is currently unavailable: %v",

This comment has been minimized.

@ravisantoshgudimetla

ravisantoshgudimetla Mar 21, 2018

Contributor

IIUC, this skips even if the extender is available but returns an error. While this is not what I had in my mind when I created issue, this solves the problem.

I think this error could be misleading. We could have gotten some other error as well from extender, while continuing part seems fine, we should let the actual error be shown here. Smthg like

glog.Warningf("Skipping extender %v as it returned error %v and has ignorable flag set", extender, err)

This comment has been minimized.

@resouer

resouer Mar 22, 2018

Author Member

Got it, make sense.

@ravisantoshgudimetla
Copy link
Contributor

ravisantoshgudimetla left a comment

@resouer Thanks for working on this. Looks very good, except for nits. This addresses the issue I created. One thing to note is we can log the warning here itself.

https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/core/extender.go#L388

@resouer

This comment has been minimized.

Copy link
Member Author

resouer commented Mar 22, 2018

@ravisantoshgudimetla Thanks for quick response!

Based on the last comment, do you prefer to skip the error on extender side if it's ignorable?

@ravisantoshgudimetla

This comment has been minimized.

Copy link
Contributor

ravisantoshgudimetla commented Mar 23, 2018

Based on the last comment, do you prefer to skip the error on extender side if it's ignorable?

Yes, if you mean adding the warning at "https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/core/extender.go#L388", I think it would be better because we are capturing error at the source rather than place where we are calling it from.

@@ -200,6 +200,9 @@ type ExtenderConfig struct {
// will skip checking the resource in predicates.
// +optional
ManagedResources []ExtenderManagedResource
// Ignorable specifies if the extender is ignorable, i.e. scheduling should not
// fail when this extender is unavailable.

This comment has been minimized.

@ravisantoshgudimetla

ravisantoshgudimetla Mar 23, 2018

Contributor

Let us the change the comment as well to something like "scheduling should not fail when extender returns an error"

This comment has been minimized.

@resouer

resouer Mar 23, 2018

Author Member

Sure, will fix this soon.

@resouer

This comment has been minimized.

Copy link
Member Author

resouer commented Mar 26, 2018

Yes, if you mean adding the warning at "https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/core/extender.go#L388", I think it would be better because we are capturing error at the source rather than place where we are calling it from.

In that case, if the extender failed with errors other than send() (e.g. verb not defined etc), it will fail the scheduling. Not sure if this meets your expectation?

@ravisantoshgudimetla

This comment has been minimized.

Copy link
Contributor

ravisantoshgudimetla commented Mar 28, 2018

In that case, if the extender failed with errors other than send() (e.g. verb not defined etc), it will fail the scheduling. Not sure if this meets your expectation?

It shouldn't as we would still have continue in the loop but logging while continuing would make sense.

@resouer resouer force-pushed the resouer:extender-priority branch from 345bbb1 to a001ed4 Mar 28, 2018

@resouer

This comment has been minimized.

Copy link
Member Author

resouer commented Mar 28, 2018

@ravisantoshgudimetla Take a look when you got time :)

I did not add warning in send() side as we have to keep warning in generic_scheduler.

@ravisantoshgudimetla
Copy link
Contributor

ravisantoshgudimetla left a comment

Thanks @resouer. Can you please squash the commits. @bsalamat, @k82cn Can one of you please review this?

@resouer resouer force-pushed the resouer:extender-priority branch from a001ed4 to 60b858c Mar 29, 2018

@resouer

This comment has been minimized.

Copy link
Member Author

resouer commented Mar 29, 2018

Thanks! Rebased and squashed.

@@ -192,6 +192,9 @@ type ExtenderConfig struct {
// will skip checking the resource in predicates.
// +optional
ManagedResources []ExtenderManagedResource
// Ignorable specifies if the extender is ignorable, i.e. scheduling should not
// fail when this extender returns an error.

This comment has been minimized.

@bsalamat

bsalamat Mar 29, 2018

Contributor

To make things more clear for the users, please add to the comment:
"when the extender returns an error or is not reachable."

@@ -271,8 +271,15 @@ func (g *genericScheduler) processPreemptionWithExtenders(
if extender.SupportsPreemption() {
var err error
// Replace nodeToVictims with result after preemption from extender.
if nodeToVictims, err = extender.ProcessPreemption(pod, nodeToVictims, g.cachedNodeInfoMap); err != nil {
return nil, err
if nodeToVictims, err = extender.ProcessPreemption(

This comment has been minimized.

@bsalamat

bsalamat Mar 29, 2018

Contributor

When ProcessPreemption returns an error, it returns nil for nodeToVictims. This would cause preemption to not happen. You should not overwrite nodeToVictims when the extender is ignorable. If it returns error, you should bypass the extender and pass the existing nodeToVictims to the next extender.

This comment has been minimized.

@ravisantoshgudimetla

ravisantoshgudimetla Mar 29, 2018

Contributor

Very good catch. I missed the fact, that we are iterating over extenders again.

This comment has been minimized.

@resouer

resouer Mar 29, 2018

Author Member

Thanks for pointing out, will fix it.

This comment has been minimized.

@bsalamat

bsalamat Mar 30, 2018

Contributor

@resouer It would be great to add a test for this, i.e., having two extenders that support preemption, one of them is ignorable and returns error, the second one removes one more node from the 'nodeToVictims' map and then test that preemption logic works as expected.

This comment has been minimized.

@resouer

resouer Mar 30, 2018

Author Member

Of course, test added, thank you for reviewing.

@resouer resouer force-pushed the resouer:extender-priority branch from 60b858c to 21f7f05 Mar 30, 2018

Add Ignorable flag to extender
Ignore extender in generic scheduler

Add test to verify the ignorable flag

Fix warning msg

@resouer resouer force-pushed the resouer:extender-priority branch from 21f7f05 to e047c20 Mar 30, 2018

@k8s-ci-robot k8s-ci-robot added size/L and removed size/M labels Mar 30, 2018

@resouer resouer force-pushed the resouer:extender-priority branch 2 times, most recently from 74fb5ef to 0ae3156 Mar 30, 2018

}
return []*v1.Pod{}, 0, true
if !fits {
return nil, 0, false, err

This comment has been minimized.

@wgliang

wgliang Apr 2, 2018

Member

Here the value of err is nil, perhaps we should give an ‘no fits’ error.

This comment has been minimized.

@resouer

resouer Apr 4, 2018

Author Member

This is expected to be nil, because for the caller side, unfit is different from error. But I can change it to return nil, 0, false, nil for consistency.

return nil, 0, false, err
}
if !fits {
return nil, 0, false, nil

This comment has been minimized.

@wgliang

wgliang Apr 2, 2018

Member

minor comment.

@resouer resouer force-pushed the resouer:extender-priority branch from 0ae3156 to e38364e Apr 4, 2018

@resouer

This comment has been minimized.

Copy link
Member Author

resouer commented Apr 4, 2018

kindly ping @bsalamat for final review.

@bsalamat
Copy link
Contributor

bsalamat left a comment

just one minor comment. otherwise, lgtm.

// Scheduling is expected to not fail in
// Filter/Prioritize phases if the extender is not available and ignorable.
//
// If scheduler did not ignore the extender, the test will fail

This comment has been minimized.

@bsalamat

bsalamat Apr 4, 2018

Contributor

s/will/would/

This comment has been minimized.

@resouer

resouer Apr 4, 2018

Author Member

Thanks, just fixed

@resouer resouer force-pushed the resouer:extender-priority branch from e38364e to 083684d Apr 4, 2018

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Apr 4, 2018

@resouer: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-e2e-gce-100-performance 083684d link /test pull-kubernetes-e2e-gce-100-performance

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@resouer

This comment has been minimized.

Copy link
Member Author

resouer commented Apr 5, 2018

pull-kubernetes-e2e-gce-100-performance should be a flaky test

@bsalamat

This comment has been minimized.

Copy link
Contributor

bsalamat commented Apr 5, 2018

/lgtm

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Apr 5, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, resouer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fejta-bot

This comment has been minimized.

Copy link

fejta-bot commented Apr 5, 2018

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@k8s-github-robot

This comment has been minimized.

Copy link
Contributor

k8s-github-robot commented Apr 6, 2018

Automatic merge from submit-queue (batch tested with PRs 62063, 62169, 62155, 62139, 61445). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 6d8df0c into kubernetes:master Apr 6, 2018

15 of 16 checks passed

pull-kubernetes-e2e-gce-100-performance Job failed.
Details
Submit Queue Queued to run github e2e tests a second time.
Details
cla/linuxfoundation resouer authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gke Skipped
pull-kubernetes-e2e-kops-aws Job succeeded.
Details
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details

@resouer resouer deleted the resouer:extender-priority branch Jun 4, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.