
Add Un-reserve extension point for the scheduling framework #77598

Merged
merged 1 commit into from
May 10, 2019

Conversation

danielqsj
Contributor

@danielqsj danielqsj commented May 8, 2019

What type of PR is this?

/kind feature
/sig scheduling

What this PR does / why we need it:

Add Un-reserve extension point for the scheduling framework

Which issue(s) this PR fixes:

Fixes #77288 #77573

Special notes for your reviewer:

The former PR was #77457, but the TestUnreservePlugin test it introduced was flaky (ref #77573), so #77577 reverted it.

This PR reintroduces the Un-reserve extension point and fixes TestUnreservePlugin.
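For context, the Un-reserve extension point gives Reserve-phase plugins a hook to roll back their bookkeeping when a later phase (such as pre-bind) rejects the pod. The sketch below is a simplified, self-contained illustration of that contract; the interface and plugin names here are hypothetical stand-ins, not the actual types in k8s.io/kubernetes/pkg/scheduler/framework.

```go
package main

import "fmt"

// ReservePlugin is a hypothetical, simplified version of the framework's
// Reserve/Un-reserve pairing: a plugin that reserves state for a pod must
// also be able to undo that reservation.
type ReservePlugin interface {
	Reserve(pod string) error
	// Unreserve rolls back state when a later phase (e.g. pre-bind)
	// fails after Reserve succeeded.
	Unreserve(pod string)
}

// fakeQuotaPlugin tracks which pods it has reserved capacity for.
type fakeQuotaPlugin struct {
	reserved map[string]bool
}

func (p *fakeQuotaPlugin) Reserve(pod string) error {
	p.reserved[pod] = true
	return nil
}

func (p *fakeQuotaPlugin) Unreserve(pod string) {
	delete(p.reserved, pod)
}

// runSchedulingCycle mimics the framework's flow: if pre-bind rejects the
// pod, every reserve plugin's Unreserve hook is invoked to undo its
// bookkeeping.
func runSchedulingCycle(pod string, plugins []ReservePlugin, prebindOK bool) {
	for _, pl := range plugins {
		pl.Reserve(pod)
	}
	if !prebindOK {
		for _, pl := range plugins {
			pl.Unreserve(pod)
		}
	}
}

func main() {
	quota := &fakeQuotaPlugin{reserved: map[string]bool{}}
	// Pre-bind rejects the pod, so the reservation is rolled back.
	runSchedulingCycle("test-pod", []ReservePlugin{quota}, false)
	fmt.Println(len(quota.reserved))
}
```

In the real framework, every failed pre-bind triggers the Un-reserve hooks, which is why the test in this PR can pair unreserve counts with prebind rejections.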

Does this PR introduce a user-facing change?:

Add Un-reserve extension point for the scheduling framework.

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels May 8, 2019
@danielqsj
Contributor Author

/assign @bsalamat @neolit123 @liggitt
/cc @tedyu

@liggitt
Member

liggitt commented May 8, 2019

can you separate out the changes made that address the flake into their own commit (straight un-revert in one commit, then flake fixes in a second)?

Member

@neolit123 neolit123 left a comment


thanks for the update on the t.Errorf messages.

@danielqsj
Contributor Author

@liggitt Yeah, thanks for the suggestion; the split commits are easier to review.

if err = wait.Poll(10*time.Millisecond, 30*time.Second, podSchedulingError(cs, pod.Namespace, pod.Name)); err != nil {
t.Errorf("test #%v: Expected a scheduling error, but didn't get it. error: %v", i, err)
}
if unresPlugin.numUnreserveCalled != pbdPlugin.numPrebindCalled {
Member


Is there any explanation why the unreserve plugin might be called more than once?

Contributor Author


From the logs, I found that sometimes the test pod is scheduled twice before it is cleaned up.

I0509 00:49:27.662644    7237 wrap.go:47] POST /api/v1/namespaces/test-2/pods: (1.030311ms) 201 [scheduler.test/v0.0.0 (linux/amd64) kubernetes/$Format 127.0.0.1:42654]
I0509 00:49:27.662832    7237 scheduling_queue.go:795] About to try and schedule pod test-2/test-pod
I0509 00:49:27.662850    7237 scheduler.go:452] Attempting to schedule pod: test-2/test-pod
I0509 00:49:27.662962    7237 scheduler_binder.go:256] AssumePodVolumes for pod "test-2/test-pod", node "test-node-0"
I0509 00:49:27.662980    7237 scheduler_binder.go:266] AssumePodVolumes for pod "test-2/test-pod", node "test-node-0": all PVCs bound and nothing to do
numPrebindCalled++ -> 1
I0509 00:49:27.663012    7237 framework.go:85] rejected by prebind-plugin at prebind: reject pod test-pod
E0509 00:49:27.663025    7237 factory.go:662] Error scheduling test-2/test-pod: rejected by prebind-plugin at prebind: reject pod test-pod; retrying
I0509 00:49:27.663043    7237 factory.go:720] Updating pod condition for test-2/test-pod to (PodScheduled==False, Reason=Unschedulable)
I0509 00:49:27.668967    7237 wrap.go:47] POST /api/v1/namespaces/test-2/events: (5.354584ms) 201 [scheduler.test/v0.0.0 (linux/amd64) kubernetes/$Format 127.0.0.1:42656]
I0509 00:49:27.669020    7237 wrap.go:47] GET /api/v1/namespaces/test-2/pods/test-pod: (5.473273ms) 200 [scheduler.test/v0.0.0 (linux/amd64) kubernetes/$Format 127.0.0.1:42652]
I0509 00:49:27.669169    7237 wrap.go:47] PUT /api/v1/namespaces/test-2/pods/test-pod/status: (5.935527ms) 200 [scheduler.test/v0.0.0 (linux/amd64) kubernetes/$Format 127.0.0.1:42654]
I0509 00:49:27.669322    7237 scheduling_queue.go:795] About to try and schedule pod test-2/test-pod
I0509 00:49:27.669347    7237 scheduler.go:452] Attempting to schedule pod: test-2/test-pod
numUnreserveCalled++ -> 1
I0509 00:49:27.669550    7237 scheduler_binder.go:256] AssumePodVolumes for pod "test-2/test-pod", node "test-node-0"
I0509 00:49:27.669570    7237 scheduler_binder.go:266] AssumePodVolumes for pod "test-2/test-pod", node "test-node-0": all PVCs bound and nothing to do
numPrebindCalled++ -> 2
I0509 00:49:27.669609    7237 framework.go:85] rejected by prebind-plugin at prebind: reject pod test-pod
E0509 00:49:27.669618    7237 factory.go:662] Error scheduling test-2/test-pod: rejected by prebind-plugin at prebind: reject pod test-pod; retrying
I0509 00:49:27.669638    7237 factory.go:720] Updating pod condition for test-2/test-pod to (PodScheduled==False, Reason=Unschedulable)
numUnreserveCalled++ -> 2
I0509 00:49:27.671425    7237 wrap.go:47] GET /api/v1/namespaces/test-2/pods/test-pod: (1.537755ms) 200 [scheduler.test/v0.0.0 (linux/amd64) kubernetes/$Format 127.0.0.1:42652]
E0509 00:49:27.671603    7237 factory.go:686] pod is already present in unschedulableQ
I0509 00:49:27.671793    7237 wrap.go:47] PATCH /api/v1/namespaces/test-2/events/test-pod.159cf4497ef3b14b: (1.799554ms) 200 [scheduler.test/v0.0.0 (linux/amd64) kubernetes/$Format 127.0.0.1:42656]
I0509 00:49:27.764167    7237 wrap.go:47] GET /api/v1/namespaces/test-2/pods/test-pod: (739.161µs) 200 [scheduler.test/v0.0.0 (linux/amd64) kubernetes/$Format 127.0.0.1:42656]
reset numUnreserveCalled -> 0
reset numPrebindCalled -> 0
I0509 00:49:27.766347    7237 scheduling_queue.go:795] About to try and schedule pod test-2/test-pod
I0509 00:49:27.766376    7237 scheduler.go:448] Skip schedule deleting pod: test-2/test-pod
I0509 00:49:27.771321    7237 wrap.go:47] POST /api/v1/namespaces/test-2/events: (4.727587ms) 201 [scheduler.test/v0.0.0 (linux/amd64) kubernetes/$Format 127.0.0.1:42652]
I0509 00:49:27.771869    7237 wrap.go:47] DELETE /api/v1/namespaces/test-2/pods/test-pod: (7.36413ms) 200 [scheduler.test/v0.0.0 (linux/amd64) kubernetes/$Format 127.0.0.1:42656]
I0509 00:49:27.774353    7237 wrap.go:47] GET /api/v1/namespaces/test-2/pods/test-pod: (1.067354ms) 404 [scheduler.test/v0.0.0 (linux/amd64) kubernetes/$Format 127.0.0.1:42656]
danieldebug: pod is deleted

Contributor Author


@bsalamat This is due to a race between:
1. the scheduling_queue retrying the test pod
2. cleanupPods

What we can do here is check whether the number of prebind failures equals the number of unreserve calls. I think that's safe and by design, right?

Member


Correct. This is expected. Thanks for checking. More accurately, since the pod is rejected at pre-bind, the scheduler will retry scheduling it. The scheduler may retry the pod one or more times before we check the number of times unreserve is called. In order to make the test more robust, please change the condition to:

if unresPlugin.numUnreserveCalled == 0 || unresPlugin.numUnreserveCalled != pbdPlugin.numPrebindCalled
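To illustrate why the extra `numUnreserveCalled == 0` guard matters, here is a small self-contained sketch (the counter variables are hypothetical stand-ins, not the real test code): the scheduler may retry the rejected pod one or more times before the counters are inspected, so asserting an exact attempt count is flaky, while asserting equality plus at-least-one-call is stable across any number of retries.

```go
package main

import "fmt"

func main() {
	// The number of scheduling attempts before the test's check runs is
	// nondeterministic. The invariant is: each failed pre-bind triggers
	// exactly one unreserve call, and at least one attempt has happened.
	for _, attempts := range []int{1, 2, 5} {
		numPrebindCalled := attempts   // pre-bind rejects on every attempt
		numUnreserveCalled := attempts // one unreserve per rejection
		// Brittle check: assumes exactly one attempt occurred.
		brittleFails := numUnreserveCalled != 1
		// Robust check: only fails if the invariant itself is violated.
		robustFails := numUnreserveCalled == 0 ||
			numUnreserveCalled != numPrebindCalled
		fmt.Printf("attempts=%d brittleFails=%v robustFails=%v\n",
			attempts, brittleFails, robustFails)
	}
}
```

The brittle check fires as soon as the scheduler happens to retry more than once, while the robust check passes for any positive number of attempts.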

Contributor Author


Agree. Condition changed. PTAL

@liggitt liggitt removed their assignment May 10, 2019
Member

@bsalamat bsalamat left a comment


Thanks, @danielqsj!
Please squash commits.

/lgtm

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels May 10, 2019
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 10, 2019
@danielqsj
Contributor Author

@bsalamat squashed. PTAL, thanks

Member

@bsalamat bsalamat left a comment


/lgtm
/approve

Thanks, @danielqsj!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 10, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, danielqsj

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 10, 2019
@k8s-ci-robot k8s-ci-robot merged commit b9ccdd2 into kubernetes:master May 10, 2019
5 participants