sched: integration test to cover event registration #105337
Conversation
@Huang-Wei: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/hold
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: Huang-Wei. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.
// Create two Pods that are both unschedulable.
// - Pod1 is a best-efforts Pod, but doesn't have the required toleration.
// - Pod2 has the required toleration, but requests a large amount of CPU resource that the node cannot fit.
I think Pod2 actually has no tolerations, but that doesn't affect the final test result.
Yes, I should have updated the comment. Done just now.
My original thought was to add a toleration for pod2, but it turned out to add no extra code coverage. So I intentionally left it without a toleration, but with an excessive pod request - that covers the usage of preCheckForNode().
// - Pod2 requests a large amount of CPU resource that the node cannot fit.
// Note: Pod2 will fail the tainttoleration plugin b/c that's ordered prior to noderesources.
pod1 := st.MakePod().Namespace(ns).Name("pod1").Container("image").Obj()
pod2 := st.MakePod().Namespace(ns).Name("pod2").Req(map[v1.ResourceName]string{v1.ResourceCPU: "4"}).Obj()
Wrap the pod2 with a Toleration to make sure pod2 will fail with noderesources instead of the taint.
It's intentional. Also explained in #105337 (comment).
If pod2 goes with a matching toleration, it'd fail due to a NodeResources failure, and hence won't be triggered at all. However, in the case above, pod2 also failed due to TaintToleration, and hence looked like it should have been triggered. So why isn't it? It's due to the preCheckForNode() logic, which serves as the last gate deciding whether or not to move a pod. So in this case, we used pod2 to cover the preCheckForNode() logic (see the rough sketch below).
BTW: I will come up with a pod3 case that fails on VolumeBinding, but it hits a tiny bug, so I will raise a bug fix along with that test case.
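For context, here's a rough, purely illustrative sketch of the kind of cheap feasibility gate being described. The real preCheckForNode lives in pkg/scheduler/eventhandlers.go and checks more than this; only the CPU-fit idea relevant to pod2 is sketched, using standard Kubernetes API types.

```go
// Illustrative only: a stand-in for the resource part of the pre-check.
// The idea: before moving an unschedulable Pod back to the active queue on a
// Node event, do a cheap feasibility check so Pods that obviously still cannot
// fit (like pod2's 4-CPU request) are not re-queued.
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// fitsCPU sums the CPU requests of the Pod's containers and compares them
// against the node's allocatable CPU.
func fitsCPU(pod *v1.Pod, allocatableCPU resource.Quantity) bool {
	requested := resource.Quantity{}
	for _, c := range pod.Spec.Containers {
		if cpu, ok := c.Resources.Requests[v1.ResourceCPU]; ok {
			requested.Add(cpu)
		}
	}
	return requested.Cmp(allocatableCPU) <= 0
}

func main() {
	// pod2 from the test requests 4 CPUs.
	pod2 := &v1.Pod{Spec: v1.PodSpec{Containers: []v1.Container{{
		Resources: v1.ResourceRequirements{Requests: v1.ResourceList{
			v1.ResourceCPU: resource.MustParse("4"),
		}},
	}}}}
	// On a node with only 2 allocatable CPUs the pre-check fails, so the Node
	// event does not move pod2 back to the active queue.
	fmt.Println(fitsCPU(pod2, resource.MustParse("2"))) // false
}
```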
Thanks for the explanation. Any idea why the extra check in preCheckForNode is needed, instead of just moving the pod and leaving it to the filters to make the judgement? That logic seems like it was migrated from the kubelet and only covers some basic filtering.
}
// Schedule the Pod manually.
_, fitError := testCtx.Scheduler.Algorithm.Schedule(ctx, nil, fwk, framework.NewCycleState(), podInfo.Pod)
if fitError == nil {
Assert the err is a fitError instead of assuming it's a fitError.
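Roughly, that assertion could look like the sketch below. It reuses the names from the snippet above (testCtx, fwk, podInfo) and assumes the failure surfaces as a *framework.FitError; this is the reviewer's suggestion illustrated, not code from the PR.

```go
// Sketch of the suggested check: verify the concrete error type rather than
// only checking that some error occurred. Assumes the scheduling failure is
// reported as a *framework.FitError.
_, err := testCtx.Scheduler.Algorithm.Schedule(ctx, nil, fwk, framework.NewCycleState(), podInfo.Pod)
fitError, ok := err.(*framework.FitError)
if !ok || fitError == nil {
	t.Fatalf("expected a fit error when scheduling %v, got: %v", podInfo.Pod.Name, err)
}
```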
that's redundant IMO.
test/integration/scheduler/util.go
Outdated
@@ -548,3 +548,18 @@ func nextPodOrDie(t *testing.T, testCtx *testutils.TestContext) *framework.QueuedPodInfo {
	}
	return podInfo
}

// nextPod returns the next Pod in the scheduler queue, with a 15 seconds timeout.
Some comments here might just say that the err is an acceptable result, as the Queue might be empty, to differentiate it from the func nextPodOrDie.
We don't return an error here actually, just a popped pod or nil. Mentioning err would be confusing.
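For readers following along, here is a rough sketch of the difference (not the exact code from the PR): a timeout here is an acceptable outcome and simply yields nil, whereas nextPodOrDie fails the test.

```go
// Rough sketch of nextPod: pop the next Pod from the scheduling queue, but
// treat a timeout as acceptable and return nil instead of failing the test,
// so callers can assert that nothing was moved back to the active queue.
func nextPod(t *testing.T, testCtx *testutils.TestContext) *framework.QueuedPodInfo {
	t.Helper()
	var podInfo *framework.QueuedPodInfo
	// NextPod() blocks, so bound it with the timeout helper (see further down).
	if err := timeout(testCtx.Ctx, time.Second*5, func() {
		podInfo = testCtx.Scheduler.NextPod()
	}); err != nil {
		// Timed out: the queue stayed empty, which is a valid result here.
		return nil
	}
	return podInfo
}
```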
/cc @ahg-g
// It's intended to not start the scheduler's queue, and hence to
// not start any flushing logic. We will pop and schedule the Pods manually later.
is this comment really meant for the CleanupTest call below?
Nope :) It's just that usually we start the scheduler right after SyncInformerFactory().
I will move the comment above.
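For context, the ordering under discussion is roughly the following; the helper names (SyncInformerFactory, CleanupTest) are the ones used elsewhere in these integration tests, but treat the exact calls as an assumption rather than the PR's literal code.

```go
// Illustrative ordering only. Informers are synced as usual, but the scheduler
// is deliberately NOT started (no `go testCtx.Scheduler.Run(testCtx.Ctx)`), so
// neither the scheduling loop nor the queue's flushing goroutines run, and
// Pods stay in the queue until the test pops them via NextPod().
testutils.SyncInformerFactory(testCtx)
defer testutils.CleanupTest(t, testCtx)
```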
test/integration/scheduler/util.go
Outdated
var podInfo *framework.QueuedPodInfo
// NextPod() is a blocking operation. Wrap it in timeout() to avoid relying on
// default go testing timeout (10m) to abort.
if err := timeout(testCtx.Ctx, time.Second*15, func() {
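As an aside, the general shape of such a timeout wrapper is roughly the following; this is a sketch of the pattern, not necessarily the exact helper added in util.go.

```go
// Bound a blocking call: run f in a goroutine and give up once the duration
// elapses or the parent context is cancelled.
func timeout(ctx context.Context, d time.Duration, f func()) error {
	ctx, cancel := context.WithTimeout(ctx, d)
	defer cancel()

	done := make(chan struct{})
	go func() {
		f()
		close(done)
	}()

	select {
	case <-done:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}
```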
So basically the test will run for at least 15 seconds; why not just check the active queue length instead?
There is only a PendingPods() function so far, which cannot tell whether a pod came from activeQ or unschedulableQ/backoffQ. Also, I want to wait for some time instead of doing an immediate check, to detect undesirable pod movements.
We can add a function to return the length of the active queue, but if you want to wait, 15 seconds feels like a lot in this case; I would reduce it to 5.
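Purely as a hypothetical sketch of what such an accessor could look like (the method name ActiveQueueLen and the internal field names are made up for illustration; nothing like this exists in the queue as of this PR):

```go
// Hypothetical: expose the number of Pods currently in the active queue so a
// test could assert on it directly instead of waiting for a pop to time out.
func (p *PriorityQueue) ActiveQueueLen() int {
	p.lock.RLock()
	defer p.lock.RUnlock()
	return p.activeQ.Len()
}
```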
I'd prefer to reduce it to 5 secs.
/lgtm
/unhold
What type of PR is this?
/kind cleanup
/sig scheduling
What this PR does / why we need it:
Add an integration test to cover event registration for core resources.
Which issue(s) this PR fixes:
Fixes #105303.
Special notes for your reviewer:
Does this PR introduce a user-facing change?