Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sched: integration test to cover event registration #105337

Merged
merged 1 commit into from Oct 7, 2021

Conversation

Huang-Wei
Copy link
Member

What type of PR is this?

/kind cleanup
/sig scheduling

What this PR does / why we need it:

Add an integration test to cover event registration for core resources.

Which issue(s) this PR fixes:

Fixes #105303.

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 29, 2021
@k8s-ci-robot
Copy link
Contributor

@Huang-Wei: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Sep 29, 2021
@Huang-Wei
Copy link
Member Author

/hold
for #104782 to be merged first.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 29, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Huang-Wei

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Sep 29, 2021
test/integration/scheduler/queue_test.go Outdated Show resolved Hide resolved
test/integration/scheduler/queue_test.go Outdated Show resolved Hide resolved

// Create two Pods that are both unschedulable.
// - Pod1 is a best-efforts Pod, but doesn't have the required toleration.
// - Pod2 has the required toleration, but requests a large amount of CPU resource that the node cannot fit.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Pod2 has no tolerations, but not affect the finally test results.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I should have updated the comment. Done right now.

My original thought was to add a toleration for pod2, but it turned out with no extra code coverage. So I intentionally leave it without toleration, but with excessive pod req - that will cover the usage of preCheckForNode().

test/integration/scheduler/queue_test.go Outdated Show resolved Hide resolved
// - Pod2 requests a large amount of CPU resource that the node cannot fit.
// Note: Pod2 will fail the tainttoleration plugin b/c that's ordered prior to noderesources.
pod1 := st.MakePod().Namespace(ns).Name("pod1").Container("image").Obj()
pod2 := st.MakePod().Namespace(ns).Name("pod2").Req(map[v1.ResourceName]string{v1.ResourceCPU: "4"}).Obj()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrap the pod2 with Toleration to make sure pod2 will be failed with noderesource instead of thetaint.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's intentional. Also explained in #105337 (comment).

If pod2 goes with a mapped toleration, it'd fail due to NodeResource failure, and hence won't be triggered at all. However, in the case above, pod2 also failed due to TaintToleration, and hence looked like it should have been triggered. But why it's not? It's due to the preCheckNode() logic, which serves as the last gate to move a pod or not. So in this case, we used pod2 to cover the preCheckNode() logic.

BTW: I will come up with a pod3 case that failed VolumeBinding, but due to a tiny bug, I will raise a bug fix along with the test case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation, any idea on why the extra check in the preCheckForNode is needed instead of just moving a pod and leaving the filters to make the judgement? those logic seems like was migrated from kubelet and only cover some basic filtering.

}
// Schedule the Pod manually.
_, fitError := testCtx.Scheduler.Algorithm.Schedule(ctx, nil, fwk, framework.NewCycleState(), podInfo.Pod)
if fitError == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert the err is fitError instead of assuming it's fitError.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's redundant IMO.

@@ -548,3 +548,18 @@ func nextPodOrDie(t *testing.T, testCtx *testutils.TestContext) *framework.Queue
}
return podInfo
}

// nextPod returns the next Pod in the scheduler queue, with a 15 seconds timeout.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments here, might be just say the err is an acceptable result as the Queue might be empty to differentiate with
the func of nextPodOrDie.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't return an error here actually, just a popped pod or nil. Mentioning err would be confusing.

@Huang-Wei
Copy link
Member Author

/cc @ahg-g

Comment on lines 62 to 63
// It's intended to not start the scheduler's queue, and hence to
// not start any flushing logic. We will pop and schedule the Pods manually later.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this comment really meant for the CleanupTest call below?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope :) it's just usually we start the scheduler right after SyncInformerFactory().

I will move the comments above.

test/integration/scheduler/queue_test.go Outdated Show resolved Hide resolved
test/integration/scheduler/queue_test.go Outdated Show resolved Hide resolved
var podInfo *framework.QueuedPodInfo
// NextPod() is a blocking operation. Wrap it in timeout() to avoid relying on
// default go testing timeout (10m) to abort.
if err := timeout(testCtx.Ctx, time.Second*15, func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so basically the test will run for at least 15seconds, why not just check the active queue length instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is only a PendingPods() function so far which cannot tell whether it's from activeQ or unschedulableQ/backoffQ. Also, I want to wait for some time instead of immediate check to detect undesirable pods' moving.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can add a function to return the length of the active queue, but if you want to wait, 15seconds feels a lot in this case, I would reduce it to 5

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to reduce it to 5 secs.

@ahg-g
Copy link
Member

ahg-g commented Oct 7, 2021

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 7, 2021
@Huang-Wei
Copy link
Member Author

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 7, 2021
@k8s-ci-robot k8s-ci-robot merged commit 79ee735 into kubernetes:master Oct 7, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Oct 7, 2021
@Huang-Wei Huang-Wei deleted the pr-105303 branch November 9, 2021 20:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sched: supplement an integration test to cover event registration
5 participants