
cleanup useless null pointer check about nodeInfo.Node() from snapshot for in-tree plugins #117834

Conversation

@NoicFank (Contributor) commented May 6, 2023:

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Since nodeInfo.Node() is guaranteed to be non-nil for all nodes in the snapshot, I think it's safe to remove the nil-pointer checks on nodeInfo.Node() in the in-tree plugins.

Besides, some in-tree plugins currently use nodeInfo.Node() directly without a nil check while others check it, so this unifies the plugins to not check nodeInfo.Node() for nil.
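A minimal sketch of the pattern being removed. The Node, NodeInfo, and filter functions below are hypothetical stand-ins for illustration, not the actual kubernetes scheduler code:

```go
package main

import "fmt"

// Minimal stand-ins for the scheduler framework's types (illustrative only).
type Node struct{ Name string }

type NodeInfo struct{ node *Node }

func (ni *NodeInfo) Node() *Node { return ni.node }

// Before this PR, some plugins defensively checked for a nil node:
func filterWithCheck(ni *NodeInfo) error {
	if ni.Node() == nil {
		return fmt.Errorf("node not found")
	}
	return nil
}

// After: since the snapshot only hands out NodeInfo objects whose Node()
// is non-nil, plugins can use the node directly.
func filterWithoutCheck(ni *NodeInfo) error {
	node := ni.Node()
	_ = node.Name // safe under the snapshot's non-nil guarantee
	return nil
}

func main() {
	ni := &NodeInfo{node: &Node{Name: "node-1"}}
	fmt.Println(filterWithCheck(ni) == nil, filterWithoutCheck(ni) == nil) // prints "true true"
}
```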

Which issue(s) this PR fixes:

Fixes # NONE

Special notes for your reviewer:

NONE

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels May 6, 2023
@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 6, 2023
@NoicFank (Contributor, Author) commented May 6, 2023:

/retest

@NoicFank (Contributor, Author) commented May 6, 2023:

/assign @Huang-Wei PTAL, thanks.

@NoicFank NoicFank force-pushed the cleanup-scheduler-node-must-not-nil-in-snapshot branch from 0457124 to 49169da Compare May 14, 2023 12:34
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 14, 2023
@NoicFank NoicFank force-pushed the cleanup-scheduler-node-must-not-nil-in-snapshot branch 4 times, most recently from 8152163 to 2b1a2d7 Compare May 14, 2023 14:04
@alculquicondor (Member) left a comment:

test idea lgtm

(10 review threads on pkg/scheduler/schedule_one_test.go, all resolved)
@NoicFank NoicFank force-pushed the cleanup-scheduler-node-must-not-nil-in-snapshot branch 3 times, most recently from d696ca4 to b0ba0b0 Compare May 18, 2023 08:33
@NoicFank (Contributor, Author) commented:
@Huang-Wei @alculquicondor Thanks for the suggestions. The test code has been adjusted; PTAL, thanks a lot.

@NoicFank NoicFank force-pushed the cleanup-scheduler-node-must-not-nil-in-snapshot branch 2 times, most recently from 6408253 to 6f5dd62 Compare May 18, 2023 09:30
@alculquicondor (Member) commented:
leaving the review of the test to @Huang-Wei

(4 review threads on pkg/scheduler/schedule_one_test.go)
createPodIndex++
}
}
go wait.Until(createPodsOneRound, 14*time.Millisecond, ctx.Done())
A reviewer (Member) commented:

Is it intentional to have 14 ms here and 15 ms in the deleteNodes goroutine?

@NoicFank (Contributor, Author) replied:

Yes, I originally thought this setting would increase the probability of collisions between the deleteNodes goroutine (which deletes nodes) and createPods (which triggers the scheduler to take a snapshot), which would help surface the nil-node panic.

But it doesn't seem to have much impact now, so I unified them.

Comment on lines 595 to 617
// capture the events to wait for all pods to be scheduled at least once
allWaitSchedulingPods := sets.NewString()
for i := 0; i < waitSchedulingPodNumber; i++ {
	allWaitSchedulingPods.Insert(fmt.Sprintf("pod%d", i))
}
var wg sync.WaitGroup
wg.Add(waitSchedulingPodNumber)
stopFn, err := broadcaster.StartEventWatcher(func(obj runtime.Object) {
	e, ok := obj.(*eventsv1.Event)
	if !ok || (e.Reason != "Scheduled" && e.Reason != "FailedScheduling") {
		return
	}
	if allWaitSchedulingPods.Has(e.Regarding.Name) {
		wg.Done()
		allWaitSchedulingPods.Delete(e.Regarding.Name)
	}
})
if err != nil {
	t.Fatal(err)
}
defer stopFn()

wg.Wait()
A reviewer (Member) commented:

This is logically accurate, but it seems to make this UT a bit slow:

$ go test ./pkg/scheduler -run TestSchedulerGuaranteeNonNilNodeInSchedulingCycle
ok  	k8s.io/kubernetes/pkg/scheduler	1.771s

I don't think a single UT costing ~2s is a good idea. Do you think we can achieve the same goal by simply plumbing a counter into the fake plugin (make it atomic and increment it on each call) and checking completion against the counter?
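The counter-based approach suggested here might look roughly like the following sketch. The fakePlugin type and target constant are assumed names for illustration, not the actual test code:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// fakePlugin (hypothetical) bumps an atomic counter on every Filter call,
// so the test can wait on the counter instead of watching events.
type fakePlugin struct {
	filterCalls atomic.Int64
}

func (p *fakePlugin) Filter() {
	p.filterCalls.Add(1)
}

func main() {
	p := &fakePlugin{}
	const target = 200 // assumed number of scheduling attempts to wait for

	var wg sync.WaitGroup
	for i := 0; i < target; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			p.Filter() // each concurrent scheduling attempt increments the counter
		}()
	}
	wg.Wait()

	fmt.Println(p.filterCalls.Load() == target) // prints "true"
}
```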

@NoicFank (Contributor, Author) replied:

Yes, the execution time of this UT should be optimized. I found that most of the time was spent waiting for createPodsOneRound to be called, as estimated by the following:

(waitSchedulingPodNumber / createPodNumberPerRound) * waitPeriod = (2000 / 25) * 14ms = 1.12s

Then I updated those parameters as follows:

  • waitSchedulingPodNumber from 2000 to 200
  • createPodNumberPerRound from 25 to 50
  • waitPeriod from 14ms to 10ms

(waitSchedulingPodNumber / createPodNumberPerRound) * waitPeriod = (200 / 50) * 10ms = 0.04s

The overall UT run time on my Mac is 0.19s:

=== RUN   TestSchedulerGuaranteeNonNilNodeInSchedulingCycle
--- PASS: TestSchedulerGuaranteeNonNilNodeInSchedulingCycle (0.19s)
PASS

Given that, I'm inclined to keep the fake plugin as it is now.

@NoicFank NoicFank force-pushed the cleanup-scheduler-node-must-not-nil-in-snapshot branch from 6f5dd62 to 17661a6 Compare May 20, 2023 14:50
@NoicFank NoicFank force-pushed the cleanup-scheduler-node-must-not-nil-in-snapshot branch from 17661a6 to ed26fcf Compare May 20, 2023 14:53
@NoicFank NoicFank requested a review from Huang-Wei May 20, 2023 15:30
@Huang-Wei (Member) left a comment:

/triage accepted
/lgtm
/approve

Thanks @NoicFank !

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 20, 2023
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 20, 2023
@k8s-ci-robot:

LGTM label has been added.

Git tree hash: 89b8c09b42a6776e0ae4bfef837cebc0ccff17f5

@k8s-ci-robot:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Huang-Wei, NoicFank

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 20, 2023
@k8s-ci-robot k8s-ci-robot merged commit c7c41d2 into kubernetes:master May 20, 2023
12 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.28 milestone May 20, 2023