Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor schedulingCycle and bindingCycle in scheduler #112025

Merged

Conversation

kerthcet
Copy link
Member

What type of PR is this?

/kind cleanup
/sig scheduling

What this PR does / why we need it:

Which issue(s) this PR fixes:

Part of #103853

Special notes for your reviewer:

Does this PR introduce a user-facing change?

None

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 25, 2022
@k8s-ci-robot
Copy link
Contributor

@kerthcet: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Aug 25, 2022
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 31, 2022
@kerthcet
Copy link
Member Author

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Sep 14, 2022
@kerthcet kerthcet force-pushed the refactor/handle-scheduling-failure branch from 6f4af4d to 1e3d18d Compare September 15, 2022 03:18
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 15, 2022
@kerthcet kerthcet force-pushed the refactor/handle-scheduling-failure branch from 1e3d18d to c7cda01 Compare September 15, 2022 03:34
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 15, 2022
@kerthcet kerthcet marked this pull request as ready for review September 15, 2022 03:39
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 15, 2022
@kerthcet
Copy link
Member Author

/retest

pkg/scheduler/schedule_one.go Show resolved Hide resolved
pkg/scheduler/schedule_one.go Outdated Show resolved Hide resolved
pkg/scheduler/schedule_one.go Show resolved Hide resolved
@kerthcet
Copy link
Member Author

Flaky test #112465

@kerthcet
Copy link
Member Author

This is ready to review. cc @ahg-g @alculquicondor @Huang-Wei

@kerthcet
Copy link
Member Author

The flaky test is fixed.
/retest

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 18, 2022
@kerthcet kerthcet force-pushed the refactor/handle-scheduling-failure branch from f3c2ee4 to d52d36e Compare October 18, 2022 06:52
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 18, 2022
@kerthcet
Copy link
Member Author

Sorry about the squashed commit for solving code conflicts.

@kerthcet kerthcet force-pushed the refactor/handle-scheduling-failure branch from d52d36e to b4b459d Compare October 18, 2022 07:30
@k8s-ci-robot k8s-ci-robot added area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Oct 18, 2022
@kerthcet kerthcet force-pushed the refactor/handle-scheduling-failure branch from b4b459d to 78cfdc3 Compare October 18, 2022 08:45
pkg/scheduler/schedule_one.go Outdated Show resolved Hide resolved
}
sched.FailureHandler(ctx, fwk, assumedPodInfo, sts.AsError(), v1.PodReasonSchedulerError, clearNominatedNode)
return ScheduleResult{}, nil
return ScheduleResult{nominatingInfo: clearNominatedNode, reason: v1.PodReasonSchedulerError, unreserve: true},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is only 2 instances where we call Unreserve and Forget during scheduling cycle. Here and after Permit.

It sounds pretty clean to me to just keep the calls here.

bindingCycle handles it differently, but that's not relevant here. In other words, I think we can keep handleBindingCycleError

Copy link
Member

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/hold
for nit

Comment on lines 127 to 128
podsToActivate *framework.PodsToActivate) (
ScheduleResult, *framework.QueuedPodInfo, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
podsToActivate *framework.PodsToActivate) (
ScheduleResult, *framework.QueuedPodInfo, error) {
podsToActivate *framework.PodsToActivate,
) (ScheduleResult, *framework.QueuedPodInfo, error) {

maybe this is a bit more readable? But up to you

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated and squashed again.

reason = v1.PodReasonUnschedulable
}

sched.FailureHandler(ctx, fwk, podInfo, status.AsError(), reason, clearNominatedNode, start)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Food for thought:

kind of weird that we have to pass an error and a reason.

Perhaps we should be passing a status (although I know it's a bit more tricky for scheduling cycle), or the error should contain information about whether it's unschedulable or not (by implementing Is)

But this can be left for follow up, if you have extra time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's something we can optimize later. I'll follow this when I have time.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 20, 2022
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 20, 2022
@alculquicondor
Copy link
Member

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, kerthcet

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 20, 2022
Signed-off-by: kerthcet <kerthcet@gmail.com>
@kerthcet kerthcet force-pushed the refactor/handle-scheduling-failure branch from a076be2 to f7f8578 Compare October 21, 2022 05:53
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 21, 2022
@kerthcet
Copy link
Member Author

/retest

@alculquicondor
Copy link
Member

/hold cancel
/lgtm

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Oct 21, 2022
@k8s-ci-robot k8s-ci-robot merged commit 18b8151 into kubernetes:master Oct 21, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.26 milestone Oct 21, 2022
@kerthcet kerthcet deleted the refactor/handle-scheduling-failure branch October 21, 2022 15:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants