Automated cherry pick of #120334: scheduler: start scheduling attempt with clean #120557
Conversation
When a plugin was registered as "unschedulable" in a previous scheduling attempt, the pod kept that attribute forever. When that plugin later failed with an error that requires backoff, the pod was incorrectly moved to the "unschedulable" queue, where it got stuck until the periodic flushing because there was no event that the plugin was waiting for. Here's an example where that happened:

```
framework.go:1280: E0831 20:03:47.184243] Reserve/DynamicResources: Plugin failed err="Operation cannot be fulfilled on podschedulingcontexts.resource.k8s.io \"test-dragxd5c\": the object has been modified; please apply your changes to the latest version and try again" node="scheduler-perf-dra-7l2v2" plugin="DynamicResources" pod="test/test-dragxd5c"
schedule_one.go:1001: E0831 20:03:47.184345] Error scheduling pod; retrying err="running Reserve plugin \"DynamicResources\": Operation cannot be fulfilled on podschedulingcontexts.resource.k8s.io \"test-dragxd5c\": the object has been modified; please apply your changes to the latest version and try again" pod="test/test-dragxd5c"
...
scheduling_queue.go:745: I0831 20:03:47.198968] Pod moved to an internal scheduling queue pod="test/test-dragxd5c" event="ScheduleAttemptFailure" queue="Unschedulable" schedulingCycle=9576 hint="QueueSkip"
```

Pop still needs the information about unschedulable plugins to update the UnschedulableReason metric. It can reset that information before returning the PodInfo for the next scheduling attempt.
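The fix described above can be sketched as follows. This is a hypothetical, heavily simplified model (the real scheduler's `framework.QueuedPodInfo` and priority queue carry far more state); it only illustrates the idea of Pop recording the metric from the previous attempt's unschedulable plugins and then clearing that set so the next attempt starts clean:

```go
package main

import "fmt"

// PodInfo is a simplified stand-in for the scheduler's queued pod record
// (hypothetical type; not the real framework.QueuedPodInfo).
type PodInfo struct {
	Name string
	// UnschedulablePlugins records which plugins rejected the pod in the
	// previous scheduling attempt.
	UnschedulablePlugins map[string]struct{}
}

// recordUnschedulableReason stands in for updating the UnschedulableReason
// metric; here it just prints.
func recordUnschedulableReason(plugin string) {
	fmt.Printf("metric: pod unschedulable due to %s\n", plugin)
}

// Pop hands the next pod to a scheduling attempt. It still reads
// UnschedulablePlugins to update the metric, but resets the set before
// returning, so a later backoff-worthy error in the new attempt is not
// misattributed to a stale rejection from an earlier attempt.
func Pop(p *PodInfo) *PodInfo {
	for plugin := range p.UnschedulablePlugins {
		recordUnschedulableReason(plugin)
	}
	// Start the new scheduling attempt with a clean slate.
	p.UnschedulablePlugins = map[string]struct{}{}
	return p
}

func main() {
	pod := &PodInfo{
		Name:                 "test-dragxd5c",
		UnschedulablePlugins: map[string]struct{}{"DynamicResources": {}},
	}
	pod = Pop(pod)
	fmt.Println("plugins after Pop:", len(pod.UnschedulablePlugins))
}
```

With the reset in place, a subsequent error that requires backoff sends the pod to the backoff queue rather than parking it in the "unschedulable" queue with no wake-up event.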
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the appropriate triage label.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/retest
@sanposhiho: can you approve and lgtm?
Oops, why didn't I. /lgtm
LGTM label has been added. Git tree hash: bdb313ef7b33fdf6a0570f869213954d012035d9
/approve
/lgtm |
@Vivekgaddigi: changing LGTM is restricted to collaborators. In response to this:
/assign @alculquicondor |
Oops, I missed this one. /approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: alculquicondor, pohly, sanposhiho, saschagrunert, Vivekgaddigi. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
@pohly: The following test failed, say `/retest` to rerun all failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
/retest
Cherry pick of #120334 on release-1.26.
#120334: scheduler: start scheduling attempt with clean
For details on the cherry pick process, see the cherry pick requests page.