Fix panic when process RunScorePlugins for cap out of range #121632

kerthcet · 2023-10-31T08:11:00Z

What type of PR is this?

/kind bug
/sig scheduling
/kind regression

What this PR does / why we need it:

The root cause is that the PreScore and Score plugins can be customized independently in the KubeSchedulerConfiguration. A profile may therefore have 0 Score plugins but x PreScore plugins. As soon as one PreScore plugin returns Skip, the capacity computed below is negative, and the scheduler panics with `panic: runtime error: makeslice: cap out of range`:

`numPlugins := len(f.scorePlugins) - state.SkipScorePlugins.Len()`

The precise value of `numPlugins` is the number of plugins in `f.scorePlugins` that are not in `SkipScorePlugins`. Computing it would require traversing both slices to compare them. Since the number of plugins won't be large, I use `len(f.scorePlugins)` as the capacity instead; at worst this over-allocates by the number of skipped plugins.
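Here is a minimal Go sketch of the failure and the fix, built on a simplified model of the framework. `scorePlugin` and `skipSet` are hypothetical stand-ins for the real ScorePlugin interface and the `state.SkipScorePlugins` set. `capBeforeFix`, `activePlugins`, and `makeWithCap` are illustrative helpers, not kube-scheduler functions. The sketch shows that the old capacity formula goes negative when more plugins are skipped than Score plugins are enabled. It also shows that using `len(plugins)` as the capacity cannot go negative.

```go
package main

import "fmt"

// scorePlugin is a hypothetical stand-in for the framework's ScorePlugin
// interface; only the plugin name matters for this sketch.
type scorePlugin struct{ name string }

// skipSet is a stand-in (assumption: a plain map) for the set of plugin
// names that PreScore marked as Skip.
type skipSet map[string]struct{}

// capBeforeFix mirrors the pre-fix computation:
// len(f.scorePlugins) - state.SkipScorePlugins.Len().
func capBeforeFix(plugins []scorePlugin, skipped skipSet) int {
	return len(plugins) - len(skipped)
}

// activePlugins collects the Score plugins that were not skipped. It uses
// len(plugins) as the capacity, which is never negative.
func activePlugins(plugins []scorePlugin, skipped skipSet) []scorePlugin {
	out := make([]scorePlugin, 0, len(plugins))
	for _, p := range plugins {
		if _, ok := skipped[p.name]; ok {
			continue // skipped by its PreScore counterpart
		}
		out = append(out, p)
	}
	return out
}

// makeWithCap allocates a slice with the given capacity and converts a
// runtime panic (e.g. negative capacity) into an error.
func makeWithCap(capacity int) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	_ = make([]scorePlugin, 0, capacity)
	return nil
}

func main() {
	// No Score plugins enabled, but one PreScore plugin returned Skip.
	var plugins []scorePlugin
	skipped := skipSet{"SomePreScorePlugin": {}}

	fmt.Println(makeWithCap(capBeforeFix(plugins, skipped))) // old formula
	fmt.Println(len(activePlugins(plugins, skipped)))        // fixed capacity
}
```

Run as-is, `main` should print the recovered `runtime error: makeslice: cap out of range` error and then `0`.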

This bug was introduced in #115652, so it's a regression.

Which issue(s) this PR fixes:

Fixes #121630

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fixed a regression since 1.27.0 in the scheduler framework when running score plugins. 
The number of `skippedScorePlugins` might be greater than the number of `enabledScorePlugins`, 
so when initializing a slice, the capacity `len(enabledScorePlugins) - len(skippedScorePlugins)` could be negative, 
which is not allowed.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

Signed-off-by: kerthcet <kerthcet@gmail.com>

k8s-ci-robot · 2023-10-31T08:11:08Z

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.


k8s-ci-robot · 2023-10-31T08:11:54Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kerthcet


Needs approval from an approver in each of these files:

~~pkg/scheduler/OWNERS~~ [kerthcet]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

kerthcet · 2023-10-31T08:13:13Z

pkg/scheduler/framework/runtime/framework_test.go

@@ -1385,6 +1385,22 @@ func TestRunScorePlugins(t *testing.T) {
 				},
 			},
 		},
+		{
+			name:           "skipped prescore plugin number greater than the number of score plugins",


This test panics when the change is reverted.

kerthcet · 2023-10-31T08:17:39Z

/hold
Let me check again.

kerthcet · 2023-10-31T08:27:20Z

/hold cancel
Functionality is OK.

denkensk

/lgtm

Thanks @kerthcet

k8s-ci-robot · 2023-10-31T10:31:02Z

LGTM label has been added.

Git tree hash: 588774c1c03e1f39fc06eb24c15f14f0847121e4

alculquicondor · 2023-10-31T17:57:35Z

Does this affect default scheduler? Do we need to cherry-pick?

alculquicondor · 2023-10-31T17:57:49Z

Please add a release note.

Huang-Wei · 2023-10-31T22:49:28Z

If it's a regression, let's ensure we tag it as kind/regression and also point out the PR that introduced this. @liggitt has a tool analyzing all regressions based on this tag.

kerthcet · 2023-11-01T03:10:51Z

> Does this affect default scheduler? Do we need to cherry-pick?

Yes, I'll cherry-pick this.

> If it's a regression, let's ensure we tag it as kind/regression and also point out the PR that introduced this

Done.

/kind regression

liggitt · 2023-11-01T03:38:11Z

Thanks, please add the specific release (1.x.y) that regressed in the release note

kerthcet · 2023-11-01T08:10:27Z

> Thanks, please add the specific release (1.x.y) that regressed in the release note

Do I have to add the patch version? This bug was introduced in 1.27, so all 1.27.x and 1.28.x releases should have it. Or should I simply add 1.27 and 1.28? @liggitt

EDIT:
Modified to "Fixed a regression since 1.27.0 ...". Please ping me if this doesn't meet your expectations.

alculquicondor · 2023-11-01T13:19:49Z

Just to clarify, this affects the default scheduler, but not the default profile, as we never have a Score plugin enabled without its PreScore counterpart.
Is that correct?

kerthcet · 2023-11-01T16:22:26Z

Yes. In the default profile, PreScore and Score are paired, but the default scheduler lets you configure another profile that breaks the pairing. I previously suggested validating the pairing so that the two are always enabled and disabled together. That would also let us remove fallbacks such as handling a missing PreScore state. The suggestion wasn't accepted at the time, though I forget the reason. :(
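The pairing validation suggested above could look roughly like the following Go sketch. It is hypothetical and not part of this PR or of kube-scheduler. `unpairedPlugins` and `validateProfile` are illustrative names, and `paired` is an assumed list of plugins that implement both PreScore and Score. The check rejects a profile in which such a plugin is enabled at only one of the two extension points, which is the kind of configuration reported in #121630.

```go
package main

import (
	"fmt"
	"sort"
)

// unpairedPlugins returns, sorted, the names from paired (plugins assumed to
// implement both PreScore and Score) that are enabled at only one of the two
// extension points. Plugins not listed in paired are ignored.
func unpairedPlugins(paired, preScore, score []string) []string {
	inPre := make(map[string]bool, len(preScore))
	for _, n := range preScore {
		inPre[n] = true
	}
	inScore := make(map[string]bool, len(score))
	for _, n := range score {
		inScore[n] = true
	}
	var out []string
	for _, n := range paired {
		if inPre[n] != inScore[n] {
			out = append(out, n)
		}
	}
	sort.Strings(out)
	return out
}

// validateProfile is a hypothetical profile check that rejects
// configurations breaking the PreScore/Score pairing.
func validateProfile(paired, preScore, score []string) error {
	if bad := unpairedPlugins(paired, preScore, score); len(bad) > 0 {
		return fmt.Errorf("plugins must be enabled for both preScore and score: %v", bad)
	}
	return nil
}

func main() {
	paired := []string{"TaintToleration", "NodeAffinity"}
	// Score disabled while PreScore stays enabled, as in #121630.
	fmt.Println(validateProfile(paired, paired, nil))
}
```

The check is restricted to plugins that implement both interfaces. A plugin that provides only one of the two extension points should not be rejected for lacking the other.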

Huang-Wei · 2023-11-18T00:44:49Z

The upgrade of scheduler-plugins also hit this regression: kubernetes-sigs/scheduler-plugins#670 (comment)


Fix panic when process RunScorePlugins for cap out of range

b02aad4

Signed-off-by: kerthcet <kerthcet@gmail.com>

k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Oct 31, 2023

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 31, 2023

k8s-ci-robot requested review from denkensk and sanposhiho October 31, 2023 08:12

kerthcet commented Oct 31, 2023


k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 31, 2023

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 31, 2023

denkensk reviewed Oct 31, 2023


k8s-ci-robot assigned denkensk Oct 31, 2023

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 31, 2023

k8s-ci-robot merged commit d84ee0b into kubernetes:master Oct 31, 2023
14 checks passed

k8s-ci-robot added this to the v1.29 milestone Oct 31, 2023

liggitt mentioned this pull request Oct 31, 2023

Revert "Make the decode function respect the timeout context" #121646

Merged

k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Nov 1, 2023

k8s-ci-robot added the kind/regression Categorizes issue or PR as related to a regression from a prior release. label Nov 1, 2023

kerthcet deleted the fix/runscoreplugins branch November 1, 2023 03:10

This was referenced Nov 1, 2023

Automated cherry pick of #121632: Fix panic when process RunScorePlugins for cap out of range #121666

Merged

Automated cherry pick of #121632: Fix panic when process RunScorePlugins for cap out of range #121667

Merged

Huang-Wei mentioned this pull request Nov 18, 2023

panic while initializing kube-scheduler if score plugin is disabled and preScore plugins are not. #121630

Closed

k8s-ci-robot added a commit that referenced this pull request Nov 18, 2023

Merge pull request #121667 from kerthcet/automated-cherry-pick-of-#12…

770666e

…1632-upstream-release-1.28 Automated cherry pick of #121632: Fix panic when process RunScorePlugins for cap out of range

k8s-ci-robot added a commit that referenced this pull request Nov 20, 2023

Merge pull request #121666 from kerthcet/automated-cherry-pick-of-#12…

0a0ea3d

…1632-upstream-release-1.27 Automated cherry pick of #121632: Fix panic when process RunScorePlugins for cap out of range
