Fix panic when process RunScorePlugins for cap out of range #121632

Merged
merged 1 commit into from Oct 31, 2023

Conversation

kerthcet
Member

@kerthcet kerthcet commented Oct 31, 2023

What type of PR is this?

/kind bug
/sig scheduling
/kind regression

What this PR does / why we need it:

The root cause is that the PreScore and Score plugins can be customized independently in the KubeSchedulerConfiguration, so a profile may have 0 Score plugins but x PreScore plugins. As soon as one PreScore plugin returns Skip, the capacity computed by the line below becomes negative and the scheduler panics with panic: runtime error: makeslice: cap out of range

numPlugins := len(f.scorePlugins) - state.SkipScorePlugins.Len()
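A minimal, self-contained sketch of why that subtraction is dangerous. It does not use the scheduler framework; `newScoreList`, `numScore`, and `numSkipped` are hypothetical names standing in for the slice allocation and the two counts in the line above.

```go
package main

import "fmt"

// newScoreList allocates a slice with the given capacity and converts a
// runtime panic (for example, a negative capacity) into an error.
// Hypothetical helper for illustration only.
func newScoreList(capacity int) (list []string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("allocation panicked: %v", r)
		}
	}()
	return make([]string, 0, capacity), nil
}

func main() {
	// Assumed scenario: no Score plugins, one PreScore plugin returned Skip.
	numScore, numSkipped := 0, 1

	// Arithmetic from the line above: the capacity is -1.
	_, err := newScoreList(numScore - numSkipped)
	fmt.Println("subtracting skipped plugins:", err)

	// Capacity bounded by the number of Score plugins only.
	_, err = newScoreList(numScore)
	fmt.Println("using the score plugin count:", err)
}
```

The first call reports an error because the capacity is negative; the second succeeds because a capacity of 0 is valid.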

The precise value of numPlugins would be len(f.scorePlugins) minus the number of plugins in f.scorePlugins that are also in SkipScorePlugins, but computing that means traversing both collections for comparison. Since the number of plugins is small, I use len(f.scorePlugins) as the capacity instead.
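The trade-off described here can be sketched as follows. This is an illustration under assumptions, not the framework's code: plugins are reduced to plain names, the skip set is a map, and `activeScorePlugins` is a hypothetical helper.

```go
package main

import "fmt"

// activeScorePlugins counts the score plugins that are not in the skip set.
// This is the "precise" count; it needs one pass over scorePlugins.
func activeScorePlugins(scorePlugins []string, skipped map[string]struct{}) int {
	n := 0
	for _, name := range scorePlugins {
		if _, skip := skipped[name]; !skip {
			n++
		}
	}
	return n
}

func main() {
	// Assumed configuration: two Score plugins, three skipped names,
	// only one of which is actually a Score plugin.
	score := []string{"A", "B"}
	skipped := map[string]struct{}{"B": {}, "C": {}, "D": {}}

	fmt.Println(len(score) - len(skipped))          // -1: invalid as a capacity
	fmt.Println(activeScorePlugins(score, skipped)) // 1: exact count
	fmt.Println(len(score))                         // 2: safe upper bound
}
```

Using the upper bound may over-allocate by a few elements, which is cheap when the plugin list is short, and it can never go negative.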

This bug was introduced in #115652, so it's a regression.

Which issue(s) this PR fixes:

Fixes #121630

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fixed a regression since 1.27.0 in the scheduler framework when running score plugins.
The number of skipped score plugins (`skippedScorePlugins`) could be greater than the number of enabled score plugins (`enabledScorePlugins`),
so when initializing a slice, the capacity `len(enabledScorePlugins) - len(skippedScorePlugins)` was negative,
which is not allowed.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


Signed-off-by: kerthcet <kerthcet@gmail.com>
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 31, 2023
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Oct 31, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kerthcet

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 31, 2023
@@ -1385,6 +1385,22 @@ func TestRunScorePlugins(t *testing.T) {
},
},
},
{
name: "skipped prescore plugin number greater than the number of score plugins",
Member Author

This test case panics if the change is reverted.

@kerthcet
Member Author

/hold
Let me check again.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 31, 2023
@kerthcet
Member Author

/hold cancel
The functionality is OK.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 31, 2023
Member

@denkensk denkensk left a comment

/lgtm

Thanks @kerthcet

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 31, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 588774c1c03e1f39fc06eb24c15f14f0847121e4

@k8s-ci-robot k8s-ci-robot merged commit d84ee0b into kubernetes:master Oct 31, 2023
14 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.29 milestone Oct 31, 2023
@alculquicondor
Member

Does this affect default scheduler? Do we need to cherry-pick?

@alculquicondor
Member

Please add a release note.

@Huang-Wei
Member

If it's a regression, let's ensure we tag it as kind/regression and also point out the PR that introduced this. @liggitt has a tool analyzing all regressions based on this tag.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Nov 1, 2023
@kerthcet
Member Author

kerthcet commented Nov 1, 2023

Does this affect default scheduler? Do we need to cherry-pick?

Yes, I'll cherry-pick this.

If it's a regression, let's ensure we tag it as kind/regression and also point out the PR that introduced this

Done.

/kind regression

@k8s-ci-robot k8s-ci-robot added the kind/regression Categorizes issue or PR as related to a regression from a prior release. label Nov 1, 2023
@kerthcet kerthcet deleted the fix/runscoreplugins branch November 1, 2023 03:10
@liggitt
Member

liggitt commented Nov 1, 2023

Thanks, please add the specific release (1.x.y) that regressed in the release note

@kerthcet
Member Author

kerthcet commented Nov 1, 2023

Thanks, please add the specific release (1.x.y) that regressed in the release note

Do I have to add the patch version? This bug was introduced in 1.27, so basically all 1.27.x and 1.28.x releases have it. Or should I simply list 1.27 and 1.28? @liggitt

EDIT:
Modified to "Fixed a regression since 1.27.0 ...". Please ping me if this does not meet your expectations.

@alculquicondor
Member

Just to clarify, this affects the default scheduler, but not the default profile, as we never have a Score plugin enabled without its PreScore counterpart.
Is that correct?

@kerthcet
Member Author

kerthcet commented Nov 1, 2023

Yes. In the default profile, PreScore and Score are paired, but with the default scheduler we can configure another profile that breaks the pairing. I once suggested validating the pairing, so that the two are always enabled and disabled together; then we could also remove fallbacks such as handling a missing PreScore state. That did not seem to be accepted at the time, and I forget the reason. :(
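For illustration only, a pairing check of the kind mentioned above might look like the sketch below. It is not part of this PR or of the scheduler's validation code; `unpairedPlugins` is a hypothetical helper, and it assumes every listed plugin implements both PreScore and Score, which is not true of all plugins.

```go
package main

import (
	"fmt"
	"sort"
)

// unpairedPlugins returns the names that are enabled at exactly one of the
// two extension points. Hypothetical helper: it assumes each named plugin
// implements both PreScore and Score.
func unpairedPlugins(preScore, score []string) []string {
	const inPreScore, inScore = 1, 2
	seen := make(map[string]int)
	for _, name := range preScore {
		seen[name] |= inPreScore
	}
	for _, name := range score {
		seen[name] |= inScore
	}
	var unpaired []string
	for name, mask := range seen {
		if mask != inPreScore|inScore {
			unpaired = append(unpaired, name)
		}
	}
	sort.Strings(unpaired) // deterministic order for reporting
	return unpaired
}

func main() {
	// Assumed profile: PluginB is enabled for PreScore but not for Score.
	fmt.Println(unpairedPlugins(
		[]string{"PluginA", "PluginB"},
		[]string{"PluginA"},
	))
}
```

A validation step could reject a profile whenever this list is non-empty, at the cost of forbidding configurations that are legitimate for plugins implementing only one of the two extension points.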

@Huang-Wei
Member

The upgrade of scheduler-plugins also hit this regression: kubernetes-sigs/scheduler-plugins#670 (comment)

k8s-ci-robot added a commit that referenced this pull request Nov 18, 2023
…1632-upstream-release-1.28

Automated cherry pick of #121632: Fix panic when process RunScorePlugins for cap out of range
k8s-ci-robot added a commit that referenced this pull request Nov 20, 2023
…1632-upstream-release-1.27

Automated cherry pick of #121632: Fix panic when process RunScorePlugins for cap out of range
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/regression Categorizes issue or PR as related to a regression from a prior release. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

panic while initializing kube-scheduler if score plugin is disabled and preScore plugins are not.
6 participants