cleanup(framework): return earlier if all Score plugins are skipped #118297
/priority important-soon
As described, this is not a critical bug. However, it blocks implementing Skip in Score plugins because it breaks the e2e test in the Skip implementation PR (e.g., #117024).
@@ -1014,49 +1014,57 @@ func (f *frameworkImpl) RunScorePlugins(ctx context.Context, state *framework.Cy
metrics.FrameworkExtensionPointDuration.WithLabelValues(metrics.Score, status.Code().String(), f.profileName).Observe(metrics.SinceInSeconds(startTime))
}()
allNodePluginScores := make([]framework.NodePluginScores, len(nodes))
numPlugins := len(f.scorePlugins) - state.SkipScorePlugins.Len()
The trigger of this bug is that numPlugins becomes less than 0.
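To illustrate why a negative count matters, here is a minimal sketch. It assumes the count is later used as a slice capacity for pre-allocation; `allocPlugins` and its parameters are hypothetical names, not the scheduler's code. In Go, `make` with a negative non-constant capacity panics at run time.

```go
package main

import "fmt"

// allocPlugins is a hypothetical helper that mimics pre-allocating a slice
// whose capacity is computed as "enabled minus skipped".
// It converts the run-time panic into an error so the caller can inspect it.
func allocPlugins(enabled, skipped int) (capacity int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	numPlugins := enabled - skipped
	// make panics at run time when numPlugins is negative.
	s := make([]string, 0, numPlugins)
	return cap(s), nil
}

func main() {
	fmt.Println(allocPlugins(3, 1)) // non-negative capacity: no error
	fmt.Println(allocPlugins(1, 2)) // negative capacity: error from recovered panic
}
```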
016fd06 to d6ebf95
I ran into this before in some tests, but is there really a scenario where a plugin is enabled for PreScore but disabled for Score?
I'd say 100% no, as long as no one makes a mistake in the scheduler component config. This fix is for people who make that mistake.
/remove-priority important-soon
I'll fix the e2e test side to unblock the Skip PR.
But, on second thought, we may just want to accept this bug because this PR will remove pre-allocation for
d6ebf95 to e0d96a6
Let's go with this approach unless other people want this bug fix.
/remove-kind bug
e0d96a6 to 6b6ab1b
/retest
@sanposhiho: The following test failed, say
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: sanposhiho, Tusenka
/assign
Can you fix the test?
PR needs rebase.
Are you still planning this?
It's in the icebox on my to-do list. I don't think I can find the time to work on it for now. Anyone should feel free to raise a similar PR (or maybe future me will).
/close
@sanposhiho: Closed this PR.
What type of PR is this?
/kind bug
/kind cleanup
What this PR does / why we need it:
Return early when all Score plugins returned Skip in PreScore.
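The idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the framework's actual code: `scoreFunc`, `runScore`, and the map-based skip set are hypothetical stand-ins for the Score plugins and the skip state recorded during PreScore.

```go
package main

import "fmt"

// scoreFunc is a hypothetical stand-in for a Score plugin.
type scoreFunc func(node string) int

// runScore runs every plugin that is not in the skip set against every node
// and sums the results per node. It returns nil early, without allocating
// per-node results, when no plugin is left to run.
func runScore(plugins map[string]scoreFunc, skip map[string]bool, nodes []string) map[string]int {
	active := make([]scoreFunc, 0, len(plugins))
	for name, p := range plugins {
		if !skip[name] {
			active = append(active, p)
		}
	}
	if len(active) == 0 {
		return nil // every plugin was skipped: nothing to score
	}
	totals := make(map[string]int, len(nodes))
	for _, n := range nodes {
		for _, p := range active {
			totals[n] += p(n)
		}
	}
	return totals
}

func main() {
	plugins := map[string]scoreFunc{
		"a": func(string) int { return 1 },
		"b": func(string) int { return 2 },
	}
	nodes := []string{"n1", "n2"}
	fmt.Println(runScore(plugins, map[string]bool{"a": true, "b": true}, nodes) == nil)
	fmt.Println(runScore(plugins, map[string]bool{"a": true}, nodes))
}
```

Collecting the active plugins by filtering, rather than subtracting counts, also means the capacity passed to `make` is never negative.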
This PR originally had the fix for panic bug
The scheduler panics when all these conditions are satisfied:
You can reproduce the panic with newly added UT like the following.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
This PR originally had the fix for panic bug
@kubernetes/sig-scheduling-leads
In my opinion, we don't need to cherry-pick this into v1.27, but could someone please double-check?
This shouldn't break the default Kubernetes scheduler because no in-tree Score plugin returns Skip in v1.27 in the first place. (ref)
It might break a custom scheduler, for example one with custom PreScore plugins that may return Skip but are disabled at the Score extension point. Also, to trigger this bug, the number of PreScore plugins that returned Skip and are disabled in Score must exceed the number of Score plugins enabled in the scheduler, which is an extremely rare case.
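A small worked example of that condition, as a sketch with hypothetical plugin names: one plugin is enabled for Score, and two PreScore-only plugins return Skip. Subtracting set sizes goes negative, while counting only the enabled Score plugins that are absent from the skip set does not.

```go
package main

import "fmt"

// countRunnable compares two ways of computing how many Score plugins remain.
// naive subtracts the size of the skip set from the number of enabled Score
// plugins; filtered counts only enabled Score plugins that were not skipped.
// All names here are illustrative assumptions.
func countRunnable(scoreEnabled []string, skipped map[string]struct{}) (naive, filtered int) {
	naive = len(scoreEnabled) - len(skipped)
	for _, name := range scoreEnabled {
		if _, ok := skipped[name]; !ok {
			filtered++
		}
	}
	return naive, filtered
}

func main() {
	scoreEnabled := []string{"A"} // enabled at the Score extension point
	skipped := map[string]struct{}{
		"X": {}, // PreScore-only plugin that returned Skip
		"Y": {}, // PreScore-only plugin that returned Skip
	}
	naive, filtered := countRunnable(scoreEnabled, skipped)
	fmt.Println(naive, filtered) // prints "-1 1"
}
```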
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: