Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimization on running prePreEnqueuePlugins before adding pods into activeQ #115583

Merged
merged 1 commit into from Feb 15, 2023

Conversation

lianghao208
Copy link
Member

@lianghao208 lianghao208 commented Feb 7, 2023

What type of PR is this?

/kind feature
/sig scheduling

What this PR does / why we need it:

discussion: #113275 (comment)

  • pods called in flushBackoffQCompleted() should have already passed all PreEnqueue plugins, and given its state is one-way transitional, we don't need to go through PreEnqueue plugins again.

  • add a flag to call/skip PreEnqueue plugins.

Which issue(s) this PR fixes:

Part of #113608

Special notes for your reviewer:

cc @Huang-Wei

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 7, 2023
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Feb 7, 2023
@lianghao208
Copy link
Member Author

/sig scheduling

@lianghao208
Copy link
Member Author

@Huang-Wei PTAL, the reason I introduced a skipPreEnqueue flag in addToActiveQ instead of calling activeQ.Add(pInfo) directly in flushBackoffQCompleted() is that I don't intend to break the encapsulation of addToActiveQ, WDYT?

@Huang-Wei
Copy link
Member

the reason I introduced a skipPreEnqueue flag in addToActiveQ instead of calling activeQ.Add(pInfo) directly in flushBackoffQCompleted() is that I don't intend to break the encapsulation of addToActiveQ, WDYT?

There is no need :) just call p.activeQ.Add(pInfo) directly in flushBackoffQCompleted().

@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 11, 2023
@Huang-Wei
Copy link
Member

and could you add a test to cover the change? (ensure PreEnqueue is not called during flushBackoffQCompleted().

@lianghao208 ^^ in case you didn't see this comment

@lianghao208
Copy link
Member Author

and could you add a test to cover the change? (ensure PreEnqueue is not called during flushBackoffQCompleted().

@lianghao208 ^^ in case you didn't see this comment

working on it, thanks for the reminder.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Feb 14, 2023
@lianghao208
Copy link
Member Author

@Huang-Wei already added a test to cover, PTAL

@lianghao208
Copy link
Member Author

@Huang-Wei already addressed the comments accordingly:)

Copy link
Member

@Huang-Wei Huang-Wei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
/lgtm

Thanks @lianghao208 !

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 15, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: d2fdf47034d663ec655b2f7b1f5bc125e8115488

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Huang-Wei, lianghao208

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 15, 2023
@k8s-ci-robot k8s-ci-robot merged commit 40deade into kubernetes:master Feb 15, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.27 milestone Feb 15, 2023
@sanposhiho
Copy link
Member

Sorry if I may miss something. But, I have doubts related to this change.

The PR description said "pods called in flushBackoffQCompleted() should have already passed all PreEnqueue plugins", is it really correct?

The runPreEnqueuePlugins func is only called in addToActiveQ. Where is it called before Pod is going to backoffQ?

@sanposhiho
Copy link
Member

sanposhiho commented Apr 11, 2023

For we only support one-direction state transition from gated -> non-gated, and pods in backoffQ are all non-gated.
#113275 (comment)

This is true only for the schedulinggate plugin. We say people mustn't change gated -> non-gated via schedulingGate field. But we don't say the PreEnqueue plugin cannot change the decision to a Pod once it returned success to a Pod.

So, in general, this change is a huge breaking change for other custom PreEnqueue plugin since the PreEnqueue is no longer the place where the Pods go through before ActiveQ

@sanposhiho
Copy link
Member

cc @kubernetes/sig-scheduling-leads

I even think we should revert this.
Given PreEnqueue plugins should be lightweight, this change has very less impact on the actual performance of enqueueing while breaking existing PreEnqueue plugins.

Also, this will make this proposal impossible.

@lianghao208
Copy link
Member Author

@sanposhiho If a Pod is in backoffQ, it must have passed through activeQ, which means it have already passed all PreEnqueue plugins.(Only if those pod passed all PreEnqueue plugins can they be insert into activeQ) Does it make sense to you?

PS: backoffQ.Add() has been called in following places:
https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/internal/queue/scheduling_queue.go#L520
https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/internal/queue/scheduling_queue.go#L662
https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/internal/queue/scheduling_queue.go#L782

In these senario, pod may not been bound successfully (unschedulable) and need to add it back to scheduling queue.(already passed activeQ before)

@sanposhiho
Copy link
Member

sanposhiho commented Apr 11, 2023

No.
I mean even if the PreEnqueue plugins returned success to Pods previously, it doesn't mean that Pod will always get success from those PreEnqueue plugins. The PreEnqueue plugins may change the result based on the situation


You are correct if only talking about the schedulinggate.
But, PreEnqueue plugin is not only for the schedulinggate; we may get another usecase of PreEnqueue plugin like this. Also, there should be various usecases outside the upstream kubernetes. (Actually, my company is using custom PreEnqueue plugin)
And, even now, actually the PreEnqueue doesn't state that "the result for each Pod shouldn't get changed once it returns success for a Pod.". That's why I'm saying it's a "breaking" change.

@ahg-g
Copy link
Member

ahg-g commented Apr 11, 2023

I agree that this should be reverted. The case for preEnqueue is not limited to schedulingGates.

@sanposhiho
Copy link
Member

sanposhiho commented Apr 11, 2023

@ahg-g can we include the change for reverting this in v1.27? (I mean is it allowed under the code freeze policy?

@Huang-Wei
Copy link
Member

I think we should revert this too. It's a big win to ensure out-of-tree PreEnqueue plugins (or even in the future in-tree plugins as well) to be called when Pods are about to be enqueued.

@lianghao208 apologize for the confusion for initializing this PR idea and assign to you.

@Huang-Wei
Copy link
Member

I asked if it's too late to revert the change, in sig-release channel. If it's too late, we can wait for the first 1.27 patch release.

@sanposhiho
Copy link
Member

I'll send the reverting PR soon.
Depending on the answer from the sig-releasing (slack thread), we'll include the change in v1.27.0 or v1.27.1.

@sanposhiho
Copy link
Member

#117194

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants