run all PreFilter when the preemption will happen later in the same scheduling cycle #119779
Please note that we're already in Test Freeze for the Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Sat Aug 5 22:30:03 UTC 2023. |
will soon add UTs |
/retest |
/cc @Huang-Wei |
@@ -655,7 +656,16 @@ func (f *frameworkImpl) RunPreFilterPlugins(ctx context.Context, state *framewor | |||
if !s.IsSuccess() { | |||
s.SetFailedPlugin(pl.Name()) | |||
if s.IsUnschedulable() { | |||
return nil, s | |||
if s.Code() == framework.UnschedulableAndUnresolvable { |
Compared to always continuing to run PreFilter upon Unschedulable, I'm thinking of a conditional continue/stop:
- option 1: offer a knob in the scheduler config to indicate whether it's intended to continue running PreFilter plugins upon the first Unschedulable
  - it should be a profile-specific parameter.
- option 2: try to deduce the user's intent w/o offering a config knob. Technically, it's doable to tell whether a PreFilter plugin returns nil in PreFilterExtensions. For example, suppose pluginA and pluginB both implement PreFilter, and pluginB is placed after pluginA. When pluginA fails with Unschedulable:
  - runPreFilterPlugins() stops if pluginB returns nil PreFilterExtensions
  - runPreFilterPlugins() continues if pluginB returns non-nil PreFilterExtensions
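For illustration, option 2 could be sketched roughly as below. This is a toy model, not the scheduler framework: Code, Extensions, Plugin and runPreFilter are hypothetical stand-ins, and stopping at the first later plugin without extensions is just one possible reading of the rule above.

```go
package main

import "fmt"

// Code is a simplified, hypothetical stand-in for the framework's status codes.
type Code int

const (
	Success Code = iota
	Unschedulable
)

// Extensions is a hypothetical stand-in for PreFilterExtensions.
type Extensions struct{}

// Plugin is a trimmed-down, hypothetical PreFilter plugin.
type Plugin struct {
	Name   string
	Result Code        // what this plugin's PreFilter returns
	Ext    *Extensions // nil means the plugin has no PreFilterExtensions
}

// runPreFilter returns the names of the plugins that ran and the first
// failure code. Once a plugin has returned Unschedulable, it only keeps
// going while the next plugin has non-nil extensions.
func runPreFilter(plugins []Plugin) (ran []string, final Code) {
	final = Success
	for _, pl := range plugins {
		if final == Unschedulable && pl.Ext == nil {
			break // nothing to keep consistent for preemption
		}
		ran = append(ran, pl.Name)
		if pl.Result == Unschedulable && final == Success {
			final = Unschedulable
		}
	}
	return ran, final
}

func main() {
	pluginA := Plugin{Name: "pluginA", Result: Unschedulable}
	withExt := Plugin{Name: "pluginB", Ext: &Extensions{}}
	withoutExt := Plugin{Name: "pluginB"}

	ran, _ := runPreFilter([]Plugin{pluginA, withExt})
	fmt.Println("pluginB has extensions:", ran)
	ran, _ = runPreFilter([]Plugin{pluginA, withoutExt})
	fmt.Println("pluginB has no extensions:", ran)
}
```

The point of the sketch is only that the decision can be made per plugin at runtime, without any new config field.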
sg. I prefer option 2, as option 1 seems a bit difficult for users to understand properly, and I can't come up with any scenario that option 2 cannot cover where people would want to continue or stop running other PreFilter plugins.
But can we leave this enhancement to a follow-up so that this PR doesn't grow bigger? I'm thinking this PR deserves to be cherry-picked to older versions.
cc @alculquicondor @ahg-g @kerthcet A simple approach is to keep running all PreFilter plugins upon Unschedulable, which should be no penalty for in-tree plugins (at least for now).
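A minimal sketch of this simpler approach, following the shape of the diff above (stop right away on UnschedulableAndUnresolvable, otherwise keep going). The types and the runAll helper are hypothetical stand-ins rather than the real framework code:

```go
package main

import "fmt"

// Code is a simplified, hypothetical stand-in for the framework's status codes.
type Code int

const (
	Success Code = iota
	Unschedulable
	UnschedulableAndUnresolvable
)

// preFilterFunc is a hypothetical stand-in for one plugin's PreFilter call.
type preFilterFunc func() Code

// runAll keeps running PreFilter plugins after Unschedulable, because
// preemption may still happen in this cycle, but returns immediately on
// UnschedulableAndUnresolvable. It reports how many plugins were executed
// and the first failure code seen.
func runAll(plugins []preFilterFunc) (executed int, result Code) {
	result = Success
	for _, preFilter := range plugins {
		executed++
		switch code := preFilter(); code {
		case UnschedulableAndUnresolvable:
			return executed, code
		case Unschedulable:
			if result == Success {
				result = code
			}
		}
	}
	return executed, result
}

func main() {
	ok := func() Code { return Success }
	soft := func() Code { return Unschedulable }
	hard := func() Code { return UnschedulableAndUnresolvable }

	n, code := runAll([]preFilterFunc{soft, ok, ok})
	fmt.Println("after Unschedulable:", n, "plugins ran, code", code)
	n, code = runAll([]preFilterFunc{hard, ok, ok})
	fmt.Println("after UnschedulableAndUnresolvable:", n, "plugin ran, code", code)
}
```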
btw: even for out-of-tree plugins, the conditions to trigger this issue are still very strict; basically you have to satisfy all of the following:
- an out-of-tree pluginA returns Unschedulable in PreFilter
- an out-of-tree pluginB is placed after pluginA, and
  - it implements PreFilter, and writes its own key to cycleState
  - it implements PreFilterExtensions#Add/RemovePod, and reports an error upon a missing key
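To make the conditions above concrete, here is a rough, self-contained sketch of what such a pluginB could look like. CycleState, stateKey, preFilter and addPod are all hypothetical simplifications for illustration, not the real framework types:

```go
package main

import (
	"errors"
	"fmt"
)

// CycleState is a hypothetical stand-in for the per-cycle key/value store.
type CycleState map[string]int

// stateKey is the key pluginB uses for its own data (name is made up).
const stateKey = "pluginB"

// preFilter stores pluginB's precomputed data under its own key.
func preFilter(state CycleState) {
	state[stateKey] = 0
}

// addPod mimics PreFilterExtensions#AddPod: it needs the data that
// preFilter wrote and reports an error when the key is missing.
func addPod(state CycleState) error {
	if _, ok := state[stateKey]; !ok {
		return errors.New("pluginB: key missing from cycle state")
	}
	state[stateKey]++
	return nil
}

func main() {
	// pluginA returned Unschedulable first, so pluginB's PreFilter never ran.
	skipped := CycleState{}
	fmt.Println("PreFilter skipped:", addPod(skipped))

	// All PreFilter plugins ran, so the key exists when preemption calls AddPod.
	full := CycleState{}
	preFilter(full)
	fmt.Println("PreFilter ran:", addPod(full))
}
```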
Sorry, I haven't had time to check the SIG meeting recording, but is the conclusion just to keep running all PreFilter plugins upon Unschedulable and leave the idea of conditional continue aside, at least for now?
Also, is there anything to improve in this PR that I need to address to get /lgtm?
I'm fine with the current approach. And it's not that severe compared to the other 2 regressions we recently discussed. Let's polish this (I guess there will be conflicts in the tests) after those 2 regression PRs get merged, and then discuss whether we want to cherry-pick.
e1457bd to 265fc51
/remove-lifecycle rotten |
Rebased. |
In which case can this happen in the default scheduler? |
/retest |
/remove-kind bug |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alculquicondor, sanposhiho The full list of commands accepted by this bot can be found here. The pull request process is described here
Please update the release note to reflect that this is a new feature that external plugins can take advantage of. It's not a bug for the existing in-tree plugins. /lgtm |
LGTM label has been added. Git tree hash: 02649e6e98e33ac1e694b098455063f19ee28f57
@alculquicondor Updated. I also mentioned a bug that could happen in custom schedulers, though; feel free to remove that part if you'd prefer not to include it. |
/hold cancel |
We are going to need an updated release note. And then prepare the cherry-pick, please. |
…egal nodes, the pod scheduling flow will abort immediately. this is a minimum fix from kubernetes#119779 Signed-off-by: joey <zchengjoey@gmail.com>
@@ -2227,7 +2227,7 @@ func TestSchedulerSchedulePod(t *testing.T) { | |||
nodes: []string{"node1", "node2", "node3"}, | |||
pod: st.MakePod().Name("test-prefilter").UID("test-prefilter").Obj(), | |||
wantNodes: sets.New("node2"), | |||
wantEvaluatedNodes: ptr.To[int32](1), | |||
wantEvaluatedNodes: ptr.To[int32](3), |
Maybe this number shouldn't have changed?
Although it's only useful for debugging.
It should be changed.
Please refer to the change in pkg/scheduler/scheduler.go in this PR: after this PR, EvaluatedNodes also counts the nodes that are filtered out by PreFilterResult.
https://github.com/kubernetes/kubernetes/pull/119779/files#r1439161728
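As a toy model of why the expected value in this test moves from 1 to 3 (this is not the actual calculation in the scheduler; the evaluatedNodes helper and its countFiltered flag are made up for illustration):

```go
package main

import "fmt"

// evaluatedNodes is a hypothetical helper. allowed is the set of node names
// permitted by PreFilterResult; countFiltered toggles whether nodes excluded
// by PreFilterResult are also counted as evaluated.
func evaluatedNodes(all []string, allowed map[string]bool, countFiltered bool) int32 {
	var n int32
	for _, name := range all {
		if allowed[name] || countFiltered {
			n++
		}
	}
	return n
}

func main() {
	nodes := []string{"node1", "node2", "node3"}
	allowed := map[string]bool{"node2": true} // PreFilterResult keeps only node2

	fmt.Println(evaluatedNodes(nodes, allowed, false)) // 1: old expectation in the test
	fmt.Println(evaluatedNodes(nodes, allowed, true))  // 3: new expectation in the test
}
```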
Yes, I understand that it's not a bug. However, it was useful to have the value 1 to know that only one node passed PreFilter.
I see. Then, we have to change how EvaluatedNodes is calculated.
https://github.com/sanposhiho/kubernetes/blob/master/pkg/scheduler/schedule_one.go#L435
Created #124705
What type of PR is this?
/kind bug
/triage accepted
/priority important-soon
What this PR does / why we need it:
Not all PreFilter plugins are executed either.
But their Filter()s may be executed during preemption and could cause an error by trying to read data from the cycle state that is supposed to have been stored by PreFilter (but actually wasn't).
The former case shouldn't happen in the default scheduler since we don't have PreFilter plugins returning Unschedulable.
But, the latter case may happen in the default scheduler, and cherry-pick is needed.
#119777 adds an integration test for this scenario and #119780 proves that this patch fixes the bug.
Which issue(s) this PR fixes:
Part of #119770
Special notes for your reviewer:
PreFilterResult was introduced in 1.24. So, we need to cherry-pick this PR to all supported versions.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: