feature(scheduler): won't run Filter if PreFilter returned a Skip status #114125
Please note that we're already in Test Freeze for the Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Fri Nov 25 03:33:55 UTC 2022. |
@sanposhiho: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/assign @alculquicondor |
/retest |
5c0fba5 to a54e6ad
st.RegisterPreFilterPlugin(
	"FakePreFilter1",
	st.NewFakePreFilterPlugin("FakeFilter1", nil, nil),
),
st.RegisterFilterPlugin(
	"FakeFilter1",
	st.NewFakeFilterPlugin(map[string]framework.Code{
		"node1": framework.Unschedulable,
	}),
),
do we need these?
Here, I want to make sure that all other PreFilter or Filter plugins are executed even if some plugins return skip in PreFilter and some plugins are skipped in Filter.
I'm fine leaving those tests to just the runtime package, but it's up to you.
I'd like to keep it, because if this test had existed in the earlier (reverted) PR, we wouldn't have run into the bug.
The runtime package's tests can only confirm the behavior of a single extension point (either Filter or PreFilter); they cannot confirm what happens when the scheduler actually runs from PreFilter through Filter.
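The end-to-end behavior this test is meant to protect can be sketched roughly as follows. This is a minimal, self-contained illustration and not the framework's actual code: the `plugin` type and the `runPreFilter` and `runFilter` helpers are hypothetical stand-ins, and only the Skip semantics described in this PR are modeled.

```go
package main

import "fmt"

// Code is a simplified stand-in for the scheduler framework's status code.
type Code int

const (
	Success Code = iota
	Unschedulable
	Skip
)

// plugin is a hypothetical, stripped-down plugin. PreFilter yields a fixed
// code, and Filter is looked up per node name (missing nodes yield Success).
type plugin struct {
	name      string
	preFilter Code
	filter    map[string]Code
}

// runPreFilter runs every plugin's PreFilter and records the names of the
// plugins that returned Skip. Any other non-Success code aborts the run.
func runPreFilter(plugins []plugin) (map[string]bool, Code) {
	skipped := map[string]bool{}
	for _, p := range plugins {
		switch p.preFilter {
		case Success:
			// Nothing to record; the plugin's Filter will run later.
		case Skip:
			skipped[p.name] = true
		default:
			return nil, p.preFilter
		}
	}
	return skipped, Success
}

// runFilter runs Filter for one node, bypassing plugins recorded as skipped.
// It returns the names of the plugins that actually ran and the final code.
func runFilter(plugins []plugin, skipped map[string]bool, node string) ([]string, Code) {
	var ran []string
	for _, p := range plugins {
		if skipped[p.name] {
			continue
		}
		ran = append(ran, p.name)
		if c := p.filter[node]; c != Success {
			return ran, c
		}
	}
	return ran, Success
}

func main() {
	plugins := []plugin{
		// This plugin would reject node1 in Filter, but it skips in PreFilter.
		{name: "SkipsInPreFilter", preFilter: Skip, filter: map[string]Code{"node1": Unschedulable}},
		{name: "AlwaysRuns", preFilter: Success},
	}
	skipped, code := runPreFilter(plugins)
	ran, filterCode := runFilter(plugins, skipped, "node1")
	fmt.Println(code == Success, ran, filterCode == Success)
}
```

The point of running both phases together is that a test limited to one extension point cannot notice a skipped plugin leaking into Filter, or a non-skipped plugin being dropped by mistake.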
@@ -94,8 +94,15 @@ func (pl *NodeAffinity) PreFilter(ctx context.Context, cycleState *framework.Cyc
	affinity := pod.Spec.Affinity
	if affinity == nil ||
		affinity.NodeAffinity == nil ||
		affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution == nil ||
		len(affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms) == 0 {
Shouldn't we preserve this line here?
So that if there is no RequiredDuringSchedulingIgnoredDuringExecution, no addedNodeSelector, and no .Spec.NodeSelector, we also return Skip.
You mean we can add len(affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms) == 0 here, right? I think that's true. Fixed.
Looking at an existing test... it looks like my assumption was wrong. Empty NodeSelectorTerms for a non-nil RequiredDuringScheduling means no match?
Ohh, that's true.
Btw, I checked the validation logic, and Pods with a nil []NodeSelectorTerm should be rejected actually.
https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/core/validation/validation.go#L4012-L4014
Anyway, even given that, we probably should keep the original behavior.
ahh, good find. It should be fine to remove the test case. Up to you.
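The conclusion of this thread can be sketched as follows. This is a simplified illustration under stated assumptions, not the plugin's real code: the types below are stand-ins for the v1 API types, `Required` abbreviates RequiredDuringSchedulingIgnoredDuringExecution, and `canSkip` and its `hasAddedSelector` parameter are hypothetical names. The key point is that a non-nil required affinity with empty NodeSelectorTerms matches no node, so it must not be treated as "nothing to check".

```go
package main

import "fmt"

// Simplified stand-ins for the v1 API types; only the fields discussed in
// the review thread are modeled.
type NodeSelectorTerm struct{}

type NodeSelector struct {
	NodeSelectorTerms []NodeSelectorTerm
}

type NodeAffinity struct {
	// Required stands in for RequiredDuringSchedulingIgnoredDuringExecution.
	Required *NodeSelector
}

type Affinity struct {
	NodeAffinity *NodeAffinity
}

type PodSpec struct {
	NodeSelector map[string]string
	Affinity     *Affinity
}

// hasRequiredNodeAffinity reports whether a required node affinity is set at
// all. Empty NodeSelectorTerms still count as "set", because an empty term
// list matches no node rather than every node.
func hasRequiredNodeAffinity(spec PodSpec) bool {
	a := spec.Affinity
	return a != nil && a.NodeAffinity != nil && a.NodeAffinity.Required != nil
}

// canSkip is a hypothetical helper: PreFilter may return Skip only when the
// pod carries no node selector, no required node affinity, and the plugin
// has no added selector configured (hasAddedSelector is an assumption here).
func canSkip(spec PodSpec, hasAddedSelector bool) bool {
	return !hasRequiredNodeAffinity(spec) && len(spec.NodeSelector) == 0 && !hasAddedSelector
}

func main() {
	noConstraints := PodSpec{}
	withSelector := PodSpec{NodeSelector: map[string]string{"foo": "bar"}}
	emptyTerms := PodSpec{Affinity: &Affinity{NodeAffinity: &NodeAffinity{Required: &NodeSelector{}}}}

	fmt.Println(canSkip(noConstraints, false))
	fmt.Println(canSkip(withSelector, false))
	fmt.Println(canSkip(emptyTerms, false))
}
```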
  },
  {
  	name: "missing labels",
  	pod: st.MakePod().NodeSelector(map[string]string{
  		"foo": "bar",
  	}).Obj(),
- 	wantStatus: framework.NewStatus(framework.UnschedulableAndUnresolvable, ErrReasonPod),
+ 	wantStatus: framework.NewStatus(framework.UnschedulableAndUnresolvable, ErrReasonPod),
+ 	disablePreFilter: true,
Why is this needed? Same question for the rest of the cases.
Sorry, we don't need it. Removed all disablePreFilter.
@@ -45,15 +45,18 @@ func TestNodeAffinity(t *testing.T) {
  	disablePreFilter bool
  }{
  	{
- 		name: "no selector",
- 		pod: &v1.Pod{},
+ 		name: "no selector and affinity",
How is this different from the test case "Pod with no Affinity will schedule onto a node" below?
Seems duplicated. Removed it 🙏
Remove the new case, instead of the old one. It's better for the git history.
pkg/scheduler/framework/interface.go
Outdated
  func (s *Status) IsSuccess() bool {
- 	return s.Code() == Success
+ 	return s.Code() == Success || s.Code() == Skip
Is this reasonable? IsSuccess generally means we'll proceed along the happy path, whereas IsSkip means we'll skip the phase.
This method is generally used like if !s.IsSuccess(), in which case it makes sense. Maybe we should change the name? But I think Skip is not too different from Success.
I feel the same as @alculquicondor. Skip is also kinda happy path.
I'm not against renaming, but is there any good name for the func? No good idea from my poor English vocab. 😓
We depend not only on !s.IsSuccess() but also on s.IsSuccess(); see here (and other places):
kubernetes/pkg/scheduler/schedule_one.go
Lines 514 to 521 in 17bf864
if status.IsSuccess() {
	length := atomic.AddInt32(&feasibleNodesLen, 1)
	if length > numNodesToFind {
		cancel()
		atomic.AddInt32(&feasibleNodesLen, -1)
	} else {
		feasibleNodes[length-1] = nodeInfo.Node()
	}
This is not for PreFilter, but we have now changed the underlying meaning of Success. If the code is Skip, I don't think we should proceed with this. Maybe we should add another method like IsPassed(), meaning we can continue with the logic that follows, for both Success and Skip.
Keeping IsSuccess as it is (= only Success) and creating another func like IsPassed (= Success and Skip) makes sense to me.
In pkg/scheduler/schedule_one.go, we currently need to use IsSkip to distinguish Skip from IsSuccess (= Success and Skip), and a new IsSuccess (= only Success) would be useful there.
Then, we can just change all other places that use IsSuccess to use IsPassed.
sgtm
4fec5285ed6. The diff is much bigger now, but I believe I have replaced all uses of IsSuccess with IsPassed (other than the ones in pkg/scheduler/schedule_one.go, which were used like IsSuccess() && !IsSkip()).
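The split discussed in this thread can be sketched as below. It is an illustration only: the `Status` type is a reduced stand-in for the framework's type, `IsPassed` is the name proposed in this conversation (it may not match what is in the tree today), and `countFeasible` and `firstFailure` are hypothetical helpers that show where each predicate fits.

```go
package main

import "fmt"

// Code is a simplified stand-in for the scheduler framework's status code.
type Code int

const (
	Success Code = iota
	Unschedulable
	Skip
)

// Status is a reduced stand-in for the framework's Status type.
type Status struct {
	code Code
}

// Code returns the status code; in this sketch a nil *Status counts as Success.
func (s *Status) Code() Code {
	if s == nil {
		return Success
	}
	return s.code
}

// IsSuccess is strict: only Success qualifies.
func (s *Status) IsSuccess() bool { return s.Code() == Success }

// IsSkip reports whether the plugin asked to be skipped.
func (s *Status) IsSkip() bool { return s.Code() == Skip }

// IsPassed is the name proposed in the thread: the caller may continue with
// the logic that follows, for either Success or Skip.
func (s *Status) IsPassed() bool { return s.IsSuccess() || s.IsSkip() }

// countFeasible is a hypothetical helper: a node counts as feasible only on
// a strict Success, so a Skip never inflates the count.
func countFeasible(statuses []*Status) int {
	n := 0
	for _, s := range statuses {
		if s.IsSuccess() {
			n++
		}
	}
	return n
}

// firstFailure is a hypothetical helper for the common "if !ok, bail out"
// pattern: it returns the first status that is neither Success nor Skip.
func firstFailure(statuses []*Status) *Status {
	for _, s := range statuses {
		if !s.IsPassed() {
			return s
		}
	}
	return nil
}

func main() {
	statuses := []*Status{nil, {code: Skip}, {code: Unschedulable}}
	fmt.Println(countFeasible(statuses))
	fmt.Println(firstFailure(statuses).Code() == Unschedulable)
}
```

Keeping `IsSuccess` strict means call sites that act on a positive result, such as the feasible-node counter quoted above, do not need an extra `!IsSkip()` check.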
/retest |
@@ -617,6 +620,9 @@ func (f *frameworkImpl) RunPreFilterPlugins(ctx context.Context, state *framewor
			}
			return nil, framework.AsStatus(fmt.Errorf("running PreFilter plugin %q: %w", pl.Name(), s.AsError())).WithFailedPlugin(pl.Name())
		}
		if s.IsSkip() {
Currently, if status is not success, how will it be skipped? I think we missed a test for this.
I may not understand correctly what you mean, but if a returned status isn't a success, execution doesn't reach this point.
Please disregard this; I misread the code, sorry.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alculquicondor, sanposhiho The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
dacfe90 to fa6cb72
Squashed. Thanks @alculquicondor @kerthcet for the long-time effort in reviewing this.
dd138f5 to 6791547
state := framework.NewCycleState()

f.RunPreFilterPlugins(ctx, state, nil)
f.RunPreFilterExtensionAddPod(ctx, state, nil, nil, nil)
f.RunPreFilterExtensionRemovePod(ctx, state, nil, nil, nil)
Just added these changes to prevent a nil-pointer panic in RunPreFilterExtensionAddPod and RunPreFilterExtensionRemovePod. 🙏
/lgtm
Thanks!
LGTM label has been added. Git tree hash: 0af973e0dd5d182a572893ef6ccdd1f6226dfe4c
The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass. This bot retests PRs for certain kubernetes repos according to the following rules:
You can:
/retest |
name                string
plugins             []*TestPlugin
wantPreFilterResult *framework.PreFilterResult
wantSkippedPlugins  sets.Set[string]
wantStatusCode      framework.Code
/hold
for go format CI failure
feel free to unhold once the CI is green.
7189322 to e5b5367
Oops, sorry for that. Formatted. 🙏
/retest |
/retest |
/lgtm |
LGTM label has been added. Git tree hash: fa18af83341c08a7817aff55c709c5f3378289f7
/unhold |
What type of PR is this?
/kind feature
/sig scheduling
What this PR does / why we need it:
Change the framework so that it doesn't run a plugin's Filter() if its PreFilter() returned a Skip status.
For example, nodeAffinity can return Skip in PreFilter if the pod doesn't specify any node selector or affinity.
This Skip status is basically regarded as "Success"; if you need to refer to the Skip status, distinguish it from Success with the IsSkip func, not the IsSuccess func.
#112637 got reverted because of the bug. 🙏
This PR also changes NodeAffinity PreFilter to return Skip so that we can ensure the same bug doesn't exist in this PR anymore. It's just a starting point; once this PR gets merged, I'll work on changing other PreFilter plugins to return Skip appropriately.
This PR is composed of three commits:
Which issue(s) this PR fixes:
Part of #107556
Part of #110643
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: