
kube-scheduler crashes and restarts with panics in DefaultPreemption plugin #101548

Closed
yuanchen8911 opened this issue Apr 28, 2021 · 36 comments · Fixed by #101560
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@yuanchen8911
Member

yuanchen8911 commented Apr 28, 2021

What happened:

Kubernetes 1.19 (confirmed with 1.19.7 and 1.19.10)

kube-scheduler crashes and restarts with the following errors in default_preemption.go.

  1. panic: runtime error: index out of range [0] with length 0

The problematic code is at line 389. The panic occurs when victims is nil or victims.Pods is empty; if we skip GetPodPriority in that case, the error goes away.

387         victims := nodesToVictims[node]
388         // highestPodPriority is the highest priority among the victims on this node.
389         highestPodPriority := podutil.GetPodPriority(victims.Pods[0])
panic: runtime error: index out of range [0] with length 0 [recovered]
panic: runtime error: index out of range [0] with length 0
goroutine 1406 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
/Users/yuanchen/projects/aci/go/kube-scheduler/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:55 +0x109
panic(0x1c9fde0, 0xc00748cf48)
 /usr/local/Cellar/go/1.16.2/libexec/src/runtime/panic.go:965 +0x1b9
k8s.io/kubernetes/pkg/scheduler/framework/plugins/defaultpreemption.pickOneNodeForPreemption(0xc007e3dbc0, 0x4, 0x4)
/Users/yuanchen/projects/aci/go/kubescheduler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/plugins/defaultpreemption/default_preemption.go:389 +0xd2c
k8s.io/kubernetes/pkg/scheduler/framework/plugins/defaultpreemption.SelectCandidate(0xc0086e0840, 0x4, 0x4, 0xc008235800, 0x2068138)
/Users/yuanchen/projects/aci/go/kubescheduler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/plugins/defaultpreemption/default_preemption.go:331 +0x85
k8s.io/kubernetes/pkg/scheduler/framework/plugins/defaultpreemption.(*DefaultPreemption).preempt(0xc000ac4b80, 0x20677d0, 0xc005849540, 0xc0085616e0, 0xc008235800, 0xc008561710, 0x2d80750, 0xc00820a8c0, 0x7f0937ef47d8, 0x30)
/Users/yuanchen/projects/aci/go/kube-scheduler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/plugins/defaultpreemption/default_preemption.go:135 +0x4bd
k8s.io/kubernetes/pkg/scheduler/framework/plugins/defaultpreemption.(*DefaultPreemption).PostFilter(0xc000ac4b80, 0x20677d0, 0xc005849540, 0xc0085616e0, 0xc0071543a8, 0xc008561710, 0x0, 0x0)
/Users/yuanchen/projects/aci/go/kube-scheduler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/plugins/defaultpreemption/default_preemption.go:83 +0xf1
k8s.io/kubernetes/pkg/scheduler/framework/runtime.(*frameworkImpl).runPostFilterPlugin(0xc0002fd380, 0x20677d0, 0xc005849540, 0x7f0937a84040, 0xc000ac4b80, 0xc0085616e0, 0xc0071543a8, 0xc008561710, 0xc0084f3ab0, 0x5f5e100)
/Users/yuanchen/projects/aci/go/kube-scheduler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/runtime/framework.go:557 +0x87
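The guard described above can be sketched as follows. This is a minimal, self-contained illustration with a stand-in victims type and a hypothetical firstVictim helper; the real fix lives in default_preemption.go, not under these names:

```go
package main

import "fmt"

// victims is a stand-in for extenderv1.Victims; Pods holds pod names here
// instead of *v1.Pod objects. Illustrative sketch only, not the actual fix.
type victims struct {
	Pods []string
}

// firstVictim returns the first victim and true, or ("", false) when the
// victim list is nil or empty, so callers can skip the candidate instead
// of panicking on an unconditional Pods[0] index.
func firstVictim(v *victims) (string, bool) {
	if v == nil || len(v.Pods) == 0 {
		return "", false
	}
	return v.Pods[0], true
}

func main() {
	if _, ok := firstVictim(&victims{}); !ok {
		fmt.Println("empty victim list skipped, no panic")
	}
	if name, ok := firstVictim(&victims{Pods: []string{"pod-a"}}); ok {
		fmt.Println("first victim:", name)
	}
}
```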
  2. panic: runtime error: invalid memory address or nil pointer dereference

452 latestStartTime := util.GetEarliestPodStartTime(nodesToVictims[minNodes2[0]])

The problem is that nodesToVictims[minNodes2[0]] sometimes has no entry and returns nil. Simply skipping it won't solve the problem: the scheduler will then reach either line 456 or line 341.

456        klog.Errorf("earliestStartTime is nil for node %s. Should not reach here.", minNodes2[0])

341 klog.Errorf("None candidate can be picked from %v.", candidates)

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x15706d7]
goroutine 1385 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
/Users/yuanchen/projects/aci/go/kube-scheduler/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:55 +0x109
panic(0x1b448c0, 0x2d281b0)
/usr/local/Cellar/go/1.16.2/libexec/src/runtime/panic.go:965 +0x1b9
k8s.io/kubernetes/pkg/scheduler/util.GetEarliestPodStartTime(0x0, 0xc004d4d9e0)
/Users/yuanchen/projects/aci/go/kube-scheduler/vendor/k8s.io/kubernetes/pkg/scheduler/util/utils.go:56 +0x37
k8s.io/kubernetes/pkg/scheduler/framework/plugins/defaultpreemption.pickOneNodeForPreemption(0xc004d4d9e0, 0x4, 0x4)
/Users/yuanchen/projects/aci/go/kube-scheduler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/plugins/defaultpreemption/default_preemption.go:452 +0x7a9
k8s.io/kubernetes/pkg/scheduler/framework/plugins/defaultpreemption.SelectCandidate(0xc0066e5180, 0x4, 0x4, 0xc0062f5000, 0x2068138)
/Users/yuanchen/projects/aci/go/kube-scheduler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/plugins/defaultpreemption/default_preemption.go:331 +0x85
k8s.io/kubernetes/pkg/scheduler/framework/plugins/defaultpreemption.(*DefaultPreemption).preempt(0xc000f9e880, 0x20677d0, 0xc004fa3340, 0xc0061b5290, 0xc0062f5000, 0xc0061b52c0, 0x2d80750, 0xc00603fc40, 0x7f6bcbc363f8, 0x30)
/Users/yuanchen/projects/aci/go/kube-scheduler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/plugins/defaultpreemption/default_preemption.go:135 +0x4bd
k8s.io/kubernetes/pkg/scheduler/framework/plugins/defaultpreemption.(*DefaultPreemption).PostFilter(0xc000f9e880, 0x20677d0, 0xc004fa3340, 0xc0061b5290, 0xc002523ac8, 0xc0061b52c0, 0x0, 0x0)
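A small sketch of this second failure mode: a Go map lookup for a missing node silently yields the zero value (a nil pointer), so a guard is needed before dereferencing. The victims type and earliestStartTime helper below are hypothetical stand-ins for the real extenderv1.Victims and util.GetEarliestPodStartTime:

```go
package main

import "fmt"

// victims is an illustrative stand-in for extenderv1.Victims.
type victims struct{ Pods []string }

// earliestStartTime is a hypothetical stand-in for
// util.GetEarliestPodStartTime: a nil or empty victims value must be
// rejected before dereferencing, since the caller may pass the result of
// a map lookup on a missing key.
func earliestStartTime(v *victims) (string, error) {
	if v == nil || len(v.Pods) == 0 {
		return "", fmt.Errorf("no victims: cannot compute earliest start time")
	}
	return v.Pods[0], nil // placeholder for the real min-start-time scan
}

func main() {
	nodesToVictims := map[string]*victims{
		"knode0019": {Pods: []string{"pod-a"}},
	}
	// Missing key: the lookup yields a nil *victims rather than failing.
	if _, err := earliestStartTime(nodesToVictims["knode0046"]); err != nil {
		fmt.Println("guarded:", err)
	}
}
```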

What you expected to happen:

The scheduler works without failures.

How to reproduce it (as minimally and precisely as possible):

When there are preemptions in a cluster.

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): v1.19.10 or v1.19.7
  • Cloud provider or hardware configuration: bare metal
  • OS (e.g: cat /etc/os-release): centos 7
  • Kernel (e.g. uname -a): Linux 5.4.77-7.el7pie #1 SMP Sat Nov 21 01:16:27 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:
@yuanchen8911 yuanchen8911 added the kind/bug Categorizes issue or PR as related to a bug. label Apr 28, 2021
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 28, 2021
@k8s-ci-robot
Contributor

@yuanchen8911: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@yuanchen8911
Member Author

/sig scheduling

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 28, 2021
@yuanchen8911
Member Author

/cc @Huang-Wei @ahg-g

@niulechuan

/assign

@Huang-Wei
Member

@yuanchen8911 Are you running the vanilla scheduler? or a customized one that leverages the utilities in default_preemption.go?

@yuanchen8911 yuanchen8911 changed the title kube-scheduler crashes and restarts with panics in default_preemption plugin kube-scheduler crashes and restarts with panics in DefaultPreemption plugin Apr 28, 2021
@yuanchen8911
Member Author

yuanchen8911 commented Apr 28, 2021

@Huang-Wei It's a custom scheduler with some internal out-of-tree plugins (just like /kubernetes-sigs/scheduler-plugins), but it doesn't have any custom postFilter plugins.

Which utility functions are you referring to? Thanks.

@ahg-g
Member

ahg-g commented Apr 28, 2021

are those custom plugins filter plugins? do they maintain state?

If yes, then one hypothesis is that those custom filters filter out the node in the filter phase but not when executed in the preemption phase, which would result in a candidate node with no victims.

@yuanchen8911
Member Author

Yes, it includes filter plugins and uses cycle state.

@ahg-g
Member

ahg-g commented Apr 28, 2021

you need to make sure that those filter plugins play nicely when executed again in the preemption phase in the same cycle: i.e., produce the same result.

@yuanchen8911
Member Author

@ahg-g Thanks! What do you mean by running the filters in the preemption phase? Would you mind elaborating a little?

@yuanchen8911
Member Author

What additional logic is needed to handle the preemption case?

@ahg-g
Member

ahg-g commented Apr 28, 2021

in the preemption phase we run the filters again to check whether removing lower-priority pods makes the node schedulable:

if fits, _, err := core.PodPassesFiltersOnNode(ctx, ph, state, pod, nodeInfo); !fits {

we later add them one by one to reduce the set of victim pods to the absolute minimum:

My hypothesis is that somehow your filter returns success when adding all the pods back, so you end up with a candidate node and zero victims. This shouldn't happen: if the pod fits the node without removing any victims, we shouldn't be running preemption in the first place. So the theory is that the custom filter returns false when run in the filter phase but true when executed in the preemption phase, perhaps because it makes some assumptions about cyclestate that make it behave this way (e.g. that the filter will be executed once per node in a scheduling cycle).
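The hypothesis above can be illustrated with a deliberately non-idempotent filter. countingFilter and its Filter method are hypothetical, not the real framework interfaces:

```go
package main

import "fmt"

// countingFilter illustrates the failure mode described above: a filter
// that assumes it runs once per node per scheduling cycle. The names and
// shape are hypothetical stand-ins, not the real scheduler framework.
type countingFilter struct {
	seen map[string]bool
}

// Filter rejects a node the first time it is evaluated and accepts it on
// re-evaluation - exactly the non-idempotent behavior that can yield a
// preemption candidate with zero victims.
func (f *countingFilter) Filter(node string) bool {
	if f.seen[node] {
		return true // second run (preemption phase) wrongly passes
	}
	f.seen[node] = true
	return false // first run (filter phase) fails
}

func main() {
	f := &countingFilter{seen: map[string]bool{}}
	fmt.Println("filter phase fits:", f.Filter("knode0046"))
	fmt.Println("preemption re-run fits:", f.Filter("knode0046"))
}
```

Because the node already failed filtering, the preemption pass then treats it as schedulable without removing anything, producing a candidate with an empty victim list.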

@yuanchen8911
Member Author

Disabled the filter plugin, but still see the same issues.

@ahg-g
Member

ahg-g commented Apr 28, 2021

are you able to change the scheduler code and run the test again? I can send a patch tomorrow to add some debugging messages to help us root cause the issue.

@yuanchen8911
Member Author

yes, thanks a lot! really appreciate it!

@yuanchen8911
Member Author

@ahg-g You are right. We still use a deprecated Predicate extender, which causes the problem. After disabling it, it's working fine. It used to work with 1.18 though. We are retiring it. Thank you so much!!!

@yuanchen8911
Member Author

If the test cases can help debug the scheduler extender webhook (Predicate) too, I'd like to try them. Thanks again!

@yuanchen8911
Member Author

yuanchen8911 commented Apr 28, 2021

How can we improve the preemption logic to prevent bugs and issues in custom Filters or Predicates crashing the scheduler?

@ahg-g
Member

ahg-g commented Apr 28, 2021

For starters, we should check the list length; the scheduler should not panic in any case.

Other than that, I think we need to clearly document what the preemption logic does (that it executes the prefilter extension points and the filter plugins multiple times) and the expectations for filter plugins (that they may get executed more than once in the same cycle).

@Huang-Wei
Member

We still use a deprecated Predicate extender, which causes the problem.

Is it a predicate extender or preemption extender? If it's a predicate extender, I don't quite think we found the root cause - b/c predicate extenders are not invoked during preemption (#86942 (comment)).

It makes more sense if it's a preemption extender - if the preemption extender mutates the candidates and makes them invalid. If that's the case, could you verify it by adding debug info before and after L157 to check the validity of candidates?

candidates, status = CallExtenders(pl.fh.Extenders(), pod, nodeLister, candidates)

@Huang-Wei
Member

the scheduler should not panic in all cases.

That's always true :)

@yuanchen8911
Member Author

filed a PR to mitigate it. #101560

@yuanchen8911
Member Author

It's a predicate extender. I'll debug it more.

We still use a deprecated Predicate extender, which causes the problem.

Is it a predicate extender or preemption extender? If it's a predicate extender, I don't quite think we found the root cause - b/c predicate extenders are not invoked during preemption (#86942 (comment)).

It makes more sense if it's a preemption extender - if the preemption extender mutates the candidates and makes them invalid. If that's the case, could you verify it by adding debug info before and after L157 to check the validity of candidates?

candidates, status = CallExtenders(pl.fh.Extenders(), pod, nodeLister, candidates)

@yuanchen8911
Member Author

yuanchen8911 commented May 1, 2021

Added some debug info after FindCandidates and CallExtenders. Here's the result.

FindCandidates returned 4 candidates, but one of them (candidates[3]) has zero victim pods. That candidate happened to be selected as the best candidate and hence caused the "index out of range" error. It should have nothing to do with CallExtenders, since our extender doesn't support preemption. Disabling it probably changed pod placement and candidate selection and bypassed the error; it was likely a coincidence.

To remove the panic, a simple fix is to check if candidate.Victims.Pods is empty. If it is, do not add the node to candidates.

func candidatesToVictimsMap(candidates []Candidate) map[string]*extenderv1.Victims {
    m := make(map[string]*extenderv1.Victims)
    for _, c := range candidates {
        if len(c.Victims().Pods) != 0 {
            m[c.Name()] = c.Victims()
        }
    }
    return m
}

Alternatively, a change to dryRunPreemption may work too.

pods, numPDBViolations, fits := selectVictimsOnNode(ctx, fh, stateCopy, pod, nodeInfoCopy, pdbs)
if fits && len(pods) != 0 {

Here is the log with debug info.

12360 default_preemption.go:124] After FindCandidates
default_preemption.go:125] candidates length is  4
default_preemption.go:127] candidates[0].Name  = knode0019  victims Pods = 1
default_preemption.go:127] candidates[1].Name  = knode0016   victims Pods = 1
default_preemption.go:127] candidates[2].Name  = knode0024  victims Pods = 1
default_preemption.go:127] candidates[3].Name  = knode0046  victims Pods = 0
default_preemption.go:137] After CallExtenders
default_preemption.go:138] candidates length is  4
default_preemption.go:140] candidates[0].Name  = knode0019 victims Pods  = 1
default_preemption.go:140] candidates[1].Name  = knode0016   victims Pods = 1
default_preemption.go:140] candidates[2].Name  = knode0024  victims Pods = 1
default_preemption.go:140] candidates[3].Name  = knode0046  victims Pods = 0
runtime.go:78] Observed a panic: runtime.boundsError{x:0, y:0, signed:true, code:0x0} (runtime error: index out of range [0] with length 0)
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x1ca0de0, 0xc009afa5a0)
 /Users/yuanchen/projects/aci/go/kube-scheduler/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0x95
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
/Users/yuanchen/projects/aci/go/kube-scheduler/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x86
panic(0x1ca0de0, 0xc009afa5a0)
/usr/local/Cellar/go/1.16.2/libexec/src/runtime/panic.go:965 +0x1b9
k8s.io/kubernetes/pkg/scheduler/framework/plugins/defaultpreemption.pickOneNodeForPreemption(0xc005450f30, 0x4, 0x4)
/Users/yuanchen/projects/aci/go/kube-scheduler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/plugins/defaultpreemption/default_preemption.go:407 +0xcac
k8s.io/kubernetes/pkg/scheduler/framework/plugins/defaultpreemption.SelectCandidate(0xc007276100, 0x4, 0x4, 0x0, 0x0)
 /Users/yuanchen/projects/aci/go/kube-scheduler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/plugins/defaultpreemption/default_preemption.go:349 +0x85
k8s.io/kubernetes/pkg/scheduler/framework/plugins/defaultpreemption.(*DefaultPreemption).preempt(0xc000ddbfa0, 0x2068990, 0xc006654d80, 0xc008f9e600, 0xc008312800, 0xc008f9eb70, 0x2d82750, 0xc000fda340, 0x7f6faeb12760, 0x30)
/Users/yuanchen/projects/aci/go/kube-scheduler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/plugins/defaultpreemption/default_preemption.go:150 +0xbd9
k8s.io/kubernetes/pkg/scheduler/framework/plugins/defaultpreemption.(*DefaultPreemption).PostFilter(0xc000ddbfa0, 0x2068990, 0xc006654d80, 0xc008f9e600, 0xc006aa4000, 0xc008f9eb70, 0x0, 0x0)
/Users/yuanchen/projects/aci/go/kube-scheduler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/plugins/defaultpreemption/default_preemption.go:83 +0xf1
k8s.io/kubernetes/pkg/scheduler/framework/runtime.(*frameworkImpl).runPostFilterPlugin(0xc0001b1040, 0x2068990, 0xc006654d80, 0x7f6faf522238, 0xc000ddbfa0, 0xc008f9e600, 0xc006aa4000, 0xc008f9eb70, 0xc00198fab0, 0x5f5e100)
created by k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
/Users/yuanchen/projects/aci/go/kube-scheduler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:208 +0x11b
kube-scheduler.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Unit kube-scheduler.service entered failed state.
kube-scheduler.service failed.
kube-scheduler.service holdoff time over, scheduling restart.

@Huang-Wei
Member

Alternatively, a change to dryRunPreemption may work too.

pods, numPDBViolations, fits := selectVictimsOnNode(ctx, fh, stateCopy, pod, nodeInfoCopy, pdbs)
if fits && len(pods) != 0 {

This looks more promising.


Back to digging into the root cause: it seems selectVictimsOnNode returns empty pods with fits=true. That implies we first have a non-nil potentialVictims; then, in filterPodsWithPDBViolation, the potentialVictims are separated into two lists, violatingVictims and nonViolatingVictims. Finally, the two victim lists are tried one by one through reprievePod. It looks like reprievePod returns {fits=true, nil error} for every victim, which is quite abnormal; every pod being reprievable is the problematic behavior we should debug into.

@yuanchen8911
Member Author

yuanchen8911 commented May 1, 2021

Alternatively, a change to dryRunPreemption may work too.

pods, numPDBViolations, fits := selectVictimsOnNode(ctx, fh, stateCopy, pod, nodeInfoCopy, pdbs)
if fits && len(pods) != 0 {

This looks more promising.

Back to digging into the root cause: it seems selectVictimsOnNode returns empty pods with fits=true. That implies we first have a non-nil potentialVictims; then, in filterPodsWithPDBViolation, the potentialVictims are separated into two lists, violatingVictims and nonViolatingVictims. Finally, the two victim lists are tried one by one through reprievePod. It looks like reprievePod returns {fits=true, nil error} for every victim, which is quite abnormal; every pod being reprievable is the problematic behavior we should debug into.

Yes, selectVictimsOnNode should not return fits=true with empty pods in the first place, but it's hard to locate the problematic code in this function.

@yuanchen8911
Member Author

yuanchen8911 commented May 1, 2021

The following code in CallExtenders looks questionable to me. If victimsMap is empty, should the function return nil, nil?

victimsMap := candidatesToVictimsMap(candidates)
   if len(victimsMap) == 0 {
       return candidates, nil
   }

Also, if !extender.SupportsPreemption() || !extender.IsInterested(pod) is true for all extenders, the loop won't do anything and candidates will remain the same. Why do we bother creating a new candidates slice later? Can it just return candidates?
@Huang-Wei @ahg-g?

@yuanchen8911
Member Author

yuanchen8911 commented May 1, 2021

Alternatively, a change to dryRunPreemption may work too.

pods, numPDBViolations, fits := selectVictimsOnNode(ctx, fh, stateCopy, pod, nodeInfoCopy, pdbs)
if fits && len(pods) != 0 {

This looks more promising.

It can prevent FindCandidates from returning candidates with empty pods, but what about CallExtenders? It's possible to add nodes without victim pods to candidates if the nodeNameToVictims returned by ProcessPreemption contains such nodes.
nodeNameToVictims, err := extender.ProcessPreemption(pod, victimsMap, nodeLister)

How about adding the following check to CallExtenders?

for nodeName := range victimsMap {
    // Check if victims.Pods is empty.
    victims := victimsMap[nodeName]
    if len(victims.Pods) == 0 {
        klog.Errorf("no pods in victim node %s. Should not reach here.", nodeName)
        continue
    }
}

@yuanchen8911
Member Author

Updated the PR based on the finding and discussions. #101560

@Huang-Wei
Member

Why do we bother creating a new candidates later? Can it just return candidates?

Yes, we can use a flag to mark if all extenders don't support preemption. If yes, simply return the candidates.
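That flag-based early return could be sketched like this; the extender interface and anyRelevant helper below are illustrative stand-ins for the framework's Extender methods, not the actual implementation:

```go
package main

import "fmt"

// extender is a minimal stand-in for the framework's Extender interface;
// only the two methods relevant to this discussion are modeled.
type extender interface {
	SupportsPreemption() bool
	IsInterested(pod string) bool
}

type stubExtender struct{ preemption, interested bool }

func (s stubExtender) SupportsPreemption() bool     { return s.preemption }
func (s stubExtender) IsInterested(pod string) bool { return s.interested }

// anyRelevant reports whether any extender would actually participate in
// preemption; if none would, CallExtenders can return the candidates
// unchanged instead of rebuilding an identical slice.
func anyRelevant(extenders []extender, pod string) bool {
	for _, e := range extenders {
		if e.SupportsPreemption() && e.IsInterested(pod) {
			return true
		}
	}
	return false
}

func main() {
	exts := []extender{stubExtender{preemption: false, interested: true}}
	if !anyRelevant(exts, "pod-a") {
		fmt.Println("no relevant extenders; return candidates as-is")
	}
}
```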

@Huang-Wei
Member

It's possible to add nodes without victim pods to candidates if nodeNameToVictims returned by ProcessPreemption has such nodes.

Technically this can happen due to a faulty extender implementation. If we really want to guard against it, instead of logging the error and continuing, I'm more inclined to return the error immediately, as this is a fatal error: the victimsMap cannot be used by later extenders or for the subsequent preemptor nomination.

@yuanchen8911
Member Author

@Huang-Wei Is my understanding correct?

In CallExtenders, if victimsMap is empty, should it return nil, nil instead of candidates, nil?

victimsMap := candidatesToVictimsMap(candidates)
   if len(victimsMap) == 0 {
       return candidates, nil
   }

@Huang-Wei
Member

The only case where victimsMap can be empty is empty candidates, so either return works the same, right?

@yuanchen8911
Member Author

yuanchen8911 commented May 3, 2021

The only case where victimsMap can be empty is empty candidates, so either return works the same, right?

You are right.

@yuanchen8911
Member Author

It's possible to add nodes without victim pods to candidates if nodeNameToVictims returned by ProcessPreemption has such nodes.

Technically this can happen due to a faulty extender implementation. If we really want to guard against it, instead of logging the error and continuing, I'm more inclined to return the error immediately, as this is a fatal error: the victimsMap cannot be used by later extenders or for the subsequent preemptor nomination.

As long as one extender returns an invalid victimsMap (with empty pods), CallExtenders returns an error immediately?

What about selectVictimsOnNode? Should it return an error immediately if it returns status.IsSuccess() with empty pods?

@Huang-Wei
Member

As long as one extender returns an invalid victimsMap (with empty pods), CallExtenders returns an error immediately?

I think so. Because the result it returned is faulty, we don't want to continue based on it.

What about selectVictimsOnNode? Should it return an error immediately if it returns status.IsSuccess() with empty pods?

The same. We don't want to continue the scheduling cycle based on a faulty result.
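The fail-fast behavior agreed on here can be sketched as a validation pass. validateVictimsMap and the victims type are hypothetical stand-ins, not the shape the PR actually took:

```go
package main

import "fmt"

// victims is an illustrative stand-in for extenderv1.Victims.
type victims struct{ Pods []string }

// validateVictimsMap treats an entry with zero victim pods as a fatal
// error rather than logging and continuing, since later extenders and
// preemptor nomination cannot safely use a faulty map.
func validateVictimsMap(m map[string]*victims) error {
	for node, v := range m {
		if v == nil || len(v.Pods) == 0 {
			return fmt.Errorf("invalid victims map: node %s has no victim pods", node)
		}
	}
	return nil
}

func main() {
	m := map[string]*victims{"knode0046": {}}
	if err := validateVictimsMap(m); err != nil {
		fmt.Println("abort preemption:", err)
	}
}
```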
