kube-scheduler crashes and restarts with panics in DefaultPreemption plugin #101548
Comments
@yuanchen8911: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/sig scheduling
/cc @Huang-Wei @ahg-g
/assign
@yuanchen8911 Are you running the vanilla scheduler, or a customized one that leverages the utilities in default_preemption.go?
@Huang-Wei It's a custom scheduler with some internal out-of-tree plugins (just like …). Which utility functions are you referring to? Thanks.
Are those custom plugins Filter plugins? Do they maintain state? If yes, one hypothesis is that those custom filters filter out the node in the Filter phase but not when executed in the preemption phase, which would result in a candidate node with no victims.
Yes, it includes filter plugins and uses cycle state.
You need to make sure that those filter plugins play nicely when executed again in the preemption phase in the same cycle, i.e., produce the same result.
@ahg-g Thanks. What do you mean by performing filtering in the preemption phase? Would you mind elaborating a little?
What additional logic is needed to handle the preemption case?
In the preemption phase we run the filters again to check whether removing lower-priority pods makes the node schedulable: kubernetes/pkg/scheduler/framework/plugins/defaultpreemption/default_preemption.go, line 543 in d9839a3.
We later add them back one by one to reduce the set of victim pods to the absolute minimum: kubernetes/pkg/scheduler/framework/plugins/defaultpreemption/default_preemption.go, line 558 in d9839a3.
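For illustration, here is a heavily simplified sketch of that two-pass victim selection (remove all lower-priority pods, verify the preemptor fits, then add pods back one by one and keep only those whose re-addition breaks the fit). The Pod type, fits helper, and selectVictims function below are stand-ins invented for this sketch, not the actual default_preemption.go code:

```go
package main

import (
	"fmt"
	"sort"
)

// Simplified stand-in for the scheduler's pod representation.
type Pod struct {
	Name     string
	Priority int
}

// fits stands in for "run all Filter plugins against the node's simulated
// state"; in this toy model a node fits at most two pods.
func fits(simulated []Pod) bool { return len(simulated) <= 2 }

// selectVictims mimics the two passes described above:
//  1. remove all lower-priority pods and check that the preemptor now fits;
//  2. add them back one by one (highest priority first) and keep as victims
//     only the pods whose re-addition breaks the fit.
func selectVictims(preemptor Pod, nodePods []Pod) ([]Pod, bool) {
	var removable, remaining []Pod
	for _, p := range nodePods {
		if p.Priority < preemptor.Priority {
			removable = append(removable, p)
		} else {
			remaining = append(remaining, p)
		}
	}
	simulated := append(append([]Pod{}, remaining...), preemptor)
	if !fits(simulated) {
		return nil, false // not a candidate node even with all victims removed
	}
	// Add back one by one, highest priority first.
	sort.Slice(removable, func(i, j int) bool { return removable[i].Priority > removable[j].Priority })
	var victims []Pod
	for _, p := range removable {
		trial := append(append([]Pod{}, simulated...), p)
		if fits(trial) {
			simulated = trial // the pod can stay; it is not a victim
		} else {
			victims = append(victims, p) // must be evicted for the preemptor to fit
		}
	}
	return victims, true
}

func main() {
	node := []Pod{{"a", 1}, {"b", 2}, {"c", 10}}
	victims, ok := selectVictims(Pod{"preemptor", 5}, node)
	fmt.Println(ok, victims)
}
```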
My hypothesis is that somehow your filter returns success when adding all the pods back, and so you end up with zero victims on a candidate node. This shouldn't happen, because if the pod fits the node without removing any victims, then we shouldn't be running preemption in the first place. So the theory is that the custom filter returns false when run in the filter phase, but returns true when executed in the preemption phase, perhaps because it makes some assumptions about CycleState that make it behave this way (e.g., that the filter will be executed once per node in a scheduling cycle).
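To make that hypothesis concrete, here is a toy sketch of a filter that wrongly assumes it runs at most once per node per cycle. CycleState and badFilter are invented stand-ins for this sketch, not the framework's real types:

```go
package main

import "fmt"

// A toy CycleState: a map shared across plugin calls within one scheduling cycle.
type CycleState map[string]bool

// badFilter flips a "seen" flag in the cycle state on its first call for a
// node. The second invocation for the same node (which happens during the
// preemption dry run) takes the stale-state branch and passes, even though
// the node is actually infeasible.
func badFilter(state CycleState, node string) bool {
	if state["seen/"+node] {
		return true // wrong: stale state short-circuits the real check
	}
	state["seen/"+node] = true
	return false // the real check: the node is infeasible
}

func main() {
	state := CycleState{}
	fmt.Println("Filter phase:    ", badFilter(state, "node-1")) // false
	fmt.Println("Preemption phase:", badFilter(state, "node-1")) // true -> zero-victim candidate
}
```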
Disabled the filter plugin, but still seeing the same issue.
Are you able to change the scheduler code and run the test again? I can send a patch tomorrow to add some debugging messages to help us root-cause the issue.
Yes, thanks a lot! Really appreciate it!
@ahg-g You are right. We still use a deprecated Predicate extender, which causes the problem. After disabling it, it's working fine. It used to work with 1.18, though. We are retiring it. Thank you so much!!!
If the test cases can help debug the scheduler extender webhook (Predicate) too, I'd like to try it. Thanks again!
How can we improve the preemption logic so that bugs in custom Filters or Predicates don't crash the scheduler?
For starters, we should check the list length; the scheduler should not panic in any case. Other than that, I think we need to clearly document what the preemption logic does (that it executes the PreFilter extension points and the Filter plugins multiple times) and the expectations for filter plugins (that they may get executed more than once in the same cycle).
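As a rough illustration of the kind of length/nil checks being suggested here (the types and function names below are simplified stand-ins, not the scheduler's actual node-picking code):

```go
package main

import (
	"fmt"
	"time"
)

// Simplified stand-in for the framework's victim bookkeeping.
type Victims struct{ Pods []string }

// earliestStartTime is a toy stand-in for util.GetEarliestPodStartTime; per
// the discussion above, the real helper can yield nil when the victim list
// is empty, and the caller must guard against that instead of dereferencing
// the result blindly.
func earliestStartTime(v *Victims) *time.Time {
	if v == nil || len(v.Pods) == 0 {
		return nil
	}
	t := time.Now()
	return &t
}

// pickNode sketches the defensive pattern: validate the candidate list and
// the looked-up victims before using them, and return an error instead of
// panicking.
func pickNode(nodesToVictims map[string]*Victims, candidates []string) (string, error) {
	if len(candidates) == 0 {
		return "", fmt.Errorf("no candidate nodes to pick from")
	}
	best := candidates[0]
	if earliestStartTime(nodesToVictims[best]) == nil {
		return "", fmt.Errorf("candidate %q has no victims; refusing to pick it", best)
	}
	return best, nil
}

func main() {
	_, err := pickNode(map[string]*Victims{"node-1": {}}, []string{"node-1"})
	fmt.Println(err) // an error instead of a nil-pointer panic
}
```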
Is it a predicate extender or a preemption extender? If it's a predicate extender, I don't quite think we've found the root cause, b/c predicate extenders are not invoked during preemption (#86942 (comment)). It makes more sense if it's a preemption extender, i.e., if the preemption extender mutates the … (kubernetes/pkg/scheduler/framework/plugins/defaultpreemption/default_preemption.go, line 157 in c6e6507).
That's always true :) |
Filed a PR to mitigate it: #101560
It's a predicate extender. I'll debug it more.
Added some debug info after …
To remove the panic, a simple fix is to check if …
Alternatively, a change to …
Here is the log with debug info.
This looks more promising. Back to digging into the root cause, it seems …
Yes, …
The following code in …
Also, if …
It can prevent … How about adding the following check to CallExtender?
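As a rough sketch of the kind of guard being discussed here (skip extenders that don't support preemption, and treat a returned candidate with no victims as an error), with the types and names below being simplified stand-ins rather than the scheduler's real extender API:

```go
package main

import "fmt"

// Simplified stand-ins; the real types live in the scheduler framework and
// extender packages.
type Victims struct{ Pods []string }

type Extender interface {
	SupportsPreemption() bool
	ProcessPreemption(map[string]*Victims) (map[string]*Victims, error)
}

// callExtenders sketches the proposed guard: extenders that don't participate
// in preemption are skipped, and a result containing a node with no victims
// is treated as fatal rather than being carried forward.
func callExtenders(extenders []Extender, victimsMap map[string]*Victims) (map[string]*Victims, error) {
	for _, e := range extenders {
		if !e.SupportsPreemption() {
			continue // predicate-only extenders are not consulted here
		}
		out, err := e.ProcessPreemption(victimsMap)
		if err != nil {
			return nil, err
		}
		for node, v := range out {
			if v == nil || len(v.Pods) == 0 {
				return nil, fmt.Errorf("extender returned candidate node %q with no victims", node)
			}
		}
		victimsMap = out
	}
	return victimsMap, nil
}

func main() {
	m, err := callExtenders(nil, map[string]*Victims{"node-1": {Pods: []string{"p"}}})
	fmt.Println(m["node-1"].Pods, err)
}
```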
Updated the PR based on the findings and discussion: #101560
Yes, we can use a flag to mark whether none of the extenders support preemption; if so, simply return the candidates.
Technically this can exist due to a faulty extender implementation. If we really want to guard against it, instead of logging the error and continuing, I'm more inclined to return the error immediately, as this is a fatal error: the victimsMap cannot be used either by a later extender or for further preemptor nominating.
@Huang-Wei Is my understanding correct? In …
The only case where victimsMap can be empty is when the candidates are empty, so either return doesn't quite matter, right?
You are right. |
As long as one extender returns an invalid victimsMap (with empty pods), … What about …?
I think so. Because the result it returned is faulty, we don't want to continue based on it.
The same. We don't want to continue the scheduling cycle based on a faulty result.
What happened:
Kubernetes 1.19 (confirmed with 1.19.7 and 1.19.10)
kube-scheduler crashes and restarts with panics in default_preemption.go. The problem code is line 389: when victims is nil or victims.Pods is empty, the error happens. If we skip GetPodPriority when it's nil or empty, the error is gone.

452 latestStartTime := util.GetEarliestPodStartTime(nodesToVictims[minNodes2[0]])

The problem is that nodesToVictims[minNodes2[0]] sometimes does not exist and returns nil. Simply skipping it won't solve the problem; the scheduler will reach either line 456 or line 341.

341 klog.Errorf("None candidate can be picked from %v.", candidates)
What you expected to happen:
The scheduler works without failures.
How to reproduce it (as minimally and precisely as possible):
When there are preemptions in a cluster.
Anything else we need to know?:
Environment:
- Kubernetes version (kubectl version): v1.19.10 or v1.19.7
- OS (cat /etc/os-release): CentOS 7
- Kernel (uname -a): Linux 5.4.77-7.el7pie #1 SMP Sat Nov 21 01:16:27 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux