
Pods not being evenly scheduled across worker nodes #105220

Closed
rsevilla87 opened this issue Sep 23, 2021 · 31 comments · Fixed by #105845
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@rsevilla87

What happened:

After a straightforward scale test consisting of creating several hundred standalone pods (sleep) on a small cluster (9 worker nodes), I realized that the pods are not evenly scheduled across the nodes.

The test was executed without any LimitRange, and the created pods don't have any resource requests either.

What you expected to happen:

Pods are evenly spread across all worker nodes.

How to reproduce it (as minimally and precisely as possible):

Number of pods in nodes before executing the test:

$ kubectl get nodes
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-134-116.eu-west-3.compute.internal   Ready    master   10d     v1.22.0-rc.0+75ee307
ip-10-0-139-16.eu-west-3.compute.internal    Ready    worker   7h48m   v1.22.0-rc.0+75ee307
ip-10-0-146-3.eu-west-3.compute.internal     Ready    worker   7h48m   v1.22.0-rc.0+75ee307
ip-10-0-156-89.eu-west-3.compute.internal    Ready    worker   10d     v1.22.0-rc.0+75ee307
ip-10-0-168-121.eu-west-3.compute.internal   Ready    worker   7h48m   v1.22.0-rc.0+75ee307
ip-10-0-182-174.eu-west-3.compute.internal   Ready    worker   7h47m   v1.22.0-rc.0+75ee307
ip-10-0-187-122.eu-west-3.compute.internal   Ready    worker   10d     v1.22.0-rc.0+75ee307
ip-10-0-187-21.eu-west-3.compute.internal    Ready    master   10d     v1.22.0-rc.0+75ee307
ip-10-0-199-68.eu-west-3.compute.internal    Ready    worker   3d10h   v1.22.0-rc.0+75ee307
ip-10-0-210-1.eu-west-3.compute.internal     Ready    worker   7h48m   v1.22.0-rc.0+75ee307
ip-10-0-218-198.eu-west-3.compute.internal   Ready    worker   7h47m   v1.22.0-rc.0+75ee307
ip-10-0-223-121.eu-west-3.compute.internal   Ready    master   10d     v1.22.0-rc.0+75ee307
$ kubectl get pods -o go-template --template='{{range .items}}{{if eq .status.phase "Running"}}{{.spec.nodeName}}{{"\n"}}{{end}}{{end}}' --all-namespaces | awk '{nodes[$1]++ }END{ for (n in nodes) print n": "nodes[n]}'
ip-10-0-187-21.eu-west-3.compute.internal: 59 <- master node not schedulable
ip-10-0-139-16.eu-west-3.compute.internal: 23
ip-10-0-210-1.eu-west-3.compute.internal: 15
ip-10-0-146-3.eu-west-3.compute.internal: 14
ip-10-0-156-89.eu-west-3.compute.internal: 17
ip-10-0-134-116.eu-west-3.compute.internal: 35 <- master node not schedulable
ip-10-0-218-198.eu-west-3.compute.internal: 15
ip-10-0-168-121.eu-west-3.compute.internal: 14
ip-10-0-182-174.eu-west-3.compute.internal: 14
ip-10-0-199-68.eu-west-3.compute.internal: 15
ip-10-0-223-121.eu-west-3.compute.internal: 32 <- master node not schedulable
ip-10-0-187-122.eu-west-3.compute.internal: 24  

Create 1000 pods:
for i in {1..1000}; do kubectl run --image=k8s.gcr.io/pause sleep-${i}; done

Check Running pods per node:

$ kubectl get pods -o go-template --template='{{range .items}}{{if eq .status.phase "Running"}}{{.spec.nodeName}}{{"\n"}}{{end}}{{end}}' --all-namespaces | awk '{nodes[$1]++ }END{ for (n in nodes) print n": "nodes[n]}'
ip-10-0-187-21.eu-west-3.compute.internal: 59 <- master node not schedulable
ip-10-0-139-16.eu-west-3.compute.internal: 224
ip-10-0-210-1.eu-west-3.compute.internal: 78
ip-10-0-146-3.eu-west-3.compute.internal: 71
ip-10-0-156-89.eu-west-3.compute.internal: 250
ip-10-0-134-116.eu-west-3.compute.internal: 35 <- master node not schedulable
ip-10-0-218-198.eu-west-3.compute.internal: 76
ip-10-0-168-121.eu-west-3.compute.internal: 71
ip-10-0-182-174.eu-west-3.compute.internal: 75
ip-10-0-199-68.eu-west-3.compute.internal: 56
ip-10-0-223-121.eu-west-3.compute.internal: 32 <- master node not schedulable
ip-10-0-187-122.eu-west-3.compute.internal: 250

As shown above, some nodes ran out of room to run more pods (max-pods is set to 250) while there are other nodes with far fewer pods.

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.5", GitCommit:"6b1d87acf3c8253c123756b9e61dac642678305f", GitTreeState:"archive", BuildDate:"2021-03-30T00:00:00Z", GoVersion:"go1.16", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22+", GitVersion:"v1.22.0-rc.0+75ee307", GitCommit:"75ee3073266f07baaba5db004cde0636425737cf", GitTreeState:"clean", BuildDate:"2021-09-04T12:16:28Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
AWS using m5.xlarge worker nodes
@rsevilla87 rsevilla87 added the kind/bug Categorizes issue or PR as related to a bug. label Sep 23, 2021
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 23, 2021
@rsevilla87
Author

/sig scheduling

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 23, 2021
@rsevilla87
Author

rsevilla87 commented Sep 23, 2021

cc: @alculquicondor @damemi @jtaleric

@rsevilla87
Author

rsevilla87 commented Sep 23, 2021

I also realized that if I create a deployment with 1000 replicas, pods are evenly distributed:

$ kubectl create  deployment --replicas=1000 --image=k8s.gcr.io/pause sleep
$ kubectl get pods -o go-template --template='{{range .items}}{{if eq .status.phase "Running"}}{{.spec.nodeName}}{{"\n"}}{{end}}{{end}}' --all-namespaces | awk '{nodes[$1]++ }END{ for (n in nodes) print n": "nodes[n]}'
ip-10-0-187-21.eu-west-3.compute.internal: 59
ip-10-0-139-16.eu-west-3.compute.internal: 134
ip-10-0-210-1.eu-west-3.compute.internal: 126
ip-10-0-146-3.eu-west-3.compute.internal: 125
ip-10-0-156-89.eu-west-3.compute.internal: 128
ip-10-0-134-116.eu-west-3.compute.internal: 35
ip-10-0-218-198.eu-west-3.compute.internal: 126
ip-10-0-168-121.eu-west-3.compute.internal: 125
ip-10-0-182-174.eu-west-3.compute.internal: 125
ip-10-0-199-68.eu-west-3.compute.internal: 127
ip-10-0-223-121.eu-west-3.compute.internal: 32
ip-10-0-187-122.eu-west-3.compute.internal: 135

@alculquicondor
Member

alculquicondor commented Sep 24, 2021

A recap of the conclusions we already have:

This regressed with #102925, which changed the NodeResourcesBalancedAllocation and NodeResourcesMostAllocated scores. However, the fix does what it intended: ensure that a node is not considered underutilized when using NodeResourcesMostAllocated. Of course, this causes the opposite problem when using NodeResourcesLeastAllocated.

I also realized that if I create a deployment with 1000 replicas, pods are evenly distributed:

Ah, that's good to have confirmed. Pods within a deployment have an extra spreading score. It looks like the score that NodeResourcesBalancedAllocation provides is not as strong as the spreading score 🥳. This is probably thanks to #101946.

Discussing solutions:

  • We should do a partial revert of Fix Node Resources plugins score when there are pods with no requests #102925 in 1.20 and 1.21, as they don't have Support extended resource in NodeResourcesBalancedAllocation plugin #101946. By partial I mean that we should leave the fix on NodeResourcesMostAllocated.
  • For 1.22 and beyond, I think we should reduce the "default requests" that the scheduler implicitly adds when scoring. They are arguably too big. I would suggest 10 to 20% of the current numbers:
    // DefaultMilliCPURequest defines default milli cpu request number.
    DefaultMilliCPURequest int64 = 100 // 0.1 core
    // DefaultMemoryRequest defines default memory request size.
    DefaultMemoryRequest int64 = 200 * 1024 * 1024 // 200 MB

    This should reduce the chances of the scheduler estimating 100% resources allocated. In reality, most production clusters have characteristics that your test cluster probably doesn't have: bigger nodes and a smaller number of pods per node.
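
For illustration, here is a minimal sketch of how such implicit defaults affect scoring when a pod declares no requests. The helper below is hypothetical and only mirrors the idea, not the scheduler's actual code:

package main

import "fmt"

// Illustrative copies of the defaults quoted above.
const (
	defaultMilliCPURequest int64 = 100               // 0.1 core
	defaultMemoryRequest   int64 = 200 * 1024 * 1024 // 200 MB
)

// nonZeroRequests is a hypothetical helper: when a pod declares no CPU or
// memory request, score it as if it had requested the defaults above.
func nonZeroRequests(declaredMilliCPU, declaredMemoryBytes int64) (int64, int64) {
	cpu, mem := declaredMilliCPU, declaredMemoryBytes
	if cpu == 0 {
		cpu = defaultMilliCPURequest
	}
	if mem == 0 {
		mem = defaultMemoryRequest
	}
	return cpu, mem
}

func main() {
	// A best-effort pod (no requests) is scored as 100m CPU and 200 MB of memory.
	cpu, mem := nonZeroRequests(0, 0)
	fmt.Printf("scored as %dm CPU, %d bytes of memory\n", cpu, mem)
}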

@alculquicondor
Member

cc @ahg-g @Huang-Wei

@ahg-g
Member

ahg-g commented Sep 24, 2021

Is there any negative impact on reverting the changes made to the balanced plugin? Looking at #102925, I agree that we shouldn't have made the change for the balanced plugin, but I'm just wondering what the rationale was.

This should reduce the chances of the scheduler estimating 100% resources allocated. In reality, most production clusters have characteristics that your test cluster probably doesn't have: bigger nodes and a smaller number of pods per node.

Another thought: adding pod count to the balanced resource calculation in addition to cpu and memory?

@alculquicondor
Member

alculquicondor commented Sep 24, 2021

Is there any negative impact on reverting the changes made to the balanced plugin?

The node would get a score of 0 for NodeResourcesBalancedAllocation which will hurt utilization when trying to bin-pack.

Maybe reverting is a valid solution too, but only if we reduce the non-zero requests. This reduces the chances of nodes getting the zero score.

Another thought: adding pod count to the balanced resource calculation in addition to cpu and memory?

I don't think so. It's hard for users to estimate how many pods they would fit in a node. This might lead to more undesired behaviors.

@ahg-g
Member

ahg-g commented Sep 24, 2021

Reducing the non-zero requests sgtm. But perhaps the other question is how much impact balanced allocation should have compared to least/most allocated. I also feel we are not using the scoring weights enough to solve these types of issues.

I don't think so. It's hard for users to estimate how many pods they would fit in a node. This might lead to more undesired behaviors.

Each node already defines the max number of pods, and each pod consumes 1, so there is nothing new to be estimated. But yeah, since each pod has a fixed request of 1, we are basically scoring on the node's max pod limit, which is usually fixed for all nodes.

@alculquicondor
Member

Each node already defines the max number of pods, and each pod consumes 1

What I'm saying is that most users probably don't optimize the number of pods to tailor their workloads.

But perhaps the other question is how much impact should balanced allocation have compared to least/most allocated.

At least it doesn't seem to be too strong of a signal compared to Spreading. Note that after #101946, it tops at 50.
I think the problem in the scenario above is that all nodes already have a big number of pods (at least 56, for non-masters). With the current defaults, this is equivalent to 5.6 CPUs, which is likely greater than the allocatable CPU (@rsevilla87, please confirm).

If that's the case, all nodes had 100% utilization, thus 0 score for NodeResourcesLeastAllocated. Then the fact that this was behaving as badly in 1.22 as in previous versions without #101946 makes more sense. Essentially there is only one score at play, NodeResourcesBalancedAllocation.

Then, reducing the non-zero request is a win-win-win for the three scores :)
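
To make the interplay concrete, here is a rough sketch of the two scores in a simplified two-resource form (not the actual plugin implementations, which differ in details such as resource weights and clamping), fed with the 100m/200MB defaults and the per-node pod counts quoted in this issue. Once the implied CPU request exceeds the allocatable CPU, the CPU term of NodeResourcesLeastAllocated bottoms out and the balanced score dominates:

package main

import (
	"fmt"
	"math"
)

const maxNodeScore = 100.0

// clamp caps a requested/allocatable fraction at 1.0.
func clamp(f float64) float64 {
	if f > 1 {
		return 1
	}
	return f
}

// leastAllocatedScore favors nodes with more free resources (simplified:
// average of the free CPU and memory fractions).
func leastAllocatedScore(cpuFrac, memFrac float64) float64 {
	return ((1 - clamp(cpuFrac)) + (1 - clamp(memFrac))) / 2 * maxNodeScore
}

// balancedAllocationScore favors nodes whose CPU and memory fractions are
// close to each other (simplified two-resource form).
func balancedAllocationScore(cpuFrac, memFrac float64) float64 {
	return (1 - math.Abs(clamp(cpuFrac)-clamp(memFrac))) * maxNodeScore
}

func main() {
	// 56 best-effort pods scored with the 100m / 200MB defaults on a node
	// with 3500m CPU and 14783292Ki memory allocatable (values from this issue).
	cpuFrac := 56.0 * 100 / 3500                              // ~1.6, clamped to 1.0
	memFrac := 56.0 * 200 * 1024 * 1024 / (14783292.0 * 1024) // ~0.78
	fmt.Printf("LeastAllocated:     %.1f\n", leastAllocatedScore(cpuFrac, memFrac))
	fmt.Printf("BalancedAllocation: %.1f\n", balancedAllocationScore(cpuFrac, memFrac))
}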

@Huang-Wei
Member

I'm fine with reducing the default requests, esp. the memory value.

Another idea is to set the non-zero requests dynamically. For example, suppose the initial default request for a resource is M. As time goes on, when the number of best-effort pods on a node reaches a number N, make the default value M/2. When the number of best-effort pods drops below N, the value gets restored to M.

Regarding the type of resources, we may apply the dynamics to non-compressible resources (memory) only.
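
As a purely illustrative sketch of that idea: the threshold N, the halving rule, and every name below are hypothetical, and nothing like this exists in the scheduler today.

package main

import "fmt"

const (
	defaultMemoryRequest   int64 = 200 * 1024 * 1024 // M: initial default, 200 MB
	bestEffortPodThreshold       = 50                // N: hypothetical per-node threshold
)

// dynamicDefaultMemory illustrates the proposal: once a node already hosts N
// best-effort pods, halve the implicit memory request used for scoring; below
// N, use the full default M.
func dynamicDefaultMemory(bestEffortPodsOnNode int) int64 {
	if bestEffortPodsOnNode >= bestEffortPodThreshold {
		return defaultMemoryRequest / 2
	}
	return defaultMemoryRequest
}

func main() {
	fmt.Println(dynamicDefaultMemory(10)) // 209715200 (full default M)
	fmt.Println(dynamicDefaultMemory(60)) // 104857600 (halved, M/2)
}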

@alculquicondor
Member

That sounds kind of hard to configure. Can you explain a bit more why you think it would be a good idea?

@ahg-g
Member

ahg-g commented Sep 24, 2021

What I'm saying is that most users probably don't optimize the number of pods to tailor their workloads.

Some do because they want to optimize IP usage. But again, this doesn't address the problem here.

At least it doesn't seem to be too strong of a signal compared to Spreading. Note that after #101946, it tops at 50.

But we keep tuning the scores returned by the plugins without a reference for which should be stronger than which. We should try to rank all plugins by importance and weight them accordingly.

Then, reducing the non-zero request is a win-win-win for the three scores :)

Reducing the default requests will help, but if there are a bunch of pods that make actual large enough requests, then we are back to the same issue. I think another thing we probably need to do is make the default cpu and memory close enough to the ratio used in common machine types.

I'm fine with reducing the default requests, esp. the memory value.

Assuming common machine types, I think we need to roughly double the memory, then reduce both by whichever value we want (like .01 CPU and 40MB memory assuming 10%)
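
To make that concrete against the constants quoted earlier, the "double the memory, then take 10%" suggestion would land at something like the following hypothetical values (illustrative only, not what the scheduler ships):

// Hypothetical revised defaults under the "double the memory, then take 10%" idea
// (illustrative values only).
const (
	// DefaultMilliCPURequest defines default milli cpu request number.
	DefaultMilliCPURequest int64 = 10 // 0.01 core (10% of the current 100m)
	// DefaultMemoryRequest defines default memory request size.
	DefaultMemoryRequest int64 = 40 * 1024 * 1024 // 40 MB (10% of a doubled 400 MB)
)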

@alculquicondor
Member

but if there are a bunch of pods that make actual large enough requests

In that case the Filter would kick in.

But we keep tuning the score returned by the plugins without reference which should be stronger than which.

I don't think this is a case of bad weights. We had one score topping out at the same time that the other score was hitting its lowest value.

Optimizing the weights is a longer discussion which requires a lot of experimentation. Maybe we can prioritize it for 1.24.

Assuming common machine types, I think we need to roughly double the memory, then reduce both by whichever value we want (like .01 CPU and 40MB memory assuming 10%)

SGTM. Maybe it's safer to start at 20%?

@ahg-g
Member

ahg-g commented Sep 24, 2021

In that case the Filter would kick in.

No, it wouldn't for the ones that don't have requests. Basically, the pods with requests lower the available resources, negating the fact that the non-zero requests got lower.

@Huang-Wei
Member

That sounds kind of hard to configure. Can you explain a bit more why you think it would be a good idea?

My key idea is to differentiate the default requests across nodes when a node is obviously overutilized, so that it can prevent the symptom described in this issue.

Optimizing the weights is a longer discussion which requires a lot of experimentation. Maybe we can prioritize it for 1.24.

We have to acknowledge the limitations of the current rule/weight-based scoring and where it applies. We "thought" one score plugin should be weighted higher or lower than another, but it's not always satisfying for different workloads and clusters. In the long run, we may turn to building an adaptive machine learning model to handle the scoring job, inspired by projects like adaptdl. Maybe we can leverage some industry practices to improve the entire scheduler scoring area.

@rsevilla87
Author

rsevilla87 commented Sep 24, 2021

With the current defaults, this is equivalent to 5.6 CPUs, which is likely greater than the allocatable CPU (@rsevilla87, please confirm).

Worker node allocatable resources are:

  allocatable:
    attachable-volumes-aws-ebs: "25"
    cpu: 3500m
    ephemeral-storage: "115470533646"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 14783292Ki
    pods: "250"
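
A quick back-of-the-envelope against those numbers and the 100m / 200MB scheduler defaults discussed above (a sketch, using only figures already quoted in this issue):

package main

import "fmt"

func main() {
	// Worker-node allocatable, from the output above.
	allocatableMilliCPU := int64(3500)
	allocatableMemoryKi := int64(14783292) // ~14.1 GiB

	// Implicit per-pod defaults applied to pods without requests.
	defaultMilliCPU := int64(100)
	defaultMemoryKi := int64(200 * 1024) // 200 MB expressed in Ki

	// Roughly how many request-less pods it takes for the scheduler's implied
	// usage to reach the allocatable amount.
	fmt.Println("pods to saturate CPU:   ", allocatableMilliCPU/defaultMilliCPU) // 35
	fmt.Println("pods to saturate memory:", allocatableMemoryKi/defaultMemoryKi) // 72
}

With at least 56 pods already running on every non-master worker, the implied CPU request alone exceeds the 3.5-CPU allocatable, which matches the earlier hypothesis that these nodes were being scored as fully utilized.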

@ahg-g
Member

ahg-g commented Sep 26, 2021

I don't think this is a case of bad weights. We had one score topping out at the same time that the other score was hitting its lowest value.

My comment was in response to: "At least it doesn't seem to be too strong of a signal compared to Spreading. Note that after #101946, it tops at 50."

We have to acknowledge the limitations of the current rule/weight-based scoring and where it applies

Did we actually try to use it? I feel we didn't.

In the long run, we may turn to building an adaptive machine learning model to handle the scoring job.

I also feel that the choices are fairly limited and we can reach a reasonable ranking without the complexity of ML; also, an ML model is only as good as the data ("ground truth") you feed it...

@alculquicondor
Member

We started to deviate from the problem at hand. Is everyone ok with this?

Assuming common machine types, I think we need to roughly double the memory, then reduce both by whichever value we want (like .01 CPU and 40MB memory assuming 10%)

I agree that we should match common machine types' ratios. But I would vote 20% to start.

@ahg-g
Member

ahg-g commented Sep 27, 2021

What is the difference between 10% and 20%? Do we actually need to consider pods that don't declare requests in the balanced utilization score?

@alculquicondor
Member

The main reason to be more conservative is that the same non-zero values are used for the 3 scores.

@alculquicondor
Member

Unless you are suggesting we decouple NodeResourcesLeastAllocated and NodeResourcesMostAllocated from the balanced one, which would then only use declared requests. I'm fine with that, but it might be harder to reason about how the scores play together.

@ahg-g
Member

ahg-g commented Sep 27, 2021

Yes, I am suggesting we treat the balanced score differently from the others. As I mentioned above, reducing the values will basically shift the problem, not solve it.

but it might be harder to reason about how the scores play together.

In a sense, balanced serves a different purpose, which is also evident from it not being part of the common score plugin we now have.

@alculquicondor
Member

So, in summary, the suggested solution is to use the original requests in NodeResourcesBalancedAllocation instead of the nonzero ones.

@damemi WDYT? could you take that?
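
For illustration, here is a sketch of the difference in how requests would be summed for the balanced score under that suggestion. The types and helpers are hypothetical; this is not the actual patch that resolved this issue.

package main

import "fmt"

// podRequests is a simplified stand-in for a pod's declared requests.
type podRequests struct {
	milliCPU int64
	memory   int64
}

// Illustrative defaults, as quoted earlier in the thread.
const (
	defaultMilliCPU int64 = 100
	defaultMemory   int64 = 200 * 1024 * 1024
)

// requestedForBalanced sums requests the way the suggestion above describes:
// declared values only, with no implicit defaults for best-effort pods.
func requestedForBalanced(pods []podRequests) (cpu, mem int64) {
	for _, p := range pods {
		cpu += p.milliCPU
		mem += p.memory
	}
	return cpu, mem
}

// requestedWithNonZero shows the current behavior for comparison: missing
// requests are replaced with the defaults before scoring.
func requestedWithNonZero(pods []podRequests) (cpu, mem int64) {
	for _, p := range pods {
		c, m := p.milliCPU, p.memory
		if c == 0 {
			c = defaultMilliCPU
		}
		if m == 0 {
			m = defaultMemory
		}
		cpu += c
		mem += m
	}
	return cpu, mem
}

func main() {
	pods := []podRequests{{0, 0}, {0, 0}, {250, 512 * 1024 * 1024}}
	fmt.Println(requestedForBalanced(pods))  // only the declared 250m / 512Mi count
	fmt.Println(requestedWithNonZero(pods)) // each best-effort pod adds 100m / 200MB
}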

@ShashankGirish

Opened the partial reverts for 1.21 and 1.20:

We seem to have hit the same problem even on 1.19.14 / 1.19.15.

@SenatorSupes has managed to find the offending commit here - f7b2ca5

@alculquicondor
Member

The linked PRs are not merged yet. Unfortunately, I don't think there will be another 1.19 release.

@ahmad-diaa
Contributor

ahmad-diaa commented Sep 30, 2021

@damemi Do you mind if I pick this up?

@alculquicondor
Member

/assign @ahmad-diaa
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 4, 2021
@rsevilla87
Author

I've done some additional tests consisting of deploying a bunch of pause pods with requests in a single namespace:

# Number of pods in the workload's namespace
rsevilla@wonderland ~ $ kubectl get pod -n node-density-bbe06b64-991a-4a74-8d9d-75aa23f45415 --no-headers  | wc -l
446

# All of the deployed pods have resource requests configured
rsevilla@wonderland ~ $ kubectl get pod -n node-density-bbe06b64-991a-4a74-8d9d-75aa23f45415 node-density-1 -o jsonpath="{.spec.containers[*].resources}"
{"requests":{"cpu":"1m","memory":"10Mi"}}

# Total non-terminated pods per worker node
$ kubectl describe node -l node-role.kubernetes.io/worker | grep -E "(^Name:|^Non-terminated)"
Name:               ip-10-0-147-142.eu-west-3.compute.internal
Non-terminated Pods:                                      (249 in total)
Name:               ip-10-0-158-24.eu-west-3.compute.internal
Non-terminated Pods:                                      (249 in total)
Name:               ip-10-0-187-55.eu-west-3.compute.internal
Non-terminated Pods:                                      (25 in total)
Name:               ip-10-0-218-220.eu-west-3.compute.internal
Non-terminated Pods:                                      (31 in total)

# Number of pods per node in the workload's namespace
rsevilla@wonderland ~ $ kubectl get pod -n node-density-bbe06b64-991a-4a74-8d9d-75aa23f45415 -o wide --no-headers | awk '{node[$7]++ }END{ for (n in node) print n": "node[n]; }'
ip-10-0-147-142.eu-west-3.compute.internal: 218
ip-10-0-187-55.eu-west-3.compute.internal: 5
ip-10-0-158-24.eu-west-3.compute.internal: 223
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"archive", BuildDate:"2021-07-22T00:00:00Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22+", GitVersion:"v1.22.0-rc.0+af080cb", GitCommit:"af080cb8d127b31307ed3622992c05a4b59f15ba", GitTreeState:"clean", BuildDate:"2021-09-17T18:36:43Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}

According to the comments here, I thought that setting pod requests would help get these pods scheduled properly; however, as shown above, this is not happening.

@alculquicondor
Member

It's not just about the pods that are being scheduled, but also the pods that already exist in the cluster.

249 - 218 = 31 pods. That could be enough to make the BalancedAllocation plugin return a 100% score, if none of those pods have requests.

@zerkms
Contributor

zerkms commented Dec 1, 2023

For 1.22 and beyond, I think we should reduce the "default requests" that the scheduler implicitly adds when scoring. They are arguably too big. I would suggest 10 to 20% of the current numbers

@alculquicondor hello from 2023 :-) I just hit the uneven node balancing issue due to the high default CPU request: #122131
