Check scheduler spread algorithm #73843

Open
lynx-coding opened this Issue Feb 8, 2019 · 2 comments

lynx-coding commented Feb 8, 2019

What happened:
We are running an application on twelve dedicated nodes in a k8s cluster. When deploying and/or rescaling the application, we found that pods were often placed on the same node, which left the cluster badly unbalanced.

What you expected to happen:
From reading the documentation and a fair amount of k8s source code, we expected the scheduler to prefer the node running as few pods of the same service/deployment as possible.
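
Concretely, the spreading behaviour I expected is the one described for the SelectorSpread priority: a node's score grows as the number of already-running pods matching the selector shrinks. Below is a minimal sketch of that idea; the names and the exact formula are my simplified reconstruction, not a verbatim copy of the scheduler code.

package main

import "fmt"

// selectorSpreadScore is a simplified reconstruction of the SelectorSpread
// priority: nodes that already run fewer pods matching the deployment's
// selector should get a higher score (the scheduler's max priority is 10).
func selectorSpreadScore(matchingPodsOnNode, maxMatchingPodsOnAnyNode int) float64 {
	const maxPriority = 10.0
	if maxMatchingPodsOnAnyNode == 0 {
		return maxPriority
	}
	return maxPriority * float64(maxMatchingPodsOnAnyNode-matchingPodsOnNode) /
		float64(maxMatchingPodsOnAnyNode)
}

func main() {
	// With two replicas already on worker09 and only one on worker15
	// (max count across nodes is 2), worker15 should win the next placement:
	fmt.Println(selectorSpreadScore(2, 2)) // worker09 -> 0
	fmt.Println(selectorSpreadScore(1, 2)) // worker15 -> 5
}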

How to reproduce it (as minimally and precisely as possible):

  • deploy an application with replicas = node count
  • check the distribution of the pods (a small client-go sketch for this check follows the list)
  • scale up to replicas = 2 × node count
  • check the distribution again
  • as long as there are enough free resources (the scheduler only respects requests), you will see the imbalance
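
For the "check the distribution" steps, here is a small sketch that counts replicas per node instead of eyeballing kubectl output. It is only an illustration: it assumes a recent client-go (where List takes a context), the default namespace, and the app=image-3d-node label from our deployment.

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the local kubeconfig; adjust the path/namespace/selector as needed.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	pods, err := client.CoreV1().Pods("default").List(context.TODO(),
		metav1.ListOptions{LabelSelector: "app=image-3d-node"})
	if err != nil {
		panic(err)
	}
	// Count how many replicas landed on each node.
	perNode := map[string]int{}
	for _, p := range pods.Items {
		perNode[p.Spec.NodeName]++
	}
	for node, count := range perNode {
		fmt.Printf("%s: %d\n", node, count)
	}
}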

Anything else we need to know?:
Example of the imbalanced cluster before scaling up: you can see two replicas on worker09-14 and only one replica on worker15-20:

$ kubectl get po -lapp=image-3d-node -o wide | sort --key 7
NAME                             READY   STATUS    RESTARTS   AGE   IP               NODE
image-3d-node-75449bb497-d5skt   1/1     Running   0          2d    192.168.211.43   worker09
image-3d-node-75449bb497-pkght   1/1     Running   0          2d    192.168.211.42   worker09
image-3d-node-75449bb497-mkwsk   1/1     Running   1          2d    192.168.218.40   worker10
image-3d-node-75449bb497-sw9ld   1/1     Running   0          2d    192.168.218.41   worker10
image-3d-node-75449bb497-dr85s   1/1     Running   1          2d    192.168.205.37   worker11
image-3d-node-75449bb497-nkqx2   1/1     Running   0          2d    192.168.205.38   worker11
image-3d-node-75449bb497-5g6b2   1/1     Running   1          2d    192.168.236.25   worker12
image-3d-node-75449bb497-lbddl   1/1     Running   0          2d    192.168.236.24   worker12
image-3d-node-75449bb497-7cw4l   1/1     Running   1          2d    192.168.239.23   worker13
image-3d-node-75449bb497-zh6ff   1/1     Running   0          2d    192.168.239.22   worker13
image-3d-node-75449bb497-2psfm   1/1     Running   1          2d    192.168.138.21   worker14
image-3d-node-75449bb497-ntjzc   1/1     Running   1          2d    192.168.138.20   worker14
image-3d-node-75449bb497-w5ntq   1/1     Running   1          2d    192.168.242.7    worker15
image-3d-node-75449bb497-5jwpt   1/1     Running   0          2d    192.168.241.7    worker16
image-3d-node-75449bb497-vx4t7   1/1     Running   1          2d    192.168.170.6    worker17
image-3d-node-75449bb497-rrrcx   1/1     Running   0          2d    192.168.152.7    worker18
image-3d-node-75449bb497-kscb5   1/1     Running   0          2d    192.168.142.7    worker19
image-3d-node-75449bb497-8rhj8   1/1     Running   0          2d    192.168.167.7    worker20

After scaling up you can see that the scheduler places the new pods on workers that already run two instances, although I expected it to use the workers with only one running instance:

$ kubectl scale deployment image-3d-node --replicas=25

$ kubectl get po -lapp=image-3d-node -o wide | sort --key 7
NAME                             READY   STATUS    RESTARTS   AGE   IP               NODE
image-3d-node-75449bb497-6t9z4   0/1     Running   0          11s   192.168.211.45   worker09 <---
image-3d-node-75449bb497-d5skt   1/1     Running   0          2d    192.168.211.43   worker09
image-3d-node-75449bb497-pkght   1/1     Running   0          2d    192.168.211.42   worker09
image-3d-node-75449bb497-5pjhx   0/1     Running   0          11s   192.168.218.43   worker10 <---
image-3d-node-75449bb497-mkwsk   1/1     Running   1          2d    192.168.218.40   worker10
image-3d-node-75449bb497-sw9ld   1/1     Running   0          2d    192.168.218.41   worker10
image-3d-node-75449bb497-dr85s   1/1     Running   1          2d    192.168.205.37   worker11
image-3d-node-75449bb497-nkqx2   1/1     Running   0          2d    192.168.205.38   worker11
image-3d-node-75449bb497-ws67x   0/1     Running   0          11s   192.168.205.39   worker11 <---
image-3d-node-75449bb497-5g6b2   1/1     Running   1          2d    192.168.236.25   worker12
image-3d-node-75449bb497-7w6l9   0/1     Running   0          11s   192.168.236.27   worker12 <---
image-3d-node-75449bb497-lbddl   1/1     Running   0          2d    192.168.236.24   worker12
image-3d-node-75449bb497-7cw4l   1/1     Running   1          2d    192.168.239.23   worker13
image-3d-node-75449bb497-hb89c   0/1     Running   0          11s   192.168.239.25   worker13 <---
image-3d-node-75449bb497-nddnb   0/1     Running   0          11s   192.168.239.26   worker13 <---
image-3d-node-75449bb497-zh6ff   1/1     Running   0          2d    192.168.239.22   worker13
image-3d-node-75449bb497-2psfm   1/1     Running   1          2d    192.168.138.21   worker14
image-3d-node-75449bb497-j22hl   0/1     Running   0          11s   192.168.138.23   worker14 <---
image-3d-node-75449bb497-ntjzc   1/1     Running   1          2d    192.168.138.20   worker14
image-3d-node-75449bb497-w5ntq   1/1     Running   1          2d    192.168.242.7    worker15
image-3d-node-75449bb497-5jwpt   1/1     Running   0          2d    192.168.241.7    worker16
image-3d-node-75449bb497-vx4t7   1/1     Running   1          2d    192.168.170.6    worker17
image-3d-node-75449bb497-rrrcx   1/1     Running   0          2d    192.168.152.7    worker18
image-3d-node-75449bb497-kscb5   1/1     Running   0          2d    192.168.142.7    worker19
image-3d-node-75449bb497-8rhj8   1/1     Running   0          2d    192.168.167.7    worker20

I found the following lines in the k8s code which make me think this could be a bug:
In version 1.10.5: https://github.com/kubernetes/kubernetes/blob/v1.10.5/pkg/scheduler/algorithm/priorities/selector_spreading.go#L88-L114
And still in the latest version: https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/algorithm/priorities/selector_spreading.go#L192-L208
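
For completeness, my reading of the linked reduce step is that the node-level spread score is blended with a zone-level score when zone labels exist, and the result is truncated to an integer. The sketch below is paraphrased from memory, so the names and the weighting constant are my reconstruction and may not match upstream exactly.

package main

import "fmt"

// blendAndTruncate paraphrases the tail of the linked reduce step as I read
// it: when zone labels are present the node-level spread score is blended
// with a zone-level score (zone spreading weighted at roughly 2/3), and the
// result is truncated to an integer before being handed back to the scheduler.
// This is my reconstruction, not a verbatim copy of selector_spreading.go.
func blendAndTruncate(nodeScore, zoneScore float64, haveZones bool) int {
	const zoneWeighting = 2.0 / 3.0
	if haveZones {
		nodeScore = (1.0-zoneWeighting)*nodeScore + zoneWeighting*zoneScore
	}
	return int(nodeScore)
}

func main() {
	// Assuming our workers carry no zone labels, only the node-level spread matters:
	fmt.Println(blendAndTruncate(0.0, 0, false)) // worker with two replicas -> 0
	fmt.Println(blendAndTruncate(5.0, 0, false)) // worker with one replica  -> 5
}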

Environment:

  • Kubernetes version (use kubectl version): 1.10.5
  • Cloud provider or hardware configuration: CoreOS
  • OS (e.g. from /etc/os-release): CoreOS
  • Kernel (e.g. uname -a): 4.14.19-coreos
  • Others: There are no other Pods running on those workers

lynx-coding commented Feb 8, 2019

/sig scheduling

xiaoxubeii commented Feb 13, 2019

This is a duplicate of #72916 and the PR is #73711.
