What would you like to be added:
Add the ability for the leader Pod to be in its own affinity group when using the subgroup feature. For example, when deploying a leader Pod that should be scheduled on a CPU-only VM and worker Pods that should be scheduled on multiple TPU slices:
```yaml
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: my-lws
  annotations:
    leaderworkerset.sigs.k8s.io/subgroup-exclusive-topology: cloud.google.com/gke-nodepool
spec:
  replicas: 1
  leaderWorkerTemplate:
    subGroupPolicy:
      subGroupSize: 2
    size: 5
    leaderTemplate:
      spec:
        nodeSelector:
          cloud.google.com/machine-family: n2
          node.kubernetes.io/instance-type: n2-standard-8
        containers:
        - name: leader
          ...
    workerTemplate:
      spec:
        nodeSelector:
          cloud.google.com/gke-tpu-accelerator: tpu-v5p-slice
          cloud.google.com/gke-tpu-topology: 2x2x2
        containers:
        - name: worker
          ...
          resources:
            limits:
              google.com/tpu: "4"
```
Currently the leader Pod is placed in subgroup 0, which gives it the same affinity key as the workers in subgroup 0: https://github.com/kubernetes-sigs/lws/blob/main/pkg/webhooks/pod_webhook.go#L132. This makes the leader Pod in my example unschedulable, because the pod affinity forces it onto the same topology as the TPU workers while its node selectors require a CPU instance type.
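The current behavior can be sketched as follows (an illustrative simplification, not the actual `pod_webhook.go` code): with `size: 5` and `subGroupSize: 2`, pod index 0 is the leader and indices 1–4 are workers, and folding the leader into subgroup 0 makes it share an affinity key with workers 1 and 2:

```go
package main

import "fmt"

// subGroupIndex is a sketch of how a pod's subgroup is assigned when
// the leader is folded into subgroup 0 (illustrative only).
func subGroupIndex(podIndex, subGroupSize int) int {
	if podIndex == 0 {
		return 0 // leader: currently assigned to subgroup 0
	}
	// Workers are numbered 1..size-1 and split into groups of subGroupSize.
	return (podIndex - 1) / subGroupSize
}

func main() {
	for i := 0; i < 5; i++ {
		fmt.Printf("pod %d -> subgroup %d\n", i, subGroupIndex(i, 2))
	}
	// pod 0 -> subgroup 0   (leader, shares key with workers 1 and 2)
	// pod 1 -> subgroup 0
	// pod 2 -> subgroup 0
	// pod 3 -> subgroup 1
	// pod 4 -> subgroup 1
}
```

Because subgroup 0's exclusive-topology affinity key resolves to a TPU node pool, the leader's CPU node selectors can never be satisfied.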
Why is this needed:
To support deploying leader-worker architectures where the leader should be scheduled in separate topologies from the worker groups.
Completion requirements:
An option in `subGroupPolicy` that causes the leader to have its own affinity key.
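One possible shape for this option (the field name `subGroupPolicyType` and value `LeaderExcluded` below are hypothetical, shown only to illustrate the idea, not an agreed-upon API):

```yaml
leaderWorkerTemplate:
  subGroupPolicy:
    # Hypothetical field: exclude the leader from the worker subgroups so
    # it receives its own subgroup-exclusive-topology affinity key.
    subGroupPolicyType: LeaderExcluded
    subGroupSize: 2
  size: 5
```

With a setting like this, the four workers would still form two subgroups of two, while the leader could be scheduled on its CPU-only node pool.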
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.