A ClusterQueue
is a cluster-scoped object that governs a pool of resources
such as CPU, memory and hardware accelerators. A ClusterQueue
defines:
- The resource flavors that it manages, with usage limits and order of consumption.
- Fair sharing rules across the tenants of the cluster.
Only cluster administrators should create ClusterQueue
objects.
A sample ClusterQueue looks like the following:
apiVersion: kueue.x-k8s.io/v1alpha1
kind: ClusterQueue
metadata:
  name: cluster-total
spec:
  namespaceSelector: {}
  resources:
  - name: "cpu"
    flavors:
    - name: default
      quota:
        min: 9
  - name: "memory"
    flavors:
    - name: default
      quota:
        min: 36Gi
This ClusterQueue admits workloads if and only if:
- The sum of the CPU requests is less than or equal to 9.
- The sum of the memory requests is less than or equal to 36Gi.
You can specify the quota as a quantity.
You can limit which namespaces can have workloads admitted in the ClusterQueue
by setting a label selector in the .spec.namespaceSelector field.
To allow workloads from all namespaces, set the .spec.namespaceSelector field
to the empty selector {}.
A sample namespaceSelector
looks like the following:
namespaceSelector:
  matchExpressions:
  - key: team
    operator: In
    values:
    - team-a
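
Because the field is described as a label selector, the standard Kubernetes matchLabels form should be usable as well. The following is a minimal sketch, assuming your namespaces carry a team label (the label key and value are taken from the sample above, not from any required convention); it should select the same namespaces as the matchExpressions sample:

```yaml
# Sketch only: assumes namespaces are labeled team=team-a.
namespaceSelector:
  matchLabels:
    team: team-a
```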
You can set different queueing strategies in a ClusterQueue using the
.spec.queueingStrategy
field. The queueing strategy determines how workloads
are ordered in the ClusterQueue and how they are re-queued after an unsuccessful
admission attempt.
The following are the supported queueing strategies:
- StrictFIFO: Workloads are ordered first by priority and then by .metadata.creationTimestamp. Older workloads that can't be admitted will block newer workloads, even if the newer workloads fit in the available quota.
- BestEffortFIFO: Workloads are ordered the same way as StrictFIFO. However, older workloads that can't be admitted will not block newer workloads that fit in the available quota.
The default queueing strategy is BestEffortFIFO.
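
As an illustration, the sample ClusterQueue from the beginning of this page could opt into strict ordering as sketched below. Only the queueingStrategy line is new; everything else repeats the earlier sample.

```yaml
apiVersion: kueue.x-k8s.io/v1alpha1
kind: ClusterQueue
metadata:
  name: cluster-total
spec:
  namespaceSelector: {}
  # Older workloads that don't fit block newer ones.
  queueingStrategy: StrictFIFO
  resources:
  - name: "cpu"
    flavors:
    - name: default
      quota:
        min: 9
  - name: "memory"
    flavors:
    - name: default
      quota:
        min: 36Gi
```

Omitting the field leaves the ClusterQueue on the default strategy.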
Resources in a cluster are typically not homogeneous. Resources could differ in:
- pricing and availability (for example, spot vs. on-demand VMs)
- architecture (for example, x86 vs. ARM CPUs)
- brands and models (for example, Radeon 7000 vs. Nvidia A100 vs. T4 GPUs)
A ResourceFlavor
is an object that represents these variations and allows you
to associate them with node labels and taints.
Note: If your cluster is homogeneous, you can use an empty ResourceFlavor instead of adding labels to custom ResourceFlavors.
A sample ResourceFlavor looks like the following:
apiVersion: kueue.x-k8s.io/v1alpha1
kind: ResourceFlavor
metadata:
  name: spot
labels:
  instance-type: spot
taints:
- effect: NoSchedule
  key: spot
  value: "true"
You can use the .metadata.name
to reference a flavor from a ClusterQueue in
the .spec.resources[*].flavors[*].name
field.
For each resource of each pod set in a Workload, Kueue
assigns the first flavor in the .spec.resources[*].flavors
list that has enough unused quota in the ClusterQueue or the ClusterQueue's
cohort.
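
To make the ordering concrete, here is a minimal sketch of a ClusterQueue that lists two flavors for the same resource. The ClusterQueue name, the on-demand flavor name, and the quota values are hypothetical, and the sketch assumes ResourceFlavor objects named on-demand and spot exist in the cluster.

```yaml
apiVersion: kueue.x-k8s.io/v1alpha1
kind: ClusterQueue
metadata:
  name: team-cq          # hypothetical name
spec:
  namespaceSelector: {}
  resources:
  - name: "cpu"
    flavors:
    - name: on-demand    # hypothetical flavor, evaluated first
      quota:
        min: 10
    - name: spot         # evaluated only if on-demand doesn't fit
      quota:
        min: 20
```

With no other usage and no cohort, a pod set requesting 12 CPUs would not fit in the 10 CPUs of on-demand, so it should be assigned the spot flavor, whereas a pod set requesting 8 CPUs should be assigned on-demand.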
To associate a ResourceFlavor with a subset of nodes of your cluster, you can
configure the .labels field with matching node labels that uniquely identify
the nodes. If you are using cluster autoscaler
(or equivalent controllers), make sure it is configured to add those labels when
adding new nodes.
To guarantee that the workload Pods run on the nodes associated with the flavor that Kueue selected for the workload, Kueue performs the following steps:
- When admitting a workload, Kueue evaluates the .nodeSelector and .affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution fields in the PodSpecs of your Workload against the ResourceFlavor labels.
- Once the workload is admitted, Kueue adds the ResourceFlavor labels to the .nodeSelector of the underlying workload Pod templates, if the workload didn't specify them already. For example, for a batch/v1.Job, Kueue adds the labels to .spec.template.spec.nodeSelector.
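
As a rough illustration, assuming a batch/v1 Job is admitted with the spot flavor from the sample above (label instance-type: spot), the Job's Pod template would be expected to contain a fragment like the following after admission. Only the relevant part of the Job is shown.

```yaml
# Fragment of a batch/v1 Job after admission (sketch).
spec:
  template:
    spec:
      nodeSelector:
        instance-type: spot   # copied from the ResourceFlavor labels
```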
To restrict the usage of a ResourceFlavor, you can configure the .taints
field
with taints.
Taints on the ResourceFlavor work similarly to node taints. For Kueue to admit a workload to use the ResourceFlavor, the PodSpecs in the workload should have a toleration for it. As opposed to ResourceFlavor labels, Kueue will not add tolerations for the flavor taints.
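
For instance, for a workload to be eligible for the spot flavor from the sample above, its Pod template would need a matching toleration that you add yourself. The following fragment is a sketch using the standard Kubernetes toleration fields, shown for a batch/v1 Job:

```yaml
# Fragment of a batch/v1 Job (sketch); Kueue does not add this for you.
spec:
  template:
    spec:
      tolerations:
      - key: spot
        operator: Equal
        value: "true"
        effect: NoSchedule
```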
If your cluster has homogeneous resources, or if you don't need to manage quotas for the different flavors of a resource separately, you can create a ResourceFlavor without any labels or taints. Such a ResourceFlavor is called an empty ResourceFlavor. A sample looks like the following:
apiVersion: kueue.x-k8s.io/v1alpha1
kind: ResourceFlavor
metadata:
  name: default
ClusterQueues can be grouped in cohorts. ClusterQueues that belong to the same cohort can borrow unused quota from each other.
To add a ClusterQueue to a cohort, specify the name of the cohort in the
.spec.cohort
field. All ClusterQueues that have a matching spec.cohort
are
part of the same cohort. If the spec.cohort
field is empty, the ClusterQueue
doesn't belong to any cohort, and thus it cannot borrow quota from any other
ClusterQueue.
When borrowing, Kueue satisfies the following semantics:
- When assigning flavors, Kueue goes through the list of flavors in .spec.resources[*].flavors. For each flavor, Kueue attempts to fit the workload using the min quota of the ClusterQueue or the unused min quota of other ClusterQueues in the cohort, up to the max quota of the ClusterQueue. If the workload doesn't fit, Kueue proceeds to evaluate the next flavor in the list.
- Borrowing happens per-flavor. A ClusterQueue can only borrow quota of flavors it defines.
Assume you created the following two ClusterQueues:
apiVersion: kueue.x-k8s.io/v1alpha1
kind: ClusterQueue
metadata:
  name: team-a-cq
spec:
  namespaceSelector: {}
  cohort: team-ab
  resources:
  - name: "cpu"
    flavors:
    - name: default
      quota:
        min: 9
  - name: "memory"
    flavors:
    - name: default
      quota:
        min: 36Gi
---
apiVersion: kueue.x-k8s.io/v1alpha1
kind: ClusterQueue
metadata:
  name: team-b-cq
spec:
  namespaceSelector: {}
  cohort: team-ab
  resources:
  - name: "cpu"
    flavors:
    - name: default
      quota:
        min: 12
  - name: "memory"
    flavors:
    - name: default
      quota:
        min: 48Gi
ClusterQueue team-a-cq can admit workloads depending on the following
scenarios:
- If ClusterQueue team-b-cq has no admitted workloads, then ClusterQueue team-a-cq can admit workloads with resources adding up to 12+9=21 CPUs and 48+36=84Gi of memory.
- If ClusterQueue team-b-cq has pending workloads and the ClusterQueue team-a-cq has all its min quota used, Kueue will admit workloads in ClusterQueue team-b-cq before admitting any new workloads in team-a-cq. Therefore, Kueue ensures the min quota for team-b-cq is met.
Note: Kueue does not support preemption. No admitted workloads will be stopped to make space for new workloads.
To limit the amount of resources that a ClusterQueue can borrow from others,
you can set the .spec.resources[*].flavors[*].quota.max
quantity field.
max must be greater than or equal to min.
If, for a given flavor, the max
field is empty or null, a ClusterQueue can
borrow up to the sum of min quotas from all the ClusterQueues in the cohort.
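
As a sketch, the team-a-cq ClusterQueue from the example above could cap its borrowing as follows. The max values are hypothetical and chosen only for illustration.

```yaml
apiVersion: kueue.x-k8s.io/v1alpha1
kind: ClusterQueue
metadata:
  name: team-a-cq
spec:
  namespaceSelector: {}
  cohort: team-ab
  resources:
  - name: "cpu"
    flavors:
    - name: default
      quota:
        min: 9
        max: 15      # hypothetical cap: at most 15-9=6 borrowed CPUs
  - name: "memory"
    flavors:
    - name: default
      quota:
        min: 36Gi
        max: 60Gi    # hypothetical cap: at most 60-36=24Gi borrowed
```

With these limits, even if team-b-cq has no admitted workloads, team-a-cq would be expected to admit workloads adding up to at most 15 CPUs and 60Gi of memory, instead of the 21 CPUs and 84Gi available without a max.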
- Learn how to administer cluster quotas.