docs: add troubleshooting resource requests (#2001)
* docs: add note about resources matching cluster-queue

Problem: the troubleshooting guide should demonstrate how to
debug the case where jobs are not admitted.
Solution: add a small section to show that resource types
need to match resource requests, and other small debug tips.

Signed-off-by: vsoch <vsoch@users.noreply.github.com>

* fix: typos in provisioning and troubleshooting

Signed-off-by: vsoch <vsoch@users.noreply.github.com>

* fix: code indent

Signed-off-by: vsoch <vsoch@users.noreply.github.com>

* review: aldo

Signed-off-by: vsoch <vsoch@users.noreply.github.com>

---------

Signed-off-by: vsoch <vsoch@users.noreply.github.com>
Co-authored-by: vsoch <vsoch@users.noreply.github.com>
vsoch and vsoch committed Apr 18, 2024
1 parent e55bb8c commit 472ce6d
Showing 3 changed files with 40 additions and 2 deletions.
@@ -15,7 +15,7 @@ The Provisioning Admission Check Controller is supported on [Kubernetes cluster-
## Usage

To use the Provisioning AdmissionCheck, create an [AdmissionCheck](/docs/concepts/admission_check)
-with `kueue.x-k8s.io/provisioning-request` as a `.spec.controllerName` and create a ProvisioningRequest configuration usign a `ProvisioningRequestConfig` object. See an example below.
+with `kueue.x-k8s.io/provisioning-request` as a `.spec.controllerName` and create a ProvisioningRequest configuration using a `ProvisioningRequestConfig` object. See an example below.

## ProvisioningRequest configuration

38 changes: 38 additions & 0 deletions site/content/en/docs/tasks/troubleshooting/troubleshooting_jobs.md
@@ -133,6 +133,44 @@ status:
type: QuotaReserved
```

### Does my ClusterQueue have the resource requests that the job requires?

Suppose you submit a job that requests resources, for example:

```bash
$ kubectl get jobs job-0-9-size-6 -o json | jq -r '.spec.template.spec.containers[0].resources'
```
```console
{
  "limits": {
    "cpu": "2"
  },
  "requests": {
    "cpu": "2"
  }
}
```
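The command above inspects only the first container. A job's pod template can list several containers and init containers, each with its own `resources`, so it can help to collect every resource name the template asks for. The minimal Python sketch below does that. It is not part of Kueue or kubectl. It assumes you have already parsed the output of `kubectl get jobs <name> -o json` into a dict (for example with `json.load`), and `requested_resource_names` is a hypothetical helper name. Resources that set only a limit are counted too, because Kubernetes defaults a container's missing request to its limit when the Pod is created.

```python
def requested_resource_names(job: dict) -> set[str]:
    """Collect resource names requested by any container or init container
    in a Job's pod template (as parsed from `kubectl get jobs <name> -o json`).
    Hypothetical helper for troubleshooting, not a Kueue API."""
    pod_spec = job["spec"]["template"]["spec"]
    containers = pod_spec.get("initContainers", []) + pod_spec.get("containers", [])
    names: set[str] = set()
    for container in containers:
        resources = container.get("resources", {})
        # Include resources that only have a limit: Kubernetes defaults a
        # missing request to the limit when the Pod is created.
        names.update(resources.get("requests", {}).keys())
        names.update(resources.get("limits", {}).keys())
    return names


# The job from the example above, trimmed to the fields the function reads.
example_job = {
    "spec": {"template": {"spec": {"containers": [
        {"resources": {"limits": {"cpu": "2"}, "requests": {"cpu": "2"}}}
    ]}}}
}
print(sorted(requested_resource_names(example_job)))  # ['cpu']
```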

If your ClusterQueue does not define quota for every resource that the job requests, Kueue cannot admit the job. For the job above, you need to define a `cpu` quota under `resourceGroups`. A ClusterQueue that defines a `cpu` quota looks like the following:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "cluster-queue"
spec:
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["cpu"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 40
```
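To check the match mechanically, the minimal Python sketch below compares a set of requested resource names against the `coveredResources` of every resource group in a ClusterQueue. It assumes the ClusterQueue has been parsed into a dict, for example from `kubectl get clusterqueue cluster-queue -o json`; `uncovered_resources` is a hypothetical helper, not a Kueue API. It only checks that each resource is covered by some resource group, not that the flavors have enough `nominalQuota` for the job.

```python
def uncovered_resources(requested: set[str], cluster_queue: dict) -> set[str]:
    """Return the requested resource names that no resource group in the
    ClusterQueue lists under `coveredResources` (hypothetical helper)."""
    covered: set[str] = set()
    for group in cluster_queue.get("spec", {}).get("resourceGroups", []):
        covered.update(group.get("coveredResources", []))
    return set(requested) - covered


# The ClusterQueue from the YAML above, written as a Python dict.
cluster_queue = {
    "apiVersion": "kueue.x-k8s.io/v1beta1",
    "kind": "ClusterQueue",
    "metadata": {"name": "cluster-queue"},
    "spec": {
        "namespaceSelector": {},
        "resourceGroups": [
            {
                "coveredResources": ["cpu"],
                "flavors": [
                    {
                        "name": "default-flavor",
                        "resources": [{"name": "cpu", "nominalQuota": 40}],
                    }
                ],
            }
        ],
    },
}

print(uncovered_resources({"cpu"}, cluster_queue))            # set()
print(uncovered_resources({"cpu", "memory"}, cluster_queue))  # {'memory'}
```

A non-empty result, such as `{'memory'}` in the second call, points to a resource that the ClusterQueue would need to cover in one of its resource groups before Kueue can admit the job.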

See [resource groups](https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/#resource-groups) for more information.

### Unattempted Workload

When using a [ClusterQueue](/docs/concepts/cluster_queue) with the `StrictFIFO`
@@ -78,7 +78,7 @@ Events:
## Why did my Pod disappear?

When you enable [preemption](/docs/concepts/cluster_queue/#preemption), Kueue might preempt Pods
-to accomodate higher priority jobs or reclaim quota. Preemption is implemented via `DELETE` calls,
+to accommodate higher priority jobs or reclaim quota. Preemption is implemented via `DELETE` calls,
the standard way of terminating a Pod in Kubernetes.

When using single Pods, Kubernetes will delete the Workload object along with the Pod, as there is