Doc for troubleshooting queues (#2125)
Change-Id: Ica8de2f65f9536f6841f3919647e811fda5de5f9

Co-authored-by: Aldo Culquicondor <acondor@google.com>
1 parent 634c4bc commit c96f2e4
Showing 3 changed files with 86 additions and 3 deletions.
@@ -1,7 +1,7 @@
---
title: "Troubleshooting Jobs"
date: 2024-03-21
weight: 1
weight: 2
description: >
Troubleshooting the status of a Job
---
@@ -201,7 +201,7 @@ status:
When using a [ClusterQueue](/docs/concepts/cluster_queue) with the `StrictFIFO`
[`queueingStrategy`](/docs/concepts/cluster_queue/#queueing-strategy), Kueue only attempts
to admit the head of each ClusterQueue. As a result, if Kueue didn't attempt to admit
a Workload, the Workload status would not contain any condition.
a Workload, the Workload status might not contain any condition.

### Misconfigured LocalQueues or ClusterQueues

@@ -218,6 +218,9 @@ status:
type: QuotaReserved
```

See [Troubleshooting Queues](/docs/tasks/troubleshooting/troubleshooting_queues) to understand why a
ClusterQueue or a LocalQueue is inactive.

## Is my Job preempted?

If your Job is not running, and your ClusterQueues have [preemption](/docs/concepts/cluster_queue/#preemption) enabled,
@@ -1,7 +1,7 @@
---
title: "Troubleshooting Pods"
date: 2024-03-21
weight: 1
weight: 3
description: >
Troubleshooting the status of a Pod or group of Pods
---
@@ -0,0 +1,80 @@
---
title: "Troubleshooting Queues"
date: 2024-03-21
weight: 2
description: >
Troubleshooting the status of a LocalQueue or ClusterQueue
---

## Why are no workloads admitted in the LocalQueue?

The status of the [LocalQueue](/docs/concepts/local_queue) includes details of any configuration problems
on the LocalQueue, as part of the `Active` condition.

Run the following command to see the status of the LocalQueue:

```bash
kubectl get localqueue -n my-namespace my-local-queue -o yaml
```

The status of the LocalQueue will be similar to the following:

```yaml
status:
  admittedWorkloads: 0
  conditions:
  - lastTransitionTime: "2024-05-03T18:57:32Z"
    message: Can't submit new workloads to clusterQueue
    reason: ClusterQueueIsInactive
    status: "False"
    type: Active
```

In the example above, the `Active` condition has status `False` because the ClusterQueue
is not active.
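
To check this condition from a script rather than reading the YAML by eye, the lookup can be sketched in Python. This is a minimal sketch, assuming the object was fetched with `kubectl get localqueue -n my-namespace my-local-queue -o json` and parsed into a dictionary. The helper name `find_condition` and the inline sample are illustrative assumptions, not part of Kueue.

```python
def find_condition(resource, cond_type="Active"):
    """Return the condition of the given type from a Kubernetes object, or None."""
    # Tolerate objects whose status or conditions are missing or null.
    conditions = (resource.get("status") or {}).get("conditions") or []
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond
    return None


# Sample mirroring the LocalQueue status shown above (hypothetical input).
sample = {
    "status": {
        "admittedWorkloads": 0,
        "conditions": [
            {
                "lastTransitionTime": "2024-05-03T18:57:32Z",
                "message": "Can't submit new workloads to clusterQueue",
                "reason": "ClusterQueueIsInactive",
                "status": "False",
                "type": "Active",
            }
        ],
    }
}

cond = find_condition(sample)
if cond is not None and cond["status"] != "True":
    # Condition status values are the strings "True", "False", or "Unknown".
    print(f"LocalQueue is inactive: {cond['reason']}: {cond['message']}")
```

The `reason` field is the most useful part for automation: `ClusterQueueIsInactive` indicates that the next thing to inspect is the ClusterQueue, not the LocalQueue itself.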

## Why are no workloads admitted in the ClusterQueue?

The status of the [ClusterQueue](/docs/concepts/cluster_queue) includes details of any configuration problems on
the ClusterQueue, as part of the `Active` condition.

Run the following command to see the status of the ClusterQueue:

```bash
kubectl get clusterqueue my-clusterqueue -o yaml
```

The status of the ClusterQueue will be similar to the following:

```yaml
status:
  admittedWorkloads: 0
  conditions:
  - lastTransitionTime: "2024-05-03T18:22:30Z"
    message: 'Can''t admit new workloads: FlavorNotFound'
    reason: FlavorNotFound
    status: "False"
    type: Active
```

In the example above, the `Active` condition has status `False` because the ClusterQueue
references a flavor that does not exist.
Read [Administer ClusterQueues](/docs/tasks/manage/administer_cluster_quotas) to learn how
to configure a ClusterQueue.

If the ClusterQueue is properly configured, the status will be similar to the following:

```yaml
status:
  admittedWorkloads: 1
  conditions:
  - lastTransitionTime: "2024-05-03T18:35:28Z"
    message: Can admit new workloads
    reason: Ready
    status: "True"
    type: Active
```

If the ClusterQueue has the `Active` condition with status `True`, and you still don't observe
workloads being admitted, then the problem is more likely to be in the individual workloads.
Read [Troubleshooting Jobs](/docs/tasks/troubleshooting/troubleshooting_jobs) to learn why individual Jobs cannot be admitted.
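
When a cluster has many ClusterQueues, checking them one at a time is tedious. The following is a minimal sketch of a cluster-wide scan, assuming its input is the parsed output of `kubectl get clusterqueue -o json`, which returns a list object with an `items` field. The function name, the `healthy-cq` queue, and the `NoActiveCondition` label are hypothetical and exist only for this illustration.

```python
def inactive_cluster_queues(cq_list):
    """Return (name, reason, message) for every ClusterQueue that is not Active."""
    results = []
    for item in cq_list.get("items") or []:
        name = (item.get("metadata") or {}).get("name", "<unknown>")
        conditions = (item.get("status") or {}).get("conditions") or []
        active = next((c for c in conditions if c.get("type") == "Active"), None)
        if active is None:
            # "NoActiveCondition" is a label made up for this sketch, not a Kueue reason.
            results.append((name, "NoActiveCondition", "status has no Active condition"))
        elif active.get("status") != "True":
            results.append((name, active.get("reason", ""), active.get("message", "")))
    return results


# Hypothetical input modeled on the two ClusterQueue statuses shown above.
cq_list = {
    "items": [
        {
            "metadata": {"name": "my-clusterqueue"},
            "status": {
                "conditions": [
                    {
                        "type": "Active",
                        "status": "False",
                        "reason": "FlavorNotFound",
                        "message": "Can't admit new workloads: FlavorNotFound",
                    }
                ]
            },
        },
        {
            "metadata": {"name": "healthy-cq"},
            "status": {
                "conditions": [
                    {
                        "type": "Active",
                        "status": "True",
                        "reason": "Ready",
                        "message": "Can admit new workloads",
                    }
                ]
            },
        },
    ]
}

for name, reason, message in inactive_cluster_queues(cq_list):
    print(f"{name}: {reason}: {message}")
```

An empty result means every ClusterQueue reports `Active` with status `True`, in which case the individual workloads are the next place to look.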
