Description
During a debug session where a seemingly wrong K8upJobStuck alert kept appearing on a cluster, I found that if there are many backup jobs to the same repo, the likelihood of false stuck alerts is a lot higher.
Additional Context
Currently, the decrease is only called if there's no other job in the queue pointing to the same repository: https://github.com/k8up-io/k8up/blob/master/operator/queue/execution.go#L90
The K8upJobStuck check fires if there is a non-zero number of jobs in the queue for more than 24h. This is to ensure that the operator is actually able to schedule the given jobs.
The logic is intended to work as follows: when a job is added to the queue (https://github.com/k8up-io/k8up/blob/master/operator/queue/execution.go#L79), it increases the counter; when a job leaves the queue, it should decrease the counter (https://github.com/k8up-io/k8up/blob/master/operator/queue/execution.go#L90). However, the decrease is only triggered if there are no other jobs queued for the same repository.
The fix is simply to move the decrease outside of that if block, so it runs on every dequeue.
Logs
Expected Behavior
The counter is always decreased when a job is fetched from the queue.
Steps To Reproduce
Version of K8up
v2.5.1
Version of Kubernetes
any
Distribution of Kubernetes
any