Description
During a debug session where a seemingly wrong K8upJobStuck alert kept appearing on a cluster, I found that if there are many backup jobs to the same repo, the likelihood of false stuck alerts is a lot higher.
Additional Context
Currently, the decrease is only called if there's no other job in the queue pointing to the same repository: https://github.com/k8up-io/k8up/blob/master/operator/queue/execution.go#L90
The K8upJobStuck check fires if there is a non-zero number of jobs in the queue for more than 24h. This is to ensure that the operator is actually able to schedule the given jobs.
The logic is intended to work as follows: when a job is added to the queue (https://github.com/k8up-io/k8up/blob/master/operator/queue/execution.go#L79), it increases the counter; when a job leaves the queue, it should decrease the counter (https://github.com/k8up-io/k8up/blob/master/operator/queue/execution.go#L90). However, the decrease is only triggered if there are no other jobs queued for the same repository.
The fix is simply to move the decrease outside of that if block, so it runs on every dequeue.
Logs
Expected Behavior
The counter is always decreased when a job is fetched from the queue.
Steps To Reproduce
Version of K8up
v2.5.1
Version of Kubernetes
any
Distribution of Kubernetes
any