Wrongly Decreasing the queued metric #751

Closed
Kidswiss opened this issue Oct 18, 2022 · 0 comments · Fixed by #752
Labels
bug Something isn't working

Comments

Contributor

Kidswiss commented Oct 18, 2022

Description

While debugging a seemingly wrong K8upJobStuck alert that kept appearing on a cluster, I found that the likelihood of false stuck alerts is a lot higher when many backup jobs target the same repository.

Additional Context

Currently, the decrease is only called if there's no other job in the queue pointing to the same repository: https://github.com/k8up-io/k8up/blob/master/operator/queue/execution.go#L90

The K8upJobStuck check tests whether the number of jobs in the queue has been non-zero for more than 24h. This is to ensure that the operator is actually able to schedule the given jobs.

The intended logic is: when a job is added to the queue (https://github.com/k8up-io/k8up/blob/master/operator/queue/execution.go#L79), the counter is increased. When the job leaves the queue, the counter should be decreased (https://github.com/k8up-io/k8up/blob/master/operator/queue/execution.go#L90). However, the decrease is only triggered if no other jobs are queued for the same repository, so with several jobs for one repository the counter can stay above zero even after the queue is empty.

The fix is to move the decrease outside of that if block, so that it runs for every job taken from the queue.

Logs

Not applicable.

Expected Behavior

The counter is always decreased when a job is fetched from the queue.

Steps To Reproduce

  1. Have multiple schedules point to the same repository
  2. Let the schedules trigger at the same time
  3. Get the K8upJobStuck alert

Version of K8up

v2.5.1

Version of Kubernetes

any

Distribution of Kubernetes

any

@Kidswiss Kidswiss added the bug Something isn't working label Oct 18, 2022
Kidswiss added a commit that referenced this issue Oct 18, 2022
Fixes #751

Signed-off-by: Simon Beck <simon.beck@vshn.ch>
ccremer pushed a commit that referenced this issue Oct 20, 2022
Fixes #751

Signed-off-by: Simon Beck <simon.beck@vshn.ch>