
Gaps in K8up metrics #976

Open
simu opened this issue May 29, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@simu

simu commented May 29, 2024

Description

Currently, K8up only emits time series for schedules for which it has seen at least one Job in the matching completion state. For example, k8up_jobs_successful_counter only has time series for schedules that have had at least one successful job since the last K8up restart.

For schedules with a relatively low frequency (e.g. once per day), this can leave significant gaps in the metric in Prometheus. These gaps confuse Prometheus functions such as rate(), which can otherwise compensate for counter resets caused by pod restarts.
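For illustration, a typical query over this counter might look like the following sketch (the namespace label name is an assumption, not taken from K8up's actual metric definitions):

```promql
# Rate of successful K8up jobs per namespace. If the
# k8up_jobs_successful_counter series is absent until the first
# successful job, this expression returns no data for that
# namespace instead of 0, and rate() cannot detect counter resets
# across the gap.
sum by (namespace) (rate(k8up_jobs_successful_counter[1d]))
```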

Additional Context

No response

Logs

No response

Expected Behavior

K8up initializes the counter metrics (k8up_jobs_failed_counter, k8up_jobs_successful_counter, and k8up_jobs_total) with value 0 for all job types and all namespaces in which a Schedule exists immediately after startup.
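A minimal sketch of the requested behavior, using hypothetical label names (namespace, jobType) and only one of the three counters for brevity. In the real operator this would presumably amount to touching every label combination on the client_golang CounterVec at startup (WithLabelValues(...).Add(0)), which creates the series at value 0; the map-based version below only illustrates the idea without the Prometheus dependency:

```go
package main

import (
	"fmt"
	"sort"
)

// initJobCounters pre-initializes one counter series per namespace and job
// type to 0, so the time series exist from startup rather than appearing
// only after the first matching job. Label names are hypothetical.
func initJobCounters(namespaces, jobTypes []string) map[string]float64 {
	counters := make(map[string]float64)
	for _, ns := range namespaces {
		for _, jt := range jobTypes {
			// With client_golang this would be:
			//   jobsSuccessful.WithLabelValues(ns, jt).Add(0)
			key := fmt.Sprintf(`k8up_jobs_successful_counter{namespace=%q,jobType=%q}`, ns, jt)
			counters[key] = 0
		}
	}
	return counters
}

func main() {
	// Simulate startup for one namespace containing a Schedule.
	metrics := initJobCounters([]string{"team-a"}, []string{"backup", "prune"})

	// Print in exposition-format style, sorted for stable output.
	keys := make([]string, 0, len(metrics))
	for k := range metrics {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s %g\n", k, metrics[k])
	}
}
```

With this in place, Prometheus scrapes a 0-valued sample for every schedule's namespace immediately after a K8up restart, so rate() sees the reset instead of a gap.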

Steps To Reproduce

  1. Create a Schedule
  2. Check K8up's /metrics endpoint and observe that there is no k8up_jobs_* time series for the namespace of the new Schedule until a first job runs.

Version of K8up

v2.7.2

Version of Kubernetes

v1.27.13

Distribution of Kubernetes

OpenShift 4

@simu simu added the bug Something isn't working label May 29, 2024
@mhutter
Contributor

mhutter commented May 29, 2024

Just to confirm my understanding: the problem would be fixed if K8up emitted those metrics, with labels for all existing schedules, at a value of 0?

@simu
Author

simu commented May 30, 2024

Just to confirm my understanding: the problem would be fixed if K8up emitted those metrics, with labels for all existing schedules, at a value of 0?

Yes, if I understand Prometheus's behavior correctly, emitting metrics with labels for all existing schedules and a value of 0 until the first job is observed would let Prometheus correctly identify the counter resets.
