Add operator metric for backup failures and successes #112

Open
JamesLaverack opened this issue Dec 4, 2019 · 0 comments


JamesLaverack commented Dec 4, 2019

Users of the operator want to monitor backup failures and successes, in particular to alert on failed backups or a lack of successful ones.

Design

A metric will be added to the exposed operator metrics as a counter of successes and failures for EtcdBackupSchedule resources. This counter will be labelled by the namespace and name of the EtcdBackupSchedule resource.
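As a rough illustration of the shape of such a counter, here is a minimal, dependency-free sketch in Go. Everything in it is an assumption made for illustration: the `BackupCounter` type, its methods, and the `success`/`failure` outcome values are hypothetical and not part of the operator. A real implementation would presumably register a labelled counter with whatever Prometheus client library the operator already uses, rather than hand-rolling a map.

```go
package main

import (
	"fmt"
	"sync"
)

// scheduleKey identifies one EtcdBackupSchedule resource plus a backup outcome.
// The outcome values "success" and "failure" are hypothetical label values.
type scheduleKey struct {
	Namespace string
	Name      string
	Outcome   string
}

// BackupCounter is a hypothetical stand-in for a labelled counter metric.
// It only ever increases, mirroring counter semantics.
type BackupCounter struct {
	mu     sync.Mutex
	counts map[scheduleKey]uint64
}

// NewBackupCounter returns an empty counter ready for use.
func NewBackupCounter() *BackupCounter {
	return &BackupCounter{counts: make(map[scheduleKey]uint64)}
}

// Inc records one finished backup for the given EtcdBackupSchedule.
func (c *BackupCounter) Inc(namespace, name string, succeeded bool) {
	outcome := "failure"
	if succeeded {
		outcome = "success"
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[scheduleKey{Namespace: namespace, Name: name, Outcome: outcome}]++
}

// Value returns the current count for one schedule and outcome.
func (c *BackupCounter) Value(namespace, name, outcome string) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[scheduleKey{Namespace: namespace, Name: name, Outcome: outcome}]
}

func main() {
	c := NewBackupCounter()
	// Two successful backups and one failed backup for the same schedule.
	c.Inc("prod", "nightly", true)
	c.Inc("prod", "nightly", false)
	c.Inc("prod", "nightly", true)
	fmt.Println("successes:", c.Value("prod", "nightly", "success"))
	fmt.Println("failures:", c.Value("prod", "nightly", "failure"))
}
```

Keeping the outcome as a label alongside namespace and name is one possible layout; two separate counters (one for successes, one for failures) would serve the same alerting use cases.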

Other options

Instrumenting all backups

All backups could be counted by building our counter from EtcdBackup resources directly. However, the backup resource has no unique name to operate on, only a list of endpoints, so there is no good way to uniquely identify which cluster is being backed up.

Without labels on the metric, it would be hard to tell from a dashboard or alert which etcd cluster (if there are several) is failing to back up.

Not using a metric

Alternatively, all of this information is already available in the Kubernetes API via a status field on EtcdBackup resources. However, this relies on a Kubernetes administrator installing and configuring something like kube-state-metrics to support alerts and dashboards on this data.
