Add playbook
talal committed Aug 18, 2020
1 parent d4b7288 commit 58cc938
Showing 7 changed files with 67 additions and 69 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -10,6 +10,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added

- Operator can be disabled for a specific alert rule.
- `playbook` label to absent metric alerts.
- `keep-labels` flag for specifying which labels to carry over from alert
rules.

## [0.1.0] - 2020-08-13

94 changes: 25 additions & 69 deletions README.md
@@ -8,20 +8,24 @@

> Project status: **alpha**. The API and user facing objects may change.
In this document:

- [Overview](#overview)
- [Motivation](#motivation)
- [Installation](#installation)
- [Pre\-compiled binaries and Docker images](#pre-compiled-binaries-and-docker-images)
- [Building from source](#building-from-source)
- [Usage](#usage)
- [Disable for specific alerts](#disable-for-specific-alerts)
- [Caveat](#caveat)
- [Metrics](#metrics)
- [Absent metric alert definition](#absent-metric-alert-definition)
- [Template](#template)
- [Labels](#labels)
- [Defaults](#defaults)
- [Carry over from original alert rule](#carry-over-from-original-alert-rule)
- [Tier and service](#tier-and-service)

In other documents:

- [Operator's Playbook](./doc/playbook.md)

## Overview

The absent metrics operator is a companion operator for the [Prometheus
Operator](https://github.com/prometheus-operator/prometheus-operator).
@@ -81,74 +85,21 @@ annotations:
description: The metric 'foo_bar' is missing. 'ImportantAlert' alert using it may not fire as intended.
```

## Installation

### Pre-compiled binaries and Docker images

See the latest [release](https://github.com/sapcc/absent-metrics-operator/releases/latest).

### Building from source

The only required build dependency is [Go](https://golang.org/).

```
$ git clone https://github.com/sapcc/absent-metrics-operator.git
$ cd absent-metrics-operator
$ make install
```

This will put the binary in `/usr/bin/`.

Alternatively, you can also build directly with the `go get` command:

```
$ go get -u github.com/sapcc/absent-metrics-operator
```

This will put the binary in `$GOPATH/bin/`.

## Usage

```
$ absent-metrics-operator --kubeconfig="$KUBECONFIG"
```
We provide pre-compiled binaries and container images. See the latest
[release](https://github.com/sapcc/absent-metrics-operator/releases/latest).

The `kubeconfig` flag is only required when running outside a cluster.
Alternatively, you can build with `make`, install with `make install`, `go get`, or
`docker build`.

For detailed usage instructions:
For usage instructions:

```
$ absent-metrics-operator --help
```

### Disable for specific alerts

You can disable the operator for a specific `PrometheusRule` resource by adding
the following label to it:

```yaml
absent-metrics-operator/disable: true
```

If you want to disable the operator for only a specific alert rule instead of
all the alerts in a `PrometheusRule`, you can add the following label to the
alert rule:

```yaml
alert: ImportantAlert
expr: foo_bar > 0
for: 5m
labels:
no_alert_on_absence: true
...
```

#### Caveat

If you disable the operator for a specific alert or a specific
`PrometheusRule`, but other alerts or `PrometheusRules` have alert definitions
that use the same metric(s), then absent metric alerts for those metric(s) will
still be created.
You can disable the operator for a specific `PrometheusRule` or a specific alert definition. Refer to the [operator's Playbook](./doc/playbook.md) for more info.

### Metrics

@@ -203,21 +154,26 @@ Then the alert name would be `AbsentOsLimesSuccessfulScrapesRate5m`.
The following labels are always present on every absent metric alert rule:

- `severity` is always `info`.
- `playbook` provides a [link](./doc/playbook.md) to documentation that can be
referenced on how to deal with an absent metric alert.

#### Carry over from original alert rule

You can specify which labels to carry over from the original alert rule by
passing a comma-separated list of labels to the `--keep-labels` flag. The
default value for this flag is `service,tier`.
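A minimal Go sketch of how a comma-separated `--keep-labels` value could be turned into a set of label names. Only the flag name and its `service,tier` default come from this README; the helper `parseKeepLabels` is hypothetical, and the operator's actual flag handling may differ.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// parseKeepLabels is a hypothetical helper that turns a comma-separated
// list such as "service,tier" into a set of label names.
// Empty entries and surrounding whitespace are ignored.
func parseKeepLabels(s string) map[string]bool {
	keep := make(map[string]bool)
	for _, name := range strings.Split(s, ",") {
		name = strings.TrimSpace(name)
		if name != "" {
			keep[name] = true
		}
	}
	return keep
}

func main() {
	// The default value mirrors the one documented for --keep-labels.
	// Go's flag package accepts both -keep-labels and --keep-labels.
	keepLabels := flag.String("keep-labels", "service,tier",
		"comma-separated list of labels to carry over from the original alert rule")
	flag.Parse()

	keep := parseKeepLabels(*keepLabels)
	fmt.Println(len(keep), keep["service"], keep["tier"])
}
```

Using a set makes the later "should this label be carried over?" check a single map lookup.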

##### Tier and service
#### Tier and service

`tier` and `service` labels are carried over from the original alert rule
unless those labels use templating (i.e. use `$labels`), in which case the
default `tier` and `service` values will be used.
`tier` and `service` labels are a special case: they are carried over from the
original alert rule unless those labels use templating (i.e. use `$labels`), in
which case the default `tier` and `service` values will be used.
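To make the carry-over rules concrete, here is a hedged Go sketch that assembles the labels for an absent metric alert: the always-present `severity` and `playbook` labels, plus the kept labels from the original rule, falling back to the defaults when a value uses `$labels`. The function `absentAlertLabels`, its signature, and the choice to skip kept labels that are missing from the original rule are assumptions for illustration; the operator's real logic lives in `ParseAlertRule`, which this commit only shows in part.

```go
package main

import (
	"fmt"
	"strings"
)

// playbookURL is the value this commit adds as the "playbook" label.
const playbookURL = "https://git.io/absent-metrics-operator-playbook"

// absentAlertLabels is a hypothetical sketch. For every label name in keep it
// copies the value from the original rule, unless the value is templated
// ($labels), in which case the default for that label (if any) is used.
func absentAlertLabels(orig map[string]string, keep map[string]bool, defaults map[string]string) map[string]string {
	lab := make(map[string]string)
	for name := range keep {
		v, ok := orig[name]
		if !ok {
			continue // assumption: nothing to carry over
		}
		if strings.Contains(v, "$labels") {
			// Templated value: fall back to the default, if there is one.
			if d, ok := defaults[name]; ok {
				lab[name] = d
			}
			continue
		}
		lab[name] = v
	}
	// Labels that are always present are set last so they cannot be overridden.
	lab["severity"] = "info"
	lab["playbook"] = playbookURL
	return lab
}

func main() {
	orig := map[string]string{
		"tier":     "os",
		"service":  "{{ $labels.service }}",
		"severity": "critical",
	}
	keep := map[string]bool{"service": true, "tier": true}
	defaults := map[string]string{"tier": "os", "service": "limes"}
	fmt.Println(absentAlertLabels(orig, keep, defaults))
}
```

Here `tier` is copied as-is, the templated `service` falls back to the default `limes`, and the original `severity` is ignored because absent metric alerts always use `info`.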

The operator determines a default `tier` and `service` for a specific
Prometheus server in a namespace by examining all the alert rule definitions
for that Prometheus server in that namespace. It chooses the most common `tier`
and `service` label combination used across those alerts as the default
values.

The values of these labels are also used (if enabled with `keep-labels`) in
the name of the absent metric alert. See [template](#template).
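The default selection described above is essentially a frequency count over (`tier`, `service`) pairs. A minimal Go sketch, assuming each alert rule is reduced to its label map; the `tierService` type and `mostCommonTierService` helper are hypothetical, and skipping templated or missing values and breaking ties by first occurrence are choices made for this sketch, not documented behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// tierService is a hypothetical pair type used to count label combinations.
type tierService struct {
	Tier, Service string
}

// mostCommonTierService returns the (tier, service) pair that occurs most
// often across the given alert rule label sets. Rules missing either label,
// or using templating, are skipped. Ties go to the pair seen first.
// The bool result is false if no usable pair was found.
func mostCommonTierService(rules []map[string]string) (tierService, bool) {
	counts := make(map[tierService]int)
	var order []tierService // first-seen order, for deterministic ties
	for _, lab := range rules {
		t, s := lab["tier"], lab["service"]
		if t == "" || s == "" || strings.Contains(t, "$labels") || strings.Contains(s, "$labels") {
			continue
		}
		ts := tierService{t, s}
		if counts[ts] == 0 {
			order = append(order, ts)
		}
		counts[ts]++
	}
	var best tierService
	bestN := 0
	for _, ts := range order {
		if counts[ts] > bestN {
			best, bestN = ts, counts[ts]
		}
	}
	return best, bestN > 0
}

func main() {
	rules := []map[string]string{
		{"tier": "os", "service": "limes"},
		{"tier": "os", "service": "keppel"},
		{"tier": "os", "service": "limes"},
		{"tier": "{{ $labels.tier }}", "service": "swift"},
	}
	ts, ok := mostCommonTierService(rules)
	fmt.Println(ts.Tier, ts.Service, ok)
}
```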
35 changes: 35 additions & 0 deletions doc/playbook.md
@@ -0,0 +1,35 @@
# Operator's Playbook

This document assumes that you have already read and understood the [general
README](../README.md). If not, start reading there.

## Disable for specific alerts

You can disable the operator for a specific `PrometheusRule` resource by adding
the following label to it:

```yaml
absent-metrics-operator/disable: "true"
```

If you want to disable the operator for only a specific alert rule instead of
all the alerts in a `PrometheusRule`, you can add the `no_alert_on_absence`
label to the alert rule. For example:

```yaml
alert: ImportantAlert
expr: foo_bar > 0
for: 5m
labels:
no_alert_on_absence: "true"
...
```

**Note**: make sure that you use `"true"` (quoted) and not `true`. Label values
must be strings, and YAML parses an unquoted `true` as a boolean.
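A minimal Go sketch of the opt-out check, assuming the operator compares the label value against the string `"true"`. Only the two label keys come from this document; `isDisabled` is a hypothetical helper, not the operator's code.

```go
package main

import "fmt"

// Label keys as documented above.
const (
	disableLabel = "absent-metrics-operator/disable" // on a PrometheusRule
	noAlertLabel = "no_alert_on_absence"             // on an individual alert rule
)

// isDisabled is a hypothetical helper that reports whether a label set opts
// out of absent metric alerts. In this sketch only the exact string "true"
// counts; a missing label or any other value leaves the operator enabled.
func isDisabled(labels map[string]string, key string) bool {
	return labels[key] == "true"
}

func main() {
	promRuleLabels := map[string]string{disableLabel: "true"}
	alertLabels := map[string]string{"severity": "warning"}
	fmt.Println(isDisabled(promRuleLabels, disableLabel)) // true
	fmt.Println(isDisabled(alertLabels, noAlertLabel))    // false
}
```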

### Caveat

If you disable the operator for a specific alert or a specific
`PrometheusRule`, but other alerts or `PrometheusRules` have alert definitions
that use the same metric(s), then absent metric alerts for those metric(s) will
still be created.
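The caveat suggests that absent metric alerts are tracked per metric rather than per alert rule. The simplified Go sketch below illustrates that behavior with a hypothetical `rule` type and `metricsNeedingAbsentAlerts` helper, neither of which is part of the operator: a metric is still collected as long as any enabled rule uses it.

```go
package main

import (
	"fmt"
	"sort"
)

// rule is a hypothetical, simplified alert rule: its name, the metrics its
// expression uses, and whether it has been opted out.
type rule struct {
	Name     string
	Metrics  []string
	Disabled bool
}

// metricsNeedingAbsentAlerts collects the metrics used by every rule that is
// not disabled. Because the result is a set of metrics, a metric is still
// included if any enabled rule uses it, even when a disabled rule uses it too.
func metricsNeedingAbsentAlerts(rules []rule) []string {
	set := make(map[string]bool)
	for _, r := range rules {
		if r.Disabled {
			continue
		}
		for _, m := range r.Metrics {
			set[m] = true
		}
	}
	out := make([]string, 0, len(set))
	for m := range set {
		out = append(out, m)
	}
	sort.Strings(out) // deterministic order for printing
	return out
}

func main() {
	rules := []rule{
		{Name: "ImportantAlert", Metrics: []string{"foo_bar"}, Disabled: true},
		{Name: "OtherAlert", Metrics: []string{"foo_bar", "baz"}},
	}
	// foo_bar is still included because OtherAlert uses it.
	fmt.Println(metricsNeedingAbsentAlerts(rules))
}
```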
1 change: 1 addition & 0 deletions internal/controller/alert_rule.go
@@ -125,6 +125,7 @@ func (c *Controller) ParseAlertRule(tier, service string, in monitoringv1.Rule)
// Default labels
lab := map[string]string{
"severity": "info",
"playbook": "https://git.io/absent-metrics-operator-playbook",
}

// Carry over labels from the original alert
@@ -24,6 +24,7 @@ var kepLab = map[string]string{
"tier": "os",
"service": "keppel",
"severity": "info",
"playbook": "https://git.io/absent-metrics-operator-playbook",
}

// ResMgmtK8sAbsentPromRule represents the PrometheusRule that should be
@@ -24,6 +24,7 @@ var limesLab = map[string]string{
"tier": "os",
"service": "limes",
"severity": "info",
"playbook": "https://git.io/absent-metrics-operator-playbook",
}

// ResMgmtOSAbsentPromRule represents the PrometheusRule that should be
@@ -24,6 +24,7 @@ var swiftLab = map[string]string{
"tier": "os",
"service": "swift",
"severity": "info",
"playbook": "https://git.io/absent-metrics-operator-playbook",
}

// SwiftOSAbsentPromRule represents the PrometheusRule that should be generated