Consolidate configuration and rules docs in docs/configuration/
grobie committed Oct 27, 2017
1 parent 4d30a11 commit f432b81
Showing 8 changed files with 112 additions and 11 deletions.
98 changes: 98 additions & 0 deletions docs/configuration/alerting_rules.md
@@ -0,0 +1,98 @@
---
title: Alerting rules
sort_rank: 3
---

# Alerting rules

Alerting rules allow you to define alert conditions based on Prometheus
expression language expressions and to send notifications about firing alerts
to an external service. Whenever the alert expression results in one or more
vector elements at a given point in time, the alert counts as active for these
elements' label sets.

Alerting rules are configured in Prometheus in the same way as [recording
rules](recording_rules.md).

### Defining alerting rules

Alerting rules are defined in the following syntax:

    ALERT <alert name>
      IF <expression>
      [ FOR <duration> ]
      [ LABELS <label set> ]
      [ ANNOTATIONS <label set> ]

The alert name must be a valid metric name.

The optional `FOR` clause causes Prometheus to wait for a certain duration
between first encountering a new expression output vector element (like an
instance with a high HTTP error rate) and counting an alert as firing for this
element. Elements that are active, but not firing yet, are in the pending state.
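
The pending-to-firing behavior of the `FOR` clause can be illustrated with a minimal Python sketch. This is not Prometheus's implementation: the `AlertTracker` class, its field names, and the choice to treat "held for at least the duration" as firing are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class AlertTracker:
    """Hypothetical tracker for one alerting rule with a FOR duration."""

    for_seconds: float
    # Maps each active element (any hashable label-set key) to the time
    # at which it first appeared in the expression output.
    active_since: dict = field(default_factory=dict)

    def evaluate(self, now, active_elements):
        """Return {element: "pending" | "firing"} for one rule evaluation."""
        # Elements that left the expression output become inactive.
        for element in list(self.active_since):
            if element not in active_elements:
                del self.active_since[element]

        states = {}
        for element in active_elements:
            first_seen = self.active_since.setdefault(element, now)
            # Assumption of this sketch: firing once held for >= FOR duration.
            held_long_enough = now - first_seen >= self.for_seconds
            states[element] = "firing" if held_long_enough else "pending"
        return states
```

With `for_seconds=300`, an element first seen at time 0 stays pending until the evaluation at time 300. If it disappears from the expression output and later reappears, the wait starts over.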

The `LABELS` clause allows specifying a set of additional labels to be attached
to the alert. Any existing conflicting labels will be overwritten. The label
values can be templated.

The `ANNOTATIONS` clause specifies another set of labels that are not
identifying for an alert instance. They are used to store longer additional
information such as alert descriptions or runbook links. The annotation values
can be templated.

#### Templating

Label and annotation values can be templated using [console templates](https://prometheus.io/docs/visualization/consoles).
The `$labels` variable holds the label key/value pairs of an alert instance
and `$value` holds the evaluated value of an alert instance.

    # To insert a firing element's label values:
    {{ $labels.<labelname> }}
    # To insert the numeric expression value of the firing element:
    {{ $value }}

Examples:

    # Alert for any instance that is unreachable for >5 minutes.
    ALERT InstanceDown
      IF up == 0
      FOR 5m
      LABELS { severity = "page" }
      ANNOTATIONS {
        summary = "Instance {{ $labels.instance }} down",
        description = "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.",
      }

    # Alert for any instance that has a median request latency >1s.
    ALERT APIHighRequestLatency
      IF api_http_request_latencies_second{quantile="0.5"} > 1
      FOR 1m
      ANNOTATIONS {
        summary = "High request latency on {{ $labels.instance }}",
        description = "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)",
      }
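
To show what a templated annotation might look like once rendered, here is a rough Python sketch. It is only a stand-in for the real template engine: it handles nothing beyond the `{{ $labels.<labelname> }}` and `{{ $value }}` placeholders, and the `render` function, the sample label values, and the empty-string fallback for missing labels are all assumptions of the sketch.

```python
import re

# Matches "{{ $labels.<labelname> }}" and "{{ $value }}" only.
_PLACEHOLDER = re.compile(r"\{\{\s*\$(labels\.(\w+)|value)\s*\}\}")


def render(template, labels, value):
    """Substitute label values and the alert value into a template string."""

    def substitute(match):
        if match.group(1) == "value":
            return str(value)
        # Sketch choice: a missing label renders as an empty string.
        return labels.get(match.group(2), "")

    return _PLACEHOLDER.sub(substitute, template)


# Hypothetical label set for one firing element of InstanceDown.
labels = {"instance": "host1:9090", "job": "node"}
print(render("Instance {{ $labels.instance }} down", labels, 0))
```

Each firing element gets its own rendered annotations, because `$labels` and `$value` are bound per alert instance.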

### Inspecting alerts during runtime

To manually inspect which alerts are active (pending or firing), navigate to
the "Alerts" tab of your Prometheus instance. This will show you the exact
label sets for which each defined alert is currently active.

For pending and firing alerts, Prometheus also stores synthetic time series of
the form `ALERTS{alertname="<alert name>", alertstate="pending|firing", <additional alert labels>}`.
The sample value is set to `1` as long as the alert is in the indicated active
(pending or firing) state, and a single `0` value gets written out when an alert
transitions from active to inactive state. Once inactive, the time series does
not get further updates.
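
The sample sequence described above can be sketched in Python for a single alert element. This is a simplified illustration, not Prometheus code: `alerts_samples` and its input encoding are hypothetical, and the sketch does not model what happens to the `pending` series when an alert moves from pending to firing.

```python
def alerts_samples(states):
    """Return (step, alertstate, value) samples for one alert element.

    `states` holds one entry per rule evaluation: None when the alert is
    inactive, otherwise "pending" or "firing".
    """
    samples = []
    previous = None
    for step, state in enumerate(states):
        if state is not None:
            # Active: the series for the current state gets a 1.
            samples.append((step, state, 1))
        elif previous is not None:
            # Active -> inactive: write a single 0, then nothing more.
            samples.append((step, previous, 0))
        previous = state
    return samples


history = [None, "pending", "pending", "firing", None, None]
for sample in alerts_samples(history):
    print(sample)
```

In this sketch the last sample written is the single `0` at the step where the alert becomes inactive. Later inactive evaluations add nothing.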

### Sending alert notifications

Prometheus's alerting rules are good at figuring out what is broken *right now*, but
they are not a fully-fledged notification solution. Another layer is needed to
add summarization, notification rate limiting, silencing and alert dependencies
on top of the simple alert definitions. In Prometheus's ecosystem, the
[Alertmanager](https://prometheus.io/docs/alerting/alertmanager/) takes on this
role. Thus, Prometheus may be configured to periodically send information about
alert states to an Alertmanager instance, which then takes care of dispatching
the right notifications. The Alertmanager instance may be configured via the
`-alertmanager.url` command line flag.
4 changes: 2 additions & 2 deletions docs/configuration.md → docs/configuration/configuration.md
@@ -1,6 +1,6 @@
---
title: Configuration
-sort_rank: 3
+sort_rank: 1
---

# Configuration
@@ -10,7 +10,7 @@ the command-line flags configure immutable system parameters (such as storage
locations, amount of data to keep on disk and in memory, etc.), the
configuration file defines everything related to scraping [jobs and their
instances](https://prometheus.io/docs/concepts/jobs_instances/), as well as
-which [rule files to load](querying/rules.md#configuring-rules).
+which [rule files to load](recording_rules.md#configuring-rules).

To view all available command-line flags, run `prometheus -h`.

4 changes: 4 additions & 0 deletions docs/configuration/index.md
@@ -0,0 +1,4 @@
---
title: Configuration
sort_rank: 3
---
@@ -1,6 +1,6 @@
---
title: Recording rules
-sort_rank: 6
+sort_rank: 2
---

# Defining recording rules
@@ -9,10 +9,9 @@ sort_rank: 6

Prometheus supports two types of rules which may be configured and then
evaluated at regular intervals: recording rules and [alerting
-rules](https://prometheus.io/docs/alerting/rules/). To include rules in
-Prometheus, create a file containing the necessary rule statements and have
-Prometheus load the file via the `rule_files` field in the [Prometheus
-configuration](../configuration.md).
+rules](alerting_rules.md). To include rules in Prometheus, create a file
+containing the necessary rule statements and have Prometheus load the file via
+the `rule_files` field in the [Prometheus configuration](configuration.md).

The rule files can be reloaded at runtime by sending `SIGHUP` to the Prometheus
process. The changes are only applied if all rule files are well-formatted.
2 changes: 1 addition & 1 deletion docs/getting_started.md
@@ -56,7 +56,7 @@ scrape_configs:
```

For a complete specification of configuration options, see the
-[configuration documentation](configuration.md).
+[configuration documentation](configuration/configuration.md).

## Starting Prometheus

2 changes: 1 addition & 1 deletion docs/index.md
@@ -13,7 +13,7 @@ The documentation is available alongside all the project documentation at

- [Installing](install.md)
- [Getting started](getting_started.md)
-- [Configuration](configuration.md)
+- [Configuration](configuration/configuration.md)
- [Querying](querying/basics.md)
- [Storage](storage.md)
- [Federation](federation.md)
2 changes: 1 addition & 1 deletion docs/querying/basics.md
@@ -204,7 +204,7 @@ Prometheus's expression browser until the result set seems reasonable
(hundreds, not thousands, of time series at most). Only when you have filtered
or aggregated your data sufficiently, switch to graph mode. If the expression
still takes too long to graph ad-hoc, pre-record it via a [recording
-rule](rules.md#recording-rules).
+rule](../configuration/recording_rules.md#recording-rules).

This is especially relevant for Prometheus's query language, where a bare
metric name selector like `api_http_requests_total` could expand to thousands
2 changes: 1 addition & 1 deletion docs/storage.md
@@ -160,7 +160,7 @@ in the next section.

Case (3) depends on the targets you monitor. To mitigate an unplanned explosion
of the number of series, you can limit the number of samples per individual
-scrape (see `sample_limit` in the [scrape config](configuration.md#scrape_config)).
+scrape (see `sample_limit` in the [scrape config](configuration/configuration.md#scrape_config)).
If the number of active time series exceeds the number of memory chunks the
Prometheus server can afford, the server will quickly throttle ingestion as
described above. The only way out of this is to give Prometheus more RAM or