Ruler Alerts

qryn alert manager

qryn v1.3.1+ implements an Alertmanager-compatible API to support Grafana Advanced Alerting.

The ruler API uses the concept of a "namespace" when creating rule groups. This is a stand-in for the name of the rule file in Prometheus. Rule groups must be named uniquely within a namespace.

API Endpoints

The following endpoints are exposed by the qryn ruler:

GET /api/prom/rules
GET /api/prom/rules/{namespace}
GET /api/prom/rules/{namespace}/{groupName}
POST /api/prom/rules/{namespace}
DELETE /api/prom/rules/{namespace}/{groupName}
DELETE /api/prom/rules/{namespace}
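
The same operations can be performed directly over HTTP. Below is a minimal sketch in Python using the requests library; the base URL, port, namespace and group names are placeholders for your own deployment, and it assumes no authentication sits in front of qryn.

  import requests

  QRYN = "http://localhost:3100"       # assumed qryn base URL and default port
  NAMESPACE = "example-namespace"      # hypothetical namespace name

  # List every rule group known to the ruler
  resp = requests.get(f"{QRYN}/api/prom/rules")
  print(resp.status_code, resp.text)

  # List only the rule groups inside one namespace
  resp = requests.get(f"{QRYN}/api/prom/rules/{NAMESPACE}")
  print(resp.status_code, resp.text)

  # Delete a single rule group, then the whole namespace
  requests.delete(f"{QRYN}/api/prom/rules/{NAMESPACE}/example-group")
  requests.delete(f"{QRYN}/api/prom/rules/{NAMESPACE}")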

Usage

Configuration

Starting from v1.3.1, qryn supports alert management via Grafana.

Add a Cortex or Loki managed alerting rule

1. In the Grafana Menu, click the Bell icon to open the Alerting page listing existing alerts.

2. Click New alert rule.

3. In Step 1 of the creation dialog, add the rule name, rule type and data source (qryn).

  • In Rule name, add a descriptive name. This name is displayed in the alert rule list. It is also the alertname label for every alert instance that is created from this rule.
  • From the Rule type drop-down, select Cortex / Loki managed alert.
  • From the Select data source drop-down, select an external qryn data source.
  • From the Namespace drop-down, select an existing rule namespace. Otherwise, click Add new and enter a name to create a new one. Namespaces can contain one or more rule groups and only have an organizational purpose.
  • From the Group drop-down, select an existing group within the selected namespace. Otherwise, click Add new and enter a name to create a new one. Newly created rules are appended to the end of the group. Rules within a group are run sequentially at a regular interval, with the same evaluation time.

Step 1 Data Entry

4. In Step 2 of the creation dialog, add the query to evaluate.

  • Enter a Metrics or LogQL expression. The rule fires if the evaluation result has at least one series with a value that is greater than 0.

Query Entry

5. In Step 3 of the creation dialog, add conditions.

  • In the For text box, specify the duration for which the condition must be true before an alert fires. If you specify 5 minutes, the condition must be true for 5 minutes before the alert fires.

Note: Once a condition is met, the alert goes into the Pending state. If the condition remains active for the duration specified, the alert transitions to the Firing state; otherwise it reverts to the Normal state.

6. In Step 4 of the creation dialog, add additional metadata associated with the rule.

  • Add a description and summary to customize alert messages.
  • Add Runbook URL, panel, dashboard, and alert IDs.
  • Add custom labels.

Description and metadata entry

7. Click Save to save the rule or Save and exit to save the rule and go back to the Alerting page.
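
The rule built in the steps above can also be registered programmatically against the ruler API rather than through the Grafana UI. The sketch below is an illustration only: it assumes the ruler accepts Loki/Cortex-style YAML rule groups on POST /api/prom/rules/{namespace}, and the namespace, group name, threshold, labels and annotations are all placeholder values.

  import requests

  QRYN = "http://localhost:3100"       # assumed qryn base URL
  NAMESPACE = "example-namespace"      # hypothetical namespace

  # A Loki/Cortex-style rule group: one alerting rule with an expression,
  # a "for" duration (Step 3), and labels/annotations (Step 4).
  rule_group = """
  name: example-group
  rules:
    - alert: HighCpu
      expr: avg_over_time({system="server.e2.central"} | json | unwrap cpu_percent [5s]) > 90
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: CPU usage above 90% for 5 minutes
        runbook_url: https://example.com/runbooks/high-cpu
  """

  resp = requests.post(
      f"{QRYN}/api/prom/rules/{NAMESPACE}",
      data=rule_group,
      headers={"Content-Type": "application/yaml"},
  )
  print(resp.status_code, resp.text)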


Example

The following are a few working alert rule examples using both LogQL and metrics queries:

LogQL

  • avg_over_time({system="server.e2.central"} | json | unwrap cpu_percent [5s]) > 90 -- CPU above 90% on a 5s average
  • rate({system="server.e1.central"} |~ "error" [5s]) > 1 -- Error rate exceeds 1 per 5s bucket

Metrics

  • avg_over_time({system="server.e3.west"} | unwrap_value [5s]) > 70 -- Metric value above 70 on a 5s average
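
Before wiring an expression like these into a rule, it can be useful to run it as an ad-hoc query and confirm it returns at least one series above the threshold, which is what makes the rule fire. The snippet below is a rough sketch that assumes qryn exposes the Loki-compatible /loki/api/v1/query endpoint; adjust the base URL and expression to your own setup.

  import requests

  QRYN = "http://localhost:3100"   # assumed qryn base URL
  expr = 'rate({system="server.e1.central"} |~ "error" [5s]) > 1'

  # Run the alert expression as an instant query; a non-empty result set
  # means the alert condition is currently met.
  resp = requests.get(f"{QRYN}/loki/api/v1/query", params={"query": expr})
  result = resp.json().get("data", {}).get("result", [])
  print("condition met" if result else "condition not met")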

Firing Rule Example

A rule that is firing appears in the Firing state in the UI. Below is an example of what a firing rule looks like:

Firing Rule Dialog

Note that the rule has been firing for 3s and that the data type, if the metric is configured correctly (see Metrics HTTP API for how to set the correct labels), is indicated as the cpu_percent metric. The condition was deliberately set low, at 20%, to show a Firing rule.