From b9cf79860871aa662809813cbbb5bde193c32166 Mon Sep 17 00:00:00 2001
From: Pierre Riteau
Date: Thu, 10 Mar 2022 12:15:51 +0100
Subject: [PATCH] Document how to silence Prometheus alerts

---
 source/operations_and_monitoring.rst | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/source/operations_and_monitoring.rst b/source/operations_and_monitoring.rst
index 7a2733b..d06200b 100644
--- a/source/operations_and_monitoring.rst
+++ b/source/operations_and_monitoring.rst
@@ -36,6 +36,8 @@ Kolla-Ansible and can be extracted from the encrypted passwords file
 
    kayobe# ansible-vault view ${KAYOBE_CONFIG_PATH}/kolla/passwords.yml --vault-password-file |vault_password_file_path| | grep ^grafana_admin_password
 
+.. _prometheus-alertmanager:
+
 Access to Prometheus Alertmanager
 =================================
 
@@ -290,6 +292,32 @@ Alerts are defined in code and stored in Kayobe configuration. See ``*.rules``
 files in ``${KAYOBE_CONFIG_PATH}/kolla/config/prometheus`` as a model to add
 custom rules.
 
+Silencing Prometheus Alerts
+---------------------------
+
+Sometimes alerts must be silenced because the root cause cannot be resolved
+right away, such as when hardware is faulty. For example, an unreachable
+hypervisor will produce several alerts:
+
+* ``InstanceDown`` from Node Exporter
+* ``OpenStackServiceDown`` from the OpenStack exporter, which reports status of
+  the ``nova-compute`` agent on the host
+* ``PrometheusTargetMissing`` from several Prometheus exporters
+
+Rather than silencing each alert one by one for a specific host, a silence can
+apply to multiple alerts using a reduced list of labels. :ref:`Log into
+Alertmanager `, click on the ``Silence`` button next
+to an alert and adjust the matcher list to keep only ``instance=``
+label. Then, create another silence to match ``hostname=`` (this is
+required because, for the OpenStack exporter, the instance is the host running
+the monitoring service rather than the host being monitored).
+
+.. note::
+
+   After creating the silence, you may get redirected to a 404 page. This is a
+   `known issue `__
+   when running several Alertmanager instances behind HAProxy.
+
 Control Plane Shutdown Procedure
 ================================
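
Review note on the "Silencing Prometheus Alerts" section added by this patch: the two silences it describes (one matching the ``instance`` label, one matching the ``hostname`` label) could also be created through the Alertmanager v2 HTTP API instead of the web UI. The following is only a minimal sketch under stated assumptions, not something the patch itself documents: it assumes the API is reachable at a base URL you supply and that no authentication is required, and the helper names (``build_silence``, ``silences_for_host``, ``post_silence``) and all example values are hypothetical.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_silence(label, value, hours, created_by, comment, now=None):
    """Return an Alertmanager v2 silence payload matching a single label exactly."""
    start = now or datetime.now(timezone.utc)
    end = start + timedelta(hours=hours)
    return {
        "matchers": [
            {"name": label, "value": value, "isRegex": False, "isEqual": True}
        ],
        "startsAt": start.isoformat(),
        "endsAt": end.isoformat(),
        "createdBy": created_by,
        "comment": comment,
    }


def silences_for_host(instance_value, hostname_value, hours, created_by,
                      comment, now=None):
    """Build the two silences described in the section: instance and hostname."""
    start = now or datetime.now(timezone.utc)
    return [
        build_silence("instance", instance_value, hours, created_by, comment, start),
        build_silence("hostname", hostname_value, hours, created_by, comment, start),
    ]


def post_silence(base_url, silence):
    """POST one silence to <base_url>/api/v2/silences (assumed reachable, no auth)."""
    request = urllib.request.Request(
        f"{base_url}/api/v2/silences",
        data=json.dumps(silence).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The label values are passed in separately because the patch notes that, for the OpenStack exporter, ``instance`` refers to the host running the monitoring service rather than the monitored host. The exact values to use depend on how the labels appear on your alerts. Building the payloads is kept apart from sending them so that they can be inspected before anything is posted.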