diff --git a/doc/source/configuration/monitoring.rst b/doc/source/configuration/monitoring.rst
index 6045f3c29..c2384ba93 100644
--- a/doc/source/configuration/monitoring.rst
+++ b/doc/source/configuration/monitoring.rst
@@ -7,9 +7,20 @@ Monitoring Configuration
 
 StackHPC kayobe config includes a reference monitoring and alerting stack based
 on Prometheus, Alertmanager, Grafana, Fluentd, Elasticsearch & Kibana. These
-services by default come enabled and configured. Central Elasticsearch cluster
-collects OpenStack logs, with an option to receive operating system logs too.
-In order to enable this, execute custom playbook after deployment:
+services by default come enabled and configured.
+
+Monitoring hosts, usually the controllers, should be added to the monitoring
+group. The group definition can be placed in several locations. For
+example, this configuration could be added to ``etc/kayobe/inventory/groups``:
+
+.. code-block:: ini
+
+   [monitoring:children]
+   controllers
+
+A central Elasticsearch cluster collects OpenStack logs, with an option to
+receive operating system logs too. To enable this, execute a custom playbook
+after deployment:
 
 .. code-block:: console
 
@@ -78,3 +89,41 @@ on the overcloud hosts:
 
 SMART reporting should now be enabled along with a Prometheus alert for
 unhealthy disks and a Grafana dashboard called ``Hardware Overview``.
+
+Alertmanager and Slack
+======================
+
+StackHPC Kayobe configuration comes bundled with an array of alerts but does not
+enable any receivers for notifications by default. Various receivers can be
+configured for Alertmanager. Slack is currently the most common.
+
+To set up a receiver, create a ``prometheus-alertmanager.yml`` file under
+``etc/kayobe/kolla/config/prometheus/``. An example config is stored in this
+directory. The example configuration uses two Slack channels. One channel
+receives all alerts while the other only receives alerts tagged as critical. It
+also adds a silence button to temporarily mute alerts. To use the example in a
+deployment, you will need to generate two webhook URLs, one for each channel.
+
+To generate a Slack webhook, `create a new app
+`__ in the workspace you want to add alerts to.
+From the Features page, toggle Activate incoming webhooks on. Click Add new
+webhook to workspace. Pick a channel that the app will post to, then click
+Authorise. You only need one app to generate both webhooks.
+
+Both URLs should be encrypted using Ansible Vault, as anyone who holds them can
+post to your Slack channels. The standard practice is to store them in
+``kayobe/secrets.yml`` as:
+
+.. code-block:: yaml
+
+   secrets_slack_notification_channel_url:
+   secrets_slack_critical_notification_channel_url:
+
+These should then be set as the ``slack_api_url`` and ``api_url`` for the
+regular and critical alerts channels respectively. Both Slack channel names will
+need to be set, and the proxy URL should be set or removed.
+
+If you want to add an alerting rule, many good examples of alerts are
+available `here `__. They simply
+need to be added to one of the ``*.rules`` files in the Prometheus configuration
+directory.
\ No newline at end of file
diff --git a/etc/kayobe/kolla/config/prometheus/prometheus-alertmanager.yml.example b/etc/kayobe/kolla/config/prometheus/prometheus-alertmanager.yml.example
new file mode 100644
index 000000000..435d63c09
--- /dev/null
+++ b/etc/kayobe/kolla/config/prometheus/prometheus-alertmanager.yml.example
@@ -0,0 +1,47 @@
+---
+global:
+  resolve_timeout: 5m
+  smtp_require_tls: true
+  slack_api_url: 'https://hooks.slack.com/services/example/alerts/webhook'
+
+route:
+  receiver: 'slack-notifications'
+  group_by: [alertname]
+  group_wait: 30s
+  group_interval: 5m
+  repeat_interval: 4h
+
+  routes:
+    - matchers:
+        - severity=~"critical|alert"
+      receiver: 'slack-critical-notifications'
+
+
+receivers:
+  - name: 'slack-notifications'
+    slack_configs:
+      - channel: '#notifications'
+        actions:
+          - type: button
+            text: 'Silence 🔕'
+            url: {{ '{% raw %}' }}{% raw %} '{{ template "__alert_silence_link" . }}'
+{% endraw %}{{ '{% endraw %}' }}
+        send_resolved: true
+        http_config:
+          proxy_url: http://1.2.3.4:3128
+  - name: 'slack-critical-notifications'
+    slack_configs:
+      - channel: '#notifications-critical'
+        actions:
+          - type: button
+            text: 'Silence 🔕'
+            url: {{ '{% raw %}' }}{% raw %} '{{ template "__alert_silence_link" . }}'
+{% endraw %}{{ '{% endraw %}' }}
+        send_resolved: true
+        http_config:
+          proxy_url: http://1.2.3.4:3128
+        api_url: 'https://hooks.slack.com/services/example/alerts/webhook-critical'
+
+
+templates:
+  - '/etc/prometheus/*.tmpl'
\ No newline at end of file
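
The documentation hunk says the two vault-encrypted variables should be set as ``slack_api_url`` and ``api_url``, but the example file ships with placeholder URLs. The fragment below is a minimal sketch of one way the wiring might look. It assumes the file is rendered as a Jinja template with the variables from ``kayobe/secrets.yml`` in scope, which may depend on the Kayobe release in use. Only the two keys that change are shown, and the rest of the example is assumed to stay as it is.

```yaml
# Sketch only: assumes Jinja templating with the secrets.yml variables in scope.
global:
  # Webhook for the channel that receives all alerts.
  slack_api_url: "{{ secrets_slack_notification_channel_url }}"

receivers:
  - name: 'slack-critical-notifications'
    slack_configs:
      - channel: '#notifications-critical'
        # Per-receiver override: webhook for the critical-only channel.
        api_url: "{{ secrets_slack_critical_notification_channel_url }}"
```

Referencing the variables this way keeps the webhook URLs out of the plain-text config, so only the encrypted secrets file holds them.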