From f55a8816e2f7ac8fef19c0de6165a3e5dbd481d3 Mon Sep 17 00:00:00 2001
From: Pierre Riteau
Date: Mon, 13 Dec 2021 18:04:58 +0100
Subject: [PATCH 1/3] Replace Monasca by EFK/Prometheus

---
 source/introduction.rst              |  12 --
 source/operations_and_monitoring.rst | 175 +++++++--------------------
 source/vars.rst                      |   2 +
 3 files changed, 43 insertions(+), 146 deletions(-)

diff --git a/source/introduction.rst b/source/introduction.rst
index 738f68c..8d40113 100644
--- a/source/introduction.rst
+++ b/source/introduction.rst
@@ -60,12 +60,6 @@ A command that must be run within the Bifrost service container, hosted on the s
 
 A command that can be run (as superuser) from a running compute instance.
 
-``monasca#``
-
-A command that must be run with OpenStack control plane admin credentials
-loaded, and the Monasca client and supporting modules available (whether in a
-virtualenv or installed in the OS libraries).
-
 Glossary of Terms
 -----------------
 
@@ -130,12 +124,6 @@ Glossary of Terms
     Multi-Chassis Link Aggregate - a method of providing multi-pathing and
     multi-switch redundancy in layer-2 networks.
 
-   Monasca
-     OpenStack’s monitoring service (“Monitoring as a Service at Scale”).
-     Logging, telemetry and events from the infrastructure, control plane and
-     user projects can be submitted and processed by Monasca.
-     https://docs.openstack.org/monasca-api/latest/
-
    Neutron
     OpenStack’s networking service.
     https://docs.openstack.org/neutron/latest/
diff --git a/source/operations_and_monitoring.rst b/source/operations_and_monitoring.rst
index c564cfa..9809e77 100644
--- a/source/operations_and_monitoring.rst
+++ b/source/operations_and_monitoring.rst
@@ -7,12 +7,12 @@ Operations and Monitoring
 Access to Kibana
 ================
 
-OpenStack control plane logs are aggregated from all servers by Monasca and
+OpenStack control plane logs are aggregated from all servers by Fluentd and
 stored in ElasticSearch.
 
 The control plane logs can be accessed from ElasticSearch using Kibana, which
 is available at the following URL: |kibana_url|
 
-To login, use the ``kibana`` user. The password is auto-generated by
+To log in, use the ``kibana`` user. The password is auto-generated by
 Kolla-Ansible and can be extracted from the encrypted passwords file
 (|kolla_passwords|):
@@ -24,19 +24,32 @@ Kolla-Ansible and can be extracted from the encrypted passwords file
 Access to Grafana
 =================
 
-Monasca metrics can be visualised in Grafana dashboards. Monasca Grafana can be
+Control plane metrics can be visualised in Grafana dashboards. Grafana can be
 found at the following address: |grafana_url|
 
-Grafana uses Keystone authentication. To login, use valid OpenStack user
-credentials.
+To log in, use the |grafana_username| user. The password is auto-generated by
+Kolla-Ansible and can be extracted from the encrypted passwords file
+(|kolla_passwords|):
+
+.. code-block:: console
+   :substitutions:
+
+   kayobe# ansible-vault view ${KAYOBE_CONFIG_PATH}/kolla/passwords.yml --vault-password-file |vault_password_file_path| | grep ^grafana_admin_password
 
-To visualise control plane metrics, you will need one of the following roles in
-the ``monasca_control_plane`` project:
+Access to Prometheus Alertmanager
+=================================
+
+Control plane alerts can be visualised and managed in Alertmanager, which can
+be found at the following address: |alertmanager_url|
+
+To log in, use the ``admin`` user. The password is auto-generated by
+Kolla-Ansible and can be extracted from the encrypted passwords file
+(|kolla_passwords|):
+
+.. code-block:: console
+   :substitutions:
 
-* ``admin``
-* ``monasca-user``
-* ``monasca-read-only-user``
-* ``monasca-editor``
+   kayobe# ansible-vault view ${KAYOBE_CONFIG_PATH}/kolla/passwords.yml --vault-password-file |vault_password_file_path| | grep ^prometheus_alertmanager_password
 
 Migrating virtual machines
 ==========================
@@ -246,6 +259,7 @@ Monitoring
 
 * `Back up InfluxDB `__
 * `Back up ElasticSearch `__
+* `Back up Prometheus `__
 
 Seed
 ----
@@ -260,137 +274,30 @@ Ansible control host
 Control Plane Monitoring
 ========================
 
-Monasca has been configured to collect logs and metrics across the control
-plane. It provides a single point where control plane monitoring and telemetry
-data can be analysed and correlated.
-
-Metrics are collected per server via the `Monasca Agent
-`__. The Monasca Agent is deployed
-and configured by Kolla Ansible.
+The control plane has been configured to collect logs centrally using the EFK
+stack (Elasticsearch, Fluentd and Kibana).
 
-Logging to Monasca is done via a `Fluentd output plugin
-`__.
+Monitoring of the control plane is performed by Prometheus. Metrics are
+collected by Prometheus exporters, which are either running on all hosts
+(e.g. node exporter), on specific hosts (e.g. controllers for the memcached
+exporter or monitoring hosts for the OpenStack exporter). These exporters are
+scrapped by the Prometheus server.
 
-Configuring Monasca Alerts
---------------------------
+Configuring Prometheus Alerts
+-----------------------------
 
 Generating Metrics from Specific Log Messages
 +++++++++++++++++++++++++++++++++++++++++++++
 
-If you wish to generate alerts for specific log messages, you must first
-generate metrics from those log messages. Metrics are generated from the
-transformed logs queue in Kafka. The Monasca log metrics service reads log
-messages from this queue, transforms them into metrics and then writes them to
-the metrics queue.
-
-The rules which govern this transformation are defined in the logstash config
-file. This file can be configured via kayobe. To do this, edit
-``etc/kayobe/kolla/config/monasca/log-metrics.conf``, for example:
-
-.. code-block:: text
-
-   # Create events from specific log signatures
-   filter {
-     if "Another thread already created a resource provider" in [log][message] {
-       mutate {
-         add_field => { "[log][dimensions][event]" => "hat" }
-       }
-     } else if "My string here" in [log][message] {
-       mutate {
-         add_field => { "[log][dimensions][event]" => "my_new_alert" }
-       }
-     }
-
-Reconfigure Monasca:
-
-.. code-block:: text
-
-   kayobe# kayobe overcloud service reconfigure --kolla-tags monasca
-
-Verify that logstash doesn't complain about your modification. On each node
-running the ``monasca-log-metrics`` service, the logs can be inspected in the
-Kolla logs directory, under the ``logstash`` folder:
-``/var/log/kolla/logstash``.
-
-Metrics will now be generated from the configured log messages. To generate
-alerts/notifications from your new metric, follow the next section.
-
-Generating Monasca Alerts from Metrics
-++++++++++++++++++++++++++++++++++++++
-
-Firstly, we will configure alarms and notifications. This should be done via
-the Monasca client. More detailed documentation is available in the `Monasca
-API specification
-`__.
-This document provides an overview of common use-cases.
+This feature from the Monasca logging and alerting pipeline must be transposed
+to Fluentd and Prometheus.
-
-To create a Slack notification, first obtain the URL for the notification hook
-from Slack, and configure the notification as follows:
-
-.. code-block:: console
-
-   monasca# monasca notification-create stackhpc_slack SLACK https://hooks.slack.com/services/UUID
-
-You can view notifications at any time by invoking:
-
-.. code-block:: console
-
-   monasca# monasca notification-list
-
-To create an alarm with an associated notification:
-
-.. code-block:: console
-
-   monasca# monasca alarm-definition-create multiple_nova_compute \
-     '(count(log.event.multiple_nova_compute{}, deterministic)>0)' \
-     --description "Multiple nova compute instances detected" \
-     --severity HIGH --alarm-actions $NOTIFICATION_ID
-
-By default one alarm will be created for all hosts. This is typically useful
-when you are looking at the overall state of some hosts. For example in the
-screenshot below the ``db_mon_log_high_mem_usage`` alarm has previously
-triggered on a number of hosts, but is currently below threshold.
-
-If you wish to have an alarm created per host you can use the ``--match-by``
-option and specify the hostname dimension. For example:
-
-.. code-block:: console
-
-   monasca# monasca alarm-definition-create multiple_nova_compute \
-     '(count(log.event.multiple_nova_compute{}, deterministic)>0)' \
-     --description "Multiple nova compute instances detected" \
-     --severity HIGH --alarm-actions $NOTIFICATION_ID
-     --match-by hostname
-
-Creating an alarm per host can be useful when alerting on one off events such
-as log messages which need to be actioned individually. Once the issue has been
-investigated and fixed, the alarm can be deleted on a per host basis.
-
-For example, in the case of monitoring for file system corruption one might
-define a metric from the system logs alerting on XFS file system corruption, or
-ECC memory errors. These metrics may only be generated once, but it is
-important that they are not ignored. Therefore, in the example below, the last
-operator is used so that the alarm is evaluated against the last metric
-associated with the log message. Since for log metrics the value of this metric
-is always greater than 0, this alarm can only be reset by deleting it (which
-can be accomplished by clicking on the dustbin icon in Monasca Grafana). By
-ensuring that the alarm has to be manually deleted and will not reset to the OK
-status, important errors can be tracked.
-
-.. code-block:: console
-
-   monasca# monasca alarm-definition-create xfs_errors \
-     '(last(log.event.xfs_errors_detected{}, deterministic)>0)' \
-     --description "XFS errors detected on host" \
-     --severity HIGH --alarm-actions $NOTIFICATION_ID \
-     --match-by hostname
-
-It is also possible to update existing alarms. For example, to update, or add
-multiple notifications to an alarm:
-
-.. code-block:: console
+Generating Alerts from Metrics
+++++++++++++++++++++++++++++++
 
-   monasca# monasca alarm-definition-patch $ALARM_ID --alarm-actions $NOTIFICATION_ID --alarm-actions $NOTIFICATION_ID_2
+Alerts are defined in code and stored in Kayobe configuration. See ``*.rules``
+files in ``${KAYOBE_CONFIG_PATH}/kolla/config/prometheus`` as a model to add
+custom rules.
 
 Control Plane Shutdown Procedure
 ================================
diff --git a/source/vars.rst b/source/vars.rst
index f6faba6..a1ffb5d 100644
--- a/source/vars.rst
+++ b/source/vars.rst
@@ -1,3 +1,4 @@
+.. |alertmanager_url| replace:: https://openstack.acme.example:9093
 .. |base_path| replace:: ~/kayobe-env
 .. |control_host_access| replace:: |control_host| is used as the Ansible control host. Each operator uses their own account on this host, but with a shared SSH key stored as ``~/.ssh/id_rsa``.
 .. |control_host| replace:: acme-seed-hypervisor
@@ -8,6 +9,7 @@
 .. |flavor_name| replace:: m1.tiny
 .. |floating_ip_access| replace:: from acme-seed-hypervisor and the rest of the Acme network
 .. |grafana_url| replace:: https://openstack.acme.example:3000
+.. |grafana_username| replace:: ``grafana_local_admin``
 .. |horizon_access| replace:: via the Internet.
 .. |horizon_theme_clone_url| replace:: https://github.com/acme-openstack/horizon-theme.git
 .. |horizon_theme_name| replace:: acme

From 8f134c09c6555c59ee2f65b39ef51ddefd8bf7a6 Mon Sep 17 00:00:00 2001
From: Pierre Riteau
Date: Mon, 13 Dec 2021 18:14:40 +0100
Subject: [PATCH 2/3] Update Elasticsearch Curator information

---
 source/operations_and_monitoring.rst | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/source/operations_and_monitoring.rst b/source/operations_and_monitoring.rst
index 9809e77..f6b36e8 100644
--- a/source/operations_and_monitoring.rst
+++ b/source/operations_and_monitoring.rst
@@ -591,21 +591,26 @@ perform the following cleanup procedure regularly:
 Elasticsearch indexes retention
 ===============================
 
-To enable and alter default rotation values for Elasticsearch Curator edit ``${KAYOBE_CONFIG_PATH}/kolla/globals.yml`` - This applies both to Monasca and Central Logging configurations.
+
+To enable and alter default rotation values for Elasticsearch Curator, edit
+``${KAYOBE_CONFIG_PATH}/kolla/globals.yml``:
 
 .. code-block:: console
 
    # Allow Elasticsearch Curator to apply a retention policy to logs
    enable_elasticsearch_curator: true
 
+   # Duration after which index is closed
    elasticsearch_curator_soft_retention_period_days: 90
+   # Duration after which index is deleted
    elasticsearch_curator_hard_retention_period_days: 180
 
-Reconfigure elasticsearch with new values:
+Reconfigure Elasticsearch with new values:
 
 .. code-block:: console
 
-   kayobe overcloud service reconfigure --kolla-tags elasticsearch --kolla-skip-tags common --skip-precheck
+   kayobe overcloud service reconfigure --kolla-tags elasticsearch
 
-For more information see `upstream documentation `__
+For more information see the `upstream documentation
+`__.
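[Editorial aside, not part of the patch series: once Curator is enabled as above, one quick sanity check is to list the log indices through the Elasticsearch cat API and confirm that indices older than the soft retention period show up as closed. The host name, port 9200 and the ``flog-*`` index naming are assumptions here; adjust them for the actual deployment.]

```console
kayobe# curl -s "http://openstack.acme.example:9200/_cat/indices/flog-*?v&h=index,status"
```

Indices past the soft retention period should report a ``close`` status, and indices past the hard retention period should no longer appear in the list.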
From 77b01ee4b82b751941cd97078fda84cb3c329efd Mon Sep 17 00:00:00 2001
From: Pierre Riteau
Date: Wed, 15 Dec 2021 22:49:35 +0100
Subject: [PATCH 3/3] Update following Stig's comments

---
 source/operations_and_monitoring.rst | 19 +++++--------------
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/source/operations_and_monitoring.rst b/source/operations_and_monitoring.rst
index f6b36e8..ee48607 100644
--- a/source/operations_and_monitoring.rst
+++ b/source/operations_and_monitoring.rst
@@ -277,24 +277,15 @@ Control Plane Monitoring
 The control plane has been configured to collect logs centrally using the EFK
 stack (Elasticsearch, Fluentd and Kibana).
 
-Monitoring of the control plane is performed by Prometheus. Metrics are
-collected by Prometheus exporters, which are either running on all hosts
-(e.g. node exporter), on specific hosts (e.g. controllers for the memcached
-exporter or monitoring hosts for the OpenStack exporter). These exporters are
-scrapped by the Prometheus server.
+Telemetry monitoring of the control plane is performed by Prometheus. Metrics
+are collected by Prometheus exporters, which either run on all hosts
+(e.g. node exporter) or on specific hosts (e.g. controllers for the memcached
+exporter, or monitoring hosts for the OpenStack exporter). These exporters are
+scraped by the Prometheus server.
 
 Configuring Prometheus Alerts
 -----------------------------
 
-Generating Metrics from Specific Log Messages
-+++++++++++++++++++++++++++++++++++++++++++++
-
-This feature from the Monasca logging and alerting pipeline must be transposed
-to Fluentd and Prometheus.
-
-Generating Alerts from Metrics
-++++++++++++++++++++++++++++++
-
 Alerts are defined in code and stored in Kayobe configuration. See ``*.rules``
 files in ``${KAYOBE_CONFIG_PATH}/kolla/config/prometheus`` as a model to add
 custom rules.
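[Editorial aside, not part of the patch series: the final text tells operators to use the existing ``*.rules`` files as a model. A minimal sketch of such a file, in the standard Prometheus rule-file format, is shown below; the alert name, expression, threshold and labels are all hypothetical and should be adapted to the deployment.]

```yaml
# Hypothetical custom rule file, e.g.
# ${KAYOBE_CONFIG_PATH}/kolla/config/prometheus/custom.rules
groups:
  - name: custom-node-alerts
    rules:
      # Fires when average non-idle CPU time on a host exceeds 90%
      # over the last 5 minutes, sustained for 15 minutes.
      - alert: NodeHighCpuUsage
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
```

Once such a file is in place, a reconfigure of the Prometheus service (via the usual ``kayobe overcloud service reconfigure`` workflow) deploys the rules, and the resulting alerts become visible in Alertmanager.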