Merged
12 changes: 0 additions & 12 deletions source/introduction.rst
@@ -60,12 +60,6 @@ A command that must be run within the Bifrost service container, hosted on the s

A command that can be run (as superuser) from a running compute instance.

``monasca#``

A command that must be run with OpenStack control plane admin credentials
loaded, and the Monasca client and supporting modules available (whether in a
virtualenv or installed in the OS libraries).

Glossary of Terms
-----------------

@@ -130,12 +124,6 @@ Glossary of Terms
Multi-Chassis Link Aggregate - a method of providing multi-pathing and
multi-switch redundancy in layer-2 networks.

Monasca
OpenStack’s monitoring service (“Monitoring as a Service at Scale”).
Logging, telemetry and events from the infrastructure, control plane and
user projects can be submitted and processed by Monasca.
https://docs.openstack.org/monasca-api/latest/

Neutron
OpenStack’s networking service.
https://docs.openstack.org/neutron/latest/
189 changes: 46 additions & 143 deletions source/operations_and_monitoring.rst
@@ -7,12 +7,12 @@ Operations and Monitoring
Access to Kibana
================

OpenStack control plane logs are aggregated from all servers by Monasca and
OpenStack control plane logs are aggregated from all servers by Fluentd and
stored in ElasticSearch. The control plane logs can be accessed from
ElasticSearch using Kibana, which is available at the following URL:
|kibana_url|

To login, use the ``kibana`` user. The password is auto-generated by
To log in, use the ``kibana`` user. The password is auto-generated by
Kolla-Ansible and can be extracted from the encrypted passwords file
(|kolla_passwords|):

@@ -24,19 +24,32 @@ Kolla-Ansible and can be extracted from the encrypted passwords file
Access to Grafana
=================

Monasca metrics can be visualised in Grafana dashboards. Monasca Grafana can be
Control plane metrics can be visualised in Grafana dashboards. Grafana can be
found at the following address: |grafana_url|

Grafana uses Keystone authentication. To login, use valid OpenStack user
credentials.
To log in, use the |grafana_username| user. The password is auto-generated by
Kolla-Ansible and can be extracted from the encrypted passwords file
(|kolla_passwords|):

.. code-block:: console
:substitutions:

kayobe# ansible-vault view ${KAYOBE_CONFIG_PATH}/kolla/passwords.yml --vault-password-file |vault_password_file_path| | grep ^grafana_admin_password

Access to Prometheus Alertmanager
=================================

To visualise control plane metrics, you will need one of the following roles in
the ``monasca_control_plane`` project:
Control plane alerts can be visualised and managed in Alertmanager, which can
be found at the following address: |alertmanager_url|

* ``admin``
* ``monasca-user``
* ``monasca-read-only-user``
* ``monasca-editor``
To log in, use the ``admin`` user. The password is auto-generated by
Kolla-Ansible and can be extracted from the encrypted passwords file
(|kolla_passwords|):

.. code-block:: console
:substitutions:

kayobe# ansible-vault view ${KAYOBE_CONFIG_PATH}/kolla/passwords.yml --vault-password-file |vault_password_file_path| | grep ^prometheus_alertmanager_password

Migrating virtual machines
==========================
@@ -246,6 +259,7 @@ Monitoring

* `Back up InfluxDB <https://docs.influxdata.com/influxdb/v1.8/administration/backup_and_restore/>`__
* `Back up ElasticSearch <https://www.elastic.co/guide/en/elasticsearch/reference/current/backup-cluster-data.html>`__
* `Back up Prometheus <https://prometheus.io/docs/prometheus/latest/querying/api/#snapshot>`__

Seed
----
@@ -260,137 +274,21 @@ Ansible control host
Control Plane Monitoring
========================

Monasca has been configured to collect logs and metrics across the control
plane. It provides a single point where control plane monitoring and telemetry
data can be analysed and correlated.

Metrics are collected per server via the `Monasca Agent
<https://opendev.org/openstack/monasca-agent>`__. The Monasca Agent is deployed
and configured by Kolla Ansible.

Logging to Monasca is done via a `Fluentd output plugin
<https://github.com/monasca/fluentd-monasca>`__.

Configuring Monasca Alerts
--------------------------

Generating Metrics from Specific Log Messages
+++++++++++++++++++++++++++++++++++++++++++++

If you wish to generate alerts for specific log messages, you must first
generate metrics from those log messages. Metrics are generated from the
transformed logs queue in Kafka. The Monasca log metrics service reads log
messages from this queue, transforms them into metrics and then writes them to
the metrics queue.

The rules which govern this transformation are defined in the logstash config
file. This file can be configured via kayobe. To do this, edit
``etc/kayobe/kolla/config/monasca/log-metrics.conf``, for example:

.. code-block:: text

# Create events from specific log signatures
filter {
if "Another thread already created a resource provider" in [log][message] {
mutate {
add_field => { "[log][dimensions][event]" => "hat" }
}
} else if "My string here" in [log][message] {
mutate {
add_field => { "[log][dimensions][event]" => "my_new_alert" }
}
}

Reconfigure Monasca:

.. code-block:: text

kayobe# kayobe overcloud service reconfigure --kolla-tags monasca

Verify that logstash doesn't complain about your modification. On each node
running the ``monasca-log-metrics`` service, the logs can be inspected in the
Kolla logs directory, under the ``logstash`` folder:
``/var/log/kolla/logstash``.

Metrics will now be generated from the configured log messages. To generate
alerts/notifications from your new metric, follow the next section.

Generating Monasca Alerts from Metrics
++++++++++++++++++++++++++++++++++++++
The control plane has been configured to collect logs centrally using the EFK
stack (Elasticsearch, Fluentd and Kibana).

Firstly, we will configure alarms and notifications. This should be done via
the Monasca client. More detailed documentation is available in the `Monasca
API specification
<https://github.com/openstack/monasca-api/blob/master/docs/monasca-api-spec.md#alarm-definitions-and-alarms>`__.
This document provides an overview of common use-cases.
Telemetry monitoring of the control plane is performed by Prometheus. Metrics
are collected by Prometheus exporters, which run either on all hosts
(e.g. the node exporter) or on specific hosts (e.g. controllers for the
memcached exporter, or the monitoring hosts for the OpenStack exporter). These
exporters are scraped by the Prometheus server.
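
As an illustration, a scrape job for the node exporter looks roughly like the
following. This is a minimal sketch only: the real configuration is generated
by Kolla-Ansible, and the target host names here are invented for the example.

.. code-block:: yaml

   scrape_configs:
     # Pull node-level metrics (CPU, memory, disk) from the node exporter
     # on each control plane host. Port 9100 is the node exporter default.
     - job_name: node
       scrape_interval: 60s
       static_configs:
         - targets:
             - ctl0.acme.example:9100
             - ctl1.acme.example:9100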

To create a Slack notification, first obtain the URL for the notification hook
from Slack, and configure the notification as follows:
Configuring Prometheus Alerts
-----------------------------

.. code-block:: console

monasca# monasca notification-create stackhpc_slack SLACK https://hooks.slack.com/services/UUID

You can view notifications at any time by invoking:

.. code-block:: console

monasca# monasca notification-list

To create an alarm with an associated notification:

.. code-block:: console

monasca# monasca alarm-definition-create multiple_nova_compute \
'(count(log.event.multiple_nova_compute{}, deterministic)>0)' \
--description "Multiple nova compute instances detected" \
--severity HIGH --alarm-actions $NOTIFICATION_ID

By default one alarm will be created for all hosts. This is typically useful
when you are looking at the overall state of some hosts. For example in the
screenshot below the ``db_mon_log_high_mem_usage`` alarm has previously
triggered on a number of hosts, but is currently below threshold.

If you wish to have an alarm created per host you can use the ``--match-by``
option and specify the hostname dimension. For example:

.. code-block:: console

monasca# monasca alarm-definition-create multiple_nova_compute \
'(count(log.event.multiple_nova_compute{}, deterministic)>0)' \
--description "Multiple nova compute instances detected" \
--severity HIGH --alarm-actions $NOTIFICATION_ID
--match-by hostname

Creating an alarm per host can be useful when alerting on one off events such
as log messages which need to be actioned individually. Once the issue has been
investigated and fixed, the alarm can be deleted on a per host basis.

For example, in the case of monitoring for file system corruption one might
define a metric from the system logs alerting on XFS file system corruption, or
ECC memory errors. These metrics may only be generated once, but it is
important that they are not ignored. Therefore, in the example below, the last
operator is used so that the alarm is evaluated against the last metric
associated with the log message. Since for log metrics the value of this metric
is always greater than 0, this alarm can only be reset by deleting it (which
can be accomplished by clicking on the dustbin icon in Monasca Grafana). By
ensuring that the alarm has to be manually deleted and will not reset to the OK
status, important errors can be tracked.

.. code-block:: console

monasca# monasca alarm-definition-create xfs_errors \
'(last(log.event.xfs_errors_detected{}, deterministic)>0)' \
--description "XFS errors detected on host" \
--severity HIGH --alarm-actions $NOTIFICATION_ID \
--match-by hostname

It is also possible to update existing alarms. For example, to update, or add
multiple notifications to an alarm:

.. code-block:: console

monasca# monasca alarm-definition-patch $ALARM_ID --alarm-actions $NOTIFICATION_ID --alarm-actions $NOTIFICATION_ID_2
Alerts are defined in code and stored in Kayobe configuration. Use the
``*.rules`` files in ``${KAYOBE_CONFIG_PATH}/kolla/config/prometheus`` as a
model for adding custom rules.
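
As an illustration, a custom alerting rule might look like the following
sketch. The alert name, threshold and duration are invented for this example;
adapt them to the deployment.

.. code-block:: yaml

   groups:
     - name: custom.rules
       rules:
         # Fire when a host's root file system has been over 90% full
         # for 15 minutes, based on node exporter metrics.
         - alert: RootFilesystemFull
           expr: |
             (node_filesystem_avail_bytes{mountpoint="/"}
              / node_filesystem_size_bytes{mountpoint="/"}) < 0.1
           for: 15m
           labels:
             severity: warning
           annotations:
             summary: "Root file system nearly full on {{ $labels.instance }}"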

Control Plane Shutdown Procedure
================================
@@ -684,21 +582,26 @@ perform the following cleanup procedure regularly:

Elasticsearch indexes retention
===============================
To enable and alter default rotation values for Elasticsearch Curator edit ``${KAYOBE_CONFIG_PATH}/kolla/globals.yml`` - This applies both to Monasca and Central Logging configurations.

To enable and alter default rotation values for Elasticsearch Curator, edit
``${KAYOBE_CONFIG_PATH}/kolla/globals.yml``:

.. code-block:: console

# Allow Elasticsearch Curator to apply a retention policy to logs
enable_elasticsearch_curator: true

# Duration after which index is closed
elasticsearch_curator_soft_retention_period_days: 90

# Duration after which index is deleted
elasticsearch_curator_hard_retention_period_days: 180

Reconfigure elasticsearch with new values:
Reconfigure Elasticsearch with new values:

.. code-block:: console

kayobe overcloud service reconfigure --kolla-tags elasticsearch --kolla-skip-tags common --skip-precheck
kayobe overcloud service reconfigure --kolla-tags elasticsearch

For more information see `upstream documentation <https://docs.openstack.org/kolla-ansible/ussuri/reference/logging-and-monitoring/central-logging-guide.html#curator>`__
For more information see the `upstream documentation
<https://docs.openstack.org/kolla-ansible/latest/reference/logging-and-monitoring/central-logging-guide.html#curator>`__.
2 changes: 2 additions & 0 deletions source/vars.rst
@@ -1,3 +1,4 @@
.. |alertmanager_url| replace:: https://openstack.acme.example:9093
.. |base_path| replace:: ~/kayobe-env
.. |control_host_access| replace:: |control_host| is used as the Ansible control host. Each operator uses their own account on this host, but with a shared SSH key stored as ``~/.ssh/id_rsa``.
.. |control_host| replace:: acme-seed-hypervisor
@@ -8,6 +9,7 @@
.. |flavor_name| replace:: m1.tiny
.. |floating_ip_access| replace:: from acme-seed-hypervisor and the rest of the Acme network
.. |grafana_url| replace:: https://openstack.acme.example:3000
.. |grafana_username| replace:: ``grafana_local_admin``
.. |horizon_access| replace:: via the Internet.
.. |horizon_theme_clone_url| replace:: https://github.com/acme-openstack/horizon-theme.git
.. |horizon_theme_name| replace:: acme