Doc from openstack admin guide #1083

Open
wants to merge 21 commits into base: stackhpc/2023.1
2 changes: 2 additions & 0 deletions doc/source/configuration/cephadm.rst
@@ -1,3 +1,5 @@
.. _cephadm-kayobe:

================
Cephadm & Kayobe
================
2 changes: 2 additions & 0 deletions doc/source/configuration/release-train.rst
@@ -1,3 +1,5 @@
.. _stackhpc-release-train:

======================
StackHPC Release Train
======================
35 changes: 34 additions & 1 deletion doc/source/configuration/wazuh.rst
@@ -2,13 +2,20 @@
Wazuh
=====

`Wazuh <https://wazuh.com>`_ is a security monitoring platform.
It monitors for:

* Security-related system events.
* Known vulnerabilities (CVEs) in versions of installed software.
* Misconfigurations in system security.

The short version
=================

#. Create an infrastructure VM for the Wazuh manager, and add it to the ``wazuh-manager`` group.
#. Configure the infrastructure VM with Kayobe: ``kayobe infra vm host configure``
#. Edit your config under
``etc/kayobe/inventory/group_vars/wazuh-manager/wazuh-manager``, in
``$KAYOBE_CONFIG_PATH/inventory/group_vars/wazuh-manager/wazuh-manager``, in
particular the defaults assume that the ``provision_oc_net`` network will be
used.
#. Generate secrets: ``kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/wazuh-secrets.yml``
@@ -234,9 +241,12 @@ You may need to modify some of the variables, including:
- etc/kayobe/wazuh-manager.yml
- etc/kayobe/inventory/group_vars/wazuh/wazuh-agent/wazuh-agent

You'll need to run the ``wazuh-manager.yml`` playbook again to apply any customisations.
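
For example, assuming the playbook sits alongside the other Wazuh playbooks in
``$KAYOBE_CONFIG_PATH/ansible``:

.. code-block:: console

   kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/wazuh-manager.yml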

Secrets
-------

Wazuh requires that secrets or passwords are set for itself and the services with which it communicates.
The Wazuh secrets playbook is located at ``etc/kayobe/ansible/wazuh-secrets.yml``.
Running this playbook will generate the relevant security items and store them in
a secrets vault file at ``$KAYOBE_CONFIG_PATH/wazuh-secrets.yml``.
@@ -252,6 +262,10 @@ It will be used by wazuh secrets playbook to generate wazuh secrets vault file.
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/wazuh-secrets.yml
ansible-vault encrypt --vault-password-file ~/vault.pass $KAYOBE_CONFIG_PATH/wazuh-secrets.yml

.. note:: Use ``ansible-vault`` to view the secrets:

   ``ansible-vault view --vault-password-file ~/vault.pass $KAYOBE_CONFIG_PATH/wazuh-secrets.yml``

Configure Wazuh Dashboard's Server Host
---------------------------------------

@@ -392,6 +406,25 @@ Deploy the Wazuh agents:

``kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/wazuh-agent.yml``

The Wazuh Agent is deployed to all hosts in the ``wazuh-agent``
inventory group, comprising the ``seed`` group
plus the ``overcloud`` group (containing all hosts in the
OpenStack control plane).

.. code-block:: ini

[wazuh-agent:children]
seed
overcloud

The hosts running Wazuh Agent should automatically be registered
and visible within the Wazuh Manager dashboard.

.. note:: It is good practice to use a `Kayobe deploy hook
<https://docs.openstack.org/kayobe/latest/custom-ansible-playbooks.html#hooks>`_
to automate deployment and configuration of the Wazuh Agent
following a run of ``kayobe overcloud host configure``.
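
A minimal sketch of such a hook, assuming the standard Kayobe hooks directory
layout under ``$KAYOBE_CONFIG_PATH/hooks`` (the ``50-`` ordering prefix is
illustrative):

.. code-block:: console

   mkdir -p $KAYOBE_CONFIG_PATH/hooks/overcloud-host-configure/post.d
   cd $KAYOBE_CONFIG_PATH/hooks/overcloud-host-configure/post.d
   ln -s ../../../ansible/wazuh-agent.yml 50-wazuh-agent.yml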

Verification
------------

279 changes: 279 additions & 0 deletions doc/source/operations/ceph-management.rst
@@ -0,0 +1,279 @@
===========================
Managing and Operating Ceph
===========================

Working with Cephadm
====================

This documentation provides a guide for Ceph operations. For deploying Ceph,
please refer to the :ref:`cephadm-kayobe` documentation.

Cephadm configuration location
------------------------------

In the kayobe-config repository, Cephadm configuration is found in
``etc/kayobe/cephadm.yml`` (or, when using multiple Kayobe environments, in an
environment-specific file such as
``etc/kayobe/environments/<environment name>/cephadm.yml``).

StackHPC's cephadm Ansible collection relies on multiple inventory groups:

- ``mons``
- ``mgrs``
- ``osds``
- ``rgws`` (optional)

Those groups are usually defined in ``etc/kayobe/inventory/groups``.
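
A minimal sketch of these group definitions (the child groups shown here are
illustrative; adjust them to match your own inventory):

.. code-block:: ini

   [mons:children]
   controllers

   [mgrs:children]
   controllers

   [osds:children]
   storage

   [rgws:children]
   controllers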

Running Cephadm playbooks
-------------------------

In the kayobe-config repository, under ``etc/kayobe/ansible``, there is a set of
Cephadm-based playbooks utilising the ``stackhpc.cephadm`` Ansible Galaxy collection.

- ``cephadm.yml`` - runs the end-to-end process, starting with deployment and
  defining EC profiles, CRUSH rules, pools and users
- ``cephadm-crush-rules.yml`` - defines Ceph CRUSH rules
- ``cephadm-deploy.yml`` - runs the bootstrap/deploy playbook without the
  additional playbooks
- ``cephadm-ec-profiles.yml`` - defines Ceph EC profiles
- ``cephadm-gather-keys.yml`` - gathers Ceph configuration and keys and
  populates kayobe-config
- ``cephadm-keys.yml`` - defines Ceph users/keys
- ``cephadm-pools.yml`` - defines Ceph pools
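
For example, to run the full end-to-end workflow:

.. code-block:: console

   kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/cephadm.yml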

Reviewer (Contributor) on the playbook list above: "The order here has changed a bit since this was written, and there are some new playbooks that run custom commands. The list is here: https://github.com/stackhpc/stackhpc-kayobe-config/blob/stackhpc/2023.1/etc/kayobe/ansible/cephadm.yml"

Running Ceph commands
---------------------

Reviewer (Contributor): "I think we should get someone with more Ceph experience to review the rest of this file to check it for accuracy."

Ceph commands are usually run inside a ``cephadm shell`` utility container:

.. code-block:: console

# From storage host
sudo cephadm shell

Operating a cluster requires a keyring with admin access to be available for Ceph
commands. Cephadm will copy such a keyring to the nodes carrying the
`_admin <https://docs.ceph.com/en/latest/cephadm/host-management/#special-host-labels>`__
label - present on MON servers by default when using the
`StackHPC Cephadm collection <https://github.com/stackhpc/ansible-collection-cephadm>`__.
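
For example, to check the overall cluster status from a host carrying the admin
keyring:

.. code-block:: console

   # From storage host
   sudo cephadm shell -- ceph -s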

Adding a new storage node
-------------------------

Add the node to the respective group (e.g. ``osds``) and run the
``cephadm-deploy.yml`` playbook.

.. note::
   To add node types other than OSDs (mons, mgrs, etc.) you need to specify
   ``-e cephadm_bootstrap=True`` when running the playbook.
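
For example, after adding a new host to the ``osds`` group:

.. code-block:: console

   kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/cephadm-deploy.yml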

Removing a storage node
-----------------------

First drain the node:

.. code-block:: console

# From storage host
sudo cephadm shell
ceph orch host drain <host>

Once all daemons are removed, you can remove the host:

.. code-block:: console

# From storage host
sudo cephadm shell
ceph orch host rm <host>

Then remove the host from the inventory (usually in
``etc/kayobe/inventory/overcloud``).

Additional options/commands may be found in
`Host management <https://docs.ceph.com/en/latest/cephadm/host-management/>`_.

Replacing a failing drive
-------------------------

A failing drive in a Ceph cluster will cause the OSD daemon to crash.
In this case, Ceph will go into a ``HEALTH_WARN`` state.
Ceph can report details about failed OSDs by running:

.. code-block:: console

# From storage host
sudo cephadm shell
ceph health detail

.. note::

Remember to run ceph/rbd commands from within ``cephadm shell``
(preferred method) or after installing Ceph client. Details in the
official `documentation <https://docs.ceph.com/en/latest/cephadm/install/#enable-ceph-cli>`__.
It is also required that the host where commands are executed has admin
Ceph keyring present - easiest to achieve by applying
`_admin <https://docs.ceph.com/en/latest/cephadm/host-management/#special-host-labels>`__
label (Ceph MON servers have it by default when using
`StackHPC Cephadm collection <https://github.com/stackhpc/ansible-collection-cephadm>`__).

A failed OSD will also be reported as down by running:

.. code-block:: console

ceph osd tree

Note the ID of the failed OSD.

The failed disk is usually logged by the Linux kernel too:

.. code-block:: console

# From storage host
dmesg -T

Cross-reference the hardware device and OSD ID to ensure they match.
(Using ``pvs`` and ``lvs`` may help make this connection.)

See upstream documentation:
https://docs.ceph.com/en/latest/cephadm/services/osd/#replacing-an-osd

In the case where a disk holding the DB and/or WAL fails, it is necessary to
recreate all OSDs associated with this disk - usually an NVMe drive. The
following command is sufficient to identify which OSDs are tied to which
physical disks:

.. code-block:: console

ceph device ls

Once the OSDs on the failed disks have been identified, follow the procedure below.

If rebooting a Ceph node, first set ``noout`` to prevent excess data
movement:

.. code-block:: console

# From storage host
sudo cephadm shell
ceph osd set noout

Reboot the node and replace the drive.

Unset ``noout`` after the node is back online:

.. code-block:: console

# From storage host
sudo cephadm shell
ceph osd unset noout

Remove the OSD using the Ceph orchestrator command:

.. code-block:: console

# From storage host
sudo cephadm shell
ceph orch osd rm <ID> --replace

After removing OSDs, if the drives the OSDs were deployed on once again become
available, Cephadm may automatically try to deploy more OSDs on these drives if
they match an existing drivegroup spec.
If this is not the desired behaviour, it is best to modify the drivegroup
spec beforehand (the ``cephadm_osd_spec`` variable in ``etc/kayobe/cephadm.yml``).
Either set ``unmanaged: true`` to stop Cephadm from picking up new disks, or
modify the spec so that it no longer matches the drives you want to remove.
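
A minimal sketch of disabling automatic OSD deployment in this way (the
``service_id``, placement and device filters below are illustrative; the exact
spec depends on your deployment):

.. code-block:: yaml

   # etc/kayobe/cephadm.yml
   cephadm_osd_spec:
     service_type: osd
     service_id: osd_spec_default
     placement:
       host_pattern: "*"
     spec:
       data_devices:
         all: true
     # Stop Cephadm from automatically deploying OSDs on matching drives.
     unmanaged: true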

Host maintenance
----------------

https://docs.ceph.com/en/latest/cephadm/host-management/#maintenance-mode
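
For example, the upstream commands to place a host into maintenance mode and
bring it back out again (run from within a ``cephadm shell``):

.. code-block:: console

   ceph orch host maintenance enter <host>
   ceph orch host maintenance exit <host>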

Upgrading
---------

https://docs.ceph.com/en/latest/cephadm/upgrade/
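
For example, a minimal upstream-style upgrade invocation, run from within a
``cephadm shell`` (the image reference is illustrative):

.. code-block:: console

   ceph orch upgrade start --image <registry>/ceph/ceph:<version>
   ceph orch upgrade status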


Troubleshooting
===============

Inspecting a Ceph Block Device for a VM
---------------------------------------

To find out what block devices are attached to a VM, go to the hypervisor that
it is running on (an admin-level user can see this from ``openstack server
show``).
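
For example, as an admin-level user (``OS-EXT-SRV-ATTR:hypervisor_hostname`` is
the standard extended server attribute holding this information):

.. code-block:: console

   openstack server show <server uuid> -c OS-EXT-SRV-ATTR:hypervisor_hostname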

On this hypervisor, enter the libvirt container:

.. code-block:: console

# From hypervisor host
docker exec -it nova_libvirt /bin/bash

Find the VM name using libvirt:

.. code-block:: console

(nova-libvirt)[root@compute-01 /]# virsh list
Id Name State
------------------------------------
1 instance-00000001 running

Now inspect the properties of the VM using ``virsh dumpxml``:

.. code-block:: console

(nova-libvirt)[root@compute-01 /]# virsh dumpxml instance-00000001 | grep rbd
<source protocol='rbd' name='<nova rbd pool>/51206278-e797-4153-b720-8255381228da_disk'>

On a Ceph node, the RBD pool can be inspected and the volume extracted as a RAW
block image:

.. code-block:: console

# From storage host
sudo cephadm shell
rbd ls <nova rbd pool>
rbd export <nova rbd pool>/51206278-e797-4153-b720-8255381228da_disk blob.raw

The raw block device (``blob.raw`` above) can be mounted using a loopback device.
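
A minimal sketch of doing this on the storage host, assuming the exported image
contains a partition table:

.. code-block:: console

   # From storage host
   sudo losetup -fP --show blob.raw      # prints the loop device, e.g. /dev/loop0
   lsblk /dev/loop0                      # list the partitions, e.g. /dev/loop0p1
   sudo mount /dev/loop0p1 /mnt
   # ... inspect the filesystem under /mnt ...
   sudo umount /mnt
   sudo losetup -d /dev/loop0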

Inspecting a QCOW Image using LibGuestFS
----------------------------------------

The virtual machine's root image can be inspected by installing
``libguestfs-tools`` and using the ``guestfish`` command:

.. code-block:: console

# From storage host
export LIBGUESTFS_BACKEND=direct
guestfish -a blob.qcow
><fs> run
100% [XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX] 00:00
><fs> list-filesystems
/dev/sda1: ext4
><fs> mount /dev/sda1 /
><fs> ls /
bin
boot
dev
etc
home
lib
lib64
lost+found
media
mnt
opt
proc
root
run
sbin
srv
sys
tmp
usr
var
><fs> quit