32 commits
89501ff
Add cephadm_commands placeholder variable
jovial Jan 9, 2023
50cf847
Update magnum_tag
mnasiadka Jan 11, 2023
00ce3fb
Update magnum_tag
mnasiadka Jan 11, 2023
471c865
Add releasenote
m-bull Jan 10, 2023
b474578
Add releasenote
m-bull Jan 10, 2023
280b215
Update doc/source/configuration/cephadm.rst
jovial Jan 11, 2023
a7dffd4
Merge pull request #331 from stackhpc/wallaby_magnum_bump
mnasiadka Jan 11, 2023
deb6652
Merge pull request #330 from stackhpc/xena_magnum_bump
mnasiadka Jan 11, 2023
4ce07df
Update example
jovial Jan 11, 2023
c9376b6
fix pep8 syntax check
g0rgamesh Jan 11, 2023
5ec350b
Merge pull request #324 from stackhpc/docs/xena/cephadm-commands
markgoddard Jan 12, 2023
47c2fc5
RabbitMQ: fix HA rollout docs
markgoddard Jan 11, 2023
a5aee95
docs: RabbitMQ HA: add known issues section
markgoddard Jan 12, 2023
29cab8a
Merge pull request #334 from stackhpc/xena-wallaby-merge
markgoddard Jan 12, 2023
41490de
Merge pull request #336 from stackhpc/xena-rmq-ha-docs
markgoddard Jan 16, 2023
86c4cd5
Use Rocky 8.7 repositories
m-bull Jan 17, 2023
9dd9d93
Add releasenote
m-bull Jan 17, 2023
72e2f72
Patch edk2-ovmf file to fix UEFI support
priteau Jan 18, 2023
f2541d0
Merge pull request #339 from stackhpc/edk2-ovmf-uefi
priteau Jan 18, 2023
4eaa416
Deploy Nova images with edk2-ovmf UEFI fix
priteau Jan 19, 2023
b360311
Merge pull request #338 from stackhpc/rocky-8-7
m-bull Jan 19, 2023
08920c5
CI: Workaround container image builder power cycle issue
markgoddard Jan 19, 2023
251b4c5
Merge pull request #340 from stackhpc/wallaby-builder-workaround
markgoddard Jan 19, 2023
12bdfe5
Fix small nits with existing release notes
priteau Jan 19, 2023
2e3aad7
Add release note for UEFI instance launch issues
priteau Jan 19, 2023
dc7c0e2
Small fix to rabbit docs
MoteHue Jan 19, 2023
7e6b2d3
Merge pull request #344 from stackhpc/xena-rabbit-docs
markgoddard Jan 19, 2023
9cc0916
Merge pull request #341 from stackhpc/edk2-ovmf-uefi
priteau Jan 19, 2023
800d5ac
Merge branch 'stackhpc/wallaby' into xena-wallaby-merge
priteau Jan 20, 2023
7189ed2
Bump nova tag for Xena
priteau Jan 20, 2023
b81a95f
Merge pull request #345 from stackhpc/xena-wallaby-merge
priteau Jan 25, 2023
fa41237
Merge stackhpc/xena into stackhpc/yoga
priteau Jan 25, 2023
8 changes: 8 additions & 0 deletions .github/workflows/stackhpc-container-image-build.yml
@@ -67,6 +67,14 @@ jobs:
sudo ip l set dummy1 up
sudo ip l set dummy1 master breth1

# FIXME: Without this workaround we see the following issue after the runner is power cycled:
# TASK [MichaelRigart.interfaces : RedHat | ensure network service is started and enabled] ***
# Unable to start service network: Job for network.service failed because the control process exited with error code.
# See \"systemctl status network.service\" and \"journalctl -xe\" for details.
- name: Kill dhclient (workaround)
run: |
(sudo killall dhclient || true) && sudo systemctl restart network

- name: Install Kayobe
run: |
mkdir -p venvs &&
16 changes: 16 additions & 0 deletions doc/source/configuration/cephadm.rst
@@ -239,6 +239,22 @@ for Cinder, Cinder backup, Glance, and Nova in Kolla Ansible.
mgr: "profile rbd pool=images"
state: present

Ceph Commands
~~~~~~~~~~~~~

It is possible to run an arbitrary list of commands against the cluster after deployment
by setting the ``cephadm_commands`` variable. ``cephadm_commands`` should be a list of commands
to pass to ``cephadm shell -- ceph``. For example:

.. code:: yaml

# A list of commands to pass to cephadm shell -- ceph. See stackhpc.cephadm.commands
# for format.
cephadm_commands:
# Configure Prometheus exporter to listen on a specific interface. The default
# is to listen on all interfaces.
- "config set mgr mgr/prometheus/server_addr 10.0.0.1"
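As a rough illustration of the description above, the following Python sketch shows how each entry of ``cephadm_commands`` could map onto a ``cephadm shell -- ceph`` invocation. The helper name ``expand_cephadm_commands`` is hypothetical, and the way the ``stackhpc.cephadm.commands`` role actually runs the commands is an assumption; only the ``cephadm shell -- ceph`` prefix and the example command come from the documentation.

```python
import shlex


def expand_cephadm_commands(commands):
    """Return the argument vector for each entry in cephadm_commands.

    Hypothetical helper: it only mirrors the documented behaviour of
    appending each entry to ``cephadm shell -- ceph``.
    """
    prefix = ["cephadm", "shell", "--", "ceph"]
    return [prefix + shlex.split(command) for command in commands]


# Example value taken from the documentation above.
cephadm_commands = ["config set mgr mgr/prometheus/server_addr 10.0.0.1"]

for argv in expand_cephadm_commands(cephadm_commands):
    print(" ".join(argv))
```

Each list entry therefore omits the leading ``ceph``; only the subcommand and its arguments are given.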

Deployment
==========

98 changes: 82 additions & 16 deletions doc/source/operations/rabbitmq.rst
@@ -39,6 +39,29 @@ state of RabbitMQ will also be reset.
Instructions
------------
If you are planning to perform an upgrade, it is recommended to first roll out these changes.

The configuration should be merged with StackHPC Kayobe configuration. If
bringing in the latest changes is not possible for some reason, you may cherry
pick the following changes:

RabbitMQ hammer playbook (all releases):

* ``3933e4520ba512b5bf095a28b791c0bac12c5dd0``
* ``d83cceb2c41c18c2406032dac36cf90e57f37107``
* ``097c98565dd6bd0eb16d49b87e4da7e2f2be3a5c``

RabbitMQ tags (Wallaby):

* ``69c245dc91a2eb4d34590624760c32064c3ac07b``

RabbitMQ tags & HA flag (Xena):

* ``2fd1590eb8ac739a07ad9cccbefc7725ea1a3855``

RabbitMQ HA flag (Yoga):

* ``31406648544372187352e129d2a3b4f48498267c``

If you are currently running Wallaby, you will need to enable the HA config option in
``etc/kayobe/kolla/globals.yml``.

@@ -50,13 +73,20 @@ If you are running Wallaby or Xena, synchronise the Pulp containers.

.. code-block:: console

kayobe playbook run etc/kayobe/ansible/pulp-container-sync.yml pulp-container-publish.yml -e stackhpc_pulp_images_kolla_filter=rabbitmq
kayobe playbook run etc/kayobe/ansible/pulp-container-sync.yml etc/kayobe/ansible/pulp-container-publish.yml -e stackhpc_pulp_images_kolla_filter=rabbitmq

Ensure that Kolla Ansible is up to date.

.. code-block:: console

kayobe control host bootstrap

Generate the new config files for the overcloud services.
Generate the new config files for the overcloud services. This ensures that
queues are created as durable.

.. code-block:: console

kayobe overcloud service configuration generate
kayobe overcloud service configuration generate --node-config-dir /etc/kolla

Pull the RabbitMQ container image.

@@ -68,7 +98,7 @@ Stop all the OpenStack services which use RabbitMQ.

.. code-block:: console

kayobe overcloud host command run --command "docker ps -a | egrep '(barbican|blazar|ceilometer|cinder|cloudkitty|designate|heat|ironic|keystone|magnum|manila|masakari|neutron|nova|octavia)' | awk '{ print $NF }' | xargs docker stop"
kayobe overcloud host command run --command "docker ps -a | egrep '(barbican|blazar|ceilometer|cinder|cloudkitty|designate|heat|ironic|keystone|magnum|manila|masakari|neutron|nova|octavia)' | awk '{ print \$NF }' | xargs docker stop"

Upgrade RabbitMQ.

@@ -77,33 +107,69 @@ Upgrade RabbitMQ.
kayobe overcloud service upgrade -kt rabbitmq --skip-prechecks

In order to convert the queues to be durable, you will need to reset the state
of RabbitMQ, and restart the services which use it. This can be done with the
RabbitMQ hammer playbook:
of RabbitMQ. This can be done with the RabbitMQ hammer playbook:

.. code-block:: console

kayobe playbook run stackhpc-kayobe-config/etc/kayobe/ansible/rabbitmq-reset.yml
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/rabbitmq-reset.yml --skip-tags restart-openstack

The hammer playbook only targets the services which are known to have issues
when RabbitMQ breaks. You will still need to start the remaining services:
Check to see if RabbitMQ is functioning as expected.

.. code-block:: console

kayobe overcloud host command run --command "docker ps -a | egrep '(barbican|blazar|ceilometer|cloudkitty|designate|manila|masakari|octavia)' | awk '{ print $NF }' | xargs docker start"
kayobe overcloud host command run --limit controllers --show-output --command 'docker exec rabbitmq rabbitmqctl cluster_status'

Check to see if RabbitMQ is functioning as expected.
The cluster status should list all controllers.

Check to see if all OpenStack queues and exchanges have been removed from the RabbitMQ cluster.

.. code-block:: console

kayobe overcloud host command run --limit controllers --show-output --command 'docker exec rabbitmq rabbitmqctl list_queues name'
kayobe overcloud host command run --limit controllers --show-output --command 'docker exec rabbitmq rabbitmqctl list_exchanges name'

Start the OpenStack services which use RabbitMQ. Note that this will start all
matching services, even if they weren't running prior to starting this
procedure.

.. code-block:: console

kayobe overcloud host command run --show-output --command 'docker exec rabbitmq rabbitmqctl cluster_status'
kayobe overcloud host command run --show-output --command 'docker exec rabbitmq rabbitmqctl list_queues name durable'
kayobe overcloud host command run --command "docker ps -a | egrep '(barbican|blazar|ceilometer|cinder|cloudkitty|designate|heat|ironic|keystone|magnum|manila|masakari|neutron|nova|octavia)' | awk '{ print \$NF }' | xargs docker start"
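The selection performed by the ``docker ps -a | egrep ... | awk '{ print $NF }'`` pipeline can be sketched in Python as follows. This is only an illustration of the matching logic, assuming the container name is the last whitespace-separated column of each ``docker ps -a`` line; the function name ``matching_containers`` and the sample listing in the usage note are made up.

```python
import re

# Service names copied from the egrep expression in the command above.
SERVICES = (
    "barbican", "blazar", "ceilometer", "cinder", "cloudkitty",
    "designate", "heat", "ironic", "keystone", "magnum", "manila",
    "masakari", "neutron", "nova", "octavia",
)
SERVICE_PATTERN = re.compile("|".join(SERVICES))


def matching_containers(docker_ps_output):
    """Return the last column of every line that mentions a listed service."""
    names = []
    for line in docker_ps_output.splitlines():
        # Like egrep, match anywhere in the line (image name or container name).
        if SERVICE_PATTERN.search(line):
            # Like awk's $NF, take the last whitespace-separated field.
            names.append(line.split()[-1])
    return names
```

Because ``docker ps -a`` also lists stopped containers, every matching container is selected regardless of its previous state, which is why services that were not running before the procedure are started as well.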

Check to see if the expected queues are durable.

.. code-block:: console

The cluster status should list all controllers. The queues listed should be
durable if their names do not start with the following:
kayobe overcloud host command run --limit controllers --show-output --command 'docker exec rabbitmq rabbitmqctl list_queues name durable'

The queues listed should be durable if their names do not start with the
following:

* amq.
* .\*\_fanout\_
* reply\_
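The durability rule above can be checked mechanically. The sketch below is a minimal example, assuming the raw output of ``rabbitmqctl list_queues name durable`` is available as text with one whitespace-separated ``name`` and ``durable`` pair per line; the function name ``unexpected_non_durable`` and the sample queue names in the usage note are hypothetical. Any Ansible wrapping added by ``--show-output`` would need to be stripped first.

```python
import re

# Queues whose names start with one of these patterns are not expected
# to be durable (taken from the list above).
NON_DURABLE_NAME = re.compile(r"^(amq\.|.*_fanout_|reply_)")


def unexpected_non_durable(listing):
    """Return names of queues that should be durable but are not."""
    offenders = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip banners and the "name durable" header line.
        if len(fields) != 2 or fields[1] not in ("true", "false"):
            continue
        name, durable = fields
        if durable == "false" and not NON_DURABLE_NAME.match(name):
            offenders.append(name)
    return offenders
```

For a listing containing ``notifications.info true``, ``reply_abc123 false``, ``scheduler_fanout_def456 false`` and ``conductor false``, only ``conductor`` would be reported.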

If there are issues with the services after this, particularly during upgrades,
you may find it useful to reuse the hammer playbook.
you may find it useful to reuse the hammer playbook, ``rabbitmq-reset.yml``.

Known issues
------------

If there are any OpenStack services running without durable queues enabled
while the RabbitMQ cluster is reset, they are likely to create non-durable
queues before the other OpenStack services start. This leads to an error
such as the following when other OpenStack services start::

Unable to connect to AMQP server on <IP>:5672 after inf tries:
Exchange.declare: (406) PRECONDITION_FAILED - inequivalent arg 'durable'
for exchange 'neutron' in vhost '/': received 'true' but current is
'false': amqp.exceptions.PreconditionFailed: Exchange.declare: (406)
PRECONDITION_FAILED - inequivalent arg 'durable' for exchange 'neutron' in
vhost '/': received 'true' but current is 'false'

This may happen if a host is missing from the inventory, so that its services
are not targeted by the ``docker stop`` command. If this happens, look for the
hostname of the offending node in the queues created after the RabbitMQ reset.

Once the rogue services have been stopped, reset the RabbitMQ cluster again to
clear the queues.
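To spot this failure mode in service logs, a small parser can extract the affected exchange and vhost from the error text. This is a sketch only: the regular expression is derived from the single example message quoted above, and the function name ``find_durable_mismatches`` is hypothetical.

```python
import re

# Pattern derived from the example error message in this section.
INEQUIVALENT_DURABLE = re.compile(
    r"PRECONDITION_FAILED - inequivalent arg 'durable' "
    r"for exchange '(?P<exchange>[^']+)' in vhost '(?P<vhost>[^']+)'"
)


def find_durable_mismatches(log_text):
    """Return sorted, de-duplicated (exchange, vhost) pairs found in log_text."""
    # Collapse whitespace so that messages wrapped across lines still match.
    flattened = " ".join(log_text.split())
    return sorted(
        {(m["exchange"], m["vhost"]) for m in INEQUIVALENT_DURABLE.finditer(flattened)}
    )
```

Applied to the message quoted above, this yields the single pair ``('neutron', '/')``, even though the message contains the error twice.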
4 changes: 4 additions & 0 deletions etc/kayobe/cephadm.yml
@@ -60,6 +60,10 @@ cephadm_cluster_network: "{{ storage_mgmt_net_name | net_cidr }}"
# List of Cephx keys. See stackhpc.cephadm.keys role for format.
#cephadm_keys:

# A list of commands to pass to cephadm shell -- ceph. See stackhpc.cephadm.commands
# for format.
#cephadm_commands:

###############################################################################
# Kolla Ceph auto-configuration.

8 changes: 8 additions & 0 deletions etc/kayobe/kolla.yml
@@ -238,6 +238,14 @@ kolla_build_blocks:
&& grafana-cli plugins install grafana-piechart-panel
ironic_inspector_header: |
ADD additions-archive /
nova_base_footer: |
# Fix for https://bugs.launchpad.net/nova/+bug/1955035, i.e.
# https://bugzilla.redhat.com/show_bug.cgi?id=2090752 on c8s
{% raw %}
{% if base_package_type == 'rpm' %}
RUN sed -i 's/"pc-q35-rhel8.5.0"/"pc-q35-*"/' /usr/share/qemu/firmware/50-edk2-ovmf-cc.json
{% endif %}
{% endraw %}

# Dict mapping image customization variable names to their values.
# Each variable takes the form:
5 changes: 5 additions & 0 deletions etc/kayobe/pulp-repo-versions.yml
@@ -28,4 +28,9 @@ stackhpc_pulp_repo_rocky_8_6_baseos_version: 20220914T080246
stackhpc_pulp_repo_rocky_8_6_extras_version: 20220904T041706
stackhpc_pulp_repo_rocky_8_6_nfv_version: 20220918T035853
stackhpc_pulp_repo_rocky_8_6_powertools_version: 20220918T035853
stackhpc_pulp_repo_rocky_8_7_appstream_version: 20221201T192704
stackhpc_pulp_repo_rocky_8_7_baseos_version: 20221202T032715
stackhpc_pulp_repo_rocky_8_7_extras_version: 20221201T192704
stackhpc_pulp_repo_rocky_8_7_nfv_version: 20221202T032715
stackhpc_pulp_repo_rocky_8_7_powertools_version: 20221202T032715
stackhpc_pulp_repo_mlnx_ofed_5_7_1_0_2_0_rhel8_6_version: 20220920T151419
4 changes: 2 additions & 2 deletions etc/kayobe/pulp.yml
@@ -179,8 +179,8 @@ stackhpc_pulp_sync_centos_stream8: "{{ os_distribution == 'centos' }}"

# Whether to sync Rocky Linux 8 packages.
stackhpc_pulp_sync_rocky_8: "{{ os_distribution == 'rocky' }}"
# Rocky 8 minor version number. Supported values: 6.
stackhpc_pulp_repo_rocky_8_minor_version: 6
# Rocky 8 minor version number. Supported values: 6, 7
stackhpc_pulp_repo_rocky_8_minor_version: 7
# Rocky 8 Snapshot versions. The defaults use the appropriate version from
# pulp-repo-versions.yml for the selected minor release.
stackhpc_pulp_repo_rocky_8_appstream_version: "{{ lookup('vars', 'stackhpc_pulp_repo_rocky_8_%s_appstream_version' % stackhpc_pulp_repo_rocky_8_minor_version) }}"
@@ -3,7 +3,7 @@ fixes:
- |
Fixes CoreDNS for Magnum clusters crashing on startup.
- |
Allows cinder-csi nodeplugin to start on the same Magnum cluster host as
Allows cinder-csi nodeplugin to start on the same Magnum cluster host as
cinder-csi controllerplugin.
- |
Corrects ClusterRole rules for Magnum cluster-autoscaler, and sets
@@ -0,0 +1,4 @@
---
fixes:
- |
Fix creation of VM instances with UEFI enabled and Secure Boot disabled.
4 changes: 4 additions & 0 deletions releasenotes/notes/rocky-87-rpms-999565263dacde42.yaml
@@ -0,0 +1,4 @@
---
features:
- |
Sync Rocky Linux 8.7 RPM repositories to local Pulp servers.