From fb9a53e9c993ae081e8c11f442c62f17d2a2c224 Mon Sep 17 00:00:00 2001 From: Alex-Welsh Date: Mon, 11 Dec 2023 12:24:34 +0000 Subject: [PATCH 1/4] Add password rotation docs page --- doc/source/operations/password-rotation.rst | 521 ++++++++++++++++++++ 1 file changed, 521 insertions(+) create mode 100644 doc/source/operations/password-rotation.rst diff --git a/doc/source/operations/password-rotation.rst b/doc/source/operations/password-rotation.rst new file mode 100644 index 000000000..ec5bdc89f --- /dev/null +++ b/doc/source/operations/password-rotation.rst @@ -0,0 +1,521 @@ +=============== +Secret Rotation +=============== + +General notes +============= + +This guide covers secret rotation in Kayobe and Kolla-Ansible for most services +in a standard deployment. It does not cover every secret. A full list of +passwords that have been successfully rotated is available at the bottom of +this page TODO: Link. + +Many of the secrets can simply be deleted from your passwords.yml and will be +automatically regenerated with a ``kayobe overcloud service deploy``. + +Some secrets require manual input from the operator to change. + +Following this process, there should be no interruption to service for running +VMs, but there will be API downtime for all services. The main reason for the +outage is that RabbitMQ must be completely stopped to change the secrets it +uses. The services must all be reconfigured to use the new RabbitMQ cluster. +Each service will come back once it has been reconfigured. The outage time for +each service is therefore approximately equal to the time between starting a +``kayobe overcloud service deploy``, and that service being reconfigured. + +Some secrets currently have to be regenerated by hand. Make sure you use a +reliable tool and match the formatting (length, character set etc) of the +existing secret. ``pwgen`` is recommended and used as an example throughout +this guide. Installation: + +.. 
code:: bash + + sudo apt install pwgen # or, on dnf-based systems: sudo dnf install pwgen + + +As of writing, there are three upstream patches in the works to make this +process easier. + +1. A change to Kolla, to automate Step 2 of the full method (TODO: link). + + The upstream patch can be found `here + `__. + + This was previously mitigated with a change to the StackHPC fork of + Kolla-Ansible, which has since been reverted due to an unforeseen issue. + +2. A change to Nova, to automate Step 11 of the full method (TODO: link). + + The upstream patch can be found `here + `__. + +3. A change to Kolla-Ansible, to automate Step 13 of the full method (TODO: link). + + The upstream patch can be found `here + `__. + + +Full method +=========== + +.. warning:: + + You **must** back up your passwords.yml before making changes. You will need + to refer back to it later. + +1. Run a tempest refstack & check Kibana/OpenSearch Dashboards to check + the state of the cloud before any changes are made. + +2. Edit your kolla-ansible checkout to include this line within the + kolla_docker dict in + ``ansible/roles/nova/tasks/bootstrap_service.yml`` See + `here `__ + for an example. (If you are using the latest ``stackhpc/yoga`` branch + of kolla-ansible this should already be set) + + .. code:: bash + + command: bash -c 'sudo -E kolla_set_configs && nova-manage api_db sync && nova-manage db sync --local_cell' + + This change will break new deployments and should be reverted once the process is complete. + +3. Re-install kolla-ansible from source in your kolla-ansible python + environment + +4. Navigate to the directory containing your passwords.yml file + (``kayobe-config/etc/kolla/passwords.yml`` OR + ``kayobe-config/etc/kayobe/environments/envname/kolla/passwords.yml``) + +5. 
Create a file called deletelist.txt and populate it with this content + (including all whitespace): + +:: + + _keystone_password + _database_password + ^keystone_admin_password + ^memcache_secret_key + ^designate_rndc_key + ^docker_registry_password + ^keepalived_password + ^kibana_password + ^libvirt_sasl_password + ^metadata_secret + ^opensearch_dashboards_password + ^osprofiler_secret + ^prometheus_alertmanager_password + ^qdrouterd_password + ^redis_master_password + ^memcache_secret_key + _ssh_key + + private_key + public_key + ^$ + rabbitmq + ^haproxy_password + +NOTE: The above deletes bifrost secrets. Might need a seed s-d as well + +3. Decrypt your passwords.yml file with ansible vault + +4. Delete all the passwords in the deletion list + + .. code:: bash + + grep -vf deletelist.txt passwords.yml > new-passwords.yml + +5. Check the new file for basic formatting errors. If it looks correct, + replace the existing ``passwords.yml`` file with + ``new-passwords.yml`` + + .. code:: bash + + rm passwords.yml && mv new-passwords.yml passwords.yml + +6. Use the ``rekey-hosts.yml`` playbook to rotate your SSH keys for + hosts across the cloud. The playbook should exist under + ``kayobe-config/etc/kayobe/ansible/`` if not, merge the latest + ``stackhpc-kayobe-config``. + + .. code:: bash + + kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/rekey-hosts.yml + +7. Update the Pulp password + + 1. Generate a new Pulp password + + .. code:: bash + + pwgen -s 25 1 + + 2. Update secrets_pulp_password (usually found in secrets.yml) + + 3. Deploy changes + + .. code:: bash + + kayobe seed service deploy -t seed-deploy-containers -kt none + + (note you may need to skip docker registry login since the + password will now be ‘incorrect’ e.g. ``-e`` + ``deploy_containers_registry_attempt_login``) + +8. Rotate ``horizon_secret_key``. + + 1. Generate a new secret: + + .. code:: bash + + pwgen -s 40 1 + + 2. 
Add it to the passwords file, along with the old secret, in this + exact format (including quotes in the middle): + + .. code:: bash + + horizon_secret_key: newsecret' 'oldsecret + + This will allow both the old and new secrets to be used at the + same time, resulting in no interruption to service. The key is + mainly used for generating login and password reset tokens. The + old secret can be deleted & redeployed at a later date once all + users have closed & reopened their sessions. + +9. Update grafana_admin_password + + 1. Generate a new Grafana Admin password + + .. code:: bash + + pwgen -s 40 1 + + 2. Exec into the grafana container on a controller + + .. code:: bash + + sudo docker exec -it grafana bash + + 3. Run the password reset command, then enter the new password + + .. code:: bash + + grafana-cli admin reset-admin-password --password-from-stdin + + 4. Update the value of ``grafana_admin_password`` in passwords.yml + +10. Update the MariaDB database password + + 1. Generate a new secret: + + .. code:: bash + + pwgen -s 40 1 + + 2. Exec into the mariadb container on a controller + + .. code:: bash + + sudo docker exec -it mariadb bash + + 3. Log in to the database. You will be prompted for the password. + Use the existing value of ``database_password`` + + .. code:: bash + + mysql -uroot -p + + 4. Check the current state of the root user + + .. code:: bash + + SELECT Host,User,Password FROM mysql.user WHERE User='root'; + + 5. Update the password for the root user + + .. code:: bash + + SET PASSWORD FOR 'root'@'%' = PASSWORD('newpassword'); + + 6. Check that the password hash has changed in the user list + + .. code:: bash + + SELECT Host,User,Password FROM mysql.user WHERE User='root'; + + 7. If there are any remaining root users with the old password + e.g. ``root@localhost``, change the password for them too. + + 8. update ``database_password`` in ``passwords.yml`` with your new + password. 
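The ``root`` password update in the MariaDB step above may need repeating for every ``root`` account host returned by the ``SELECT`` query. The following is a minimal sketch, not part of the original procedure: it assumes the new secret is purely alphanumeric (as ``pwgen -s`` output is), and ``root_password_sql`` is a hypothetical helper name. It only prints one statement per host, reusing the ``SET PASSWORD`` form shown above, so the statements can be reviewed before being pasted into the ``mysql`` session.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kayobe or Kolla-Ansible).
# Prints one SET PASSWORD statement per root account host.
# Assumption: the new password contains no single quotes.
root_password_sql() {
  local new_password="$1"
  shift
  local host
  for host in "$@"; do
    # The host is passed as a printf argument, so a literal '%' is safe here.
    printf "SET PASSWORD FOR 'root'@'%s' = PASSWORD('%s');\n" "$host" "$new_password"
  done
}

# Example: the hosts would come from the Host column of the SELECT output.
root_password_sql 'newpassword' '%' 'localhost'
```

Printing the statements rather than executing them keeps the operator in control of what runs against the database. A secret containing a single quote would need escaping, which this sketch does not attempt.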
+ +:::warning From this point onward, service may be disrupted + +:: + + ::: + +11. Update the nova DB password + + 1. Create a new ``nova_database_password`` and store it in + ``passwords.yml`` + + .. code:: bash + + pwgen -s 40 1 + + 2. Exec into the nova_conductor container + + .. code:: bash + + sudo docker exec -it nova_conductor bash + + 3. List the cells + + .. code:: bash + + nova-manage cell_v2 list_cells --verbose + + 4. Find the entry for cell0, copy the Database Connection value, + replace the password in the string with the new value, and update + it with the following command: + + .. code:: bash + + nova-manage cell_v2 update_cell --cell_uuid 00000000-0000-0000-0000-000000000000 --database_connection "CONNECTION WITH NEW PASSWORD HERE" --transport-url "none:///" + + (If the ``cell_uuid`` for cell0 is not + ``00000000-0000-0000-0000-000000000000``, change the above + command accordingly) + +12. Re-encrypt your ``passwords.yml`` file + +13. Delete the service users in keystone. The exact users will depend on + the deployment. Multinode example: + +:::warning This will immediately cause an API outage + +:: + + Alt: Cherry pick this patch: + + ::: + + ```bash + openstack user delete glance cinder placement nova neutron heat magnum magnum_trustee_domain_admin barbican designate + ``` + +14. Stop services + + .. code:: bash + + kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/stop-openstack-services.yml + +15. Nuke RabbitMQ + + .. code:: bash + + kayobe overcloud host command run -l controllers --become --command "docker stop rabbitmq && docker rm rabbitmq && docker volume rm rabbitmq" + +16. Reconfigure overcloud services to push changes + +:::warning VMs should continue running, but connections to them will +likely be disrupted when neutron is redeployed + +:: + + ::: + + ```bash + kayobe overcloud service deploy + ``` + +17. Flush the memcached data on all controllers (any old data will now + be inaccessible) + + 1. 
Install telnet (on one of the controllers) + + .. code:: bash + + sudo apt -y install telnet + + 2. Check the config for the ip and port used by memcache (on every + controller) + + .. code:: bash + + sudo grep command /etc/kolla/memcached/config.json + + The IP and port will be printed after ``-l`` and ``-p`` + respectively. + + 3. For each controller start a telnet session, clear all data, then + exit + + .. code:: bash + + telnet + flush_all + quit + +18. Manually update heat_domain_admin_password + + 1. TODO: Instructions + +19. Re-run tempest to make sure everything has come back + +20. Inform other users of the steps they’ll need to take now that the + secrets have been rotated: + + 1. SSH keys have been rotated, so the new key will have to be + distributed if individual user accounts are used + 2. A PR with the new passwords will need to be merged in to the main + config branch (REMEMBER TO RE-ENCRYPT PASSWORDS.YML BEFORE + COMMITING) + 3. Any existing openrc files generated by Kolla Ansible will need to + be re-generated or edited to use the new Kolla admin password. + +21. At some point in the future (approx 1 week), remove the old horizon + secret key from ``passwords.yml`` and reconfigure horizon. + +Future improvements +------------------- + +- ☐ Regenerate passwords that we think we aren’t using, in case they + are actually being used +- ☐ Add the new database_password to passwords.yml before changing it, + in case it gets lost +- ☐ Stop services before deleting users? (Except for keystone) or allow + setting update_password in kolla-ansible +- ☒ At the end of the procedure write down what others need to do to + use the new passwords +- ☐ Can we get kolla-ansible to generate all/more of the passwords and + reduce the reliance on pwgen? 
+- ☐ Add a step to remove the old horizon_secret_key and redeploy + horizon at some point after the rotation + +Full password list +------------------- + +:: + + aodh_database_password + aodh_keystone_password + blazar_database_password + blazar_keystone_password + caso_keystone_password + ceilometer_database_password + ceilometer_keystone_password + cinder_database_password + cinder_keystone_password + barbican_database_password + barbican_keystone_password + cloudkitty_database_password + cloudkitty_keystone_password + congress_database_password + congress_keystone_password + cyborg_database_password + cyborg_keystone_password + designate_database_password + designate_keystone_password + freezer_database_password + freezer_keystone_password + glance_database_password + glance_keystone_password + gnocchi_database_password + gnocchi_keystone_password + heat_database_password + heat_keystone_password + horizon_database_password + ironic_database_password + ironic_inspector_database_password + ironic_inspector_keystone_password + ironic_keystone_password + karbor_database_password + karbor_keystone_password + keystone_database_password Check this one + magnum_database_password + manila_database_password + mariadb_backup_database_password + masakari_database_password + mistral_database_password + monasca_database_password + murano_database_password + neutron_database_password + nova_api_database_password + nova_database_password + octavia_database_password + panko_database_password + placement_database_password + prometheus_mysql_exporter_database_password + qinling_database_password + rally_database_password + sahara_database_password + senlin_database_password + solum_database_password + tacker_database_password + trove_database_password + vitrage_database_password + watcher_database_password + zun_database_password + keystone_admin_password + kuryr_keystone_password + magnum_keystone_password + manila_keystone_password + masakari_keystone_password + 
mistral_keystone_password + monasca_keystone_password + murano_keystone_password + neutron_keystone_password + nova_keystone_password + octavia_keystone_password + panko_keystone_password + rabbitmq_cluster_cookie + rabbitmq_monitoring_password + rabbitmq_password + database_password + heat_domain_admin_password + horizon_secret_key + placement_keystone_password + qinling_keystone_password + sahara_keystone_password + searchlight_keystone_password + senlin_keystone_password + solum_keystone_password + swift_keystone_password + tacker_keystone_password + trove_keystone_password + vitrage_keystone_password + watcher_keystone_password + zun_keystone_password + ceph_rgw_keystone_password + designate_rndc_key + keepalived_password + kibana_password + libvirt_sasl_password + metadata_secret + opensearch_dashboards_password + osprofiler_secret + prometheus_alertmanager_password + qdrouterd_password + grafana_admin_password + docker_registry_password + secrets_pulp_password + redis_master_password + keystone_ssh_key + private_key + public_key + neutron_ssh_key + private_key + public_key + nova_ssh_key + private_key + public_key + octavia_amp_ssh_key + private_key + public_key + bifrost_ssh_key + private_key + public_key + From 327e05282bda9e70e6b00117796612e1c60212fe Mon Sep 17 00:00:00 2001 From: Alex-Welsh Date: Mon, 11 Dec 2023 16:13:49 +0000 Subject: [PATCH 2/4] Update secret rotation docs page --- doc/source/operations/index.rst | 1 + ...sword-rotation.rst => secret-rotation.rst} | 322 +++++++++--------- 2 files changed, 163 insertions(+), 160 deletions(-) rename doc/source/operations/{password-rotation.rst => secret-rotation.rst} (53%) diff --git a/doc/source/operations/index.rst b/doc/source/operations/index.rst index 1327b8c2e..284795d6a 100644 --- a/doc/source/operations/index.rst +++ b/doc/source/operations/index.rst @@ -11,3 +11,4 @@ This guide is for operators of the StackHPC Kayobe configuration project. 
octavia hotfix-playbook rocky-linux-9 + secret-rotation diff --git a/doc/source/operations/password-rotation.rst b/doc/source/operations/secret-rotation.rst similarity index 53% rename from doc/source/operations/password-rotation.rst rename to doc/source/operations/secret-rotation.rst index ec5bdc89f..63eef3f35 100644 --- a/doc/source/operations/password-rotation.rst +++ b/doc/source/operations/secret-rotation.rst @@ -8,20 +8,22 @@ General notes This guide covers secret rotation in Kayobe and Kolla-Ansible for most services in a standard deployment. It does not cover every secret. A full list of passwords that have been successfully rotated is available at the bottom of -this page TODO: Link. +this page (:ref:`link`). -Many of the secrets can simply be deleted from your passwords.yml and will be -automatically regenerated with a ``kayobe overcloud service deploy``. +Many of the secrets can simply be deleted from your ``passwords.yml`` and will +be automatically regenerated with a ``kayobe overcloud service deploy``. Some secrets require manual input from the operator to change. -Following this process, there should be no interruption to service for running -VMs, but there will be API downtime for all services. The main reason for the -outage is that RabbitMQ must be completely stopped to change the secrets it -uses. The services must all be reconfigured to use the new RabbitMQ cluster. -Each service will come back once it has been reconfigured. The outage time for -each service is therefore approximately equal to the time between starting a -``kayobe overcloud service deploy``, and that service being reconfigured. +Following this process, there may be a few seconds of network downtime for +running VMs when Neutron is reconfigured. + +There will be API downtime for all services. The main reason for the outage is +that RabbitMQ must be completely stopped to change the secrets it uses. The +services must all be reconfigured to use the new RabbitMQ cluster. 
Each service +will come back once it has been reconfigured. The outage time for each service +is therefore equal to the time between starting a ``kayobe overcloud service +deploy``, and that service being reconfigured. Some secrets currently have to be regenerated by hand. Make sure you use a reliable tool and match the formatting (length, character set etc) of the @@ -34,22 +36,25 @@ this guide. Installation: As of writing, there are three upstream patches in the works to make this -process easier. +process easier. -1. A change to Kolla, to automate Step 2 of the full method (TODO: link). +#. A change to Kolla, to automate :ref:`this` step to change the + extended start for the ``nova-api`` container. The upstream patch can be found `here `__. This was previously mitigated with a change to the StackHPC fork of - Kolla-Ansible, which has since been reverted due to an unforeseen issue. + Kolla-Ansible, which has since been reverted due to an unforeseen issue. -2. A change to Nova, to automate Step 11 of the full method (TODO: link). +#. A change to Nova, to automate :ref:`this` step to change the + nova cell0 database connection string. The upstream patch can be found `here `__. -3. A change to Kolla-Ansible, to automate Step 13 of the full method (TODO: link). +#. A change to Kolla-Ansible, to automate :ref:`this` step to + update service keystone user passwords. The upstream patch can be found `here `__. @@ -60,89 +65,90 @@ Full method .. warning:: - You **must** back up your passwords.yml before making changes. You will need - to refer back to it later. + You **must** back up your ``passwords.yml`` before making changes. You will + need to refer back to it later -1. Run a tempest refstack & check Kibana/OpenSearch Dashboards to check - the state of the cloud before any changes are made. +1. Run a Tempest ``refstack`` & check Kibana/OpenSearch Dashboards to check + the state of the cloud before any changes are made -2. 
Edit your kolla-ansible checkout to include this line within the - kolla_docker dict in - ``ansible/roles/nova/tasks/bootstrap_service.yml`` See - `here `__ - for an example. (If you are using the latest ``stackhpc/yoga`` branch - of kolla-ansible this should already be set) +.. _kolla-change: - .. code:: bash +2. Edit your Kolla-Ansible checkout to include this line within the + ``kolla_docker`` dict in ``ansible/roles/nova/tasks/bootstrap_service.yml`` See + `here + `__ + for an example. (If you are using the latest ``stackhpc/yoga`` branch of + Kolla-Ansible this should already be set) - command: bash -c 'sudo -E kolla_set_configs && nova-manage api_db sync && nova-manage db sync --local_cell' + .. code:: + + command: bash -c 'sudo -E kolla_set_configs && nova-manage api_db sync && nova-manage db sync --local_cell' - This change will break new deployments and should be reverted once the process is complete. + This change will break new deployments and should be reverted once this + process is complete -3. Re-install kolla-ansible from source in your kolla-ansible python +3. Re-install Kolla-Ansible from source in your Kolla-Ansible Python environment -4. Navigate to the directory containing your passwords.yml file +4. Navigate to the directory containing your ``passwords.yml`` file (``kayobe-config/etc/kolla/passwords.yml`` OR ``kayobe-config/etc/kayobe/environments/envname/kolla/passwords.yml``) -5. Create a file called deletelist.txt and populate it with this content +5. 
Create a file called ``deletelist.txt`` and populate it with this content (including all whitespace): -:: - - _keystone_password - _database_password - ^keystone_admin_password - ^memcache_secret_key - ^designate_rndc_key - ^docker_registry_password - ^keepalived_password - ^kibana_password - ^libvirt_sasl_password - ^metadata_secret - ^opensearch_dashboards_password - ^osprofiler_secret - ^prometheus_alertmanager_password - ^qdrouterd_password - ^redis_master_password - ^memcache_secret_key - _ssh_key - - private_key - public_key - ^$ - rabbitmq - ^haproxy_password - -NOTE: The above deletes bifrost secrets. Might need a seed s-d as well - -3. Decrypt your passwords.yml file with ansible vault - -4. Delete all the passwords in the deletion list + .. code:: + + _keystone_password + _database_password + ^keystone_admin_password + ^memcache_secret_key + ^designate_rndc_key + ^docker_registry_password + ^keepalived_password + ^kibana_password + ^libvirt_sasl_password + ^metadata_secret + ^opensearch_dashboards_password + ^osprofiler_secret + ^prometheus_alertmanager_password + ^qdrouterd_password + ^redis_master_password + ^memcache_secret_key + _ssh_key + + private_key + public_key + ^$ + rabbitmq + ^haproxy_password + + +6. Decrypt your ``passwords.yml`` file with ``ansible-vault`` + +7. Delete all the passwords in the deletion list .. code:: bash grep -vf deletelist.txt passwords.yml > new-passwords.yml -5. Check the new file for basic formatting errors. If it looks correct, - replace the existing ``passwords.yml`` file with - ``new-passwords.yml`` +8. Check the new file for basic formatting errors. If it looks correct, + replace the existing ``passwords.yml`` file with ``new-passwords.yml`` .. code:: bash rm passwords.yml && mv new-passwords.yml passwords.yml -6. Use the ``rekey-hosts.yml`` playbook to rotate your SSH keys for - hosts across the cloud. The playbook should exist under +9. 
Use the ``rekey-hosts.yml`` playbook to rotate your SSH keys for hosts + across the cloud. The playbook should exist under ``kayobe-config/etc/kayobe/ansible/`` if not, merge the latest - ``stackhpc-kayobe-config``. + ``stackhpc-kayobe-config`` .. code:: bash kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/rekey-hosts.yml -7. Update the Pulp password +10. Update the Pulp password 1. Generate a new Pulp password @@ -150,7 +156,7 @@ NOTE: The above deletes bifrost secrets. Might need a seed s-d as well pwgen -s 25 1 - 2. Update secrets_pulp_password (usually found in secrets.yml) + 2. Update ``secrets_pulp_password`` (usually found in ``secrets.yml``) 3. Deploy changes @@ -158,11 +164,11 @@ NOTE: The above deletes bifrost secrets. Might need a seed s-d as well kayobe seed service deploy -t seed-deploy-containers -kt none - (note you may need to skip docker registry login since the - password will now be ‘incorrect’ e.g. ``-e`` + (note you may need to skip docker registry login since the password will + now be ‘incorrect’ e.g. ``-e`` ``deploy_containers_registry_attempt_login``) -8. Rotate ``horizon_secret_key``. +11. Rotate ``horizon_secret_key`` 1. Generate a new secret: @@ -170,20 +176,20 @@ NOTE: The above deletes bifrost secrets. Might need a seed s-d as well pwgen -s 40 1 - 2. Add it to the passwords file, along with the old secret, in this + 2. Add it to the ``passwords.yml`` file, along with the old secret, in this exact format (including quotes in the middle): .. code:: bash horizon_secret_key: newsecret' 'oldsecret - This will allow both the old and new secrets to be used at the - same time, resulting in no interruption to service. The key is - mainly used for generating login and password reset tokens. The - old secret can be deleted & redeployed at a later date once all - users have closed & reopened their sessions. + This will allow both the old and new secrets to be used at the same + time, resulting in no interruption to service. 
The key is mainly used + for generating login and password reset tokens. The old secret can be + deleted & redeployed at a later date once all users have closed & + reopened their sessions. -9. Update grafana_admin_password +12. Update ``grafana_admin_password`` 1. Generate a new Grafana Admin password @@ -191,7 +197,7 @@ NOTE: The above deletes bifrost secrets. Might need a seed s-d as well pwgen -s 40 1 - 2. Exec into the grafana container on a controller + 2. Exec into the Grafana container on a controller .. code:: bash @@ -203,9 +209,9 @@ NOTE: The above deletes bifrost secrets. Might need a seed s-d as well grafana-cli admin reset-admin-password --password-from-stdin - 4. Update the value of ``grafana_admin_password`` in passwords.yml + 4. Update the value of ``grafana_admin_password`` in ``passwords.yml`` -10. Update the MariaDB database password +13. Update the MariaDB database password 1. Generate a new secret: @@ -213,26 +219,26 @@ NOTE: The above deletes bifrost secrets. Might need a seed s-d as well pwgen -s 40 1 - 2. Exec into the mariadb container on a controller + 2. Exec into the MariaDB container on a controller .. code:: bash sudo docker exec -it mariadb bash - 3. Log in to the database. You will be prompted for the password. - Use the existing value of ``database_password`` + 3. Log in to the database. You will be prompted for the password. Use the + existing value of ``database_password`` .. code:: bash mysql -uroot -p - 4. Check the current state of the root user + 4. Check the current state of the ``root`` user .. code:: bash SELECT Host,User,Password FROM mysql.user WHERE User='root'; - 5. Update the password for the root user + 5. Update the password for the ``root`` user .. code:: bash @@ -244,113 +250,115 @@ NOTE: The above deletes bifrost secrets. Might need a seed s-d as well SELECT Host,User,Password FROM mysql.user WHERE User='root'; - 7. If there are any remaining root users with the old password - e.g. 
``root@localhost``, change the password for them too. + 7. If there are any remaining root users with the old password e.g. + ``root@localhost``, change the password for them too - 8. update ``database_password`` in ``passwords.yml`` with your new - password. + 8. Update ``database_password`` in ``passwords.yml`` with your new + password -:::warning From this point onward, service may be disrupted -:: +.. _nova-change: + +14. Update the Nova Database password + .. warning:: - ::: + From this point onward, service may be disrupted -11. Update the nova DB password - 1. Create a new ``nova_database_password`` and store it in + #. Create a new ``nova_database_password`` and store it in ``passwords.yml`` .. code:: bash pwgen -s 40 1 - 2. Exec into the nova_conductor container + #. Exec into the ``nova_conductor`` container .. code:: bash sudo docker exec -it nova_conductor bash - 3. List the cells + #. List the cells .. code:: bash nova-manage cell_v2 list_cells --verbose - 4. Find the entry for cell0, copy the Database Connection value, - replace the password in the string with the new value, and update - it with the following command: + #. Find the entry for ``cell0``, copy the Database Connection value, + replace the password in the string with the new value, and update it + with the following command: .. code:: bash nova-manage cell_v2 update_cell --cell_uuid 00000000-0000-0000-0000-000000000000 --database_connection "CONNECTION WITH NEW PASSWORD HERE" --transport-url "none:///" (If the ``cell_uuid`` for cell0 is not - ``00000000-0000-0000-0000-000000000000``, change the above - command accordingly) + ``00000000-0000-0000-0000-000000000000``, change the above command + accordingly) -12. Re-encrypt your ``passwords.yml`` file -13. Delete the service users in keystone. The exact users will depend on - the deployment. Multinode example: +15. Re-encrypt your ``passwords.yml`` file -:::warning This will immediately cause an API outage -:: +.. 
_k-a-change: - Alt: Cherry pick this patch: +16. Delete the service users in Keystone. The exact users will depend on the + deployment. Multinode example: - ::: + .. note:: - ```bash - openstack user delete glance cinder placement nova neutron heat magnum magnum_trustee_domain_admin barbican designate - ``` + Alternatively, cherry-pick + `this patch `__ -14. Stop services + + .. code:: bash + + openstack user delete glance cinder placement nova neutron heat magnum magnum_trustee_domain_admin barbican designate + +17. Stop services using RabbitMQ .. code:: bash kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/stop-openstack-services.yml -15. Nuke RabbitMQ +18. Nuke RabbitMQ .. code:: bash kayobe overcloud host command run -l controllers --become --command "docker stop rabbitmq && docker rm rabbitmq && docker volume rm rabbitmq" -16. Reconfigure overcloud services to push changes +19. Reconfigure Overcloud services to apply changes -:::warning VMs should continue running, but connections to them will -likely be disrupted when neutron is redeployed -:: + .. warning:: + + VMs should continue running, but connections to them will briefly be + disrupted when Neutron is redeployed - ::: + .. code:: bash - ```bash - kayobe overcloud service deploy - ``` + kayobe overcloud service deploy -17. Flush the memcached data on all controllers (any old data will now - be inaccessible) - 1. Install telnet (on one of the controllers) +20. Flush the Memcached data on all controllers (any old data will now be + inaccessible) + + #. Install Telnet (on one of the controllers) .. code:: bash sudo apt -y install telnet - 2. Check the config for the ip and port used by memcache (on every + #. Check the config for the IP and port used by Memcached (on every controller) .. code:: bash sudo grep command /etc/kolla/memcached/config.json - The IP and port will be printed after ``-l`` and ``-p`` - respectively. + The IP and port will be printed after ``-l`` and ``-p`` respectively - 3. 
For each controller start a telnet session, clear all data, then + #. For each controller start a Telnet session, clear all data, then exit .. code:: bash @@ -359,41 +367,35 @@ likely be disrupted when neutron is redeployed flush_all quit -18. Manually update heat_domain_admin_password +21. Manually update ``heat_domain_admin_password`` - 1. TODO: Instructions + #. TODO: Instructions + This has not been tested yet -19. Re-run tempest to make sure everything has come back +22. Re-run Tempest to make sure everything has come back -20. Inform other users of the steps they’ll need to take now that the - secrets have been rotated: +23. Inform other users of the steps they’ll need to take now that the secrets + have been rotated: - 1. SSH keys have been rotated, so the new key will have to be - distributed if individual user accounts are used - 2. A PR with the new passwords will need to be merged in to the main - config branch (REMEMBER TO RE-ENCRYPT PASSWORDS.YML BEFORE - COMMITING) - 3. Any existing openrc files generated by Kolla Ansible will need to - be re-generated or edited to use the new Kolla admin password. + 1. SSH keys have been rotated, so the new key will have to be distributed + if individual user accounts are used -21. At some point in the future (approx 1 week), remove the old horizon - secret key from ``passwords.yml`` and reconfigure horizon. + 2. Any existing ``openrc`` files generated by Kolla Ansible will need to be + re-generated or edited to use the new Kolla admin password + +24. Create a PR to merge the new secrets into your main Kayobe configuration + branch + + .. warning:: + + Unless you **really** enjoyed this process, RE-ENCRYPT + ``passwords.yml`` BEFORE COMMITTING + +25. 
Approximately 1 week after deploying, remove the old horizon secret key + from ``passwords.yml`` and reconfigure horizon -Future improvements -------------------- -- ☐ Regenerate passwords that we think we aren’t using, in case they - are actually being used -- ☐ Add the new database_password to passwords.yml before changing it, - in case it gets lost -- ☐ Stop services before deleting users? (Except for keystone) or allow - setting update_password in kolla-ansible -- ☒ At the end of the procedure write down what others need to do to - use the new passwords -- ☐ Can we get kolla-ansible to generate all/more of the passwords and - reduce the reliance on pwgen? -- ☐ Add a step to remove the old horizon_secret_key and redeploy - horizon at some point after the rotation +.. _full-password-list: Full password list ------------------- @@ -434,7 +436,7 @@ Full password list ironic_keystone_password karbor_database_password karbor_keystone_password - keystone_database_password Check this one + keystone_database_password magnum_database_password manila_database_password mariadb_backup_database_password From 43477f74b3c782f5f9988cef18433a8317a4ec01 Mon Sep 17 00:00:00 2001 From: Alex-Welsh Date: Mon, 11 Dec 2023 16:30:55 +0000 Subject: [PATCH 3/4] Fix secret rotation docs tox errors --- doc/source/operations/secret-rotation.rst | 20 ++++++++++---------- tox.ini | 4 ++-- 2 files changed, 12 insertions(+), 12 deletions(-) diff --git a/doc/source/operations/secret-rotation.rst b/doc/source/operations/secret-rotation.rst index 63eef3f35..3e80e7cab 100644 --- a/doc/source/operations/secret-rotation.rst +++ b/doc/source/operations/secret-rotation.rst @@ -36,28 +36,28 @@ this guide. Installation: As of writing, there are three upstream patches in the works to make this -process easier. +process easier. #. A change to Kolla, to automate :ref:`this` step to change the extended start for the ``nova-api`` container. The upstream patch can be found `here - `__. + `__. 
This was previously mitigated with a change to the StackHPC fork of Kolla-Ansible, which has since been reverted due to an unforeseen issue. #. A change to Nova, to automate :ref:`this` step to change the nova cell0 database connection string. - + The upstream patch can be found `here - `__. + `__. #. A change to Kolla-Ansible, to automate :ref:`this` step to update service keystone user passwords. - + The upstream patch can be found `here - `__. + `__. Full method @@ -83,7 +83,7 @@ Full method .. code:: command: bash -c 'sudo -E kolla_set_configs && nova-manage api_db sync && nova-manage db sync --local_cell' - + This change will break new deployments and should be reverted once this process is complete @@ -162,7 +162,7 @@ Full method .. code:: bash - kayobe seed service deploy -t seed-deploy-containers -kt none + kayobe seed service deploy -t seed-deploy-containers -kt none (note you may need to skip docker registry login since the password will now be ‘incorrect’ e.g. ``-e`` @@ -307,10 +307,10 @@ Full method .. note:: - Alternatively, cherry-pick + Alternatively, cherry-pick `this patch `__ - + .. 
code:: bash openstack user delete glance cinder placement nova neutron heat magnum magnum_trustee_domain_admin barbican designate diff --git a/tox.ini b/tox.ini index f79ac9701..c6b949efe 100644 --- a/tox.ini +++ b/tox.ini @@ -13,8 +13,8 @@ deps = commands = yamllint etc/kayobe reno lint - doc8 README.rst doc/source --ignore D001 - + # secret-rotation must be skipped because it includes purposeful whitespace + doc8 README.rst doc/source --ignore D001 --ignore-path doc/source/operations/secret-rotation.rst # StackHPC Kayobe configuration release notes: [testenv:releasenotes] allowlist_externals = rm From adaafa4ab59624f09380bf636c3bb60cc2eaf012 Mon Sep 17 00:00:00 2001 From: Alex-Welsh Date: Tue, 12 Dec 2023 11:51:59 +0000 Subject: [PATCH 4/4] Secret rotation docs post-review changes --- doc/source/operations/secret-rotation.rst | 186 +++++++++++----------- tox.ini | 2 +- 2 files changed, 96 insertions(+), 92 deletions(-) diff --git a/doc/source/operations/secret-rotation.rst b/doc/source/operations/secret-rotation.rst index 3e80e7cab..7912530fb 100644 --- a/doc/source/operations/secret-rotation.rst +++ b/doc/source/operations/secret-rotation.rst @@ -16,7 +16,7 @@ be automatically regenerated with a ``kayobe overcloud service deploy``. Some secrets require manual input from the operator to change. Following this process, there may be a few seconds of network downtime for -running VMs when Neutron is reconfigured. +running VMs when Neutron is reconfigured when using ML2/OVS. There will be API downtime for all services. The main reason for the outage is that RabbitMQ must be completely stopped to change the secrets it uses. The @@ -45,7 +45,9 @@ process easier. `__. This was previously mitigated with a change to the StackHPC fork of - Kolla-Ansible, which has since been reverted due to an unforeseen issue. + Kolla-Ansible, which has since been reverted due to an unforeseen issue. See + `here ` for more + details. #. 
A change to Nova, to automate :ref:`this` step to change the nova cell0 database connection string. @@ -71,30 +73,41 @@ Full method 1. Run a Tempest ``refstack`` & check Kibana/OpenSearch Dashboards to check the state of the cloud before any changes are made +2. Edit your Kolla-Ansible checkout to include changes not yet included + upstream. + .. _kolla-change: -2. Edit your Kolla-Ansible checkout to include this line within the - ``kolla_docker`` dict in ``ansible/roles/nova/tasks/bootstrap_service.yml`` See - `here - `__ - for an example. (If you are using the latest ``stackhpc/yoga`` branch of - Kolla-Ansible this should already be set) + 1. Add this line within the ``kolla_docker`` dict in + ``ansible/roles/nova/tasks/bootstrap_service.yml`` See `here + `__ + for an example. - .. code:: + .. code:: + + command: bash -c 'sudo -E kolla_set_configs && nova-manage api_db sync && nova-manage db sync --local_cell' - command: bash -c 'sudo -E kolla_set_configs && nova-manage api_db sync && nova-manage db sync --local_cell' + This change will break new deployments and should be reverted once this + process is complete + +.. _k-a-change: - This change will break new deployments and should be reverted once this - process is complete + 2. Cherry-pick `this patch + `__ -3. Re-install Kolla-Ansible from source in your Kolla-Ansible Python - environment + .. code:: bash -4. Navigate to the directory containing your ``passwords.yml`` file + git fetch https://review.opendev.org/openstack/kolla-ansible refs/changes/78/903178/2 && git cherry-pick FETCH_HEAD + + 3. Re-install Kolla-Ansible from source in your Kolla-Ansible Python + environment + + +3. Navigate to the directory containing your ``passwords.yml`` file (``kayobe-config/etc/kolla/passwords.yml`` OR ``kayobe-config/etc/kayobe/environments/envname/kolla/passwords.yml``) -5. Create a file called ``deletelist.txt`` and populate it with this content +4. 
Create a file called ``deletelist.txt`` and populate it with this content (including all whitespace): .. code:: @@ -124,31 +137,42 @@ Full method ^haproxy_password -6. Decrypt your ``passwords.yml`` file with ``ansible-vault`` +5. Decrypt your ``passwords.yml`` file with ``ansible-vault`` -7. Delete all the passwords in the deletion list +6. Delete all the passwords in the deletion list .. code:: bash grep -vf deletelist.txt passwords.yml > new-passwords.yml -8. Check the new file for basic formatting errors. If it looks correct, +7. Check the new file for basic formatting errors. If it looks correct, replace the existing ``passwords.yml`` file with ``new-passwords.yml`` .. code:: bash rm passwords.yml && mv new-passwords.yml passwords.yml -9. Use the ``rekey-hosts.yml`` playbook to rotate your SSH keys for hosts +8. Use the ``rekey-hosts.yml`` playbook to rotate your SSH keys for hosts across the cloud. The playbook should exist under ``kayobe-config/etc/kayobe/ansible/`` if not, merge the latest ``stackhpc-kayobe-config`` - .. code:: bash + 1. Run the playbook to generate a new keypair and add it to the authorised + keys of your hosts. - kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/rekey-hosts.yml + .. code:: bash -10. Update the Pulp password + kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/rekey-hosts.yml + + 2. Ensure you can SSH to other nodes using the new keypair + + 3. Re-run the playbook with arguments to remove the old keypair. + + .. code:: bash + + kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/rekey-hosts.yml -t remove-key -e rekey_remove_existing_key=true + +9. Update the Pulp password 1. Generate a new Pulp password @@ -164,11 +188,10 @@ Full method kayobe seed service deploy -t seed-deploy-containers -kt none - (note you may need to skip docker registry login since the password will - now be ‘incorrect’ e.g. 
``-e`` - ``deploy_containers_registry_attempt_login``) + (note you will need to skip Docker registry login since the password will + now be ‘incorrect’ e.g. ``-e deploy_containers_registry_attempt_login=false``) -11. Rotate ``horizon_secret_key`` +10. Rotate ``horizon_secret_key`` 1. Generate a new secret: @@ -189,7 +212,7 @@ Full method deleted & redeployed at a later date once all users have closed & reopened their sessions. -12. Update ``grafana_admin_password`` +11. Update ``grafana_admin_password`` 1. Generate a new Grafana Admin password @@ -197,21 +220,21 @@ Full method pwgen -s 40 1 - 2. Exec into the Grafana container on a controller + 2. Update the value of ``grafana_admin_password`` in ``passwords.yml`` + + 3. Exec into the Grafana container on a controller .. code:: bash sudo docker exec -it grafana bash - 3. Run the password reset command, then enter the new password + 4. Run the password reset command, then enter the new password .. code:: bash grafana-cli admin reset-admin-password --password-from-stdin - 4. Update the value of ``grafana_admin_password`` in ``passwords.yml`` - -13. Update the MariaDB database password +12. Update the MariaDB database password 1. Generate a new secret: @@ -219,52 +242,51 @@ Full method pwgen -s 40 1 - 2. Exec into the MariaDB container on a controller + 2. Update ``database_password`` in ``passwords.yml`` with your new + password. Make a note of the old password. + + 3. Exec into the MariaDB container on a controller .. code:: bash sudo docker exec -it mariadb bash - 3. Log in to the database. You will be prompted for the password. Use the - existing value of ``database_password`` + 4. Log in to the database. You will be prompted for the password. Use the + old value of ``database_password`` .. code:: bash mysql -uroot -p - 4. Check the current state of the ``root`` user + 5. Check the current state of the ``root`` user .. code:: bash SELECT Host,User,Password FROM mysql.user WHERE User='root'; - 5. 
Update the password for the ``root`` user + 6. Update the password for the ``root`` user .. code:: bash SET PASSWORD FOR 'root'@'%' = PASSWORD('newpassword'); - 6. Check that the password hash has changed in the user list + 7. Check that the password hash has changed in the user list .. code:: bash SELECT Host,User,Password FROM mysql.user WHERE User='root'; - 7. If there are any remaining root users with the old password e.g. + 8. If there are any remaining root users with the old password e.g. ``root@localhost``, change the password for them too - 8. Update ``database_password`` in ``passwords.yml`` with your new - password - - .. _nova-change: -14. Update the Nova Database password +13. Update the Nova Database password + .. warning:: From this point onward, service may be disrupted - #. Create a new ``nova_database_password`` and store it in ``passwords.yml`` @@ -296,51 +318,15 @@ Full method ``00000000-0000-0000-0000-000000000000``, change the above command accordingly) +14. Re-encrypt your ``passwords.yml`` file -15. Re-encrypt your ``passwords.yml`` file - - -.. _k-a-change: - -16. Delete the service users in Keystone. The exact users will depend on the - deployment. Multinode example: - - .. note:: - - Alternatively, cherry-pick - `this patch `__ - - - .. code:: bash - - openstack user delete glance cinder placement nova neutron heat magnum magnum_trustee_domain_admin barbican designate - -17. Stop services using RabbitMQ +15. Stop all OpenStack services .. code:: bash kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/stop-openstack-services.yml -18. Nuke RabbitMQ - - .. code:: bash - - kayobe overcloud host command run -l controllers --become --command "docker stop rabbitmq && docker rm rabbitmq && docker volume rm rabbitmq" - -19. Reconfigure Overcloud services to apply changes - - - .. warning:: - - VMs should continue running, but connections to them will briefly be - disrupted when Neutron is redeployed - - .. 
code:: bash - - kayobe overcloud service deploy - - -20. Flush the Memcached data on all controllers (any old data will now be +16. Flush the Memcached data on all controllers (any old data will now be inaccessible) #. Install Telnet (on one of the controllers) @@ -367,23 +353,40 @@ Full method flush_all quit -21. Manually update ``heat_domain_admin_password`` +17. Nuke RabbitMQ + + .. code:: bash + + kayobe overcloud host command run -l controllers --become --command "docker stop rabbitmq && docker rm rabbitmq && docker volume rm rabbitmq" + +18. Reconfigure Overcloud services to apply changes + + .. warning:: + + VMs should continue running, but connections to them will briefly be + disrupted when Neutron is redeployed when using ML2/OVS + + .. code:: bash + + kayobe overcloud service deploy + +19. Manually update ``heat_domain_admin_password`` #. TODO: Instructions This has not been tested yet -22. Re-run Tempest to make sure everything has come back +20. Re-run Tempest to make sure everything has come back -23. Inform other users of the steps they’ll need to take now that the secrets +21. Inform other users of the steps they’ll need to take now that the secrets have been rotated: 1. SSH keys have been rotated, so the new key will have to be distributed if individual user accounts are used 2. Any existing ``openrc`` files generated by Kolla Ansible will need to be - re-generated or edited to use the new Kolla admin password + re-generated or edited to use the new Keystone admin password -24. Create a PR to merge the new secrets into your main Kayobe configuration +22. Create a PR to merge the new secrets into your main Kayobe configuration branch .. warning:: @@ -391,7 +394,7 @@ Full method Unless you **really** enjoyed this process, RE-ENCRYPT ``passwords.yml`` BEFORE COMMITTING -25. Approximately 1 week after deploying, remove the old horizon secret key +23. 
Approximately 1 week after deploying, remove the old horizon secret key from ``passwords.yml`` and reconfigure horizon @@ -505,6 +508,7 @@ Full password list docker_registry_password secrets_pulp_password redis_master_password + haproxy_password keystone_ssh_key private_key public_key diff --git a/tox.ini b/tox.ini index c6b949efe..e7f0d1d09 100644 --- a/tox.ini +++ b/tox.ini @@ -14,7 +14,7 @@ commands = yamllint etc/kayobe reno lint # secret-rotation must be skipped because it includes purposeful whitespace - doc8 README.rst doc/source --ignore D001 --ignore-path doc/source/operations/secret-rotation.rst + doc8 README.rst doc/source --ignore D001 --ignore-path-errors doc/source/operations/secret-rotation.rst;D002 # StackHPC Kayobe configuration release notes: [testenv:releasenotes] allowlist_externals = rm