diff --git a/doc/source/operations/index.rst b/doc/source/operations/index.rst
index 1327b8c2e..284795d6a 100644
--- a/doc/source/operations/index.rst
+++ b/doc/source/operations/index.rst
@@ -11,3 +11,4 @@ This guide is for operators of the StackHPC Kayobe configuration project.
    octavia
    hotfix-playbook
    rocky-linux-9
+   secret-rotation
diff --git a/doc/source/operations/secret-rotation.rst b/doc/source/operations/secret-rotation.rst
new file mode 100644
index 000000000..7912530fb
--- /dev/null
+++ b/doc/source/operations/secret-rotation.rst
@@ -0,0 +1,527 @@
===============
Secret Rotation
===============

General notes
=============

This guide covers secret rotation in Kayobe and Kolla-Ansible for most services
in a standard deployment. It does not cover every secret. A full list of
passwords that have been successfully rotated is available at the bottom of
this page (:ref:`link`).

Many of the secrets can simply be deleted from your ``passwords.yml``; they
will then be regenerated automatically by ``kayobe overcloud service deploy``.

Some secrets require manual input from the operator to change.

If you use ML2/OVS, running VMs may lose network connectivity for a few
seconds while Neutron is reconfigured.

There will be API downtime for all services. The main reason for the outage is
that RabbitMQ must be completely stopped to change the secrets it uses, and all
services must then be reconfigured to use the new RabbitMQ cluster. Each service
comes back once it has been reconfigured, so the outage time for each service
is the time between starting ``kayobe overcloud service deploy`` and that
service being reconfigured.

Some secrets currently have to be regenerated by hand. Make sure you use a
reliable tool and match the format (length, character set, etc.) of the
existing secret. ``pwgen`` is recommended and is used as an example throughout
this guide. To install it:

.. 
code:: bash

    # Ubuntu
    sudo apt install pwgen

    # Rocky Linux (pwgen is provided by EPEL)
    sudo dnf install pwgen


At the time of writing, there are three upstream patches in progress that will
make this process easier.

#. A change to Kolla, to automate :ref:`this` step to change the
   extended start for the ``nova-api`` container.

   The upstream patch can be found `here
   `__.

   This was previously mitigated with a change to the StackHPC fork of
   Kolla-Ansible, which has since been reverted due to an unforeseen issue. See
   `here ` for more
   details.

#. A change to Nova, to automate :ref:`this` step to change the
   Nova cell0 database connection string.

   The upstream patch can be found `here
   `__.

#. A change to Kolla-Ansible, to automate :ref:`this` step to
   update the Keystone passwords of the service users.

   The upstream patch can be found `here
   `__.


Full method
===========

.. warning::

   You **must** back up your ``passwords.yml`` before making changes. You will
   need to refer back to it later.

1. Run the Tempest ``refstack`` tests and check Kibana/OpenSearch Dashboards
   to record the state of the cloud before any changes are made.

2. Edit your Kolla-Ansible checkout to include changes not yet included
   upstream.

   .. _kolla-change:

   1. Add this line within the ``kolla_docker`` dict in
      ``ansible/roles/nova/tasks/bootstrap_service.yml``. See `here
      `__
      for an example.

      .. code:: yaml

         command: bash -c 'sudo -E kolla_set_configs && nova-manage api_db sync && nova-manage db sync --local_cell'

      This change will break new deployments and should be reverted once this
      process is complete.

   .. _k-a-change:

   2. Cherry-pick `this patch
      `__:

      .. code:: bash

         git fetch https://review.opendev.org/openstack/kolla-ansible refs/changes/78/903178/2 && git cherry-pick FETCH_HEAD

   3. Re-install Kolla-Ansible from source in your Kolla-Ansible Python
      environment.


3. 
Navigate to the directory containing your ``passwords.yml`` file + (``kayobe-config/etc/kolla/passwords.yml`` OR + ``kayobe-config/etc/kayobe/environments/envname/kolla/passwords.yml``) + +4. Create a file called ``deletelist.txt`` and populate it with this content + (including all whitespace): + + .. code:: + + _keystone_password + _database_password + ^keystone_admin_password + ^memcache_secret_key + ^designate_rndc_key + ^docker_registry_password + ^keepalived_password + ^kibana_password + ^libvirt_sasl_password + ^metadata_secret + ^opensearch_dashboards_password + ^osprofiler_secret + ^prometheus_alertmanager_password + ^qdrouterd_password + ^redis_master_password + ^memcache_secret_key + _ssh_key + + private_key + public_key + ^$ + rabbitmq + ^haproxy_password + + +5. Decrypt your ``passwords.yml`` file with ``ansible-vault`` + +6. Delete all the passwords in the deletion list + + .. code:: bash + + grep -vf deletelist.txt passwords.yml > new-passwords.yml + +7. Check the new file for basic formatting errors. If it looks correct, + replace the existing ``passwords.yml`` file with ``new-passwords.yml`` + + .. code:: bash + + rm passwords.yml && mv new-passwords.yml passwords.yml + +8. Use the ``rekey-hosts.yml`` playbook to rotate your SSH keys for hosts + across the cloud. The playbook should exist under + ``kayobe-config/etc/kayobe/ansible/`` if not, merge the latest + ``stackhpc-kayobe-config`` + + 1. Run the playbook to generate a new keypair and add it to the authorised + keys of your hosts. + + .. code:: bash + + kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/rekey-hosts.yml + + 2. Ensure you can SSH to other nodes using the new keypair + + 3. Re-run the playbook with arguments to remove the old keypair. + + .. code:: bash + + kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/rekey-hosts.yml -t remove-key -e rekey_remove_existing_key=true + +9. Update the Pulp password + + 1. Generate a new Pulp password + + .. code:: bash + + pwgen -s 25 1 + + 2. 
Update ``secrets_pulp_password`` (usually found in ``secrets.yml``).

   3. Deploy the changes. You will need to skip the Docker registry login,
      since the registry password will now be ‘incorrect’:

      .. code:: bash

         kayobe seed service deploy -t seed-deploy-containers -kt none -e deploy_containers_registry_attempt_login=false

10. Rotate ``horizon_secret_key``

    1. Generate a new secret:

       .. code:: bash

          pwgen -s 40 1

    2. Add it to the ``passwords.yml`` file, along with the old secret, in this
       exact format (including the quotes in the middle):

       .. code:: yaml

          horizon_secret_key: newsecret' 'oldsecret

       This allows both the old and new secrets to be used at the same
       time, so there is no interruption to service. The key is mainly used
       for generating login and password reset tokens. The old secret can be
       removed and the change redeployed at a later date, once all users have
       closed and reopened their sessions.

11. Update ``grafana_admin_password``

    1. Generate a new Grafana admin password:

       .. code:: bash

          pwgen -s 40 1

    2. Update the value of ``grafana_admin_password`` in ``passwords.yml``.

    3. Exec into the Grafana container on a controller:

       .. code:: bash

          sudo docker exec -it grafana bash

    4. Run the password reset command, then enter the new password:

       .. code:: bash

          grafana-cli admin reset-admin-password --password-from-stdin

12. Update the MariaDB database password

    1. Generate a new secret:

       .. code:: bash

          pwgen -s 40 1

    2. Update ``database_password`` in ``passwords.yml`` with your new
       password. Make a note of the old password.

    3. Exec into the MariaDB container on a controller:

       .. code:: bash

          sudo docker exec -it mariadb bash

    4. Log in to the database. When prompted for the password, use the old
       value of ``database_password``:

       .. code:: bash

          mysql -uroot -p

    5. Check the current state of the ``root`` user:

       .. 
code:: sql

          SELECT Host,User,Password FROM mysql.user WHERE User='root';

    6. Update the password for the ``root`` user, replacing ``newpassword``
       with the new value of ``database_password``:

       .. code:: sql

          SET PASSWORD FOR 'root'@'%' = PASSWORD('newpassword');

    7. Check that the password hash has changed in the user list:

       .. code:: sql

          SELECT Host,User,Password FROM mysql.user WHERE User='root';

    8. If any other ``root`` users still have the old password (for example
       ``root@localhost``), change their passwords too.

.. _nova-change:

13. Update the Nova database password

    .. warning::

       From this point onward, service may be disrupted.

    #. Create a new ``nova_database_password`` and store it in
       ``passwords.yml``:

       .. code:: bash

          pwgen -s 40 1

    #. Exec into the ``nova_conductor`` container:

       .. code:: bash

          sudo docker exec -it nova_conductor bash

    #. List the cells:

       .. code:: bash

          nova-manage cell_v2 list_cells --verbose

    #. Find the entry for ``cell0`` and copy its Database Connection value.
       Replace the password in the string with the new value, then update the
       cell with the following command:

       .. code:: bash

          nova-manage cell_v2 update_cell --cell_uuid 00000000-0000-0000-0000-000000000000 --database_connection "CONNECTION WITH NEW PASSWORD HERE" --transport-url "none:///"

       If the ``cell_uuid`` for cell0 is not
       ``00000000-0000-0000-0000-000000000000``, change the above command
       accordingly.

14. Re-encrypt your ``passwords.yml`` file with ``ansible-vault``.

15. Stop all OpenStack services:

    .. code:: bash

       kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/stop-openstack-services.yml

16. Flush the Memcached data on all controllers (any old data will now be
    inaccessible).

    #. Install Telnet on one of the controllers:

       .. code:: bash

          # Ubuntu
          sudo apt -y install telnet

          # Rocky Linux
          sudo dnf -y install telnet

    #. On every controller, check the config for the IP address and port used
       by Memcached:

       .. 
code:: bash

          sudo grep command /etc/kolla/memcached/config.json

       The IP address and port are printed after ``-l`` and ``-p``
       respectively.

    #. For each controller, start a Telnet session to its Memcached IP address
       and port, clear all data, then exit:

       .. code:: bash

          telnet <ip> <port>
          flush_all
          quit

17. Remove RabbitMQ completely (stop and delete the container and its data
    volume) on all controllers:

    .. code:: bash

       kayobe overcloud host command run -l controllers --become --command "docker stop rabbitmq && docker rm rabbitmq && docker volume rm rabbitmq"

18. Reconfigure the overcloud services to apply the changes:

    .. warning::

       VMs should continue running, but if you use ML2/OVS, connections to
       them will be briefly disrupted while Neutron is redeployed.

    .. code:: bash

       kayobe overcloud service deploy

19. Manually update ``heat_domain_admin_password``

    #. TODO: Add instructions. This step has not been tested yet.

20. Re-run Tempest to make sure all services have come back.

21. Inform other users of the steps they’ll need to take now that the secrets
    have been rotated:

    1. SSH keys have been rotated, so if individual user accounts are used,
       the new key will have to be distributed.

    2. Any existing ``openrc`` files generated by Kolla-Ansible will need to
       be regenerated or edited to use the new Keystone admin password.

22. Create a PR to merge the new secrets into your main Kayobe configuration
    branch.

    .. warning::

       Unless you **really** enjoyed this process, RE-ENCRYPT
       ``passwords.yml`` BEFORE COMMITTING

23. Approximately one week after deploying, remove the old secret from
    ``horizon_secret_key`` in ``passwords.yml`` and reconfigure Horizon.


.. 
_full-password-list: + +Full password list +------------------- + +:: + + aodh_database_password + aodh_keystone_password + blazar_database_password + blazar_keystone_password + caso_keystone_password + ceilometer_database_password + ceilometer_keystone_password + cinder_database_password + cinder_keystone_password + barbican_database_password + barbican_keystone_password + cloudkitty_database_password + cloudkitty_keystone_password + congress_database_password + congress_keystone_password + cyborg_database_password + cyborg_keystone_password + designate_database_password + designate_keystone_password + freezer_database_password + freezer_keystone_password + glance_database_password + glance_keystone_password + gnocchi_database_password + gnocchi_keystone_password + heat_database_password + heat_keystone_password + horizon_database_password + ironic_database_password + ironic_inspector_database_password + ironic_inspector_keystone_password + ironic_keystone_password + karbor_database_password + karbor_keystone_password + keystone_database_password + magnum_database_password + manila_database_password + mariadb_backup_database_password + masakari_database_password + mistral_database_password + monasca_database_password + murano_database_password + neutron_database_password + nova_api_database_password + nova_database_password + octavia_database_password + panko_database_password + placement_database_password + prometheus_mysql_exporter_database_password + qinling_database_password + rally_database_password + sahara_database_password + senlin_database_password + solum_database_password + tacker_database_password + trove_database_password + vitrage_database_password + watcher_database_password + zun_database_password + keystone_admin_password + kuryr_keystone_password + magnum_keystone_password + manila_keystone_password + masakari_keystone_password + mistral_keystone_password + monasca_keystone_password + murano_keystone_password + neutron_keystone_password + 
nova_keystone_password
    octavia_keystone_password
    panko_keystone_password
    rabbitmq_cluster_cookie
    rabbitmq_monitoring_password
    rabbitmq_password
    database_password
    heat_domain_admin_password
    horizon_secret_key
    placement_keystone_password
    qinling_keystone_password
    sahara_keystone_password
    searchlight_keystone_password
    senlin_keystone_password
    solum_keystone_password
    swift_keystone_password
    tacker_keystone_password
    trove_keystone_password
    vitrage_keystone_password
    watcher_keystone_password
    zun_keystone_password
    ceph_rgw_keystone_password
    designate_rndc_key
    keepalived_password
    kibana_password
    libvirt_sasl_password
    metadata_secret
    opensearch_dashboards_password
    osprofiler_secret
    prometheus_alertmanager_password
    qdrouterd_password
    grafana_admin_password
    docker_registry_password
    secrets_pulp_password
    redis_master_password
    haproxy_password
    keystone_ssh_key
      private_key
      public_key
    neutron_ssh_key
      private_key
      public_key
    nova_ssh_key
      private_key
      public_key
    octavia_amp_ssh_key
      private_key
      public_key
    bifrost_ssh_key
      private_key
      public_key

diff --git a/tox.ini b/tox.ini
index f79ac9701..e7f0d1d09 100644
--- a/tox.ini
+++ b/tox.ini
@@ -13,8 +13,8 @@ deps =
 commands =
     yamllint etc/kayobe
     reno lint
-    doc8 README.rst doc/source --ignore D001
-
+    # secret-rotation must be skipped because it includes purposeful whitespace
+    doc8 README.rst doc/source --ignore D001 --ignore-path-errors doc/source/operations/secret-rotation.rst;D002
 # StackHPC Kayobe configuration release notes:
 [testenv:releasenotes]
 allowlist_externals = rm