As discussed during the last PTG, these are the specs for the tripleo-ha-utils project.

Change-Id: I2e51bfe2f6d76d2ad674e23c5e05313eb47ecef0
Raoul Scarazzini committed Mar 5, 2018
1 parent d0537d9, commit a021956
Showing 1 changed file with 143 additions and 0 deletions.
@@ -0,0 +1,143 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.
 http://creativecommons.org/licenses/by/3.0/legalcode

=============================================
TripleO tools for testing HA deployments
=============================================

We need a way to verify a Highly Available TripleO deployment with proper tests
that check if the HA bits are behaving correctly.

Problem Description
===================

Currently, we test the HA behavior of TripleO deployments only by deploying
environments with three controllers and checking whether we can spawn an
instance, but this is not enough.

There should be a way to verify the HA capabilities of a deployment, and to
check that the environment still behaves correctly after induced failures,
simulated outages and so on.

This tool should be a standalone component that users include only when
needed, without breaking any of the dynamics present in TripleO.

Proposed Change
===============

Overview
--------

The proposal is to create an Ansible-based project named tripleo-ha-utils that
can be consumed by the various tools we use to deploy TripleO environments,
such as tripleo-quickstart or infrared, as well as by manual deployments.

The project will initially cover three principal roles:

* **stonith-config**: a playbook used to automate the creation of fencing
  devices in the overcloud;
* **instance-ha**: a playbook that automates the seventeen manual steps needed
  to configure instance HA in the overcloud, test them via rally and verify
  that instance HA works appropriately;
* **validate-ha**: a playbook that runs a series of disruptive actions in the
  overcloud and verifies that it always behaves correctly by deploying a
  heat-template that involves all the overcloud components.

Today the project exists outside the TripleO umbrella, and it is named
tripleo-quickstart-utils [1] (see "Alternatives" for the historical reasons
behind this name). It is used internally inside promotion pipelines, and has
also been tested successfully in RDOCloud.

Pluggable implementation
~~~~~~~~~~~~~~~~~~~~~~~~

The base principle of the project is to give people the ability to integrate
the initial roles with whatever kind of test they prefer. For example, today
we're using a simple bash framework to interact with the cluster (pcs commands
and other interactions), rally to test instance-ha and Ansible itself to
simulate full power outage scenarios.
The idea is to keep this pluggable approach, leaving the final user the choice
of what to use.
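
One way to picture the pluggable approach is a small registry that maps a
backend name to a callable returning a pass/fail verdict. The Python sketch
below is purely illustrative and is not part of the project: ``BACKENDS``,
``register``, ``run_test`` and the placeholder backend bodies are hypothetical
stand-ins for the bash framework and rally integrations mentioned above.

```python
from typing import Callable, Dict

# Hypothetical registry: backend name -> callable returning True (pass) or False (fail).
BACKENDS: Dict[str, Callable[[], bool]] = {}


def register(name: str) -> Callable[[Callable[[], bool]], Callable[[], bool]]:
    """Decorator that plugs a test backend into the registry under `name`."""
    def decorator(func: Callable[[], bool]) -> Callable[[], bool]:
        BACKENDS[name] = func
        return func
    return decorator


@register("bash")
def bash_cluster_checks() -> bool:
    # Placeholder: a real backend would drive pcs commands on the cluster.
    return True


@register("rally")
def rally_instance_ha() -> bool:
    # Placeholder: a real backend would launch a rally scenario.
    return True


def run_test(name: str) -> str:
    """Run the chosen backend and reduce its result to a YES/NO verdict."""
    if name not in BACKENDS:
        raise KeyError(f"unknown test backend: {name}")
    return "YES" if BACKENDS[name]() else "NO"


if __name__ == "__main__":
    for backend in sorted(BACKENDS):
        print(backend, run_test(backend))
```

Keeping each backend behind the same callable signature means the user can
swap or add test tools without touching the code that collects the verdicts.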

Retro compatibility
~~~~~~~~~~~~~~~~~~~

One of the aims of this project is to be retro-compatible with previous
releases of OpenStack. Starting from Liberty, the instance-ha and
stonith-config Ansible playbooks cover all the releases.
The same applies to HA testing, since the tests are plugged in depending
on the release.

Alternatives
------------

While evaluating alternatives, the first thing to consider is that this
project aims to be a TripleO-centric set of tools for HA, not a generic
OpenStack one.
We want tools that help the user answer questions like "Is the Galera bundle
cluster resource able to tolerate a stop and a subsequent start without
affecting the environment capabilities?" or "Is the environment able to
evacuate instances after being configured for Instance HA?". And the answer we
want is YES or NO.

* *tripleo-validations*: the most logical place to put this, at least
  looking at the name, would be tripleo-validations. After talking with the
  folks working on it, it turned out that the tripleo-validations project is
  not meant to run disruptive tests, so integrating these tools would be out
  of scope.

* *tripleo-quickstart-extras*: this is not something meant just for
  quickstart (the project supports infrared and "plain" environments as
  well), even though we initially started there. In the end, nobody was
  looking at the patches since nobody was able to verify them, and the result
  was a series of reviews stuck forever. Moving back to extras would
  therefore be a step backward.

Other End User Impact
---------------------

None. There is no impact for anyone unless the solution gets loaded inside an
existing project. Since this will be an external project, it will not affect
any existing component.

Performance Impact
------------------

None. Unless deployments or CI runs explicitly include the roles, there will
be no impact, so performance will not change.

Implementation
==============

Primary assignees:

* rscarazz

Work Items
----------

* Import the tripleo-quickstart-utils [1] as a new repository and start new
  deployments from there.

Testing
=======

Due to the disruptive nature of these tests, the TripleO CI should not be
updated to include them, mostly because of timing issues.
This project should remain optionally usable by people when needed, or in
specific CI environments meant to support longer than usual jobs.

Documentation Impact
====================

All the implemented roles are today fully documented in the
tripleo-quickstart-utils [1] project, so importing its repository as is will
also bring in its full documentation.

References
==========

[1] Original project to import as new
https://github.com/redhat-openstack/tripleo-quickstart-utils