Home

Mitch Kelley edited this page May 8, 2019 · 2 revisions

Decisions

Pod dependencies

Supergloo
Prometheus
~~Prometheus Alert Manager~~

Experiment consists of

[note] Design for cron job (for now) - no fancy timing/lifecycle
Set of faults
Stop condition
Timeout

CRDs

Experiment

TODO

Watch prometheus for stop condition

Reconciler:
- Terminate experiment when fail trigger condition met
- Terminate experiment when timeout expires
- Create Faults from experiment

Init: sort experiments by state

If running, do the reconcile loop:
- create routing rules
- write the start time to the experiment
- start the timeout tick
running -> monitor (nothing)
completed -> clean up resources

Watch for failure conditions:

sets completed state
running -> failed
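
A minimal sketch of the failure watch, assuming the stop condition is a numeric threshold on a single query result and that breaching it moves a running experiment to failed. `QueryFunc` is a hypothetical stand-in for whatever Prometheus client call ends up being used; no real client API is implied.

```go
package main

// State is a hypothetical experiment lifecycle state.
type State string

const (
	StateRunning   State = "running"
	StateCompleted State = "completed"
	StateFailed    State = "failed"
)

// QueryFunc is a hypothetical abstraction over a metrics query.
// It returns the current value of the given query expression.
type QueryFunc func(query string) (float64, error)

// CheckFailure returns the next state for an experiment.
// Only running experiments are checked; a value above the
// threshold transitions running -> failed.
func CheckFailure(state State, query string, threshold float64, q QueryFunc) (State, error) {
	if state != StateRunning {
		return state, nil
	}
	value, err := q(query)
	if err != nil {
		// Leave the state unchanged so the next pass can retry.
		return state, err
	}
	if value > threshold {
		return StateFailed, nil
	}
	return state, nil
}
```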

Load tester

various modes:
- database cache overloaded
- DDoS
- regular traffic

CLI features:

watch flag for getting status of running experiment
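
One possible shape for the watch flag, using Go's standard `flag` package. The `status` subcommand name, the `StatusOptions` type, and the single positional experiment name are assumptions for illustration only.

```go
package main

import (
	"errors"
	"flag"
)

// StatusOptions holds the parsed arguments of a hypothetical
// "status" subcommand.
type StatusOptions struct {
	Watch      bool   // keep reporting status until the experiment ends
	Experiment string // name of the experiment to inspect
}

// ParseStatusArgs parses arguments such as: -watch my-experiment
// With the standard flag package, flags must precede the name.
func ParseStatusArgs(args []string) (StatusOptions, error) {
	fs := flag.NewFlagSet("status", flag.ContinueOnError)
	watch := fs.Bool("watch", false, "keep printing status until the experiment ends")
	if err := fs.Parse(args); err != nil {
		return StatusOptions{}, err
	}
	if fs.NArg() != 1 {
		return StatusOptions{}, errors.New("expected exactly one experiment name")
	}
	return StatusOptions{Watch: *watch, Experiment: fs.Arg(0)}, nil
}
```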

Prometheus in Helm

Future consideration: exclude stale resources from the reconciler (to avoid having to reconcile 10 million dead resources)
