You already draw the diagram — components, dependencies, what "healthy" means. mgtt reads it as code. One YAML model drives two modes:
- At design time, `mgtt simulate` injects synthetic failures and asserts the engine reaches the right conclusion. Runs in CI on every PR. Catches architectural drift before the diagram lies to you.
- At 3am, `mgtt diagnose` runs the same engine against the live system. Real probes replace the synthetic facts. It names the broken component, eliminates the healthy ones, and hands you the chain.
Same model. Same reasoning. Two fixture sources.
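
To make "two fixture sources" concrete, here is a minimal Python sketch of the idea, not mgtt's actual code. The names `FactSource`, `synthetic_source`, `live_source`, and `unhealthy_components` are hypothetical. The point is only that one reasoning function can consume facts from either a declared scenario or real probes.

```python
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical shape: a fact source answers "is this component healthy?".
FactSource = Callable[[str], bool]


def synthetic_source(broken: Set[str]) -> FactSource:
    """Design-time fixture: a scenario declares which components are unhealthy."""
    return lambda component: component not in broken


def live_source(probes: Dict[str, Callable[[], bool]]) -> FactSource:
    """Run-time fixture: each component maps to a callable that probes it."""
    return lambda component: probes[component]()


def unhealthy_components(components: Iterable[str], is_healthy: FactSource) -> List[str]:
    """Shared reasoning step: identical whichever source supplies the facts."""
    return [c for c in components if not is_healthy(c)]


components = ["nginx", "api", "rds", "frontend"]

# Simulated run: facts come from a scenario definition.
simulated = unhealthy_components(components, synthetic_source({"nginx", "api"}))

# "Live" run: facts come from probe callables (stubbed here for illustration).
stub_probes = {c: (lambda c=c: c != "api") for c in components}
observed = unhealthy_components(components, live_source(stub_probes))
```

In a real setup the stubbed probe callables would be replaced by whatever the providers expose; the reasoning function itself would not change.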
$ mgtt diagnose --suspect api
▶ probe nginx upstream_count ✗ unhealthy
▶ probe api ready_replicas ✗ unhealthy
▶ probe rds available ✓ healthy ← eliminated
▶ probe frontend ready_replicas ✓ healthy ← eliminated
Root cause: api.degraded
Chain: nginx ← api
Probes run: 4
The engine picks probes by information value, so every call rules out a branch. You didn't need to know the system — the model knew it for you. Partial visibility (RBAC refusals, transient throttles) surfaces as a flag, not an abort.
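
One way to picture "picking probes by information value" is a greedy search that always asks the question expected to split the remaining suspects most evenly. The sketch below is an illustration under stated assumptions, not mgtt's engine: it assumes a single broken component, equally likely suspects, and a hypothetical dependency map `DEPENDS_ON`. It is also not meant to reproduce the transcript above.

```python
from math import log2

# Hypothetical dependency map: component -> components it depends on.
DEPENDS_ON = {
    "nginx": ["frontend", "api"],
    "frontend": [],
    "api": ["rds"],
    "rds": [],
}


def affected_by(root):
    """Components that would look unhealthy if `root` were the only broken one."""
    hit = {root}
    changed = True
    while changed:
        changed = False
        for comp, deps in DEPENDS_ON.items():
            if comp not in hit and any(d in hit for d in deps):
                hit.add(comp)
                changed = True
    return hit


def entropy_gain(probe, candidates):
    """Expected bits learned from `probe`, with every candidate equally likely."""
    unhealthy = sum(1 for c in candidates if probe in affected_by(c))
    p = unhealthy / len(candidates)
    if p in (0.0, 1.0):
        return 0.0  # the answer is already certain, so the probe teaches nothing
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def diagnose(observe):
    """Probe greedily until one candidate remains or no probe is informative."""
    candidates = set(DEPENDS_ON)
    unprobed = set(DEPENDS_ON)
    order = []
    while len(candidates) > 1 and unprobed:
        best = max(sorted(unprobed), key=lambda p: entropy_gain(p, candidates))
        if entropy_gain(best, candidates) == 0.0:
            break
        unprobed.discard(best)
        order.append(best)
        is_unhealthy = observe(best)
        # Keep only the root causes consistent with what the probe reported.
        candidates = {
            c for c in candidates if (best in affected_by(c)) == is_unhealthy
        }
    return candidates, order


# Simulate a system where `api` is the broken component.
root_causes, probe_order = diagnose(lambda comp: comp in affected_by("api"))
```

Because `nginx` would look unhealthy under every candidate, probing it first carries no information in this toy model, so the greedy loop skips it and goes straight to the probes that halve the suspect set.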
curl -sSL https://raw.githubusercontent.com/mgt-tool/mgtt/main/install.sh | sh

Or: go install github.com/mgt-tool/mgtt/cmd/mgtt@latest

mgtt init # scaffold system.model.yaml
mgtt model validate # check the model
mgtt simulate --all # run scenarios (in CI)
mgtt diagnose --suspect api # troubleshoot a live system

- Quick Start: end-to-end in five minutes
- Blue/green storefront worked example: 20-component real system, five scenarios, lessons from real use
- How It Works: the constraint engine
- Docs site: reference, providers, specs
TLA+ checks your design; mgtt checks your running system.
Dual-licensed: engine + CLI under AGPL-3.0; provider SDK under Apache-2.0.
