SwiftDeploy is a declarative deployment lifecycle tool for containerized services. It uses a single manifest.yaml as the source of truth, generates runtime configuration from templates, manages Docker Compose lifecycle operations, enforces environment policy through Open Policy Agent, and supports stable/canary promotion with observable, auditable control flow.
The project demonstrates how a DevOps tool can turn a simple deployment manifest into a working stack made up of an application service, Nginx reverse proxy, OPA policy sidecar, health checks, Prometheus metrics, structured logs, and controlled rollout workflows - where no deployment or promotion can proceed unless policy explicitly permits it.
## Contents

- Overview
- Project Requirements Covered
- Architecture
- Repository Structure
- Manifest Design
- Application Service
- CLI Subcommands
- Generated Configuration
- Policy Engine
- Observability
- Audit Trail
- Security and Runtime Hardening
- Nginx Behaviour
- Docker and Docker Compose Behaviour
- Prerequisites
- Local Setup
- Usage Walkthrough
- Chaos Testing
- Validation Checks
- Evidence and Screenshots
- Troubleshooting
- Design Decisions
- Cleanup
## Overview

Most deployment tasks require manually writing Docker Compose files, Nginx configuration, and deployment scripts. SwiftDeploy reverses that workflow.
Instead of hand-writing runtime configuration, the user edits only `manifest.yaml`. The `swiftdeploy` CLI then derives everything else:

- `docker-compose.yml`
- `nginx.conf`
Those generated files are intentionally treated as disposable artifacts. They can be deleted and recreated at any time by running:
```powershell
python .\swiftdeploy init
```

This ensures the manifest remains the single source of truth.
Beyond generation, SwiftDeploy acts as a policy-gated control plane. Before any deployment or promotion executes, the CLI queries Open Policy Agent and will refuse to proceed if the environment does not meet the defined safety standards. Every decision, every mode change, and every policy violation is recorded in an append-only audit trail.
## Project Requirements Covered

SwiftDeploy satisfies the Stage 4A and Stage 4B task requirements as follows:
| Requirement | Implementation |
|---|---|
| Declarative manifest | manifest.yaml defines service, Nginx, OPA, network, policy limits, and audit settings |
| Generated config files | docker-compose.yml and nginx.conf are generated from Jinja-style templates |
| CLI tool | swiftdeploy provides init, validate, deploy, promote, teardown, status, and audit subcommands |
| Stable/canary mode | The same app image runs with MODE=stable or MODE=canary |
| Canary header | Canary mode adds X-Mode: canary to every response |
| Chaos endpoint | /chaos supports slow, error, and recover modes in canary mode only |
| Health checks | /healthz returns liveness information and uptime in seconds |
| Prometheus metrics | /metrics exposes request counters, latency histograms, uptime, mode, and chaos state |
| Nginx reverse proxy | Nginx is the only public entry point on the configured port |
| No direct app exposure | App service uses expose, not host ports |
| Nginx error JSON | 502, 503, and 504 return structured JSON errors |
| Structured access logs | Logs use the required pipe-delimited format starting with `$time_iso8601` (see Nginx Behaviour) |
| OPA policy sidecar | OPA runs as an isolated container, unreachable via the Nginx port |
| Infrastructure policy | Blocks deployment if disk free is below threshold or CPU load exceeds limit |
| Canary safety policy | Blocks promotion if error rate or P99 latency exceeds configured limits |
| Pre-deploy gate | CLI queries OPA before starting the stack — hard block on failure |
| Pre-promote gate | CLI scrapes /metrics, calculates error rate and P99 latency, queries OPA before promotion |
| Policy reasoning | Every OPA decision carries explicit reasoning surfaced to the operator |
| OPA failure handling | Each distinct OPA failure mode produces a different human-readable outcome |
| Live status dashboard | swiftdeploy status shows real-time req/s, error rate, P99 latency, and policy compliance |
| Audit trail | Every lifecycle event appended to history.jsonl |
| Audit report | swiftdeploy audit generates audit_report.md with timeline and violations table |
| Docker Compose lifecycle | Stack is started, restarted, promoted, and removed by the CLI |
| Non-root containers | App runs as 10001:10001, Nginx as 101:101, OPA with dropped capabilities |
| Capability dropping | All containers use cap_drop: ALL and no-new-privileges:true |
| Image size requirement | App image is built from python:3.12-slim and verified under 300MB |
| Manifest regeneration | teardown --clean removes generated configs; init regenerates them exactly |
## Architecture

SwiftDeploy follows this flow:

```text
manifest.yaml
      |
      v
swiftdeploy CLI
      |
      |---> OPA policy check (pre-deploy / pre-promote)
      |            |
      |            v
      |     policies/*.rego + policy_limits from manifest
      |
      v
  templates/
      |----------------------|
      v                      v
docker-compose.yml       nginx.conf
      |                      |
      |                      v
      |            Nginx reverse proxy (public: port 8080)
      |                      |
      v                      v
Docker Compose network -> App service (internal: port 3000)
                       -> OPA sidecar (internal: port 8181, host loopback only)
```

Runtime request flow:

```text
Client
  |
  v
Nginx on localhost:8080
  |
  v
App service on internal Docker network port 3000
```

Policy decision flow:

```text
swiftdeploy CLI
  |
  v
POST http://127.0.0.1:8181/v1/data/swiftdeploy/<domain>/decision
  |
  v
OPA evaluates Rego policy against input
  |
  v
{ "allow": true/false, "reasons": [...] }
  |
  v
CLI surfaces reasoning and proceeds or blocks
```
The app container is not exposed directly to the host. All traffic must pass through Nginx. OPA is bound only to the host loopback interface and is not reachable via the Nginx port.
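For illustration, here is a minimal Python sketch of that decision round trip using the `requests` library. The helper name and the example input fields are hypothetical, not the CLI's actual internals:

```python
import requests

OPA_BASE = "http://127.0.0.1:8181/v1/data/swiftdeploy"

def query_decision(domain: str, opa_input: dict, timeout: float = 5.0) -> dict:
    """POST an input document to one policy domain's decision rule."""
    resp = requests.post(f"{OPA_BASE}/{domain}/decision",
                         json={"input": opa_input}, timeout=timeout)
    resp.raise_for_status()
    # The OPA data API wraps the rule's output under "result".
    return resp.json().get("result", {})

# Hypothetical pre-deploy check; the field names are illustrative.
decision = query_decision("infrastructure", {
    "disk_free_gb": 42.0,
    "cpu_load": 0.8,
    "limits": {"min_disk_free_gb": 10, "max_cpu_load": 2.0},
})
if not decision.get("allow", False):
    for reason in decision.get("reasons", []):
        print(f"[POLICY][FAIL] {reason}")
```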
## Repository Structure

```text
swiftdeploy/
├── app/
│   ├── main.py                 # Flask API with /metrics, /healthz, /chaos
│   └── requirements.txt        # Pinned Python dependencies
├── policies/
│   ├── infrastructure.rego     # Pre-deploy: disk and CPU policy
│   └── canary.rego             # Pre-promote: error rate and latency policy
├── templates/
│   ├── docker-compose.yml.tpl  # Compose template including OPA service
│   └── nginx.conf.tpl          # Nginx template with headers and error handling
├── screenshots/
│   ├── 01_validate_all_pass.png
│   ├── 02_deploy_success.png
│   ├── 03_canary_and_headers.png
│   ├── 04_generated_configs.png
│   ├── 05_nginx_logs_clean.png
│   ├── 06_policy_hard_gate.png
│   ├── 07_status_chaos.png
│   ├── 08_promote_blocked.png
│   ├── 09_promote_stable_clean.png
│   └── 10_audit_report.png
├── audit/
│   └── .gitkeep                # Directory preserved for audit output
├── .gitignore
├── Dockerfile
├── README.md
├── manifest.yaml
└── swiftdeploy                 # CLI entry point
```
Generated files are deliberately excluded from Git:

- `docker-compose.yml`
- `nginx.conf`
- `history.jsonl`
- `audit_report.md`
They are produced by the CLI and should not be treated as manually maintained source files.
## Manifest Design

The manifest defines the complete deployment intent, including policy limits and audit settings.
```yaml
services:
  image: swift-deploy-1-node:latest
  port: 3000
  mode: stable
  version: "1.0.0"
  restart_policy: unless-stopped
nginx:
  image: nginx:latest
  port: 8080
  proxy_timeout: 10
  contact: o.odimayo@gbadedata.com
opa:
  image: openpolicyagent/opa:latest
  port: 8181
  policies_dir: policies
  decision_timeout_seconds: 5
network:
  name: swiftdeploy-net
  driver_type: bridge
logs:
  volume_name: swiftdeploy-logs
policy_limits:
  infrastructure:
    min_disk_free_gb: 10
    max_cpu_load: 2.0
  canary:
    max_error_rate: 0.01
    max_p99_latency_ms: 500
    evaluation_window_seconds: 30
audit:
  history_file: history.jsonl
  report_file: audit_report.md
```

The required base fields from Stage 4A are preserved unchanged:
```yaml
services:
  image: swift-deploy-1-node:latest
  port: 3000
nginx:
  image: nginx:latest
  port: 8080
network:
  name: swiftdeploy-net
  driver_type: bridge
```

The `policy_limits` section is the only place threshold values are defined. Rego policies read these values from `input.limits` at evaluation time — nothing is hardcoded inside the policy files themselves.
## Application Service

The application is a Python Flask service running inside the Docker image `swift-deploy-1-node:latest`. It exposes four endpoints.

### GET /

Returns a welcome response with deployment metadata:
```json
{
  "message": "Welcome to SwiftDeploy",
  "mode": "stable",
  "version": "1.0.0",
  "timestamp": "2026-05-04T03:09:05.482280+00:00"
}
```

### GET /healthz

Returns service health and uptime:
```json
{
  "status": "ok",
  "mode": "stable",
  "version": "1.0.0",
  "uptime_seconds": 12.34,
  "timestamp": "2026-05-04T03:09:05.482280+00:00"
}
```

### GET /metrics

Returns runtime metrics in Prometheus text format. Tracked metrics include:
| Metric | Type | Labels | Description |
|---|---|---|---|
| `http_requests_total` | Counter | method, path, status_code | Total HTTP requests served |
| `http_request_duration_seconds` | Histogram | method, path | Request latency with standard buckets |
| `app_uptime_seconds` | Gauge | — | Seconds since process start |
| `app_mode` | Gauge | — | 0 = stable, 1 = canary |
| `chaos_active` | Gauge | — | 0 = none, 1 = slow, 2 = error |
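This README does not include the app source, but a minimal sketch of how the request counter and latency histogram could be wired up with `prometheus_client` and Flask looks like this (metric names match the table above; everything else is illustrative):

```python
import time
from flask import Flask, Response, request
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Histogram, generate_latest

app = Flask(__name__)
REQUESTS = Counter("http_requests_total", "Total HTTP requests served",
                   ["method", "path", "status_code"])
LATENCY = Histogram("http_request_duration_seconds", "Request latency in seconds",
                    ["method", "path"])

@app.before_request
def start_timer():
    # Stash a monotonic start time on the request object.
    request.start_time = time.monotonic()

@app.after_request
def record_request(response):
    elapsed = time.monotonic() - request.start_time
    REQUESTS.labels(request.method, request.path, str(response.status_code)).inc()
    LATENCY.labels(request.method, request.path).observe(elapsed)
    return response

@app.route("/metrics")
def metrics():
    # Serialize every registered metric in Prometheus text format.
    return Response(generate_latest(), mimetype=CONTENT_TYPE_LATEST)
```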
### POST /chaos

Chaos mode is only active in canary mode. In stable mode, this endpoint returns 403.
Supported payloads:
{ "mode": "slow", "duration": 2 }Simulates latency by sleeping N seconds before responding to subsequent requests.
{ "mode": "error", "rate": 0.5 }Simulates intermittent HTTP 500 responses at the specified probability.
{ "mode": "recover" }Clears any active chaos behaviour and resets the chaos gauge to 0.
## CLI Subcommands

The `swiftdeploy` script is the deployment control plane.

### init
Reads `manifest.yaml` and generates:

- `docker-compose.yml`
- `nginx.conf`
Command:
```powershell
python .\swiftdeploy init
```

Expected output:

```text
[PASS] Loaded manifest.yaml
[PASS] Generated docker-compose.yml
[PASS] Generated nginx.conf
```
### validate

Runs five pre-flight checks before any deployment is attempted:
- `manifest.yaml` exists and is valid YAML
- Required fields are present and non-empty
- Docker image exists locally
- Nginx host port is free
- Generated `nginx.conf` is syntactically valid
Command:
```powershell
python .\swiftdeploy validate
```

Expected output:

```text
[PASS] manifest.yaml exists and is valid YAML
[PASS] All required fields are present and non-empty
[PASS] Docker image exists locally: swift-deploy-1-node:latest
[PASS] Nginx port is free on host: 8080
[PASS] Generated nginx.conf is syntactically valid
```
Validation exits non-zero if any check fails.
### deploy

Runs `init`, validates the stack, queries OPA for infrastructure policy approval, starts Docker Compose, and waits for health checks to pass. If OPA denies the deployment, the stack is not started and the policy reasoning is printed to the operator.
Command:
```powershell
python .\swiftdeploy deploy
```

Expected output (policy passing):

```text
[PASS] Loaded manifest.yaml
[PASS] Generated docker-compose.yml
[PASS] Generated nginx.conf
[PASS] manifest.yaml exists and is valid YAML
[PASS] All required fields are present and non-empty
[PASS] Docker image exists locally: swift-deploy-1-node:latest
[PASS] Nginx port is free on host: 8080
[PASS] Generated nginx.conf is syntactically valid
[POLICY][PASS] infrastructure.pre_deploy
- Infrastructure policy passed
[PASS] Docker Compose stack started
[PASS] Health check passed: mode=stable, version=1.0.0
```
Expected output (policy blocking):
```text
[POLICY][FAIL] infrastructure.pre_deploy
- Disk free 2GB is below required minimum 10GB
- CPU load 3.20 exceeds allowed maximum 2.00
[FAIL] Deployment blocked by policy.
```
The command waits up to 60 seconds for /healthz to become healthy.
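A sketch of what that wait loop might look like (the function name and defaults are assumptions, not the CLI's actual code):

```python
import time
import requests

def wait_for_health(url: str = "http://127.0.0.1:8080/healthz",
                    timeout: int = 60, interval: int = 2) -> dict:
    """Poll /healthz through Nginx until it reports ok or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            body = requests.get(url, timeout=5).json()
            if body.get("status") == "ok":
                return body  # carries mode and version for the [PASS] line
        except requests.RequestException:
            pass  # stack still starting up; retry
        time.sleep(interval)
    raise TimeoutError(f"health check did not pass within {timeout}s")
```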
### promote canary

Switches service mode to canary without a policy check. Canary is an experimental mode — the gate applies when returning to stable.
Command:
```powershell
python .\swiftdeploy promote canary
```

What it does:

- Updates `manifest.yaml` in place
- Regenerates `docker-compose.yml` with `MODE=canary`
- Recreates only the app container
- Confirms the new mode through `/healthz`
Expected output:
```text
[PASS] Updated manifest.yaml mode to canary
[PASS] Regenerated docker-compose.yml
[PASS] Restarted service container only
[PASS] Promotion confirmed through /healthz: mode=canary
```
Verify canary headers:
```powershell
curl.exe -i http://127.0.0.1:8080/healthz
```

Expected headers:

```text
X-Mode: canary
X-Deployed-By: swiftdeploy
```

### promote stable

Switches service mode back to stable. Before executing, the CLI scrapes `/metrics`, calculates the current error rate and P99 latency over all observed requests, and queries OPA's canary safety policy. If the canary is unhealthy, promotion is blocked and the reasoning is surfaced.
Command:
```powershell
python .\swiftdeploy promote stable
```

Expected output (canary healthy):

```text
[POLICY][PASS] canary.pre_promote
- Canary safety policy passed
[PASS] Updated manifest.yaml mode to stable
[PASS] Regenerated docker-compose.yml
[PASS] Restarted service container only
[PASS] Promotion confirmed through /healthz: mode=stable
```
Expected output (canary unhealthy):
```text
[POLICY][FAIL] canary.pre_promote
- Error rate 0.380952 exceeds allowed maximum 0.01
[FAIL] Promotion blocked by policy.
```
### teardown

Stops and removes the stack, including containers, networks, and volumes.
Command:
```powershell
python .\swiftdeploy teardown
```

Expected output:

```text
[PASS] Removed containers, networks, and volumes
```
### teardown --clean

Stops the stack and deletes generated configuration files.
Command:
```powershell
python .\swiftdeploy teardown --clean
```

Expected output:

```text
[PASS] Removed containers, networks, and volumes
[PASS] Deleted generated file: docker-compose.yml
[PASS] Deleted generated file: nginx.conf
```
This proves that generated files are disposable and can be recreated from the manifest alone.
### status

Scrapes `/metrics`, calculates real-time req/s and P99 latency, queries both OPA policy domains independently, and prints a live dashboard. Every scrape is appended to `history.jsonl` for the audit trail.
Command (single scrape):
```powershell
python .\swiftdeploy status --once
```

Command (continuous, refreshes every 2 seconds):

```powershell
python .\swiftdeploy status
```

Command (custom interval):

```powershell
python .\swiftdeploy status --interval 5
```

Expected output:

```text
SwiftDeploy Status
==================
Timestamp: 2026-05-06T23:12:26.328363+00:00
Mode: canary
Chaos: error
Req/s: 0.00
Error rate: 38.10%
P99 latency: 5.00ms
Uptime: 128.42s

Policy Compliance
-----------------
[PASS] infrastructure.pre_deploy
- Infrastructure policy passed
[FAIL] canary.pre_promote
- Error rate 0.380952 exceeds allowed maximum 0.01
```
Press Ctrl+C to stop the continuous dashboard.
### audit

Parses `history.jsonl` and generates `audit_report.md`. The report contains a deployment timeline and a dedicated violations section listing every policy failure with its timestamp, domain, and reasoning.
Command:
```powershell
python .\swiftdeploy audit
```

Expected output:

```text
[PASS] Generated audit_report.md
```
The report renders correctly as GitHub Flavored Markdown and can be viewed directly on GitHub.
## Generated Configuration

`swiftdeploy init` generates two files from templates.

### docker-compose.yml

Generated from `templates/docker-compose.yml.tpl`. It defines:
- `app` service with health check on `/healthz`
- `nginx` service depending on app health, publishing port 8080
- `opa` service bound to `127.0.0.1:8181` only — not reachable via Nginx
- shared Docker network for internal service communication
- named volume for log persistence
- environment variables injected from manifest values
- restart policy, capability restrictions, and non-root users for all services

The app service uses `expose`, not host ports, so it is not directly reachable from the host machine.
### nginx.conf

Generated from `templates/nginx.conf.tpl`. It defines:
- Nginx listener on the manifest-defined port
- reverse proxy to the app service on the internal Docker network
- proxy timeout from manifest
- structured access logging in the required format
- JSON error responses for 502, 503, and 504
- `X-Deployed-By: swiftdeploy` response header on all requests
- forwarding of the upstream `X-Mode` header from the app
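Both files come out of the same render step. A minimal sketch of how `init` could load the manifest and render the two templates with Jinja2, assuming manifest keys map directly to template variables:

```python
import yaml
from jinja2 import Environment, FileSystemLoader

def render_configs(manifest_path: str = "manifest.yaml") -> None:
    """Render both runtime files from the manifest; they are disposable artifacts."""
    with open(manifest_path, encoding="utf-8") as fh:
        manifest = yaml.safe_load(fh)
    env = Environment(loader=FileSystemLoader("templates"), keep_trailing_newline=True)
    for template_name, output_name in [
        ("docker-compose.yml.tpl", "docker-compose.yml"),
        ("nginx.conf.tpl", "nginx.conf"),
    ]:
        rendered = env.get_template(template_name).render(**manifest)
        # Plain UTF-8 with no BOM (see the Troubleshooting section).
        with open(output_name, "w", encoding="utf-8", newline="\n") as out:
            out.write(rendered)
```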
## Policy Engine

SwiftDeploy uses Open Policy Agent as an isolated policy sidecar. The CLI never makes allow/deny decisions itself — all decision logic lives exclusively in Rego policy files.
Each policy domain owns exactly one question and one set of data it cares about. The CLI queries each domain independently. A change to one domain's policy never requires touching another.
| Domain | Question | Input data | Blocks |
|---|---|---|---|
| `infrastructure` | `pre_deploy` | Disk free GB, CPU load, configured limits | `deploy` |
| `canary` | `pre_promote` | Error rate, P99 latency ms, configured limits | `promote stable` |
### Infrastructure policy

Evaluated before every deployment. Sends host stats and configured limits to OPA.
Rules:
- Disk free must be >= `policy_limits.infrastructure.min_disk_free_gb`
- CPU load must be <= `policy_limits.infrastructure.max_cpu_load`
Every decision includes a `reasons` array. On failure, each reason names the specific metric and threshold that was violated. On pass, the reasons confirm the policy passed.
### Canary safety policy

Evaluated before promoting from canary back to stable. Sends observed metrics scraped from `/metrics` and configured limits to OPA.
Rules:
- Error rate must be <= `policy_limits.canary.max_error_rate`
- P99 latency must be <= `policy_limits.canary.max_p99_latency_ms`
All threshold values are defined in `manifest.yaml` under `policy_limits`. Rego files read them from `input.limits`. Changing a threshold requires only editing the manifest — no Rego files need to be touched.
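For example, the CLI's pre-deploy input document might be assembled like this. This is a sketch: `os.getloadavg` is POSIX-only, and the real CLI may gather host stats differently:

```python
import os
import shutil

def build_infrastructure_input(manifest: dict) -> dict:
    """Combine live host stats with the manifest's configured limits."""
    disk = shutil.disk_usage("/")
    return {
        "disk_free_gb": disk.free / 1024**3,
        "cpu_load": os.getloadavg()[0],  # 1-minute load average (POSIX only)
        # Rego reads these as input.limits; no thresholds live in the policy.
        "limits": manifest["policy_limits"]["infrastructure"],
    }
```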
### OPA failure handling

The CLI handles every distinct OPA failure mode with a specific, human-readable outcome:
| Failure mode | Output |
|---|---|
| OPA container not reachable | OPA unavailable at http://127.0.0.1:8181 |
| OPA request timed out | OPA decision timed out after 5s |
| OPA returned non-200 | OPA returned HTTP 503: ... |
| OPA response was not JSON | OPA returned non-JSON response |
| OPA result missing | OPA response did not include a decision result |
In all failure cases the operation is blocked — the CLI never proceeds when OPA is unreachable.
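A sketch of how each failure mode can be mapped to its message with `requests` exception handling; the `PolicyError` class is hypothetical, and the messages mirror the table above:

```python
import requests

class PolicyError(Exception):
    """Raised when no usable OPA decision can be obtained; always blocks."""

def query_opa(url: str, opa_input: dict, timeout: float = 5.0) -> dict:
    try:
        resp = requests.post(url, json={"input": opa_input}, timeout=timeout)
    except requests.exceptions.Timeout:
        raise PolicyError(f"OPA decision timed out after {timeout:.0f}s")
    except requests.exceptions.ConnectionError:
        raise PolicyError(f"OPA unavailable at {url}")
    if resp.status_code != 200:
        raise PolicyError(f"OPA returned HTTP {resp.status_code}: {resp.text[:200]}")
    try:
        body = resp.json()
    except ValueError:
        raise PolicyError("OPA returned non-JSON response")
    if "result" not in body:
        raise PolicyError("OPA response did not include a decision result")
    return body["result"]
```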
The OPA container is published only to the host loopback interface:
```yaml
ports:
  - "127.0.0.1:8181:8181"
```

Nginx has no route to port 8181. The OPA API is not accessible via the Nginx port under any circumstances.
## Observability

The app exposes Prometheus-format metrics at `/metrics`. The `status` command scrapes this endpoint directly — no external Prometheus server is required.
Example output:
```text
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",path="/",status_code="200"} 30.0
http_requests_total{method="GET",path="/",status_code="500"} 12.0
# HELP http_request_duration_seconds HTTP request latency in seconds
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.005",method="GET",path="/"} 42.0
...
# HELP app_uptime_seconds Application uptime in seconds
# TYPE app_uptime_seconds gauge
app_uptime_seconds 210.91
# HELP app_mode Application mode: 0=stable, 1=canary
# TYPE app_mode gauge
app_mode 1.0
# HELP chaos_active Chaos state: 0=none, 1=slow, 2=error
# TYPE chaos_active gauge
chaos_active 2.0
```
`swiftdeploy status` calculates the following from raw Prometheus samples:
- Req/s - change in `http_requests_total` since the previous scrape, divided by elapsed seconds
- Error rate - 5xx responses as a fraction of all non-health, non-metrics requests
- P99 latency - derived from histogram bucket counts using linear interpolation to find the 99th percentile bound
Health check and metrics paths are excluded from error rate and latency calculations to avoid skewing results.
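Prometheus histogram buckets are cumulative, so a P99 estimate walks the buckets until 99% of observations are covered, then interpolates inside that bucket. A self-contained sketch of the calculation:

```python
def p99_from_buckets(buckets: list[tuple[float, float]]) -> float:
    """Estimate P99 from cumulative Prometheus histogram buckets.

    `buckets` holds (upper_bound_seconds, cumulative_count) pairs in ascending
    order; the final count is treated as the total number of observations.
    """
    total = buckets[-1][1]
    if total == 0:
        return 0.0
    target = 0.99 * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= target:
            span = count - prev_count
            frac = (target - prev_count) / span if span else 1.0
            # Linear interpolation inside the bucket that crosses the target rank.
            return prev_bound + frac * (bound - prev_bound)
        prev_bound, prev_count = bound, count
    return prev_bound  # target falls in the +Inf bucket; clamp to last bound

# Example with bucket samples like those scraped from http_request_duration_seconds:
print(p99_from_buckets([(0.005, 42.0), (0.05, 55.0), (0.5, 59.0), (1.0, 60.0)]))
```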
## Audit Trail

Every significant lifecycle event is appended to `history.jsonl` as a JSON object with a UTC timestamp, event type, and event data.
Recorded event types:
| Event type | Triggered by |
|---|---|
| `deploy` | Successful `swiftdeploy deploy` |
| `mode_change` | Successful `swiftdeploy promote` |
| `policy_violation` | Any OPA FAIL during deploy or promote |
| `pre_promote_policy_check` | Every `promote stable` attempt |
| `status_scrape` | Every `swiftdeploy status` scrape |
| `metrics_failure` | Failed `/metrics` scrape during promote |
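Appending an event is deliberately simple: one JSON object per line, with the file only ever opened in append mode. A sketch (the helper name and exact field names are assumptions):

```python
import json
from datetime import datetime, timezone

def record_event(event_type: str, data: dict,
                 history_file: str = "history.jsonl") -> None:
    """Append one audit event as a single JSON line with a UTC timestamp."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": event_type,
        "data": data,
    }
    with open(history_file, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")

# Example: the kind of entry a blocked promotion might append.
record_event("policy_violation", {
    "domain": "canary",
    "question": "pre_promote",
    "reasons": ["Error rate 0.380952 exceeds allowed maximum 0.01"],
})
```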
Generate the report with:

```powershell
python .\swiftdeploy audit
```

The generated `audit_report.md` contains:
- Summary - total event count, mode/deploy event count, and violation count
- Timeline - table of all deploy and mode change events with timestamps and summaries
- Policy Violations - table of every policy failure with timestamp, domain, question, and full reasoning
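A sketch of how the report generator could derive those sections from the raw event list (structure only; the real report includes full tables, and the field names follow the sketch above):

```python
import json

def load_history(path: str = "history.jsonl") -> list[dict]:
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]

def summary_lines(events: list[dict]) -> list[str]:
    violations = [e for e in events if e["event"] == "policy_violation"]
    lifecycle = [e for e in events if e["event"] in ("deploy", "mode_change")]
    return [
        "# SwiftDeploy Audit Report", "",
        f"- Total events: {len(events)}",
        f"- Deploy/mode-change events: {len(lifecycle)}",
        f"- Policy violations: {len(violations)}",
    ]
```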
## Security and Runtime Hardening

SwiftDeploy applies several runtime security controls across all containers.
The app runs as:

```yaml
user: "10001:10001"
```

Nginx runs as:

```yaml
user: "101:101"
```

OPA runs with dropped capabilities and no-new-privileges.
All containers drop Linux capabilities:
```yaml
cap_drop:
  - ALL
```

All services use:

```yaml
security_opt:
  - no-new-privileges:true
```

The app does not publish a host port:

```yaml
expose:
  - "3000"
```

The OPA container is published only to the host loopback interface:

```yaml
ports:
  - "127.0.0.1:8181:8181"
```

The only public entry point is Nginx on port 8080.
## Nginx Behaviour

Nginx listens on the manifest-defined port:
```yaml
nginx:
  port: 8080
```

Nginx adds to every response:

```text
X-Deployed-By: swiftdeploy
```

In canary mode, the app adds:

```text
X-Mode: canary
```

Nginx forwards this header from the upstream response.
Nginx returns structured JSON bodies for upstream failure codes:
```json
{
  "error": "bad gateway",
  "code": 502,
  "service": "swiftdeploy",
  "contact": "o.odimayo@gbadedata.com"
}
```

Equivalent responses are defined for 503 and 504.
Structured access log format:

```text
$time_iso8601 | $status | ${request_time}s | $upstream_addr | $request
```

Example output:

```text
2026-05-06T23:04:50+00:00 | 200 | 0.001s | 172.18.0.2:3000 | GET / HTTP/1.1
2026-05-06T23:04:50+00:00 | 200 | 0.001s | 172.18.0.2:3000 | GET /healthz HTTP/1.1
```
## Docker and Docker Compose Behaviour

The generated Compose file ensures:
- `app`, `nginx`, and `opa` share the configured internal network
- `nginx` depends on `app` health before starting
- service environment variables (`MODE`, `APP_VERSION`, `APP_PORT`) are injected from manifest values
- restart policy is controlled by the manifest
- named volume is mounted for log persistence
- `app` has a `/healthz` health check
- `nginx` has a `/healthz` proxy health check
- `opa` has a health check using `opa eval true`
- app is not directly exposed to the host
Injected app environment:
```yaml
environment:
  MODE: "stable"
  APP_VERSION: "1.0.0"
  APP_PORT: "3000"
```

## Prerequisites

Required locally:
- Docker Desktop
- Docker Compose plugin
- Python 3.12+
- Git
- PowerShell, Bash, or another terminal
This project was developed and tested with:
- Python 3.13.11
- Docker Desktop 29.2.1
- Docker Compose v5.0.2
- PowerShell on Windows
## Local Setup

Clone the repository:
```powershell
git clone https://github.com/gbadedata/swiftdeploy.git
cd swiftdeploy
```

Create a virtual environment:

```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
```

Install CLI dependencies:

```powershell
python -m pip install --upgrade pip
pip install pyyaml jinja2 requests
```

Build the app image:

```powershell
docker build -t swift-deploy-1-node:latest .
```

Verify image size:

```powershell
docker images swift-deploy-1-node:latest
```

The image must be under 300MB.
## Usage Walkthrough

A full end-to-end run:

```powershell
python .\swiftdeploy init
python .\swiftdeploy validate
python .\swiftdeploy deploy
curl.exe http://127.0.0.1:8080/
curl.exe http://127.0.0.1:8080/healthz
curl.exe http://127.0.0.1:8080/metrics
python .\swiftdeploy status --once
python .\swiftdeploy promote canary
curl.exe -i http://127.0.0.1:8080/healthz

@'
{"mode":"error","rate":0.5}
'@ | Set-Content -Encoding ascii chaos-error.json

curl.exe -X POST http://127.0.0.1:8080/chaos -H "Content-Type: application/json" --data-binary "@chaos-error.json"

1..20 | ForEach-Object { curl.exe -s -o NUL -w "%{http_code}`n" http://127.0.0.1:8080/ }

python .\swiftdeploy status --once
python .\swiftdeploy promote stable

@'
{"mode":"recover"}
'@ | Set-Content -Encoding ascii chaos-recover.json

curl.exe -X POST http://127.0.0.1:8080/chaos -H "Content-Type: application/json" --data-binary "@chaos-recover.json"

docker compose restart app
Start-Sleep -Seconds 15
1..30 | ForEach-Object { curl.exe -s -o NUL -w "%{http_code}`n" http://127.0.0.1:8080/ }

python .\swiftdeploy promote stable
python .\swiftdeploy audit
docker logs swiftdeploy-nginx --tail 10
python .\swiftdeploy teardown --clean
```

PowerShell can corrupt inline JSON quoting when calling `curl.exe`. For reliable testing, use JSON files as shown above.
## Chaos Testing

Promote to canary first:
```powershell
python .\swiftdeploy promote canary
```

### Slow mode

```powershell
@'
{ "mode": "slow", "duration": 2 }
'@ | Set-Content -Encoding ascii chaos-slow.json

curl.exe -X POST http://127.0.0.1:8080/chaos `
  -H "Content-Type: application/json" `
  --data-binary "@chaos-slow.json"

curl.exe -w "`nTotal time: %{time_total}s`n" http://127.0.0.1:8080/
```

Expected result: response delay of approximately two seconds.
### Error mode

```powershell
@'
{ "mode": "error", "rate": 0.5 }
'@ | Set-Content -Encoding ascii chaos-error.json

curl.exe -X POST http://127.0.0.1:8080/chaos `
  -H "Content-Type: application/json" `
  --data-binary "@chaos-error.json"

1..10 | ForEach-Object { curl.exe -s -o NUL -w "%{http_code}`n" http://127.0.0.1:8080/ }
```

Expected result: mixed 200 and 500 responses.
After injecting errors, run the status dashboard to see the canary policy fail in real time:
```powershell
python .\swiftdeploy status --once
```

### Recovery

```powershell
@'
{ "mode": "recover" }
'@ | Set-Content -Encoding ascii chaos-recover.json

curl.exe -X POST http://127.0.0.1:8080/chaos `
  -H "Content-Type: application/json" `
  --data-binary "@chaos-recover.json"
```

Remove temporary test files:
```powershell
Remove-Item chaos-*.json -ErrorAction SilentlyContinue
```

## Validation Checks

The CLI implements five pre-flight checks. All must pass before deployment proceeds.
The CLI fails if `manifest.yaml` is missing, empty, or not parseable as YAML.

Required fields include:

```text
services.image  services.port  services.mode
services.version  services.restart_policy
nginx.image  nginx.port  nginx.proxy_timeout  nginx.contact
opa.image  opa.port  opa.policies_dir  opa.decision_timeout_seconds
network.name  network.driver_type
logs.volume_name
policy_limits.infrastructure  policy_limits.canary
audit.history_file  audit.report_file
```
The CLI checks the image with `docker image inspect` before deployment.

The CLI checks that the configured Nginx port is not already bound on the host.

The CLI validates `nginx.conf` by running `nginx -t` inside a temporary Nginx container. A host mapping for the app upstream name is added for the validation context so DNS resolution does not require the full stack to be running.
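Two of those checks are easy to sketch in Python: a socket probe for the port check, and a throwaway container for `nginx -t`. The `--add-host` value and the upstream name `app` are assumptions:

```python
import os
import socket
import subprocess

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """True if nothing on the host is accepting connections on the port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1)
        return sock.connect_ex((host, port)) != 0

def nginx_conf_is_valid(conf_path: str = "nginx.conf") -> bool:
    """Run `nginx -t` in a throwaway container against the generated file."""
    result = subprocess.run(
        ["docker", "run", "--rm",
         "--add-host", "app:127.0.0.1",  # satisfy upstream DNS without the stack
         "-v", f"{os.path.abspath(conf_path)}:/etc/nginx/nginx.conf:ro",
         "nginx:latest", "nginx", "-t"],
        capture_output=True, text=True)
    return result.returncode == 0
```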
## Evidence and Screenshots

The `screenshots/` folder contains submission evidence for both Stage 4A and Stage 4B.
| Screenshot | Stage | Purpose |
|---|---|---|
| `01_validate_all_pass.png` | 4A | All five validation checks passing |
| `02_deploy_success.png` | 4A | Successful stable deployment with health check |
| `03_canary_and_headers.png` | 4A | Canary promotion, `X-Mode: canary`, `X-Deployed-By: swiftdeploy` |
| `04_generated_configs.png` | 4A | Generated `docker-compose.yml` and `nginx.conf` contents |
| `05_nginx_logs_clean.png` | 4A | Nginx access logs in required structured format |
| `06_policy_hard_gate.png` | 4B | Deploy blocked by infrastructure policy — disk threshold exceeded |
| `07_status_chaos.png` | 4B | Status dashboard showing chaos active and canary policy failing |
| `08_promote_blocked.png` | 4B | Promotion to stable blocked by canary safety policy |
| `09_promote_stable_clean.png` | 4B | Clean promotion after chaos recovery — policy passes |
| `10_audit_report.png` | 4B | Generated audit report with timeline and violations table |
## Troubleshooting

### Nginx port already bound

If validation fails with:

```text
[FAIL] Nginx port is already bound on host: 8080
```
Stop the running stack:
```powershell
python .\swiftdeploy teardown --clean
```

Check for active listeners:

```powershell
netstat -ano | findstr :8080
```

TIME_WAIT entries are not usually a problem. Active LISTENING entries are.
### OPA unavailable

If the CLI reports:

```text
OPA unavailable at http://127.0.0.1:8181
```
Check that OPA started correctly:
```powershell
docker logs swiftdeploy-opa
```

OPA is started automatically by `swiftdeploy deploy`. If running `promote` independently after a teardown, start the stack first.
### Promotion still blocked after chaos recovery

The Prometheus error counter persists for the lifetime of the app process. Restarting the app container resets the counters:
```powershell
docker compose restart app
Start-Sleep -Seconds 15
1..30 | ForEach-Object { curl.exe -s -o NUL -w "%{http_code}`n" http://127.0.0.1:8080/ }
python .\swiftdeploy promote stable
```

### Nginx rejects the generated configuration

If Nginx fails to parse `nginx.conf`, a UTF-8 BOM was likely written at the start of the file. Regenerate it:
```powershell
python .\swiftdeploy teardown --clean
python .\swiftdeploy init
```

The CLI renderer writes generated files as clean UTF-8 bytes without a BOM.
### PowerShell corrupts inline JSON

Use JSON files and `--data-binary` rather than inline escaped JSON strings. PowerShell quote handling corrupts JSON when passed directly to `curl.exe`.
### Generated files are missing

Run:

```powershell
python .\swiftdeploy init
```

Generated files are intentionally excluded from Git and can always be recreated from the manifest and templates.
## Design Decisions

### Manifest as the single source of truth

The manifest defines deployment intent. Generated files are artifacts, not source files. This allows deterministic regeneration and reduces manual drift. `docker-compose.yml` and `nginx.conf` can be deleted at any time and restored exactly with `swiftdeploy init`.
### Template-driven generation

Templates make the relationship between manifest values and generated runtime files explicit. This mirrors real-world infrastructure tooling such as Helm, Terraform templating, and deployment generators.
### Policy logic in Rego, not Python

The decision logic lives in Rego files, not in Python. This means policies can be updated without touching the CLI. Each domain - infrastructure and canary - owns its own policy file, its own question, and its own set of input data. A change to the canary policy cannot accidentally affect the infrastructure check.
### Thresholds in the manifest, not the policies

Hardcoding limits inside Rego files makes them environment-specific and harder to tune. By putting all threshold values in `manifest.yaml` under `policy_limits`, the same policy files work across any environment with different limits - without editing Rego.
### The gate applies on the return to stable

Promoting to canary is an intentional experiment. The safety gate applies on the return journey - when promoting back to stable - because that is when the operator needs to prove the canary was healthy before widening the blast radius.
### File-based audit trail

Rather than integrating a notification service, every event is written to `history.jsonl`. This is the audit source of truth. The `audit` command reads it and renders a report. External alerting systems can tail this file independently.
### Single Gunicorn worker

The app uses one Gunicorn worker to keep chaos state deterministic during local testing. With multiple workers, in-memory chaos state could be updated in one worker while requests are served by another, causing inconsistent error rates and unpredictable policy outcomes.
### Generated files excluded from Git

`docker-compose.yml`, `nginx.conf`, `history.jsonl`, and `audit_report.md` are excluded because they are runtime artifacts. The source of truth is the manifest and the templates. Committing generated files creates drift risk and makes the manifest redundant.
### OPA bound to loopback only

OPA is bound to `127.0.0.1:8181` only. It shares the internal Docker network with the app and Nginx but is not reachable from the public Nginx port. This prevents the policy API from being queried or manipulated from outside the stack.
## Cleanup

Stop the stack and remove all generated files:
```powershell
python .\swiftdeploy teardown --clean
```

Verify generated files were deleted:

```powershell
Test-Path .\docker-compose.yml
Test-Path .\nginx.conf
```

Expected:

```text
False
False
```

Regenerate at any time:

```powershell
python .\swiftdeploy init
```

SwiftDeploy is intentionally local-first. The task does not require AWS or domain deployment, so the implementation avoids unnecessary cloud dependencies and keeps the grading path reproducible on any Docker-enabled machine.
The key conditions are preserved:
- `manifest.yaml` is the single source of truth
- OPA is the only decision-maker - the CLI never allows or denies itself
- Generated files are disposable and fully reproducible
- No deployment or promotion proceeds without explicit policy approval
A full technical deep dive covering both Stage 4A and Stage 4B — the design, the policy engine, the chaos testing, and the lessons learned — is published at:
The stack can be destroyed, generated files can be removed, and the complete runtime configuration - including the OPA sidecar, policy evaluation, metrics, and audit trail - can be recreated by running:
```powershell
python .\swiftdeploy init
python .\swiftdeploy deploy
```