Packaging & deployability: prod images, Helm, Terraform (AWS/GCP/Azure), CI/CD, ops#18

Merged: kosminus merged 6 commits into main from feat/packaging-deployability on Jun 8, 2026

@kosminus (Owner) commented Jun 8, 2026

Implements the Packaging & deployability parallel track from planfull.md: the production path from commit to running cluster. All artifacts live under deploy/, plus the root-level production compose file and .github/; the dev docker-compose.yml and Dockerfiles are untouched.

What's included

  • Hardened images — multi-stage, non-root backend/Dockerfile.prod (builder venv → slim runtime, healthcheck) and frontend/Dockerfile.prod (Vite build → unprivileged nginx serving the SPA and reverse-proxying /api + /mcp). Same-origin build via VITE_API_URL="" (client.ts now uses ??).
  • Production compose (docker-compose.prod.yml) — pgvector, redis, one-shot migrate (gated so replicas never race), uvicorn backend, arq worker, nginx edge. Configured by .env.prod (.env.prod.example).
  • Helm chart (deploy/helm/querywise/) — backend Deployment + HPA + PDB, arq worker, frontend + PDB, path-based ingress, ServiceAccount, and an Alembic pre-install/pre-upgrade migration-hook Job. Secrets via chart Secret or existingSecret (external-secrets seam).
  • Terraform (deploy/terraform/{aws,gcp,azure}/) — each provisions the data plane + secrets (managed Postgres 16/pgvector, managed Redis, a secret store with the assembled DSNs/keys, object storage, optional network, an identity/policy for external-secrets) in the customer's own account/VPC. Compute is intentionally separate state.
  • CI/CD (.github/) — release.yml builds + pushes both images to GHCR then Helm-deploys (main → staging, tag v* → production, --wait --atomic) via a reusable composite action; deploy-validate.yml lints chart + Terraform on PRs.
  • Ops (deploy/ops/) — encrypted backup.sh/restore.sh, a backup CronJob example, a DR runbook, and a production config reference.
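
The same-origin build above relies on an empty VITE_API_URL being kept rather than replaced by a default. The actual change is in TypeScript (client.ts), and the PR does not show the previous code; the sketch below restates the distinction in Python, assuming the earlier fallback behaved like `||`. `DEFAULT_API_URL` and both function names are hypothetical.

```python
from typing import Optional

# Hypothetical dev default; the PR does not state the real value.
DEFAULT_API_URL = "http://localhost:8000"


def resolve_falsy(configured: Optional[str]) -> str:
    """Mirrors `configured || DEFAULT`: any falsy value, including "", falls back."""
    return configured or DEFAULT_API_URL


def resolve_nullish(configured: Optional[str]) -> str:
    """Mirrors `configured ?? DEFAULT`: only a missing value (None) falls back."""
    return DEFAULT_API_URL if configured is None else configured


# An empty string means "same origin", so it must survive resolution.
print(repr(resolve_falsy("")))    # falls back to the default
print(repr(resolve_nullish("")))  # keeps the empty string
```

With the nullish form, an explicitly empty base URL produces relative requests such as /api, which the nginx edge can then reverse-proxy.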

Design notes

  • Migration ordering is enforced with Helm hook weights: the ConfigMap/Secret are weight -10 hooks and the migrate Job is -5, so config exists first, migrations finish before new backend pods roll, and N replicas never race on alembic upgrade.
  • End-to-end secrets contract: every Terraform module's secret keys map 1:1 to the backend's env vars, so external-secrets does a plain dataFrom into the querywise-secrets k8s Secret the chart references.
  • Compute deliberately out of scope in Terraform (BYO / upstream cluster module), kept in separate state so a cluster rebuild never risks the database.
  • ENCRYPTION_KEY rotation caveat is documented prominently in the runbook: the key Fernet-encrypts stored connection strings, so they must be re-encrypted before the key is swapped.
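
The hook-weight ordering in the first note can be sketched as a plain sort. This is a simplified model, not Helm's implementation: it orders by ascending weight and breaks ties alphabetically by kind and name. The weights and the querywise-secrets name come from the PR; the other resource names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hook:
    kind: str
    name: str    # resource name (mostly hypothetical here)
    weight: int  # value of the helm.sh/hook-weight annotation


def execution_order(hooks: list[Hook]) -> list[Hook]:
    """Order hooks by ascending weight; ties are broken by kind, then name."""
    return sorted(hooks, key=lambda h: (h.weight, h.kind, h.name))


hooks = [
    Hook("Job", "querywise-migrate", -5),        # hypothetical name
    Hook("Secret", "querywise-secrets", -10),    # name from the PR
    Hook("ConfigMap", "querywise-config", -10),  # hypothetical name
]

ordered = execution_order(hooks)
print([h.kind for h in ordered])  # ['ConfigMap', 'Secret', 'Job']
```

Because the migration Job carries the higher weight, it only starts once the configuration it reads has been created, and it runs once per release rather than once per replica.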

Validation

All offline checks pass: helm lint + helm template | kubeconform -strict (13/13), terraform validate + fmt across all three clouds, actionlint clean, shellcheck clean, backup CronJob kubeconform-valid. Full terraform plan / live deploy need cloud + cluster credentials (deploy-time).

Deferred

The managed-SaaS control plane (provisioning/billing/fleet upgrades) is deferred; it is additive, since each tenant is already an isolated instance.

🤖 Generated with Claude Code

cosmin chauciuc and others added 6 commits June 8, 2026 19:12
Multi-stage, non-root images (backend/Dockerfile.prod, frontend/
Dockerfile.prod) and docker-compose.prod.yml (pgvector, redis, one-shot
migrate, uvicorn backend, arq worker, nginx edge). Frontend SPA builds
same-origin and nginx reverse-proxies /api + /mcp. Dev Dockerfiles/compose
untouched. client.ts uses `??` so an empty VITE_API_URL is honored.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
backend Deployment + HPA + PDB, arq worker, frontend + PDB, path-based
ingress (/api + /mcp -> backend, / -> SPA), ServiceAccount. Alembic runs
as a pre-install/pre-upgrade hook Job (ConfigMap/Secret hooks ordered
before it) so migrations land before new pods roll and replicas never
race. Secrets via chart-created Secret or existingSecret (external-secrets
seam). Passes helm lint + kubeconform -strict.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Each provisions managed Postgres 16 (pgvector) + managed Redis + a secret
store with the assembled DSNs/keys (keys map 1:1 to backend env for
external-secrets) + object storage + optional network + an identity/policy
for external-secrets. Compute (EKS/GKE/AKS) is deliberately separate state.
All three pass terraform validate + fmt; lockfiles committed, tfvars
gitignored.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
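
A 1:1 mapping between secret-store keys and backend env vars, as the commit above describes, is easy to check mechanically. The sketch below is one possible drift check, not something the PR includes; `contract_gaps` and every key name except ENCRYPTION_KEY are hypothetical.

```python
def contract_gaps(secret_keys: set[str], backend_env: set[str]) -> dict[str, set[str]]:
    """Report keys that break a 1:1 mapping between a secret store and backend env vars."""
    return {
        "missing_from_secret": backend_env - secret_keys,
        "unused_in_backend": secret_keys - backend_env,
    }


# Hypothetical key names, apart from ENCRYPTION_KEY, which the PR mentions.
BACKEND_ENV = {"DATABASE_URL", "REDIS_URL", "ENCRYPTION_KEY"}
matching_keys = {"DATABASE_URL", "REDIS_URL", "ENCRYPTION_KEY"}
drifted_keys = {"DATABASE_URL", "REDIS_DSN", "ENCRYPTION_KEY"}

print(contract_gaps(matching_keys, BACKEND_ENV))  # both sets empty
print(contract_gaps(drifted_keys, BACKEND_ENV))   # REDIS_URL missing, REDIS_DSN unused
```

When both sets are empty, a plain dataFrom sync needs no per-key remapping between the cloud secret and the Kubernetes Secret.
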
release.yml builds and pushes both images to GHCR, then deploys via a
reusable helm-deploy composite action: push to main -> staging, tag v* ->
production (gated by environment reviewers). Deploys pin to the commit SHA
with --wait --atomic (auto-rollback). deploy-validate.yml lints the chart
(kubeconform) and Terraform (fmt/validate) on PRs touching deploy/**.
actionlint-clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
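
The routing rule in the commit above (main to staging, v* tags to production, pinned to the commit SHA) can be sketched as two small functions. This is an illustration of the rule, not the contents of release.yml: the release name and the `image.tag` values key are assumptions, while the chart path and the `--wait --atomic` flags come from the PR.

```python
from typing import Optional


def target_environment(git_ref: str) -> Optional[str]:
    """Map a Git ref to a deploy environment, following the rule in the PR text."""
    if git_ref == "refs/heads/main":
        return "staging"
    if git_ref.startswith("refs/tags/v"):
        return "production"
    return None  # any other ref is not deployed


def helm_deploy_command(environment: str, commit_sha: str) -> list[str]:
    """Build a helm invocation pinned to one commit SHA."""
    return [
        "helm", "upgrade", "--install",
        f"querywise-{environment}",          # hypothetical release name
        "deploy/helm/querywise",             # chart path from the PR description
        "--set", f"image.tag={commit_sha}",  # hypothetical values key
        "--wait", "--atomic",                # wait for readiness, roll back on failure
    ]


env = target_environment("refs/tags/v2.0.0")
if env is not None:
    print(" ".join(helm_deploy_command(env, "893fd11")))
```

Pinning to the SHA rather than a moving tag means a rollback redeploys exactly the image that was previously running.
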
backup.sh (pg_dump custom format -> AES-256/openssl) and restore.sh
(decrypt -> pg_restore, guarded by RESTORE_CONFIRM), an in-cluster backup
CronJob example, a DR runbook (backup/restore, region rebuild, Alembic
upgrade path, quarterly rotation with the ENCRYPTION_KEY caveat), and a
production config reference. Scripts are shellcheck-clean. deploy/README.md
ties the whole track together.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CHANGELOG entry under 2.0.0, a Production Deployment section + feature
bullet in README, and the full Packaging & deployability section in
CLAUDE.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@kosminus kosminus merged commit 893fd11 into main Jun 8, 2026
6 checks passed
@kosminus kosminus deleted the feat/packaging-deployability branch June 8, 2026 16:22