experiment(k8s): scaffold k3s-local + Kapsule poc overlays for matchID #55

Merged: rhanka merged 8 commits into dev from experiment/k8s on May 17, 2026

rhanka (Member) commented May 16, 2026

Summary

Scaffold of deploy/k8s/ with a kustomize base + overlays for:

  • overlays/local — k3d / k3s local: NodePort, hostPath PV, no nodeSelector.
  • overlays/poc — Scaleway Kapsule poc: pool=burst nodeSelector, scw-bssd PVCs, Traefik IngressRoute.
  • overlays/prod — placeholder only; production remains out of scope for this POC branch.

This PR lands the POC scaffolding. It is not a production migration and it does not replace the current deploy-remote path.

Scope

Workloads in base/:

| Pod | Image | Port |
| --- | --- | --- |
| deces-backend | matchid/deces-backend:latest | 8080 |
| deces-ui | matchid/deces-ui:latest | 8083 |
| elasticsearch | docker.elastic.co/elasticsearch/elasticsearch:7.17.28 | 9200 |
| redis | redis:7-alpine | 6379 |

ES is pinned to 7.17.28 for the POC quota. The long-term ES-to-surch swap remains a separate experiment.

External Contract

  • rhanka/poc-k8s#3 intake for matchID is merged.
  • The live Kapsule smoke must follow the poc-k8s tenant mechanism: tenant namespace + tenant kubeconfig / KUBE_CONFIG_DATA, or the later OIDC contract if/when it lands there.
  • This PR does not invent a separate production-style SCW credential path for matchID.

Validation

  • Local k8s smoke CI is green:
    • push run 25975527555: smoke-local (k3s in runner) success
    • PR run 25975528412: smoke-local (k3s in runner) success
  • PR code review result: merge-with-followup, no blocker.

Follow-Ups

  • Track B: wire the POC Kapsule smoke according to ../poc-k8s contract, then run it.
  • Track k8s-hardening: do not promote to prod while images are :latest / IfNotPresent, ES is single-node, and app state issues from deploy/k8s/K8S_READINESS_AUDIT.md remain open.

Generated with Claude Code

rhanka and others added 2 commits May 15, 2026 15:14
Adds a `deploy/k8s/` tree to drive matchID on a local k3d cluster and
on the Scaleway Kapsule `poc` cluster (rhanka/poc-k8s). Not wired into
CI/CD yet; meant to be applied by hand for the burst-mode test sessions
described in the matchID onboarding intake against poc-k8s.

Files added:
- deploy/k8s/README.md           local k3d + poc cluster flows, known gaps
- deploy/k8s/Makefile            k3d-up/down, apply-local/poc, port-forward, logs, status
- deploy/k8s/base/               kustomize base
  * namespace.yaml               matchid Namespace (skipped in poc overlay)
  * deces-backend.deployment + service.yaml  matchid/deces-backend:latest, 8080
  * deces-ui.deployment + service.yaml       matchid/deces-ui:latest, 8083
  * elasticsearch.statefulset.yaml           ES 7.17.28, dev profile, JVM -Xmx512m
  * ingress.yaml                              Traefik IngressRoute (deces.local)
  * kustomization.yaml
- deploy/k8s/local/kustomization.yaml        alias overlay pointing at overlays/local/
- deploy/k8s/overlays/local/     k3d / k3s-local
  * NodePort services (UI 30083, backend 30080)
  * hostPath PV for ES (StorageClass matchid-local)
  * drops the privileged sysctl init container (k3s ships with vm.max_map_count high enough)
- deploy/k8s/overlays/poc/       Scaleway Kapsule poc
  * nodeSelector pool=burst + toleration on every workload
  * replicas: 0 at rest on all three workloads (burst-mode tenant)
  * scw-bssd PVC for ES, IngressRoute on matchid-poc.matchid.io with cert-manager TLS
  * deletes the base Namespace (poc-k8s owns it under tenants/matchid/)
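
A minimal sketch of how the poc overlay items above could be expressed in kustomize. The file layout, the toleration key/value/effect, and the placeholder patch name are assumptions for illustration, not the contents of the actual overlay:

```yaml
# overlays/poc/kustomization.yaml (hypothetical fragment)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  # Remove the base Namespace; poc-k8s owns it under tenants/matchid/.
  - patch: |-
      $patch: delete
      apiVersion: v1
      kind: Namespace
      metadata:
        name: matchid
  # Burst-mode scheduling for every Deployment. The name below is a
  # placeholder: the target selector decides which resources are patched.
  - target:
      kind: Deployment
    patch: |-
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: placeholder
      spec:
        replicas: 0
        template:
          spec:
            nodeSelector:
              pool: burst
            tolerations:
              - key: pool            # assumed taint key
                operator: Equal
                value: burst         # assumed taint value
                effect: NoSchedule   # assumed taint effect
```

The Elasticsearch StatefulSet would need an equivalent patch entry with `kind: StatefulSet`, since the target above only matches Deployments.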

Resource sizing matches the poc-k8s intake (PR request/matchid-onboarding):
  deces-backend  100m / 500m   + 256Mi / 512Mi
  deces-ui        50m / 200m   +  64Mi / 128Mi
  elasticsearch  250m / 1500m  + 512Mi / 1Gi

Validation:
  kubectl apply --dry-run=client --validate=false -k overlays/local/  OK
  kubectl apply --dry-run=client --validate=false -k overlays/poc/    OK

Caveats / not yet wired (documented in deploy/k8s/README.md):
- ES version drift: repo Makefiles pin 8.6.1 today; we ship 7.17.28
  here to stay under 1 GiB heap on the poc cluster. Reconciliation is
  part of the surch swap follow-up.
- Surch swap: long-term plan is to drop the ES StatefulSet and point
  deces-backend at the surch tenant's surch-api Service (blocked on
  the DSL inventory in EXPERIMENT_SURCH.md).
- cert-manager / letsencrypt-prod ClusterIssuer is referenced but
  provisioned out-of-band by poc-k8s.
- OIDC auth + SMTP secrets are envFrom: secretRef with optional: true;
  the Secret itself is provisioned out-of-tree.
- No .github/workflows/k8s-*.yml yet; CI/CD wiring is a follow-up
  referenced in the poc-k8s intake (request/matchid-onboarding).
- deces-dataprep (INSEE ingest Job) not manifested yet; read-path
  lands first.
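
For the optional-secret caveat above, a container fragment might look like the following sketch. The Secret name `deces-backend-secrets` is the one cited in the review of this PR; the surrounding fields are assumed:

```yaml
# Deployment fragment (sketch): the pod still starts when the Secret is absent.
containers:
  - name: deces-backend
    image: matchid/deces-backend:latest
    envFrom:
      - secretRef:
          name: deces-backend-secrets
          optional: true   # a missing Secret is not an error; its keys are simply unset
```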

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…veat

- Add "Environment tiers" section: CI=k3s/k3d, Dev=Kapsule poc, Prod=TBD
- Document local prereqs: Docker + kubectl + k3d + ≥15% free disk on /,
  with the diagnostic+fix when kubelet's DiskPressure taint hits.

Caught during a smoke run today: laptop at 99% on / put every Pod in
Pending with FailedScheduling pointing at DiskPressure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
coderabbitai Bot commented May 16, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

rhanka and others added 6 commits May 16, 2026 18:24
…e smoke

Adds .github/workflows/k8s-smoke.yml driving two paths:

- smoke-local (auto on push/PR with deploy/k8s/** changes): installs k3d
  inside the ubuntu-latest runner, brings up a single-node k3s cluster,
  applies overlays/local, waits Traefik CRDs + workload availability,
  curls the deces-backend healthcheck + UI through NodePort. Tear down
  at the end regardless of outcome.

- smoke-poc (workflow_dispatch only): pulls a kubeconfig for the
  Scaleway Kapsule `poc` cluster via the SCW CLI, applies overlays/poc,
  waits availability, curls the IngressRoute (Host:
  matchid-poc.matchid.io). Falls back to port-forward if the Traefik LB
  IP isn't ready yet.

Path-filter on deploy/k8s/** + workflow file to keep cost low.

New secrets needed (header comment in the workflow lists them); the
smoke-local path is the CI gate for this experimental branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comprehensive audit of architectural patterns breaking under Kubernetes:
- 8 P0 findings (in-memory state, file persistence, ES single-node)
- 6 P1 findings (sticky sessions, job timeouts, worker isolation)
- 4 P2 findings (logging, encryption, user DB)

Most critical:
1. OTP store in memory (mail.ts:60) - loses all OTPs on pod restart
2. IP-rate-limit maps (auth.ts:4-5) - routing to different pods bypasses bans
3. Job state arrays (processStream.ts:57-60) - lost on restart, breaks bulk

Effort: 6-8 weeks to K8s-ready (P0: 2w, P1: 2w, P2: 1w).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n diags on cancel

The first runs (#25974574306, #25974575275) cancelled at the 15min job
cap with no diagnostics because `if: failure()` doesn't fire on
cancellation. Patched workflow:

- Split "Wait for deployments" into ES-first then backend/UI so a slow
  ES doesn't eat the budget meant for backend.
- Background poller emits `kubectl get pods -o wide` + events every 30s
  during the wait, so the run logs always show why a pod is unhappy
  even when the parent step times out.
- Diagnostics now triggers on `failure() || cancelled()` so we capture
  state when the runner reaps the job.
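
The three changes above could look roughly like this in a workflow. This is a sketch under assumptions: the step names, the `matchid` namespace, the `elasticsearch` StatefulSet name, and the timeout value are illustrative and not copied from k8s-smoke.yml:

```yaml
steps:
  - name: Wait for Elasticsearch
    run: |
      # Background poller: print pod state and recent events every 30s so the
      # log explains a stuck pod even if this step is cut off by a timeout.
      ( while true; do
          kubectl -n matchid get pods -o wide
          kubectl -n matchid get events --sort-by=.lastTimestamp | tail -n 20
          sleep 30
        done ) &
      POLLER=$!
      trap 'kill "$POLLER" 2>/dev/null || true' EXIT
      kubectl -n matchid rollout status statefulset/elasticsearch --timeout=6m

  - name: Diagnostics
    # failure() alone does not fire when the job is cancelled.
    if: ${{ failure() || cancelled() }}
    run: |
      kubectl -n matchid get pods -o wide
      kubectl -n matchid describe pods
```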

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ES container runs as uid 1000; some storage drivers (hostPath in local
overlay, plain manual PV) don't honour `fsGroup: 1000`, leaving the
mount root-owned. ES then crashes on boot with:

  java.nio.file.AccessDeniedException: /usr/share/elasticsearch/data/nodes
  Caused by: org.elasticsearch.ElasticsearchException: failed to bind service

Add a `fix-data-permissions` busybox init container that chowns the
data dir to 1000:1000 before ES starts. Carried in base (so Kapsule
inherits it harmlessly) and re-stated in the local overlay (since the
overlay was previously setting `initContainers: []` to drop the sysctl
init container).
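
A sketch of such an init container, assuming the data volume is named `data` and using an untagged `busybox` image (neither is confirmed by this commit message):

```yaml
# StatefulSet pod-spec fragment (sketch)
initContainers:
  - name: fix-data-permissions
    image: busybox
    # Run as root so the chown succeeds on a root-owned mount.
    securityContext:
      runAsUser: 0
    command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
    volumeMounts:
      - name: data   # assumed volume name
        mountPath: /usr/share/elasticsearch/data
```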

Caught by CI run #25974962800 — diagnostic poller surfaced the actual
ES stack trace which the original `if: failure()`-only diag step would
have missed (the run cancelled instead of failed cleanly).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…APP_FRONTEND)

Three pod-level crash-loops on the prior smoke run:

1. deces-backend exited with "BullMQ Worker: concurrency must be a
   finite number greater than 0" because BACKEND_JOB_CONCURRENCY and
   BACKEND_CHUNK_CONCURRENCY were unset. Added defaults (2/2) plus
   the rest of the env vars the image's index.js reads at module load
   (APP_DNS, APP_URL, BACKEND_LOG_TIMER, BACKEND_TMP_*, DISPOSABLE_MAIL,
   COMMUNES_JSON, DB_JSON, WIKIDATA_LINKS → /dev/null for non-fatal
   warnings on missing data files).

2. deces-backend ALSO has no Redis to talk to. Added a minimal Redis
   Deployment + Service in base (redis:7.2-alpine, 128MB maxmem,
   ephemeral). Wired REDIS_HOST=redis / REDIS_PORT=6379 into the
   backend env block. Service name 'redis' resolves intra-namespace.

3. deces-ui:latest (built 2026-04-26) ships an older nginx/run.sh that
   checks `APP` (current main checks `APP_FRONTEND`). Setting both env
   vars on the Deployment so the manifest works against either tag.

Workflow patched to wait for redis (2m) before ES (6m) before
backend/ui (4m).
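
A minimal ephemeral Redis along the lines of item 2 might look like the sketch below. The `app: redis` labels and the use of `--maxmemory` as a container argument are assumptions; only the image tag, the 128MB limit, the Service name, and the port come from the description above:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7.2-alpine
          # No volume is mounted: data is lost when the pod goes away.
          args: ["--maxmemory", "128mb"]
          ports:
            - containerPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis   # resolves as "redis" from pods in the same namespace
spec:
  selector:
    app: redis
  ports:
    - port: 6379
      targetPort: 6379
```

With this Service in place, `REDIS_HOST=redis` and `REDIS_PORT=6379` in the backend env block point at it.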

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The deces-ui nginx run.sh substitutes every `<VAR>` placeholder in the
template (nginx.conf.template + default.conf.template) with the value
of the matching env var. Any unreplaced placeholder leaves invalid
syntax like `$<API_USER_SCOPE>` in /etc/nginx/nginx.conf line 19, and
nginx aborts with:

  [emerg] invalid variable name in /etc/nginx/nginx.conf:19

Adds deces-ui-nginx ConfigMap with defaults copied verbatim from
packages/deces-ui/Makefile: API_USER_SCOPE, API_*_LIMIT_RATE,
API_*_BURST, API_READ_TIMEOUT, API_SEND_TIMEOUT, API_MAX_BODY,
NGINX_CSP, GOOGLE_ANALYTICS_ID, GOOGLE_ADSENSE_ID, DATAGOUV_*.
deces-ui Deployment now `envFrom`-s it.

Caught by CI run #25975385301 where backend / redis / ES all reached
Ready 1/1 but deces-ui crashed in nginx config validation. ES went
from CrashLoopBackOff to Running thanks to the previous chown init
container fix.
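
The ConfigMap wiring described above can be sketched as follows. The ConfigMap name comes from this commit message; the values are placeholders (the real defaults live in packages/deces-ui/Makefile and are not reproduced here), and only two of the listed keys are shown:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: deces-ui-nginx
data:
  API_USER_SCOPE: "REPLACE_ME"     # placeholder, not the real default
  API_READ_TIMEOUT: "REPLACE_ME"   # placeholder, not the real default
---
# Deployment fragment (sketch): every key of the ConfigMap becomes an env var,
# so run.sh finds a value for each placeholder in the nginx templates.
spec:
  template:
    spec:
      containers:
        - name: deces-ui
          image: matchid/deces-ui:latest
          envFrom:
            - configMapRef:
                name: deces-ui-nginx
```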

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rhanka (Member, Author) commented May 17, 2026

Verdict

merge-with-followup — POC scaffolding is solid, smoke is green, no security gaps. Follow-ups are honest non-blockers given the experimental gate.

Follow-ups

  • base/elasticsearch.statefulset.yaml:46-65 + overlays/local/elasticsearch.local-storage.yaml:13-24: the local overlay's comment claims it keeps "ONLY the fix-data-permissions init container", but kustomize strategic-merge merges initContainers by name, so the rendered output still includes sysctl-vm-max-map-count. It's harmless on k3d (the || true swallows the read-only sysctl failure), but the comment lies about behavior. Either drop the comment, or use a JSON-6902 patch / $patch: replace to actually replace the list.
  • base/deces-backend.deployment.yaml:25, base/deces-ui.deployment.yaml:28: :latest everywhere. Accepted for POC, but pin a digest before promoting to prod/ overlay — imagePullPolicy: IfNotPresent with :latest will silently freeze on whatever the node first cached.
  • Audit P0 gap: packages/deces-backend/src/webhook.ts:100 declares export const webhookRegistry = new Map<string, WebhookRecord>(), the same in-memory ban/validation pattern as authentification.ts:4-5 (the registry holds 4h-ban records and attempt counters). A pod restart drops all validated webhooks, and multi-replica means inconsistent ban state. Same class of bug as the audit's P0 #1 and #2; worth promoting to P0 in K8S_READINESS_AUDIT.md for the surch swap.
  • base/deces-backend.deployment.yaml:83-86: deces-backend-secrets is optional: true and not shipped. Fine while no auth is wired, but once OTP / Brevo lands, missing-secret-silent-success will hide config drift. Consider flipping optional: false + provisioning via sealed-secrets or external-secrets when the POC actually exercises auth.
  • base/redis.deployment.yaml: ephemeral Deployment (no PVC). BullMQ job state is lost on Redis pod eviction — fine for smoke, but the README only mentions it under "production should use managed Redis"; worth a # WARNING next to replicas: 1 for the next dev.
  • .github/workflows/k8s-smoke.yml:42: concurrency.group keys on target || 'local', so a workflow_dispatch for poc on the same ref won't cancel an in-flight local push run. Probably intentional (different clusters), but worth a comment.
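
For the first follow-up, one way to actually replace the init container list is a JSON 6902 patch in the local overlay. This is a sketch, assuming the StatefulSet is named `elasticsearch` and the data volume is named `data`; the real manifests may differ:

```yaml
# overlays/local/kustomization.yaml (hypothetical fragment)
patches:
  - target:
      group: apps
      version: v1
      kind: StatefulSet
      name: elasticsearch   # assumed resource name
    patch: |-
      # "replace" swaps the whole list, so sysctl-vm-max-map-count is gone
      # from the rendered output instead of being merged back in by name.
      - op: replace
        path: /spec/template/spec/initContainers
        value:
          - name: fix-data-permissions
            image: busybox
            command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
            volumeMounts:
              - name: data   # assumed volume name
                mountPath: /usr/share/elasticsearch/data
```

Rendering the overlay with `kubectl kustomize overlays/local/` and checking the initContainers list is a quick way to confirm which containers survive.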

Nits

  • base/deces-ui.configmap.yaml: several values are unquoted strings with spaces (e.g. API_AGG_GLOBAL_BURST: 30 nodelay). YAML accepts it, but quote them for grep-friendliness and parser-agnosticism.
  • Audit mail.ts:60, authentification.ts:4-5, processStream.ts:57-60 claims spot-checked and accurate.
  • Smoke run #25975527555 confirms kubectl apply -k overlays/local/ reconciles end-to-end with {"msg":"OK"} on /deces/api/v1/healthcheck.

rhanka marked this pull request as ready for review May 17, 2026 10:47
rhanka merged commit b689c6f into dev May 17, 2026
13 of 14 checks passed