Home
This wiki is the operational and deployment companion to noetl/ops. Topics that live in this repository (Kubernetes manifests, deployment playbooks, CI/CD, infrastructure automation) have their reference pages here.
For NoETL application documentation (Python API, DSL semantics, the v2 distributed-runtime spec, etc.) see the noetl/noetl wiki.
Where the manifests live. As of the Scope B consolidation (May 2026), all NoETL operational manifests live exclusively in noetl/ops/ci/manifests/. The previous parallel copy at noetl/noetl/ci/manifests/ was deleted; only a MOVED.md breadcrumb remains there. The automation/development/noetl.yaml playbook reads from local ci/manifests/... paths (no more cross-repo $NOETL_REPO/ci/manifests/...).
| Page | What |
|---|---|
| KEDA Scaler | Worker-pool autoscaling via NATS JetStream consumer lag. Install Helm chart + apply ScaledObject. |
| NATS Supercluster | Multi-cluster JetStream topology with gateway-meshed clusters. Apply 2-cluster reference manifest. |
| System worker pool | Proposed. Deploy topology for the privileged worker-system-pool that runs platform-internal logic (auth, RBAC, scheduled cleanups) as WASM-compiled NoETL playbooks. Tracked under noetl/ai-meta#45 + noetl/ai-meta#46. |
| mTLS (Rust stack) | cert-manager-issued mutual TLS between noetl-server-rust + noetl-worker-rust (Secrets Wallet Phase 4, noetl/ai-meta#61). ci/manifests/noetl/tls/. |
| Page | What |
|---|---|
| Production monitoring (GMP) | Prod runs Google Managed Prometheus, not VictoriaMetrics. The PodMonitoring (worker+server scrape) + Rules (materializer-lag) under ci/manifests/noetl/gmp/, query recipes, managedAlertmanager pager wiring, and the CQRS PUBLISH_ONLY flip prep status (noetl/ai-meta#103). |
| Page | What |
|---|---|
| GKE Helm install | Install + upgrade NoETL on GKE via the Helm chart + Cloud SQL + PgBouncer + chart-templated KEDA. The GKE deploy path the project supports. |
| Firestore MCP agent | Firestore document, event-log, replay, and batch helper methods used by domain playbooks. |
(The automation/development/noetl.yaml kind playbook is currently documented inline in the noetl/noetl wiki under operational sections; it will migrate here in a future Scope B refactor.)
Reproducible end-to-end smoke tests for individual feature surfaces. Each rig is <feature>-validation.yaml + validate-<feature>.sh + validate-<feature>.sql.
| Rig | What it exercises | Worker pool |
|---|---|---|
| rust-worker-r2-validation | R-2.1 cross-node durable PUT, R-2.1 colocated shm cache, R-2.2 Arrow IPC encoding, producer-side credential scrub. Same PIN_RUST_WORKER=1 auto-pinning shape as the other Rust-only rigs (scales Python worker pool → 0, waits for drain, restores on exit via cleanup trap). SQL probes filter by execution_id = :exec_id passed via psql -v since worker_id only lands on command.claimed events under the post-EE-4 schema. | Rust |
| result-fetch-validation | result_fetch tool kind (noetl-tools 2.11+): producer over-budget Arrow IPC ref → fetch_via_flight + fetch_via_http via the playbook surface. Scales the Python worker pool to 0 and waits for full pod drain to pin commands to the Rust worker (the Phase A over-budget branch only fires on the Rust side). | Rust |
| flight-tls-validation | R-2.3 Phase C2 full trust boundary: server TLS (C2.1) + client TLS (C2.2) + bearer-token middleware (C2.3) + mTLS (C2.4) all on, talking through the result_fetch tool kind. Companion generate-flight-tls.sh bootstraps the certs + Secrets via openssl (private tmpdir, no repo leakage); --off reverts. Production swap to cert-manager is drop-in: the Secret shape stays the same. | Rust |
| validate-shard-drift-guard.sh | Phase F R3b end-to-end: queries noetl-server GET /api/runtime/shard-info (R3b-1) and noetl-gateway GET /sharding/preview (R3b-2) for a battery of (execution_id, shard_count) pairs and asserts shard_index agreement. Catches runtime drift the unit-test pinning can't see: twox-hash crate version split, SHARD_HASH_SEED divergence, i64→bytes endianness flip. No NoETL playbook execution (it probes diagnostic endpoints only); auto-cleans port-forwards on exit. | n/a (control-plane probe) |
| validate-shard-routing-n2.sh | Phase F R4-5 end-to-end: validates the in-server DbPoolMap routing. Creates noetl_shard_0 + noetl_shard_1 + noetl_cluster databases on the existing postgres pod (cheap path that exercises routing without 3 separate Postgres pods; per-pod isolation is a Phase G concern), applies the noetl schema DDL to each, patches the noetl-server-rust deployment with NOETL_SHARDS + NOETL_CLUSTER_DSN env vars, spawns N executions via POST /api/execute, asserts each landed on the predicted shard per shard_for(execution_id, 2) (queries each per-shard DB directly + cross-checks against the R3b-1 endpoint), then re-runs the R3b drift-guard against the sharded server. Auto-reverts the deployment patch on exit (trap EXIT); idempotent re-creation of databases. | Rust |
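
The auto-pinning shape shared by the Rust-only rigs (scale the Python worker pool to 0, wait for the pods to drain, restore the pool on exit) can be sketched in shell roughly as follows. This is an illustrative sketch, not the rigs' actual code: the namespace, Deployment name, label selector, and function names are assumptions. Only the PIN_RUST_WORKER=1 switch and the scale, drain, restore sequence come from the table above.

```shell
#!/bin/sh
# Sketch only. NAMESPACE, PY_WORKER_DEPLOY, PY_WORKER_SELECTOR and both
# function names are hypothetical; adjust them to the real cluster layout.
set -eu

NAMESPACE="${NAMESPACE:-noetl}"
PY_WORKER_DEPLOY="${PY_WORKER_DEPLOY:-noetl-worker}"
PY_WORKER_SELECTOR="${PY_WORKER_SELECTOR:-app=noetl-worker}"
ORIGINAL_REPLICAS=""

# Scale the Python worker pool back to the replica count recorded earlier.
restore_python_workers() {
  if [ -n "$ORIGINAL_REPLICAS" ]; then
    kubectl scale deployment "$PY_WORKER_DEPLOY" -n "$NAMESPACE" \
      --replicas="$ORIGINAL_REPLICAS"
  fi
}

# Scale the Python worker pool to 0 and wait until its pods are gone,
# so that every command is claimed by the Rust worker.
pin_rust_worker() {
  # Record the current replica count so the exit trap can restore it.
  ORIGINAL_REPLICAS="$(kubectl get deployment "$PY_WORKER_DEPLOY" \
    -n "$NAMESPACE" -o jsonpath='{.spec.replicas}')"
  # Register the cleanup before changing anything in the cluster.
  trap restore_python_workers EXIT
  kubectl scale deployment "$PY_WORKER_DEPLOY" -n "$NAMESPACE" --replicas=0
  kubectl wait --for=delete pod -l "$PY_WORKER_SELECTOR" \
    -n "$NAMESPACE" --timeout=120s
}

if [ "${PIN_RUST_WORKER:-0}" = "1" ]; then
  pin_rust_worker
fi
```

Registering the trap before the scale-down means a failure partway through the rig still restores the Python pool.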
Each .sh is self-contained: registers the playbook, kicks off the execution, polls completion, runs the SQL probes, samples the worker /metrics. Pairs with agents/rules/deployment-validation.md — anything that ships in a container image MUST run through one of these before GKE rollout.
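
As a rough illustration of the rig layout and of the SQL-probe step, the sketch below derives the three file names for a feature and passes the execution id to the probe file as the psql variable :exec_id. The helper names rig_files and run_sql_probes and the PG_DSN environment variable are hypothetical; the real scripts may wire this differently.

```shell
#!/bin/sh
# Sketch only. rig_files, run_sql_probes and PG_DSN are hypothetical names.
set -eu

# Print the three files that make up the rig for one feature, one per line.
rig_files() {
  feature="$1"
  printf '%s\n' \
    "${feature}-validation.yaml" \
    "validate-${feature}.sh" \
    "validate-${feature}.sql"
}

# Run the rig's SQL probes. The execution id is exposed to the SQL file as
# :exec_id, so probes can filter with "execution_id = :exec_id".
# PG_DSN is assumed to hold a Postgres connection string.
run_sql_probes() {
  feature="$1"
  exec_id="$2"
  psql "$PG_DSN" -v ON_ERROR_STOP=1 -v exec_id="$exec_id" \
    -f "validate-${feature}.sql"
}
```

For example, `rig_files flight-tls` lists flight-tls-validation.yaml, validate-flight-tls.sh, and validate-flight-tls.sql.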
- noetl/noetl wiki: application code.
- noetl/noetl: application repo.
- noetl/ops: this repo (deployment automation).
- noetl/docs: design docs + feature specs (the v2 distributed-runtime spec lives there).