runtime shape
Status: Design proposal — tracked under noetl/ai-meta#45 (compiled rewrite) and noetl/ai-meta#46 (system pool + WASM plug-ins). Not yet implemented in this crate.
The engineer-facing architecture rationale lives in the docs site: System Worker Pool and WASM Plug-in Surface. This page is the implementation-level companion — what shape this Rust crate takes and which binaries it produces.
The Rust replacement for the Python control plane is one crate
that ships one image with four binaries. Each Helm
Deployment runs the same image with a different args: to select
its role.
repos/server/
├── Cargo.toml               # name = "noetl-server", produces 4 bins
├── src/
│   ├── lib.rs               # shared library, see "Shared library" below
│   ├── bin/
│   │   ├── server.rs        # --mode=server     HTTP control plane
│   │   ├── publisher.rs     # --mode=publisher  Postgres outbox → NATS
│   │   ├── projector.rs     # --mode=projector  NATS → noetl.event
│   │   └── system_pool.rs   # --mode=system     wasmtime host + dispatch
│   └── wasm/                # only compiled into system_pool.rs
│       ├── host.rs
│       ├── cache.rs
│       └── caller.rs
This shape is Postgres-style: one source tree, multiple roles via separate binaries that share the same library. See ADR for the design rationale.
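To make the role selection concrete, here is a minimal sketch of how a `--mode=` argument could be mapped to a role. The `Mode` enum and `mode_from_args` are hypothetical names used only for illustration; the page does not say how the crate parses its arguments, and a real binary would more likely use an argument-parsing crate.

```rust
use std::str::FromStr;

/// Roles the shared image can run as. The accepted strings mirror the
/// `--mode=` values shown in the tree; the enum itself is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Server,
    Publisher,
    Projector,
    System,
}

impl FromStr for Mode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "server" => Ok(Mode::Server),
            "publisher" => Ok(Mode::Publisher),
            "projector" => Ok(Mode::Projector),
            "system" => Ok(Mode::System),
            other => Err(format!("unknown mode: {other}")),
        }
    }
}

/// Find the first `--mode=<value>` argument and parse it.
/// Returns an error when the flag is missing or the value is unknown.
pub fn mode_from_args<I, S>(args: I) -> Result<Mode, String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    for arg in args {
        if let Some(value) = arg.as_ref().strip_prefix("--mode=") {
            return value.parse();
        }
    }
    Err("missing --mode=<server|publisher|projector|system>".to_string())
}
```

A caller would pass `std::env::args().skip(1)` and branch on the returned `Mode`; failing fast on an unknown value keeps a mistyped Helm `args:` entry from silently starting the wrong role.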
Every binary depends on the same set of primitives:
| Module | Purpose | Used by |
|---|---|---|
| config | env → typed config | all bins |
| db::pool | sqlx PgPool with min/max connections | server, publisher, projector |
| db::models | noetl.event, noetl.command, noetl.execution row types | server, projector |
| nats::client | async-nats wrapper, JetStream context, publish + subscribe helpers | server, publisher, projector, system_pool |
| nats::subjects | subject helpers: event_subject_for_exec, command_subject_for_pool, system_subject_for_kind | server, publisher, projector, system_pool |
| envelope | the canonical event envelope per PR-EE-3 (snowflake event_id, meta.attempts, status taxonomy) | server, projector, system_pool |
| scrub | response-boundary credential redaction (scrub_in_place) | server, projector |
| metrics | prometheus-crate registry + the per-binary metric defs | all bins |
| tracing_init | structured tracing setup, execution_id span helpers per observability.md | all bins |
| snowflake | snowflake-id generator (snowflaked crate, NoETL epoch, machine_id from pod name hash) | server, system_pool |
The 60% of code that every Python pod carries today lives here
exactly once. Each binary's main is a thin shell around the shared
modules.
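As an illustration of the snowflake row in the table, the following is a minimal std-only sketch of an id generator whose machine id is derived from a hash of the pod name. Everything specific here is an assumption: the crate is planned to use `snowflaked`, whose bit layout may differ from the classic 41/10/12 split used below, and `ASSUMED_EPOCH_MS` is a placeholder because the actual NoETL epoch is not stated on this page.

```rust
/// Placeholder epoch in Unix milliseconds (2024-01-01T00:00:00Z).
/// The real NoETL epoch is not given in this page.
pub const ASSUMED_EPOCH_MS: u64 = 1_704_067_200_000;

const MACHINE_BITS: u32 = 10;
const SEQUENCE_BITS: u32 = 12;
const MACHINE_MASK: u64 = (1 << MACHINE_BITS) - 1;
const SEQUENCE_MASK: u64 = (1 << SEQUENCE_BITS) - 1;

/// FNV-1a over the pod name, folded into the 10-bit machine field.
/// A hand-rolled hash keeps the value stable across Rust versions.
pub fn machine_id_from_pod_name(pod_name: &str) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for byte in pod_name.bytes() {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash & MACHINE_MASK
}

pub struct SnowflakeGen {
    machine_id: u64,
    last_ms: u64,
    sequence: u64,
}

impl SnowflakeGen {
    pub fn new(machine_id: u64) -> Self {
        Self { machine_id: machine_id & MACHINE_MASK, last_ms: 0, sequence: 0 }
    }

    /// Produce the next id for the wall-clock reading `now_ms` (Unix ms).
    /// The clock is a parameter so the generator is deterministic to test.
    pub fn next_id(&mut self, now_ms: u64) -> u64 {
        // Never go backwards: reuse the last timestamp if the clock regressed.
        let mut ts = now_ms.saturating_sub(ASSUMED_EPOCH_MS).max(self.last_ms);
        if ts == self.last_ms {
            self.sequence = (self.sequence + 1) & SEQUENCE_MASK;
            if self.sequence == 0 {
                // Sequence exhausted within this millisecond: borrow the next one.
                ts += 1;
            }
        } else {
            self.sequence = 0;
        }
        self.last_ms = ts;
        (ts << (MACHINE_BITS + SEQUENCE_BITS))
            | (self.machine_id << SEQUENCE_BITS)
            | self.sequence
    }
}
```

Two pods whose names hash to the same 10-bit value would share a machine id, so a real deployment needs either a wider field or a collision check; the sketch does not address that.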
src/bin/server.rs builds the axum app, wires routes, listens
on :8082.
Route inventory (parity with the Python noetl.server):
| Method | Path | What |
|---|---|---|
| GET | /api/health | Liveness |
| POST | /api/catalog/list | Browse catalog |
| GET | /api/catalog/{path}/ui_schema | UI metadata for a playbook |
| POST | /api/catalog/register | Register a playbook |
| GET | /api/catalog/resource | Fetch raw playbook YAML |
| POST | /api/execute | Start a playbook execution |
| GET | /api/executions/{id} | Status |
| GET | /api/executions/{id}/events | Event stream |
| POST | /api/events | Worker writes an event (the put_result boundary) |
| GET | /api/events/{id}/result | Fetch a stored result |
| POST | /api/credentials/... | Credential CRUD |
| GET | /api/runtime/contract | The runtime contract the gateway and SPA depend on |
| ... | ... | (see Python noetl.server.api for the full list) |
What's the auth+RBAC layer? In the plug-in design, the
server's per-route auth middleware delegates to the
system/auth and system/rbac playbooks (dispatched onto
worker-system-pool). See the ADR for the capability
surface. Until the plug-in ring lands, auth stays inline in the
server binary.
src/bin/publisher.rs runs a single loop:
async fn main() -> anyhow::Result<()> {
    // 1. PG LISTEN on `noetl.event_outbox_notify`
    // 2. On wake (or 100ms tick), SELECT FOR UPDATE SKIP LOCKED ...
    // 3. For each row: PUBLISH to NATS NOETL_EVENTS with the row's subject
    // 4. DELETE the row (or mark published)
    // 5. Loop
}

Latency win over today's Python publisher: LISTEN/NOTIFY instead
of poll-tail. The server inserts into event_outbox inside the same
txn as the business write, a trigger on that insert issues NOTIFY,
and Postgres delivers the notification when the transaction
commits, so this publisher wakes within microseconds. The
Python publisher polls (because asyncpg's LISTEN inside FastAPI's
event loop is awkward); tokio-postgres handles LISTEN cleanly.
Scaling: 1 replica. Outbox publishing is single-writer by
design (FOR UPDATE SKIP LOCKED makes it safe to run >1, but
there's no throughput gain).
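The wake-then-drain shape of the loop above can be sketched with std types only. In this illustration an `mpsc` channel stands in for the Postgres LISTEN connection and a `VecDeque` stands in for the outbox table; `OutboxRow`, `Wake`, `wait_for_wake`, and `drain_outbox` are hypothetical names, and the real binary would use async Postgres and NATS clients instead.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// One pending outbox row; field names are illustrative.
#[derive(Debug, Clone, PartialEq)]
pub struct OutboxRow {
    pub id: u64,
    pub subject: String,
    pub payload: String,
}

/// Why the loop woke up.
#[derive(Debug, PartialEq)]
pub enum Wake {
    Notified,
    Tick,
    Closed,
}

/// Block until a notification arrives or the tick elapses, whichever is first.
/// The channel stands in for a Postgres LISTEN connection.
pub fn wait_for_wake(notify: &Receiver<()>, tick: Duration) -> Wake {
    match notify.recv_timeout(tick) {
        Ok(()) => Wake::Notified,
        Err(RecvTimeoutError::Timeout) => Wake::Tick,
        Err(RecvTimeoutError::Disconnected) => Wake::Closed,
    }
}

/// Publish rows in order and remove each one only after its publish succeeded.
/// Stops at the first failure so the failed row is retried on the next wake.
pub fn drain_outbox<F>(outbox: &mut VecDeque<OutboxRow>, mut publish: F) -> usize
where
    F: FnMut(&OutboxRow) -> Result<(), String>,
{
    let mut published = 0;
    while let Some(row) = outbox.front() {
        if publish(row).is_err() {
            break;
        }
        outbox.pop_front();
        published += 1;
    }
    published
}
```

Deleting a row only after a successful publish gives at-least-once delivery: a crash between publish and delete republishes the row, which is why the consuming side needs an idempotent insert.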
Metrics (per observability.md):
- noetl_publisher_published_total (counter): rows published
- noetl_publisher_lag_seconds (gauge): time between event_outbox.created_at and NATS publish ack
- noetl_publisher_publish_duration_seconds (histogram): per-row publish latency
src/bin/projector.rs consumes NOETL_EVENTS and batch-INSERTs
into noetl.event.
async fn main() -> anyhow::Result<()> {
    // 1. Subscribe to NOETL_EVENTS consumer NOETL_PROJECTOR_NATS_CONSUMER
    //    (which equals the pod name, so consumers are sharded per pod)
    // 2. Accumulate N events or wait T ms, whichever first
    // 3. INSERT INTO noetl.event VALUES (...), (...), ...
    //    with ON CONFLICT (event_id) DO NOTHING (idempotent)
    // 4. ACK the batch
    // 5. Loop
}

Sharding model preserves today's Python projector behaviour:
one StatefulSet, each pod's consumer is named for the pod (e.g.
noetl-projector-0), each consumer subscribes to a filter subject
that hashes by execution_id. Same NATS supercluster manifests in
noetl/ops/ci/manifests/nats-supercluster/.
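Steps 2 and 3 of the projector loop ("N events or T ms, whichever first", then an idempotent insert) can be sketched with std types. This is an illustration under assumptions: an `mpsc` channel stands in for the NATS consumer, a `BTreeMap` keyed by event_id stands in for noetl.event, and the `Event` fields, `next_batch`, and `insert_idempotent` are hypothetical names.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Minimal event shape for the sketch; the real envelope has more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub event_id: u64,
    pub execution_id: u64,
    pub payload: String,
}

/// Collect up to `max_batch` events, or whatever arrived before `max_wait`
/// elapsed, whichever comes first. May return an empty batch.
pub fn next_batch(rx: &Receiver<Event>, max_batch: usize, max_wait: Duration) -> Vec<Event> {
    let deadline = Instant::now() + max_wait;
    let mut batch = Vec::with_capacity(max_batch);
    while batch.len() < max_batch {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(event) => batch.push(event),
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    batch
}

/// In-memory stand-in for `INSERT ... ON CONFLICT (event_id) DO NOTHING`:
/// rows whose event_id already exists are skipped. Returns rows inserted.
pub fn insert_idempotent(table: &mut BTreeMap<u64, Event>, batch: &[Event]) -> usize {
    let mut inserted = 0;
    for event in batch {
        if !table.contains_key(&event.event_id) {
            table.insert(event.event_id, event.clone());
            inserted += 1;
        }
    }
    inserted
}
```

Because a redelivered batch inserts zero new rows, the projector can ACK after the insert without tracking which messages it has already seen.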
Why this stays compiled (not a plug-in): Throughput. Peak load is thousands of events per second; WASM's 2-5× overhead and the cross-boundary memory copies for each row payload would make this the bottleneck.
Metrics:
- noetl_projector_inserted_total (counter)
- noetl_projector_batch_size (histogram)
- noetl_projector_lag_seconds (gauge): NATS-consumer-side lag
- noetl_projector_insert_duration_seconds (histogram)
src/bin/system_pool.rs is a NATS worker — same pull-claim-execute
shape as noetl-worker-rust — but its dispatch target is a
WASM module loaded from the catalog rather than a tool kind from
noetl-tools.
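#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // instantiate_async / call_async need async support enabled on the engine.
    let mut config = wasmtime::Config::new();
    config.async_support(true);
    let engine = wasmtime::Engine::new(&config)?;
    // build_linker is a hypothetical helper that registers the host_* capabilities.
    let linker = build_linker(&engine)?;
    let module_cache = ModuleCache::new(&engine); // (path, version, digest) -> Module
    loop {
        let cmd = claim_next_command().await?; // from NOETL_COMMANDS
        let playbook = fetch_catalog_entry(&cmd.path).await?; // borrow: cmd is used again below
        let module = module_cache.get_or_compile(&playbook).await?;
        let mut store = wasmtime::Store::new(&engine, /* host capabilities */);
        let instance = linker.instantiate_async(&mut store, &module).await?;
        let execute = instance
            .get_func(&mut store, "execute")
            .ok_or_else(|| anyhow::anyhow!("module has no `execute` export"))?;
        // Func::call_async writes return values into a caller-provided slice.
        let mut results = vec![wasmtime::Val::I32(0); execute.ty(&store).results().len()];
        execute.call_async(&mut store, ..., &mut results).await?;
        write_result(cmd, results).await?;
    }
}

Capability surface (granted to system WASM modules via
wasmtime::Linker):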
- host_put_event(envelope): emit an event (server boundary)
- host_get_credential(name): read from keychain
- host_query_pg(sql, params): execute SQL (logged + audited)
- host_read_event_log(execution_id, after_id): read events
- host_mutate_catalog(path, yaml): register / update catalog entry
- host_system_call(name, args): extensible kernel-like surface
Tenant-supplied WASM modules (e.g. acme/system/auth_with_saml)
run with a restricted linker that omits host_mutate_catalog,
host_read_event_log (except own), and host_system_call.
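The difference between the two linkers can be modelled as a capability set per module tier. The sketch below is an assumption-laden illustration, not the host's actual API: `Capability`, `ModuleTier`, `granted`, and `authorize` are hypothetical names, and "except own" is modelled as a separate, narrower `ReadEventLogOwn` capability because the page does not define the scoping mechanism. The real restriction would more likely be enforced by simply not registering a function on the restricted wasmtime::Linker.

```rust
use std::collections::BTreeSet;

/// One variant per host_* function in the list above. Reading the event log
/// is split in two so the tenant tier can keep the "own" scope only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Capability {
    PutEvent,
    GetCredential,
    QueryPg,
    ReadEventLogAny,
    ReadEventLogOwn,
    MutateCatalog,
    SystemCall,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModuleTier {
    System,
    Tenant,
}

/// Capabilities registered on the linker for each tier.
pub fn granted(tier: ModuleTier) -> BTreeSet<Capability> {
    use Capability::*;
    match tier {
        ModuleTier::System => BTreeSet::from([
            PutEvent,
            GetCredential,
            QueryPg,
            ReadEventLogAny,
            MutateCatalog,
            SystemCall,
        ]),
        // Tenant modules lose catalog mutation, system calls, and
        // cross-execution event reads.
        ModuleTier::Tenant => BTreeSet::from([
            PutEvent,
            GetCredential,
            QueryPg,
            ReadEventLogOwn,
        ]),
    }
}

/// Reject a host call the tier was never granted.
pub fn authorize(tier: ModuleTier, cap: Capability) -> Result<(), String> {
    if granted(tier).contains(&cap) {
        Ok(())
    } else {
        Err(format!("{cap:?} is not granted to {tier:?} modules"))
    }
}
```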
Module cache invalidation: keyed on (path, version, digest).
Catalog version bump → cache miss → recompile → next claim picks
up the new module. Hot reload without process restart.
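The invalidation rule can be sketched as a map keyed on the full (path, version, digest) triple: a version bump produces a new key, so the lookup misses and triggers a compile. The types below are hypothetical; `M` stands in for `wasmtime::Module`, and the `compile` closure stands in for the real compilation step.

```rust
use std::collections::HashMap;

/// Cache key: any change to version or digest yields a different key.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ModuleKey {
    pub path: String,
    pub version: u32,
    pub digest: String,
}

/// Cache of compiled artifacts. `M` stands in for `wasmtime::Module`.
pub struct ModuleCache<M> {
    entries: HashMap<ModuleKey, M>,
    compiles: usize,
}

impl<M> ModuleCache<M> {
    pub fn new() -> Self {
        Self { entries: HashMap::new(), compiles: 0 }
    }

    /// Return the cached module for `key`, compiling it on a miss.
    pub fn get_or_compile<F>(&mut self, key: &ModuleKey, compile: F) -> &M
    where
        F: FnOnce(&ModuleKey) -> M,
    {
        if !self.entries.contains_key(key) {
            let module = compile(key);
            self.compiles += 1;
            self.entries.insert(key.clone(), module);
        }
        &self.entries[key]
    }

    /// Number of compilations performed so far (useful as a metric).
    pub fn compiles(&self) -> usize {
        self.compiles
    }
}
```

This sketch never evicts superseded versions, so a long-running pool would also need a size bound or an eviction step for old keys.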
# Single ConfigMap shared by all four deployments
apiVersion: v1
kind: ConfigMap
metadata:
  name: noetl-server-config
data:
  POSTGRES_HOST: postgres.postgres.svc.cluster.local
  NATS_URL: nats://nats.nats.svc.cluster.local:4222
  ...
---
# Deployment 1: HTTP server
apiVersion: apps/v1
kind: Deployment
metadata:
  name: noetl-server
spec:
  replicas: 1
  template:
    spec:
      containers:
        - name: server
          image: ghcr.io/noetl/server:<v>
          args: ["--mode=server"]
          envFrom:
            - configMapRef: { name: noetl-server-config }
---
# Deployment 2: outbox publisher
apiVersion: apps/v1
kind: Deployment
metadata:
  name: noetl-outbox-publisher
spec:
  replicas: 1
  template:
    spec:
      containers:
        - name: publisher
          image: ghcr.io/noetl/server:<v>   # same image
          args: ["--mode=publisher"]
---
# Deployment 3: projector (StatefulSet for stable pod names → consumer names)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: noetl-projector
spec:
  serviceName: noetl-projector
  replicas: 1   # KEDA scales N
  template:
    spec:
      containers:
        - name: projector
          image: ghcr.io/noetl/server:<v>
          args: ["--mode=projector"]
          env:
            - name: NOETL_PROJECTOR_SHARD_ID
              valueFrom: { fieldRef: { fieldPath: metadata.name } }
---
# Deployment 4: system worker pool (KEDA-scaled on NATS lag)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: noetl-worker-system-pool
spec:
  replicas: 1   # KEDA scales 1-N
  template:
    spec:
      containers:
        - name: system-pool
          image: ghcr.io/noetl/server:<v>
          args: ["--mode=system"]
          env:
            - name: NATS_CONSUMER
              value: noetl_worker_pool_system
            - name: NATS_FILTER_SUBJECT
              value: noetl.commands.system.>

Operations team owns one image lifecycle, one chart, one set of
ConfigMaps. See the
noetl-ops wiki (system worker pool deploy page)
for the full manifest reference.
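The `NATS_FILTER_SUBJECT` value in the last manifest relies on NATS subject wildcards: `*` matches exactly one token and `>` matches one or more trailing tokens. A minimal sketch of that matching rule follows; `subject_matches` is a hypothetical helper for illustration (the NATS server does this matching itself), and it assumes well-formed filters where `>` appears only as the last token.

```rust
/// Return true when `subject` is matched by the NATS `filter`.
/// `*` matches exactly one token; `>` matches one or more trailing tokens.
/// Assumes a well-formed filter (`>` only in the final position).
pub fn subject_matches(filter: &str, subject: &str) -> bool {
    let mut filter_tokens = filter.split('.');
    let mut subject_tokens = subject.split('.');
    loop {
        match (filter_tokens.next(), subject_tokens.next()) {
            // Tail wildcard: needs at least one remaining subject token.
            (Some(">"), Some(_)) => return true,
            // Single-token wildcard: consume one token on each side.
            (Some("*"), Some(_)) => continue,
            // Literal tokens must be equal.
            (Some(f), Some(s)) if f == s => continue,
            // Both exhausted at the same time: full match.
            (None, None) => return true,
            // Length mismatch or differing literal.
            _ => return false,
        }
    }
}
```

Under this rule the filter `noetl.commands.system.>` accepts any subject with at least one token after `system` and rejects the bare `noetl.commands.system`, so the subject helper for system commands has to append at least one trailing token.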
Smallest surface first to amortise the shared-library scaffolding:
1. publisher (smallest LoC, narrowest surface). Adds the shared library skeleton.
2. projector (reuses the shared envelope + NATS code).
3. server (largest; everything else is now proven).
4. system_pool (depends on server for catalog reads).
Steps 1 + 2 drop two of the three Python pods even before
server ships — useful intermediate checkpoint.
Step 4 unblocks the plug-in ring per the
ADR and noetl/ai-meta#46.