runtime shape

Runtime shape — compiled core and plug-in ring

Status: Design proposal — tracked under noetl/ai-meta#45 (compiled rewrite) and noetl/ai-meta#46 (system pool + WASM plug-ins). Not yet implemented in this crate.

The engineer-facing architecture rationale lives in the docs site: System Worker Pool and WASM Plug-in Surface. This page is the implementation-level companion — what shape this Rust crate takes and which binaries it produces.

The crate produces multiple binaries

The Rust replacement for the Python control plane is one crate that ships one image with four binaries. Each Helm Deployment runs the same image with a different args: to select its role.

repos/server/
├── Cargo.toml              # name = "noetl-server", produces 4 bins
├── src/
│   ├── lib.rs              # shared library — see "Shared library" below
│   ├── bin/
│   │   ├── server.rs       # --mode=server     HTTP control plane
│   │   ├── publisher.rs    # --mode=publisher  Postgres outbox → NATS
│   │   ├── projector.rs    # --mode=projector  NATS → noetl.event
│   │   └── system_pool.rs  # --mode=system     wasmtime host + dispatch
│   └── wasm/               # only compiled into system_pool.rs
│       ├── host.rs
│       ├── cache.rs
│       └── caller.rs

This shape is Postgres-style: one source tree, multiple roles via separate binaries that share the same library. See ADR for the design rationale.

Shared library — what goes in `src/lib.rs`

Every binary depends on the same set of primitives:

Module	Purpose	Used by
`config`	env → typed config	all bins
`db::pool`	sqlx PgPool with min/max connections	server, publisher, projector
`db::models`	`noetl.event`, `noetl.command`, `noetl.execution` row types	server, projector
`nats::client`	async-nats wrapper, JetStream context, publish + subscribe helpers	server, publisher, projector, system_pool
`nats::subjects`	subject helpers — `event_subject_for_exec`, `command_subject_for_pool`, `system_subject_for_kind`	server, publisher, projector, system_pool
`envelope`	the canonical event envelope per PR-EE-3 (snowflake event_id, `meta.attempts`, status taxonomy)	server, projector, system_pool
`scrub`	response-boundary credential redaction (`scrub_in_place`)	server, projector
`metrics`	`prometheus`-crate registry + the per-binary metric defs	all bins
`tracing_init`	structured `tracing` setup, `execution_id` span helpers per `observability.md`	all bins
`snowflake`	snowflake-id generator (snowflaked crate, NoETL epoch, machine_id from pod name hash)	server, system_pool

The 60% of code that any single Python pod has today lives here once. Each binary's main is a thin shell around the shared modules.

Binary 1 — `server` (HTTP control plane)

src/bin/server.rs builds the axum app, wires routes, listens on :8082.

Route inventory (parity with the Python noetl.server):

Method	Path	What
GET	`/api/health`	Liveness
POST	`/api/catalog/list`	Browse catalog
GET	`/api/catalog/{path}/ui_schema`	UI metadata for a playbook
POST	`/api/catalog/register`	Register a playbook
GET	`/api/catalog/resource`	Fetch raw playbook YAML
POST	`/api/execute`	Start a playbook execution
GET	`/api/executions/{id}`	Status
GET	`/api/executions/{id}/events`	Event stream
POST	`/api/events`	Worker writes an event (the `put_result` boundary)
GET	`/api/events/{id}/result`	Fetch a stored result
POST	`/api/credentials/...`	Credential CRUD
GET	`/api/runtime/contract`	The runtime contract the gateway and SPA depend on
GET	`/api/worker/pools`	List worker pool runtime registrations
...	...	(see Python `noetl.server.api` for the full list)

Implementation status (Phase A of noetl/ai-meta#49): the inventory above is the parity target. Currently wired in Rust: everything except /api/catalog/{path}/ui_schema (the YAML inference helper port — tracked as noetl/server#18). /api/runtimes (a no-filter list of all runtime registrations) was a Rust-only extension and is removed in noetl/server#19 until the Python backport lands.

What's the auth+RBAC layer? In the plug-in design, the server's per-route auth middleware delegates to the system/auth and system/rbac playbooks (dispatched onto worker-system-pool). See the ADR for the capability surface. Until the plug-in ring lands, auth stays inline in the server binary.

Binary 2 — `publisher` (outbox → NATS)

src/bin/publisher.rs runs a single loop:

async fn main() -> anyhow::Result<()> {
    // 1. PG LISTEN on `noetl.event_outbox_notify`
    // 2. On wake (or 100ms tick), SELECT FOR UPDATE SKIP LOCKED ...
    // 3. For each row: PUBLISH to NATS NOETL_EVENTS with the row's subject
    // 4. DELETE the row (or mark published)
    // 5. Loop
}

Latency win over today's Python publisher: LISTEN/NOTIFY instead of poll-tail. When the server inserts into event_outbox inside the same txn as the business write, the post-commit trigger fires NOTIFY, and this publisher wakes within microseconds. The Python publisher polls (because asyncpg's LISTEN inside FastAPI's event loop is awkward); tokio-postgres handles LISTEN cleanly.

Scaling: 1 replica. Outbox publishing is single-writer by design (FOR UPDATE SKIP LOCKED makes it safe to run >1, but there's no throughput gain).

Metrics (per observability.md):

noetl_publisher_published_total (counter) — rows published
noetl_publisher_lag_seconds (gauge) — time between event_outbox.created_at and NATS publish ack
noetl_publisher_publish_duration_seconds (histogram) — per-row publish latency

Binary 3 — `projector` (NATS → event log)

src/bin/projector.rs consumes NOETL_EVENTS and batch-INSERTs into noetl.event.

async fn main() -> anyhow::Result<()> {
    // 1. Subscribe to NOETL_EVENTS consumer NOETL_PROJECTOR_NATS_CONSUMER
    //    (which equals the pod name — sharded)
    // 2. Accumulate N events or wait T ms, whichever first
    // 3. INSERT INTO noetl.event VALUES (...), (...), ...
    //    with ON CONFLICT (event_id) DO NOTHING (idempotent)
    // 4. ACK the batch
    // 5. Loop
}

Sharding model preserves today's Python projector behaviour: one StatefulSet, each pod's consumer is named for the pod (e.g. noetl-projector-0), each consumer subscribes to a filter subject that hashes by execution_id. Same NATS supercluster manifests in noetl/ops/ci/manifests/nats-supercluster/.

Why this stays compiled (not a plug-in): Throughput. Peak load is thousands of events per second; WASM's 2-5× overhead and the cross-boundary memory copies for each row payload would make this the bottleneck.

Metrics:

noetl_projector_inserted_total (counter)
noetl_projector_batch_size (histogram)
noetl_projector_lag_seconds (gauge) — NATS-consumer-side lag
noetl_projector_insert_duration_seconds (histogram)

Binary 4 — `system_pool` (wasmtime host + dispatch)

src/bin/system_pool.rs is a NATS worker — same pull-claim-execute shape as noetl-worker-rust — but its dispatch target is a WASM module loaded from the catalog rather than a tool kind from noetl-tools.

async fn main() -> anyhow::Result<()> {
    let engine = wasmtime::Engine::default();
    let module_cache = ModuleCache::new(&engine); // (path, version) -> Module

    loop {
        let cmd = claim_next_command().await?;          // from NOETL_COMMANDS
        let playbook = fetch_catalog_entry(cmd.path).await?;
        let module = module_cache.get_or_compile(&playbook).await?;
        let store = wasmtime::Store::new(&engine, /* host capabilities */);
        let instance = linker.instantiate(&mut store, &module).await?;
        let result = instance.get_func("execute")
            .unwrap()
            .call_async(&mut store, ...).await?;
        write_result(cmd, result).await?;
    }
}

Capability surface (granted to system WASM modules via wasmtime::Linker):

host_put_event(envelope) — emit an event (server boundary)
host_get_credential(name) — read from keychain
host_query_pg(sql, params) — execute SQL (logged + audited)
host_read_event_log(execution_id, after_id) — read events
host_mutate_catalog(path, yaml) — register / update catalog entry
host_system_call(name, args) — extensible kernel-like surface

Tenant-supplied WASM modules (e.g. acme/system/auth_with_saml) run with a restricted linker that omits host_mutate_catalog, host_read_event_log (except own), and host_system_call.

Module cache invalidation: keyed on (path, version, digest). Catalog version bump → cache miss → recompile → next claim picks up the new module. Hot reload without process restart.

Helm deployment shape

# Single ConfigMap shared by all four deployments
apiVersion: v1
kind: ConfigMap
metadata:
  name: noetl-server-config
data:
  POSTGRES_HOST: postgres.postgres.svc.cluster.local
  NATS_URL: nats://nats.nats.svc.cluster.local:4222
  ...
---
# Deployment 1 — HTTP server
apiVersion: apps/v1
kind: Deployment
metadata:
  name: noetl-server
spec:
  replicas: 1
  template:
    spec:
      containers:
      - name: server
        image: ghcr.io/noetl/server:<v>
        args: ["--mode=server"]
        envFrom:
        - configMapRef: { name: noetl-server-config }
---
# Deployment 2 — outbox publisher
apiVersion: apps/v1
kind: Deployment
metadata:
  name: noetl-outbox-publisher
spec:
  replicas: 1
  template:
    spec:
      containers:
      - name: publisher
        image: ghcr.io/noetl/server:<v>   # same image
        args: ["--mode=publisher"]
---
# Deployment 3 — projector (StatefulSet for stable pod names → consumer names)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: noetl-projector
spec:
  serviceName: noetl-projector
  replicas: 1    # KEDA scales N
  template:
    spec:
      containers:
      - name: projector
        image: ghcr.io/noetl/server:<v>
        args: ["--mode=projector"]
        env:
        - name: NOETL_PROJECTOR_SHARD_ID
          valueFrom: { fieldRef: { fieldPath: metadata.name } }
---
# Deployment 4 — system worker pool (KEDA-scaled on NATS lag)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: noetl-worker-system-pool
spec:
  replicas: 1   # KEDA scales 1-N
  template:
    spec:
      containers:
      - name: system-pool
        image: ghcr.io/noetl/server:<v>
        args: ["--mode=system"]
        env:
        - name: NATS_CONSUMER
          value: noetl_worker_pool_system
        - name: NATS_FILTER_SUBJECT
          value: noetl.commands.system.>

Operations team owns one image lifecycle, one chart, one set of ConfigMaps. See the noetl-ops wiki — system worker pool deploy page for the full manifest reference.

Sequencing

Smallest surface first to amortise the shared-library scaffolding:

publisher (smallest LoC, narrowest surface). Adds the shared library skeleton.
projector (reuses the shared envelope + NATS code).
server (largest; everything else is now proven).
system_pool (depends on server for catalog reads).

Steps 1 + 2 drop two of the three Python pods even before server ships — useful intermediate checkpoint.

Step 4 unblocks the plug-in ring per the ADR and noetl/ai-meta#46.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

runtime shape

Runtime shape — compiled core and plug-in ring

The crate produces multiple binaries

Shared library — what goes in `src/lib.rs`

Binary 1 — `server` (HTTP control plane)

Binary 2 — `publisher` (outbox → NATS)

Binary 3 — `projector` (NATS → event log)

Binary 4 — `system_pool` (wasmtime host + dispatch)

Helm deployment shape

Sequencing

Related

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

noetl-server

Reference

Design

Operations

Related repos

External

Clone this wiki locally

runtime shape

Runtime shape — compiled core and plug-in ring

The crate produces multiple binaries

Shared library — what goes in src/lib.rs

Binary 1 — server (HTTP control plane)

Binary 2 — publisher (outbox → NATS)

Binary 3 — projector (NATS → event log)

Binary 4 — system_pool (wasmtime host + dispatch)

Helm deployment shape

Sequencing

Related

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

noetl-server

Reference

Design

Operations

Related repos

External

Clone this wiki locally

Shared library — what goes in `src/lib.rs`

Binary 1 — `server` (HTTP control plane)

Binary 2 — `publisher` (outbox → NATS)

Binary 3 — `projector` (NATS → event log)

Binary 4 — `system_pool` (wasmtime host + dispatch)