
Umbrella System Pool Design

Kadyapam edited this page Jun 2, 2026 · 17 revisions

Umbrella — System Pool + WASM Plug-in Surface (design)

ai-task: noetl/ai-meta#46 · Opened: 2026-06-02 · Last update: 2026-06-02 (ADR merged + wiki pages live) · Status: Design (ADR landed, no implementation started) · ADR: System Worker Pool and WASM Plug-in Surface · Sibling design: Umbrella: Python Services to Rust

Goal

Introduce a privileged system worker pool that runs platform-internal logic (auth, RBAC, scheduled cleanups) as NoETL playbooks under a system/ namespace. Use WASM as the plug-in compilation target so the privileged ring is hot-reloadable without a process restart and tenant-overridable for custom auth / policy.

Model analogy: Oracle's SYS schema (a privileged namespace in which the platform extends itself with its own primitives) plus PostgreSQL extensions (CREATE EXTENSION registers compiled code that the server loads at runtime via dlopen).

Where this fits

The compiled core (publisher / projector / HTTP routing, see Umbrella: Python Services to Rust) stays compiled because those are hot loops where playbook-dispatch overhead would cost real throughput.

The plug-in ring is for cold-loop / customisable services where pluggability and per-tenant override matter more than per-call latency:

  • system/auth — session validation, token lookup, IdP integration
  • system/rbac — per-action authorisation
  • system/scheduled_cleanup — TTL enforcement, stale-row reaping
  • system/credential_rotate — refresh long-lived tokens before expiry
  • Tenant overrides (e.g. acme/system/auth_with_saml) — same surface, restricted capability set

See the ADR for the full split between compiled core and plug-in ring.
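The "same surface, restricted capability set" idea for tenant overrides could look roughly like the sketch below. It is only an illustration: the `Capability` variants echo host-function names mentioned elsewhere on this page, while `Owner`, `classify`, `allowed`, `may_import`, and the choice of which capabilities a tenant keeps are all assumptions, not decisions from the ADR.

```rust
use std::collections::BTreeSet;

/// Hypothetical host capabilities. The names echo host functions mentioned in
/// this design (host_put_event, host_get_credential, host_query_pg); the real
/// surface is still undecided.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Capability {
    PutEvent,
    GetCredential,
    QueryPg,
}

/// Owner of a plug-in path: the platform itself or a tenant override.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Owner {
    System,
    Tenant(String),
}

/// Classify `system/<name>` and `<tenant>/system/<name>`; any other path is
/// treated as outside the plug-in ring.
fn classify(path: &str) -> Option<Owner> {
    let parts: Vec<&str> = path.split('/').collect();
    match parts.as_slice() {
        ["system", name] if !name.is_empty() => Some(Owner::System),
        [tenant, "system", name]
            if !tenant.is_empty() && *tenant != "system" && !name.is_empty() =>
        {
            Some(Owner::Tenant(tenant.to_string()))
        }
        _ => None,
    }
}

/// Assumed policy: system modules get everything, tenant overrides may only
/// emit events.
fn allowed(owner: &Owner) -> BTreeSet<Capability> {
    match owner {
        Owner::System => BTreeSet::from([
            Capability::PutEvent,
            Capability::GetCredential,
            Capability::QueryPg,
        ]),
        Owner::Tenant(_) => BTreeSet::from([Capability::PutEvent]),
    }
}

/// True when every capability the module asks for is granted to its owner.
fn may_import(path: &str, requested: &[Capability]) -> bool {
    match classify(path) {
        Some(owner) => {
            let granted = allowed(&owner);
            requested.iter().all(|cap| granted.contains(cap))
        }
        None => false,
    }
}
```

Under these assumptions, `may_import("acme/system/auth_with_saml", &[Capability::GetCredential])` is rejected while the same request from `system/auth` is accepted. Each override runs against the same surface with a narrower grant.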

Open design questions

Captured in the ADR's "Open questions" section. Each will get a decision in the implementation ADR phase:

  1. Minimum compiled core. Could HTTP routing itself be plug-in-driven? Probably not, but worth a final check.
  2. WASM panic handling. wasmtime::Trap → contained failure, structured event.
  3. Per-tenant override scope. All system playbooks, or curated subset?
  4. Versioning + capability evolution. How do we add a new host function without breaking older WASM modules?
  5. Audit trail. noetl.event table or separate noetl.system_event with stricter ACL?
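For question 2, the "trap → contained failure, structured event" shape might be sketched as follows. This assumes a guest trap reaches the host as an error value from the call rather than as a host panic; `GuestTrap` is a stand-in for `wasmtime::Trap`, and `SystemEvent` and `run_contained` are hypothetical names used only for illustration.

```rust
/// Stand-in for `wasmtime::Trap`; the variants are illustrative only.
#[derive(Debug, Clone, PartialEq, Eq)]
enum GuestTrap {
    UnreachableCodeReached,
    OutOfFuel,
    MemoryOutOfBounds,
}

/// Hypothetical structured event emitted for every system playbook run.
#[derive(Debug, Clone, PartialEq, Eq)]
struct SystemEvent {
    playbook: String,
    status: &'static str,
    detail: String,
}

/// Run a guest call and always return an event. A trap becomes a "failed"
/// event instead of propagating into the worker process.
fn run_contained<F>(playbook: &str, call: F) -> SystemEvent
where
    F: FnOnce() -> Result<String, GuestTrap>,
{
    match call() {
        Ok(output) => SystemEvent {
            playbook: playbook.to_string(),
            status: "completed",
            detail: output,
        },
        Err(trap) => SystemEvent {
            playbook: playbook.to_string(),
            status: "failed",
            detail: format!("trap: {:?}", trap),
        },
    }
}
```

The worker loop only ever sees a `SystemEvent`, so a misbehaving module cannot take the pool worker down with it. Whether the event lands in noetl.event or a separate table is question 5 and is not modelled here.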

Catalog model — two options

Option 1 — WasmPlaybook as a first-class catalog kind. Simpler model; explicit; user can register hand-written WASM. Cons: exposes WASM as user-facing surface.

Option 2 — YAML stays the source; WASM is an internal compilation target. More elegant; playbook authors keep writing YAML; the platform compiles to WASM as an internal optimisation. Cons: requires the YAML-to-WASM compiler.

Recommendation in ADR: start with Option 1 for the initial implementation (faster to ship, validates the runtime + capability + reload pipeline). Migrate to Option 2 once the compiler is built.
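As a rough illustration of Option 1, the catalog could carry a second kind next to YAML playbooks and reject uploads that are not WebAssembly binaries. The enum layout, field names, and `register_wasm` are assumptions made for this sketch. The only external fact relied on is that a WebAssembly binary starts with the four bytes `\0asm`.

```rust
/// First four bytes of every WebAssembly binary module: "\0asm".
const WASM_MAGIC: [u8; 4] = [0x00, 0x61, 0x73, 0x6d];

/// Hypothetical catalog kinds under Option 1.
#[derive(Debug, PartialEq, Eq)]
enum CatalogEntry {
    /// Existing YAML-authored playbook.
    Playbook { path: String, version: u32, yaml: String },
    /// Hand-written or externally compiled WASM, registered directly.
    WasmPlaybook { path: String, version: u32, wasm: Vec<u8> },
}

/// Accept a module only if it at least looks like a WebAssembly binary.
/// Real validation (imports, exports, size limits) is out of scope here.
fn register_wasm(path: &str, version: u32, wasm: &[u8]) -> Result<CatalogEntry, String> {
    if !wasm.starts_with(&WASM_MAGIC) {
        return Err(format!("{}: not a WebAssembly binary (missing \\0asm header)", path));
    }
    Ok(CatalogEntry::WasmPlaybook {
        path: path.to_string(),
        version,
        wasm: wasm.to_vec(),
    })
}
```

Under Option 2 the `WasmPlaybook` variant would stop being user-facing: authors would register only `Playbook` entries and the platform would produce the WASM internally.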

Hot-reload landscape (Rust)

Rust does not have first-class hot reload like Erlang's code:load_file/1. Trade-off table in the ADR:

| Approach | Hot reload | Isolation | Performance | Fit |
|---|---|---|---|---|
| libloading (.so) | Yes | Same process | Native | Use only if other paths fail |
| WASM (wasmtime) | Yes | Sandboxed | ~2-5× native | Leading candidate |
| Sub-process exec | Yes | OS boundary | Fork/exec overhead | Cold loops only |
| YAML → closure JIT | Re-register only | Same process | Native | Fastest if no cross-restart needed |

WASM wins because:

  • NoETL already has a wasm tool-kind concept in Appendix H thinking.
  • Reload is trivial (cache by (path, version, digest); catalog bump invalidates).
  • Capability-based imports give a clean security boundary for tenant overrides.
  • Same .wasm runs on amd64 + arm64 + GKE Linux without per-arch compilation (solves multi-arch headache for the plug-in ring).
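The "cache by (path, version, digest)" point can be made concrete with a small sketch. `CompiledModule` is a stand-in for a compiled `wasmtime::Module`, and `ModuleCache`, `get_or_compile`, and `invalidate_path` are hypothetical names. The eviction policy and startup warm-up are deliberately left out.

```rust
use std::collections::HashMap;

/// Cache key from the design: (path, version, digest).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct ModuleKey {
    path: String,
    version: u32,
    digest: String,
}

/// Stand-in for a compiled `wasmtime::Module`.
#[derive(Debug, Clone, PartialEq, Eq)]
struct CompiledModule {
    source_len: usize,
}

struct ModuleCache {
    entries: HashMap<ModuleKey, CompiledModule>,
    /// Number of (expensive) compilations performed, for observability.
    compiles: usize,
}

impl ModuleCache {
    fn new() -> Self {
        ModuleCache { entries: HashMap::new(), compiles: 0 }
    }

    /// Return the cached module, compiling only on a miss.
    fn get_or_compile(&mut self, key: &ModuleKey, wasm: &[u8]) -> &CompiledModule {
        if !self.entries.contains_key(key) {
            self.compiles += 1;
            // A real implementation would compile the bytes here.
            self.entries
                .insert(key.clone(), CompiledModule { source_len: wasm.len() });
        }
        &self.entries[key]
    }

    /// Drop every cached version of a path, e.g. after a catalog bump.
    fn invalidate_path(&mut self, path: &str) {
        self.entries.retain(|key, _| key.path != path);
    }
}
```

Because the version and digest are part of the key, a catalog bump already misses the cache on its own. `invalidate_path` is only needed to reclaim the memory held by superseded versions.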

Bootstrap circular dependency

If the projector is itself a playbook, a circular dependency appears: the projector's events can only be projected if the projector is already running. Resolution (chosen): a two-tier event log. System events flow through a compiled-in fast projector; user events flow through the playbook projector (if/when one exists). The plug-in surface starts with non-projector services (auth, RBAC, scheduled cleanup) where the bootstrap problem doesn't exist.
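A minimal sketch of the two-tier split, assuming the tier is chosen purely from the playbook path prefix. `Event`, `TwoTierLog`, and the prefix rule are illustrative assumptions; how tenant overrides are classified is not settled on this page.

```rust
/// Hypothetical event record; only the fields needed for routing.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Event {
    playbook: String,
    name: String,
}

/// Two logical logs with different consumers.
#[derive(Default)]
struct TwoTierLog {
    /// Consumed by the compiled-in fast projector.
    system: Vec<Event>,
    /// Consumed by a playbook projector, if one exists.
    user: Vec<Event>,
}

impl TwoTierLog {
    /// Assumed rule: anything under `system/` is a system event.
    fn append(&mut self, event: Event) {
        if event.playbook.starts_with("system/") {
            self.system.push(event);
        } else {
            self.user.push(event);
        }
    }
}
```

System events never pass through a playbook on their way to being projected, so the system tier can always start on its own.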

Sub-tasks (to be filed)

  • Capability surface design: concrete list of host functions exposed to system WASM modules (host_put_event, host_get_credential, host_query_pg, etc.). ADR sketches this; implementation needs typed signatures.
  • Module cache design: (path, version, digest) → compiled wasmtime::Module; eviction policy; warm-up on startup.
  • Linker template: wasmtime::Linker populated with host functions; tenant-vs-system capability split.
  • First system playbook: system/echo as smoke test (takes input string, returns it). Validates the full pipe: catalog → fetch → compile → execute → emit event.
  • Routing extension: POOL_FILTER_MAP adds system_* family; server-side validation that only system/ catalog entries declare system_* tool kinds.
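The server-side check in the routing item could be as small as the sketch below. `validate_tool_kinds` and the example kind names are hypothetical, and the shape of POOL_FILTER_MAP is not described on this page, so it is not modelled. The rule is applied literally: only paths starting with `system/` may declare `system_*` kinds. Whether tenant overrides such as `acme/system/...` should also pass is left open.

```rust
/// Reject catalog entries outside `system/` that declare privileged
/// `system_*` tool kinds.
fn validate_tool_kinds(catalog_path: &str, tool_kinds: &[&str]) -> Result<(), String> {
    let is_system_entry = catalog_path.starts_with("system/");
    for kind in tool_kinds {
        if kind.starts_with("system_") && !is_system_entry {
            return Err(format!(
                "{} may not declare privileged tool kind {}",
                catalog_path, kind
            ));
        }
    }
    Ok(())
}
```

For example, `validate_tool_kinds("acme/etl", &["http", "system_auth"])` returns an error naming the offending kind, while the same list registered under `system/auth` is accepted.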

Recent activity

| Date | Event |
|---|---|
| 2026-06-02 | Issue filed during the noetl/ai-meta#45 placement discussion when the project lead proposed the Oracle-SYS-analogue model. |
| 2026-06-02 | Hot-reload trade-off matrix (libloading / WASM / sub-process / closure JIT) captured as comment on #46. |
| 2026-06-02 | ADR merged via noetl/docs#176 → published at https://noetl.dev/docs/architecture/system_pool_and_wasm_plugins. |
| 2026-06-02 | Cross-linked wiki pages live: noetl-server wiki, Runtime shape (capability surface for system WASM modules) and noetl-ops wiki, System worker pool (KEDA scaler, Deployment, RBAC reserved shapes). |
| 2026-06-02 | No implementation started. Open ADR questions (5) remain to be answered before sub-issues open. |

Next concrete steps

  1. Finalise the ADR's open questions (one Q per pull request / discussion).
  2. Pick a capability surface; mock it in repos/server/src/ without a real wasmtime integration yet.
  3. Prototype the system/echo playbook end-to-end with a tiny hand-written WASM module.
  4. Validate on kind: route a command to noetl.commands.system.<eid>, observe the system pool worker claim + execute + emit event.
  5. Bring up the first real plug-in: system/auth.

Implementation depends on the Umbrella: Python Services to Rust effort delivering the shared noetl/server crate scaffolding (step 1 or 2 of that umbrella).
