Umbrella System Pool Design
ai-task: noetl/ai-meta#46 · Opened: 2026-06-02 · Last update: 2026-06-02 (ADR merged + wiki pages live) · Status: Design (ADR landed, no implementation started) · ADR: System Worker Pool and WASM Plug-in Surface · Sibling design: Umbrella: Python Services to Rust
Introduce a privileged system worker pool that runs platform-internal logic (auth, RBAC, scheduled cleanups) as NoETL playbooks under a `system/` namespace. Use WASM as the plug-in compilation target so the privileged ring is hot-reloadable without a process restart and tenant-overridable for custom auth / policy.
Model analogy: Oracle's `SYS` schema (privileged namespace; the platform extends itself with its own primitives) plus PostgreSQL extensions (`CREATE EXTENSION` loads compiled code at runtime via `dlopen`).
The compiled core (publisher / projector / HTTP routing; see Umbrella: Python Services to Rust) stays compiled because those are hot loops where playbook-dispatch overhead would cost real throughput.
The plug-in ring is for cold-loop / customisable services where pluggability and per-tenant override matter more than per-call latency:
- `system/auth`: session validation, token lookup, IdP integration
- `system/rbac`: per-action authorisation
- `system/scheduled_cleanup`: TTL enforcement, stale-row reaping
- `system/credential_rotate`: refresh long-lived tokens before expiry
- Tenant overrides (e.g. `acme/system/auth_with_saml`): same surface, restricted capability set
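Below is a minimal sketch of the "restricted capability set". It assumes capabilities are granted per catalog path. Entries under `system/` get the full host surface. A tenant override such as `acme/system/auth_with_saml` gets a subset, and ordinary user playbooks get nothing.

The `Capability` enum and `capabilities_for` function are hypothetical. The variants borrow the host-function names from the design work (`host_put_event`, `host_get_credential`, `host_query_pg`). Which capabilities a tenant keeps is an assumption, not a decision from the ADR.

```rust
use std::collections::BTreeSet;

/// Hypothetical capability names, one per host function the ADR sketches.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Capability {
    PutEvent,      // host_put_event
    GetCredential, // host_get_credential
    QueryPg,       // host_query_pg
}

/// Returns the capabilities a plug-in registered at `path` may import.
/// Assumption: tenant overrides keep event emission and credential lookup
/// but lose direct Postgres access.
pub fn capabilities_for(path: &str) -> BTreeSet<Capability> {
    if path.starts_with("system/") {
        // Platform-owned system playbooks get the full surface.
        BTreeSet::from([
            Capability::PutEvent,
            Capability::GetCredential,
            Capability::QueryPg,
        ])
    } else if path.split('/').nth(1) == Some("system") {
        // Tenant override such as `acme/system/auth_with_saml`.
        BTreeSet::from([Capability::PutEvent, Capability::GetCredential])
    } else {
        // Ordinary user playbooks get no privileged host functions.
        BTreeSet::new()
    }
}
```

A linker template would then register only the host functions in this set, so a tenant module that imports `host_query_pg` fails to link rather than failing at call time.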
See the ADR for the full split between compiled core and plug-in ring.
The following open questions are captured in the ADR's "Open questions" section. Each will get a decision in the implementation ADR phase:
- Minimum compiled core. Could HTTP routing itself be plug-in-driven? Probably not, but worth a final check.
- WASM panic handling. `wasmtime::Trap` → contained failure, structured event.
- Per-tenant override scope. All system playbooks, or a curated subset?
- Versioning + capability evolution. How do we add a new host function without breaking older WASM modules?
- Audit trail. `noetl.event` table, or a separate `noetl.system_event` with a stricter ACL?
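Here is one possible answer to the versioning question, sketched under a single assumption: host functions are only ever added, never removed or re-typed. At load time the host checks that every function a module imports is in the set it currently exports. A module built against an older, smaller surface therefore keeps loading after new functions appear.

`HostSurface` and `check_imports` are hypothetical names. `host_emit_metric` is a made-up example of a newly added function.

```rust
use std::collections::HashSet;

/// Hypothetical: the set of host function names the runtime currently exports.
pub struct HostSurface {
    exports: HashSet<&'static str>,
}

impl HostSurface {
    pub fn new(exports: &[&'static str]) -> Self {
        HostSurface { exports: exports.iter().copied().collect() }
    }

    /// Returns the imports the host cannot satisfy; an empty result means
    /// the module may be linked and loaded.
    pub fn check_imports<'a>(&self, imports: &[&'a str]) -> Vec<&'a str> {
        imports
            .iter()
            .copied()
            .filter(|name| !self.exports.contains(*name))
            .collect()
    }
}
```

Under this additive-only rule, an old module never breaks. A new module on an old host is rejected up front with the exact missing names, which could feed the structured failure event.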
Option 1 — WasmPlaybook as a first-class catalog kind. Simpler model; explicit; user can register hand-written WASM. Cons: exposes WASM as user-facing surface.
Option 2 — YAML stays the source; WASM is an internal compilation target. More elegant; playbook authors keep writing YAML; the platform compiles to WASM as an internal optimisation. Cons: requires the YAML-to-WASM compiler.
Recommendation in ADR: start with Option 1 for the initial implementation (faster to ship, validates the runtime + capability + reload pipeline). Migrate to Option 2 once the compiler is built.
Rust does not have first-class hot reload like Erlang's `code:load_file/1`. The trade-off table from the ADR:
| Approach | Hot reload | Isolation | Performance | Fit |
|---|---|---|---|---|
| libloading (`.so`) | Yes | Same process | Native | Use only if other paths fail |
| WASM (wasmtime) | Yes | Sandboxed | ~2-5× slower than native | Leading candidate |
| Sub-process exec | Yes | OS boundary | Fork/exec overhead | Cold loops only |
| YAML → closure JIT | Re-register only | Same process | Native | Fastest if no cross-restart needed |
WASM wins because:
- NoETL already has a `wasm` tool-kind concept in the Appendix H thinking.
- Reload is trivial: cache by `(path, version, digest)`; a catalog bump invalidates.
- Capability-based imports give a clean security boundary for tenant overrides.
- The same `.wasm` runs on amd64 + arm64 + GKE Linux without per-arch compilation (solves the multi-arch headache for the plug-in ring).
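The following is a minimal sketch of the `(path, version, digest)` cache. It shows why reload is cheap: a catalog bump changes the key, so the next lookup misses and recompiles, and older versions of the same path are evicted.

`CompiledModule` is a hypothetical stand-in for a compiled `wasmtime::Module`. The "compile" step only builds that placeholder, and the evict-on-bump policy is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a compiled `wasmtime::Module`.
#[derive(Debug, Clone)]
pub struct CompiledModule {
    pub path: String,
    pub version: u32,
}

/// Cache key from the design: (catalog path, catalog version, content digest).
type CacheKey = (String, u32, String);

#[derive(Default)]
pub struct ModuleCache {
    entries: HashMap<CacheKey, CompiledModule>,
    pub compiles: usize, // number of cache misses, for illustration
}

impl ModuleCache {
    /// Returns the cached module, or "compiles" it on a miss.
    pub fn get_or_compile(&mut self, path: &str, version: u32, digest: &str) -> &CompiledModule {
        let key = (path.to_string(), version, digest.to_string());
        if !self.entries.contains_key(&key) {
            // A catalog bump means a new key: drop stale entries for this path.
            self.entries.retain(|(p, _, _), _| p.as_str() != path);
            self.compiles += 1;
            let module = CompiledModule { path: path.to_string(), version };
            self.entries.insert(key.clone(), module);
        }
        &self.entries[&key]
    }

    pub fn cached_count(&self) -> usize {
        self.entries.len()
    }
}
```

Including the digest in the key means that re-publishing different bytes under the same version still forces a recompile.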
If the projector is itself a playbook, the projector's events depend on the projector already running: a bootstrap cycle. Resolution (chosen): a two-tier event log. System events flow through a compiled-in fast projector; user events flow through the playbook projector (if/when one exists). The plug-in surface starts with non-projector services (auth, RBAC, scheduled cleanup), where the bootstrap problem doesn't exist.
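The two-tier resolution comes down to a routing decision made before any playbook runs. In this minimal sketch, the `Event` shape and `route` function are hypothetical. The sketch also assumes an event counts as a "system event" when it originates from a catalog path under `system/`.

```rust
/// Which projector handles an event in the two-tier design.
#[derive(Debug, PartialEq, Eq)]
pub enum Projector {
    CompiledFast, // compiled-in projector for system events
    Playbook,     // playbook projector for user events (if/when it exists)
}

/// Hypothetical event shape; only the originating catalog path matters here.
pub struct Event<'a> {
    pub source_path: &'a str,
    pub kind: &'a str,
}

/// Assumption: system events are those emitted by `system/` catalog entries.
/// They never wait on a playbook projector, which breaks the bootstrap cycle.
pub fn route(event: &Event) -> Projector {
    if event.source_path.starts_with("system/") {
        Projector::CompiledFast
    } else {
        Projector::Playbook
    }
}
```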
- Capability surface design: concrete list of host functions exposed to system WASM modules (`host_put_event`, `host_get_credential`, `host_query_pg`, etc.). The ADR sketches this; implementation needs typed signatures.
- Module cache design: `(path, version, digest)` → compiled `wasmtime::Module`; eviction policy; warm-up on startup.
- Linker template: `wasmtime::Linker` populated with host functions; tenant-vs-system capability split.
- First system playbook: `system/echo` as a smoke test (takes an input string, returns it). Validates the full pipe: catalog → fetch → compile → execute → emit event.
- Routing extension: `POOL_FILTER_MAP` adds the `system_*` family; server-side validation that only `system/` catalog entries declare `system_*` tool kinds.
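The routing extension pairs a pool lookup with a server-side check at catalog registration. In this minimal sketch, `pool_for` is a hypothetical stand-in for the real `POOL_FILTER_MAP` entry, whose shape is not specified here. `validate_entry`, the `"default"` pool name and the `system_wasm` tool kind are also hypothetical. The rule reads "`system/` catalog entries" literally, so how tenant overrides under `acme/system/...` fit it is left open.

```rust
/// Hypothetical stand-in for the POOL_FILTER_MAP extension: the `system_*`
/// tool-kind family goes to the system pool, everything else to a default pool.
pub fn pool_for(tool_kind: &str) -> &'static str {
    if tool_kind.starts_with("system_") {
        "system"
    } else {
        "default"
    }
}

/// Server-side rule from the design: only `system/` catalog entries may
/// declare `system_*` tool kinds. Returns an error naming the offending kind.
pub fn validate_entry(path: &str, tool_kinds: &[&str]) -> Result<(), String> {
    let is_system_entry = path.starts_with("system/");
    for kind in tool_kinds {
        if kind.starts_with("system_") && !is_system_entry {
            return Err(format!("{path}: tool kind `{kind}` is reserved for system/ entries"));
        }
    }
    Ok(())
}
```

Rejecting at registration time means a user playbook can never put a command on the system pool's subject, even if routing alone would accept it.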
| Date | Event |
|---|---|
| 2026-06-02 | Issue filed during the noetl/ai-meta#45 placement discussion when the project lead proposed the Oracle-SYS-analogue model. |
| 2026-06-02 | Hot-reload trade-off matrix (libloading / WASM / sub-process / closure JIT) captured as comment on #46. |
| 2026-06-02 | ADR merged via noetl/docs#176 → published at https://noetl.dev/docs/architecture/system_pool_and_wasm_plugins. |
| 2026-06-02 | Cross-linked wiki pages live: noetl-server wiki — Runtime shape (capability surface for system WASM modules) and noetl-ops wiki — System worker pool (KEDA scaler, Deployment, RBAC reserved shapes). |
| 2026-06-02 | No implementation started. Open ADR questions (5) remain to be answered before sub-issues open. |
- Finalise the ADR's open questions (one question per pull request / discussion).
- Pick a capability surface; mock it in `repos/server/src/` without a real wasmtime integration yet.
- Prototype the `system/echo` playbook end-to-end with a tiny hand-written WASM module.
- Validate on kind: route a command to `noetl.commands.system.<eid>`, observe the system pool worker claim + execute + emit event.
- Bring up the first real plug-in: `system/auth`.
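The `system/echo` smoke test can be sketched as a pipeline with every stage stubbed, in the spirit of mocking the capability surface before wasmtime is wired in. The command arrives on `noetl.commands.system.<eid>`, the catalog lookup succeeds, and a function pointer stands in for the fetched and compiled module. The result is then emitted as an event.

`EmittedEvent`, `run_echo` and `echo_module` are hypothetical names. The event shape is an assumption.

```rust
/// Hypothetical result event emitted at the end of the pipe.
#[derive(Debug, PartialEq)]
pub struct EmittedEvent {
    pub eid: u64,
    pub path: String,
    pub output: String,
}

/// Stand-in for the compiled `system/echo` module: returns its input unchanged.
fn echo_module(input: &str) -> String {
    input.to_string()
}

/// route -> catalog -> fetch/compile -> execute -> emit, with every stage stubbed.
pub fn run_echo(catalog: &[&str], command_subject: &str, input: &str) -> Result<EmittedEvent, String> {
    // Route: only commands on noetl.commands.system.<eid> reach the system pool.
    let eid = command_subject
        .strip_prefix("noetl.commands.system.")
        .ok_or("not a system-pool command")?
        .parse::<u64>()
        .map_err(|e| format!("bad eid: {e}"))?;
    // Catalog: the playbook must be registered.
    if !catalog.contains(&"system/echo") {
        return Err("system/echo not registered".to_string());
    }
    // Fetch + compile: a real worker would fetch the .wasm by (path, version,
    // digest) and compile it; a function pointer stands in here.
    let module: fn(&str) -> String = echo_module;
    // Execute.
    let output = module(input);
    // Emit event.
    Ok(EmittedEvent { eid, path: "system/echo".to_string(), output })
}
```

Swapping each stub for the real stage, one at a time, keeps the end-to-end check meaningful while the wasmtime integration lands.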
Implementation depends on Umbrella: Python Services to Rust delivering the shared `noetl/server` crate scaffolding (step 1 or 2 of that umbrella).
- Umbrella: Python Services to Rust — must land first; provides the shared crate.
- ADR: System Worker Pool and WASM Plug-in Surface.
- noetl-server wiki: Runtime shape — implementation-level companion.
- noetl-ops wiki: System worker pool — deploy topology.