# System worker pool
Status: Design proposal — tracked under noetl/ai-meta#45 and noetl/ai-meta#46. Not yet implemented. This page reserves the operational shape so when the work lands, the manifests have a known home.
For the architectural rationale, see the docs site: System Worker Pool and WASM Plug-in Surface.
For the implementation-level Rust crate layout, see the noetl-server wiki — Runtime shape page.
Once the full Rust migration and the system-pool design have both landed, the NoETL cluster runs these workloads:
| Workload | Image | Role | Pool | Replicas |
|---|---|---|---|---|
| `noetl-server` | `ghcr.io/noetl/server:<v>` | HTTP control plane | n/a | 1-3 |
| `noetl-outbox-publisher` | `ghcr.io/noetl/server:<v>` | Postgres outbox → NATS | n/a | 1 |
| `noetl-projector` | `ghcr.io/noetl/server:<v>` | NATS → event log | n/a | 1-N (sharded) |
| `noetl-worker-rust` | `ghcr.io/noetl/worker:<v>` | User-playbook compute | `worker-rust-pool` | 1-20 (KEDA) |
| `noetl-worker-cpu` | `ghcr.io/noetl/worker:<v>` | User-playbook compute, Python tools fallback | `worker-cpu-01` | 1-20 (KEDA) |
| `noetl-worker-system-pool` | `ghcr.io/noetl/server:<v>` | System playbook compute (WASM) | `worker-system-pool` | 1-5 (KEDA) |
The system worker pool is new. It runs the same image as the server (because it shares the wasmtime host code and the capability surface), but with `--mode=system` and a NATS consumer that filters on `noetl.commands.system.>`.
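The consumer filter relies on NATS subject wildcards: `*` matches exactly one dot-separated token, and `>` matches one or more trailing tokens. The Python sketch below is a simplified model of that rule (it skips subject validation, and the real matching happens inside the NATS server); it shows which subjects the `noetl.commands.system.>` filter would deliver. `subject_matches` is an illustrative name, not part of NoETL.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Simplified NATS wildcard match.

    `*` matches exactly one token; `>` matches one or more trailing
    tokens and is only valid as the last token of the pattern.
    """
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' needs at least one remaining subject token.
            return len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    # No '>' in the pattern: token counts must match exactly.
    return len(p_tokens) == len(s_tokens)


SYSTEM_FILTER = "noetl.commands.system.>"

for subject in [
    "noetl.commands.system.42",  # delivered to the system pool
    "noetl.commands.shared.42",  # different pool family
    "noetl.commands.system",     # '>' needs at least one more token
    "noetl.commands",            # legacy bare subject
]:
    print(subject, subject_matches(SYSTEM_FILTER, subject))
```

The legacy bare subject `noetl.commands` does not match, so the system pool never competes for pre-cutover traffic.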
The per-pool routing scheme from noetl/ai-meta#42 extends naturally:
```
noetl.commands                 (legacy bare subject, drained post-cutover)
noetl.commands.shared.<eid>    (default: Rust + Python pools race)
noetl.commands.python.<eid>    (Python-only kinds, e.g. agent)
noetl.commands.system.<eid>    (NEW: system pool only)
```
`POOL_FILTER_MAP` in the server gains the system family:

```python
POOL_FILTER_MAP = {
    "agent": "python",
    "system_auth": "system",
    "system_rbac": "system",
    "system_cleanup": "system",
    "system_credential_rotate": "system",
    # ... default → "shared"
}
```

Server-side validation: only catalog entries under the `system/` path may declare `system_*` tool kinds. User playbooks that declare `kind: system_auth` are rejected at registration time.
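A minimal Python sketch of both rules, assuming a command's subject is built from its tool kind and execution ID, and that catalog paths are plain strings such as `system/auth/check`. `command_subject`, `validate_registration`, and `RegistrationError` are hypothetical names; the sketch copies the map so it runs on its own.

```python
from typing import Iterable

# Copy of POOL_FILTER_MAP; kinds not listed fall back to "shared".
POOL_FILTER_MAP = {
    "agent": "python",
    "system_auth": "system",
    "system_rbac": "system",
    "system_cleanup": "system",
    "system_credential_rotate": "system",
}


class RegistrationError(ValueError):
    """Raised when a playbook declares a tool kind it may not use."""


def command_subject(kind: str, execution_id: str) -> str:
    """Per-pool NATS subject: noetl.commands.<family>.<eid> (hypothetical)."""
    family = POOL_FILTER_MAP.get(kind, "shared")
    return f"noetl.commands.{family}.{execution_id}"


def validate_registration(catalog_path: str, tool_kinds: Iterable[str]) -> None:
    """Reject system_* tool kinds unless the entry lives under system/."""
    if catalog_path.startswith("system/"):
        return
    for kind in tool_kinds:
        if kind.startswith("system_"):
            raise RegistrationError(
                f"{catalog_path!r} declares {kind!r}; "
                "system_* kinds are reserved for system/ catalog entries"
            )


print(command_subject("system_auth", "1001"))  # noetl.commands.system.1001
print(command_subject("http", "1001"))         # noetl.commands.shared.1001
validate_registration("system/auth/check", ["system_auth"])  # accepted
try:
    validate_registration("playbooks/demo", ["http", "system_auth"])
except RegistrationError as err:
    print("rejected:", err)
```

Keeping the check at registration time means a misplaced `system_*` kind never reaches dispatch, so no user playbook can be routed onto `noetl.commands.system.<eid>`.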
To live at `ci/manifests/keda/scaledobject-worker-system-pool.yaml` once the work lands:
```yaml
# NoETL system worker pool autoscaler.
#
# Scales `noetl-worker-system-pool` based on backlog of the
# `noetl_worker_pool_system` JetStream consumer. System playbooks
# are typically low-frequency (auth checks, scheduled cleanups,
# credential rotation), so the pool defaults to 1 replica with
# room to scale on bursts.
#
# Generated by:
#   from noetl.core.runtime.keda import (
#       ScaledObjectSpec, build_worker_scaledobject, dump_scaledobject_yaml,
#   )
#   spec = ScaledObjectSpec(
#       worker_pool_urn="noetl://tenant/default/org/default/worker/worker-system-pool",
#       deployment="noetl-worker-system-pool",
#       nats_consumer="noetl_worker_pool_system",
#   )
#   print(dump_scaledobject_yaml(build_worker_scaledobject(spec)))
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: noetl-worker-system-pool-scaler
  namespace: noetl
  labels:
    app: noetl-worker-system-pool
    worker-pool: worker-system-pool
    managed-by: noetl
spec:
  scaleTargetRef:
    name: noetl-worker-system-pool
  minReplicaCount: 1
  maxReplicaCount: 5              # smaller cap than user pools
  pollingInterval: 10
  cooldownPeriod: 30
  triggers:
    - type: nats-jetstream
      metadata:
        natsServerMonitoringEndpoint: nats.nats.svc.cluster.local:8222
        account: NOETL
        stream: NOETL_COMMANDS
        consumer: noetl_worker_pool_system
        lagThreshold: '5'         # tighter than user pools (10)
        activationLagThreshold: '1'
        useHttps: 'false'
```

The smaller `maxReplicaCount` (5 vs. 20 for user pools) reflects the expected workload: system playbooks are not high-throughput. The tighter `lagThreshold` (5 vs. 10) adds replicas at a smaller backlog, which keeps auth and RBAC latency low during bursts.
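To see what the two thresholds mean in replica terms: KEDA feeds the consumer lag to the Horizontal Pod Autoscaler as an average-value metric, so the target replica count is roughly `ceil(lag / lagThreshold)`, clamped to `[minReplicaCount, maxReplicaCount]`. The Python sketch below approximates only that arithmetic; it ignores HPA tolerance and stabilization windows, and `desired_replicas` is an illustrative helper, not KEDA code.

```python
import math


def desired_replicas(
    total_lag: int,
    lag_threshold: int = 5,   # system pool lagThreshold
    min_replicas: int = 1,    # minReplicaCount
    max_replicas: int = 5,    # maxReplicaCount
) -> int:
    """Approximate HPA target for an AverageValue lag metric."""
    wanted = math.ceil(total_lag / lag_threshold)
    return max(min_replicas, min(max_replicas, wanted))


for lag in (0, 3, 12, 40):
    system = desired_replicas(lag)
    user = desired_replicas(lag, lag_threshold=10, max_replicas=20)
    print(f"lag={lag:>2}  system pool={system}  user pool={user}")
```

At a backlog of 12 commands the system-pool settings give 3 replicas, while user-pool settings (threshold 10, cap 20) give 2. Because `minReplicaCount` is 1, the pool never scales to zero, so `activationLagThreshold` (which KEDA uses for scaling from and to zero) should have little practical effect here.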
To live at `ci/manifests/noetl/worker-system-pool-deployment.yaml`:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: noetl-worker-system-pool
  namespace: noetl
  labels:
    app: noetl-worker-system-pool
    component: system-worker
    runtime: rust
    worker-pool: worker-system-pool
spec:
  replicas: 1
  selector:
    matchLabels:
      app: noetl-worker-system-pool
      worker-pool: worker-system-pool
  template:
    metadata:
      labels:
        app: noetl-worker-system-pool
        component: system-worker
        runtime: rust
        worker-pool: worker-system-pool
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/path: "/metrics"
        prometheus.io/port: "9090"
    spec:
      serviceAccountName: noetl-worker-system-pool   # distinct RBAC
      initContainers:
        - name: wait-for-api
          image: curlimages/curl:8.7.1
          command: ["sh", "-c", "until curl -sf http://noetl.noetl.svc.cluster.local:8082/api/health; do sleep 3; done"]
      containers:
        - name: system-pool
          image: ghcr.io/noetl/server:<v>
          imagePullPolicy: IfNotPresent
          args: ["--mode=system"]
          ports:
            - name: metrics
              containerPort: 9090
          env:
            - name: WORKER_POOL_NAME
              value: worker-system-pool
            - name: WORKER_ID
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: NATS_URL
              value: nats://noetl:noetl@nats.nats.svc.cluster.local:4222
            - name: NATS_STREAM
              value: NOETL_COMMANDS
            - name: NATS_CONSUMER
              value: noetl_worker_pool_system
            - name: NATS_FILTER_SUBJECT
              value: noetl.commands.system.>
            - name: NOETL_SERVER_URL
              value: http://noetl.noetl.svc.cluster.local:8082
            - name: WASM_MODULE_CACHE_DIR
              value: /var/cache/noetl/wasm
            - name: RUST_LOG
              value: "info,noetl_server_system_pool=debug"
          resources:
            requests:
              cpu: "100m"
              memory: "256Mi"   # WASM modules + cache
            limits:
              cpu: "1000m"
              memory: "1Gi"
          volumeMounts:
            - name: wasm-cache
              mountPath: /var/cache/noetl/wasm
      volumes:
        - name: wasm-cache
          emptyDir:
            sizeLimit: 512Mi
```

Key differences from `worker-rust-deployment.yaml`:
- **Image:** `ghcr.io/noetl/server` (not `worker`). The system pool ships in the same crate as the server because it shares the wasmtime host code and the capability surface.
- **Service account:** `noetl-worker-system-pool`, distinct from the user-pool service account. RBAC grants:
  - read access to the catalog (for fetching system playbook YAML)
  - write access to the keychain (for `host_get_credential` / `host_credential_rotate`)
  - write access to `noetl.event` (for `host_put_event`)
  - read access to `noetl.event` (for `host_read_event_log`)
- **Memory request:** 256Mi (vs. 128Mi for the user pool) to hold the WASM module cache.
- **Volume:** a `wasm-cache` `emptyDir` for compiled WASM module artefacts. Entries are keyed by `(path, version, digest)`, so a catalog version bump invalidates them.
- **No `WORKER_MAX_CONCURRENT`:** the system pool serialises by default (one WASM execution per worker pod at a time) for determinism. Scale horizontally via KEDA instead.
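One way to realise the `(path, version, digest)` keying, sketched in Python: a compiled module is looked up by the full tuple, so a version bump (which changes the key) misses and triggers a fresh compile, and older entries for the same catalog path are dropped. `ModuleCache`, `cache_file_name`, the `compile_fn` hook, the digest strings, and the `.cwasm` file naming are all assumptions for illustration, not the pool's actual API.

```python
import hashlib
from typing import Callable, Dict, Tuple

# (catalog path, catalog version, content digest)
CacheKey = Tuple[str, int, str]


def cache_file_name(key: CacheKey) -> str:
    """Hypothetical artefact file name under WASM_MODULE_CACHE_DIR."""
    path, version, digest = key
    raw = f"{path}\x00{version}\x00{digest}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest() + ".cwasm"


class ModuleCache:
    """In-memory sketch of the compiled-module cache (hypothetical API)."""

    def __init__(self, compile_fn: Callable[[bytes], bytes]) -> None:
        self._compile = compile_fn
        self._entries: Dict[CacheKey, bytes] = {}
        self.compilations = 0

    def get(self, path: str, version: int, digest: str, wasm: bytes) -> bytes:
        key = (path, version, digest)
        if key not in self._entries:
            # A version bump changes the key, so this lookup misses.
            # Drop artefacts for older versions of the same catalog path.
            for stale in [k for k in self._entries if k[0] == path]:
                del self._entries[stale]
            self._entries[key] = self._compile(wasm)
            self.compilations += 1
        return self._entries[key]

    def __len__(self) -> int:
        return len(self._entries)


cache = ModuleCache(compile_fn=lambda b: b"compiled:" + b)
cache.get("system/echo", 1, "sha256:aaa", b"v1")
cache.get("system/echo", 1, "sha256:aaa", b"v1")  # hit: no recompile
cache.get("system/echo", 2, "sha256:bbb", b"v2")  # version bump: recompile
print(cache.compilations, len(cache))
```

Because the backing volume is an `emptyDir`, the cache starts empty on every pod restart; the keying only has to be correct within a pod's lifetime.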
To live at `ci/manifests/noetl/serviceaccount-system-pool.yaml`:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: noetl-worker-system-pool
  namespace: noetl
  labels:
    app: noetl-worker-system-pool
---
# K8s-side RBAC. Application-side RBAC (which keychain
# credentials, which catalog paths) lives in the keychain ACL
# and the catalog ACL, not here.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: noetl
  name: noetl-worker-system-pool-role
rules:
  # Read configmap with system pool config
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["noetl-worker-system-pool-config"]
    verbs: ["get", "list", "watch"]
  # Read secret with NATS + DB credentials
  - apiGroups: [""]
    resources: ["secrets"]
    resourceNames: ["noetl-worker-system-pool-secrets"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: noetl
  name: noetl-worker-system-pool-rolebinding
subjects:
  - kind: ServiceAccount
    name: noetl-worker-system-pool
    namespace: noetl
roleRef:
  kind: Role
  name: noetl-worker-system-pool-role
  apiGroup: rbac.authorization.k8s.io
```

The system pool extends the chart's worker-pool template (`noetl/ops/helm/noetl/templates/worker-pool.yaml`, present today for the Rust and Python pools). Add a third section to `values.yaml`:
```yaml
# noetl/ops/helm/noetl/values.yaml
workerPools:
  cpu-01:                          # existing: Python pool
    enabled: true
    image: ghcr.io/noetl/worker
    replicas: 1
    natsConsumer: noetl_worker_pool
  rust-pool:                       # existing: Rust user pool
    enabled: true
    image: ghcr.io/noetl/worker
    replicas: 1
    natsConsumer: noetl_worker_pool_shared
    natsFilterSubject: noetl.commands.shared.>
  system-pool:                     # NEW
    enabled: false                 # default off until plug-in ring lands
    image: ghcr.io/noetl/server    # NOT noetl/worker
    args: ["--mode=system"]
    replicas: 1
    natsConsumer: noetl_worker_pool_system
    natsFilterSubject: noetl.commands.system.>
    wasmCacheSize: 512Mi
```

When the system pool is disabled, the chart renders no Deployment and no KEDA ScaledObject, and creates no service account. The pool is opt-in per cluster.
Per `agents/rules/deployment-validation.md`, every operational manifest must be validated on the local kind cluster before it goes to GKE. The system pool's validation rig will live at:

- `repos/ops/automation/development/system-pool-validation.yaml`
- `repos/ops/automation/development/validate-system-pool.sh`
Smoke-test playbook: a tiny `system/echo` WASM module that takes an input string and returns it unchanged as the result. It exercises:

- The catalog can store a `WasmPlaybook` entry.
- The server publishes the dispatch to `noetl.commands.system.<eid>`.
- A system pool worker claims the command, fetches the WASM module, and executes it.
- The result lands back via `POST /api/events` (the same boundary the Rust user pool uses).
- A catalog version bump invalidates the cached module, so the next claim compiles the new version.
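The checks above can be modelled in-process to make the expected sequence concrete. In this Python sketch, `Catalog`, `SystemWorker`, and the event dictionaries are hypothetical stand-ins (a dict for the catalog, a list for `POST /api/events`, a byte copy for compilation). It is not the validation script, only a model of what that script should observe.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Catalog:
    """Stand-in catalog: path -> (version, WASM bytes)."""
    entries: Dict[str, Tuple[int, bytes]] = field(default_factory=dict)

    def register(self, path: str, wasm: bytes) -> int:
        # Registering again under the same path bumps the version.
        version = self.entries.get(path, (0, b""))[0] + 1
        self.entries[path] = (version, wasm)
        return version


@dataclass
class SystemWorker:
    """Stand-in system pool worker running the system/echo module."""
    catalog: Catalog
    events: List[dict] = field(default_factory=list)  # plays POST /api/events
    compiled: Dict[Tuple[str, int], bytes] = field(default_factory=dict)
    compilations: int = 0

    def handle(self, subject: str, path: str, payload: str) -> None:
        if not subject.startswith("noetl.commands.system."):
            raise ValueError(f"not a system subject: {subject}")
        version, wasm = self.catalog.entries[path]
        if (path, version) not in self.compiled:
            self.compiled[(path, version)] = bytes(wasm)  # "compile" on miss
            self.compilations += 1
        # system/echo returns its input string unchanged.
        self.events.append({"path": path, "version": version, "result": payload})


catalog = Catalog()
worker = SystemWorker(catalog)
catalog.register("system/echo", b"\x00asm-v1")
worker.handle("noetl.commands.system.1", "system/echo", "hello")
worker.handle("noetl.commands.system.2", "system/echo", "hello again")
catalog.register("system/echo", b"\x00asm-v2")  # catalog version bump
worker.handle("noetl.commands.system.3", "system/echo", "after bump")
print(worker.compilations, [e["version"] for e in worker.events])
```

A passing run shows two compilations (one per catalog version) and every result equal to its input.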
Per the implementation sequencing in the server wiki:
| Step | New manifests | When |
|---|---|---|
| 1 | None (the publisher replaces the existing Python pod's `command:`) | After `--mode=publisher` ships |
| 2 | None (the projector replaces the existing Python pod's `command:`) | After `--mode=projector` ships |
| 3 | None (the server replaces the existing Python pod's `command:`) | After `--mode=server` ships |
| 4 | All four reserved manifests above | After `--mode=system` ships |
The first three steps change image references in existing manifests but don't add new ones. The system pool is the only new operational surface.
- Home
- KEDA Scaler: base autoscaling pattern this extends.
- NATS Supercluster: the JetStream topology that carries `noetl.commands.system.>`.
- System Pool and WASM Plug-ins (ADR)
- noetl/server wiki: Runtime shape page
- noetl/ai-meta#46: system pool design
- noetl/ai-meta#45: replace Python services with Rust