Deployment Specification

This page is the durable reference for deploying noetl-worker into any environment. It covers the runtime contract the binary expects, the resources it consumes, the network surface it exposes, and — critically — every environment variable it reads, with the why behind each one.

This page is the single source of truth for the deployment shape. Any code change that adds, renames, removes, or shifts the meaning of an env var MUST update the Environment Variables section in the same change set. Same rule for ports, dependencies, and runtime requirements. See agents/rules/wiki-maintenance.md.

The matching deployment manifests live in noetl/ops (Helm chart + kind overlays). This wiki page describes what the manifests need to provide; the manifests are the implementation.

Component summary

Field	Value
Repo	noetl/worker
Binary	`noetl-worker`
Container image	`noetl-worker` (built from the repo's Dockerfile)
Image versioning	crates.io version pinned in `Cargo.toml`; semver releases tagged `vX.Y.Z`
Current version	see `Cargo.toml` `package.version`
Language / runtime	Rust 1.91+; Tokio multi-threaded
Process model	Single binary, single process per pod
Role	NATS pull consumer + tool dispatch. Stateless atomic-compute block per `agents/rules/execution-model.md`.

Runtime contract

What the binary expects from its environment to start cleanly:

- NATS reachable at ${NATS_URL} with the JetStream consumer + stream provisioned (see NATS layout). Hard requirement; without it the worker exits.
- noetl-server reachable at ${NOETL_SERVER_URL} (HTTP). The worker's per-command server_url override (set by the publishing server on every NATS notification per noetl/ai-meta#53) takes precedence at runtime; the env var is the initial default for one-server deployments.
- Worker pool slot: WORKER_POOL_NAME identifies which pool this worker belongs to. Server-side runtime registration is keyed by (kind, name); multiple workers in one pool share work.
- Worker identity: WORKER_ID distinguishes this individual worker. When unset, the worker generates a uuid (fine for ephemeral pods; for StatefulSets prefer setting it explicitly from the pod ordinal so logs + metrics stay readable across restarts).
- KEDA-managed scaling: the worker is designed to be scaled by KEDA on NATS consumer lag. See Scaling.
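
The two precedence rules in the list above (per-command server_url over NOETL_SERVER_URL, and WORKER_ID over a generated uuid) can be sketched as follows. This is an illustrative Python model, not the worker's Rust code: the function names are hypothetical, and the fallback URL is only the "typically" value quoted in the Server endpoint table.

```python
import uuid
from typing import Mapping, Optional

# Assumption: this mirrors the "typically" value from the Server endpoint
# table; the real built-in fallback lives in the worker source.
ASSUMED_FALLBACK_SERVER_URL = "http://noetl.noetl.svc:8082"


def resolve_server_url(command_server_url: Optional[str],
                       env: Mapping[str, str]) -> str:
    """Per-command server_url wins; NOETL_SERVER_URL is only the default."""
    if command_server_url:
        return command_server_url.rstrip("/")
    return env.get("NOETL_SERVER_URL", ASSUMED_FALLBACK_SERVER_URL).rstrip("/")


def resolve_worker_id(env: Mapping[str, str]) -> str:
    """Use WORKER_ID when set, otherwise generate a random UUID v4."""
    return env.get("WORKER_ID") or str(uuid.uuid4())
```

With WORKER_ID unset, every restart yields a new identifier, which is why the list above recommends deriving it from the pod ordinal for StatefulSets.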

Network surface

Ports

Port	Protocol	Purpose	Bind
`9090`	HTTP	Prometheus scrape endpoint at `/metrics` + `/healthz` + `/readyz`	`${WORKER_METRICS_BIND}` (default `0.0.0.0:9090`)

The worker doesn't expose an API. All traffic flows out (NATS pull, HTTP to server) except the scrape port.

Dependencies (outbound)

Target	Protocol	Why
NATS JetStream	TCP 4222 (default)	Pull commands from `noetl.commands.*` subjects. Receive notifications with the publishing server's URL.
noetl-server	HTTP (default port 8082)	Fetch command details (`GET /api/commands/{event_id}`), claim atomically (`POST /api/commands/{event_id}/claim`), emit lifecycle events (`POST /api/events`), put result blobs (`POST /api/result_store/...`). Per-command `server_url` from the NATS notification overrides this default.
External APIs (Auth0 / Duffel / OpenAI / ...)	HTTPS	Whatever the executing playbook tool calls. Credentials resolved via NoETL keychain or `NOETL_KEYCHAIN_ENV_VARS` allowlist (see Keychain credentials).

The worker does NOT call Postgres directly. Per agents/rules/data-access-boundary.md, NoETL platform data is accessible via server API only.
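
As a rough illustration of the noetl-server row in the dependency table, the sketch below builds the three fully specified request targets for one command. The helper name and the returned dictionary shape are hypothetical; only the HTTP methods and paths come from the table, and the result-store route is left out because its path is elided there.

```python
from urllib.parse import quote


def command_endpoints(server_url: str, event_id: str) -> dict:
    """Return (method, url) pairs for the per-command server calls."""
    base = server_url.rstrip("/")
    # Percent-encode the id so it is always a single path segment.
    eid = quote(str(event_id), safe="")
    return {
        "fetch": ("GET", f"{base}/api/commands/{eid}"),
        "claim": ("POST", f"{base}/api/commands/{eid}/claim"),
        "emit_event": ("POST", f"{base}/api/events"),
    }
```

The server_url argument would be the per-command value from the NATS notification when present, which keeps lifecycle events flowing back to the publishing server.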

Resources

Recommended starting point for production. Workers scale on NATS backlog (KEDA), so each replica should be sized for one concurrent playbook dispatch.

Resource	Request	Limit	Notes
CPU	100m	500m	Per-replica; CPU-bound only during tool execution (Python eval, JSON transforms). Idle workers consume <5m.
Memory	128Mi	256Mi	Dominated by the Arrow IPC cache budget (`NOETL_IPC_CACHE_BUDGET_BYTES`, default 268435456 bytes = 256 MiB). If you raise the budget, raise the memory limit too, e.g. to 384Mi.
Ephemeral storage	100Mi	500Mi	Tool execution scratch (Python eval temp dirs).

WORKER_MAX_CONCURRENT (default 1) governs how many commands a single worker pod processes in parallel. Stay at 1 unless the pool's tools are I/O-bound (HTTP fetches dominated by external latency); CPU-bound tools should scale out via more replicas, not more concurrency per pod.
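
When sizing the memory limit against the IPC cache budget, it can help to do the arithmetic explicitly. The sketch below is a hypothetical helper, not part of the worker: it converts a Kubernetes binary-suffix quantity to bytes and reports how much room is left above the cache budget. It handles only the Ki/Mi/Gi suffixes, which is an assumption made to keep the example short.

```python
_BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}

# Default of NOETL_IPC_CACHE_BUDGET_BYTES from the env var table.
DEFAULT_IPC_BUDGET_BYTES = 268435456


def quantity_to_bytes(quantity: str) -> int:
    """Convert '256Mi'-style quantities (or plain byte counts) to bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def headroom_bytes(memory_limit: str,
                   ipc_budget_bytes: int = DEFAULT_IPC_BUDGET_BYTES) -> int:
    """Bytes left under the memory limit once the cache budget is full."""
    return quantity_to_bytes(memory_limit) - ipc_budget_bytes
```

For example, `headroom_bytes("256Mi")` is 0 because the default budget is exactly 256 MiB, while `headroom_bytes("384Mi")` leaves 128 MiB above the default budget.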

Health probes

Probe	Path	Initial delay	Period	Failure threshold	Effect
Liveness	`/healthz`	10s	10s	3	Pod restart
Readiness	`/readyz`	5s	5s	3	Removed from Service endpoints (only relevant if the metrics port is fronted by a Service)

/readyz confirms that the NATS connection and the heartbeat path are live.

NATS layout

- Stream: noetl_commands (default; override via NATS_STREAM). Subjects: noetl.commands.>.
- Consumer: pull-based, durable. Default name worker-pool (override via NATS_CONSUMER). Base subject: noetl.commands (override via NATS_SUBJECT); the filter subject is derived from the pool unless NATS_FILTER_SUBJECT sets it explicitly.
- Subjects the worker reads: filter shapes vary per pool. The server publishes commands to noetl.commands.{system|shared}.<execution_id> per the Phase F sharding-design. A "shared" pool worker filters on noetl.commands.shared.*; a "system" pool worker filters on noetl.commands.system.*.
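
To make the per-pool filters concrete, here is a small sketch of how a filter subject could be derived and how NATS wildcard matching treats it. The function names are hypothetical, and the `<base>.<pool>.*` pattern for pools other than "system" and "shared" is an assumption; the matching rules are the standard NATS ones (`*` matches exactly one token, `>` matches one or more trailing tokens).

```python
from typing import Optional


def filter_subject_for_pool(pool: str, base: str = "noetl.commands",
                            override: Optional[str] = None) -> str:
    """An explicit override wins; otherwise derive the filter from the pool."""
    if override:
        return override
    return f"{base}.{pool}.*"


def subject_matches(pattern: str, subject: str) -> bool:
    """Match a subject against a NATS-style pattern, token by token."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, token in enumerate(p_tokens):
        if token == ">":
            # '>' is only valid as the last token and needs at least
            # one remaining subject token to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if token != "*" and token != s_tokens[i]:
            return False
    return len(s_tokens) == len(p_tokens)
```

Under these rules a shared-pool filter never sees system-pool commands, while the stream-level subject `noetl.commands.>` captures both.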

Scaling

The worker is the scale-out unit per agents/rules/execution-model.md. KEDA reads NATS consumer lag and scales the Deployment between minReplicas and maxReplicas.

Standard KEDA ScaledObject for the worker pool:

```yaml
triggers:
  - type: nats-jetstream
    metadata:
      account: $G
      natsServerMonitoringEndpoint: "nats.noetl.svc:8222"
      stream: noetl_commands
      consumer: worker-pool
      lagThreshold: "5"
```

Workers can be added, removed, or restarted freely — state lives in the server's event log + cache; the worker is stateless.
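
As a rough mental model of what the lagThreshold setting means for replica counts, the sketch below applies the usual average-value rule (lag divided by threshold, rounded up) and clamps it to the configured bounds. This is an approximation for capacity planning only; it ignores stabilization windows, tolerances, and activation thresholds, and the function name is hypothetical.

```python
import math


def desired_replicas(lag: int, lag_threshold: int,
                     min_replicas: int, max_replicas: int) -> int:
    """Approximate replica count for a given consumer lag."""
    wanted = math.ceil(lag / lag_threshold)
    return max(min_replicas, min(max_replicas, wanted))
```

With the threshold of 5 used above and bounds of 1 to 10, a lag of 23 pending commands maps to 5 replicas, and any lag of 50 or more saturates at 10.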

Snowflake ID generation

The worker mints its own snowflake IDs via src/snowflake.rs for client-side event IDs (per agents/rules/observability.md Principle 3). The 10-bit node id comes from the first of these sources that is available, in order of precedence:

- NOETL_SNOWFLAKE_NODE_ID (preferred; set per pod by the deployment manifest).
- NOETL_SHARD_ID (back-compat alias for the same idea).
- A value derived from NOETL_NODE_ID / NODE_NAME / pod hostname via an FNV-1a hash reduced to 10 bits.

For multi-replica deployments, set NOETL_SNOWFLAKE_NODE_ID explicitly to avoid hash collisions producing duplicate IDs.
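
The fallback chain and the collision risk can be illustrated with a short sketch. Several details here are assumptions, since this page does not specify them: the 32-bit FNV-1a variant, reducing the hash by masking the low 10 bits, reading the pod hostname from HOSTNAME, and returning 0 when nothing is set. The actual logic lives in src/snowflake.rs. The collision estimate additionally assumes hashed ids are uniformly distributed.

```python
from typing import Mapping

FNV32_OFFSET = 0x811C9DC5
FNV32_PRIME = 0x01000193
NODE_ID_MASK = 0x3FF  # 10 bits, i.e. 0..1023


def fnv1a_32(data: bytes) -> int:
    """Standard 32-bit FNV-1a: XOR each byte in, then multiply by the prime."""
    h = FNV32_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF
    return h


def resolve_node_id(env: Mapping[str, str]) -> int:
    """Explicit ids win; otherwise hash the first available identifier."""
    for name in ("NOETL_SNOWFLAKE_NODE_ID", "NOETL_SHARD_ID"):
        if env.get(name):
            node_id = int(env[name])
            if not 0 <= node_id <= NODE_ID_MASK:
                raise ValueError(f"{name} must be in 0..1023")
            return node_id
    for name in ("NOETL_NODE_ID", "NODE_NAME", "HOSTNAME"):
        if env.get(name):
            return fnv1a_32(env[name].encode()) & NODE_ID_MASK
    return 0  # hypothetical last resort for this sketch


def collision_probability(replicas: int, slots: int = 1024) -> float:
    """Chance that at least two replicas hash to the same node id."""
    p_unique = 1.0
    for i in range(replicas):
        p_unique *= (slots - i) / slots
    return 1.0 - p_unique
```

Under the uniform-hash assumption, 10 replicas already have roughly a 4% chance of sharing a node id, which is the motivation for setting NOETL_SNOWFLAKE_NODE_ID explicitly.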

The epoch comes from NOETL_SNOWFLAKE_EPOCH_MS if set, otherwise it defaults to a build-time constant. Match the server's epoch (2024-01-01T00:00:00Z) to keep IDs mutually orderable across producers.
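
For reference, the server epoch expressed in milliseconds, and the timestamp component that results from it, can be computed as below. This is a sketch of the arithmetic only; the function name is hypothetical and the bit layout of the final ID is not covered here.

```python
from datetime import datetime, timezone

SERVER_EPOCH = datetime(2024, 1, 1, tzinfo=timezone.utc)
# Value to use for NOETL_SNOWFLAKE_EPOCH_MS when matching the server.
SERVER_EPOCH_MS = int(SERVER_EPOCH.timestamp() * 1000)  # 1704067200000


def snowflake_timestamp_ms(now: datetime,
                           epoch_ms: int = SERVER_EPOCH_MS) -> int:
    """Milliseconds elapsed since the snowflake epoch."""
    now_ms = int(now.timestamp() * 1000)
    return now_ms - epoch_ms
```

Two producers that use different epochs compute different timestamp components for the same instant, so their IDs no longer sort by creation time relative to each other.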

Environment variables

All env vars the binary reads at startup or runtime, with the why behind each one.

Worker identity + pool

Variable	Default	Required	Why
`WORKER_ID`	(uuid v4 generated at startup)	recommended	Unique identifier for this worker pod. Embedded in event metadata + runtime registration so the server can track which worker handled which command. For StatefulSet pods, derive from the pod ordinal so logs/metrics stay stable across restarts.
`WORKER_POOL_NAME`	`default`	yes for non-default pools	Pool the worker registers under. Drives the NATS subject filter (system pool reads `noetl.commands.system.*`, shared pool reads `noetl.commands.shared.*`). Must match the pool's `runtime` row name on the server side.
`WORKER_HEARTBEAT_INTERVAL`	(see code; ~10s)	no	Seconds between heartbeat POSTs to `/api/worker/pool/heartbeat`. Lower = faster failover detection; higher = less HTTP load. Tune to match the server's `NOETL_RUNTIME_OFFLINE_SECONDS`.
`WORKER_MAX_CONCURRENT`	`1`	no	Number of commands this pod processes concurrently. See Resources.
`WORKER_METRICS_BIND`	`0.0.0.0:9090`	no	Bind address for the metrics + health HTTP server. Override only when you don't want to accept scrape traffic on all interfaces.
`WORKER_NATS_LAG_POLL_INTERVAL`	(see code)	no	Seconds between consumer-lag polls used by the worker's own lag-aware logic (independent of KEDA's external poll). Tune only when the default is shown to be too lazy or too aggressive.
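
The documented defaults in this table can be modelled as a small config loader, which is also a convenient way to check a manifest's env block before deploying. The class and function names are hypothetical, and the two interval variables are omitted because their defaults are only defined in code. Rejecting a concurrency below 1 is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WorkerIdentityConfig:
    worker_id: Optional[str]  # None means "generate a uuid at startup"
    pool_name: str
    max_concurrent: int
    metrics_bind: str


def load_identity_config(env: Mapping[str, str]) -> WorkerIdentityConfig:
    """Apply the documented defaults to whatever the environment provides."""
    max_concurrent = int(env.get("WORKER_MAX_CONCURRENT", "1"))
    if max_concurrent < 1:
        raise ValueError("WORKER_MAX_CONCURRENT must be at least 1")
    return WorkerIdentityConfig(
        worker_id=env.get("WORKER_ID"),
        pool_name=env.get("WORKER_POOL_NAME", "default"),
        max_concurrent=max_concurrent,
        metrics_bind=env.get("WORKER_METRICS_BIND", "0.0.0.0:9090"),
    )
```

Calling `load_identity_config(os.environ)` in a pre-deploy check would surface a malformed WORKER_MAX_CONCURRENT before the pod starts.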

Server endpoint

Variable	Default	Required	Why
`NOETL_SERVER_URL`	(built-in fallback, typically `http://noetl.noetl.svc:8082`)	yes	Initial default URL for the noetl-server HTTP API. Note: as of noetl/worker#41 the per-command `server_url` from the NATS notification overrides this at runtime, so a worker that picks up a command published by a different server replica (or by the Rust server vs Python server) will correctly route lifecycle events back to the publishing server.

NATS

Variable	Default	Required	Why
`NATS_URL`	`nats://localhost:4222`	yes	NATS server URL. In kind: `nats://nats.noetl.svc:4222`; in GKE: matches the NATS Helm release's Service.
`NATS_USER`	—	when NATS auth	NATS user.
`NATS_PASSWORD`	—	when NATS auth	NATS password. Should come from a K8s Secret.
`NATS_STREAM`	`noetl_commands`	no	JetStream stream name. Override only when running multiple NoETL deployments on one NATS cluster.
`NATS_CONSUMER`	`worker-pool`	no	Durable consumer name. All workers in one pool share the same consumer name.
`NATS_SUBJECT`	`noetl.commands`	no	Base subject prefix. The pool-specific filter (`noetl.commands.system.*` vs `noetl.commands.shared.*`) is built on top of this prefix.
`NATS_FILTER_SUBJECT`	(derived from pool)	no	Explicit override for the consumer's filter subject. Use only when the pool routing scheme needs a custom filter.

Snowflake ID generation

Variable	Default	Required	Why
`NOETL_SNOWFLAKE_NODE_ID`	—	per-replica in prod	10-bit node id (0–1023) for the worker's snowflake generator. Each pod in a deployment MUST set a distinct value to avoid id collisions producing duplicate event IDs. Same shape as `NOETL_SERVER_MACHINE_ID` on the server side.
`NOETL_SHARD_ID`	—	(alias)	Back-compat alias for `NOETL_SNOWFLAKE_NODE_ID`; same semantics.
`NOETL_NODE_ID`	(HOSTNAME)	—	Fallback identifier hashed to derive the 10-bit node id when neither `NOETL_SNOWFLAKE_NODE_ID` nor `NOETL_SHARD_ID` is set. Read at startup.
`NODE_NAME`	(not set automatically; the manifest must inject it via the K8s downward API)	—	Same fallback chain as `NOETL_NODE_ID`; standard K8s downward-API value.
`NOETL_SNOWFLAKE_EPOCH_MS`	(build-time default: 2024-01-01T00Z)	no	Epoch the snowflake timestamps count from. Override only to match an alternate epoch elsewhere in the system — but the noetl-server uses 2024-01-01, so changing this on the worker breaks cross-component ordering.

Keychain credentials (`NOETL_KEYCHAIN_ENV_VARS`)

Variable	Default	Required	Why
`NOETL_KEYCHAIN_ENV_VARS`	—	when env-aliased credentials used	Comma-separated allowlist of env var names that hold credentials playbook steps can reference by alias. Example: `NOETL_KEYCHAIN_ENV_VARS=NOETL_FLIGHT_BEARER_TOKEN,OPENAI_API_KEY` permits a playbook to write `bearer_token: NOETL_FLIGHT_BEARER_TOKEN` and the worker resolves it from the env at dispatch time. The values themselves are loaded from the same env (the variable name is the alias and the var holds the secret). See `src/executor/command.rs::KEYCHAIN_ENV_ALLOWLIST_VAR`. When unset, env aliasing is disabled and all credentials must go through the NoETL keychain API.

The mechanism is a bridge for credentials that already exist as env vars in the worker's environment (because they come from GKE Workload Identity, an existing K8s Secret mount, or similar already-in-place trust per agents/rules/execution-model.md). Net-new business-logic credentials belong in the NoETL keychain, not in this allowlist.
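
The allowlist behaviour described above can be sketched as follows. This is an illustrative model, not the code in src/executor/command.rs: the function names are hypothetical, and trimming whitespace around the comma-separated names is an assumption.

```python
from typing import Mapping, Optional, Set

ALLOWLIST_VAR = "NOETL_KEYCHAIN_ENV_VARS"


def parse_allowlist(env: Mapping[str, str]) -> Set[str]:
    """Split the comma-separated allowlist; unset means an empty allowlist."""
    raw = env.get(ALLOWLIST_VAR, "")
    return {name.strip() for name in raw.split(",") if name.strip()}


def resolve_env_alias(alias: str, env: Mapping[str, str]) -> Optional[str]:
    """Return the secret for an allowlisted alias, or None if not permitted."""
    if alias not in parse_allowlist(env):
        return None  # not allowlisted: must go through the keychain API
    return env.get(alias)
```

The important property is that a variable merely being present in the pod environment is not enough: NATS_PASSWORD, for instance, stays unreachable from playbooks unless it is explicitly listed.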

Misc / standard tooling

Variable	Default	Required	Why
`HOSTNAME`	(set by container runtime)	—	Fallback for snowflake node-id derivation.
`RUST_LOG`	(build default)	no	Standard `tracing-subscriber` filter. Set to `noetl_worker=debug` for targeted debugging.
`NOETL_IPC_CACHE_BUDGET_BYTES`	`268435456` (256 MiB)	no	Arrow IPC shared-memory cache budget for same-node zero-copy reads. Tune up only when working sets exceed 256 MiB; raise the pod memory limit accordingly.

Secrets handling

Same shape as the server:

Secret	Storage	Mount as
`NATS_PASSWORD`	K8s Secret	`valueFrom.secretKeyRef`
Any value listed in `NOETL_KEYCHAIN_ENV_VARS`	K8s Secret (when used)	`valueFrom.secretKeyRef` per allowlisted name

Per agents/rules/execution-model.md "Secrets and credentials rule": business-logic credentials (third-party API tokens, tenant database DSNs) belong in the NoETL keychain via the server's /api/keychain/* routes. The allowlist mechanism above is a bridge only for credentials that already live in the pod env via existing trust (GKE Workload Identity, etc.).

Observability

- Metrics: Prometheus surface at :9090/metrics per agents/rules/observability.md. Surface includes NATS pull rate, command claim outcomes, tool dispatch durations per tool kind, result-store PUT latency.
- Tracing: tracing spans on every NATS message, every tool dispatch. execution_id is a span field.
- Logs: structured JSON via tracing-subscriber.

Validation procedure

When changing any of the above, validate per agents/rules/deployment-validation.md:

1. `cargo build --release --bins`
2. `cargo test --quiet`
3. Build image locally + load into kind: `kind load docker-image …`
4. Apply the manifests against `kubectl --context kind-noetl`
5. Smoke-test:
   - Submit a playbook execution via the noetl CLI.
   - Confirm the worker claims the command, runs the tool, and POSTs lifecycle events back.
   - For env-var changes: verify `kubectl exec … env | grep <VAR>` shows the value AND the worker's startup log line confirms it was honored.

Only after kind passes does the change roll forward to Cloud Build + GKE.
