Worker Credentials — keychain env-var allow-list

The worker bridges platform-mounted Kubernetes Secret values into the in-process keychain map that tools' ctx.get_secret(alias) calls read. Shipped in noetl-worker 5.7.0 (noetl/worker#35); closes noetl/ai-meta#34.

Per agents/rules/execution-model.md, business-logic credentials (bearer tokens, API keys, DSNs, signing keys) live in the NoETL keychain and are referenced by alias from playbook config. This page describes how the worker honours that convention end-to-end for the env-mounted case (the typical k8s deployment shape).

TL;DR

```
# Worker pod spec
env:
  - name: NOETL_KEYCHAIN_ENV_VARS
    value: NOETL_FLIGHT_BEARER_TOKEN,DUFFEL_API_KEY,SLACK_BOT_TOKEN
envFrom:
  - secretRef:
      name: noetl-worker-credentials   # provides the three vars above

# Playbook step references each by env-var name as the keychain alias
- step: fetch_secure
  tool:
    kind: result_fetch
    bearer_token: NOETL_FLIGHT_BEARER_TOKEN
```

At worker startup, CommandExecutor::new reads the comma-separated allow-list and lifts each named env var into the per-executor keychain map. Before tool dispatch, each command's ExecutionContext.secrets is seeded from that map. Tools then call ctx.get_secret("NOETL_FLIGHT_BEARER_TOKEN") and get the env-mounted value back.
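
The seeding step can be pictured with a minimal sketch. The `ExecutionContext` below is a hypothetical stand-in, not the worker's real type: only the `set_secret` / `get_secret` names come from this page, while the field layout, the `Option<&str>` return type, and the `seed_context` helper are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the worker's per-command context.
#[derive(Debug, Default)]
pub struct ExecutionContext {
    secrets: HashMap<String, String>,
}

impl ExecutionContext {
    /// Store a secret under its alias (assumed signature).
    pub fn set_secret(&mut self, alias: &str, value: &str) {
        self.secrets.insert(alias.to_string(), value.to_string());
    }

    /// Look a secret up by alias; `None` when the alias was never seeded.
    pub fn get_secret(&self, alias: &str) -> Option<&str> {
        self.secrets.get(alias).map(String::as_str)
    }
}

/// Copy every startup-time keychain entry into a fresh per-command context.
/// `seed_context` is an illustrative name, not a function from the worker.
pub fn seed_context(keychain_env: &HashMap<String, String>) -> ExecutionContext {
    let mut ctx = ExecutionContext::default();
    for (alias, value) in keychain_env {
        ctx.set_secret(alias, value);
    }
    ctx
}
```

With a keychain map holding `NOETL_FLIGHT_BEARER_TOKEN`, a context built by `seed_context` answers `get_secret("NOETL_FLIGHT_BEARER_TOKEN")` with the mounted value and `get_secret("PATH")` with `None`.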

Why an allow-list instead of a prefix

Two design alternatives:

| Approach | Shape | Trade-off |
| --- | --- | --- |
| Prefix-strip (`NOETL_SECRET_` → `secrets[]`) | Operator names env vars with a fixed prefix; worker scans + strips. | Forces a renaming convention; playbook references the stripped name (mismatch with the Secret key name). |
| Allow-list (`NOETL_KEYCHAIN_ENV_VARS=A,B,C` → `secrets[A]+secrets[B]+secrets[C]`) | Operator explicitly lists which env vars are credentials. | Two env vars to manage (allow-list + the value vars), but env-var name matches the alias verbatim. |

The worker uses the allow-list approach because:

- The operator owns naming. A k8s Secret with key NOETL_FLIGHT_BEARER_TOKEN becomes an env var with the same name, which is the exact alias the playbook references. No prefix-strip transformation, no mismatch to debug.
- Operator-controlled scope. Only the listed vars become secrets; everything else (PATH, HOSTNAME, NATS_URL, runtime config) stays out of the secrets map. No accidental dumping.
- Empty / unset allow-list ⇒ pre-#35 deployments keep working unchanged.

Env-var contract

| Var | Type | Meaning |
| --- | --- | --- |
| `NOETL_KEYCHAIN_ENV_VARS` | Comma-separated list of env-var names | Allow-list. Empty / unset ⇒ no env vars get lifted (keychain map is empty). |
| Each listed name | String | The credential value. Typically populated via `envFrom: secretRef` so the literal stays in k8s, never in deployment manifests. |

Two ways playbooks reference a credential

The worker honours both shapes, in this order of preference:

Inline struct — playbook author writes the noetl-tools AuthConfig directly:
```
- step: fetch_secure
  tool:
    kind: http
    url: https://api.example.com
    auth:
      type: bearer
      credential: NOETL_FLIGHT_BEARER_TOKEN
```
The AuthResolver in noetl-tools looks up credential in ctx.secrets (seeded from the env allow-list above) and emits the Authorization: Bearer <token> header.
Bare alias string — playbook author writes a string in auth: and the worker resolves it via the keychain endpoint at dispatch time:
```
- step: create_test_table
  tool:
    kind: postgres
    auth: "{{ pg_auth }}"     # renders to "pg_local"
    command: |
      CREATE TABLE ...
```
The worker (src/executor/auth_alias.rs, noetl-worker 5.10+ / noetl/ai-meta#48) detects the string before serde deserialization, fetches the credential via GET /api/credentials/{alias}?include_data=true&execution_id=<eid>, then branches on the credential's type:
- type: postgres → strips the auth slot, merges db_host / db_port / db_user / db_password / db_name into the postgres tool's flat connection-config slots (host / port / user / password / database). Mirrors the Python worker's normalize_postgres_fields path so the same alias produces the same connection string on either pool.
- type: bearer / bearer_token → replaces auth with the noetl-tools AuthConfig shape {type: bearer, credential: <alias>} and seeds the bearer value into ctx.secrets under the same alias so AuthResolver.resolve_bearer finds it.
- type: api_key → {type: api_key, credential: <alias>, header: <data.header or "X-API-Key">} + secret seed.
- type: basic → {type: basic, credential: <alias>, username: <data.username>} + secret seed of the password.
- Anything else → clear error naming the offending type.
- Missing alias (server 404) → the error reads "Credential alias '<name>' not found in keychain" instead of the cryptic "expected struct AuthConfig" serde mismatch the worker used to surface. This failure, like every other terminal resolution failure, now emits a terminal call.error so the execution fails cleanly; see "Pre-dispatch failure handling" below.
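
The non-postgres branches above can be sketched as a single match on the credential type. This is an illustration under stated assumptions, not the code in src/executor/auth_alias.rs: the `AuthConfig` enum is a simplified mirror of the noetl-tools shape, `rewrite_auth` is a hypothetical name, the credential's data dict is modelled as a flat string map, and the behaviour for a `basic` credential without a username is a guess. The postgres branch is left out because it merges fields rather than producing an auth struct.

```rust
use std::collections::HashMap;

/// Simplified mirror of the AuthConfig shapes named above (not the real noetl-tools type).
#[derive(Debug, PartialEq)]
pub enum AuthConfig {
    Bearer { credential: String },
    ApiKey { credential: String, header: String },
    Basic { credential: String, username: String },
}

/// Map a fetched credential's `type` and `data` to a struct-shaped auth config.
/// Postgres credentials are handled elsewhere and would hit the error arm here.
pub fn rewrite_auth(
    alias: &str,
    cred_type: &str,
    data: &HashMap<String, String>,
) -> Result<AuthConfig, String> {
    let credential = alias.to_string();
    match cred_type {
        "bearer" | "bearer_token" => Ok(AuthConfig::Bearer { credential }),
        "api_key" => Ok(AuthConfig::ApiKey {
            credential,
            // Fall back to X-API-Key when the credential carries no header name.
            header: data
                .get("header")
                .cloned()
                .unwrap_or_else(|| "X-API-Key".to_string()),
        }),
        "basic" => match data.get("username") {
            Some(username) => Ok(AuthConfig::Basic {
                credential,
                username: username.clone(),
            }),
            // Assumption: a basic credential without a username is an error.
            None => Err(format!("Credential alias '{alias}' has no username")),
        },
        // Anything else: a clear error naming the offending type.
        other => Err(format!(
            "Unsupported credential type '{other}' for alias '{alias}'"
        )),
    }
}
```

In the real worker each successful branch also seeds the secret value into ctx.secrets under the same alias; that step is omitted here to keep the sketch focused on the shape rewrite.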

The alias-string shape is what Python playbooks have used since before the noetl-tools AuthConfig struct existed. The Rust worker accepts both so the regression-suite playbooks (and any hand-written legacy playbook) run on the Rust pool unchanged.

Playbook overrides win over keychain defaults — port: 6543 set on the step keeps that value even if the credential's db_port is 5432.

Unprefixed keys (host, port, user, password, database) are also accepted on the credential's data dict, for back-compat with hand-edited keychain rows that pre-date the db_* convention.
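
The postgres merge rules (flat slots, playbook overrides, unprefixed fallback) can be sketched as follows. The function name `merge_postgres_fields`, the flat string-map representation of the step and the credential, and the choice to prefer the `db_*` key over the unprefixed one when both are present are all assumptions made for illustration; only the field names come from this page.

```rust
use std::collections::HashMap;

/// (tool slot, prefixed credential key) pairs; the slot name doubles as the unprefixed key.
const PG_FIELDS: [(&str, &str); 5] = [
    ("host", "db_host"),
    ("port", "db_port"),
    ("user", "db_user"),
    ("password", "db_password"),
    ("database", "db_name"),
];

/// Fill the tool's flat connection slots from the credential without
/// overwriting anything the playbook step already set.
pub fn merge_postgres_fields(
    step: &mut HashMap<String, String>,
    credential: &HashMap<String, String>,
) {
    // The alias string has been resolved, so the auth slot is stripped.
    step.remove("auth");
    for &(slot, prefixed) in PG_FIELDS.iter() {
        if step.contains_key(slot) {
            continue; // playbook override wins over the keychain default
        }
        // Assumed order: the db_* key first, then the unprefixed legacy key.
        if let Some(value) = credential.get(prefixed).or_else(|| credential.get(slot)) {
            step.insert(slot.to_string(), value.clone());
        }
    }
}
```

A step that sets `port: 6543` keeps that value after the merge even when the credential carries `db_port: 5432`, while a credential row that only has an unprefixed `user` key still fills the `user` slot.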

Parsing of NOETL_KEYCHAIN_ENV_VARS tolerates:

- Whitespace: " A , B" parses to {A, B}.
- Empty entries: "A,,B," parses to {A, B}.
- Allow-listed but unset env vars: silently skipped. An operator can stage rollouts (define the allow-list ahead of mounting the Secret, or vice versa) without startup spam.
- Empty string values: skipped, exactly like unset vars. Distinguishing "unset" from "set to empty string" is a common deployment-time surprise, so the worker treats both the same way. Otherwise a Secret with a blank field would silently authenticate as an empty token, which is worse than failing closed.
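
The parsing rules above fit in a few lines. The sketch below is not the worker's load_keychain_env_allowlist(); `parse_allow_list` and `load_keychain` are illustrative names, and the environment is passed in as a lookup closure (an assumption that keeps the example testable without touching the process environment).

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Split the raw allow-list on commas, trim whitespace, drop empty entries.
pub fn parse_allow_list(raw: &str) -> BTreeSet<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|name| !name.is_empty())
        .map(String::from)
        .collect()
}

/// Lift each allow-listed variable through `lookup`, skipping names that are
/// unset (`None`) or set to an empty string.
pub fn load_keychain<F>(raw: &str, lookup: F) -> BTreeMap<String, String>
where
    F: Fn(&str) -> Option<String>,
{
    parse_allow_list(raw)
        .into_iter()
        .filter_map(|name| match lookup(&name) {
            Some(value) if !value.is_empty() => Some((name, value)),
            _ => None,
        })
        .collect()
}
```

Against the real process environment the lookup could be `|name| std::env::var(name).ok()`, so that `load_keychain(&raw, |name| std::env::var(name).ok())` returns only the allow-listed variables that are both set and non-empty.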

Pre-dispatch failure handling — terminal vs retryable

Credential-alias resolution runs before tool dispatch (src/executor/command.rs → super::auth_alias::resolve_auth_alias). A failure here used to early-return without emitting any lifecycle event: the worker logged "Command execution failed" and dropped the error, so the execution sat at command.started and noetl status showed it RUNNING indefinitely. The fix shipped in noetl-worker 5.15.1 and closes noetl/ai-meta#78.

The worker now classifies every pre-dispatch failure and emits a terminal call.error + command.failed for the terminal ones, so the execution reaches a FAILED state and clears instead of hanging. The classification is a typed error (CredentialResolutionError), not a string match on the message:

| Failure | Classification | What the worker does |
| --- | --- | --- |
| Keychain returns a clean 404 (alias not bound) | terminal (`AliasNotFound`) | emit `call.error` → FAILED |
| Keychain returns a deterministic error status: 400 / 401 / 403 / 500 (incl. `500 "Decryption failed: aead::Error"`) | terminal (`Invalid`) | emit `call.error` → FAILED |
| Resolved credential has an unsupported type or malformed shape (e.g. non-numeric `db_port`) | terminal (`Invalid`) | emit `call.error` → FAILED |
| Malformed tool config (serde deserialization fails) | terminal | emit `call.error` → FAILED |
| Keychain HTTP call is a transient transport error (connection refused / timeout) or a retryable status (408 / 429 / 502 / 503 / 504) | retryable (`Transient`) | no terminal event; the command path's retry/redelivery is left to run |

The terminal-vs-retryable split is deliberate: a 404 / 400 / decryption 500 / malformed config will produce the same failure on every retry, so failing the execution cleanly with a clear call.error is the correct outcome. A 503 / connection-refused / timeout is a transient infra condition where a later attempt may reach a healthy keychain, so the worker keeps the command retryable.

A retryable failure is escalated to terminal once the command's attempt counter reaches MAX_PREDISPATCH_ATTEMPTS (3), so an unreachable keychain cannot hang the execution forever.

The HTTP status is surfaced from the credential-fetch path as a typed CredentialHttpError (carrying the numeric status) so the classifier in auth_alias::classify_fetch_error decides retryability by status code rather than by parsing the formatted error string.
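
A minimal sketch of status-based classification plus the attempt cap is shown below. The names CredentialResolutionError, classify_fetch_error, is_retryable_status, and MAX_PREDISPATCH_ATTEMPTS appear on this page, but their signatures here are assumptions: the real error type likely carries messages and context, `Option<u16>` is a simplification in which `None` models a transport error with no HTTP status, `should_emit_terminal` is a hypothetical helper, and treating every unlisted status as `Invalid` is a guess.

```rust
/// Simplified version of the typed classification; variant names follow the table above.
#[derive(Debug, PartialEq)]
pub enum CredentialResolutionError {
    AliasNotFound,
    Invalid,
    Transient,
}

pub const MAX_PREDISPATCH_ATTEMPTS: u32 = 3;

/// Statuses where a later attempt may reach a healthy keychain.
pub fn is_retryable_status(status: u16) -> bool {
    matches!(status, 408 | 429 | 502 | 503 | 504)
}

/// Classify a keychain fetch failure by numeric status, never by message text.
/// `None` stands for a transport-level failure (connection refused, timeout).
pub fn classify_fetch_error(status: Option<u16>) -> CredentialResolutionError {
    match status {
        None => CredentialResolutionError::Transient,
        Some(404) => CredentialResolutionError::AliasNotFound,
        Some(code) if is_retryable_status(code) => CredentialResolutionError::Transient,
        // Assumption: any other error status is deterministic, hence terminal.
        Some(_) => CredentialResolutionError::Invalid,
    }
}

/// Terminal errors always fail the execution; transient ones only once the
/// (assumed 1-based) attempt counter reaches the cap.
pub fn should_emit_terminal(error: &CredentialResolutionError, attempt: u32) -> bool {
    match error {
        CredentialResolutionError::Transient => attempt >= MAX_PREDISPATCH_ATTEMPTS,
        _ => true,
    }
}
```

Under these assumptions the decryption failure from the live repro (status 500) classifies as `Invalid` and fails on the first attempt, while a 503 stays retryable on attempts 1 and 2 and is escalated on attempt 3.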

Live repro (noetl/ai-meta#78): the test/postgres fixture's start step references auth: "pg_noetl_k8s". That alias's credential record exists but its stored ciphertext can't be decrypted server-side, so GET /api/credentials/pg_noetl_k8s returns 500 {"error":"Decryption failed: aead::Error"}. Before the fix the execution hung at command.started; after it, the step emits call.error and the execution fails cleanly.

Observability

At startup, when the allow-list is non-empty, the worker logs the key names (not values):

```
INFO Loaded keychain credentials from NOETL_KEYCHAIN_ENV_VARS
     count=2 aliases=["NOETL_FLIGHT_BEARER_TOKEN", "DUFFEL_API_KEY"]
```

Per agents/rules/observability.md Principle 3 — never log credential values; log enough that an operator can verify the allow-list took effect via kubectl logs.
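
One way to keep values out of the log by construction is to build the line from the map's keys only. The sketch below is illustrative: `keychain_log_line` is a hypothetical helper, the single-line layout is a simplification of the output shown above, and it assumes the line is derived from the loaded keychain map.

```rust
use std::collections::BTreeMap;

/// Build the startup log line from alias names only; values are never formatted.
pub fn keychain_log_line(keychain: &BTreeMap<String, String>) -> Option<String> {
    if keychain.is_empty() {
        return None; // nothing was lifted, so nothing is logged
    }
    let aliases: Vec<&str> = keychain.keys().map(String::as_str).collect();
    Some(format!(
        "Loaded keychain credentials from NOETL_KEYCHAIN_ENV_VARS count={} aliases={:?}",
        aliases.len(),
        aliases
    ))
}
```

Because the function only ever reads `keychain.keys()`, a later edit to the format string cannot leak a credential value.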

Layering with playbook-step credentials

The keychain map seeded at worker startup acts as a default. Per-command secrets from the playbook step (auth: block, future per-step credential injection) layer on top — env-mounted entries can be overridden. This matches the postgres-tool credential pattern that's been in noetl-tools since R-1.x.
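
The layering rule amounts to "defaults first, per-command entries last". A minimal sketch, assuming both layers are plain string maps and using the hypothetical name `layer_secrets`:

```rust
use std::collections::HashMap;

/// Start from the startup keychain defaults, then let per-command secrets
/// overwrite any entry that shares an alias.
pub fn layer_secrets(
    defaults: &HashMap<String, String>,
    per_command: &HashMap<String, String>,
) -> HashMap<String, String> {
    let mut secrets = defaults.clone();
    // `extend` replaces existing keys, so the per-command layer wins.
    secrets.extend(per_command.iter().map(|(k, v)| (k.clone(), v.clone())));
    secrets
}
```

Aliases present only in the env-mounted defaults pass through untouched; an alias present in both layers resolves to the per-command value.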

Production deployment shape

For GKE / production, the typical shape is:

- A k8s Secret (created via secret manager, cert-manager, or static manifest) carries the credential values.
- The Deployment mounts the Secret via envFrom.
- The Deployment sets NOETL_KEYCHAIN_ENV_VARS to the names of the keys the Secret carries.
- The playbook references each credential by its env-var name as the alias.

The Phase C2 kind validation rig (automation/development/generate-flight-tls.sh + validate-flight-tls.sh in noetl/ops) is the worked example — it generates a fresh CA + server cert + client cert + bearer token, creates the Secrets, patches the deployments to add NOETL_KEYCHAIN_ENV_VARS=NOETL_FLIGHT_BEARER_TOKEN, and the result_fetch.bearer_token: NOETL_FLIGHT_BEARER_TOKEN playbook field resolves to the actual token at runtime.

Source

- src/executor/command.rs: KEYCHAIN_ENV_ALLOWLIST_VAR const, load_keychain_env_allowlist() helper, CommandExecutor::keychain_env field, per-command ctx.set_secret(...) seed loop; MAX_PREDISPATCH_ATTEMPTS const + CommandExecutor::handle_predispatch_failure (the terminal-vs-retryable emission path).
- src/executor/auth_alias.rs: CredentialResolutionError (the typed terminal/retryable classification), classify_fetch_error + is_retryable_status.
- src/client/control_plane.rs: CredentialHttpError (carries the HTTP status so the classifier decides retryability by code, not by string-matching).
