worker credentials
The worker bridges platform-mounted Kubernetes Secret values into the in-process keychain map that tools' ctx.get_secret(alias) calls read. Shipped in noetl-worker 5.7.0 (noetl/worker#35); closes noetl/ai-meta#34.
Per agents/rules/execution-model.md, business-logic credentials (bearer tokens, API keys, DSNs, signing keys) live in the NoETL keychain and are referenced by alias from playbook config. This page describes how the worker honours that convention end-to-end for the env-mounted case (the typical k8s deployment shape).
```yaml
# Worker pod spec
env:
  - name: NOETL_KEYCHAIN_ENV_VARS
    value: NOETL_FLIGHT_BEARER_TOKEN,DUFFEL_API_KEY,SLACK_BOT_TOKEN
envFrom:
  - secretRef:
      name: noetl-worker-credentials  # provides the three vars above
```

```yaml
# Playbook step references each by env-var name as the keychain alias
- step: fetch_secure
  tool:
    kind: result_fetch
    bearer_token: NOETL_FLIGHT_BEARER_TOKEN
```

At worker startup, CommandExecutor::new reads the comma-separated allow-list and lifts each named env var into the per-executor keychain map. Each command's ExecutionContext.secrets gets seeded from that map before tool dispatch. Tools call ctx.get_secret("NOETL_FLIGHT_BEARER_TOKEN") and get the env-mounted value back.
Two design alternatives:
| Approach | Shape | Trade-off |
|---|---|---|
| Prefix-strip (`NOETL_SECRET_*` → `secrets[*]`) | Operator names env vars with a fixed prefix; worker scans + strips. | Forces a renaming convention; playbook references the stripped name (mismatch with the Secret key name). |
| Allow-list (`NOETL_KEYCHAIN_ENV_VARS=A,B,C` → `secrets[A]` + `secrets[B]` + `secrets[C]`) | Operator explicitly lists which env vars are credentials. | Two env vars to manage (allow-list + the value vars), but env-var name matches the alias verbatim. |
The worker uses the allow-list approach because:
- The operator owns naming. A k8s Secret with key `NOETL_FLIGHT_BEARER_TOKEN` becomes an env var with the same name, which is the exact alias the playbook references. No prefix-strip transformation, no mismatch to debug.
- Operator-controlled scope. Only the listed vars become secrets; everything else (`PATH`, `HOSTNAME`, `NATS_URL`, runtime config) stays out of the secrets map. No accidental dumping.
- Empty / unset allow-list ⇒ pre-#35 deployments keep working unchanged.
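
The difference between the two approaches can be illustrated with a small sketch. It is written in Python purely for illustration (the worker itself is Rust), and the function names `lift_by_prefix` and `lift_by_allowlist` are hypothetical, not taken from the worker source.

```python
def lift_by_prefix(env, prefix="NOETL_SECRET_"):
    """Rejected alternative: scan for a fixed prefix and strip it from the alias."""
    return {
        name[len(prefix):]: value
        for name, value in env.items()
        if name.startswith(prefix)
    }


def lift_by_allowlist(env, names):
    """Chosen approach: lift only the listed names, keeping each name verbatim."""
    return {name: env[name] for name in names if name in env}
```

With prefix-strip, a Secret key `NOETL_SECRET_FLIGHT_BEARER_TOKEN` would surface as alias `FLIGHT_BEARER_TOKEN`, so the playbook and the Secret disagree on the name. With the allow-list, the alias is the env-var name itself and unlisted variables such as `PATH` never enter the map.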
| Var | Type | Meaning |
|---|---|---|
| `NOETL_KEYCHAIN_ENV_VARS` | Comma-separated list of env-var names | Allow-list. Empty / unset ⇒ no env vars get lifted (keychain map is empty). |
| Each listed name | String | The credential value. Typically populated via envFrom: secretRef so the literal stays in k8s, never in deployment manifests. |
The worker honours both `auth:` shapes, in this order of preference:
- Inline struct: playbook author writes the noetl-tools `AuthConfig` directly:

  ```yaml
  - step: fetch_secure
    tool:
      kind: http
      url: https://api.example.com
      auth:
        type: bearer
        credential: NOETL_FLIGHT_BEARER_TOKEN
  ```

  The `AuthResolver` in `noetl-tools` looks up `credential` in `ctx.secrets` (seeded from the env allow-list above) and emits the `Authorization: Bearer <token>` header.

- Bare alias string: playbook author writes a string in `auth:` and the worker resolves it via the keychain endpoint at dispatch time:

  ```yaml
  - step: create_test_table
    tool:
      kind: postgres
      auth: "{{ pg_auth }}"  # renders to "pg_local"
      command: |
        CREATE TABLE ...
  ```

  The worker (`src/executor/auth_alias.rs`, noetl-worker 5.10+ / noetl/ai-meta#48) detects the string before serde deserialization, fetches the credential via `GET /api/credentials/{alias}?include_data=true&execution_id=<eid>`, then either:

  - `type: postgres` → strips the `auth` slot, merges `db_host` / `db_port` / `db_user` / `db_password` / `db_name` into the postgres tool's flat connection-config slots (`host` / `port` / `user` / `password` / `database`). Mirrors the Python worker's `normalize_postgres_fields` path so the same alias produces the same connection string on either pool.
  - `type: bearer` / `bearer_token` → replaces `auth` with the noetl-tools `AuthConfig` shape `{type: bearer, credential: <alias>}` and seeds the bearer value into `ctx.secrets` under the same alias so `AuthResolver.resolve_bearer` finds it.
  - `type: api_key` → `{type: api_key, credential: <alias>, header: <data.header or "X-API-Key">}` + secret seed.
  - `type: basic` → `{type: basic, credential: <alias>, username: <data.username>}` + secret seed of the password.
  - Anything else → clear error naming the offending type.
  - Missing alias (server 404) → `Credential alias '<name>' not found in keychain` rather than the cryptic `expected struct AuthConfig` serde mismatch the worker used to surface. This, and every other terminal resolution failure, now emits a terminal `call.error` so the execution fails cleanly; see Pre-dispatch failure handling below.

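
The per-type rewrite for bearer, API-key, and basic credentials can be sketched roughly as follows. This is a Python illustration of the mapping described above, not the Rust code in `src/executor/auth_alias.rs`. The function name `normalize_auth` and the `data` field names `token`, `key`, and `password` are assumptions made for the example; only `data.header` and `data.username` are named on this page.

```python
class UnsupportedCredentialType(ValueError):
    """Raised when the keychain record has a type the sketch does not handle."""


def normalize_auth(alias, record, secrets):
    """Return an AuthConfig-shaped dict and seed the secret value under `alias`."""
    ctype = record.get("type")
    data = record.get("data") or {}

    if ctype in ("bearer", "bearer_token"):
        secrets[alias] = data["token"]  # assumed field name
        return {"type": "bearer", "credential": alias}

    if ctype == "api_key":
        secrets[alias] = data["key"]  # assumed field name
        header = data.get("header") or "X-API-Key"
        return {"type": "api_key", "credential": alias, "header": header}

    if ctype == "basic":
        secrets[alias] = data["password"]  # assumed field name
        return {"type": "basic", "credential": alias, "username": data["username"]}

    # Anything else: fail with an error that names the offending type.
    raise UnsupportedCredentialType(
        f"Unsupported credential type '{ctype}' for alias '{alias}'"
    )
```

The `postgres` type is left out here because it follows a different path: its fields are merged into the tool config rather than wrapped in an `AuthConfig`.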
The alias-string shape is what Python playbooks have used since
before the noetl-tools AuthConfig struct existed. The Rust
worker accepts both so the regression-suite playbooks (and any
hand-written legacy playbook) run on the Rust pool unchanged.
Playbook overrides win over keychain defaults — port: 6543 set
on the step keeps that value even if the credential's db_port is
5432.
Unprefixed keys (host, port, user, password, database)
are also accepted on the credential's data dict, for back-compat
with hand-edited keychain rows that pre-date the db_* convention.
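
A minimal Python sketch of the postgres merge, assuming the credential's data dict is a plain mapping. The helper name `merge_postgres_credential` is hypothetical, and preferring the `db_*` key when both spellings are present is an assumption of this sketch rather than documented behaviour.

```python
# Tool-config slot -> credential keys to try, db_* first, then the unprefixed form.
_PG_SLOTS = {
    "host": ("db_host", "host"),
    "port": ("db_port", "port"),
    "user": ("db_user", "user"),
    "password": ("db_password", "password"),
    "database": ("db_name", "database"),
}


def merge_postgres_credential(tool_config, data):
    """Strip the auth slot and fill connection slots the playbook did not set."""
    merged = {k: v for k, v in tool_config.items() if k != "auth"}
    for slot, candidates in _PG_SLOTS.items():
        if slot in merged:
            continue  # playbook override wins over the keychain default
        for key in candidates:
            if key in data:
                merged[slot] = data[key]
                break
    return merged
```

For a step that sets `port: 6543` and a credential whose `db_port` is 5432, the merged config keeps 6543 and takes every other connection slot from the credential.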
Parsing tolerates:
- Whitespace: `" A , B"` parses to `{A, B}`.
- Empty entries: `"A,,B,"` parses to `{A, B}`.
- Allow-listed but unset env vars: silently skipped. An operator can stage rollouts (define the allow-list ahead of mounting the Secret, or vice versa) without startup spam.
- Empty string values: rejected, meaning they are skipped rather than lifted. Distinguishing "unset" from "set to empty string" is a common deployment-time surprise, so both shapes are treated the same way. Otherwise a Secret with a blank field would silently authenticate as an empty token, which is worse than failing closed.
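
The parsing rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the worker's Rust `load_keychain_env_allowlist()` helper: the function names are hypothetical, and dropping duplicate names while keeping first-seen order is a choice of this sketch.

```python
import os

ALLOWLIST_VAR = "NOETL_KEYCHAIN_ENV_VARS"


def parse_allowlist(raw):
    """Split the comma-separated list, trimming whitespace and dropping empty entries."""
    names = []
    for part in (raw or "").split(","):
        name = part.strip()
        if name and name not in names:
            names.append(name)
    return names


def load_keychain_from_env(env=None):
    """Lift allow-listed env vars into a name -> value map."""
    env = os.environ if env is None else env
    keychain = {}
    for name in parse_allowlist(env.get(ALLOWLIST_VAR)):
        value = env.get(name)
        if value:  # skips both unset (None) and empty-string values
            keychain[name] = value
    return keychain
```

Passing a plain dict as `env` makes the behaviour easy to check without touching the real process environment.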
Credential-alias resolution runs before the tool dispatch
(src/executor/command.rs → super::auth_alias::resolve_auth_alias).
A failure here used to early-return without emitting any lifecycle
event: the worker logged Command execution failed and dropped the
error, so the execution sat at command.started forever and
noetl status showed it RUNNING indefinitely. The fix shipped in
noetl-worker 5.15.1; closes noetl/ai-meta#78.
The worker now classifies every pre-dispatch failure and emits a
terminal call.error + command.failed for the terminal ones, so the
execution reaches a FAILED state and clears instead of hanging. The
classification is a typed error (CredentialResolutionError), not a
string match on the message:
| Failure | Classification | What the worker does |
|---|---|---|
| Keychain returns a clean 404 (alias not bound) | terminal (`AliasNotFound`) | emit `call.error` → FAILED |
| Keychain returns a deterministic error status: 400 / 401 / 403 / 500 (incl. 500 `"Decryption failed: aead::Error"`) | terminal (`Invalid`) | emit `call.error` → FAILED |
| Resolved credential has an unsupported type or malformed shape (e.g. non-numeric `db_port`) | terminal (`Invalid`) | emit `call.error` → FAILED |
| Malformed tool config (serde deserialization fails) | terminal | emit `call.error` → FAILED |
| Keychain HTTP call is a transient transport error (connection refused / timeout) or a retryable status (408 / 429 / 502 / 503 / 504) | retryable (`Transient`) | no terminal event; leave the command path's retry/redelivery to run |
The terminal-vs-retryable split is deliberate: a 404 / 400 / decryption
500 / malformed config will produce the same failure on every retry, so
failing the execution cleanly with a clear call.error is the correct
outcome. A 503 / connection-refused / timeout is a transient infra
condition where a later attempt may reach a healthy keychain, so the
worker keeps the command retryable.
A retryable failure is escalated to terminal once the command's attempt
counter reaches MAX_PREDISPATCH_ATTEMPTS (3) — an unreachable keychain
can't hang the execution forever.
The HTTP status is surfaced from the credential-fetch path as a typed
CredentialHttpError (carrying the numeric status) so the classifier in
auth_alias::classify_fetch_error decides retryability by status code
rather than by parsing the formatted error string.
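
The classification and escalation logic can be sketched as below. This is a Python illustration of the idea, not the Rust `classify_fetch_error`. The class and constant names mirror the ones on this page, but their Python shape, the `is_terminal` helper, and the choice to treat unrecognised errors as `INVALID` are assumptions of this sketch.

```python
from enum import Enum

RETRYABLE_STATUSES = frozenset({408, 429, 502, 503, 504})
MAX_PREDISPATCH_ATTEMPTS = 3


class Classification(Enum):
    ALIAS_NOT_FOUND = "alias_not_found"  # terminal
    INVALID = "invalid"                  # terminal
    TRANSIENT = "transient"              # retryable


class CredentialHttpError(Exception):
    """Carries the numeric HTTP status so callers never parse the message."""

    def __init__(self, status, body=""):
        super().__init__(f"keychain returned HTTP {status}: {body}")
        self.status = status


def classify_fetch_error(err):
    """Decide retryability from the error type and status code."""
    if isinstance(err, CredentialHttpError):
        if err.status == 404:
            return Classification.ALIAS_NOT_FOUND
        if err.status in RETRYABLE_STATUSES:
            return Classification.TRANSIENT
        return Classification.INVALID
    if isinstance(err, (ConnectionError, TimeoutError)):
        return Classification.TRANSIENT  # connection refused / timeout
    return Classification.INVALID  # assumption: unknown errors fail closed


def is_terminal(classification, attempt):
    """Terminal failures end the execution; transient ones escalate at the cap."""
    if classification is not Classification.TRANSIENT:
        return True
    return attempt >= MAX_PREDISPATCH_ATTEMPTS
```

Under this sketch, a 500 carrying a decryption failure is terminal on the first attempt, while a 503 stays retryable on attempts 1 and 2 and becomes terminal on attempt 3.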
Live repro (noetl/ai-meta#78): the `test/postgres` fixture's `start` step references `auth: "pg_noetl_k8s"`. That alias's credential record exists but its stored ciphertext can't be decrypted server-side, so `GET /api/credentials/pg_noetl_k8s` returns `500 {"error":"Decryption failed: aead::Error"}`. Before the fix the execution hung at `command.started`; after it, the step emits `call.error` and the execution fails cleanly.
At startup, when the allow-list is non-empty, the worker logs the key names (not values):
```
INFO Loaded keychain credentials from NOETL_KEYCHAIN_ENV_VARS
  count=2 aliases=["NOETL_FLIGHT_BEARER_TOKEN", "DUFFEL_API_KEY"]
```
Per agents/rules/observability.md Principle 3 — never log credential values; log enough that an operator can verify the allow-list took effect via kubectl logs.
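
A minimal Python sketch of the names-only logging rule, using the standard `logging` module. The logger name and the `log_loaded_aliases` helper are hypothetical, and the message format only approximates the worker's output.

```python
import logging

logger = logging.getLogger("noetl.worker.sketch")  # hypothetical logger name


def log_loaded_aliases(keychain):
    """Log the alias names and count only; credential values never reach the log."""
    aliases = list(keychain)  # keys only
    if aliases:  # stay silent when the allow-list produced nothing
        logger.info(
            "Loaded keychain credentials from NOETL_KEYCHAIN_ENV_VARS count=%d aliases=%s",
            len(aliases),
            aliases,
        )
    return aliases
```

Because only the dict keys are passed to the logger, the values cannot leak through the log call even if the format string changes later.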
The keychain map seeded at worker startup acts as a default. Per-command secrets from the playbook step (auth: block, future per-step credential injection) layer on top — env-mounted entries can be overridden. This matches the postgres-tool credential pattern that's been in noetl-tools since R-1.x.
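
The layering can be pictured as a simple two-level merge. The Python sketch below is an illustration only; `build_command_secrets` is a hypothetical name, and the real worker seeds `ctx.secrets` through `ctx.set_secret(...)` rather than building a dict.

```python
def build_command_secrets(keychain_defaults, per_command):
    """Start from the env-mounted defaults, then let per-command secrets override."""
    secrets = dict(keychain_defaults)  # copy so the startup map is never mutated
    secrets.update(per_command)
    return secrets
```

Copying the defaults for each command keeps one command's override from leaking into the next.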
For GKE / production, the typical shape is:
- k8s Secret (created via secret manager, cert-manager, or static manifest) carries the credential values.
- Deployment mounts the Secret via `envFrom`.
- Deployment sets `NOETL_KEYCHAIN_ENV_VARS` env var pointing at the names the Secret carries.
- Playbook references each credential by its env-var name as the alias.
The Phase C2 kind validation rig (automation/development/generate-flight-tls.sh + validate-flight-tls.sh in noetl/ops) is the worked example — it generates a fresh CA + server cert + client cert + bearer token, creates the Secrets, patches the deployments to add NOETL_KEYCHAIN_ENV_VARS=NOETL_FLIGHT_BEARER_TOKEN, and the result_fetch.bearer_token: NOETL_FLIGHT_BEARER_TOKEN playbook field resolves to the actual token at runtime.
- `src/executor/command.rs`: `KEYCHAIN_ENV_ALLOWLIST_VAR` const, `load_keychain_env_allowlist()` helper, `CommandExecutor::keychain_env` field, per-command `ctx.set_secret(...)` seed loop; `MAX_PREDISPATCH_ATTEMPTS` const + `CommandExecutor::handle_predispatch_failure` (the terminal-vs-retryable emission path).
- `src/executor/auth_alias.rs`: `CredentialResolutionError` (the typed terminal/retryable classification), `classify_fetch_error` + `is_retryable_status`.
- `src/client/control_plane.rs`: `CredentialHttpError` (carries the HTTP status so the classifier decides retryability by code, not by string-matching).
- noetl-executor adoption: how the worker dispatches to `noetl-tools::ToolRegistry`; `ctx.get_secret` is the consumer side of the credential map.
- `agents/rules/execution-model.md`: secrets + credentials rule (the boundary discipline this page implements).