Umbrella Rust Worker Parity Gaps

Kadyapam edited this page Jun 3, 2026 · 4 revisions

Umbrella — Rust Worker Parity Gaps (regression-surfaced) ✅ CLOSED

ai-task: #47 ✅ closed, #48 ✅ closed · Surfaced: 2026-06-02 (kind regression run) · Filed: 2026-06-02 · Closed: 2026-06-02 · Parent umbrella: Rust Worker Migration · Status: Both gaps shipped — Gap 1 via noetl/noetl#662 (v4.12.0), Gap 2 via noetl/worker#40 (v5.10.0). Kind regression baseline now ~52/53.

Goal

Close the two parity gaps surfaced by the 2026-06-02 kind regression run so the Rust worker handles the production playbook mix without falling back to the Python pool for these specific cases.

Gaps

Gap 1 — Tool not found: task_sequence ✅ shipped

#47 · Resolved via noetl/noetl#662 (v4.12.0) + noetl/ops#145

The Rust worker doesn't implement the task_sequence tool kind. ~25 sub-playbooks in the regression suite use it (tests/control-flow/, test/vars_simple, vars_test/, etc.) and all failed on the Rust pool.

Original fingerprint:

ERROR noetl_worker::worker: Command execution failed
  execution_id=640421571655369719
  command_id=640421579230281755
  step=process:task_sequence
  error=Tool not found: task_sequence

Decision: Option B — route to python pool.

Reading the Python source disambiguated the question. task_sequence is an engine-level pipeline construct, not a tool kind in the noetl-tools sense:

  • noetl/worker/task_sequence_executor.py (~1.2k LoC) ships TaskSequenceExecutor with pipeline-local data threading via _prev (Clojure-style ->), per-task flow-control rules (continue/retry/break/jump/fail matched on when predicates over the task's outcome envelope), template bindings for _task/_prev/_attempt/output/results/ctx.*/iter.*, retry/backoff machinery, and a parametrized policy DSL.
  • ~11 special-case sites in noetl/core/dsl/engine/executor/{events,commands,state,store,common}.py key off the synthetic <parent>:task_sequence step name — covers loop-state init, step.exit handling, replay state restoration, command emission, parent-step normalization, and event correlation.
  • Rust workspace (repos/worker, repos/tools, repos/cli) had zero references to task_sequence.

Porting that surface to Rust is multi-week work with no runtime benefit — each inner task's tool already dispatches through the worker's existing _execute_tool callback. Routing the kind to the python pool is the right shape.
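To give a feel for the pipeline semantics that would have had to be ported, here is a deliberately tiny sketch of `_prev` threading with per-task flow-control rules. It is not the TaskSequenceExecutor code: `run_sequence`, the outcome dictionary shape, and `max_attempts` are hypothetical names chosen for illustration, and `jump`, backoff, and template bindings are left out entirely.

```python
from typing import Any, Callable

# Hypothetical shapes: a task receives the previous result (_prev) and returns
# a new one; a rule pairs a "when" predicate over the outcome with an action.
Task = Callable[[Any], Any]
Rule = tuple[Callable[[dict], bool], str]


def run_sequence(tasks: list[tuple[str, Task, list[Rule]]], max_attempts: int = 3) -> Any:
    """Run tasks in order, threading each result into the next task as _prev."""
    prev = None
    for name, task, rules in tasks:
        attempt = 1
        while True:
            try:
                outcome = {"status": "ok", "result": task(prev), "_attempt": attempt}
            except Exception as exc:
                outcome = {"status": "error", "error": str(exc), "_attempt": attempt}
            # First matching rule wins; otherwise continue on success, fail on error.
            action = next((act for when, act in rules if when(outcome)), None)
            if action is None:
                action = "continue" if outcome["status"] == "ok" else "fail"
            if action == "retry" and attempt < max_attempts:
                attempt += 1
                continue
            break
        if action in ("fail", "retry"):  # an exhausted retry is treated as a failure
            raise RuntimeError(f"task {name!r} failed: {outcome.get('error')}")
        if action == "break":
            return outcome.get("result", prev)
        prev = outcome.get("result")
    return prev
```

Even this reduced form needs an outcome envelope, rule matching, and attempt tracking, which hints at why the full executor is roughly 1.2k lines.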

Implementation: one-line addition to POOL_FILTER_MAP in noetl/core/runtime/pool_routing.py:

POOL_FILTER_MAP = {
    "agent": "python",
    "task_sequence": "python",   # NEW
}

When the routing flag is on, the command publishes to noetl.commands.python.<eid>. The Rust pool consumer's filter (noetl.commands.shared.>) no longer matches, so the Python pool's catch-all consumer (noetl_worker_pool, filter=None) picks it up and TaskSequenceExecutor runs the pipeline.
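The subject routing described above can be sketched as follows. This is an illustration rather than the code in pool_routing.py: `command_subject`, `subject_matches`, and `DEFAULT_POOL` are hypothetical names, the `shared` default is inferred from the Rust consumer filter, and the routing flag itself is not modelled. The matcher follows the usual NATS wildcard rules (`*` for one token, `>` for one or more trailing tokens).

```python
from typing import Optional

POOL_FILTER_MAP = {
    "agent": "python",
    "task_sequence": "python",
}
DEFAULT_POOL = "shared"  # assumption: inferred from the filter noetl.commands.shared.>


def command_subject(tool_kind: str, execution_id: str) -> str:
    """Build the publish subject for a command of the given tool kind."""
    pool = POOL_FILTER_MAP.get(tool_kind, DEFAULT_POOL)
    return f"noetl.commands.{pool}.{execution_id}"


def subject_matches(filter_subject: Optional[str], subject: str) -> bool:
    """NATS-style subject match; a filter of None is treated as a catch-all."""
    if filter_subject is None:
        return True
    f_tokens = filter_subject.split(".")
    s_tokens = subject.split(".")
    for i, tok in enumerate(f_tokens):
        if tok == ">":
            return len(s_tokens) > i  # '>' needs at least one remaining token
        if i >= len(s_tokens):
            return False
        if tok != "*" and tok != s_tokens[i]:
            return False
    return len(f_tokens) == len(s_tokens)
```

With these definitions, a `task_sequence` command gets a `noetl.commands.python.<eid>` subject that the `noetl.commands.shared.>` filter rejects and a `None` filter accepts, while any kind absent from the map stays on the shared subject.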

Kind validation (2026-06-02): Python pool delivery delta +1 ✅; Rust pool delivery delta 0 ✅; no Tool not found line in Rust worker logs ✅.

Gap 2 — invalid type: string "pg_local", expected struct AuthConfig ✅ shipped

#48 · Resolved via noetl/worker#40 (v5.10.0) + noetl/ops#146

The Rust worker passed credential alias strings (e.g. "pg_local") through to the postgres tool as raw strings. Python resolves the alias against the keychain at dispatch time and substitutes the AuthConfig struct. ~10 sub-playbooks affected (every postgres step that uses auth: "<alias>" instead of inline credentials).

Original fingerprint:

ERROR noetl_worker::worker: Command execution failed
  execution_id=640418972487123831
  command_id=640418976614318976
  step=create_test_results_table
  error=invalid type: string "pg_local", expected struct AuthConfig

Implementation:

New module src/executor/auth_alias.rs (~430 LoC, 11 unit tests + 3 HTTP-integration tests) runs before serde_json::from_value in the command dispatch path:

  1. Detect string value in the auth: slot of the tool config JSON.
  2. Fetch the credential via the new ControlPlaneClient::get_credential(alias, execution_id) method (GET /api/credentials/{alias}?include_data=true).
  3. Branch on the credential's type field:
    • postgres → strip auth; merge db_host / db_port / db_user / db_password / db_name into the postgres tool's flat connection fields (host / port / user / password / database). Mirrors Python's normalize_postgres_fields. Unprefixed keys (host, port, ...) also accepted.
    • bearer / bearer_token → replace auth with the noetl-tools shape {type: bearer, credential: <alias>} + seed the token into ctx.secrets so the existing AuthResolver.resolve_bearer keychain lookup fires.
    • api_key → {type: api_key, credential: <alias>, header: <data.header or "X-API-Key">} + secret seed.
    • basic → {type: basic, credential: <alias>, username: <data.username>} + secret seed of the password.
    • Anything else → clear error naming the offending type.
    • Missing alias (server 404) → clear Credential alias '<name>' not found in keychain error.
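The token-style branches in the list above (bearer, api_key, basic, plus the two error cases) can be sketched in Python as follows. The real module is Rust; this only mirrors the described logic. `resolve_token_alias` and `fetch_credential` are hypothetical names, the fetch function stands in for the HTTP lookup and returns `None` where the server would return 404, and the `token` and `key` field names inside the credential data are assumptions.

```python
from typing import Callable, Optional


def resolve_token_alias(
    config: dict,
    fetch_credential: Callable[[str], Optional[dict]],
    secrets: dict,
) -> dict:
    """Replace a string auth alias with a noetl-tools style auth struct."""
    alias = config.get("auth")
    if not isinstance(alias, str):
        return config  # already a struct or absent: nothing to resolve

    credential = fetch_credential(alias)  # stand-in for the HTTP lookup
    if credential is None:  # stand-in for a server 404
        raise LookupError(f"Credential alias '{alias}' not found in keychain")

    ctype = credential.get("type")
    data = credential.get("data", {})
    resolved = dict(config)
    if ctype in ("bearer", "bearer_token"):
        resolved["auth"] = {"type": "bearer", "credential": alias}
        secrets[alias] = data["token"]  # "token" field name is an assumption
    elif ctype == "api_key":
        resolved["auth"] = {
            "type": "api_key",
            "credential": alias,
            "header": data.get("header", "X-API-Key"),
        }
        secrets[alias] = data["key"]  # "key" field name is an assumption
    elif ctype == "basic":
        resolved["auth"] = {
            "type": "basic",
            "credential": alias,
            "username": data["username"],
        }
        secrets[alias] = data["password"]
    else:
        raise ValueError(f"Unsupported credential type '{ctype}' for alias '{alias}'")
    return resolved
```

Passing the lookup in as a function keeps the sketch testable without a control plane; a plain `dict.get` works as a fake keychain.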

Idempotent: if auth is already a struct (or absent), the helper is a no-op and makes no HTTP call.

Playbook-level overrides win over keychain defaults — port: 6543 set on the step keeps that value.
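The postgres branch, together with the idempotency and override rules, might look like the sketch below. It is a Python illustration of the behaviour described here, not the Rust module: `resolve_postgres_alias`, `merge_postgres_credential`, and `fetch_credential` are hypothetical names, and the choice that a prefixed key wins over an unprefixed one when both are present is an assumption.

```python
from typing import Callable

# Flat postgres tool field -> prefixed credential key, as listed in the text.
PG_FIELDS = {
    "host": "db_host",
    "port": "db_port",
    "user": "db_user",
    "password": "db_password",
    "database": "db_name",
}


def merge_postgres_credential(config: dict, data: dict) -> dict:
    """Strip auth and fill connection fields the step did not set itself."""
    merged = {k: v for k, v in config.items() if k != "auth"}
    for field, prefixed in PG_FIELDS.items():
        if field in merged:
            continue  # playbook-level override wins over the keychain value
        if prefixed in data:
            merged[field] = data[prefixed]
        elif field in data:  # unprefixed keys are accepted too
            merged[field] = data[field]
    return merged


def resolve_postgres_alias(config: dict, fetch_credential: Callable[[str], dict]) -> dict:
    """Resolve a string auth alias; return the config unchanged otherwise."""
    alias = config.get("auth")
    if not isinstance(alias, str):
        return config  # struct or absent: no-op, no lookup
    credential = fetch_credential(alias)
    if credential.get("type") != "postgres":
        raise ValueError(f"Unsupported credential type '{credential.get('type')}'")
    return merge_postgres_credential(config, credential.get("data", {}))
```

Running the resolver a second time on its own output returns the same object without another lookup, because the `auth` key has already been stripped.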

Kind validation (2026-06-02):

==> Probing noetl.event for command lifecycle
    command.issued    probe status=PENDING worker=
    command.claimed   probe status=RUNNING worker=noetl-worker-rust-d85576785-tr7zg
    call.done         probe status=COMPLETED worker=
    command.completed probe status=COMPLETED worker=

    ✅ execution reached command.completed
    ✅ Rust worker claimed the command (worker_id=noetl-worker-rust-...)
    ✅ Rust worker did not log 'expected struct AuthConfig'
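The three checks in the probe output could be expressed as a small function like the one below. This is a sketch only: the event row shape (`event_type`, `worker` keys), the function name `check_lifecycle`, and the idea of passing worker logs as a string are assumptions made for illustration, not the actual probe script.

```python
def check_lifecycle(events: list, worker_logs: str,
                    rust_prefix: str = "noetl-worker-rust-") -> dict:
    """Evaluate the three validation checks over probed event rows and logs."""
    reached_completed = any(e["event_type"] == "command.completed" for e in events)
    claimed_by_rust = any(
        e["event_type"] == "command.claimed"
        and (e.get("worker") or "").startswith(rust_prefix)
        for e in events
    )
    no_auth_error = "expected struct AuthConfig" not in worker_logs
    return {
        "reached_completed": reached_completed,
        "claimed_by_rust": claimed_by_rust,
        "no_auth_error": no_auth_error,
    }
```

Each key in the returned dictionary corresponds to one ✅ line in the output above.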

Sub-task tracking

| Gap | PR(s) | Status | Wiki updates |
| --- | --- | --- | --- |
| #47 task_sequence | noetl/noetl#662 (v4.12.0) + noetl/ops#145 | ✅ shipped | noetl wiki pool_routing task_sequence row added |
| #48 alias resolution | noetl/worker#40 (v5.10.0) + noetl/ops#146 | ✅ shipped | noetl/worker wiki worker-credentials gained a "Two ways playbooks reference a credential" section covering the alias-string shape + the 4 supported credential types (postgres, bearer, api_key, basic) |

Recent activity

| Date | Event |
| --- | --- |
| 2026-06-02 | Both gaps surfaced + filed during kind regression run against worker v5.9.0 (master_regression_test execution_id 640418956389385052). Session details. |
| 2026-06-02 | Run statistics: 53 sub-playbooks attempted, 17 COMPLETED / 53 FAILED / 15 zombie RUNNING. 468 commands processed through noetl_worker_pool_shared consumer. |
| 2026-06-02 | Gap #1 (task_sequence) shipped via noetl/noetl#662 (v4.12.0) + noetl/ops#145. POOL_FILTER_MAP gains "task_sequence": "python"; routes to the catch-all Python consumer. Kind-validated end-to-end. |
| 2026-06-02 | Gap #2 (alias resolution) shipped via noetl/worker#40 (v5.10.0) + noetl/ops#146. New src/executor/auth_alias.rs resolves string auth: values before serde deserialization; 4 credential types supported (postgres / bearer / api_key / basic); 14 new tests. Kind-validated end-to-end. |
| 2026-06-02 | Umbrella closed. Regression baseline now ~52/53; remaining ~1 failure is external deps (Duffel test API key rotation), out of scope for Rust parity. |

Next concrete steps

This umbrella is closed. Follow-up work, if any, lives under the parent Rust Worker Migration umbrella.
