
python to rust migration

Kadyapam edited this page Jun 26, 2026 · 3 revisions

Python → Rust playbook migration gotchas

The travel SPA's playbooks were first authored against the retired Python NoETL runtime. The production runtime is now the Rust noetl-server-rust + noetl-worker-rust stack, which parses every playbook against a strict typed schema (orchestrate-core/src/playbook.rs) and validates structure before it will execute. Several playbook shapes that the Python engine accepted leniently are rejected by the Rust runtime.

This page is the running list of those drifts and the corrected shapes. It exists because each one is a silent trap: the playbook registers fine (registration is permissive), then fails at execute time. Validate every migrated playbook by executing it, not just registering it.

1. Tool kind must be a real Rust ToolKind

The tool.kind value is a closed enum. Accepted kinds: http, postgres, duckdb, ducklake, python, workbook, playbook, playbooks, secrets, iterator, container, script, snowflake, transfer, snowflake_transfer, gcs, gateway, nats, shell, artifact, noop, task_sequence, rhai, subscription, wasm. There is no catch-all — an unknown kind makes the step's tool: block match neither variant of the untagged ToolDefinition enum, producing:

400 Bad Request - {"error":"workflow[N]: data did not match any variant
of untagged enum ToolDefinition at line L column C","status":400}

The line/column point at the offending step's mapping.
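
A pre-flight check can catch unknown kinds before the server does. The sketch below assumes the playbook has already been parsed into a Python dict (for example with a YAML loader) and that steps live under a top-level workflow: list, as in the examples on this page. find_unknown_tool_kinds is a hypothetical helper, not part of NoETL, and the index it reports is zero-based; whether that matches the N in workflow[N] is an assumption to check against a real error.

```python
# Accepted kinds, copied from the list above.
RUST_TOOL_KINDS = frozenset({
    "http", "postgres", "duckdb", "ducklake", "python", "workbook",
    "playbook", "playbooks", "secrets", "iterator", "container", "script",
    "snowflake", "transfer", "snowflake_transfer", "gcs", "gateway", "nats",
    "shell", "artifact", "noop", "task_sequence", "rhai", "subscription",
    "wasm",
})


def find_unknown_tool_kinds(playbook: dict) -> list[tuple[int, str, object]]:
    """Return (workflow index, step name, kind) for each step whose
    tool.kind is not in the accepted set."""
    problems = []
    for index, step in enumerate(playbook.get("workflow", [])):
        tool = step.get("tool")
        if not isinstance(tool, dict):
            continue  # steps without a tool mapping are out of scope here
        kind = tool.get("kind")
        if kind not in RUST_TOOL_KINDS:
            problems.append((index, step.get("step", "<unnamed>"), kind))
    return problems
```

An empty list means every tool.kind is one the Rust enum accepts; it says nothing about the rest of the tool block.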

kind: agent → kind: playbook (sub-playbook / MCP calls)

The Python agent tool shape is not a Rust kind. Convert each MCP / sub-playbook call:

# Python (rejected by Rust)
tool:
  kind: agent
  framework: noetl
  entrypoint: automation/agents/mcp/firestore
  payload: { method: tools/call, tool: get_doc, arguments: {...} }

# Rust (accepted)
tool:
  kind: playbook
  path: automation/agents/mcp/firestore   # entrypoint → path
  payload: { method: tools/call, tool: get_doc, arguments: {...} }
  • rename entrypoint: to path: (the value is unchanged; it already equals the child playbook's metadata.path).
  • drop framework: noetl (Python-only).
  • keep payload: — the Rust playbook tool forwards it to the child as workload input (same {method, tool, arguments, …} contract the MCP playbooks read).

Caveat — the playbook tool does not return the child's result. See §5.
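
The three conversion rules above can be applied mechanically. This is a minimal sketch, assuming the tool block is already a parsed dict and that only the keys shown in the example need handling; agent_tool_to_playbook_tool is a hypothetical helper, and any other agent-era keys are passed through untouched as an assumption.

```python
def agent_tool_to_playbook_tool(tool: dict) -> dict:
    """Convert a Python-era `kind: agent` tool mapping to `kind: playbook`.

    entrypoint becomes path (value unchanged), framework is dropped,
    and payload plus any other keys are copied as they are.
    """
    if tool.get("kind") != "agent":
        return dict(tool)  # not an agent tool: return an unchanged copy
    converted = {"kind": "playbook"}
    for key, value in tool.items():
        if key in ("kind", "framework"):
            continue  # kind is replaced, framework is Python-only
        converted["path" if key == "entrypoint" else key] = value
    return converted
```

The conversion only fixes the shape. It does not change what the playbook tool returns, so the caveat above still applies to the converted step.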

2. Every workflow needs a step named start

The Rust runtime's validate_playbook rejects any workflow that has no step literally named start:

422 Unprocessable Entity - {"error":"Workflow must have a step named 'start'","status":422}

The Python engine treated the first declared step as the entry point; Rust requires an explicit start. Add a noop entry step that routes to the real first step:

workflow:
  - step: start
    tool: { kind: noop }
    next:
      spec: { mode: exclusive }
      arcs:
        - step: normalize_input   # the original first step

This validation runs after parse, so a ToolDefinition error (§1) masks it — fix the kinds first, then this surfaces.
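
For bulk migration, the same noop entry step can be prepended programmatically. A sketch under the assumption that the workflow is a parsed list of step dicts shaped like the YAML above; ensure_start_step is a hypothetical name.

```python
def ensure_start_step(workflow: list) -> list:
    """Return a workflow that has a step named 'start'.

    If one already exists the list is copied unchanged; otherwise a noop
    'start' is prepended that routes to the original first step.
    """
    if any(step.get("step") == "start" for step in workflow):
        return list(workflow)
    if not workflow:
        raise ValueError("cannot add a start step to an empty workflow")
    entry = {
        "step": "start",
        "tool": {"kind": "noop"},
        "next": {
            "spec": {"mode": "exclusive"},
            "arcs": [{"step": workflow[0]["step"]}],  # the original first step
        },
    }
    return [entry, *workflow]
```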

3. HTTP / callback request bodies: json: not data:

The Rust http tool reads a JSON request body from json:. The Python-era data: key is silently ignored, so the request goes out with an empty body. This bit the gateway session-validate and authorization playbooks (noetl/ai-meta#133 / #134):

# Python
tool: { kind: http, method: POST, url: "...", data: { foo: "bar" } }
# Rust
tool: { kind: http, method: POST, url: "...", json: { foo: "bar" } }

Same rule for callback payloads posted back into the runtime.
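
Because the stray data: key is ignored rather than rejected, it is worth rewriting it mechanically. A minimal sketch, assuming parsed tool dicts; rename_http_body_key is a hypothetical helper, and refusing to guess when both keys are present is a design choice of this sketch, not runtime behaviour.

```python
def rename_http_body_key(tool: dict) -> dict:
    """Rename data: to json: on an http tool mapping, keeping key order."""
    if tool.get("kind") != "http" or "data" not in tool:
        return dict(tool)  # nothing to migrate: return an unchanged copy
    if "json" in tool:
        # Both keys present: do not pick one silently.
        raise ValueError("http tool has both data: and json:; resolve by hand")
    return {("json" if key == "data" else key): value
            for key, value in tool.items()}
```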

4. Pass inputs through input: — don't reach for ambient context

Python steps could pull values out of an implicit context object (context.get("x")). The Rust runtime renders a step's templates against an explicitly declared input: block, so every value a step needs must be bound there and then referenced by name in code:. The auth playbooks were fixed this way: each context.get(...) became an input: binding plus a pick. Also watch accessor depth on prior-step results: for example, .command_0.rows collapsed to .rows during that migration.

5. The playbook tool returns status, not the child's result

The Rust playbook tool (noetl-tools/src/tools/playbook.rs) has two modes, and neither returns the child playbook's result data:

  • async (no return_step): returns {status: "started", execution_id, path, async: true}.
  • blocking (return_step set): polls GET /api/executions/{id}/status and returns the execution status payload (status, current_step, progress, is_cancelled) — still not the child's output.

So a downstream reference like {{ call_google_places }} resolves to a status envelope, not the Google Places results. For a playbook that consumes its MCP children's outputs (itinerary-planner feeds call_google_places / call_duffel_offers / call_amadeus_hotels into normalize_tool_response, and load_slot_state.data into extract_turn), this is a functional blocker, not just a cosmetic one.

This needs a runtime capability that does not exist yet (a synchronous, result-returning sub-playbook invocation). Tracked as gated issue noetl/ai-meta#136. Do not work around it by inlining MCP logic or round-tripping results through Firestore — that changes the architecture.

6. Step-result accessor paths: drop .context entirely

A python step returns a result dict. The Rust worker wraps it in a nested envelope; downstream steps must reference the bare step name + field, with no .context segment.

Verified from a real prod event (execution 328414768463355904, extract_turn call.done result): the stored envelope is

{ "status": ...,
  "context": { "result": { "context": { "data": {
      "first_tool": "mcp/google-places.search_text", ... } } } } }

so in the raw row the value sits at context → result → context → data → first_tool. But you never write that path in a template. The orchestrator's extract_user_data (state.rs) unwraps context.result.context.data and build_context exposes the inner dict two ways:

Template                                      Resolves?
{{ extract_turn.first_tool }}                 ✅ the value
{{ extract_turn.data.first_tool }}            ✅ synthetic .data mirror
{{ extract_turn.context.first_tool }}         null
{{ extract_turn.context.data.first_tool }}    null
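
The table above follows from the unwrap-then-expose behaviour. The Python model below is only an illustration of that behaviour, not the Rust code: unwrap_step_result, build_step_view and resolve are hypothetical names, and how the real build_context treats a result that already has its own data field is an assumption here (the sketch leaves such a field alone).

```python
def unwrap_step_result(envelope: dict) -> dict:
    """Model of the unwrap: pull out context.result.context.data."""
    return envelope["context"]["result"]["context"]["data"]


def build_step_view(inner: dict) -> dict:
    """Model of what a template sees under the step name: the inner
    fields directly, plus a synthetic .data mirror of the same dict."""
    view = dict(inner)
    view.setdefault("data", inner)  # assumption: an existing data field wins
    return view


def resolve(step_view: dict, dotted_path: str):
    """Walk a dotted path; a missing segment yields None (rendered null)."""
    current = step_view
    for segment in dotted_path.split("."):
        if not isinstance(current, dict) or segment not in current:
            return None
        current = current[segment]
    return current
```

With the envelope from the prod event, first_tool and data.first_tool both resolve, while any path starting with context finds no such key in the view and comes back as None.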

itinerary-planner was authored with the .context. form on ~23 references (the routing when: guards on append_turn_events, the MCP arguments:, every normalize_tool_response / render_widget_chat input). All resolved null, so the five tool-dispatch arcs read an empty first_tool and every one skipped — the execution wedged at append_turn_events_atomically in RUNNING forever.

Fix: delete the .context segment from every step-result reference: {{ extract_turn.context.first_tool }} → {{ extract_turn.first_tool }}. The steps that already worked (normalize_input.thread_path, load_slot_state.data) used the bare/.data form; match them. Tracked: noetl/ai-meta#135.
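
With around two dozen references to change, a scripted rewrite is less error-prone than hand editing. The sketch below works on raw template text and only touches references that start with a known step name, so unrelated .context uses are left alone; drop_context_segment is a hypothetical helper, and the caller supplies the step names.

```python
import re


def drop_context_segment(template: str, step_names) -> str:
    """Rewrite `<step>.context.<field>` to `<step>.<field>` for known steps."""
    if not step_names:
        return template
    # Longest names first so a step name that prefixes another cannot shadow it.
    names = "|".join(re.escape(name)
                     for name in sorted(step_names, key=len, reverse=True))
    # The lookbehind keeps us from matching inside a longer identifier or path.
    pattern = re.compile(rf"(?<![\w.])({names})\.context\.")
    return pattern.sub(r"\1.", template)
```

A reference written as extract_turn.context.data.first_tool becomes extract_turn.data.first_tool, which is the .data mirror form and also resolves.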

7. Exclusive arcs skip non-selected targets terminally — don't point a shared convergence node at one

next.spec.mode: exclusive selects the first arc whose when: is true and emits step.skipped for every other arc target. That skip is terminal: a later step routing to a skipped node cannot revive it, so the run wedges.

itinerary-planner hit this twice once §6 unblocked the arcs:

  • render_widget_chat was both an exclusive arc of append_turn_events (when: first_tool == '') and the unconditional successor of normalize_tool_response. On a tool turn the empty-first_tool arc is false → render_widget_chat skipped → normalize_tool_response's arc can't run it → wedge at normalize_tool_response.
  • append_render_events_atomically had the identical shape: it was both an exclusive arc of render_widget_chat (when: post_docs == 0) and the successor of persist_render_docs_atomically.

The working pattern is already in the same playbook: the six call_* MCP steps (google-places, duffel-offers, duffel-create-order, hotelbeds-hotels, hotelbeds-activities, hotelbeds-transfers) all fan into normalize_tool_response via unconditional arcs, and the conditionality lives on each branch step's entry. A skipped branch step's unconditional arc does not skip the shared target (only a false when: arc on the selected-exclusive set does).

Fix: never list a shared convergence node as a conditional exclusive arc. Route the odd branch through a dedicated noop that fans in unconditionally:

# append_turn_events arcs: 6 call_* (conditional) + skip_tool_dispatch (when first_tool=='')
- step: skip_tool_dispatch          # noop
  when: "{{ extract_turn.first_tool == '' }}"
# skip_tool_dispatch -> normalize_tool_response (unconditional)
# normalize_tool_response -> render_widget_chat (single, unconditional predecessor)

itinerary-planner added skip_tool_dispatch and skip_render_persist for exactly this. Audit: a node reached by a conditional arc and any other arc is the smell.
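
The audit rule can be automated. A minimal sketch, assuming the workflow is a parsed list of step dicts whose arcs live under next.arcs with a step: target and an optional when:, as in the snippets on this page; find_shared_conditional_targets is a hypothetical name.

```python
from collections import defaultdict


def find_shared_conditional_targets(workflow: list) -> list:
    """Return step names reached by a conditional arc and any other arc."""
    incoming = defaultdict(list)  # target -> [(source step, has when:)]
    for step in workflow:
        arcs = (step.get("next") or {}).get("arcs") or []
        for arc in arcs:
            incoming[arc["step"]].append((step["step"], "when" in arc))
    return sorted(
        target
        for target, arcs in incoming.items()
        if len(arcs) > 1 and any(conditional for _, conditional in arcs)
    )
```

A convergence node fed only by unconditional arcs, such as normalize_tool_response after the fix, is not flagged; a node like the original render_widget_chat is.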

8. Arc-level set: ctx.* is one-hop-transient — reference the producing step instead

The Rust orchestrator persists step-level set: blocks as durable ctx.updated events (folded into context for all later steps). Arc-level set: (a set: under next.arcs[]) is not persisted: it is rendered and merged into the immediate next step's command context only (orchestrator.rs: arc_set_vars → apply_set_mutations on step_context), then discarded. No ctx.updated event is emitted.

itinerary-planner set ctx.first_widget / ctx.post_events / ctx.bot_message on the arc out of render_widget_chat, then read them two and three steps later in append_render_events_atomically and final_result. Only the one-hop neighbour (persist_render_docs_atomically, reading ctx.post_docs) saw its value; the widget events and the caller-facing render came back empty — the trip "completed" but delivered nothing.

Fix: read the producing step directly. A completed step's result stays in the steps map for the whole execution, so {{ render_widget_chat.post_events }} / {{ render_widget_chat.first_widget }} resolve from any later step. Drop the arc set: blocks and repoint every ctx.* consumer at render_widget_chat.*.
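
To find the arcs that need this treatment, list every arc that carries a set: block. The sketch assumes a parsed workflow list and that an arc-level set: is a mapping keyed by the ctx.* name; both the helper name find_arc_level_sets and that mapping shape are assumptions.

```python
def find_arc_level_sets(workflow: list) -> list:
    """Return (source step, target step, sorted set keys) for every arc
    that carries its own set: block."""
    findings = []
    for step in workflow:
        arcs = (step.get("next") or {}).get("arcs") or []
        for arc in arcs:
            if "set" in arc:
                # Assumes set: is a mapping; sorting it yields its keys.
                findings.append((step["step"], arc["step"], sorted(arc["set"])))
    return findings
```

Each key it reports is visible only to the arc's target step, so any reader further downstream should be repointed at the producing step.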

How to validate a migrated playbook

# register (permissive — does NOT catch the drifts above)
noetl --host=localhost --port=8082 catalog register playbooks/<name>.yaml
# execute (strict — this is what surfaces the parse + validation errors)
noetl --host=localhost --port=8082 exec "<catalog/path>" --runtime distributed \
  --set <key>=<value>
# then read the event trace for per-step pass/fail
curl -s "http://localhost:8082/api/executions/<exec_id>" | jq '.events'

Register the child MCP playbooks too — a parent kind: playbook step 404s if the child isn't in the catalog.

