Kadyapam edited this page Jun 6, 2026 · 7 revisions

Event envelope

The wire format for POST /api/events — what workers (and CLI in distributed mode) send to ingest events into the event log.

Current shape (noetl-server v2.9.0+, after the noetl-events crate adoption)

pub struct EventRequest {
    pub execution_id: String,        // i64 on the wire is a String (browser JSON-number precision)
    pub step: String,
    #[serde(alias = "name")]
    pub event_type: String,
    #[serde(default, alias = "context")]
    pub payload: serde_json::Value,
    pub meta: Option<serde_json::Value>,
    pub worker_id: Option<String>,
    pub result_kind: String,         // "data" | "ref" | "refs"  (server-only)
    pub result_uri: Option<String>,  // server-only
    pub event_ids: Option<Vec<i64>>, // server-only
    pub actionable: bool,            // server-only — orchestrator dispatch gate
    pub informative: bool,           // server-only — log-only persistence gate
    pub event_id: Option<String>,    // app-side snowflake per observability.md Principle 3
    pub status: Option<String>,      // STARTED / RUNNING / COMPLETED / FAILED
    pub created_at: Option<DateTime<Utc>>,
}

Since noetl-server v2.9.0 (PR noetl/server#38), the SHARED SUBSET of EventRequest is anchored to the canonical noetl_events::ExecutorEvent envelope via direct noetl-events = "0.1" dep + bidirectional conversion impls. See the noetl-events crate adoption section below for the design call and the four wire-compat tests that guard the boundary.

Three compatibility layers (two serde aliases plus server-side defaults) keep pre-EE clients working without changes:

  • name → event_type (legacy field name)
  • context → payload (executor producers send context; server stores as payload)
  • Pre-EE clients that omit event_id / status / created_at fall back to server-side defaults (DB snowflake / name-derivation / Utc::now()).

Mirrors the Python EventEmitRequest shape — both servers accept the same wire format.
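
The effect of the two aliases can be pictured without serde. The following is a minimal std-only sketch, not the server's code: canonical_field and normalize_keys are hypothetical helpers, and the flat key/value slice is an assumed stand-in for a parsed JSON object. In the real struct the renaming happens inside the #[serde(alias = ...)] attributes during deserialization.

```rust
use std::collections::BTreeMap;

/// Map a legacy wire key to the current field name.
/// Hypothetical helper: the real struct relies on `#[serde(alias = ...)]`.
fn canonical_field(key: &str) -> &str {
    match key {
        "name" => "event_type",
        "context" => "payload",
        other => other,
    }
}

/// Rewrite every key of a flat (key, raw JSON text) list to its canonical name.
/// Values are passed through untouched.
fn normalize_keys(input: &[(&str, &str)]) -> BTreeMap<String, String> {
    input
        .iter()
        .map(|(k, v)| (canonical_field(k).to_string(), v.to_string()))
        .collect()
}
```

A body sent by a pre-EE client with name and context keys ends up under event_type and payload, while keys that are already canonical (such as step) pass through unchanged. The sketch does not model what happens when a body carries both the legacy key and the current one.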

Cross-stack reconciliation — progress

Tracked on noetl/ai-meta#30 — Appendix H Rust migration umbrella. Reconciliation aligns four shapes (worker WorkerEvent, Python EventEmitRequest, Rust EventRequest, executor ExecutorEvent) so all accept the same wire format and worker switching is a one-liner.

| PR | Repo | Status |
| --- | --- | --- |
| EE-1 | noetl/cli | ✅ merged (#37): executor 0.3.1 enriches ExecutorEvent with optional event_id / worker_id / meta + payload serde alias |
| EE-2 | noetl/server | ✅ merged (#6): this server's EventRequest rename + new optional fields (noetl-server 2.0.0; pipeline-fix #7 published 2.0.1) |
| EE-4 | noetl/noetl (Python) | ✅ merged (#639): Python EventEmitRequest aliases + worker_id top-level field + EventType: Literal[...] → str (the Literal was already out of sync with the dot-notation event types used throughout production Python code) |
| EE-3 | noetl/worker | ✅ merged (#11): noetl-worker 3.0.0 replaces WorkerEvent with the ExecutorEvent re-export from noetl_executor::events; EventEmitter stamps worker_id + created_at at the source per observability.md Principle 4 |

Series complete + validated end-to-end (kind-noetl, 2026-05-31). All four producers / consumers now emit + accept the same wire format on /api/events.

EE-4 finalisation post-merge

EE-4 (noetl/noetl#639) added EventEmitRequest with validation_alias=AliasChoices(...) declarations but the schema lived in dead code — the actually-mounted /api/events endpoint at core/events.py::handle_event used the legacy EventRequest model with no aliases. Surfaced during the noetl-worker (Rust) kind-validation pass: the EE-3 wire shape (event_type, context, execution_id: i64) failed Pydantic validation at the live endpoint.

noetl/noetl#641 (noetl 4.0.0) closes the gap by adding the same validation_alias declarations to core/models.py::EventRequest:

  • name → event_type
  • step → node_name
  • payload → context
  • execution_id accepts JSON integer OR string (mode='before' coercion)

After this fix the EE umbrella's wire-shape promise truly holds — every producer's events land cleanly in noetl.event with the EE-3 fields populated. Kind validation summary on noetl/ai-meta#30.

noetl-events crate adoption (EE-4 follow-up, noetl/ai-meta#49)

After the historical EE-1..EE-4 series above landed, the wire-format envelope still lived in two places: noetl-executor::events::ExecutorEvent (the producer-side source) and noetl-server's hand-aligned EventRequest (the consumer-side mirror, kept in sync by code review). When the Rust server started producing events itself (Phase D R2 orchestrator wiring, noetl/server#31), the duplicated shape became actively maintenance-heavy.

The follow-up tracked under noetl/ai-meta#49 extracted the envelope into a dedicated noetl-events crate:

| Round | PR | What |
| --- | --- | --- |
| 1 | noetl/cli#49 | Carve ExecutorEvent + EventSink + EventEmitter + NoopSink out of noetl-executor::events into the dedicated noetl-events workspace crate. noetl-executor::events becomes a 1-line re-export so existing call sites compile unchanged. |
| 2 | noetl/cli#50 | Crates.io publish prep + actual publish: noetl-events 0.1.0 first release, noetl-executor bumped to 0.4.0. |
| 3 | noetl/server#38 | This server takes a direct noetl-events = "0.1" dep + adds From<ExecutorEvent> for EventRequest + TryFrom<&EventRequest> for ExecutorEvent impls so the shared subset is anchored to the canonical envelope. Server bumps to v2.9.0. |

Why the two types stay distinct (not a literal struct swap)

EventRequest carries a strictly larger field set than the canonical envelope. Five of those fields are legitimately server-only:

| Field | Why server-only |
| --- | --- |
| result_kind | Drives the constraint-compliant {status, reference} / {status, context} result shape (noetl/server#29). |
| result_uri | Same: required when result_kind is "ref". |
| event_ids | Same: required when result_kind is "refs". |
| actionable | Controls whether the orchestrator dispatches commands on this event. |
| informative | Controls whether the event is log-only. |
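
The coupling between result_kind and its two companion fields, as the table above describes it, can be sketched as a small validation function. This is an illustration under assumptions: ResultFields and validate_result_fields are hypothetical names, the error strings are invented, and the actual checks in noetl-server may be stricter or live elsewhere.

```rust
/// Hypothetical subset of the server-only result fields.
struct ResultFields {
    result_kind: String,
    result_uri: Option<String>,
    event_ids: Option<Vec<i64>>,
}

/// Check that the companion field required by `result_kind` is present.
/// "data" needs nothing extra, "ref" needs `result_uri`, "refs" needs `event_ids`.
fn validate_result_fields(r: &ResultFields) -> Result<(), String> {
    match r.result_kind.as_str() {
        "data" => Ok(()),
        "ref" if r.result_uri.is_some() => Ok(()),
        "ref" => Err("result_kind \"ref\" requires result_uri".to_string()),
        "refs" if r.event_ids.is_some() => Ok(()),
        "refs" => Err("result_kind \"refs\" requires event_ids".to_string()),
        other => Err(format!("unknown result_kind: {}", other)),
    }
}
```

The point of the sketch is only that the three fields travel together: none of them has a counterpart in the canonical envelope, which is why they stay on the server-side type.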

Wire format also differs in two places: EventRequest encodes execution_id and event_id as String on output (browser JSON-number precision concession for the dashboard); ExecutorEvent uses i64 for both.

So PR 3 didn't replace EventRequest with ExecutorEvent. Instead, the shared subset (execution_id, step, event_type, payload/context, meta, worker_id, event_id, status, created_at) is now anchored to noetl_events::ExecutorEvent via the conversion impls. Future changes to either type that break compat fail at the wire-compat tests below instead of being caught in a kind-validation cycle.

EE-5 follow-up — lax decode of integer execution_id / event_id (v2.19.1, 2026-06-04)

EE-4 anchored the shape but not the input wire type. When the Rust-only stack ran end-to-end in kind for Phase F R5, worker → server emission failed with serde "invalid type: integer 321079436235509760, expected a string at line 1 column 34" — the worker emits ExecutorEvent with execution_id: i64 over .json(&event) (JSON integer on the wire), and the server's EventRequest.execution_id: String strict-decoded to a string only. Python had hidden the drift via Pydantic v2's lax int→str coercion; serde does not.

Tracked on noetl/ai-meta#55 → noetl/server#56 → PR noetl/server#57 (v2.19.1). Two custom deserializers (deserialize_string_or_i64 for required fields, deserialize_optional_string_or_i64 for Option<String>) accept BOTH wire shapes and route everything into the same String field. Applied to three inbound fields:

  • EventRequest.execution_id
  • EventRequest.event_id
  • BatchEventRequest.execution_id

Outbound encoding stays String — browser clients are unaffected and the documented design intent (avoid JSON-number precision loss for large snowflakes) is preserved. The fix is purely input-side, in addition to the existing wire-compat tests.
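
The dual-shape rule can be sketched without serde. In the block below, WireId is a hypothetical stand-in for whatever JSON value arrived in the id field, and decode_required_id / decode_optional_id are assumed names that mirror the roles of the two real deserializers; the actual implementations are serde deserialize_with functions and are not reproduced here.

```rust
/// Hypothetical stand-in for the JSON value found in an id field.
#[derive(Debug, Clone, PartialEq)]
enum WireId {
    Integer(i64),
    Text(String),
    Null,
    /// Arrays, objects, booleans: never a valid id.
    Other,
}

/// Required field: an integer or a string is accepted and stored as String;
/// anything else is an error.
fn decode_required_id(v: &WireId) -> Result<String, String> {
    match v {
        WireId::Integer(n) => Ok(n.to_string()),
        WireId::Text(s) => Ok(s.clone()),
        WireId::Null | WireId::Other => Err("expected a string or an integer".to_string()),
    }
}

/// Optional field: null maps to None, otherwise the required-field rules apply.
fn decode_optional_id(v: &WireId) -> Result<Option<String>, String> {
    match v {
        WireId::Null => Ok(None),
        other => decode_required_id(other).map(Some),
    }
}
```

Both accepted shapes collapse into the same String, so code downstream of the decoder does not need to know which producer sent the event.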

Six new unit tests pin the dual-shape contract: test_event_request_accepts_integer_execution_id, test_event_request_accepts_string_execution_id, test_event_request_accepts_integer_event_id, test_event_request_event_id_null_is_none, test_batch_event_request_accepts_integer_execution_id, test_event_request_rejects_garbage_execution_id (arrays / objects still 422).

Validated end-to-end in kind (Rust server + Rust worker, Python deployments scaled to 0): noetl exec tests/fixtures/playbooks/hello_world runs through both steps to playbook.completed — the same scenario that returned Failed to emit event after 3 retries on v2.19.0.

Wire-compat guard

Four tests in src/handlers/events.rs pin the round-trip semantics:

  • ee4_executor_event_converts_into_event_request — canonical envelope projects into EventRequest with server-only fields taking handler defaults.
  • ee4_event_request_converts_into_executor_event — full-fidelity round-trip back to the envelope.
  • ee4_try_from_event_request_fills_defaults_for_missing_status_and_created_at — when producers omit status / created_at, the conversion applies the same fallbacks the live handler uses (event_status_from_name, Utc::now()).
  • ee4_try_from_event_request_rejects_non_numeric_execution_id — the wire shape is "stringified i64"; anything else surfaces as an error instead of silently dropping the event into the log with execution_id=0.

Conversion impls

impl From<noetl_events::ExecutorEvent> for EventRequest {
    // i64 → String for execution_id / event_id; server-only fields take
    // the same defaults the handler uses when a producer omits them.
}

impl TryFrom<&EventRequest> for noetl_events::ExecutorEvent {
    type Error = anyhow::Error;
    // String → i64 parse can fail (returns 400-equivalent error);
    // missing status / created_at fall back to event_status_from_name /
    // Utc::now() respectively.
}

The conversions are infrastructure — no current call site in the live POST /api/events / POST /api/events/batch handlers consumes them; they're there so follow-up callers can thread the canonical envelope through downstream code without re-deriving the conversion at each site.
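
For readers who want to see the shape of the two impls end to end, here is a reduced sketch. Envelope and Request are hypothetical three-field stand-ins for ExecutorEvent and EventRequest, and the error type is std's ParseIntError instead of anyhow::Error so that the block compiles without external crates. The status / created_at fallbacks and the server-only defaults are left out.

```rust
use std::convert::TryFrom;
use std::num::ParseIntError;

/// Simplified stand-in for the canonical envelope (ids as i64).
#[derive(Debug, Clone, PartialEq)]
struct Envelope {
    execution_id: i64,
    event_id: Option<i64>,
    event_type: String,
}

/// Simplified stand-in for the server request (ids as String).
#[derive(Debug, Clone, PartialEq)]
struct Request {
    execution_id: String,
    event_id: Option<String>,
    event_type: String,
}

impl From<Envelope> for Request {
    /// i64 to String never fails, so this direction is infallible.
    fn from(e: Envelope) -> Self {
        Request {
            execution_id: e.execution_id.to_string(),
            event_id: e.event_id.map(|id| id.to_string()),
            event_type: e.event_type,
        }
    }
}

impl TryFrom<&Request> for Envelope {
    type Error = ParseIntError;

    /// String to i64 can fail; the error is returned instead of defaulting to 0.
    fn try_from(r: &Request) -> Result<Self, Self::Error> {
        Ok(Envelope {
            execution_id: r.execution_id.parse::<i64>()?,
            event_id: r
                .event_id
                .as_deref()
                .map(|s| s.parse::<i64>())
                .transpose()?,
            event_type: r.event_type.clone(),
        })
    }
}
```

The asymmetry (From one way, TryFrom the other) follows directly from the id types: every i64 has a decimal string form, but not every string is an i64.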

Cross-stack envelope state after EE-1 + EE-2 + EE-3 + EE-4 (+ noetl-events crate adoption)

| Source | event_type field | step / node_name field | context / payload field | worker_id field | event_id field |
| --- | --- | --- | --- | --- | --- |
| noetl-events::ExecutorEvent (canonical, since 2026-06) | event_type | step | context (alias: payload) | optional worker_id | optional i64 |
| noetl-executor 0.4.0+ (re-exports from noetl-events) | event_type | step | context (alias: payload) | optional worker_id | optional i64 |
| noetl-server (Rust) v2.9.0+ (EventRequest + From/TryFrom impls anchored to noetl-events) | event_type (alias: name) | step | payload (alias: context) | optional worker_id | optional String |
| Python EventEmitRequest (v3.0.0+) | event_type (alias: name) | node_name (alias: step) | context (alias: payload) | optional worker_id | optional String |
| noetl-worker v3.0.0+ | event_type | step | context | top-level worker_id | None (server gen_snowflake() default fires; app-side generation follow-up) |

All four shapes accept the same wire format on input AND emit it on output. The dual-direction alignment lets workers / executors / projectors switch implementations without breaking event ingestion.

Why ExecutorEvent's shape was the target

  • step + status are first-class fields rather than buried in payload — easier for the projector + dashboard queries.
  • created_at is stamped at emit time → avoids server-clock skew when ordering events.
  • execution_id: i64 matches the Postgres bigint column type directly — no String⇄i64 conversion in the ingest path (the wire still uses String to avoid JSON-number precision loss for large snowflakes in browser clients).
  • Documented in executor-crate-architecture as the design target.
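
The JSON-number precision concern behind the String encoding can be reproduced in a few lines. The sketch below assumes a client that stores every JSON number as an IEEE-754 double, which is how browser JavaScript treats plain numbers; through_double and through_string are hypothetical helper names.

```rust
/// Simulate a client that stores every JSON number as an IEEE-754 double.
fn through_double(id: i64) -> i64 {
    (id as f64) as i64
}

/// Simulate the string encoding: decimal text out, decimal text back in.
fn through_string(id: i64) -> i64 {
    id.to_string()
        .parse::<i64>()
        .expect("decimal text of an i64 always parses")
}
```

For an id in the snowflake range such as 321079436235509761, through_double returns 321079436235509760: adjacent doubles are 64 apart at that magnitude, so the low bits are rounded away. through_string returns the id unchanged, which is the property the String wire encoding buys.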

Real finding while landing EE-4

Production Python code uses dot-notation event types (step.exit, call.done, command.completed, command.issued, etc.) extensively in noetl/core/dsl/engine/executor/ and noetl/server/api/core/execution.py. But the schema's EventType = Literal["step_completed", "step_started", ...] (underscored, past-tense) only accepted the underscored variants. The strict Literal was either dead code or its validation never actually fired against real worker traffic.

EE-4 loosened EventType to str; semantic validation now lives at the call site (orchestrator + projector) where the real taxonomy is enforced. The dot-notation values are the actual production taxonomy; the Literal was aspirational and never matched reality.

Terminal events on orchestrator evaluate failure (v2.27.2, 2026-06-05)

handlers::events::trigger_orchestrator calls WorkflowOrchestrator::evaluate. A deterministic evaluate failure — an invalid template in a step body (e.g. {{ ctx.* }} rendered by engine::commands::build_tool_command), an unknown step in a next arc, malformed routing — fails identically on every retry. Before v2.27.2 the caller logged only a WARN and returned Ok, emitting no terminal event; the execution stayed RUNNING forever and GET /api/executions/{id} never resolved.

Now trigger_orchestrator emits a terminal playbook.failed event (status FAILED, the error surfaced in result.context.error, parented on the trigger event) on an evaluate Err, so the run resolves to FAILED. Transient/infra errors before evaluate (DB load of events / catalog) keep ?-propagating to the WARN-only path — those are retryable and must not kill a recoverable execution.
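
The split between the two failure paths can be sketched as follows. Everything in the block is a simplification under assumptions: TriggerError, TerminalEvent and trigger are hypothetical names, the two Result arguments stand in for the DB loads and for WorkflowOrchestrator::evaluate, and the real event also carries the parent link and the result.context.error nesting, which are omitted here.

```rust
/// Hypothetical error type for failures that happen before evaluate.
#[derive(Debug, PartialEq)]
enum TriggerError {
    /// Infra problem (e.g. DB load); retryable, so no terminal event.
    Transient(String),
}

/// Hypothetical, flattened view of the terminal event.
#[derive(Debug, PartialEq)]
struct TerminalEvent {
    event_type: &'static str,
    status: &'static str,
    error: String,
}

/// Sketch of the post-v2.27.2 control flow.
/// `load` stands in for the DB reads, `evaluate` for the orchestrator call.
fn trigger(
    load: Result<(), String>,
    evaluate: Result<(), String>,
) -> Result<Option<TerminalEvent>, TriggerError> {
    // Failure before evaluate: propagate and emit nothing, so a retry can recover.
    load.map_err(TriggerError::Transient)?;

    match evaluate {
        Ok(()) => Ok(None),
        // Deterministic failure: resolve the run with a terminal event.
        Err(e) => Ok(Some(TerminalEvent {
            event_type: "playbook.failed",
            status: "FAILED",
            error: e,
        })),
    }
}
```

The design choice the sketch highlights is that only the evaluate error is converted into an event. A load error stays an error, because turning it into playbook.failed would kill an execution that a later retry could still complete.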

This is the same stall class as the command.failed fix in noetl/ai-meta#58: a deterministic failure must still produce a terminal event. Surfaced by the noetl/ai-meta#54 e2e regression sweep (noetl/server#95, closes server#94); the list-status aggregation maps playbook.failed → FAILED.
