Releases

Kadyapam edited this page Jun 23, 2026 · 192 revisions

Per-repo release history with links to GitHub Releases pages. Maintained alongside submodule pointer bumps.

Last refreshed: 2026-06-23 (🚀 #104 — noetl-server v3.45.0 WI/ADC GCS auth MERGED + ROLLED TO PROD: server runs as the WI-bound KSA noetl-server-rust with the result-tier GCS ENV applied (auth=adc, bucket noetl-demo-19700101-results, cell usc1-a/256), all tier-enable flags OFF → tier inert; off-server cutover stayed sole-writer/lag-0/never-scan, 0 restarts. Prod is now configured + WI-live, ready for staged B→C→D enablement. Prior: 🪶 #104 Phases E + F MERGED — the FINAL build phases: side-effect barrier (tools v3.17.0, worker v5.44.0) + result-tier GC/DR (server v3.44.0, worker v5.45.0); flags default-off → inert in prod. Prior: Phase D MERGED — the minting flip (dual-write window); 4 PRs squash-merged in dependency order cut noetl-server v3.43.0 (mint-authoritative flag + result_store dual-write counter) + noetl-worker v5.43.0 (authoritative tier writer + tier-primary consume + rollback fallback); ops#204 + e2e#78 pointer-only. No new crate publish — no Cargo.toml dep change in either PR; server resolves noetl-locator 0.1.1, worker resolves from the registry. All flags default-off → inert in prod; repo-only (prod minting cutover is a separate next task). Prior: 🪶 #104 Phase C MERGED — resolve-by-URN read path; 3 PRs in dependency order cut noetl-server v3.42.0 (GCS object backend + cell registry + GET /api/internal/cells) + noetl-worker v5.42.0 (resolve-by-URN read path + fixes B/B1, closes OQ6, 38 unit tests); e2e#77 is pointer-only. No new crate publish — worker resolves noetl-tools 3.16.0 + adds published arrow=53; server resolves noetl-locator 0.1.1, both from the registry (no git/branch dep → no repoint). All flags default-off → inert in prod. Prior: 🪶 #104 Phase B MERGED — shadow Feather result tier; 5 PRs in dependency order cut noetl-tools v3.16.0 + noetl-locator v0.1.1 (member bumped 0.1.0→0.1.1 pre-merge so the additive API publishes), noetl-server v3.41.0 (sibling consumer), noetl-worker v5.41.0 (consume-loop); flags default-off → inert in prod. 
Prior: 📦 #104 Phase A MERGED - slim dependency-free noetl-locator 0.1.0 extracted + published to crates.io (tools#76, noetl-tools v3.15.0 dc0c5d8; feat(locator): subject → minor + member-publish), and the server accepts the canonical result URI behind NOETL_RESULT_URI_ACCEPT (default off) parsed via the slim crate so the control plane never pulls noetl-tools' heavy graph (server#260, noetl-server v3.40.0 c89d078, dep repointed git→0.1.0 pre-merge, 623 tests green) + the e2e#75 kind rig (eeca8b7). Flag default-off → no prod deploy; reaches prod on a later server rollout. Prior: ✅ #95 SHIPPED + CLOSED - postgres pg_value_to_json now serializes tz-naive timestamp (NaiveDateTime) + date/time/uuid/numeric/bytea instead of returning null: noetl-tools v3.14.2 (tools#75, 6d9b674, published to crates.io) + noetl-worker v5.40.5 (worker#126, da24952, pin bump only). Built prod noetl-worker-rust:v5.40.5 @sha256:45212dbe… (Cloud Build us-central1) + rolled by digest onto noetl-worker-rust + noetl-worker-system-pool under the live off-server cutover (server stays v3.39.5, CPU 250m/2 kept) - rolling restart clean, off-server cutover stayed healthy (materializer sole-writer lag 0, command lag 0, WAL rehydrated per #119). Fixes the auth0_login expires_at: null repro. Prior: ✅ #123 SHIPPED + CLOSED - a non-iterable loop in: now fails loudly instead of silently wedging: noetl-server v3.39.6 (server#258, 7f109a9, squash 275b914) makes the off-server drive decode the {"error":…} envelope (decode_orchestrate_error) and emit a terminal playbook.failed (metric noetl_orchestrate_drive_total{stage="drive_error"}, structured execution_id) instead of leaving the run at commands=0/RUNNING-forever; orchestrate-core prefixes the offending step name onto the existing evaluate_loop error. An empty iterable ([]/{}) still short-circuits to next.
Server-only (worker stays v5.40.3); 600 server + 135 orchestrate-core tests + clippy; kind-validated prod-exact (absent iterable → FAILED with a clear message, valid [1,2,3] still COMPLETED 3-way, #120/#124 unaffected, 0 restarts); code-only, ships to prod on a future server rollout. Prior: ✅ #121 SECOND-HALF SHIPPED + CLOSED - off-server system/* WAL-chain wedge fully fixed: noetl-server v3.39.5 (server#257, c421273, squash 54ac277) gates both off-server-drive sites in trigger_orchestrator_inner on should_publish(catalog_id) so system/* execs drive server-built run_state; regular execs keep the off-server path (preserves the #256 win). Server-only (worker stays v5.40.3); 612 tests + clippy; kind full-gate system/scheduled_cleanup: WAL chain incomplete → COMPLETED, 0 loops. #256 was only the first half (#121 reopened after a v3.39.4 prod re-cutover still wedged on the system-pool worker). Prior: ✅ #121 first-half (partial) - off-server WAL-chain-incomplete loop on system/ executions FIXED: noetl-server v3.39.4 (server#256, 77aaa06, squash 28b17cb) links the gate-off command.claimed (+ the gate-off batch INSERT) through ChainHeads so it can no longer become an orphaned NULL-prev_event_id spine head, AND stops routing system/ executions to the off-server WAL drive (their events INSERT to noetl.event and never enter the noetl_events WAL → __offserver_retry__ loop). Server-only (changes confined to src/handlers/events.rs + src/state.rs → no worker bump). 598 tests + clippy clean; kind-validated system/scheduled_cleanup orphan + 17×+ re-drive loop → linked chain, COMPLETED in 6s. Live prod off-server-cutover wedge (applied frozen while offserver_retry→~389, all prev_event_id=NULL) is the real-world repro; armed revert recovered. PROD GKE default (STATE_BUILDER=server) untouched.
Prior: ✅ #125 + #126 SHIPPED + CLOSED - task_sequence control flow (do: jump/break/retry) and http tool body data-shape fixed; full 10×1000 batch pft_flow_test clean: noetl-tools v3.14.0 (tools#73, 638c3c6) + v3.13.1 (tools#72, 8dd0e1f); noetl-worker v5.40.3 (worker#124, 6dd3449). Perf follow-up #127 filed. Prior: ✅ #124 SHIPPED - distributed task_sequence forward set:/sibling bindings no longer render empty: noetl-server v3.39.3 (server#255, 365d3be) defers rendering of sub-task templates whose vars aren't resolvable at command-build time (render_value_deferring_unresolved) so a forward set:/sibling ref propagates to later sub-tasks via the worker re-render. Prior: 🚀 PROD CQRS rollout RECORDED (ops#200, ops@d6633f6, ai-meta 08c73e5) + LIVE-PROD e2e validation of the gate-ON cutover (28/30 execs PASS, sole-writer + clean-chain + never-scan, lag 0; 2 pg-credential-env-diff FAILs, not cutover bugs); prod left healthy on the new path. Prior: ✅ #119 + #118 SHIPPED + gate-ON kind-validated + CLOSED - single- AND multi-replica off-server now blemish-free (single-root incl. finalize, zero fallback) AND restart-robust (the WAL index rehydrates): noetl-worker v5.40.2 (worker#123, 48b0bde) makes the off-server WAL drain rebuild its in-memory index from the retained noetl_events WAL on every boot (ephemeral DeliverPolicy::All; the durable consumer's persisted cursor used to outrun the empty post-restart index → off-server execs stalled offserver_retry), and noetl-server v3.39.1 (server#253, c5f8cb2) + e2e (e2e#73, fe97d92) land the exactly-one-terminal-per-execution FinalizedGuard (suppresses a duplicate finalize before the chain linker so a single-replica straggler drive can't orphan the chain). Gate-ON kind: restart-rehydration proven (indexed_executions=17 wal_events=200 after a forced mid-flight pod delete); single-replica 6/6 stress / ~126 execs all roots=1(incl. 
terminal)/terminals=1/zero-scan; multi-replica 21 execs COMPLETE, forwarded_ok +202; PROD/defaults untouched. Prior: ✅ #117 SHIPPED — off-server from_events spine ordered by prev_event_id chain + walked from the real tip (the high-concurrency fan-out reduce wedge): noetl-worker v5.40.1 (worker#122, baeae78) + e2e (e2e#72, cdf1768); 2-replica affinity gate-ON stress 6/6 iterations / 108 execs COMPLETE, 15 real id-inversions all fired the fan-in reduce; prior: ✅ #116 program-scale step 2 SHIPPED — execution-affinity single-owner write ordering, multi-replica gate-ON validated: noetl-server v3.39.0 (server#252, 5e00d0a) + e2e 66b6e1b; prior: ✅ #115 Phase 5 SHIPPED — atomic-working-item context tenet 6: server v3.37.0 + worker v5.40.0; prior: Phase 4 DRIVE CUTOVER → ai-meta pointers bumped: noetl-worker v5.38.0 (worker#119, bef13e5) + noetl-server v3.34.0 (server#247, f0922bd) + ops (ops#198, b1da9f1) + e2e (e2e#65, b38b6dd) make the orchestrator drive construct its WorkflowState off the server from the noetl_events WAL spine (wasm run/from_events) under NOETL_STATE_BUILDER=offserver (default server, prod unchanged) — a durable noetl_state_builder consumer + a staleness guard (expected_head) that keeps the WAL state never staler than the server's view (fixed a fan-in-barrier re-issue the rig caught mid-session). Gate-ON parity rig PASS: offserver==server fingerprint, fan-in exactly-once, worker served +3 / event_scans 0 / wal_events +25, server state_build_event_scans 0, cache cold+1/incr+2, sole-writer 25==25 / lag-0; linear + loop legs also COMPLETED off-server. 
Prior — ✅ #115 Phase 4 KERNEL + FLAG SHIPPED + shadow kind-validated → noetl-worker v5.37.0 (worker#118, fef961c) lands the off-server state-builder kernel (state_builder — a pool-side per-execution chain index sourced from the noetl_events WAL; chain_walk() head→root spine in event_id order = parity by construction; cache keyed by the immutable chain head with CacheHit / Incremental tail-advance / ColdRebuild) + a live WAL shadow loop (NOETL_STATE_BUILDER_SHADOW, default off) + metrics, and noetl-server v3.33.0 (server#246, 3e6006d) the NOETL_STATE_BUILDER=offserver|server flag scaffold (default server); gate-ON kind-validated on live cluster (WAL-read wal_events_total=993 / event_scans_total=0; cache cold_rebuild=28 + incremental=21; shadow spines indexed==spine matching Phase-3 topologies 13/62/25/31/55; fresh fan-out COMPLETED gate-ON, sole-writer + lag-0; baseline restored); the offserver drive cutover is staged; no default changed. Prior — ✅ #115 Phase 3 MERGED → noetl-server v3.32.0 (server#245, 8338417) lands the chain-walk state builder behind NOETL_STATE_BUILD_MODE=chain_walk (default event_scan, prod unchanged) — the drive reconstructs WorkflowState by following the one-level prev_event_id chain head→root (in-memory ChainHeads head + (execution_id,event_id) PK lookups, never a WHERE execution_id scan of noetl.event) and feeds the same from_events (parity by construction); falls back to event-scan on cold-head / lag / non-genesis; NOETL_STATE_BUILD_PARITY_CHECK shadow-asserts both builds equal in one REPEATABLE READ snapshot; gate-ON kind-validated parity 41/41 MATCH, scans=0 / 1064 PK hops / 0 fallbacks, all topologies COMPLETE, sole-writer + lag-0, 577 tests + clippy green; Phase 4 (off-server builder + WAL cache) in progress. 
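The Phase 3 chain walk described above (follow the one-level prev_event_id link from the head back to the root using keyed lookups, never a scan by execution) can be pictured with a minimal sketch. Everything here is an assumption for illustration: the `Event` struct, the in-memory map standing in for `(execution_id, event_id)` primary-key lookups, and the `chain_walk` signature are hypothetical and are not taken from noetl-server or noetl-worker.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified event row: only the fields the walk needs.
#[derive(Debug, Clone, PartialEq)]
struct Event {
    event_id: u64,
    /// None marks the genesis (root) event of the execution.
    prev_event_id: Option<u64>,
}

/// Follow prev_event_id links from `head` back to the root and return the
/// spine in root-to-head order. Returns None when a link is missing or the
/// chain loops, which is where a real builder would fall back to another
/// strategy (the release note mentions an event-scan fallback).
fn chain_walk(
    events: &HashMap<(u64, u64), Event>,
    execution_id: u64,
    head: u64,
) -> Option<Vec<Event>> {
    let mut spine = Vec::new();
    let mut cursor = Some(head);
    while let Some(id) = cursor {
        // Illustrative cycle guard: a valid chain never exceeds the map size.
        if spine.len() > events.len() {
            return None;
        }
        // One keyed lookup per hop; no scan over the execution's events.
        let event = events.get(&(execution_id, id))?;
        spine.push(event.clone());
        cursor = event.prev_event_id;
    }
    spine.reverse();
    Some(spine)
}
```

The walk costs one lookup per event on the spine, which matches the "PK hops" counter the entry reports rather than a per-execution scan.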
Prior — ✅ #115 Phase 2 MERGED: noetl-server v3.31.0 (server#244, f5bd4a8) lands the one-level prev_event_id event chain (each noetl.event/noetl.command carries the chain link, stamped at the emit chokepoint from a per-execution chain-head watermark, covering both gate paths + the materializer; additive — Phase 3 chain-walk state builder reads it, in progress) + noetl ecd16a2 (noetl#667) the canonical schema_ddl.sql chain columns; ai-meta pointer afdb365. Prior — ✅ #115 Phase 1 — references-in-state consume side — shipped: noetl-worker v5.36.0 (worker#117) makes resolve_context_references selective (resolve a noetl:// ref only when the tool input binds the step's bulk; predicate/scalar/_ref reads off the bounded extracted summary), and noetl-server v3.30.0 (server#243) surfaces _ref/_store/_uri on kept refs + flips refs_in_state default true. Closed #113 + #114 — all 9 stalled core fixtures reach playbook.completed gate-ON (max command ctx 412KB, 0 __orchestrate__ event rows, materializer lag 0). Off-server-drive prod cutover (#107/#111) unblocked on the size axis; prod GKE untouched. Prior — 🐞 #114 oversized-command.issued offload shipped — noetl-server v3.29.5 (server#242) offloads a command.issued context over NOETL_COMMAND_CONTEXT_MAX_BYTES (512KB) to noetl.result_store with a {__context_ref__} marker, resolved in get_command/claim_command (metrics context_offloaded/context_ref_resolved) — so the published event stays under NATS max_payload and the publish-only gate never wedges; kind gate-ON rig PASS, all command.issued <1MB, 6 of #113's 9 fixtures COMPLETE, the remaining 3 + the off-server-drive cutover gated on the refs_in_state consume side (#101); companion rig e2e#64. 
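The #114 offload described above (a command context over the size limit is written to a store and replaced by a reference marker that is resolved on read) can be sketched roughly as follows. This is an illustration under stated assumptions: the `CommandContext` enum, the `HashMap` standing in for noetl.result_store, the key format, and both function names are hypothetical; only the 512KB threshold comes from the entry.

```rust
use std::collections::HashMap;

/// Threshold from the release note (512KB); the constant name is illustrative.
const COMMAND_CONTEXT_MAX_BYTES: usize = 512 * 1024;

/// Hypothetical shape of what gets published with the command.
#[derive(Debug, PartialEq)]
enum CommandContext {
    /// Small contexts travel inline with the event.
    Inline(String),
    /// Oversized contexts are replaced by a reference into the store.
    Offloaded { context_ref: String },
}

/// Keep the context inline when it fits, otherwise store it and return a marker.
fn offload_if_oversized(
    command_id: &str,
    context_json: String,
    result_store: &mut HashMap<String, String>,
) -> CommandContext {
    if context_json.len() <= COMMAND_CONTEXT_MAX_BYTES {
        return CommandContext::Inline(context_json);
    }
    // Hypothetical key scheme; the real reference format is not shown in the note.
    let key = format!("command-context/{command_id}");
    result_store.insert(key.clone(), context_json);
    CommandContext::Offloaded { context_ref: key }
}

/// Resolve a published context back to its full body on the read path.
fn resolve_context(
    ctx: &CommandContext,
    result_store: &HashMap<String, String>,
) -> Option<String> {
    match ctx {
        CommandContext::Inline(body) => Some(body.clone()),
        CommandContext::Offloaded { context_ref } => result_store.get(context_ref).cloned(),
    }
}
```

The point of the design is that the published event carries only the short marker, so its size stays bounded regardless of how large the upstream context grows.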
Prior — 🐞 #113 off-server-drive payload-size + cancel fix shipped — noetl-server v3.29.4 (server#241) makes apply_worker_orchestration resolve+decode an OFFLOADED __orchestrate__ drive result (over the 100KB inline budget → durable result-store reference.ref, no inline output_b64) instead of dropping it → non-convergent re-loop (new noetl_orchestrate_drive_total{stage=ref_resolved}); and stops the drive on cancel (match underscore playbook_cancelled + ExecutionState::is_terminal terminal guard evicts the orch-cache, no server restart). Kind gate-ON: 785KB drive result → ref_resolved→COMPLETED, 0 decode WARNs; cancel froze a drive-loop instantly; sole-writer lag 0; companion rig e2e#63. 5/9 #113 large-context fixtures COMPLETE; the other 4 hit a DISTINCT oversized-command.issued (full upstream context embedded → >1MB NATS payload) stall tracked in #114. Default behaviour unchanged; no prod default flipped. Prior — 🛡️ #103 materializer-lag GUARDRAIL shipped — noetl-worker v5.35.0 (worker#116) adds the materializer-consumer lag gauge (noetl_worker_nats_consumer_pending{consumer="noetl_materializer"}) on an independent poller; paired with ops #195/#196 (VMRule + worker scrape + VMAlert + dashboard + flip runbook). The pre-flip observability gate for PUBLISH_ONLY; kind-proven induce→fire→recover→clear. Default-off. Prior — 🎯 #103 server CQRS cutover COMPLETE — FLIP-READY — noetl-server v3.29.3 (server#240) routes the 2 ExecutionService cancel/finalize writers through the emit_event chokepoint, closing the last synchronous server noetl.event writer under the gate (kind-proven both modes; gate-off byte-identical INSERT, gate-on PUBLISHED + materializer sole writer + terminal state + 0 loss/dup); dual-mode e2e rig kind_validate_cancel_finalize_gate.sh (e2e#62). All three flip blockers closed → flipping PUBLISH_ONLY on is a staged operator decision. Default-off; no prod default changed. 
Prior — #104 off-server-drive × gate reconciliation PROVEN — noetl-server v3.29.2 (server#238) cold-cache apply rebuilds WorkflowState from the durable log instead of dropping the in-flight off-server drive result (crash-recovery; the #104 WAL-rebuild principle) + committed e2e rig kind_validate_orchestrate_gate.sh (e2e#61); kind-proven that gate-ON × off-server-drive compose (0 server writes / materializer sole writer / read-your-writes / hard-kill crash-recovery via cold_rebuild). Unblocks the PUBLISH_ONLY flip — only the 2 cancel/finalize sites remain. Default-off. Prior — #103 ack-after-materialize cascade landed — noetl-tools v3.13.0 (tools#71) deferred (ack-after-processing) ack in the subscription SourceClient/NATS source (AckMode::Defer + $JS.ACK durable handles + ack/nack/term); noetl-worker v5.34.0 (worker#115) in-process CQRS materializer consume-loop — drains noetl_events with deferred ack, acks each batch only after events/project succeeds → a transient project failure redelivers, no loss; ops#194 wires NOETL_MATERIALIZER_ENABLED on the system pool (kind on, prod false). Closes the ack-after-materialize durability gap before the PUBLISH_ONLY flip; kind fault-injection proven (gate-on sole-writer happy-path zero-loss; fault before ack → redeliver → materialize, loss=0, idempotent). Default-off; flip still gated on off-server-drive×gate (#104) + 2 cancel/finalize sites. Prior — 2026-06-18: #108 (c) — noetl-server v3.28.0 (server#233) flips the worker-driven orchestrator drive to default ON (NOETL_ORCHESTRATE_PLUGIN_DRIVE defaults true), scale-soak-gated on kind (zero noetl.event burst + system-pool isolation), closing #108; noetl-worker v5.33.0 (worker#114) the pool-affinity decline. 
Prior - 2026-06-16: WASM plug-in capability + Resource Locator - noetl-server v3.11.0 (server#210) plug-in module registry; noetl-worker v5.23.0 → v5.24.0 (worker#93 wasmtime host + Arrow data ABI + capability ring; worker#95 HTTP PluginSource) - catalog-loading loop closed, all behind off-by-default wasm-plugin; noetl-tools v3.11.0 (tools#68) the Resource Locator naming foundation. Tracked under #105 (plug-in compilation) + #104 (event-WAL). Prior - cursor/claim loop mode + PFT green on Rust - noetl-server v3.8.0 (server#196) cursor loop engine + output namespace; noetl-tools v3.10.1 (tools#66) postgres -- comment splitter fix; noetl/worker#88 worker dep bump; test_pft_flow_v2 all_passed:true on kind; #100 closed. Prior: Snowflake↔Postgres bidirectional transfer (#99) - tools v3.10.0 + worker v5.22.0 + e2e#58. Prior: Snowflake key-pair JWT auth validated end-to-end - noetl-tools v3.9.0 / v3.9.1 / v3.9.2 (tools#62 / #63 / #64); worker bumps (worker#83–#86) + e2e#57 fixture cleanup; create_sf_database + setup_sf_table COMPLETED via JWT on kind against the live sf_test account; transfer step deferred to #99. Prior: Spool refinements #94 (s3 spool backend) + #93 (cross-restart drain) SHIPPED - noetl-tools v3.7.1 (tools#58/#59/#60 time =0.3.47 pin for the async-nats E0119) + noetl-worker v5.20.0 (worker#80) + ops MinIO (ops#177) + e2e (e2e#49); live on kind/MinIO: outage→6 buffered→restart→startup auto-drain→6 replayed in order, idempotent, no loss; #94 + #93 CLOSED, #92 (shared crate extraction) remains. Prior: subscription/listener RFC #90 Phase 7 SHIPPED + live proof green - #90 CLOSED (all 7 phases complete) - scale hardening lands: server v3.5.0 (server#189) POST /api/execute/batch + opt-in exactly-once dedup window; worker v5.19.0 (worker#79) batch dispatch + dedup opt-in + per-subscription rate limits (token-bucket fetch-side backpressure, no loss); ops (ops#176) + e2e (e2e#48); no tools change.
Live on kind: batch 12→12 COMPLETED + per-message traceparent; dedup duplicate→1 execution + audit event; rate-limit engaged + 10/10 → executions (no loss). Follow-ups tracked: #91–#94 + tools#57. Prior: #90 Phase 6 - noetl subscribe runs a kind: Subscription listener standalone in local mode (cli v4.11.0 cli#60, closes cli#59): reuses the same noetl_tools source + spool engine, emits the same ExecutorEvent envelope to a local FileEventSink (JSONL), dispatches in-process via PlaybookRunner (RFC §5.3) with a local_disk spool (§8.6). cli-only - no tools change / crate cascade (noetl-tools v3.5.0 source+spool reused). Live on in-cluster NATS (kind): 5/5 drained → in-process dispatch → COMPLETED (19-event JSONL trail); local_disk spool outage → 6 buffered (no loss) → recovery → 6 replayed in order → drained to 0. ai-meta → cli 2fb3fb0 (v4.11.0). Phases 1–6 complete; #90 stays open for Phase 7 (scale hardening, volume-gated). Prior: subscription/listener RFC #90 Phase 5 SHIPPED + live out-of-cluster proof green - the runtime runs out-of-cluster on Google Cloud Run (RFC §5.2) + the gcs spool backend landed. noetl-tools v3.5.0 (tools#56) GcsBackend (GCS impl of the Phase-4 SpoolBackend trait, ADC + reqwest, no new dep); noetl-worker v5.18.0 (worker#77) gcs spool wiring + optional bearer auth + $PORT health bind; noetl-server v3.4.2 (server#187) gcs/s3 spool credential optional (ADC/Workload Identity); + ops automation/cloud-run/ (ops#175) + docs (docs#179) + e2e (e2e#47). Live on noetl-demo-19700101 (server via cloudflared tunnel to kind): 6/6 Pub/Sub msgs → HTTPS dispatch → COMPLETED on the subscription pool; a msg buffered to the real GCS bucket under a live outage. Finding: real Pub/Sub sync-pull needs timeout_ms >= 10s (tools#57). All test resources torn down. ai-meta → tools 0f29c57 (v3.5.0) + server 67669ba (v3.4.2) + worker e1a74ce (v5.18.0) + ops 14c5bf1 + docs cb48772 + e2e 99cda2b. #90 stays open (Phases 6–7).
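The Phase 7 entry above mentions per-subscription rate limits implemented as token-bucket backpressure on the fetch side, so that throttled messages stay on the broker rather than being dropped. A minimal sketch of that idea follows; the `TokenBucket` type, its fields, and the explicit `now_secs` clock are all assumptions made for illustration and do not reflect the worker's actual implementation.

```rust
/// Hypothetical token bucket. Time is passed in explicitly so the sketch
/// stays deterministic; a real limiter would read a monotonic clock.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill_secs: f64,
}

impl TokenBucket {
    /// Start full, so the first fetch can use the whole burst allowance.
    fn new(capacity: f64, refill_per_sec: f64) -> Self {
        Self { capacity, tokens: capacity, refill_per_sec, last_refill_secs: 0.0 }
    }

    /// Return how many messages may be fetched now, at most `wanted`.
    /// Anything not granted is simply not fetched, so it remains on the
    /// source and nothing is lost.
    fn take(&mut self, now_secs: f64, wanted: u32) -> u32 {
        if now_secs > self.last_refill_secs {
            let elapsed = now_secs - self.last_refill_secs;
            self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
            self.last_refill_secs = now_secs;
        }
        let granted = (self.tokens.floor() as u32).min(wanted);
        self.tokens -= f64::from(granted);
        granted
    }
}
```

Limiting the fetch size, rather than discarding messages after fetching them, is what makes the backpressure lossless in this sketch.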
Prior: subscription/listener RFC #90 Phase 4 SHIPPED + live outage proof green - store-and-forward spool + per-downstream circuit breaker (§8). noetl-tools v3.4.0 (tools#54) noetl_tools::spool (circuit breaker, SpoolItem + noetl://spool ref + sha256, nats_object/local_disk backends, ordered replay + idempotency + dead-letter + retention/GC; 44 unit tests + real-NATS integration test); noetl-worker v5.17.0 (worker#75) run-loop wiring (probe→circuit→spool→ack, NATS-KV circuit state, drain-on-recovery, 6 events); noetl-server v3.4.1 (server#184+server#185) spool validation + lifecycle-status fix; + ops (ops#173) + e2e (e2e#44+e2e#45). Live on kind (6 msgs): scale downstream→0 → circuit opened → 6 spooled (0 dispatched, no loss) → scale→1 → circuit closed + draining → 6 replayed → 6 COMPLETED on the subscription pool → spool drained to 0 → idempotency held. ai-meta → tools 02110a5 + server 51dc0d1 + worker 65fb27d + ops ab9af34 + e2e 2d0ad0a. #90 stays open (Phases 5–7 remain). Prior: subscription/listener RFC #90 Phase 3 SHIPPED - gateway push-ingress (Mode C) + auth-gated directive trust. noetl-gateway v3.3.0 (gateway#28) POST /ingress/{listener}: HMAC / bearer / Pub-Sub-OIDC verify → only-then header directives → one POST /api/execute per delivery on the dedicated pool (verify-and-forward gatekeeper, no DB on the ingress path); noetl-server v3.3.0 (server#182) push catalog validation + GET /api/internal/ingress/{listener} Wallet-secret config endpoint; + ops (ops#172) gateway internal-token env + e2e (e2e#43) push runner.
Live E2E green on kind: HMAC 12/12 + bearer 12/12 assertions - verified deliveries → executions on the subscription pool → COMPLETED, allowlisted header redirect honored only after verification, and a tampered/unauth delivery (carrying the redirect header) → 401 with no execution + no directive applied; Pub/Sub-push envelope + attributes-channel directive proven live; OIDC signature path unit-proven (every negative: bad-sig / expired / wrong-aud / wrong-SA / unknown-kid). ai-meta → server fa1ff3f + gateway 38f024b + ops 54f2d65 + e2e 1421267. #90 stays open (Phases 4–7 remain). Prior: subscription/listener RFC #90 Phase 2 SHIPPED - noetl-tools v3.3.0 (tools#52) header-directive engine + build_source factory, noetl-server v3.2.0 (server#180) kind: Subscription type + lifecycle + pool routing + W3C trace, noetl-worker v5.16.0 (worker#73) continuous runtime (Mode B), + ops (ops#171) dedicated pool/KEDA + e2e (e2e#42). Live E2E green (13/13): 6 NATS msgs → 6 executions on the dedicated subscription pool → all COMPLETED; 2 header-redirected; W3C traceparent into all 6 children; lifecycle registered→activated→paused→resumed→draining→deactivated event-logged. ai-meta → server ebd2944 + worker 1f74992 + tools 4995692 + ops 242e420 + e2e 32df918. Prior: subscription/listener RFC Pub/Sub + Kafka brought to live-E2E parity with NATS - no new release tag that round (server v3.1.0 + worker v5.15.2 + tools v3.2.0 already shipped): the two remaining Phase-1 brokers were stood up in kind (Pub/Sub emulator + single-broker KRaft apache/kafka:3.9.1, ops#170) with bounded-drain fixtures + runners (e2e#41); both drain count=5 acked=true → COMPLETED → event trail, the same bar NATS met, no adapter code change needed. ai-meta → ops 568a4ac + e2e 8d21e7a.
Prior: subscription/listener RFC Phase 1 full E2E green - after the tool shipped (noetl-tools v3.2.0), the in-cluster playbook-dispatch E2E surfaced + fixed two integration gaps: noetl-server v3.1.0 (server#178) accepts the subscription ToolKind (the orchestrator validates tool.kind against a typed enum; unknown kind → HTTP 400), and noetl-worker v5.15.2 (worker#71) resolves nats/pubsub/kafka credential aliases by merging connection fields into the tool config (apply_credential previously errored on type nats). E2E proof: publish 5 → examples/subscription_e2e → subscription poll drains count=5 acked=true, consumer 5→0, execution COMPLETED, full event chain in the log; cluster restored to a clean :dev build. ai-meta pointers → server 810e05c + worker e3b4de7. Prior: noetl-tools v3.2.0 (tools#50, closes tools#49 · tracks ai-meta#90) - Phase 1 of the subscription/listener RFC: a new atomic subscription registry tool (operation: poll) that performs a bounded drain of a message source (fetch up to N / until empty / until timeout, ack, return the normalized batch) - js_consume generalised across backends behind a reusable SourceClient abstraction (PolledMessage/PollOptions/PollOutcome/AckMode + decode_payload/normalize_headers). Three backends: NATS JetStream (refactors js_consume into the shared drain_pull_consumer; the nats tool now delegates), Pub/Sub pull (REST pull+acknowledge via gcp_auth, emulator support, feature pubsub), Kafka poll (pure-Rust kafka crate, feature kafka). Not a long-lived listener - honors the worker-slot contract; the continuous runtime / gateway push / spool / header-directive engine are later RFC phases. Worker dispatches via the generic registry - no dispatch-match change. 323 lib tests + gated integration tests; NATS poll path validated live against the in-cluster NATS JetStream broker (create stream → publish → drain → ack → second-drain-empty). ops example ops#169; wiki SubscriptionTool.
Worker dep bump to v3.2.0 cascades after publish. Prior: noetl-server v3.0.6 (server#177, closes ai-meta#89) - round-trip JSON null in whole-object {{ step }} references: a null field in a re-injected step envelope rendered as the JS token undefined (invalid JSON) so the consuming step got an unparseable str; render_to_value now retries a lone {{ expr }} with | tojson (undefined/none → JSON null), mirroring the noetl-tools engine the server copy had diverged from. Root cause was the server template renderer, not the worker. 5 new tests; 619 lib + 8 parity green; kind-validated cursor pagination collects 35 through the terminal next_cursor: null page (4th check_pagination error→success); ai-meta → server 8e17fbe. Prior: noetl-server v3.0.5 (server#176, closes ai-meta#85) - workflow-arc loops advance across iterations + terminate: durable event-sourced loop-ctx propagation (ctx.updated events, latest-wins fold, once-per-completion keyed by completion event_id) + a structural loop-branch-point fix for the loop-exit hang; kind-validated counter 0→1→2→3 + 4-page offset pagination; ai-meta → server e519fdc. Prior: noetl-tools v3.1.1 (tools#48, closes ai-meta#87) - multi-tool task_sequence now injects each sub-tool's result under its label so a later sub-tool resolves {{ <label>.<field> }} (was rendering empty → syntax error at or near "," in unquoted numeric SQL); worker adopts via worker#69; ai-meta pointers → tools 76f942a + worker b97f642. Prior: noetl-server v3.0.4 (server#175, closes ai-meta#83 + #84) - fan-in barrier no longer deadlocks workflow loops (back-edge exclusion) + loop.done arc gates now fire (event.name injected); landed with e2e fixture fix e2e#39 (closes #86); ai-meta pointers → server 480ba72, e2e b0a5c85, cli a3e22ef (#58 doc fix, no tag).
Prior: noetl-worker v5.15.1 (worker#68, closes ai-meta#78) - pre-dispatch credential-alias / tool-config failures now emit a terminal call.error + command.failed instead of hanging at command.started; typed CredentialResolutionError + CredentialHttpError classify terminal-vs-retryable by HTTP code (terminal 404/400/401/403/500 incl. the live pg_noetl_k8s 500 "Decryption failed"; retryable 408/429/502/503/504 + transport); folds in the noetl-tools path-dep → "3" revert (3.1.0). ai-meta pointer → worker 99e2c66. Prior: noetl-server v3.0.3 (server#173, tracks ai-meta#43) - container-callback call.done insert matches the deployed noetl.event schema (was targeting a non-existent attempt column → HTTP 500 on every watcher callback); unblocked the container-callback chain (kind-val GREEN both probes - happy_path → succeeded, oom → failed_oom), landed alongside ops#168 (watcher curl + OOM classification) + e2e#38 (OOM fixture). ai-meta pointer → 5d2cf58. Prior: noetl-gui v1.11.0 + v1.11.1 - credential View/Edit recovery for pre-wallet records (gui#36, closes ai-meta#82) + dev:kind convenience script (gui#35); ai-meta pointer → 8cacc9e. Prior: noetl-server v3.0.2 (server#172, closes ai-meta#81) - container-tool command type contradiction fix: ToolSpec.command Option<String> → Option<serde_json::Value> so the container tool's K8s-Job-style array command decodes server-side + passes through to the worker's Vec<String>; scalars stay JSON strings for shell/db tools; ToolCall::from_spec forwards verbatim. 2 regression tests; clippy clean. Kind-val GREEN end-to-end (server accepts the array command, worker creates the K8s Job, Job reaches Complete 1/1). Prior: e2e-sweep cleanup: noetl-tools v3.1.0 (tools#47) + noetl-server v3.0.1 (server#171) - when: true boolean / |tojson object-template fallback (tools) + 64 MB result-store body limit + pipeline command/spec stash (server); tracks ai-meta#49.
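The terminal-vs-retryable split listed in the noetl-worker v5.15.1 entry above can be written down as a small lookup. The sketch below encodes only the status codes named in that entry; the `CredentialFailure` and `Disposition` types and the `classify` function are hypothetical names for illustration, not the worker's CredentialResolutionError / CredentialHttpError types, and codes the entry does not mention are deliberately left unclassified.

```rust
/// Hypothetical failure shape for a credential lookup.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CredentialFailure {
    /// The credential endpoint answered with this HTTP status.
    Http(u16),
    /// The request never completed (connection reset, timeout, DNS, ...).
    Transport,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Disposition {
    /// Fail the command now; retrying cannot help.
    Terminal,
    /// Leave the command eligible for another attempt.
    Retryable,
}

/// Classify a failure using only the codes listed in the release note.
/// Returns None for statuses the note does not cover.
fn classify(failure: CredentialFailure) -> Option<Disposition> {
    match failure {
        CredentialFailure::Http(400 | 401 | 403 | 404 | 500) => Some(Disposition::Terminal),
        CredentialFailure::Http(408 | 429 | 502 | 503 | 504) => Some(Disposition::Retryable),
        CredentialFailure::Transport => Some(Disposition::Retryable),
        CredentialFailure::Http(_) => None,
    }
}
```

Treating 500 as terminal is the notable choice here: the entry ties it to the live "Decryption failed" case, where a retry would hit the same error.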
Prior: #77 post-merge: noetl-tools v3.0.0, noetl-server v3.0.0, noetl-cli v4.10.0. Prior: noetl-server v2.63.0 - ctx shims + step-level set: via server#168. Prior: noetl-server v2.62.1 - clippy cleanup, closes server#161. Prior: noetl-server v2.62.0 - Sequential-mode iterator dispatch; closes ai-meta#76. Prior: noetl-server v2.51.0 - Phase D R5 R1: Replay endpoint scaffold + execution projection (server#149, tracks server#148, tracks ai-meta#49 Phase D R5). Opens Phase D Round 5 - the Replay engine port (~1236 LoC of Python). Sub-issue server#148 documents the 7-round decomposition. Round 1 ships the surface byte-for-byte against Python's endpoint.py + the minimal execution projection using the same terminal-event short-circuit pattern Phase D R4 landed in the orchestrator + status endpoint. Lib 508/0/0 (was 499/0/0). Prior: noetl-server v2.50.1 - Phase D R4 follow-up: status endpoint short-circuits on terminal events (server#147, closes server#146). Fixes the read-side bug surfaced during the Phase D R4 kind-val: status endpoint returned RUNNING indefinitely after playbook.completed landed. Two causes: missing terminal-event lookup + lowercase status filter mismatch. Lib 499/0/0. Kind-validated end-to-end: prior execution flipped RUNNING→COMPLETED on the same DB data; fresh fanout_reduce execution reached COMPLETED in ~600ms. Prior: noetl-server v2.50.0 - Phase D R4 slice 2: apply_event handles step.skipped (server#145, closes server#144, tracks ai-meta#49 Phase D R4). Closes the gap the slice 1 PR documented with #[ignore] - fan-in barrier now correctly treats a guard-skipped upstream as terminal. Lib 493/0/0 ignored (was 490/0/+1). Phase D R4 slice 1 + 2 both shipped at the orchestrator level; remaining work is kind-validation against the fanout_reduce_phase6 fixture. Prior: noetl-worker v5.14.0 - Container Tool Callback umbrella #43 Round 4 worker-side pending_callback adoption (worker#60, closes worker#59, tracks ai-meta#43).
executor::command checks tool_result.pending_callback after success: when Some(true) logs INFO + bumps noetl_worker_call_done_skipped_pending_callback_total{tool_kind} + skips own call.done emit (the server's container-callback endpoint emits the terminal event via the watcher path). Cargo.toml bumps tools 2.18→2.21 + executor 0.3→0.4 (lock resolves the published 0.4.1). 126/0 lib tests. This is the last code piece of the Container Tool Callback umbrella - kind-validation against a fresh worker image is the remaining housekeeping step. Prior: noetl-server v2.49.0 - Phase D R4 first slice: fan-in / reduce barrier (server#143, closes server#142, tracks ai-meta#49 Phase D R4). Orchestrator now defers dispatch of any multi-incoming-arc step until ALL upstream steps reach terminal state; pre-PR it fired the reduce step on the FIRST completing upstream (never seeing the others' results), which broke fanout_reduce_phase6. Implementation: new build_incoming_arcs(steps) helper mirrors the Python planner's incoming map across all four NextSpec variants; barrier-defer check between same-pass dedup and dispatch. Single-upstream targets unaffected. 490 lib tests + 1 ignored (documents an apply_event step.skipped follow-up). Prior: noetl-cli v4.10.0 - noetl-executor 0.4.1 propagates ToolResult.pending_callback (cli#56, closes cli#55, tracks ai-meta#43 Round 4). Patch bridge fix unblocking noetl/worker#60 (the worker-side pending_callback adoption follow-up for the closed Container Tool Callback umbrella); after cli@77be8be releases noetl-executor 0.4.1 to crates.io, worker's cargo build against the bumped deps (noetl-tools 2.18→2.21, noetl-executor 0.3→0.4) flips green automatically. Prior: noetl-tools v2.21.0 - Container Tool Callback umbrella (noetl/ai-meta#43) Round 3 - Tool::Container + ToolResult.pending_callback marker (tools#37; closes tools#36).
Tool::Container creates a labeled K8s Job (noetl.execution-id / noetl.step-name / noetl.tool-kind=container on both Job + PodTemplate) and returns immediately — worker slot frees as soon as api.create() returns. ContainerConfig mirrors the catalog YAML shape (image + command + args + env with literal value XOR value_from { secret_name, secret_key } + resources + timeout_seconds + service_account + namespace + backoff_limit + restart_policy). Default ns noetl, default backoffLimit: 0 (playbook's own retry: block owns retries), default restartPolicy: Never. generateName: noetl-container-<slug>-<eid>- (DNS-1123-safe). Additive ToolResult.pending_callback: Option<bool> marker — set by Container to suppress the worker's own call.done emit. Worker-side adoption is a coordinated follow-up; until then the watcher's callback is treated as stale by the server (recorded by noetl_container_callback_stale_total), which is a harmless race during the transition. 10 existing struct-literal sites backfilled with pending_callback: None. 17 new unit tests; lib 258/0. Closes the last code round of the umbrella — only Round 5 (e2e kind-val rig, e2e#29) remains. Prior: noetl-server v2.48.0 — Container Tool Callback umbrella (noetl/ai-meta#43) Round 2 — POST /api/internal/container-callback/{execution_id}/{step} (server#141; closes server#140). External K8s watcher (Round 1, ops#166) POSTs Job terminal-state events here when a Job carrying the noetl.execution-id label transitions to a terminal state. Six TerminalState variants matching the umbrella's failure-mode taxonomy (succeeded / failed / failed_image_pull / failed_oom / failed_node_lost / failed_timeout); each survives in meta.terminal_state so playbooks branch on the specific failure reason. Stale check: single indexed SELECT on noetl.event; zero rows → bump noetl_container_callback_stale_total{state} + log INFO + return 202. Match → emit call.done via the standard insert_event path.
Returns 202 unconditionally on path-param validation success. Auth via the existing RequireInternalApiToken extractor. Two new counters; 7 new unit tests; lib 487/0. Round 2 unblocks Round 1 (watcher) + Round 3 (Tool::Container). Prior: noetl-tools v2.20.0 — artifact tool kind added to Rust registry — get-only ResultFetchTool alias (tools#35; closes noetl/ai-meta#64). Thin ArtifactTool adapter translates the Python-era YAML shape (action: get + input.result_ref) into a ResultFetchTool call; keeps the three e2e fixtures using kind: artifact working without modification. action: put returns a configuration error pointing at the worker's call.done emit path per agents/rules/execution-model.md (the playbook-side push surface is intentionally absent in the Rust path). All six result_fetch pass-through fields (prefer/flight_endpoint/bearer_token/tls_ca_path/client_cert_path/client_key_path) honoured. 8 new unit tests; lib 241/0. Backward compatible. Prior: noetl-server v2.47.0 — Secrets Wallet #61 umbrella closes — three cloud-specific dynamic-secret providers shipped this session. v2.47.0 Phase 6d.2 GCP iamcredentials.generateAccessToken (server#138, closes server#133): mints short-lived OAuth2 access tokens for a target service account via workload-identity impersonation; reference shape <target-sa-email>[#<scope>]; 10 unit tests. v2.46.0 Phase 6d.3 Azure AAD client-credentials (server#139, closes server#134): off-cluster (non-IMDS) client_credentials flow for deployments outside AKS; service-principal triple from env; sovereign-cloud overrides via NOETL_AZURE_AAD_HOST; 14 unit tests. v2.45.0 Phase 6d.1 AWS STS AssumeRoleWithWebIdentity (server#137, closes server#132): EKS-IRSA path — exchanges the projected SA JWT for short-lived AWS temp credentials; no SigV4 (STS anonymous action); response parser handles both XML and JSON; 15 unit tests.
All three providers return SecretValue.expires_at populated — Phase 6d's cache_decision clamps cache TTL; Phase 7c.3 background refresh re-resolves inside the window. Umbrella is feature-complete: envelope encryption + KMS + 5 static-secret providers (GCP-SM, K8s, Vault, AWS-SM, Azure-KV) + 3 dynamic-secret providers (AWS-STS, GCP-IAM, Azure-AAD) + residency policy + cross-region broker + KEK rotation + audit + auto-renewal with stampede collapse. Prior: noetl-server v2.44.0 — Secrets Wallet #61 Phase 7c.3: resolver-side stampede mutex + background re-resolve (server#136). Wires the Phase-7c decision primitive + the Phase-7c.2 cache-side companion into the resolver's cache-hit path. When CredentialService::try_resolve_keychain hits a fresh-but-aging row, the cached value returns IMMEDIATELY and a background tokio::spawn re-resolves via the Phase-3b SecretProvider + updates the cache via KeychainService::set. Stampede collapse via new src/services/keychain_refresh.rs RefreshInflight (Arc<tokio::sync::Mutex<HashSet<(i64, String)>>>) — N workers crossing the refresh threshold for the same (catalog_id, alias) collapse to one provider call; concurrent callers piggy-back via noetl_secret_refresh_total{outcome="stampede_collapsed"}. Refactor: extracted resolve_via_provider from try_resolve_keychain so cache-miss inline + background refresh share identical code. 6 new unit tests; lib 441/0. Phase 7c series wire-complete (7c primitive + 7c.2 cache companion + 7c.3 resolver integration). Remaining work on the umbrella is the three cloud-specific dynamic-secret providers (6d.1 AWS STS server#132 · 6d.2 GCP iamcredentials server#133 · 6d.3 Azure AAD server#134). Prior: noetl-server v2.43.0 — Secrets Wallet #61 Phase 7b.2 + 7c.2 follow-up rounds: noetl.secret_audit table + DbAuditSink + GET /api/internal/secret-audit query endpoint (server#129) + KeychainService::should_refresh cache-side primitive (server#131). Single release covering both PRs.
Prior: noetl-server v2.42.0 — Secrets Wallet #61 Phase 7a.2: KEK rotation endpoint + key-status + DB scans (server#127) — POST /api/internal/wallet/rotate-kek runs a batched cursor scan across noetl.credential + noetl.keychain, GET /api/internal/wallet/key-status reports per-version row counts. Wraps the Phase-7a rewrap_storage_string primitive in operator-facing endpoints; plaintext NEVER reconstructed. All three Phase-7 named rounds (rotation / audit / auto-renewal) now have functional endpoints + DB storage. Three cloud-specific dynamic-secret provider sub-issues filed: 6d.1 AWS STS (server#132) · 6d.2 GCP iamcredentials (server#133) · 6d.3 Azure AAD (server#134). Prior: noetl-server v2.41.0 — Secrets Wallet #61 Phase 7c: token auto-renewal primitives (server#125) — closes Phase 7; all named rounds (1–7) of the Secrets Wallet umbrella are complete. secrets::dynamic::should_refresh(expires_at, refresh_window, now) decision primitive — true iff expires_at set + still valid + within refresh window. KEYCHAIN_CACHE_REFRESH_WINDOW_SECS env (default 60). Two new metrics: noetl_secret_refresh_total{outcome="triggered|succeeded|failed|stampede_collapsed"} counter (failed alert-worthy) + noetl_secret_refresh_duration_seconds histogram (50ms–5s buckets). 5 new unit tests; lib 427/0. Lib-only. Prior: noetl-server v2.40.0 — Secrets Wallet #61 Phase 7b primitives: secret-resolution audit service (server#123) — AuditEvent struct (NEVER contains the secret value); bounded Operation + Outcome enums; AuditSink trait + NoopAuditSink default + SecretAuditService with record_async / record_strict / record modes. NOETL_SECRET_AUDIT_REQUIRED env (default false; truthy enables strict — wallet refuses to release a credential if the audit can't be recorded). noetl_secret_audit_writes_total{operation, outcome, status} counter (failed_strict alert-worthy). 8 new unit tests; lib 422/0. Lib-only.
Prior: noetl-server v2.39.0 — Secrets Wallet #61 Phase 7a: KEK rotation primitives (server#121) — starts Phase 7. KeyManager::current_key_version() trait accessor; EnvelopeCipher::rewrap_storage_string primitive returns Skipped { key_version } when same version, otherwise Rewrapped { old_key_version, new_key_version, new_storage_string } after unwrapping under the historical KEK version and re-wrapping under the current one. Plaintext payload NEVER reconstructed — pure DEK re-wrap, AES-GCM ciphertext bytes byte-identical. noetl_wallet_rotate_total{table, status} counter. 4 new unit tests; lib 414/0. Lib-only. Prior: noetl-server v2.38.0 — Secrets Wallet #61 Phase 6e: cross-region broker (server#119) — closes Phase 6. BrokerRegistry (region → broker_url from NOETL_SECRET_BROKER_REGISTRY env; empty default = pre-6e fail-closed); POST /api/internal/cross-region/resolve peer endpoint (validates expected_entry_region == server_region(), resolves locally, seals via Phase-5a primitives to the requesting worker's pubkey directly); get_sealed handler falls back to broker on AppError::ResidencyViolation; KeychainDef.no_broker_fallback per-credential opt-out; AppError::CrossRegionUnreachable → HTTP 502. Two new metrics: noetl_secret_broker_call_total{broker_region, outcome} counter + noetl_secret_broker_call_duration_seconds{broker_region} histogram (50ms–5s buckets). 10 new unit tests; lib 410/0. Both residency shapes operational: hard isolation (strict + no broker = HTTP 403) + soft federation (strict + broker = transparent cross-region routing). Phase 6 closes.
Prior: noetl-server v2.37.0 — Secrets Wallet #61 Phase 6d primitives: dynamic-secret support + cache honors issuer TTL (server#117) — SecretValue.expires_at field; src/secrets/dynamic.rs cache_decision() honors min(default_ttl, expires_at - now - safety_margin) and returns SkipCacheAlreadyExpired when the deadline is already past or inside the safety margin; KEYCHAIN_CACHE_DYNAMIC_SAFETY_MARGIN_SECS env (default 60); resolve_keychain_entry_with_meta returns the bundle's earliest expires_at; CredentialService::resolve_via_provider consumes the helper. Two new metrics: noetl_secret_dynamic_ttl_seconds histogram + noetl_secret_cache_skip_total{reason="already_expired"} counter. 7 new unit tests; lib 398/0. Backward compatible (providers without expires_at keep the 600 s default). Follow-ups (each its own sub-issue): 6d.1 AWS STS · 6d.2 GCP iamcredentials · 6d.3 Azure AAD. Prior: noetl-server v2.36.0 — Secrets Wallet #61 Phase 6c: residency-policy gate (server#115) — KeychainDef.residency enum (none|advisory|strict, default none) + allowed_regions allowlist; resolver runs the gate BEFORE any provider call so strict-mode mismatches short-circuit with AppError::ResidencyViolation (HTTP 403, clear message that NEVER includes the value); noetl_secret_residency_check_total{policy, decision} counter — strict + violation_blocked is alert-worthy, advisory + violation_allowed is the migration-window signal. Defensive: empty string in allowlist never matches empty server region. 8 new unit tests; lib 391/0. Lib-only — no schema migration. Prior: noetl-server v2.35.0 — Secrets Wallet #61 Phase 6b: ProviderRegistry + per-(provider, region) metrics (server#113) — server-side cache of (provider_id, region) → Arc<dyn SecretProvider> so the resolver doesn't rebuild from env on every cache-miss; RwLock + double-checked locking on the build path; optional TTL via NOETL_SECRET_PROVIDER_TTL_SECONDS (default 0 = process lifetime).
New noetl_secret_provider_build_total{provider,region,status="cache_hit|ok|error"} counter + noetl_secret_resolve_duration_seconds{provider,region} histogram (5 ms – 5 s buckets). 7 new unit tests; lib 383/0. Lib-only. Prior: noetl-server v2.34.0 — Secrets Wallet #61 Phase 6a: region tag on keychain entries + per-region routing (server#111) — starts Phase 6 (residency-aware distributed resolution). KeychainDef.region optional field (no schema migration — lives in the existing JSON blob); SecretRef.region provider-agnostic; AWS provider consumes it as the regional endpoint with explicit precedence (<region>: ref prefix > field > legacy project overload > AWS_REGION env). New NOETL_SERVER_REGION env + server_region() / effective_region() fallback helpers. noetl_secret_resolve_total{provider,region,status} counter per observability.md Principle 1 (region label bounded-cardinality, execution_id stays on the matching span). 5 new unit tests; lib 376/0. Lib-only — backward compatible. Prior: noetl-worker v5.13.0 — Secrets Wallet #61 Phase 5c: worker integration (worker#58) — long-lived X25519 keypair generated once at startup, pubkey registered in the runtime JSON blob, ControlPlaneClient::get_sealed_credential calls /api/credentials/{id}/sealed, unseals via the same primitives (drift-guard test pins server constants), zeroizes the cleartext after the auth-alias resolver consumes it. Env-gated (NOETL_SEALED_CREDENTIALS=true|1|yes, defaults off — backward compat). Cross-repo kind-val (server :5b + worker :5c, env on): server seals to the worker's registered pubkey; noetl_credentials_sealed_total{status="ok"} increments per fetch. Phase 5 (sealed payload delivery) is fully merged across server (5a+5b) + worker (5c).
Prior: noetl-server v2.33.0 — Secrets Wallet #61 Phase 5b: wire format + sealing endpoint (server#109) — new GET /api/credentials/{id}/sealed?worker_id=<name> returns a SealedEnvelope JSON addressed to the named worker; workers opt in via the runtime JSON worker_public_key (no schema migration); 400 BadRequest when the worker_pool row exists but didn't register a pubkey; noetl_credentials_sealed_total{status} counter + credential.seal span per observability.md. Kind-validated end-to-end (Python cryptography + HKDF + ChaCha20-Poly1305 opened the envelope → recovered the bearer token + scope round-trip). Prior: noetl-server v2.32.0 — Secrets Wallet #61 Phase 5a: sealed payload crypto primitives (server#107) — src/crypto/sealed.rs X25519 ECDH + HKDF-SHA256 + ChaCha20-Poly1305 sealed-box (nonce derived from the shared secret, AAD pins alg+v for clean alg-mismatch rejection); 12 unit tests (round-trip, tamper, alg/version-mismatch, JSON wire stability); lib 369/0; clippy lib-clean. Lib-only — no schema/API change yet (5b adds the runtime-registry worker pubkey + sealing endpoint, 5c the worker side). Defense-in-depth on top of Phase-4 mTLS: cleartext never enters the response body. Prior: noetl-server v2.31.0 — Secrets Wallet #61 providers 3.x: AWS Secrets Manager + Azure Key Vault providers (server#105) — two new backends behind the one SecretProvider trait completing the 5-provider matrix (GCP/K8s/Vault/AWS/Azure); AWS SM uses hand-rolled SigV4 (no aws-sdk dep tree; signing key verified against AWS's published reference vector), Azure KV uses IMDS Managed Identity; 21 new unit tests; cloud-only backends (kind-val at unit-test layer like GCP). noetl-worker v5.12.0 — Secrets Wallet #61 Phase 4b: worker control-plane mTLS client (worker#56), NOETL_TLS_CLIENT_CERT/KEY present a client cert + NOETL_TLS_CA trusts a private-CA server (rustls-tls reqwest); cross-repo kind-val ran a hello_world playbook to COMPLETED over https+mTLS.
noetl-server v2.30.0 — Secrets Wallet #61 Phase 4a: server opt-in TLS/mTLS listener (server#103), NOETL_TLS_CERT+NOETL_TLS_KEY → HTTPS, +NOETL_TLS_CLIENT_CA → mTLS; ring rustls + axum-server bind_rustls; kind-validated (200 w/ client cert, rejected w/o, plain HTTP refused). Prior: v2.29.0 — Secrets Wallet #61 providers 3.x: HashiCorp Vault provider (server#101), a provider: vault keychain alias resolves from a Vault KV v2 secret, kind-validated end-to-end against an in-cluster Vault (second backend validatable on kind). Prior: v2.28.1 — /api/executions list query candidate-first rewrite + status-drift fix (server#99, #62): 6.5 s → 0.015 s (~430×), statuses corrected. Prior: v2.28.0 — Secrets Wallet #61 providers 3.x: Kubernetes Secrets provider (server#97), a provider: k8s keychain alias resolves from an in-cluster Secret, kind-validated end-to-end with a real value. noetl-tools v2.19.3 — python tool accepts the nested script.source.code (inline) shape (tools#33, #63 round 1). Prior: v2.27.2 — orchestrator emits a terminal playbook.failed on a deterministic evaluate failure instead of stranding the run in RUNNING (server#95 + fixture e2e#28; #54 sweep). Prior: v2.27.1 — NextSpec untagged-order parser fix; server main fully green. Prior: Secrets Wallet Phase 3c — noetl-server v2.27.0: execution-scoped envelope-encrypted keychain cache so an auth: "{{ alias }}" lookup isn't re-fetched per step; also fixed the long-broken keychain storage layer / /api/keychain endpoints. Standardization pointer debt cleared: ops#161 + travel#58)
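
The two dynamic-secret TTL rules quoted in the history above (the v2.37.0 `cache_decision` clamp and the v2.41.0 `should_refresh` predicate) can be sketched as follows. This is a minimal illustration under stated assumptions, not the noetl-server source: timestamps are plain Unix seconds, the `Cache` variant name and both function signatures are hypothetical, and treating a deadline exactly on the safety margin as expired is a guess at the boundary behaviour.

```rust
use std::time::Duration;

/// Outcome of the cache-TTL decision. `SkipCacheAlreadyExpired` is named in
/// the release note; the `Cache` variant is a hypothetical stand-in.
#[derive(Debug, PartialEq)]
enum CacheDecision {
    Cache { ttl: Duration },
    SkipCacheAlreadyExpired,
}

/// Clamp the cache TTL to min(default_ttl, expires_at - now - safety_margin).
/// `expires_at` and `now` are Unix seconds (an assumption of this sketch).
fn cache_decision(
    default_ttl: Duration,
    expires_at: Option<u64>,
    now: u64,
    safety_margin: Duration,
) -> CacheDecision {
    match expires_at {
        // Static secrets carry no deadline and keep the default TTL.
        None => CacheDecision::Cache { ttl: default_ttl },
        Some(exp) => {
            let cutoff = now.saturating_add(safety_margin.as_secs());
            if exp <= cutoff {
                // Deadline already past, or inside the safety margin.
                CacheDecision::SkipCacheAlreadyExpired
            } else {
                let remaining = Duration::from_secs(exp - cutoff);
                CacheDecision::Cache { ttl: default_ttl.min(remaining) }
            }
        }
    }
}

/// True iff `expires_at` is set, still valid, and within the refresh window.
fn should_refresh(expires_at: Option<u64>, refresh_window: Duration, now: u64) -> bool {
    match expires_at {
        Some(exp) => exp > now && exp - now <= refresh_window.as_secs(),
        None => false,
    }
}
```

With a 600 s default TTL and a 60 s margin, a token expiring 300 s from now would be cached for 240 s, and a caller reading it 50 s before expiry with a 60 s refresh window would trigger a background re-resolve.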

How this page stays current. Every chore(sync): bump <repo> to <sha> commit in ai-meta should add a line here if the bumped SHA is a tagged release. See Wiki Convention and Sessions Log for cross-references.

Timeline (all repos, last 14 days)

Date Repo Version Headline
2026-06-23 noetl/worker v5.46.0 🪶 #104 OQ5 Option A — producer-staged result tier (worker#132, 27c7c17; ai-meta pointer bumped via #128). Producing worker stages the over-budget tier object at emit time under NOETL_RESULT_PRODUCER_STAGE (default off → byte-identical no-op), decoupling the tier write from noetl.result_store; materializer skip-on-exists; shared decide_tier → byte-identical. feat(result): → minor bump; crate + multi-arch image published. Soak-gate alert rules: ops#206 (open, do-not-apply). Flag default-off → inert in prod; result_store retirement is a separate gated soak.
2026-06-23 noetl/server v3.45.0 🚀 #104 — Workload Identity / ADC auth for the GCS result-tier backend + PROD ROLLOUT (3-mode none/static/adc matrix via gcp_auth; NOETL_OBJECT_STORE_GCS_AUTH=auto → adc on real GCS). Server rolled to prod GKE as the WI-bound KSA noetl-server-rust (@sha256:d3cbf1ad…), result-tier ENV applied (OBJECT_STORE_BACKEND=gcs, bucket noetl-demo-19700101-results, cell usc1-a/256 shards) — all tier-enable flags stay OFF, tier inert. Server up healthy auth=adc (no token minted; lazy on first I/O), off-server cutover stayed sole-writer/lag-0/never-scan, 0 restarts (server#265, fad5d8a).
2026-06-23 noetl/server v3.44.0 🪶 #104 Phase F — result-tier GC sweeper (NOETL_RESULT_TIER_GC, default off): conservative dry-run-first, never deletes a live-referenced object (server#264, 341b614).
2026-06-23 noetl/worker v5.45.0 🪶 #104 Phase F — result-tier DR re-derive (NOETL_RESULT_TIER_DR, default off): materializer verify-and-repair, byte-identical rebuild of a missing/corrupt tier object (worker#131, dd07016).
2026-06-23 noetl/worker v5.44.0 🪶 #104 Phase E — side-effect durability barrier (NOETL_SIDE_EFFECT_BARRIER, default off): adopt-only, side effects fire exactly once across re-drive; depends on published noetl-tools 3.17 (worker#130, d696f7e).
2026-06-23 noetl/tools v3.17.0 🪶 #104 Phase E — registry::kind_is_side_effecting side-effect classifier (conservative default true; only noop/rhai false) (tools#78, 1d49dd5; members noetl-directives + noetl-locator re-published first).
2026-06-23 noetl/server v3.43.0 🪶 #104 Phase D — mint-authoritative flag + result_store dual-write counter (NOETL_RESULT_MINT_AUTHORITATIVE, default off; the result_store PUT records each write on noetl_result_store_dual_write_total as the reversible dual-write fallback leg of the minting flip; no Cargo.toml change, resolves noetl-locator 0.1.1 from the registry; default-off byte-identical) (server#263, 6f6b9ef).
2026-06-23 noetl/worker v5.43.0 🪶 #104 Phase D — the minting flip (NOETL_RESULT_MINT_AUTHORITATIVE, default off: the result materializer becomes the authoritative tier writer (implies the Phase B flag) + resolve-by-URN becomes the primary consume read path (implies the Phase C flag), with the dual-written result_store as the fail-safe fallback; new noetl_worker_result_mint_authoritative_total{path} = tier | legacy_fallback; no Cargo.toml change, 247 lib tests + clippy) (worker#129, be6863a).
2026-06-23 noetl/server v3.42.0 🪶 #104 Phase C — GCS object backend + cell-endpoint registry + GET /api/internal/cells (server side of the resolve-by-URN read path; default-off, inert in prod; no Cargo.toml change, resolves noetl-locator 0.1.1 from the registry) (server#262, c2d5ca9).
2026-06-23 noetl/worker v5.42.0 🪶 #104 Phase C — resolve-by-URN read path + fixes B/B1 (references-in-state behavior, flatten_single_tool_result; closes OQ6; adds the published arrow = "53" direct dep, resolves noetl-tools 3.16.0 from the registry; 38 unit tests) (worker#128, 7971041).
2026-06-22 noetl/tools v3.16.0 + noetl-locator v0.1.1 🪶 #104 Phase B — ResultCoordinates::parse/from_locator on the slim noetl-locator (inverse of logical_uri: recover (tenant, project, execution_id, step, frame, row, attempt) from the worker-stamped URI). Additive → member bumped 0.1.0→0.1.1 pre-merge so the publish step fires; semantic-release cut noetl-tools v3.16.0 + published noetl-locator 0.1.1 (member-publish ordering); noetl-tools re-exports it (tools#77, 7da39d8).
2026-06-22 noetl/server v3.41.0 🪶 #104 Phase B — ensure the sibling noetl_result_materializer durable consumer at stream-birth (own ack cursor, so object-store latency never back-pressures the noetl.event audit fold); no new deps, control plane stays slim (server#261, 4a6659e).
2026-06-22 noetl/worker v5.41.0 🪶 #104 Phase B — shadow Feather result tier: a separate noetl_events consume-loop (noetl_result_materializer) writes the over-budget result (tabular → Arrow Feather, non-tabular → JSON, small → inline no-op) to the derived §7 key alongside noetl.result_store; gated NOETL_RESULT_MATERIALIZER_ENABLED (default off → not spawned → no-op) (worker#127, 4b1c15b).
2026-06-22 noetl/server v3.40.0 📦 #104 Phase A — accept the canonical result URI behind NOETL_RESULT_URI_ACCEPT (default off), parsed via the slim noetl-locator 0.1.0 (noetl/server#260, c89d078; ai-meta pointer bumped). Dep repointed git→0.1.0 pre-merge; cargo build/cargo tree resolve from crates.io and the heavy graph stays absent from the control plane (duckdb/kube/arrow/tonic/rhai/gcp_auth = 0 occurrences); 623 tests green. Flag default-off → inert in prod until a future server rollout enables it (no prod deploy this session).
2026-06-22 noetl/tools v3.15.0 + new crate noetl-locator v0.1.0 📦 #104 Phase A — extract the slim dependency-free noetl-locator crate (pure std); noetl-tools re-exports it as noetl_tools::locator (noetl/tools#76, dc0c5d8; ai-meta pointer bumped). The Resource Locator (ResourceLocator/ResultCoordinates/shard_key/CellPlacement/legacy parse) is now a lean workspace member so the control-plane server parses the result URI without noetl-tools' heavy graph (duckdb/kube/arrow/tonic/rhai). feat(locator): subject → noetl-tools minor + the release CI published the new member noetl-locator 0.1.0 to crates.io (member-publish order: locator before the root crate). Worker stamp path unchanged. Mirrors the lean noetl-directives member (#92).
2026-06-22 noetl/worker v5.40.5 ✅ #95 — adopt noetl-tools 3.14.2 (postgres temporal/identity serialization) + ship to prod (noetl/worker#126, squash 60a849d → release da24952; ai-meta pointer bumped). Cargo.lock pin 3.14.1→3.14.2, deps-only, no worker source change; cargo check + cargo clippy --all-targets clean. Built the prod image via Cloud Build us-central1 (us-central1-docker.pkg.dev/noetl-demo-19700101/noetl/noetl-worker-rust:v5.40.5 @sha256:45212dbe…, --machine-type=e2-highcpu-8 --timeout=5400s) and rolled it by digest onto prod noetl-worker-rust + noetl-worker-system-pool (server stays v3.39.5, worker CPU req 250m/limit 2 kept) — rolling restart clean (2/2 + 1/1 Ready, 0 crashloop), off-server CQRS cutover stayed healthy (materializer sole-writer lag 0, command lag 0, project_errors 0, WAL index rehydrated 5 execs/198 events per #119).
2026-06-22 noetl/tools v3.14.2 ✅ #95 — postgres pg_value_to_json serializes tz-naive timestamp (NaiveDateTime) + date/time/uuid/numeric/bytea instead of null (noetl/tools#75, squash 06302ac → release 6d9b674; published to crates.io). The tool probed i64/i32/f64/bool/String/json/DateTime<Utc> and fell through to Value::Null for everything else — a plain timestamp column nulled even though the value was present (the auth0_login expires_at: null repro that tripped the gateway's validation). Added arms for timestamp (ISO-8601, no offset suffix), date/time, uuid (hyphenated lowercase), numeric/decimal (lossless postgres-numeric binary-wire decode → exact decimal string), bytea (base64). 409 lib tests + clippy + a kind-gated live-postgres before/after integration test (tests/postgres_temporal_kind.rs).
2026-06-22 noetl/worker v5.40.4 ✅ #127 — adopt noetl-tools 3.14.1 (task_sequence per-sub-task context optimization) + ship to prod (noetl/worker#125, squash 1a10a73 → release 0afbf5c; ai-meta pointer bumped). No worker source change — Cargo.toml pin noetl-tools 3.14→3.14.1; Cargo.lock resolves 3.14.1 so the runtime picks up the behavior-preserving per-sub-task context optimization; cargo clippy --all-targets clean (no new warnings vs baseline). Built the prod image via Cloud Build (us-central1-docker.pkg.dev/noetl-demo-19700101/noetl/noetl-worker-rust:0afbf5c) and rolled it onto prod noetl-worker-rust + noetl-worker-system-pool (server left on v3.39.5/.6, worker CPU req 250m/limit 2 kept) — rolling restart clean (pods Ready, 0 crashloop), off-server CQRS cutover stayed healthy (materializer sole-writer projected==acked, lag ~0, command lag 0, system-pool WAL index rehydrated per #119). Closes #127; the code-opt + the CPU-limit bump compound on the batch hot path.
2026-06-22 noetl/tools v3.14.1 ✅ #127 — behavior-preserving task_sequence per-sub-task context optimization (noetl/tools#74, squash 9dd9aa6 → release c8656c1; published to crates.io; ai-meta pointer bumped). The task_sequence drain rebuilt the template context per sub-task (running_ctx.clone() + 2–4× to_template_context() deep-clones + per-block ExecutionContext clones + a fresh context_to_value() per templated field), dominating worker CPU under 16 slots × 10k patients (the #127 hot path). TemplateEngine::render_value now builds the proxied minijinja context ONCE and threads it through the recursion (render_value_with/render_with; minijinja Value is Arc-backed → reuse is a refcount bump — helps every tool dispatch); new build_context_with_overlay(&variables, overlay) builds straight from &variables + a small overlay, skipping the intermediate to_template_context() HashMap deep-clone + per-block ExecutionContext clones in the set/policy paths. Isolated micro-bench (CPU held constant): per-sub-task context cost 2988.9µs→1147.1µs (−61.6%, 2.6×). 407 lib tests + 2 new equivalence pins + clippy clean. Worker adopts via worker#125 (v5.40.4).
2026-06-22 noetl/server v3.39.6 ✅ #123 — surface a non-iterable loop in: as a terminal playbook.failed (off-server drive no longer silently wedges at commands=0) (noetl/server#258, squash 275b914 → release 7f109a9; ai-meta pointer bumped). A loop step whose in: rendered to a non-iterable (e.g. an absent workload.batch_slots → null) used to wedge RUNNING-forever under the prod-default off-server drive: the system/orchestrate wasm plug-in returns a {"error":…} envelope, which apply_worker_orchestration couldn't decode → WARN + decode_error + Ok(0) → no terminal event. Fix: apply_worker_orchestration decodes the drive ERROR envelope (decode_orchestrate_error) and emits a terminal playbook.failed (metric noetl_orchestrate_drive_total{stage="drive_error"}, structured execution_id), matching the in-process drive; orchestrate-core prefixes the offending step name onto the existing evaluate_loop error. An empty iterable ([]/{}) still short-circuits to next. Server-only (worker stays v5.40.3); 600 server + 135 orchestrate-core tests + clippy clean; kind-validated prod-exact (PLUGIN_DRIVE=true+PUBLISH_ONLY=true+STATE_BUILDER=offserver): absent workload.batch_slots → FAILED with a clear message, valid [1,2,3] still COMPLETED 3-way, #120/#124 unaffected, 0 restarts. Code-only — ships to prod on a future server rollout; PROD GKE + all defaults untouched.
2026-06-22 noetl/server v3.39.5 ✅ #121 second-half — gate BOTH off-server-drive sites in trigger_orchestrator_inner on should_publish so system/* execs drive server-built run_state (regular execs keep off-server) (server#257, c421273, squash 54ac277); pure is_system_path helper + unit test; server-only (worker stays v5.40.3); 612 tests + clippy; kind full-gate: system/scheduled_cleanup WAL chain incomplete → COMPLETED with 0 loops, regular loop still off-server. #256 was only the first half.
2026-06-21 noetl/server v3.39.4 ✅ #121 first-half (partial) — off-server WAL-chain-incomplete loop on system/ executions FIXED: link the gate-off command.claimed through ChainHeads + don't route system/ execs to the off-server WAL drive (noetl/server#256, squash 28b17cb → release 77aaa06; ai-meta pointer bumped). Two distinct defects: (1) the gate-off claim_command raw in-tx INSERT (and the handle_batch_events gate-off batch INSERT) bypassed the emit_events/ChainHeads chokepoint, so command.claimed got prev_event_id=NULL and the head was never advanced — the off-server chain_walk_from then hit a NULL-prev non-genesis head → Incomplete; fired for every system/ playbook because should_publish is false for system execs even under global PUBLISH_ONLY=true. Fixed by stamping prev_event_id via link_batch on both gate-off INSERT paths. (2) The loop itself — the stateless off-server drive builds state from the noetl_events WAL, but system-execution events never enter the WAL → the worker's WAL build can never complete → __offserver_retry__ re-drive loop; fixed by gating the off-server WAL drive on should_publish(catalog_id) so system/ execs drive server-built. Server-only (no worker rebuild). 598 tests + clippy clean; kind-validated prod-exact (PUBLISH_ONLY=true+STATE_BUILDER=offserver): system/scheduled_cleanup orphan + 17×+ WAL chain incomplete (wedged 120s+) → linked chain, 0 loop lines, COMPLETED in 6s. Live prod off-server-cutover wedge is the real-world repro. PROD GKE + all defaults untouched.
2026-06-21 noetl/worker v5.40.3 ✅ #125 + #126 — adopt noetl-tools 3.14.0 (task_sequence control flow + http body data-shape fixes; 10×1000 batch pft_flow_test clean) (noetl/worker#124; ai-meta pointer bumped). No worker source change — Cargo.toml pin 3.13→3.14; Cargo.lock resolves 3.14.0 so the runtime runs tools#72 + tools#73. Clippy clean. Kind-validated: zero patient loss, invariants clean (roots=1/dangling=0, materializer sole-writer, never-scan). Perf follow-up #127 filed (throughput plateau ~60 patients/s vs Python ~54 s baseline — capped by pipeline overhead, not rate limiter). PROD GKE + all defaults untouched.
2026-06-21 noetl/tools v3.14.0 ✅ #125 — task_sequence honours all control-flow directives: do: jump/break/retry (noetl/tools#73; ai-meta pointer bumped). The Rust task_sequence runner matched only "fail" in the do: branch; do: jump/to: (loop to named step with infinite-jump guard), do: break (exit the sequence), do: retry (configurable attempts/backoff) were silently ignored — slots ran once and returned regardless of the directive. In pft_flow_test the batch loop's do: jump never re-entered, leaving 16 batches done and the rest pending (patient loss). Fix honours all three variants. Squash 62d0948 → semantic release 638c3c6. PROD GKE + all defaults untouched.
2026-06-21 noetl/tools v3.13.1 ✅ #126 — http tool body exposed under data (Python-era output.data.data contract restored; body back-compat alias kept) (noetl/tools#72; ai-meta pointer bumped). The Rust http tool nested the parsed response body under data.body, but playbooks written against the Python-era contract read output.data.data (the body). pft_flow_test's save_batch step passed the result to jsonb_to_recordset, which received a non-array object → Postgres error → zero rows written. Fix: expose the parsed body under data; keep body as a back-compat alias for any caller that uses the Rust-era key. Squash 86f0216 → semantic release 8dd0e1f. PROD GKE + all defaults untouched.
2026-06-21 noetl/server v3.39.3 ✅ #124 — distributed task_sequence forward set:/sibling bindings no longer render empty (noetl/server#255; ai-meta pointer 365d3be). Inside a task_sequence step, a later sub-task's templates referencing a value a prior sub-task produces at runtime (forward set:, policy set:, or a sibling result) were rendered empty at command-build time, before the worker's per-sub-task binding ran — render_pipeline_config rendered every non-set/args/spec/command field against the step-entry context under Chainable, collapsing unresolved {{ iter.* }}/sibling paths (e.g. pft_flow_test's fetch_batch url …/batch/{{ iter.data_type }}…/batch/ → 404 → 0 rows). Fix: new TemplateRenderer::render_value_deferring_unresolved renders only templates whose variable paths all resolve in the build-time context; any template with an unresolved path is preserved verbatim so the worker re-renders it against the per-sub-task running context. cargo test 134/134 + clippy clean; kind-verified (fetch_batch now hits the real per-type URLs). PROD GKE + all defaults untouched.
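
The deferral rule in the v3.39.3 entry can be illustrated with a minimal Python sketch. It is not the Rust `TemplateRenderer` code: it assumes templates contain only plain `{{ dotted.path }}` expressions, and `lookup`, `render_deferring_unresolved`, and the `workload.base_url` key are hypothetical names chosen for illustration.

```python
import re

# Assumption: only simple `{{ dotted.path }}` expressions (no filters, no logic).
_EXPR = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")
_MISSING = object()

def lookup(context, path):
    """Resolve a dotted path against nested dicts; return _MISSING if absent."""
    node = context
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node

def render_deferring_unresolved(template, context):
    """Render only if every referenced path resolves; otherwise keep verbatim."""
    paths = _EXPR.findall(template)
    if any(lookup(context, p) is _MISSING for p in paths):
        return template  # preserved so a later pass can render it
    return _EXPR.sub(lambda m: str(lookup(context, m.group(1))), template)

# Hypothetical build-time context: `iter.*` is not known yet.
build_ctx = {"workload": {"base_url": "http://api"}}
print(render_deferring_unresolved("{{ workload.base_url }}/health", build_ctx))
print(render_deferring_unresolved(
    "{{ workload.base_url }}/batch/{{ iter.data_type }}", build_ctx))
```

The second call returns its input unchanged, which is the point: a partially rendered URL would lose the `iter.data_type` segment before the per-sub-task context exists.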
2026-06-21 noetl/server v3.39.2 ✅ #120 — reduce barrier no longer deadlocks (commands=0) on open/asymmetric loop joins (noetl/server#254; ai-meta pointer 28e8950). A back-edge U → T whose forward return path from T is absent (T does NOT forward-reach U) used to be counted by the reduce barrier as a genuine fan-in parent of T, so dispatch of T was deferred forever because U never runs on the taken path (pft_flow_test setup_facility_work stalled commands=0 for 4.5 min). Fix: a runtime liveness filter in the barrier — an upstream blocks dispatch only if it is live on the current path (done/skipped, entered/in-flight, or forward-reachable from a currently-active not-yet-terminal step); a declared predecessor that can never run no longer blocks. build_incoming_arcs unchanged (an open back-edge is still a genuine static fan-in). Affects the in-server and off-server drives identically (shared orchestrate-core); closed loops + genuine reduces unaffected. New unit test test_open_loop_back_edge_does_not_block_dispatch; cargo test 133/133 + clippy clean; kind-validated (post-fix the 2×2 off-server/gate matrix all COMPLETE, fanout_reduce/pagination/loop spot-checks green). PROD GKE + all defaults untouched.
2026-06-20 noetl/ops d6633f6 (ops#200) 🚀 PROD CQRS rollout RECORDED + LIVE-PROD validated — prod manifests pinned to the executed digests (server-rust v3.39.1 @sha256:197a6d10 + worker-rust v5.40.2 @sha256:41713265, worker replicas 1→2), configmap event-stream keys aligned to lowercase noetl_events, executed-flip runbook record. Prod LIVE on PUBLISH_ONLY=true + STATE_BUILDER=offserver (materializer sole writer); validated by a live-prod e2e run (28/30 execs PASS, sole-writer + clean-chain + never-scan, lag 0; the 2 non-PASS are pg-credential-unreachable env diffs, not cutover bugs). ai-meta 08c73e5.
2026-06-20 noetl/worker v5.40.2 ✅ #119 — off-server WAL state-builder drain rebuilds the in-memory index from the retained noetl_events WAL on every boot (worker-restart stall FIXED) (noetl/worker#123; ai-meta pointer 48b0bde). The authoritative drain used a durable noetl_state_builder consumer whose cursor persists across restarts while the in-memory WalEventIndex rebuilds empty → the cursor outran the fresh index → build_spine_to(expected_head) permanently Incomplete → off-server execs looped offserver_retry and never completed (this also hid the #118 symptom, which only manifests on completing execs). Fix (worker-only, inside NOETL_STATE_BUILDER=offserver — PROD runs the in-server drive so untouched): the drain now defaults to an ephemeral DeliverPolicy::All consumer rebuilding the full index from the retained WAL on every boot (no persisted cursor to outrun; also correct for >1 worker pod). Instant revert NOETL_STATE_BUILDER_DURABLE=1; proof = index rehydrated… log + noetl_worker_state_builder_indexed_executions gauge; never reintroduces a noetl.event scan. Gate-ON kind: forced mid-flight pod delete --force → new pod index rehydrated … indexed_executions=17 wal_events=200; single-replica 6/6 stress (~126 execs, roots=1/terminals=1/zero-scan) + multi-replica 21 execs COMPLETE. 224 lib tests + clippy green.
2026-06-20 noetl/server v3.39.1 ✅ #118 — exactly-one-terminal-per-execution FinalizedGuard (single-replica off-server terminal-finalize chain fork FIXED) (noetl/server#253; ai-meta pointer c5f8cb2). Under offserver + PUBLISH_ONLY on a single replica a straggler drive on a materializer-lagged WAL re-drove (state not terminal yet) and emitted a SECOND playbook.completed that linked to the now-evicted chain head → NULL prev_event_id orphan (2 roots) + a benign state-build event-scan. Fix: a bounded process-local FinalizedGuard suppresses the duplicate terminal at emit_events before the chain linker (a suppressed duplicate never advances/consumes the head); first terminal wins. Gate-off byte-identical (a duplicate never occurs on the synchronous in-process drive); metric noetl_terminal_dedup_total{suppressed}. Absent under multi-replica execution-affinity (#116 serializes finalize to the owner). Companion rig HARD terminals==1 assertion (noetl/e2e#73, fe97d92). Gate-ON kind (unblocked by #119): single-replica 6/6 stress iterations / ~126 execs all roots=1(incl. terminal)/terminals=1/zero-scan; multi-replica 21 execs clean; 597 lib tests + clippy green.
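
As a rough illustration of the first-terminal-wins idea in the v3.39.1 entry, the following Python sketch keeps a bounded, process-local record of executions that already emitted a terminal event. The class layout, the `capacity` default, and the oldest-first eviction policy are assumptions for illustration; the entry only states that the guard is bounded and process-local.

```python
from collections import OrderedDict

class FinalizedGuard:
    """Bounded, process-local record of executions that already emitted a terminal."""

    def __init__(self, capacity=1024):  # capacity is an assumed default
        self.capacity = capacity
        self._seen = OrderedDict()
        self.suppressed = 0  # stands in for a dedup counter metric

    def admit_terminal(self, execution_id):
        """Return True for the first terminal of an execution, False for duplicates."""
        if execution_id in self._seen:
            self.suppressed += 1
            return False
        self._seen[execution_id] = True
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # evict the oldest entry (assumed policy)
        return True

guard = FinalizedGuard(capacity=2)
print(guard.admit_terminal("exec-1"))  # first terminal is admitted
print(guard.admit_terminal("exec-1"))  # straggler duplicate is suppressed
```

Checking the guard before the chain linker matters because a suppressed duplicate must never consume or advance the chain head.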
2026-06-20 noetl/worker v5.40.1 ✅ #117 — off-server from_events spine ordered by prev_event_id chain + walked from the real tip (high-concurrency fan-out reduce wedge FIXED) (noetl/worker#122; ai-meta pointer baeae78). The off-server spine was sorted by event_id, assuming id order == causal chain order. Under high-concurrency fan-out two branch completions arrive at the owner reordered vs their producer ids, so emit_events stamps a higher-id event as the predecessor of a lower-id one. The worker tracked the head as max(event_id) but ChainHeads.link_batch advances the watermark to event_ids.last() (the real tip) — so under the inversion a max-id walk MISSED the inverted tip and the fan-in reduce never fired (wedge ~1/9 on the 2-replica affinity topology). Fix (worker-only, inside NOETL_STATE_BUILDER=offserver — PROD untouched): build_offserver_input builds from expected_head (the real tip) via build_spine_to/advance_to/chain_walk_from and orders by the prev_event_id walk (head→root, reversed) — SpineOrder::Causal (default; NOETL_OFFSERVER_SPINE_ORDER=event_id reverts); staleness guard intrinsic. Byte-identical to the old sort for monotonic chains. 2-replica affinity gate-ON kind: stress 6/6 iterations, 108/108 COMPLETE, 15 execs with a real prev_event_id > event_id inversion all fired reduce_customer + completed; never-scan + sole-writer + roots=1 hold. 15+223 tests + clippy green.
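
The difference between id order and causal order in the v5.40.1 entry can be shown with a small Python sketch. It assumes a simplified index that maps each `event_id` to its `prev_event_id`; the function name `causal_spine` and the data layout are hypothetical, not the worker's actual types.

```python
def causal_spine(events, head_id):
    """Walk prev_event_id links from head to root, then reverse to root-first order.

    `events` maps event_id -> prev_event_id (None marks the genesis event).
    Returns None when a link is missing, i.e. the chain is incomplete.
    """
    spine, cursor, seen = [], head_id, set()
    while cursor is not None:
        if cursor not in events or cursor in seen:
            return None  # gap or cycle: treat the chain as incomplete
        seen.add(cursor)
        spine.append(cursor)
        cursor = events[cursor]
    spine.reverse()
    return spine

# Inversion: event 12 was linked before event 11, so the real tip is 11, not max(id).
events = {10: None, 12: 10, 11: 12}
print(causal_spine(events, head_id=11))  # [10, 12, 11]
print(sorted(events))                    # [10, 11, 12]
```

Walking from `max(event_id)` (12) in this example yields `[10, 12]` and misses event 11 entirely, which mirrors how the fan-in reduce could fail to fire.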
2026-06-20 noetl/server v3.39.0 ✅ #116 program-scale step 2 — execution-affinity single-owner WRITE ORDERING (multi-replica gate-ON validated) (noetl/server#252; ai-meta pointer 5e00d0a). Closes the chain-fork race step 1 left open: the command.issued prev-read + head CAS-advance are two non-atomic steps, so concurrent cross-replica emits forked the chain. Affinity routes every trigger for an execution (POST /api/events, which fires the drive) to the single replica that sharding::ShardConfig::owns(execution_id) owns (stable XxHash64); a non-owner forwards a reverse-proxy POST (one-hop loop guard, degrade-to-local). On the owner the single-process drive lock + in-memory ChainHeads make the read→advance atomic, no distributed lock; KV is the genesis/handoff vehicle (owner resolves LOCAL → kv_remote_hit→0). New src/affinity.rs; flags NOETL_EXECUTION_AFFINITY / NOETL_PEER_URL_TEMPLATE / NOETL_SHARD_INDEX_FROM_HOSTNAME (all default off, prod unchanged); metric noetl_execution_affinity_total{outcome}. Multi-replica gate-ON kind (2-replica StatefulSet): rig PASS — chains roots=1/dangling=0/walk==total (NO fork), forwarded_ok +9, never-scan + sole-writer across replicas, executions COMPLETE; single-replica unchanged; 595 tests + clippy green. Follow-up #117: off-server from_events spine ordered by event_id wedges fan-in under a chain-order≠id-order inversion (high-concurrency fanout).
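
A minimal Python sketch of the owner-routing decision described in the v3.39.0 entry follows. It is only an illustration: `blake2b` stands in for XxHash64 (which is not in the Python standard library), and `owner_index` and `route` are hypothetical names, not the `sharding::ShardConfig` API.

```python
import hashlib

def owner_index(execution_id, replica_count):
    """Stable shard owner for an execution (blake2b stands in for XxHash64)."""
    digest = hashlib.blake2b(str(execution_id).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % replica_count

def route(execution_id, my_index, replica_count, already_forwarded=False):
    """Decide whether to handle a trigger locally or forward it to the owner."""
    owner = owner_index(execution_id, replica_count)
    if owner == my_index or already_forwarded:
        # Either this replica owns the execution, or the one-hop loop guard
        # applies: a request that was already forwarded is never forwarded again.
        return ("local", my_index)
    return ("forward", owner)

owner = owner_index("exec-42", 2)
print(route("exec-42", owner, 2))      # handled on the owner
print(route("exec-42", 1 - owner, 2))  # non-owner forwards once
```

Because every trigger for one execution lands on the same replica, the read of the chain head and its advance can be serialized by an in-process lock rather than a distributed one.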
2026-06-20 noetl/server v3.38.0 ✅ #115 program-scale step 1 — multi-replica coherence DATA LAYER (NATS-KV-backed ChainHeads + ExecDescriptor); execution-affinity STAGED (noetl/server#251; ai-meta pointer 8f39a79). NOETL_REPLICA_COHERENCE=nats_kv (default local, prod unchanged) backs the off-server drive's watermark + descriptor with JetStream KV buckets (noetl_chain_heads, noetl_exec_descriptors) so 2+ replicas resolve the same value — head advance = CAS (one chain under concurrent emits), descriptor = CAS read-modify-write (seed + terminal merge); in-process maps become a write-through cache / degraded fallback (local → bit-identical). New src/coherence.rs (CoherenceKv, lazy buckets); ChainHeads/ExecDescriptors async; ExecDescriptor serde; metric noetl_replica_coherence_total{structure,op,outcome} (proof series kv_remote_hit). Kind: single-replica nats_kv bit-for-bit parity with local (all topologies COMPLETE, clean chains, scans +0); 2-replica proved cross-replica resolves work (kv_remote_hit advanced for head + descriptor, no kv_unavailable). Necessary but NOT sufficient — on 2+ replicas concurrent cross-replica emits still fork the chain (the issuing_event head-read vs head-advance is non-atomic across replicas); the remaining piece is execution-affinity (one replica owns an execution's drive + chain write; substrate present in src/sharding.rs), STAGED as step 2. 588 server tests + clippy green; default local, prod unchanged.
2026-06-20 noetl/server v3.37.0 ✅ #115 Phase 5 — atomic-working-item context (tenet 6): the drive hands a worker only its minimal declared slice (noetl/server#250; ai-meta pointer a96ade8). NOETL_ATOMIC_ITEM_CONTEXT (default false, prod unchanged). New orchestrate-core::input_binding: analyze(step) statically extracts the base-context keys a step references (minijinja undeclared_variables; ctx.X→X, bare step-name→key, injected roots→none), conservative (any unbounded ref → bounded=false → full context); project_context narrows the worker-bound context. CommandBuilder/WorkflowOrchestrator::with_atomic_item_context narrow the persisted context for plain non-loop steps while server-side rendering still runs against the full context; plugin input structs carry a #[serde(default)] flag; metric noetl_atomic_item_context_total{outcome}. Builds on #77 (Explicit Input Binding, CLOSED — the declaration surface). Gate-ON kind-validated: flag-ON consumer render_context = [producer_a] ONLY (all other accumulated keys dropped), COMPLETED, narrowed +1; flag-OFF full 8-key context, COMPLETED (back-compat); offserver regression COMPLETED / __orchestrate__ event=0 / lag-0; 7+132+584+10 tests + clippy green.
2026-06-20 noetl/worker v5.40.0 ✅ #115 Phase 5 — forward the atomic-item-context flag onto the off-server from_events drive input (noetl/worker#121; ai-meta pointer 2484d17). build_offserver_input forwards atomic_item_context onto the from_events OrchestrateInput so the off-server drive narrows each worker-bound command context to its minimal declared slice (the wasm reuses orchestrate-core's build_command); the run_state fallback already carries it. 10 state_builder tests + clippy green; default false (the server omits it → full-context dispatch unchanged).
2026-06-20 noetl/server v3.36.0 ✅ #115 Phase 6 — retire the hot-path noetl.event read class; the table is AUDIT-ONLY (noetl/server#249; ai-meta pointer b71ca1d). NOETL_EVENT_READ_PATH=event_scan|audit_only (default event_scan, prod unchanged). Phase 4 removed the drive's state-rebuild scan under offserver; Phase 6 retires the remaining lifecycle readers of noetl.event (the WHERE execution_id replay class outside the drive). Under audit_only get_catalog_id (per-ingest) + inherit_parent_trace + the subscription dedup-audit + container-callback catalog/existence reads serve from the in-memory execute-time ExecDescriptor; a cold descriptor (post-terminal straggler after the descriptor is evicted on terminal, or a restart mid-execution) resolves catalog_id from noetl.command (the synchronous queue, authoritative under the gate) — never a noetl.event scan. A cold descriptor never re-seeds (re-seeding an evicted terminal exec would re-accumulate the per-execution memory the eviction frees). New proof metric noetl_event_hotpath_reads_total{site,outcome}. Gate-ON kind-validated: hot-path scan Δ0 (served_descriptor +96 + served_command +3), drive state_build_total Δ0 + event_scans Δ0 ⇒ ZERO noetl.event scans anywhere on the hot path, end-to-end; linear/loop/fan-out/output_select COMPLETE; sole-writer + lag-0; audit still works (direct SELECT + status + replay event_count=25); 585 tests + clippy green; baseline restored. The RFC never-scan end state (tenet 3) is reached under the flag.
2026-06-20 noetl/server v3.35.0 ✅ #115 Phase 4 REMAINDER — stateless off-server drive edge (zero state rebuild + zero noetl.event reads) (noetl/server#248; ai-meta pointer 6e30fc3). Removes the residual server-side chain-walk bookkeeping on the drive path. Under NOETL_STATE_BUILDER=offserver a per-execution ExecDescriptor (catalog_id + routing seeded at playbook_started; terminal stamped at the emit_events chokepoint) lets the drive route system/orchestrate WITHOUT building WorkflowState; expected_head from the in-memory ChainHeads; trigger_event_id passed so the worker resolves the trigger type off its WAL; no server-built state rides the command. apply_worker_orchestration sources catalog_id+routing from the descriptor + evicts on terminal worker-built state. Cold descriptor (restart) falls through to the server-built path → chain_walk + event_scan stay fallbacks. Gate-ON kind-validated: noetl_state_build_total Δ0 + event_scans Δ0, dispatched_offserver_stateless+3 / applied_stateless+3, offserver==server parity, sole-writer 25==25, lag-0; 583 tests + clippy green; default server, prod unchanged. Completes #107 step 2 server-side.
2026-06-20 noetl/worker v5.39.0 ✅ #115 Phase 4 REMAINDER — stateless off-server drive (resolve trigger type off the WAL + no-op on incomplete chain) (noetl/worker#120; ai-meta pointer 8e1f651). ExecutionChain::event_type_of + build_offserver_input(trigger_event_id) resolve trigger_event_type off the pool WAL index when the server omits it (defaults command.completed). resolve_offserver_orchestrate_input returns `OffserverDispatch{Wasm
2026-06-19 noetl/worker v5.38.0 ✅ #115 Phase 4 DRIVE CUTOVER — off-server WAL build authoritative (noetl/worker#119; ai-meta pointer bef13e5). The shadow WalEventIndex is promoted to a shared pool-side index fed by an authoritative durable consumer (noetl_state_builder, explicit-ack — mirrors the materializer). dispatch_wasm detects an __offserver_build__ system/orchestrate command and builds the drive's WorkflowState from the WAL spine via the wasm run/from_events entry (zero noetl.event reads), with a staleness guard: it serves only once the index head ≥ the server's expected_head (a bounded retry waits for the drain), else falls back to the server-built run_state — so the WAL state is never staler than the server's view, preventing a lag-induced re-issue of a fan-in barrier step. New metric noetl_worker_state_builder_drive_builds_total{served|fallback_incomplete|fallback_disabled}. Gate-ON parity rig PASS (offserver==server fingerprint, fan-in fires once, served +3 / scans 0 / sole-writer / lag-0); 10 unit tests + clippy green; default off (prod unchanged).
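
The staleness guard in the v5.38.0 entry amounts to a bounded wait followed by a fallback. The Python sketch below assumes callables for reading the index head and building state from the WAL; `resolve_drive_state`, its parameters, and the retry defaults are hypothetical and chosen only to show the control flow.

```python
import time

def resolve_drive_state(read_index_head, build_from_wal, expected_head,
                        server_state, attempts=3, delay_s=0.0):
    """Serve WAL-built state only once the index head has reached expected_head.

    Retries a bounded number of times while the drain catches up, then falls
    back to the state the server built, so the result is never staler than
    the server's own view.
    """
    for _ in range(attempts):
        head = read_index_head()
        if head is not None and head >= expected_head:
            return "served", build_from_wal()
        time.sleep(delay_s)
    return "fallback_incomplete", server_state

# The index catches up on the third poll.
heads = iter([5, 9, 12])
print(resolve_drive_state(lambda: next(heads), lambda: "wal-state",
                          expected_head=10, server_state="server-state"))
```

The outcome labels reuse two of the metric label values named in the entry; everything else is illustrative.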
2026-06-19 noetl/server v3.34.0 ✅ #115 Phase 4 DRIVE CUTOVER — mark the off-server drive command + carry expected_head (noetl/server#247; ai-meta pointer f0922bd). Under NOETL_STATE_BUILDER=offserver (default server, prod unchanged), trigger_orchestrator_inner marks the system/orchestrate command __offserver_build__ and carries execution_id + expected_head (the highest event applied to the server-built state) so the worker self-sources the drive state from the WAL once its pool-side index has caught up to that watermark — the server-built state still rides along as the worker's incomplete-chain fallback (worst case == today). Records noetl_orchestrate_drive_total{stage="dispatched_offserver"}. 580 lib tests + clippy green; no prod default changed. The server retains its chain-walk bookkeeping for terminal/cancel/catalog/routing; removing that residual rebuild is the staged Phase-4 remainder.
2026-06-19 noetl/worker v5.37.0 ✅ #115 Phase 4 — off-server state-builder kernel + WAL shadow loop (noetl/worker#118; ai-meta pointer bumped). src/state_builder.rs moves orchestrator WorkflowState construction off the server onto the system worker pool: a per-execution chain index sourced from the noetl_events WAL (not the materialized noetl.event table), each event carrying its prev_event_id; chain_walk() walks head→root and returns the spine in event_id order (== the server event-scan input → parity by construction, same from_events); a cache keyed by the immutable chain head: CacheHit (unchanged head), Incremental (tail-only walk, pointer-continuity instead of COUNT(*)), ColdRebuild (miss/restart), terminal eviction. A live WAL shadow loop (NOETL_STATE_BUILDER_SHADOW, default off; ephemeral DeliverAll/AckNone consumer, never the materializer's durable one) proves the mechanics on-cluster. Metrics: noetl_worker_state_builder_{wal_events_total,event_scans_total,builds_total{outcome},chain_hops}. Gate-ON kind-validated: wal_events_total=993 / event_scans_total=0 / cold_rebuild=28 + incremental=21; shadow spines indexed==spine matching Phase-3 topologies; fresh fan-out COMPLETED gate-ON, sole-writer + lag-0. 8 unit tests + clippy green; default off → other workers unaffected. The offserver drive cutover is staged.
2026-06-19 noetl/server v3.33.0 ✅ #115 Phase 4 — NOETL_STATE_BUILDER=offserver|server flag scaffold (noetl/server#246; ai-meta pointer bumped). StateBuilder enum + AppConfig.state_builder (envy NOETL_STATE_BUILDER), default Server (prod/default behavior unchanged). offserver routes the drive to obtain WorkflowState from the pool-side off-server builder (worker v5.37.0 state_builder, reading the noetl_events WAL with a pool-side cache); the server event-scan + Phase-3 chain-walk stay as fallbacks. Flag only — the offserver drive-cutover wiring is staged (the pool-side builder + its WAL shadow loop landed first in worker v5.37.0; flipping to offserver before the wiring lands is a no-op on the build path). 2 config tests; cargo build + clippy green; no prod default changed.
2026-06-19 noetl/server v3.32.0 ✅ #115 Phase 3 — chain-walk state builder (flagged, default-off) (noetl/server#245; ai-meta pointer bumped). Behind NOETL_STATE_BUILD_MODE=chain_walk the orchestrator drive reconstructs WorkflowState by following the one-level prev_event_id chain (Phase 2) from the in-memory ChainHeads head back to the genesis event, each hop a (execution_id, event_id) PK point-lookup — never a WHERE execution_id scan of noetl.event. The collected events feed the SAME WorkflowState::from_events, so the built state is equivalent to the event-scan build (parity by construction; orchestrate-core unchanged). Conservative fallback to event_scan (the unchanged default) on cold-head / materializer-lag / non-genesis / empty so correctness is never sacrificed. NOETL_STATE_BUILD_PARITY_CHECK shadow-builds both ways inside one REPEATABLE READ snapshot and records noetl_state_build_parity_total{match|mismatch}. New metrics: noetl_state_build_total{mode,outcome}, noetl_state_build_event_scans_total (no-scan proof — 0 under chain_walk), noetl_state_build_chain_hops, noetl_state_build_parity_total. Gate-ON kind-validated: parity 41/41 MATCH, scans=0 / 1064 PK hops / 0 fallbacks across 5 topologies, all COMPLETE, sole-writer + lag-0, 577 lib tests + clippy green. The in-process proof of tenets 3/4 before the off-server builder + cache (Phase 4, in progress). No prod default changed.
2026-06-19 noetl/server v3.31.0 ✅ #115 Phase 2 — one-level prev_event_id event chain (noetl/server#244; companion DDL noetl/noetl#667 ecd16a2; ai-meta pointer afdb365). Each noetl.event gains prev_event_id (the immediately-previous event in causal order) + each noetl.command the issuing-event link, stamped at the emit chokepoint emit_events from a per-execution chain-head watermark (ChainHeads) — one path covers drive events + command.issued + worker-lifecycle on both the gate-off INSERT and the gate-on publish, the materializer persisting it. Additive — nothing reads the link yet (that is Phase 3, the chain-walk state builder, in progress). Chain-correctness proven gate-ON walkable / 1-root / no-gap / no-scan across 6 topologies (linear, loop, fan-out w/ real shared branch origin, sub-playbook, + Phase-1 output_select/storage_tiers bounded); 573 lib tests + clippy green. PROD untouched; no gate default changed.
2026-06-19 noetl/worker v5.36.0 ✅ #115 Phase 1 — selective render-time ref resolution (refs-in-state consume side) (noetl/worker#117). resolve_context_references resolves a noetl:// ref only when this command's tool input binds the step's bulk — a path the bounded extracted summary can't satisfy (whole-object bind, .data over a summarised rowset, array element past [0], _truncated node); predicate / scalar / _ref access reads off the summary with no store round-trip, and an upstream result the step doesn't consume stays a reference (foreign bulk never inflates the render). Pairs with server#243 (_ref/_store surfacing + refs_in_state default true). Closed #113 + #114; kind gate-ON all 9 stalls COMPLETE (max command ctx 412KB, 0 __orchestrate__ event rows, lag 0). 7 new unit tests.
2026-06-19 noetl/server v3.30.0 ✅ #115 Phase 1 — surface _ref/_store on kept refs + refs_in_state default true (noetl/server#243). hydrate_result_references keep_refs branch merges the reference block's ref/store/uri onto the bounded extracted summary it surfaces as context.data (with_ref_accessors), so reference-only consumers — {{ step._ref }} (artifact.get lazy-load), {{ step._ref is defined }} / {{ step._store }} (storage-tier predicates) — resolve off the summary without bulk. refs_in_state now defaults true (the #114 experiment proved the flip needs the worker consume side first — landed in the same change set as worker#117) — references stay out of state + commands, drive state + command.issued bounded. Builds on #113 (v3.29.4) + #114 (v3.29.5). Closed #113 + #114; all 9 stalls COMPLETE gate-ON. NOETL_REFS_IN_STATE=false reverts.
2026-06-19 noetl/server v3.29.5 🐞 #114 — offload oversized command context (noetl/server#242). Under the publish-only gate the off-server drive (refs_in_state=false) embedded the full upstream context into the next step's command, so its command.issued event reached ~1.32MB > NATS max_payload (1MB) → publish never acked → wedge. When a command context's {tool_config, args, render_context} exceeds NOETL_COMMAND_CONTEXT_MAX_BYTES (default 512KB) persist_engine_command(s) offload it to noetl.result_store with a {__context_ref__} marker; get_command/claim_command resolve it before the worker sees the command (new metrics noetl_orchestrate_drive_total{stage=context_offloaded|context_ref_resolved}). Within-budget commands unchanged; apply_event reads only command.issued meta so rebuild/sole-writer/idempotency hold. Kind gate-ON: off-server rig PASS (new test_oversize_command_context COMPLETED, max command.issued ctx 585B, offload+resolve fired, 0 __orchestrate__ event rows, lag 0); all command.issued events <1MB; 6 of #113's 9 fixtures COMPLETE. Chose ref-on-oversize over refs_in_state=true (a kind experiment showed the latter breaks bulk-consuming fixtures — consume side not impl); remaining 3 fixtures + cutover gated on #101. Companion rig e2e#64. No prod default flipped.
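
The ref-on-oversize approach in the v3.29.5 entry can be sketched in a few lines of Python. This is a simplified model, not the server code: the store is a plain dict, the marker is assumed to be a single-key object, and 512KB is taken as 512 × 1024 bytes.

```python
import json

MAX_BYTES = 512 * 1024  # assumed byte value for the 512KB default in the entry

def persist_context(context, store, key, max_bytes=MAX_BYTES):
    """Return the context inline, or a reference marker when it is over budget."""
    encoded = json.dumps(context).encode("utf-8")
    if len(encoded) <= max_bytes:
        return context  # within-budget commands are unchanged
    store[key] = encoded  # offload the full context to the (stand-in) store
    return {"__context_ref__": key}  # assumed marker shape

def resolve_context(command_context, store):
    """Undo persist_context before the worker sees the command."""
    if isinstance(command_context, dict) and "__context_ref__" in command_context:
        return json.loads(store[command_context["__context_ref__"]])
    return command_context

store = {}
big = {"rows": ["x" * 10] * 20}
marker = persist_context(big, store, key="cmd-1", max_bytes=64)
print(marker)
print(resolve_context(marker, store) == big)
```

Resolving the marker at claim time keeps the published `command.issued` event small while the worker still receives the full context.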
2026-06-19 noetl/server v3.29.4 🐞 #113 — off-server drive: recover offloaded drive result + stop drive on cancel (noetl/server#241). Two off-server-drive (NOETL_ORCHESTRATE_PLUGIN_DRIVE) fixes: (1) when an __orchestrate__ drive result exceeds the worker's 100KB inline budget it is offloaded to the durable result store with only a reference.ref; apply_worker_orchestration now resolves the ref + decodes it (new metric noetl_orchestrate_drive_total{stage=ref_resolved}) instead of dropping the drive decision → non-convergent re-loop; (2) cancel emits playbook_cancelled (underscore) which apply_event now matches + a terminal guard (ExecutionState::is_terminal) evicts the orch-cache and stops re-dispatch — no server restart. Kind gate-ON proven: 785KB drive result → ref_resolved→COMPLETED, 0 decode WARNs; cancel froze a drive-loop instantly; sole-writer lag 0. 5/9 #113 large-context fixtures now COMPLETE; the other 4 hit a distinct oversized-command.issued NATS-payload stall tracked in #114. Companion rig e2e#63. Default behaviour unchanged; no prod default flipped.
2026-06-19 noetl/worker v5.35.0 🛡️ #103 — materializer-lag gauge (the PUBLISH_ONLY flip guardrail) (noetl/worker#116). Extends the JetStream lag poller to also track the noetl_events/noetl_materializer consumer when NOETL_MATERIALIZER_ENABLED is set, recording into the existing noetl_worker_nats_consumer_pending{stream,consumer} (+_ack_pending) gauge. The poller is an independent task, so the gauge climbs even when the materializer loop has stalled/died (it can't report its own lag). consumer_lag_for(stream,consumer) queries an arbitrary consumer over the same connection (no new connection/metric). Cheap: one extra consumer-info round-trip per tick, system pool only. The metric the ops materializer-lag alerts read. Kind-proven: gauge climbs 0→684 under induced lag, alerts fire, drains→0 on recovery, alerts clear. Default-off.
2026-06-19 noetl/server v3.29.3 🎯 #103 — server CQRS cutover COMPLETE, FLIP-READY: cancel/finalize through the chokepoint (noetl/server#240). The 2 ExecutionService terminal writers (cancel → playbook_cancelled, finalize → playbook_completed/playbook_failed) now route through the emit_event chokepoint, honouring NOETL_EVENT_INGEST_PUBLISH_ONLY — closing the last synchronous server noetl.event writer under the gate. ExecutionService carries AppState; resolve_catalog_id falls back to noetl.command (mirrors #236). Kind-proven both modes (gate-off byte-identical INSERT, error preserved; gate-on cancel/finalize PUBLISHED, materializer sole writer, terminal state, 0 loss/dup). All three flip blockers now closed → flipping PUBLISH_ONLY on is a staged operator decision. Default-off; no prod default changed. Companion e2e rig e2e#62.
2026-06-19 noetl/server v3.29.2 #104 — off-server-drive × gate crash-recovery: cold-cache rebuild (noetl/server#238). apply_worker_orchestration rebuilds WorkflowState from the durable log on a cold-cache apply (server restarted mid-drive between dispatching __orchestrate__ and its call.done) instead of dropping the in-flight off-server drive result — the #104 "rebuild from the WAL/projection" principle. Confined to the cold branch the warm happy path never enters; idempotent re-apply; adds noetl_orchestrate_drive_total{stage=cold_rebuild|cold_rebuild_failed}. Proves the off-server drive × PUBLISH_ONLY gate compose (the combo #103 left unproven): kind happy-path 0 server writes / materializer sole writer / read-your-writes, and hard-kill-mid-drive crash-recovery → cold_rebuild → COMPLETED. Companion e2e rig e2e#61. Unblocks the flip (only the 2 cancel/finalize sites remain). Default-off.
2026-06-19 noetl/worker v5.34.0 #103 — in-process CQRS event materializer (ack-after-materialize) (noetl/worker#115). Opt-in (NOETL_MATERIALIZER_ENABLED, default off, system pool) consume-loop: drains noetl_events with deferred ack (noetl-tools 3.13), POSTs events/project, acks each batch only on 2xx — on failure it stays un-acked → JetStream redelivers, no loss. Chosen over playbook deferred-ack (step model can't hold an ack handle across pods/steps). Closes the ack-after-materialize durability gap before the PUBLISH_ONLY flip. Kind fault-injection: gate-on sole-writer happy path zero-loss; fault before ack → redeliver → materialize, loss=0, idempotent.
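
The ack-after-materialize rule in the v5.34.0 entry can be sketched as follows. This Python sketch abstracts the JetStream consumer and the HTTP call behind two callables; `materialize_batches`, `post_batch`, and `ack_batch` are hypothetical names, not the worker's API.

```python
def materialize_batches(batches, post_batch, ack_batch):
    """Ack a batch only after the projection endpoint accepts it (2xx)."""
    acked, pending = [], []
    for batch in batches:
        status = post_batch(batch)  # e.g. the HTTP status of the project call
        if 200 <= status < 300:
            ack_batch(batch)
            acked.append(batch)
        else:
            pending.append(batch)  # stays un-acked, so the broker redelivers it
    return acked, pending

# Simulated fault: the first POST fails, the second succeeds.
statuses = iter([500, 200])
acks = []
print(materialize_batches(["batch-1", "batch-2"],
                          lambda b: next(statuses), acks.append))
```

Since a redelivered batch is projected again, this ordering relies on the projection being idempotent, which the entry reports as validated under fault injection.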
2026-06-19 noetl/tools v3.13.0 #103 — deferred (ack-after-processing) ack capability (noetl/tools#71). AckMode::Defer in the subscription SourceClient: poll surfaces a durable per-message ack handle (NATS $JS.ACK reply subject — connection/process-independent within ack-wait) instead of acking inline; SourceClient::ack(ack_ids, AckDisposition) = Ack/Nack/Term (NATS + Pub/Sub impls); tool operation: ack|nack|term. Opt-in — on_success/manual/none unchanged. The capability the worker materializer (v5.34.0) drives for ack-after-materialize.
2026-06-18 noetl/server v3.29.1 #103 2d-3 — catalog_id FK fix: gate-on sole-writer now COMPLETES end-to-end (noetl/server#236). Under PUBLISH_ONLY noetl.event is empty, so get_catalog_id returned catalog_id=0 for worker-emitted events → event_catalog_id_fkey violation → the materializer's batch INSERT failed → ack-on-fetch'd events lost. Now falls back to noetl.command (synchronous under the gate). Kind-proven: a full gate-on execution COMPLETES via the materializer-as-sole-writer path — server writes 0 noetl.event rows, materializer writes all 31, drained=materialized=31, zero loss, ordering+idempotency hold. Default-off.
2026-06-18 noetl/server v3.29.0 #103 2d-3 — CQRS sole-writer chokepoint + PUBLISH_ONLY gate (default-off) (noetl/server#235). An emit_event chokepoint routes all 13 server-side noetl.event producer sites; NOETL_EVENT_INGEST_PUBLISH_ONLY (default false) flips them from synchronous INSERT to a publish to noetl_events so the system/event_materializer becomes the sole writer, with the orchestrator trigger relocated to events_project (read-your-writes) and system/* drainers exempt. Kind: gate-off byte-identical (off-server e2e PASS + 25/25 regression); gate-on a fresh exec writes 0 noetl.event rows (server no longer writes the log). Operator-gated; no prod default flipped.
2026-06-18 noetl/server v3.28.0 #108 (c) — worker-driven orchestrator drive now DEFAULT ON (noetl/server#233, closes #108). NOETL_ORCHESTRATE_PLUGIN_DRIVE defaults true after a kind scale soak proved zero noetl.event burst + full system-pool isolation (694-drive cursor+fan-out COMPLETED, __orchestrate__ event rows = 0, all drives on the system pool; default-on path reproduced it; 15/15 regression green). In-process drive kept as the =false revert.
2026-06-18 noetl/worker v5.33.0 #108 (b) — pool-affinity decline (noetl/worker#114). A worker declines (ACK+skip) command notifications whose execution_pool differs from its own segment, so the orchestrate drive lands on the dedicated system pool even under JetStream consumer-filter drift.
2026-06-17 noetl/worker v5.31.2 #101 — rebuild ctx/workload shims at render (noetl/worker#90). Paired with server#207: the worker reconstructs the ctx/workload namespace shims at render time (the server stopped persisting them on commands). Kind-validated with v3.15.3: 7300+ context-dependent commands completed, 0 errors.
2026-06-17 noetl/server v3.15.3 #101 — stop persisting ctx/workload shim copies (noetl/server#207). The orchestrator built ctx/workload namespace aliases by deep-copying the whole context + persisted them on every command — a 1.7MB step output ballooned command.issued to ~5MB, re-written every cursor frame (it was blocking the CQRS stream). The shims are a rendering concern → dropped from the persisted command; worker#90 rebuilds them at render.
2026-06-17 noetl/server v3.15.1 #102 — batch event-log multi-row INSERT (noetl/server#199). handle_batch_events does one QueryBuilder multi-row INSERT instead of N individual INSERTs in the txn loop.
2026-06-17 noetl/server v3.14.0 → v3.15.0 #103 — CQRS phase 2 (all default-off) (noetl/server#202 event-log→JetStream tailer; server#215 projector owns projection_snapshot + noetl_materializer consumer + shared normalize_event_to_row / POST /api/internal/events/materialize; server#216 build hotfix). The write-path CQRS scaffold; the 2d-3 cutover flip is the next step.
2026-06-17 noetl/worker v5.31.1 #105 — wasm plug-in input flows (inputargs) (noetl/worker#110). wasm_config_to_ref reads the server's canonical args (a step's input:), input fallback. Re-validated on kind: a wasm playbook with input: {hello: world} now lands {"hello":"world"} (17 bytes) in noetl.object_store (was empty) — the full WASM dispatch path carries real data end to end.
2026-06-17 noetl/server v3.13.0 #105 — accept tool_kind: "wasm" (noetl/server#214). ToolKind::Wasm in the playbook schema (same pattern as Subscription) — a tool: {kind: wasm, plugin: {...}} step registers + emits a wasm command. The last piece of the live WASM e2e.
2026-06-17 noetl/worker v5.31.0 #105 — wasm-plugin on by default (noetl/worker#108). The dispatch routing landed, so the deployed worker carries the wasmtime host + routes tool_kind: "wasm". Kind-validated: PFT green on host-carrying workers + a real wasm playbook routed to the host (object_put landed in noetl.object_store).
2026-06-17 noetl/worker v5.27.0 → v5.28.0 #105 Round 5 — WASM dispatcher + Feather-tier flush (noetl/worker#101 dispatcher: load→run→collect→flush; noetl/worker#103 object_put → object-store endpoint). The plug-in runtime + its durable boundary-correct data path are complete behind off-by-default wasm-plugin.
2026-06-17 noetl/server v3.12.0 #105 Round 5 — object-store Feather tier (noetl/server#212). noetl.object_store + PUT/GET /api/internal/objects/{*key} — server-mediated backend for noetl.object_put, keyed by the §7 physical key.
2026-06-16 noetl/worker v5.26.0 #104 R02b — stamp the logical URI on result references (noetl/worker#99; refs noetl/ai-meta#104). Over-budget durable references now carry reference.uri = noetl://<tenant>/<project>/results/<eid>/<step>/<frame>/<row>/<attempt> (the §8 Resource Locator), collision-free across cursor fan-out. nats::source::translate copies the orchestrator-stamped cursor.{frame,row} into render_context; the dispatch site builds the URI via noetl_tools::locator (3.12.0). Additive; first consumer is the materialiser.
2026-06-16 noetl/worker v5.25.0 #105 Round 5 — reference Rust→wasm plug-in (noetl/worker#97; refs noetl/ai-meta#105). plugins/reference-materializer: a hand-written Rust plug-in compiled to wasm32-unknown-unknown (303 bytes, no_std, no WASI, imports only noetl.object_put); host test loads the compiled .wasm and asserts the capability call — a real compiled plug-in (not WAT) runs on the host through the data-plane ABI + capability ring. Validates the full stack: registry → HTTP source → host → real plug-in.
2026-06-16 noetl/tools v3.12.0 #104 R02 — two-level cursor fan-out coordinate (noetl/tools#70; refs noetl/ai-meta#104). ResultCoordinates gains row; logical URI + §7 physical key become .../<frame>/<row>/<attempt> — two rows of one cursor frame no longer collide (a mode: cursor loop fans out twice: frame of rows → body command per row). Prereq for the worker stamping reference.uri (R02b).
2026-06-16 noetl/server v3.11.0 #105 Round 4 — plug-in module registry (noetl/server#210; refs noetl/ai-meta#105). noetl.plugin_module table keyed by (path, version) (digest + media_type + BYTEA) + POST/GET /api/internal/plugins/{*path} — the durable PluginSource backend for the system worker pool's wasmtime host. Mirrors result_store; 666 tests unaffected.
2026-06-16 noetl/worker v5.24.0 #105 Round 4b — HTTP PluginSource (noetl/worker#95; refs noetl/ai-meta#105). HttpPluginSource fetches a compiled module from the server registry on a cache miss (GET /api/internal/plugins/...; 200→bytes, 404→NotLoaded, 409→stale-digest), closing the catalog-loading loop server→host→wasmtime. PluginSource made async. Behind off-by-default wasm-plugin.
2026-06-16 noetl/worker v5.23.0 #105 Rounds 1-3 — wasmtime plug-in host (noetl/worker#93; refs noetl/ai-meta#105). WasmPluginHost: capability Linker ring, module cache keyed by PluginKey{path,version,digest}, hot-reload via evict_other_versions; Arrow byte data-plane ABI (invoke_bytes — alloc-export + linear memory, real Arrow IPC round-trips byte-identical); materialiser capability ring (noetl.event_publish/result_put/object_put, deny-by-default). Off-by-default wasm-plugin feature (wasmtime 27).
2026-06-16 noetl/tools v3.11.0 #104 Round 01 — Resource Locator (noetl/tools#68; refs noetl/ai-meta#104). noetl_tools::locator: ResourceLocator (stable §8 logical URI), ResultCoordinates (§7 physical key), stable FNV-1a shard_key (reproducible across binaries/arch/time, locked test), CellPlacement, legacy parse — the single source of truth for result naming. (Fan-out (frame,row) refinement → tools#70, open.)
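FNV-1a suits a "reproducible across binaries/arch/time" shard key because it depends only on the input bytes and two fixed constants. The sketch below shows the standard 64-bit variant; the hash width and the `shard_for` helper are assumptions made for illustration, since the entry does not state them.

```rust
/// 64-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
/// The 64-bit width is an assumption; the entry only says "FNV-1a".
fn fnv1a_64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut hash = OFFSET_BASIS;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}

/// Map a logical key onto one of `shards` buckets (hypothetical helper).
fn shard_for(key: &str, shards: u64) -> u64 {
    assert!(shards > 0, "shard count must be non-zero");
    fnv1a_64(key.as_bytes()) % shards
}
```

Unlike Rust's default `HashMap` hasher, which is randomly seeded per process, this function gives the same bucket for the same key on every run, so a locked test can pin known values.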
2026-06-15 noetl/server v3.9.0 #101 — bounded-memory orchestrator + stall-proof reconcile (noetl/server#197; refs noetl/ai-meta#101). Projection-snapshot bounded rebuild (flat memory — 167KB snapshot at 200k events, was OOM at ~19k); throttled O(events) consistency COUNT off the hot path; background reconcile poller (force-advances every active execution every 8s — no permanent deadlock under DB backpressure); results-by-reference resolution; GET /api/executions/{id} memory-bomb fix (was loading all events). Validated kind 10×1000 (flat memory) + GKE db-g1-small/PgBouncer 10×200 (poller broke a stall, 0 fails/restarts).
2026-06-15 noetl/server v3.8.0 #100 — cursor/claim loop mode + output namespace (noetl/server#196; closes noetl/ai-meta#100). LoopMode::Cursor, CursorClaim, FrameSpec; orchestrator entry hook → advance → drain (__cursor_drained, event.name = loop.done); StepInfo.is_cursor; output namespace alias for arc when: / step set:; loop-back re-entry frame reset. Validated via test_pft_flow_v2 on kind (all_passed: true, 5/5 per data type).
2026-06-15 noetl/tools v3.10.1 #100 — postgres -- comment splitter fix (noetl/tools#66; closes noetl/ai-meta#100). The multi-statement splitter now skips -- line comments before scanning for ;. An apostrophe inside a comment was consuming the trailing semicolon and merging subsequent statements, producing "cannot insert multiple commands into a prepared statement" at setup_facility_work.
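The failure mode is easy to reproduce with a small splitter: if `--` comments are not skipped first, the apostrophe in a comment such as `-- don't merge` opens a string literal that swallows the next `;`. The sketch below is a simplified stand-in, not the noetl-tools splitter; `split_statements` is a hypothetical name, and it deliberately ignores block comments and dollar-quoted strings.

```rust
/// Split SQL on `;`, skipping `--` line comments and respecting '...' strings.
/// Simplified illustration: no /* */ comments, no $$ dollar quoting.
fn split_statements(sql: &str) -> Vec<String> {
    let mut statements = Vec::new();
    let mut current = String::new();
    let mut chars = sql.chars().peekable();
    let mut in_string = false;

    while let Some(c) = chars.next() {
        if in_string {
            current.push(c);
            if c == '\'' {
                // A doubled '' simply re-enters the string on the next quote.
                in_string = false;
            }
        } else if c == '-' && chars.peek() == Some(&'-') {
            // Line comment: drop everything up to, but not including, the newline.
            while let Some(&next) = chars.peek() {
                if next == '\n' {
                    break;
                }
                chars.next();
            }
        } else if c == '\'' {
            in_string = true;
            current.push(c);
        } else if c == ';' {
            let stmt = current.trim();
            if !stmt.is_empty() {
                statements.push(stmt.to_string());
            }
            current.clear();
        } else {
            current.push(c);
        }
    }
    let tail = current.trim();
    if !tail.is_empty() {
        statements.push(tail.to_string());
    }
    statements
}
```

Checking for the comment before the quote is the design point: the comment branch runs only outside a string, so `'a--b'` stays intact while an apostrophe inside a comment never reaches the string state.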
2026-06-15 noetl/worker #88 dep bump #100 — noetl-tools v3.10.1 bump (noetl/worker#88; refs noetl/ai-meta#100). Pulls noetl-tools v3.10.1 (postgres splitter fix) so the worker's postgres tool no longer merges statements that follow an apostrophe-containing -- comment.
2026-06-14 noetl/worker v5.22.0 #99 — transfer endpoint credential-alias resolution (noetl/worker#87; closes noetl/ai-meta#99). Pre-resolves the keychain alias on each transfer endpoint (source.auth/target.auth) before dispatch, mirroring the task_sequence pre-resolution pattern; bumps noetl-tools dep to 3.10.0. Full bidirectional data_transfer/snowflake_postgres fixture green on kind against the live sf_test account.
2026-06-14 noetl/tools v3.10.0 #99 — Snowflake↔Postgres transfer arms + credential-alias endpoint config (noetl/tools#65; closes noetl/ai-meta#99). Implements the (Snowflake,Postgres) and (Postgres,Snowflake) directions (previously marked supported but unimplemented). SF→PG: new SnowflakeTool::query_rows returns rows as Vec<HashMap<String,String>>; target column types fetched from information_schema.columns; each cell coerced via $n::text::<udt>; Snowflake's internal TIMESTAMP_TZ format (<epoch>.<nanos> <tzmin>) reformatted to RFC3339 before the PG cast. PG→SF: generates SQL-escaped INSERT statements. SourceConfig/TargetConfig add #[serde(flatten)] extra so the worker can inject resolved credential fields (account, warehouse, private_key, connection string, etc.). Kind-validated end-to-end against the live account: id(int) / name(text) / value(numeric 100.50) / created_at(timestamptz 2026-06-15 05:55:01.262+00) / metadata(jsonb) all coerced correctly in the SF→PG direction; all 5 rows moved in PG→SF.
2026-06-14 noetl/tools v3.9.2 #98 — Snowflake SQL-API context in request body + multi-statement split (noetl/tools#64; refs noetl/ai-meta#98). Sets warehouse/role/database/schema in the request body (the SQL REST API rejects USE statements, code 391911); splits multi-statement command: blocks on ; (the API runs one statement per request; a whole block fails with code 000008); omits database/schema for CREATE/DROP DATABASE.
2026-06-14 noetl/tools v3.9.1 #98 — set User-Agent on the Snowflake HTTP client (noetl/tools#63; refs noetl/ai-meta#98). Snowflake's SQL REST API rejects requests with no User-Agent (400 / code 391903); reqwest sends none by default.
2026-06-14 noetl/tools v3.9.0 #98 — Snowflake key-pair JWT authentication (noetl/tools#62; refs noetl/ai-meta#98). RS256 JWT with iss = <ACCOUNT>.<USER>.SHA256:<base64(SHA256(DER))>, sub = <ACCOUNT>.<USER> (both uppercased, region segment dropped from account); sent as a Bearer token with X-Snowflake-Authorization-Token-Type: KEYPAIR_JWT, bypassing password/MFA. New SnowflakeConfig.public_key (PEM) field. Deps: jsonwebtoken 9 + pem 3. Kind-validated against live sf_test account (NDCFGPC-MI21697): create_sf_database + setup_sf_table both COMPLETED.
2026-06-12 noetl/gateway v3.4.0 #92 — de-vendor the directive engine onto shared noetl-directives (noetl/gateway#29; refs noetl/ai-meta#92). Drops the serde-only vendored src/ingress/directives.rs for noetl-directives = "0.1" — the security-sensitive allowlist now has ONE implementation shared with the worker/tools runtime. Edge stays lean: cargo tree shows no duckdb/kube/noetl-tools creep. Pins time =0.3.47 (async-nats E0119 under rustc 1.92).
2026-06-12 noetl/tools v3.8.0 + noetl-directives v0.1.0 #92 — extract shared noetl-directives crate (noetl/tools#61; refs noetl/ai-meta#92). New lean (serde+thiserror only) noetl-directives workspace member owns the header/attribute directive engine (Control/DirectiveRule/DirectiveSpec/DispatchPlan/TraceContext/extract_w3c_trace/normalize_http_headers); tools::source re-exports it so worker call sites are unchanged. noetl-directives 0.1.0 published first (new crate), then noetl-tools 3.8.0. Deferred: noetl-spool extraction (single consumer, no drift).
2026-06-12 noetl/worker v5.20.0 #94 + #93 — wire s3 spool backend + cross-restart drain recovery (noetl/worker#80; refs noetl/ai-meta#94 + noetl/ai-meta#93). SpoolBackendKind::S3 arm (keychain-auth bucket credential → S3Backend); SpoolRuntime::recover_on_startup seeds recv_seq from the surviving spool's high-water + auto-drains on boot (closes the gcs/s3 in-memory-circuit cross-restart gap). Bumps noetl-tools 3.5→3.7.1; pins time =0.3.47 (async-nats E0119 under rustc 1.92). Live on kind (MinIO): outage→6 buffered→restart→startup auto-drain→6 replayed in order, idempotent, no loss.
2026-06-12 noetl/tools v3.7.1 #94 + #93 — s3 spool backend + cross-restart recovery helpers + time pin (noetl/tools#58 s3, noetl/tools#59 recovery, noetl/tools#60 time pin; refs noetl/ai-meta#94 + noetl/ai-meta#93). spool::S3Backend — hand-rolled AWS SigV4 over reqwest+hmac/sha2 (no AWS SDK; S3/MinIO/R2/B2), s3 feature on by default; recv_seq_from_object_key + SpoolEngine::high_water_recv_seq for the worker's startup drain. Pins time =0.3.47 (0.3.48 broke async-nats 0.38 with E0119 under rustc 1.92, failing the 3.6.0/3.7.0 publishes). s3 put/list/get/delete proven live against MinIO.
2026-06-12 noetl/server v3.5.0 #90 Phase 7 — POST /api/execute/batch + opt-in exactly-once dedup window (noetl/server#189; closes noetl/server#188; tracks noetl/ai-meta#90). Batch endpoint creates N executions in one round-trip with partial-failure containment, reusing the single-execute execute_one path so per-message routing/trace/dedup are intact (server still owns every DB write). Opt-in dedup window (RFC §10 OQ1): noetl.subscription_dedup (idempotent startup DDL, bounded by age, cluster-pool authority), execute takes dedup: { key, window_secs } scoped by the subscription; a duplicate within the window collapses to the existing execution + a subscription.message.deduplicated event; race-safe via INSERT … ON CONFLICT; default off. Validation of dispatch.batch_dispatch/batch_max/dedup/limits. noetl_execute_outcomes_total + noetl_execute_batch_size. Live on kind: batch 12→12 COMPLETED; dedup duplicate→1 execution; direct-curl within/outside-window + dedup-off green.
2026-06-12 noetl/worker v5.19.0 #90 Phase 7 — batch dispatch + dedup opt-in + per-subscription rate limits (noetl/worker#79; closes noetl/worker#78; tracks noetl/ai-meta#90). When dispatch.batch_dispatch the runtime drains a backlog → execute_batch in chunks of batch_max (each item its own playbook/pool/trace/dedup, per-message traceability intact); opt-in dedup stamps the block (idempotency_key→message_id, OQ8); per-subscription rate limits (RFC §9) via a new deterministic token-bucket RateGovernor (src/ratelimit.rs) enforced on the fetch side — over the cap the runtime stops fetching (source keeps the backlog, redelivers — no loss) + a subscription.rate_limited event. New batch/rate-limit counters. Live on kind: batch 12→12 + per-message traceparent; rate-limit engaged + 10/10 → executions (no loss).
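A token bucket becomes deterministic when the caller supplies the clock instead of the bucket reading it. The sketch below illustrates that idea only; it is not the `RateGovernor` in `src/ratelimit.rs`, and the `TokenBucket` type, its fields, and the millisecond clock are assumptions made for the example.

```rust
/// Token bucket driven by a caller-supplied clock (hypothetical type).
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_ms: u64,
}

impl TokenBucket {
    /// Start full at time `now_ms`.
    fn new(capacity: u32, refill_per_sec: f64, now_ms: u64) -> Self {
        Self {
            capacity: f64::from(capacity),
            tokens: f64::from(capacity),
            refill_per_sec,
            last_ms: now_ms,
        }
    }

    /// Try to take one token at `now_ms`. `false` means "stop fetching for now";
    /// nothing is dropped, because the unfetched message stays with the source.
    fn try_acquire(&mut self, now_ms: u64) -> bool {
        let elapsed_ms = now_ms.saturating_sub(self.last_ms);
        self.last_ms = now_ms.max(self.last_ms);
        let refill = elapsed_ms as f64 / 1000.0 * self.refill_per_sec;
        self.tokens = (self.tokens + refill).min(self.capacity);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

Gating the fetch rather than the dispatch is what keeps the limit loss-free: a message that was never pulled cannot be lost by the worker, and the source redelivers it later.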
2026-06-12 noetl/cli v4.11.0 #90 Phase 6 — noetl subscribe, local-mode subscription listener (noetl/cli#60; closes noetl/cli#59; refs noetl/ai-meta#90). noetl subscribe <spec.yaml> runs a kind: Subscription listener standalone in local mode — no k8s, no NATS-dispatch server for the listening itself — reusing the same noetl_tools::tools::source clients + header-directive engine + noetl_tools::spool engine the in-cluster worker uses, and emitting the same ExecutorEvent envelope to a local FileEventSink (one event/line JSONL → replayable trail). Local dispatch (RFC §5.3): in-process via PlaybookRunner (the pure-local default) or POST /api/execute (--dispatch server). local_disk spool (§8.6): circuit breaker + buffer + ordered replay + idempotency + dead-letter against a local dir, circuit state in a local file. src/subscribe/{mod,spec,sink,dispatch,runtime,spool}.rs + examples/subscribe/. cli-only — the source+spool surface already ships in noetl-tools v3.5.0, so no tools change / crate cascade (bumps the noetl-tools lock 3.0.0 → 3.5.0). 12 subscribe tests + full bin suite (53) green incl. a deterministic outage→spool→ordered-replay→idempotency proof on the real engine. Live (in-cluster NATS on kind): 5/5 drained → in-process dispatch → COMPLETED; local_disk spool outage → 6 buffered (no loss) → recovery → 6 replayed in order → drained to 0. Wiki: cli subscribe.
2026-06-12 noetl/tools v3.5.0 #90 Phase 5 — gcs store-and-forward spool backend (noetl/tools#56; closes noetl/tools#55; tracks noetl/ai-meta#90). GcsBackend — the GCS impl of the Phase-4 SpoolBackend trait over the JSON API, reusing the existing GcpAuth (ADC) + reqwest (no new dependency); one bucket / many subscriptions by prefix, live+dlq split, recv_seq-ordered keys, idempotent put/delete; gcs feature (default-on). Live round-trip proven against a real GCS bucket. Finding filed: real Pub/Sub sync-pull needs timeout_ms >= 10s (tools#57).
2026-06-12 noetl/worker v5.18.0 #90 Phase 5 — Cloud Run parity (noetl/worker#77; closes noetl/worker#76; tracks noetl/ai-meta#90). Wires spool.backend: gcs into the WORKER_MODE=subscription run-loop (ADC / Workload Identity; in-memory circuit out-of-cluster); optional NOETL_INTERNAL_API_TOKEN bearer auth to the control plane; $PORT-aware metrics/health bind (Cloud Run startup probe, no new HTTP code). Live out-of-cluster proof green.
2026-06-12 noetl/server v3.4.2 #90 Phase 5 — gcs/s3 spool credential optional (ADC / Workload Identity) (noetl/server#187; closes noetl/server#186; tracks noetl/ai-meta#90). The Phase-4 validation required a keychain credential for gcs/s3; the Cloud Run runtime uses Workload Identity on the platform bucket. credential now optional: absent → ADC, present → tenant-bucket keychain alias. bucket stays required.
2026-06-12 noetl/tools v3.4.0 #90 Phase 4 — store-and-forward spool engine + per-downstream circuit breaker (noetl/tools#54; closes noetl/tools#53; tracks noetl/ai-meta#90). noetl_tools::spool: pure circuit breaker (trip/half-open/close; NATS-KV-serializable; one breaker per declared downstream — OQ2), SpoolItem (SHA-256 + noetl://spool/<sub>/<recv_seq>/<id> ref + recv_seq-ordered object keys), SpoolBackend trait + nats_object (reuses NATS Object Store) + local_disk, and the engine (ordering global/per_key/none + idempotency idempotency_key→message_id + poison→dead-letter + retention max_age/max_bytes/on_full + GC) + http/tcp/nats probes. 44 unit tests under simulated outage + a real-NATS nats_object integration test.
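The trip / half-open / close cycle is a three-state machine, and it stays "pure" when time is passed in as an argument rather than read from a clock. The sketch below shows that shape under assumptions: the `Circuit` and `Breaker` types, the consecutive-failure threshold, and the cooldown are illustrative choices, not the `noetl_tools::spool` definitions.

```rust
/// Breaker state (hypothetical type; plain data, so it is easy to serialize).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Circuit {
    Closed { failures: u32 },
    Open { opened_at_ms: u64 },
    HalfOpen,
}

struct Breaker {
    state: Circuit,
    trip_after: u32,  // consecutive failures before tripping (assumed policy)
    cooldown_ms: u64, // how long to stay open before allowing one probe
}

impl Breaker {
    fn new(trip_after: u32, cooldown_ms: u64) -> Self {
        Self { state: Circuit::Closed { failures: 0 }, trip_after, cooldown_ms }
    }

    /// True if a call (or a half-open probe) may reach the downstream now.
    fn allow(&mut self, now_ms: u64) -> bool {
        if let Circuit::Open { opened_at_ms } = self.state {
            if now_ms.saturating_sub(opened_at_ms) >= self.cooldown_ms {
                self.state = Circuit::HalfOpen;
            }
        }
        !matches!(self.state, Circuit::Open { .. })
    }

    /// Feed back the outcome of a call or probe.
    fn record(&mut self, success: bool, now_ms: u64) {
        self.state = match (self.state, success) {
            // Any success closes the circuit and resets the failure count.
            (_, true) => Circuit::Closed { failures: 0 },
            // Still under the threshold: stay closed, count the failure.
            (Circuit::Closed { failures }, false) if failures + 1 < self.trip_after => {
                Circuit::Closed { failures: failures + 1 }
            }
            // Threshold reached, or a half-open probe failed: (re)open.
            (_, false) => Circuit::Open { opened_at_ms: now_ms },
        };
    }
}
```

In a store-and-forward setup, `allow` returning `false` is the signal to buffer the message instead of dispatching it, and the first successful probe after the cooldown is the signal to start draining.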
2026-06-12 noetl/worker v5.17.0 #90 Phase 4 — spool wired into the subscription run-loop (noetl/worker#75; closes noetl/worker#74; tracks noetl/ai-meta#90). SpoolRuntime: stands the engine up over a backend, persists per-downstream circuit state in NATS KV (survives a restart mid-outage), drives active downstream probes, routes each message (closed→dispatch / open→spool+ack), drains on recovery (ordered replay + idempotency + dead-letter), emits the 6 events + noetl_subscription_spool_bytes gauge. buffer_and_ack (push default) + hybrid loss-safe; off = spool disabled. Live outage proof green on kind.
2026-06-12 noetl/server v3.4.1 #90 Phase 4 — spool config validation + subscription lifecycle-status fix (noetl/server#184 + noetl/server#185; closes noetl/server#183; tracks noetl/ai-meta#90). Validate the spec.spool block at registration (mode/backend/bucket/path/credential/ordering + interleave-vs-global guard). Fix: the lifecycle-status reconstruction queries matched subscription.% so an open circuit's subscription.circuit.opened (status OPEN) 500'd subscription_get/activate — now matches only the six lifecycle event types. Surfaced + fixed during the Phase-4 live kind outage validation.
2026-06-11 noetl/gateway v3.3.0 #90 Phase 3 — push-ingress (Mode C) + auth-gated directive trust (noetl/gateway#28; closes noetl/gateway#27; tracks noetl/ai-meta#90). POST /ingress/{listener} — the gateway terminates untrusted webhook / Pub-Sub-push traffic as a verify-and-forward gatekeeper (no DB on the ingress path). src/ingress/verify.rs: HMAC-SHA256 over the raw body (constant-time, optional sha256= prefix), bearer (constant-time), Google Pub/Sub OIDC (RS256 vs Google JWKS, aud + email/service_account + email_verified + exp). Secrets resolved from the Wallet by alias via the server's GET /api/internal/ingress/{listener} — never a gateway env var. src/ingress/directives.rs: serde-only vendored port of noetl-tools v3.3.0's directive engine (the edge must not pull duckdb/kube); allowlist + value-allowlist preserved. verify_then_plan fuses verify + directive resolution so the auth gate is a testable invariant — a failed verification yields no DispatchPlan, so an unauthenticated caller can never drive routing (RFC §7.5). Pub/Sub-push envelopes unwrapped (attributes channel); auth headers stripped from the forwarded workload; first /metrics surface (noetl_ingress_received/rejected/dispatched_total). 25 ingress + verify unit tests (every negative incl. directives_applied_only_after_verification_passes); clippy clean. Live E2E green: HMAC 12/12 + bearer 12/12.
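The "constant-time, optional sha256= prefix" comparison can be sketched with the standard library alone. This is an illustration, not the code in `src/ingress/verify.rs`: `constant_time_eq` and `signature_matches` are hypothetical names, the hex encoding of the signature is an assumption, and computing the expected HMAC-SHA256 digest over the raw body is left out because it needs a crypto crate.

```rust
/// Compare two byte strings without returning at the first mismatching byte.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        // The length of a fixed-size digest is not secret.
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y; // accumulate differences instead of branching
    }
    diff == 0
}

/// Check a presented signature header against the expected hex digest.
/// Accepts both "sha256=<hex>" and bare "<hex>" (hex encoding is assumed).
fn signature_matches(header: &str, expected_hex: &str) -> bool {
    let presented = header.strip_prefix("sha256=").unwrap_or(header);
    constant_time_eq(
        presented.to_ascii_lowercase().as_bytes(),
        expected_hex.to_ascii_lowercase().as_bytes(),
    )
}
```

A plain `==` on strings may stop at the first differing byte, which leaks how much of a guess was correct. Production code would more likely rely on a vetted primitive, such as the `subtle` crate or the `hmac` crate's own verification, since an optimizer is free to rewrite a hand-rolled loop.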
2026-06-11 noetl/server v3.3.0 #90 Phase 3 — push-ingress config endpoint + push catalog validation (noetl/server#182; closes noetl/server#181; tracks noetl/ai-meta#90). mode: push now requires an ingress.verify block (hmac_sha256 | bearer | pubsub_oidc; none rejected — push always verifies); per-scheme required fields validated at registration. New GET /api/internal/ingress/{listener} (gated by RequireInternalApiToken): resolves the kind: Subscription by ingress.gateway_path, resolves the verify-secret alias through the Secrets Wallet (CredentialService), idempotently registers the subscription (parent lineage + lifecycle), and returns verify + dispatch + directives + source config to the gateway — the gateway's DB-free config source (data-access-boundary.md). subscription::ensure_registered extracted + reused. 9 push-validation + listener-match unit tests; clippy clean. Live-validated on kind.
2026-06-11 noetl/tools v3.3.0 #90 Phase 2 — header-directive engine + public build_source factory (noetl/tools#52; closes noetl/tools#51; tracks noetl/ai-meta#90). source/directives.rs: DirectiveSpec→DispatchPlan turns allowlisted message headers into dispatch instructions — redirect dispatch.playbook, dispatch.execution_pool, priority→pool map, idempotency_key, content_type/schema_hint, and W3C trace extraction (traceparent/tracestate/allowlisted baggage). Untrusted by default (§7.5): allowlist-only, routing controls require an allowed:/map: value constraint enforced at parse; multi-value headers last-wins; applied[] audit list for the directives_applied event. Public build_source(cfg, ctx) factory extracted so the worker continuous runtime constructs the same SourceClient. 12 new tests; 335 lib green; clippy clean across feature combos.
2026-06-11 noetl/server v3.2.0 #90 Phase 2 — kind: Subscription type + lifecycle + pool routing + W3C trace (noetl/server#180; closes noetl/server#179; tracks noetl/ai-meta#90). First-class kind: Subscription catalog type (source/mode/dispatch validation, no step-DAG); event-sourced lifecycle endpoints /api/subscriptions (register→activate→pause/resume→drain→deactivate, idempotent register reusing a non-deactivated subscription per path, GET list/get); execution_pool override on /api/execute routing the whole run to noetl.commands.<pool>.<eid> (persisted in playbook_started meta, orchestrator reads back); W3C trace into meta.trace + the command notification + child-execution inheritance. Startup-seeds the subscription resource kind (FK to noetl.resource); decodes noetl.event.created_at as TIMESTAMP. 8 catalog + 4 lifecycle unit tests; 633 lib green; clippy clean. Live E2E green (13/13).
2026-06-11 noetl/worker v5.16.0 #90 Phase 2 — continuous subscription runtime (Mode B) (noetl/worker#73; closes noetl/worker#72; tracks noetl/ai-meta#90). New WORKER_MODE=subscription run-mode: load a kind: Subscription spec, build the Phase-1 SourceClient via the tools build_source factory, register→activate→(pause/resume)→drain→deactivate (drain on SIGTERM, the K8s termination signal), and loop poll()→one POST /api/execute per message on the dedicated subscription pool segment — applying header directives (redirect/pool/idempotency/content + W3C trace) + emitting subscription.message.directives_applied audit events. Observability triad (spans + noetl_subscription_{messages_received,executions,directives_applied}_total). Adopts noetl-tools 3.3.0 + serde_yaml. 6 runtime + 141 lib tests; clippy clean.
2026-06-11 noetl/server v3.1.0 Accept subscription tool kind in ToolKind validation (noetl/server#178; tracks noetl/ai-meta#90). The orchestrator parses every step's tool.kind into the typed ToolKind enum; an unknown kind failed ToolDefinition deserialization and POST /api/execute returned HTTP 400 ("data did not match any variant of untagged enum ToolDefinition"). Adds the Subscription variant (+ Display arm + parse test) so a playbook using the new subscription tool (noetl-tools v3.2.0) passes validation; the worker still dispatches it generically. Surfaced by the in-cluster subscription-tool E2E. 620 lib tests green. ai-meta pointer → 810e05c.
2026-06-11 noetl/worker v5.15.2 Resolve nats/pubsub/kafka credential aliases into tool config (noetl/worker#71; tracks noetl/ai-meta#90). The worker's apply_credential only knew postgres/bearer/api_key/basic; a type-nats credential errored ("unsupported type 'nats'"), so the nats tool and the new subscription tool could not resolve a connection from an auth: keychain alias. Adds a `nats
2026-06-11 noetl/tools v3.2.0 Bounded-drain subscription tool + source-client abstraction (noetl/tools#50; closes noetl/tools#49; tracks noetl/ai-meta#90). Phase 1 (Mode A — bounded drain) of the subscription/listener RFC. New atomic registry tool subscription (operation: poll): fetch up to batch / until empty / until timeout_ms (both hard-capped), ack per policy, return the normalized batch — js_consume generalised across backends behind the reusable tools::source::SourceClient trait (PolledMessage/PollOptions/PollOutcome/AckMode + decode_payload/normalize_headers, RFC §7.1). Backends: NATS (refactors js_consume into the shared drain_pull_consumer; the nats tool delegates — one NATS drain in the crate), Pub/Sub pull (REST pull+acknowledge via existing reqwest + gcp_auth ADC, emulator via endpoint/PUBSUB_EMULATOR_HOST, feature pubsub), Kafka poll (pure-Rust kafka crate, no librdkafka, spawn_blocking, feature kafka; Phase-1 limits: no record headers, soft batch cap, plaintext). Both pubsub+kafka default-on so the shipped worker supports all three. Worker dispatches through the generic registry lookup — no dispatch-match change. Observability triad (tool.dispatch.subscription span, noetl_subscription_messages_fetched_total{source}, execution_id). 323 lib tests + gated integration tests; clippy -D warnings clean across feature combos. NATS poll path validated live against the in-cluster NATS JetStream broker (create stream → publish 3 → drain → ack → second drain returns 0). ops example ops#169; wiki SubscriptionTool. Worker dep bump to v3.2.0 cascades after the crates.io publish.
2026-06-11 noetl/server v3.0.6 Round-trip JSON null in whole-object {{ step }} references (noetl/server#177; closes noetl/ai-meta#89). A single-expression {{ step }} reference to a prior result envelope carrying a null field rendered that field as the JS token undefined via minijinja's map repr (json_value_to_minijinja maps JSON null→Value::UNDEFINED); render_to_value then failed serde_json::from_str and returned the whole envelope as a raw string, so the consuming python/rhai step received an unparseable str and crashed ('str' object has no attribute 'get'). Root cause was the server orchestrator's renderer (src/template/jinja.rs), not the worker the issue hypothesized — the server's render_to_value was a divergent copy that lacked the | tojson retry the noetl-tools TemplateEngine::render_value already had. Fix adds that retry: a lone {{ expr }} whose plain render is container-shaped-but-invalid JSON re-renders with | tojson, and minijinja_to_json maps undefined/none → JSON null so the field round-trips as null. 5 new regression tests; 619 lib + 8 parity green; clippy clean. Kind-validated against the live paginated-api test-server: cursor pagination walks all 4 pages, terminal next_cursor: null handled, validate_results collects 35 (first_id=1, last_id=35, success) — matching offset; pre-fix the 4th check_pagination was command.completed error. ai-meta pointer → server 8e17fbe.
2026-06-10 noetl/server v3.0.5 Workflow-arc loops advance across iterations + terminate (noetl/server#176; closes noetl/ai-meta#85). Two coupled orchestrator fixes atop the dispatch-guard re-entry layer. (1) Durable event-sourced loop-ctx propagation: step-level set: ctx.* loop variables were recomputed per pass and reverted to the workload default — the loop thrashed 0,0,1,0,1,2,… because start's initializer set re-fired every pass in random HashMap order against check_pagination's advancing set. Fix persists each completion's rendered set: as a ctx.updated event (latest-wins fold in WorkflowState.ctx + build_context overlay), emitted once per completion keyed by the stable completion event_id (the Utc::now()-fallback completed_at varies across reconstructions and defeated an earlier guard). (2) Loop-exit hang: the exit branch was marked step.skipped on a loop-body-completion pass (recency-based branch-point detector missed the branch point), turning it terminal so the is_step_done guard suppressed the exit dispatch; fixed with a structural loop-branch-point test (any step with a back-edge arc). 614 lib tests (+6; 2 verified to fail without their guard); clippy-clean. Kind-validated: counter loop 0→1→2→3 + terminates; real-http offset pagination 4 pages collecting 35. Separate follow-up filed: e2e offset/cursor fixtures read response.data.users but the Rust http tool nests it at response.body.users. ai-meta pointer → server e519fdc.
2026-06-10 noetl/tools v3.1.1 Multi-tool sibling references (noetl/tools#48; closes noetl/ai-meta#87). In a tool: [list] step, TaskSequenceTool stored each sub-tool's result for the aggregated step output but never injected it into the running context, so a later sub-tool's {{ <label>.<field> }} rendered empty — masked in quoted positions, a syntax error at or near "," in unquoted numeric SQL (save_edge_cases test_large_payload). Fix injects each sub-tool's result under its label after it completes, with a synthetic .data self-reference (mirrors build_context) so both {{ label.field }} and {{ label.data.field }} resolve; also visible to a later python sub-tool's stdin variables. 2 new unit tests; 300/0 lib; clippy clean. Published to crates.io (transient HTTP2-flake re-run). Worker adopts via worker#69 (Cargo.lock → 3.1.1). Kind-validated on a worker built from the fix: save_edge_cases test_large_payload → record_count = 100 (no syntax error), save_delegation_test clean. ai-meta pointer → tools 76f942a + worker b97f642.
2026-06-10 noetl/server v3.0.4 Unblock workflow loops + loop.done-gated transitions (noetl/server#175; closes noetl/ai-meta#83 + #84). Found in a full e2e regression re-sweep (19→27/36 on kind, server v3.0.3→v3.0.4). (1) The fan-in/reduce barrier counted a loop back-edge (check_pagination → fetch_page) as an upstream and deferred the loop head forever (Reduce step 'fetch_page' deferring dispatch — 1 of 2 upstream(s) still pending); fix excludes back-edges via a new forward_reachable helper (genuine fan-in unaffected). (2) event.name was never populated for arc evaluation, so the canonical when: {{ event.name == "loop.done" }} gate (10+ fixtures) never matched — in-step loop: steps hung after completion; fix injects event.name = "loop.done" into a completed loop step's next-arc context. Landed with e2e fixture fix #39 (duckdb commands→command, closes #86). Follow-ups filed open: #85 (workflow-arc loop re-entry) + #87 (multi-tool sibling reference). 26 orchestrator unit tests +2 new; clippy clean; kind-validated (pagination loops + test_pagination_basic complete, fanout_reduce + container_callback green). ai-meta pointer → server 480ba72.
2026-06-10 noetl/worker v5.15.1 Pre-dispatch failures emit terminal call.error instead of hanging (noetl/worker#68; closes noetl/ai-meta#78, sub-issue worker#67). Credential-alias resolution + tool-config deserialization errors used to ?-propagate out of execute_with_server_url; the dispatch loop only logged them, so executions hung at command.started forever. Typed CredentialResolutionError (terminal AliasNotFound/Invalid vs retryable Transient) + CredentialHttpError carrying the HTTP status so classify_fetch_error decides retryability by code (terminal 404/400/401/403/500; retryable 408/429/502/503/504 + transport); handle_predispatch_failure emits call.error + command.failed, escalating retryable failures to terminal after MAX_PREDISPATCH_ATTEMPTS (3). Folded in the noetl-tools dep revert (path = "../tools" → "3", Cargo.lock → 3.1.0). The live pg_noetl_k8s repro is an HTTP 500 "Decryption failed" (not a 404). +7 tests; 133 lib + 9 integration green; clippy no new findings. Kind-val GREEN: test/postgres fails cleanly, hello_world completes. First Claude-direct worker Rust PR per handoff-routing.md.
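The status-code split in this entry reduces to a small lookup plus an attempt cap. The sketch below is illustrative rather than the worker's `classify_fetch_error`: `Retryability`, `classify_status`, and `is_terminal` are hypothetical names, and treating every unlisted status as terminal is an assumption the entry does not confirm.

```rust
/// Retry cap taken from the entry: retryable failures escalate after 3 attempts.
const MAX_PREDISPATCH_ATTEMPTS: u32 = 3;

#[derive(Debug, PartialEq)]
enum Retryability {
    Terminal,
    Retryable,
}

/// Classify a credential-fetch failure by HTTP status.
/// `None` models a transport-level failure with no status at all.
fn classify_status(status: Option<u16>) -> Retryability {
    match status {
        None => Retryability::Retryable,
        Some(408 | 429 | 502 | 503 | 504) => Retryability::Retryable,
        // 404/400/401/403/500 are listed as terminal; other codes are
        // treated as terminal here by assumption.
        Some(_) => Retryability::Terminal,
    }
}

/// True when the command should fail now instead of being retried.
fn is_terminal(status: Option<u16>, attempt: u32) -> bool {
    classify_status(status) == Retryability::Terminal
        || attempt >= MAX_PREDISPATCH_ATTEMPTS
}
```

With this shape the HTTP 500 "Decryption failed" repro fails on the first attempt, while a 503 is retried and then fails on the third, so neither path can leave an execution parked at `command.started`.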
2026-06-10 noetl/server v3.0.3 container-callback insert matches the deployed noetl.event schema (noetl/server#173; tracks noetl/ai-meta#43). The container-callback handler emitted its resume call.done via a stale query targeting attempt + id columns and RETURNING id — none of which exist on the deployed noetl.event (PK (execution_id, event_id)). Every watcher callback POST 500'd with column "attempt" of relation "event" does not exist, blocking the #43 chain. Replaced with an inline INSERT matching the working handlers::events column set; terminal outcome rides in a chk_event_result_shape-conforming result envelope. cargo build + clippy (no new warnings) + 7 container_callback unit tests pass. Kind-val GREEN: kind_validate_container_callback.sh both probes pass (happy_path → succeeded, oom → failed_oom). Last blocker on the container-callback chain alongside ops#168 + e2e#38.
2026-06-10 noetl/gui v1.11.1 dev:kind direct-mode dev server script (noetl/gui#35): VITE_API_MODE=direct + skip-auth Vite target hitting the kind server on :8082, plus README.
2026-06-10 noetl/gui v1.11.0 Credential View/Edit recovery for pre-wallet (un-decryptable) records (noetl/gui#36; closes noetl/ai-meta#82): after the Secrets Wallet migration (#61), pre-wallet records 500 on decrypt; View now explains the cause and Edit reopens with metadata + a warning so re-saving re-seals under the current wallet.
2026-06-10 noetl/server v3.0.2 container-tool command type contradiction fix (noetl/server#172; closes noetl/ai-meta#81). ToolSpec.command was Option<String> (scalar) but the container tool kind writes a K8s-Job-style array: an array failed the server's ToolDefinition untagged-enum match (400 data did not match any variant), a scalar was rejected by the worker's ContainerConfig.command: Option<Vec<String>> (expected a sequence). Typed command as Option<serde_json::Value> (same as args): scalar stays a JSON string for shell/db tools, array passes through to the worker's Vec<String>; ToolCall::from_spec forwards the value verbatim instead of wrapping in Value::String. Worker side unchanged. 2 new regression tests (playbook::types 18/18); clippy clean. Kind-val GREEN end-to-end: server accepts command: ["/bin/sh","-c"], worker dispatches the container tool, K8s Job reaches Complete 1/1 (pre-fix kubectl get jobs empty).
2026-06-09 noetl/tools v3.1.0 e2e-sweep cleanup (noetl/tools#47; tracks noetl/ai-meta#49). YAML boolean when: true in policy rules checks as_bool() before string-template fallthrough; `
2026-06-09 noetl/server v3.0.1 e2e-sweep cleanup (noetl/server#171; tracks noetl/ai-meta#49). Result-store PUT body limit raised to 64 MB (DefaultBodyLimit; was rejecting 15 MB+ payloads with HTTP 413); render_pipeline_config stashes set/args/spec/command blocks before Tera rendering; iter namespace map in build_iteration_command; cmd_render_ctx uses command.context override. Diagnostic tracing::debug! blocks stripped. All 7 e2e sweep playbooks PASS on Rust-only kind.
2026-06-09 noetl/tools v3.0.0 BREAKING: explicit input:/set: forward-only data binding (noetl/tools#45; tracks noetl/ai-meta#77). task_sequence pipeline reads each sub-tool's input: map and injects those key-value pairs as template variables. Removes _prev/_results positional workaround entirely: data flows forward through set: → input:, never backward. Semver major bump (2.24.x → 3.0.0).
2026-06-09 noetl/server v3.0.0 BREAKING: render_pipeline_config replaces render_value_deferring (noetl/server#169; tracks noetl/ai-meta#77). The server no longer defers _prev/_results templates since those constructs are removed. Companion to noetl-tools v3.0.0. Semver major bump (2.63.x → 3.0.0).
2026-06-09 noetl/cli v4.10.0 Dep bump: noetl-tools 3.0.0 + noetl-executor 0.5.0 (noetl/cli#57; tracks noetl/ai-meta#77). Adopts the breaking input:/set: pipeline changes via the updated executor crate. noetl-executor 0.5.0 published to crates.io.
2026-06-08 noetl/server v2.63.0 ctx/workload namespace shims + step-level set: mutation support (noetl/server#168). with_ctx_shims() helper at 7 orchestrator call sites; set_vars field on Step struct (#[serde(default, rename = "set")]); orchestrator applies step-level set: mutations after each completed step. Kind-validated: 25/27 testable playbooks PASS. Subsequent e2e fixture fixes (noetl/e2e#34) bring the remaining 2 fixtures to PASS via the _prev deferred template pattern.
2026-06-08 noetl/server v2.62.1 Clippy cleanup: resolve all pre-existing warnings under -D warnings gate (noetl/server#167; closes server#161). 14 clippy categories resolved across 17 files: Box::new wrapping for large enum variants, u32 type mismatches in health.rs, too_many_arguments allow, and other mechanical lint fixes. Zero behavioral changes; PATCH bump only.
2026-06-08 noetl/tools v2.24.2 Clippy cleanup: resolve all 15 warnings under -D warnings gate (noetl/tools#44; closes tools#42). Mechanical lint fixes across 7 files: unused bindings, dead code suppression, identical if/else simplification, doc indent, .clamp() over .min().max(), const assertion blocks. 288 lib tests pass; zero behavioral changes.
2026-06-08 noetl/server v2.62.0 Sequential-mode iterator dispatch (noetl/server#166; closes noetl/ai-meta#76). First Claude-direct Rust PR under agents/rules/handoff-routing.md. LoopMode enum (Sequential default / Parallel); LoopSpec.mode parsed from loop.spec.mode YAML. StepInfo.iterations_dispatched tracks command.issued count for the sequential dispatch guard. Sequential pattern: dispatch iteration 0 at fan-out; on each command.completed, dispatch next if iterations_dispatched == iterations_completed(). Default is Sequential, so existing playbooks without explicit spec.mode get sequential behavior. 3 new tests; lib pass; clippy clean. Kind-val GREEN: test/loop COMPLETED + iterator_save_test COMPLETED.
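The sequential dispatch guard from the v2.62.0 entry can be illustrated with a small state machine. This is a sketch under assumptions: the real StepInfo exposes iterations_completed() as a method, while here it is a plain field, and `total_iterations`, `fan_out`, and `on_completed` are hypothetical names.

```rust
/// Minimal stand-in for the per-step counters named in the entry.
#[derive(Debug, Default)]
pub struct StepInfo {
    pub total_iterations: usize,
    pub iterations_dispatched: usize,
    pub iterations_completed: usize,
}

impl StepInfo {
    /// Fan-out in sequential mode: only iteration 0 is issued up front.
    pub fn fan_out(total_iterations: usize) -> (Self, Option<usize>) {
        let mut info = StepInfo { total_iterations, ..Default::default() };
        let first = info.dispatch_next();
        (info, first)
    }

    /// Record one `command.completed`; return the next index to dispatch, if any.
    pub fn on_completed(&mut self) -> Option<usize> {
        self.iterations_completed += 1;
        self.dispatch_next()
    }

    /// The guard: dispatch only when nothing is in flight and work remains.
    fn dispatch_next(&mut self) -> Option<usize> {
        if self.iterations_dispatched == self.iterations_completed
            && self.iterations_dispatched < self.total_iterations
        {
            let index = self.iterations_dispatched;
            self.iterations_dispatched += 1;
            Some(index)
        } else {
            None
        }
    }
}
```

With three iterations, the sketch issues index 0 at fan-out and then exactly one new index per completion, so at most one iteration is in flight at a time.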
2026-06-08 noetl/tools v2.24.1 PlaybookTool polling terminates on status: COMPLETED/FAILED/CANCELLED (noetl/tools#43; closes noetl/ai-meta#75). Seventh + final Codex Rust handoff of the day (the user codified agents/rules/handoff-routing.md mid-session to stop the pattern). The polling loop in PlaybookTool::execute read payload.completed / payload.failed as booleans, but /api/executions/{id}/status returns payload.status as a string with values COMPLETED / FAILED / CANCELLED / RUNNING plus a separate is_cancelled: bool. Both boolean lookups always returned false, so child playbooks dispatched with return_step: end timed out at 300s instead of returning their actual result. Fix: extract PlaybookTool::is_terminal_status(payload) helper, replace boolean lookups with string-status + is_cancelled check. Helper extraction enables sync unit-testing without HTTP mocks. Single file touched (src/tools/playbook.rs); 132 lines added / 9 removed. 7 new unit tests; lib 288 passed / 0 failed (was 281/0); release build + clippy clean. PATCH bump (fix: prefix). Kind re-val PARTIAL GREEN: built localhost/noetl-worker-rust:v5.15.0-tools2241 via podman, loaded into kind, rolled deploy. Re-ran playbook_composition.yaml (kind exec 322425980322844672): 2 of 4 process_users iterations completed end-to-end with save_profile.status: COMPLETED from child playbooks (was: zero completions; {status: "timeout"} pre-fix). 4 child executions visible in noetl.event with parent_execution_id pointing here. Remaining 2 iterations stuck on a separate concern (sequential-mode iterator dispatch / worker concurrency), to be filed as a follow-up. Worker lockfile bumped via noetl/worker#65 (Claude-authored, post-rule-change). Pointer bumps: repos/tools b6b80ce (v2.24.0) → 10cc751 (v2.24.1); repos/worker 802954b → be431a5 (lockfile only, no version tag).
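A terminal-status check of the kind described in the v2.24.1 entry might look like the following. This is a hedged sketch: `StatusPayload` is a hypothetical, already-parsed view of the status response, whereas the real helper presumably inspects a JSON payload.

```rust
/// Hypothetical parsed view of the `/api/executions/{id}/status` body.
pub struct StatusPayload<'a> {
    pub status: Option<&'a str>,
    pub is_cancelled: bool,
}

/// Polling stops when the string status is terminal or the cancel flag is set.
pub fn is_terminal_status(payload: &StatusPayload<'_>) -> bool {
    payload.is_cancelled
        || matches!(payload.status, Some("COMPLETED" | "FAILED" | "CANCELLED"))
}
```

Because the check is a pure function over the payload, it can be unit-tested synchronously without HTTP mocks, which is the benefit the entry attributes to the helper extraction.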
2026-06-08 noetl/server v2.61.1 Honest in-flight check in /api/executions/{id}/status (noetl/server#165; closes noetl/ai-meta#72). Sixth Codex handoff of the day. Three surgical fixes in ExecutionService::get_status: (1) running_steps SQL filter switched to event_type IN ('command.claimed', 'command.started') AND status IN ('RUNNING', 'STARTED'): workers emit 'STARTED' (past tense), not 'RUNNING', so the old filter returned 0 even when N commands were mid-execution; (2) new in_flight_commands SQL query counts non-terminal rows in noetl.command via pool_for(execution_id) for sharding consistency; (3) COMPLETED branch now requires both stats.1 == stats.0 && stats.0 > 0 && in_flight_commands.0 == 0: either signal alone trips prematurely (event-log when iterator steps fire one step.enter for N commands; command-table when projection lags), requiring both is honest. Single file touched (src/services/execution.rs); 160 lines added / 8 removed. 4 new unit tests; lib 598 passed / 0 failed (was 594/0); release build + clippy clean. Kind-val GREEN at the status-endpoint layer: built localhost/noetl-server-rust:v2.61.1 via podman, loaded into kind, rolled deploy. Re-ran playbook_composition.yaml (kind exec 322392447093051392): status now correctly reports RUNNING, running_steps: 3 for 30+ seconds while the 4 process_users iterations are pending (was: COMPLETED, running_steps: 0 at t+2s pre-fix). Cancelled the stuck execution after verifying the fix. Underlying worker stall on task_sequence → tool: kind: playbook dispatch remains (the original reason the 4 commands never reached command.completed); will file as a separate ai-task follow-up. This PR only fixed the status endpoint's lying; commands still genuinely stall. Pointer bump: repos/server c01d3ce (v2.61.0) → 12eb2b8 (v2.61.1).
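The two-signal COMPLETED condition from the v2.61.1 entry reduces to a small predicate. In this sketch the parameter names are assumptions: `steps_entered` and `steps_finished` stand in for stats.0 and stats.1, whose exact meaning the entry does not spell out.

```rust
/// Decide whether an execution may report COMPLETED.
/// Both the event-log signal and the command-table signal must agree.
pub fn may_report_completed(
    steps_entered: i64,
    steps_finished: i64,
    in_flight_commands: i64,
) -> bool {
    // Event-log signal: at least one step entered and all of them finished.
    let event_log_done = steps_entered > 0 && steps_finished == steps_entered;
    // Command-table signal: no non-terminal command rows remain.
    let command_table_idle = in_flight_commands == 0;
    event_log_done && command_table_idle
}
```

The case the entry calls out is an iterator step that records one entered and one finished step while N commands are still running; the second signal keeps the status at RUNNING there.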
2026-06-08 noetl/server v2.61.0 Expose ctx + workload namespaces in dispatch render context (noetl/server#164; closes noetl/ai-meta#74). Fifth cross-agent handoff of the day: same Claude-dispatch + Codex-execute + ship it pattern. Wraps the render context in CommandBuilder::build_command and build_iteration_command with two new keys (ctx + workload) via entry().or_insert_with() pointing at a serde_json::Value::Object view of the original flat context. Mirrors Python's commands.py:915-916 (context["ctx"] = state.variables; context["workload"] = state.variables). The or_insert_with() guard preserves the pre-populated structured workload binding (execute.rs:453) so {{ workload.session_token }} keeps working; new ctx shim makes {{ ctx.test_var }} resolve via the post-apply_set_mutations flat keys. Command.context persists the original flat context (not the shimmed render_ctx), so event-log payloads stay compact. In build_iteration_command the shim runs AFTER iterator-var insertions so {{ ctx.<item_var> }} also resolves. Single file touched (src/engine/commands.rs); 210 lines added / 5 removed. 5 new unit tests; lib 594 passed / 0 failed (was 589/0); release build + clippy clean. Kind-val GREEN end-to-end: built localhost/noetl-server-rust:v2.61.0 via podman, loaded into kind, rolled deploy. Re-ran tests/test_args_passing (kind exec 322284179205132288): all 3 steps `command.completed
2026-06-08 noetl/server v2.60.0 Propagate arc-level set: mutations into downstream step context (noetl/server#163; closes noetl/ai-meta#73). Fourth cross-agent handoff of the day: same Claude-dispatch + Codex-execute + ship it pattern. Renames NextArc.args → set_vars with #[serde(rename = "set")] (matches Python canonical; Python's schema rejects args: on arcs as legacy). Adds apply_set_mutations free function in state.rs mirroring Python's _apply_set_mutations verbatim: ctx.foo / iter.foo / step.foo strip to bare key; bare keys + unknown-scope dotted keys kept as-is. Wires apply at two orchestrator dispatch sites (main path + skip-chain loop): render set_vars templates via TemplateRenderer::render_value against producing-step completion context, then apply_set_mutations(&mut step_context, &rendered) before issuing downstream command. 4 files touched (evaluator/orchestrator/state/types); 457 lines added / 67 removed; 8 new unit tests; lib 589 passed / 0 failed (was 581/0). Kind re-val PARTIAL: actions_test.yaml (kind exec 322277934972801024): 4 downstream steps recover (aggregate, helper, verify, end all success; were error pre-fix). test_args_passing.yaml (kind exec 322277934767280128): use_vars still errors because the downstream Jinja template {{ ctx.test_var }} looks up a ctx namespace value, but the dispatch render context doesn't expose ctx as a key (Python handles this in commands.py:915 with context["ctx"] = state.variables). Filed as noetl/ai-meta#74, a small follow-up to wrap dispatch render context with {ctx, iter, step} namespace shims. The infrastructure shipped by this PR is necessary; #74 completes the chain. Pointer bump: repos/server 59b743c (v2.59.0) → 084cad4 (v2.60.0).
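The key-stripping rule of apply_set_mutations in the v2.60.0 entry can be sketched with plain string maps. This is an illustration only: the real context presumably holds JSON values, `target_key` is a hypothetical helper name, and stripping only the first segment of a multi-dot key is an assumption.

```rust
use std::collections::BTreeMap;

/// Scopes whose prefix is stripped, per the release entry.
const KNOWN_SCOPES: [&str; 3] = ["ctx", "iter", "step"];

/// Strip a known scope prefix from a `set:` key; leave everything else alone.
pub fn target_key(key: &str) -> &str {
    match key.split_once('.') {
        Some((scope, rest)) if KNOWN_SCOPES.contains(&scope) && !rest.is_empty() => rest,
        // Bare keys and unknown-scope dotted keys are kept as-is.
        _ => key,
    }
}

/// Apply rendered `set:` values onto the downstream step context.
/// Values are plain strings here to keep the sketch std-only.
pub fn apply_set_mutations(
    context: &mut BTreeMap<String, String>,
    rendered: &BTreeMap<String, String>,
) {
    for (key, value) in rendered {
        context.insert(target_key(key).to_string(), value.clone());
    }
}
```

For example, a rendered `ctx.test_var` lands in the context under the flat key `test_var`, which is why a separate namespace shim is needed for templates that read `{{ ctx.test_var }}`.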
2026-06-08 noetl/server v2.59.0 Fan out start step when it has a loop: block (noetl/server#162; Refs noetl/ai-meta#73 gap 1). Third cross-agent handoff of the day: Claude authored the prompt at handoffs/active/2026-06-08-server-initial-iterator-fanout/round-01-prompt.md; Codex executed Phases A+B unattended; Claude pushed + opened the PR after ship it user gate. generate_initial_commands previously always called build_command() once for the start step, regardless of whether that step declared a loop: block; playbooks whose start step iterated over a collection (e.g. loop_test.yaml with numbers: [1,2,3,4,5]) therefore dispatched a single command with args: {} and the worker tool's input_data.get('num') returned None. This release mirrors the existing Phase D R3b orchestrator fan-out into the /api/execute initial-command path: if start_step.loop is Some, render loop_cfg.in_expr via TemplateRenderer::render_to_value → require a JSON array (non-array returns AppError::Validation with the typed message) → for each (index, item) build an IteratorMetadata + call build_iteration_command + persist_engine_command. Single file touched (src/handlers/execute.rs); 282 lines added / 8 removed. 3 new unit tests (test_generate_initial_commands_fans_out_when_start_has_loop, _single_command_when_no_loop, _rejects_non_array_loop_in); lib 581 passed / 0 failed (was 578/0). Kind-val GREEN: built localhost/noetl-server-rust:v2.59.0 via podman, loaded into kind, rolled deploy. Re-ran tests/test/loop (kind exec 322268172092706816): orchestrator dispatched 5 iterator-bound start commands (was 1 empty-args pre-fix); all 11 command.completed events carry status: success; `playbook.completed
2026-06-08 noetl/tools v2.24.0 Python wrapper contract: inject input_data global + support top-level return X (noetl/tools#41; closes noetl/ai-meta#71). Second cross-agent handoff of the day: Claude authored the prompt at handoffs/active/2026-06-08-tools-python-wrapper-contract/round-01-prompt.md; Codex executed Phases A+B unattended; Claude pushed + opened the PR after ship it user gate. Adds two missing pieces of the legacy noetl/noetl/tools/python/executor.py contract that fixtures expect: (1) input_data = dict(args) injected as a global so step bodies can call input_data.get('foo'), (2) wrap_top_level_return() helper detects unindented return before any def/async def/class at column 0 and wraps the user code in an implicit def __noetl_step__(args, input_data, **kw): + result = __noetl_step__(args, input_data) line so the wrapper captures the return value as the result global. Three concurrent code styles now coexist: A (result = X, current shape; unchanged), B (def main() convention, noetl/ai-meta#65; unchanged), C (top-level return X; NEW). async def main() regression caught + fixed during development by the existing test_python_async_main_function test. 7 new unit tests (test_wrap_top_level_return_noop_when_no_return + _inside_def + _inside_async_def + test_input_data_global_is_injected + test_top_level_return_wraps_user_code + _with_no_input_data + test_main_function_convention_still_works_with_input_data_global); lib 281 passed / 0 failed (was 274/0); release build clean; zero new clippy errors in python.rs. Kind re-val MIXED: built localhost/noetl-worker-rust:v5.15.0-tools224 via podman, loaded into kind, rolled deploy. Re-ran the three failing #71 fixtures (loop_test, actions_test, test_args_passing). Wrapper layer GREEN: Style C wrap visible in tracebacks (result = __noetl_step__(args, input_data)); pre-#71 fingerprints (SyntaxError: 'return' outside function + NameError: input_data) gone. 
Orchestrator layer surfaced a new distinct gap: loop_test's start step gets dispatched with args: {} (iterator's num not bound); test_args_passing's use_vars gets tool_config.args: {test_var: null, computed: null} (next.set value not propagated). Filed as noetl/ai-meta#73, an orchestrator-side template-resolution gap, separate from this PR's scope. Surfaced follow-up: noetl/tools#42: 12 pre-existing clippy errors (mcp.rs / nats.rs / snowflake.rs / result_fetch.rs) blocking the -D warnings CI gate. Pointer bump: repos/tools 7d3fcfd (v2.23.1) → b6b80ce (v2.24.0); repos/worker bump pending noetl/worker#64 merge.
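The Style C wrapping from the v2.24.0 entry can be approximated as a line-based transform. This is a simplified sketch, not the shipped wrap_top_level_return(): `has_top_level_return` is a hypothetical helper, only `return` and `return <expr>` at column 0 are detected, and re-indenting would alter multi-line string literals, which the sketch ignores.

```rust
/// True when an unindented `return` appears before any column-0
/// `def` / `async def` / `class`.
pub fn has_top_level_return(code: &str) -> bool {
    for line in code.lines() {
        if line.starts_with("def ")
            || line.starts_with("async def ")
            || line.starts_with("class ")
        {
            return false;
        }
        if line == "return" || line.starts_with("return ") {
            return true;
        }
    }
    false
}

/// Wrap user code in an implicit function so a top-level `return X`
/// becomes the `result` global; other code is returned unchanged.
pub fn wrap_top_level_return(code: &str) -> String {
    if !has_top_level_return(code) {
        return code.to_string();
    }
    let mut out = String::from("def __noetl_step__(args, input_data, **kw):\n");
    for line in code.lines() {
        if !line.is_empty() {
            out.push_str("    ");
            out.push_str(line);
        }
        out.push('\n');
    }
    out.push_str("result = __noetl_step__(args, input_data)\n");
    out
}
```

Styles A and B pass through untouched in this sketch, because neither contains a `return` at column 0 ahead of a definition.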
2026-06-08 noetl/server v2.58.0 Port PUT /api/result/<eid> + GET /api/result/resolve from Python: durable result-store endpoints (noetl/server#160; closes noetl/ai-meta#70). First cross-agent handoff of the umbrella: Claude (dispatcher) authored the prompt at handoffs/active/2026-06-08-server-result-store-endpoint/round-01-prompt.md; Codex (executor) ran Phases A+B unattended on feat/result-store-put-resolve-endpoints (4 new files + 4 modified: src/db/queries/result_store.rs + src/services/result_store.rs + src/handlers/result_store.rs + src/metrics.rs + 3 module index files + src/main.rs startup wiring; 578 lib tests pass +10 new result_store unit tests; release build clean); Claude pushed + opened the PR after ship it user gate. MVP scope: startup-time CREATE TABLE IF NOT EXISTS noetl.result_store (mirrors secret_audit pattern from #61 Phase 7b.2); cluster-wide pool (shard-aware migration deferred to follow-up); DELETE/list/GC/TTL/scoping/Arrow Flight gRPC fast path all explicitly out-of-MVP-scope. Wire shape exact match to worker's ControlPlaneClient::put_result and Python ResultPutRequest/ResultPutResponse (verified field-by-field). 4 new metrics: result_store_put_total{status} + result_store_put_duration_seconds{status} + result_store_resolve_total{status} + result_store_resolve_duration_seconds{status}. Kind-val GREEN: built localhost/noetl-server-rust:v2.58.0 via podman, loaded into kind, rolled deployment (uptime reset, version reports 2.58.0). Re-ran tests/output_select_test (kind exec 322232289352224768): COMPLETED with test_result: PASSED + output_select_worked: true (was hanging at step 3 with Invalid artifact config: invalid type: null pre-fix). Worker logs confirm the SUCCESS branch fires: Tool result exceeds inline budget; staged in durable result store + shared-memory cache. result_ref=noetl://execution/322232289352224768/result/start/...; zero put_result HTTP 404 lines in this execution. 
Server logs confirm result_store.put: stored + result_store.resolve: found flowing end-to-end. Surfaced follow-ups: noetl/server#161: 14 pre-existing clippy errors that ARE NOT in result_store code but will block PR-160's -D warnings CI gate (didn't block the merge but worth cleaning up before the next PR). Pointer bump: repos/server f7ae136 (v2.57.2) → 7dab231 (v2.58.0).
2026-06-08 noetl/worker v5.15.0 call.done embeds inline context.data._ref on over-budget durable-success branch (noetl/worker#63; closes noetl/ai-meta#69). Single-commit MINOR bump. When a step's tool result exceeds INLINE_CONTEXT_MAX_BYTES (64 KB) and the durable result_store PUT succeeds, build_call_done_result now emits {status, context: { data: { _ref: <noetl://...> } }, reference: {kind: result_ref, ...}}, adding the inline context.data._ref block that the orchestrator's extract_user_data walks under the v10 nested-envelope shape (outer.context.result.context.data). Downstream {{ step._ref }} resolves to the URI string; consuming artifact / result_fetch tools dispatch the URI-based fetch for the full data. The full payload stays out-of-band in result_store + shm cache as before. Status-only fallback (durable + shm both failed) keeps the legacy behaviour: no URI to embed. 4 existing tests updated to assert the new inline shape; cargo test --lib 126/0/0; release build clean. Note: kind re-val of consuming fixtures (test_output_select.yaml, test_storage_tiers.yaml) is blocked on noetl/ai-meta#70: the Rust noetl-server doesn't expose PUT /api/result/<eid> (server-side parity gap), so the worker's durable PUT 404s and lands on the degraded shm-only branch where there's no noetl:// URI to embed. Once #70 lands, kind-val GREEN follows.
2026-06-08 noetl/tools v2.23.1 artifact tool config accepts args: as alias for input: (noetl/tools#40; closes noetl/ai-meta#68). PATCH bump on top of v2.23.0. One-line #[serde(alias = "args")] on ArtifactConfig.input resolves the friction between noetl-server's ToolSpec field-name normalization (YAML input: → struct args via noetl/ai-meta#56) and the artifact tool's Python-parity expectation that the config field is named input. Both Python-fixture-native (input:) and server-normalized (args:) shapes deserialize. 9 unit tests on artifact module (was 8, +1); 273/0/0 lib total. Released + published to crates.io 2026-06-08T02:48:09Z.
2026-06-08 noetl/server v2.57.2 Rust orchestrator exclusive-routing fix: step.skipped for untaken siblings (noetl/server#159; closes noetl/ai-meta#67). Single-commit patch on top of v2.57.1. Under mode: exclusive routing, only one arc fires; pre-fix, the static planner declared the untaken sibling's target as an upstream of any downstream merge step (build_incoming_arcs), then the R4 fan-in barrier waited for that untaken sibling to reach a terminal state forever. comprehensive_test.yaml hung indefinitely after process_high.command.completed. Three-part fix in repos/server: (1) evaluator::evaluate_next_transitions stops break-ing on exclusive-mode match, surfacing every remaining sibling (and any arc whose when evaluated false in inclusive mode) as EvaluationResult { matched: false, next_step: Some(name) } via new not_matched_with_target helper; (2) orchestrator::process_in_progress two-pass refactor: pass 1 emits step.skipped for ALL unmatched arc targets across ALL completed steps BEFORE any barrier check runs (eliminates HashMap iteration-order non-determinism); pass 2 dispatches matched targets, with the R4 barrier consulting the in-pass step.skipped set so the downstream merge target dispatches in the same orchestrator pass; (3) new unit tests test_67_exclusive_routing_emits_step_skipped_for_unmatched_siblings (orchestrator layer pin) + test_jinja_conditional_short_circuits_on_undefined_else_branch (template regression guard). Server lib 568/0/0 (was 566). Kind-val GREEN: built localhost/noetl-server-rust:v2.57.2, loaded into kind, re-ran the comprehensive_test.yaml fixture (kind execution 322196926424420352): COMPLETED in ~4s with the exact fix-shape event trace (start → step.skipped for process_low → process_high → summarize → end → playbook.completed).
2026-06-07 noetl/server v2.57.1 Rust orchestrator {{ step.data }} template accessor fix (noetl/server#158; closes noetl/ai-meta#66). Single-commit patch on top of v2.57.0. WorkflowState::build_context in repos/server/src/engine/state.rs now injects a self-referencing .data key on the extracted user_data dict so canonical v10 fixtures can use both {{ step.field }} (flat) and {{ step.data.field }} (wrapped) interchangeably. Guarded by !map.contains_key("data") to preserve the task_sequence flatten back-compat (a labeled sub-task's data field stays addressable as both <step>.<label>.data.x AND <step>.data.x). Root cause: single-tool python steps (e.g. tool: python returning a flat dict) skip the task_sequence wrapping that coincidentally populates .data from a labeled sub-task's data field, so {{ step.data }} resolved to None. Surfaced concretely in the previous session's #65 kind-val: kind execution 322087210360770560's verify step's {{ run_from_file.data }} assertion failed even though the upstream call.done data was correct. 2 new unit tests (live-envelope reproducer + task_sequence flatten back-compat). Server lib 566/0/0 (was 564/0/0); cargo build --release clean.
2026-06-07 noetl/tools v2.23.0 python: legacy main() function convention (noetl/tools#39; tracks noetl/ai-meta#65). When the user code doesn't set a non-None result AND defines a callable main, the python wrapper introspects main's signature, binds named params from args by name, forwards **kwargs, awaits async main via asyncio.run, and assigns the return value to result. Mirrors noetl/tools/python/executor.py::_invoke_main from the legacy Python tool � needed by canonical script_execution/* fixtures whose scripts define main(...) callables rather than setting the result global. An explicit result = {...} always wins (the convention only kicks in when result is unset/None). 4 new unit tests (sync main(name, count), explicit-result-wins, async main, main(**kwargs)). Lib 272/0/0.
2026-06-07 noetl/tools v2.22.0 python: external script loaders for file/gcs/http source types (noetl/tools#38; tracks noetl/ai-meta#65). Closes the loader half of the python script: block that v2.19.3 stubbed with a "not yet supported" error. Three external source.type values now resolve the script body � file (read script.uri via tokio::fs::read_to_string), gcs (download gs://bucket/object via the GCS JSON API alt=media + GCP ADC bearer token in devstorage.read_only scope; uses workload identity on GKE / GOOGLE_APPLICATION_CREDENTIALS locally), http (GET source.endpoint falling back to script.uri, honoring source.method (default GET) + source.timeout (default 30s)). New PythonSource enum + resolve_source() pure classification (no I/O); new async load_script_code() on PythonTool dispatches by source kind. PythonTool gains a reqwest::Client + GcpAuth. parse_gcs_uri + encode_gcs_object helpers (percent-encode object key so slashes land in a single JSON-API path segment). GCS uses ADC directly per agents/rules/execution-model.md "already-in-place trust" rule (rather than re-mediating through the keychain). source.auth accepted for forward-compat + YAML parity. 15 new unit tests covering classification, fallback chains, parse/encode, file-load read + missing-path error. Lib 268/0/0.
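The gcs loader's URI handling in the v2.22.0 entry can be sketched with std only. The function names parse_gcs_uri and encode_gcs_object come from the entry, but their signatures here are assumptions, and `media_url` is a hypothetical helper that assembles the GCS JSON API download URL.

```rust
/// Split `gs://bucket/object` into (bucket, object). Returns None when the
/// scheme, bucket, or object part is missing.
pub fn parse_gcs_uri(uri: &str) -> Option<(&str, &str)> {
    let rest = uri.strip_prefix("gs://")?;
    let (bucket, object) = rest.split_once('/')?;
    if bucket.is_empty() || object.is_empty() {
        return None;
    }
    Some((bucket, object))
}

/// Percent-encode every byte outside the RFC 3986 unreserved set, so `/`
/// becomes `%2F` and the object key stays a single path segment.
pub fn encode_gcs_object(object: &str) -> String {
    let mut out = String::with_capacity(object.len());
    for byte in object.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Media-download URL in the GCS JSON API shape (`alt=media`).
pub fn media_url(bucket: &str, object: &str) -> String {
    format!(
        "https://storage.googleapis.com/storage/v1/b/{}/o/{}?alt=media",
        bucket,
        encode_gcs_object(object)
    )
}
```

Authentication (the ADC bearer token) and the HTTP request itself are left out; the sketch only shows why slashes in the object key must be encoded before they reach the URL path.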
2026-06-07 noetl/server v2.57.0 Phase D R5 R7: cross-server parity harness (noetl/server#157; closes server#148, the Phase D R5 umbrella; tracks noetl/ai-meta#49 Phase D R5). Final slice of Phase D R5: the Replay engine port is complete. All seven rounds shipped today (v2.51.0 → v2.57.0). This PR adds a hermetic parity test rig: tests/parity_harness/events.json (13 synthetic events exercising all six replay projections plus payload refs) + tests/parity_harness/expected.json (Python's structured fold output, pre-recorded) + tests/parity_harness/regenerate_expected.py (standalone Python 3.10+ script that is a verbatim extract of noetl/server/api/replay/service.py fold + helpers; no noetl-package imports; dodges the transitive-dep chain that blocked direct noetl-package import). tests/parity_harness.rs 8-test integration suite asserts structural parity field-by-field: top-level counts, execution status + last_node_name, execution.payload_refs, all six projection maps with per-key status / counters / payload refs / attributes. All 8 pass. Parity contract is structural, not byte-for-byte hex on checksum.value: Python and Rust hash different digest inputs (Python normalizes to flat rows; Rust hashes the typed state directly per R4's design); both deliver determinism but produce different hex. No kind-val required: test-only PR with no runtime changes (release-please rolling-MINOR bump from the feat(replay): commit prefix). The Replay engine port (Python's ~1236-LoC noetl/server/api/replay/service.py) is now ported to Rust with structural-parity unit-test coverage (lib 564/0/0) + 8-test cross-server parity harness.
2026-06-07 noetl/server v2.56.0 Phase D R5 R6: payload resolver (noetl/server#156; Refs server#148; tracks noetl/ai-meta#49 Phase D R5). Sixth slice of Phase D R5: every event's result.reference JSON gets parsed into a typed PayloadSummary and appended to the relevant projection's payload-refs list. Mirrors Python's _payload_ref / _payload_summary / per-projection payload_refs population in noetl/server/api/replay/service.py. PayloadSummary struct = {sha256, schema_digest, row_count, media_type, ref}, all Option<…> + skip_serializing_if; the ref field is serde(rename = "ref") over Rust reference_uri. PayloadRefEntry struct = {event_id, reference, summary} per Python's dict shape. ReplayEventRow.result: Option<serde_json::Value> (the noetl.event.result jsonb column); SQL queries updated across all three load_events variants. ReplayExecutionState.payload_refs: Vec<PayloadRefEntry> appended in event_id order from every event with result.reference. ReplayFrameState.output_ref + output_ref_summary populated on frame.committed / frame.failed; summary is Some(default) (all-None fields) when the terminal event has no reference (mirrors Python's _payload_summary(None)). ReplayBusinessObjectState.payload_refs + last_payload_ref: every event touching the BO with a result.reference appended; last_payload_ref points at the most recent. extract_payload_ref(event) mirrors Python's _payload_ref (returns None when result absent, no reference key, or reference null). payload_summary(reference) mirrors Python's _payload_summary three-tier fallback (reference.<field> → rows_ref.meta.<field> → rows_ref.ipc.<field>; sha256 falls back to reference.digest; ref falls back to reference.uri). 15 new unit tests; server lib 564/0/0 (was 549/0/0). Kind-validated: built localhost/noetl-server-rust:v2.56.0, loaded into kind, rolled deployment. Re-probe of fanout_reduce execution 322023958058635264 returns identical shape as v2.55.0 (no result.reference events in this fixture). 
A second execution (640422512395813188) confirms the live resolver path: execution.payload_refs populates with 3 entries, each carrying a real SHA-256 hex digest (d0de6b8de78fd04b2e752a96ebef12df4a9b32e92565b3f6e55860ae12762133) extracted by the resolver; row_count + ref are null because those source events' result.reference JSON didn't carry those fields (fallback chain returned None as documented).
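The three-tier fallback described for payload_summary can be sketched over flat string maps. Several things here are assumptions made for illustration: the `Reference` layout (where rows_ref sits relative to the reference), values held as strings rather than JSON, and applying the digest / uri fallbacks only after the three tiers miss.

```rust
use std::collections::BTreeMap;

type Fields = BTreeMap<String, String>;

/// Hypothetical flattened view of one `result.reference` object.
#[derive(Debug, Default)]
pub struct Reference {
    pub top: Fields,
    pub rows_ref_meta: Fields,
    pub rows_ref_ipc: Fields,
}

/// Field set named in the entry; every field is optional.
#[derive(Debug, Default, PartialEq)]
pub struct PayloadSummary {
    pub sha256: Option<String>,
    pub schema_digest: Option<String>,
    pub row_count: Option<String>,
    pub media_type: Option<String>,
    pub reference_uri: Option<String>,
}

/// reference.<field>, then rows_ref.meta.<field>, then rows_ref.ipc.<field>.
fn lookup(reference: &Reference, field: &str) -> Option<String> {
    reference
        .top
        .get(field)
        .or_else(|| reference.rows_ref_meta.get(field))
        .or_else(|| reference.rows_ref_ipc.get(field))
        .cloned()
}

/// A missing reference yields the all-None summary.
pub fn payload_summary(reference: Option<&Reference>) -> PayloadSummary {
    let reference = match reference {
        Some(r) => r,
        None => return PayloadSummary::default(),
    };
    PayloadSummary {
        sha256: lookup(reference, "sha256").or_else(|| reference.top.get("digest").cloned()),
        schema_digest: lookup(reference, "schema_digest"),
        row_count: lookup(reference, "row_count"),
        media_type: lookup(reference, "media_type"),
        reference_uri: lookup(reference, "ref").or_else(|| reference.top.get("uri").cloned()),
    }
}
```

A reference that carries only a digest, as in the second kind execution described above, would produce a populated sha256 with the remaining fields left as None.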
2026-06-07 noetl/server v2.55.0 Phase D R5 R5: snapshot seed + base_state + upcaster digest (noetl/server#155; Refs server#148; tracks noetl/ai-meta#49 Phase D R5). Fifth slice of Phase D R5: the replay fold can now start from a prior fold's output and continue from there rather than always re-folding from event 1. Mirrors Python's base_state + snapshot_seed + upcaster_registry_digest parameters on fold_replay_state in noetl/server/api/replay/service.py. ReplaySnapshotSeed struct mirrors Python's frozen dataclass (aggregate_id + aggregate_type + version: i64 + checksum: Checksum + state: ReplayState + meta: Map); ReplaySnapshotInfo is the output-side subset (omits the seed's full state because it already went into base_state); ReplayFoldOptions struct (Default impl) carries the three optional inputs. New ReplayState fields: upcaster_registry_digest: Option<String> + replay_snapshot: Option<ReplaySnapshotInfo>; both skip_serializing_if = "Option::is_none" so default-options folds produce the exact same JSON as R1-R4 (wire-shape back-compat preserved). fold_replay_state_with_options new entry point; existing 5-arg fold_replay_state becomes a thin back-compat shim passing ReplayFoldOptions::default(). Continuation semantics: base_state strips its checksum + projection_checksums (they recompute at the end); counters (event_count, last_event_id, …) continue from where base left off; caller's tenant_id / organization_id / execution_id override whatever base recorded; caller's upcaster_registry_digest wins over base's, but None from caller preserves base's value. Snapshot-storage backend (where to LOAD snapshots from) is a downstream sub-issue tracked under server#148; R5 is the data-contract round, not the storage round. 8 new unit tests; server lib 549/0/0 (was 541/0/0). Kind-validated: built localhost/noetl-server-rust:v2.55.0, loaded into kind, rolled deployment. 
Re-probe of the prior fanout_reduce execution 322023958058635264 returns identical JSON shape as v2.54.0 (event_count=25, status=COMPLETED, commands_n=4, all six projection_checksums entries populated); new replay_snapshot + upcaster_registry_digest keys absent from JSON output (Option::is_none skip working correctly). Snapshot-seeded behaviour covered by unit-test layer (no snapshot store in kind yet).
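The continuation semantics listed for v2.55.0 can be condensed into one seeding function. This is a sketch only: `FoldSeed` and `seed_from_base` are hypothetical, the struct carries just a few representative fields, and the checksum is modelled as a plain string.

```rust
/// Hypothetical, reduced view of the state a continued fold starts from.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct FoldSeed {
    pub event_count: u64,
    pub execution_id: Option<String>,
    pub upcaster_registry_digest: Option<String>,
    pub checksum: Option<String>,
}

/// Build the starting state for a continued fold from a prior fold's output.
pub fn seed_from_base(
    base: FoldSeed,
    caller_execution_id: String,
    caller_digest: Option<String>,
) -> FoldSeed {
    FoldSeed {
        // Counters continue from where the base left off.
        event_count: base.event_count,
        // The caller's identity overrides whatever the base recorded.
        execution_id: Some(caller_execution_id),
        // The caller's digest wins; None from the caller keeps the base's value.
        upcaster_registry_digest: caller_digest.or(base.upcaster_registry_digest),
        // Checksums are stripped here and recomputed at the end of the fold.
        checksum: None,
    }
}
```

`Option::or` expresses the digest rule directly: the caller's value is taken when present, and the base's value is used only as the fallback.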
2026-06-07 noetl/server v2.54.0 Phase D R5 R4: typed Checksum + projection_checksums (noetl/server#154; Refs server#148; tracks noetl/ai-meta#49 Phase D R5). Fourth slice of Phase D R5: every replay fold now produces a typed Checksum over the full state plus a 6-entry projection_checksums map covering every per-projection slot. Per user direction: the hash function is the type of the checksum, not a sibling field; future checksum types (BLAKE3, SHA-512, …) slot in via the enum without a wire-format break. ChecksumType enum with initial variant Sha256 (serializes lowercase snake_case "sha256" matching Python's state["checksum_algorithm"] wire form); Checksum struct = { type: ChecksumType, value: String /* hex */ }; ReplayState.checksum: Option<Checksum> + ReplayState.projection_checksums: BTreeMap<String, Checksum> with six entries on every fold (execution, stage, frame, command, business_object, loop). stable_json_bytes helper encodes values as deterministic JSON (sorted keys recursively + compact separators), matching Python's json.dumps(sort_keys=True, separators=(",", ":")) byte form. compute_checksums runs once at the end of fold_replay_state: per-projection SHA-256 over each typed sub-state, then top-level SHA-256 over the full state with projection_checksums populated and checksum field still None (skip_serializing_if handles the self-reference, so the digest doesn't depend on itself). Design decision: the digest input is the typed Rust state directly, not Python's normalize_replayed_<projection>_projection flat-row layer. Reasons: typed BTreeMap ordering + stable_json_bytes sorted-key recursion deliver the same determinism guarantee; the Rust state IS the source of truth for the server's view; cross-Python byte-for-byte parity isn't an R4 requirement; that's R7's "cross-server parity harness" round (additive work, doesn't touch the R4 wire shape). 9 new unit tests; server lib 541/0/0 (was 532/0/0). 
Kind-validated: built localhost/noetl-server-rust:v2.54.0, loaded into kind, rolled deployment. Re-probe of the prior fanout_reduce execution 322023958058635264 returns checksum: {type: "sha256", value: "41265876487f32350fc60c5039358456ded76598b99e7a0833ac4a17ceaae426"} and projection_checksums with all six entries; sample command entry hex 58d8220005758b7f18e27d9042b3ef5fa8ca86471c9d2ea33a869fd0db31231b. Same event log → same hex (verified by deterministic-runs unit test); every projection's hex differs from every other (distinct identity per sub-state input).
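The deterministic-encoding idea behind stable_json_bytes can be sketched with the standard library alone. This is a minimal illustration, not the server's code: the `Json` enum is a hypothetical stand-in for whatever JSON value type the server uses, the hashing step is omitted, and non-ASCII characters are emitted raw (Python's default `json.dumps` would escape them), so byte parity with Python is not claimed here.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal JSON value, used only for this illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Int(i64),
    Str(String),
    Array(Vec<Json>),
    // BTreeMap iterates in key order, so "sorted keys" holds at every nesting level.
    Object(BTreeMap<String, Json>),
}

/// Write a JSON string literal with the basic escapes.
fn write_str(s: &str, out: &mut String) {
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
}

/// Recursive encoder using compact separators ("," and ":") and no whitespace.
fn write_json(v: &Json, out: &mut String) {
    match v {
        Json::Null => out.push_str("null"),
        Json::Bool(b) => out.push_str(if *b { "true" } else { "false" }),
        Json::Int(n) => out.push_str(&n.to_string()),
        Json::Str(s) => write_str(s, out),
        Json::Array(items) => {
            out.push('[');
            for (i, item) in items.iter().enumerate() {
                if i > 0 {
                    out.push(',');
                }
                write_json(item, out);
            }
            out.push(']');
        }
        Json::Object(map) => {
            out.push('{');
            for (i, (key, val)) in map.iter().enumerate() {
                if i > 0 {
                    out.push(',');
                }
                write_str(key, out);
                out.push(':');
                write_json(val, out);
            }
            out.push('}');
        }
    }
}

/// Same input value, same bytes, regardless of insertion order.
pub fn stable_json_bytes(v: &Json) -> Vec<u8> {
    let mut out = String::new();
    write_json(v, &mut out);
    out.into_bytes()
}
```

Feeding these bytes to a SHA-256 implementation would then give a digest that depends only on the state's content, which is the property the deterministic-runs unit test checks.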
2026-06-07 noetl/server v2.53.0 Phase D R5 R3: loops + business_objects projections (noetl/server#153; Refs server#148; tracks noetl/ai-meta#49 Phase D R5). Third slice of Phase D R5: the replay fold now populates the last two per-projection maps from the event stream, loops and business_objects. Mirrors Python's state["loops"] / state["business_objects"] per-projection dicts from noetl/server/api/replay/service.py. Two new typed state structs (ReplayLoopState, ReplayBusinessObjectState) replace R2's serde_json::Map placeholders; ReplayState.{loops,business_objects} flip to BTreeMap for deterministic key ordering (matters when R4 lands the typed Checksum + projection_checksums bundle). Two new ID extractors (extract_loop_id, extract_business_object_identity) mirror Python's _loop_id / _business_object_identity resolution order: loop id reads meta.loop_id / meta.loop_event_id / meta.__loop_epoch_id; business-object identity reads `meta.business_object.{object_type
2026-06-07 noetl/server v2.52.0 Phase D R5 R2: stages + frames + commands projections (noetl/server#152; Refs server#148; tracks noetl/ai-meta#49 Phase D R5). Second slice of Phase D R5: the replay fold now populates stages + frames + commands projections from the event stream. Mirrors Python's state["stages"] / state["frames"] / state["commands"] per-projection dicts from noetl/server/api/replay/service.py. ReplayEventRow extended with stage_id / frame_id / command_id / worker_id / aggregate_type / aggregate_id / meta columns (all #[sqlx(default)] for back-compat); SQL queries updated across all three load_events variants; R1's AT TIME ZONE 'UTC' AS created_at cast preserved. Three new typed state structs (ReplayStageState / ReplayFrameState / ReplayCommandState) replace R1's serde_json::Map placeholders. ReplayState.{stages,frames,commands} flip from serde_json::Map to BTreeMap<String, Replay{Stage,Frame,Command}State> for deterministic key ordering (matters when R4 lands the typed Checksum + projection_checksums). Three new ID extractors (extract_stage_id / extract_frame_id / extract_command_id) mirror Python's resolution order: top-level column → aggregate_type+aggregate_id fallback → meta.<key> fallback. Three new populate functions with full status-transition coverage: stage opened → OPEN / closed → CLOSED; frame dispatched → CLAIMED / started → RUNNING / committed → COMPLETED / failed → FAILED / abandoned → ABANDONED; command full lifecycle (issued → PENDING / claimed → CLAIMED / started → RUNNING / completed → COMPLETED (or uppercased event-type suffix) / failed → FAILED / cancelled → CANCELLED); no-op when the event row carries no relevant identity. 10 new unit tests; server lib 518/0/0 (was 508/0/0). Kind-validated: built localhost/noetl-server-rust:v2.52.0, loaded into kind, rolled deployment.
Re-probe of the prior fanout_reduce execution 322023958058635264 returns commands map populated with 4 entries (one per dispatched command: start, normalize_customer, enrich_customer, reduce_customer) carrying worker_id / issued_event_id / last_event_id from the event row; stages + frames stay empty because the v10 control-flow shape doesn't emit stage.* / frame.* events (expected).
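The frame status transitions listed for v2.52.0 can be sketched as a small mapping plus a populate step. This is an illustrative sketch only: the `frame.<suffix>` event-type strings and the function names are assumptions, and the real ReplayFrameState carries more than a status.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FrameStatus {
    Claimed,
    Running,
    Completed,
    Failed,
    Abandoned,
}

/// Map an event type to the frame status it implies (event names are assumed).
pub fn frame_status_for(event_type: &str) -> Option<FrameStatus> {
    match event_type {
        "frame.dispatched" => Some(FrameStatus::Claimed),
        "frame.started" => Some(FrameStatus::Running),
        "frame.committed" => Some(FrameStatus::Completed),
        "frame.failed" => Some(FrameStatus::Failed),
        "frame.abandoned" => Some(FrameStatus::Abandoned),
        _ => None,
    }
}

/// Populate step: a no-op when the row carries no frame identity
/// or the event is not a frame lifecycle event.
pub fn apply_frame_event(
    frames: &mut BTreeMap<String, FrameStatus>,
    frame_id: Option<&str>,
    event_type: &str,
) {
    if let (Some(id), Some(status)) = (frame_id, frame_status_for(event_type)) {
        frames.insert(id.to_string(), status);
    }
}
```

Keying the map by a `BTreeMap` mirrors the release note's point about deterministic key ordering for the later checksum work.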
2026-06-07 noetl/server v2.51.0 Phase D R5 R1: Replay endpoint scaffold + execution projection (noetl/server#149; tracks server#148; tracks noetl/ai-meta#49 Phase D R5). Opens Phase D Round 5, the Replay engine port (Python's noetl/server/api/replay/service.py ~1236 LoC → Rust). Sub-issue server#148 documents the 7-round decomposition (R1 scaffold + execution / R2 stages+frames+commands / R3 loops+business_objects / R4 typed Checksum + projection_checksums / R5 snapshot seeds / R6 payload resolver / R7 cross-server parity harness). This release ships R1: new GET /api/replay/state route mirroring Python's endpoint.py byte-for-byte (query params + defaults + projection enum names execution | stage | frame | command | business_object | loop | all + mutually-exclusive cutoffs as_of_event_id/as_of_position/as_of_time returning 400 BadRequest); new services::replay module with ReplayService + ReplayCutoff + ReplayProjection + ReplayState + ReplayExecutionState + pure deterministic fold_replay_state; minimal execution projection fold using the same terminal-event short-circuit pattern Phase D R4 landed in the orchestrator + status endpoint (playbook.completed → COMPLETED / playbook.failed → FAILED / playbook.cancelled → CANCELLED; step.enter flips UNKNOWN → RUNNING + tracks last_node_name). Other map fields (stages / frames / commands / business_objects / loops) stay empty in R1; they are populated in R2/R3. 9 new unit tests (8 service + 1 handler). Server lib 508/0/0 (was 499/0/0).
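A minimal sketch of the R1 execution projection fold follows. The `Event` and `ExecutionState` types are hypothetical simplifications of the server's row and state structs, and the choice to stop folding at the first terminal event is an assumption about what "short-circuit" means here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExecStatus {
    Unknown,
    Running,
    Completed,
    Failed,
    Cancelled,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ExecutionState {
    pub status: ExecStatus,
    pub last_node_name: Option<String>,
}

/// Hypothetical, trimmed-down event row.
pub struct Event<'a> {
    pub event_type: &'a str,
    pub node_name: Option<&'a str>,
}

/// Pure fold: the same ordered events always produce the same state.
pub fn fold_execution(events: &[Event<'_>]) -> ExecutionState {
    let mut state = ExecutionState {
        status: ExecStatus::Unknown,
        last_node_name: None,
    };
    for ev in events {
        let terminal = match ev.event_type {
            "playbook.completed" => Some(ExecStatus::Completed),
            "playbook.failed" => Some(ExecStatus::Failed),
            "playbook.cancelled" => Some(ExecStatus::Cancelled),
            _ => None,
        };
        if let Some(status) = terminal {
            // Assumed short-circuit: nothing after a terminal event changes the result.
            state.status = status;
            break;
        }
        if ev.event_type == "step.enter" {
            if state.status == ExecStatus::Unknown {
                state.status = ExecStatus::Running;
            }
            if let Some(name) = ev.node_name {
                state.last_node_name = Some(name.to_string());
            }
        }
    }
    state
}
```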
2026-06-07 noetl/server v2.50.1 Phase D R4 follow-up: status endpoint short-circuits on terminal events (noetl/server#147; closes server#146; tracks noetl/ai-meta#49 Phase D R4 read-side). Read-side bug surfaced during the Phase D R4 fanout_reduce kind-val: GET /api/executions/{id}/status continued to return RUNNING for 90s+ after playbook.completed landed in the event log. Two compounding causes: (1) the inline SQL step-stats heuristic in get_status had no terminal-event lookup; (2) the completed_steps filter looked for status='COMPLETED' only, missing the realistic 'success' lowercase emitted by command.completed events. Fix: new terminal query short-circuits on playbook.completed/playbook.failed (mirrors the list endpoint's existing bool_or(playbook.completed) semantics); widened completed_steps filter to accept status IN ('COMPLETED', 'completed', 'success'). 6 new unit tests for the in-memory determine_status helper covering all terminal-event paths + underscore aliases + RUNNING fallback. Server lib 499 passed / 0 failed / 0 ignored (was 493/0/0). Kind-validated: built localhost/noetl-server-rust:v2.50.1, loaded into kind, rolled deployment. Re-query of prior execution 322018338286866432 went from {"status":"RUNNING","completed_steps":0} to {"status":"COMPLETED","completed_steps":4} on the same DB data. Fresh fanout_reduce execution 322023958058635264 reached COMPLETED in ~600ms wall (was previously stuck indefinitely). Phase D R4 read-side endpoint now matches the orchestrator's decision the moment a terminal event lands.
2026-06-07 noetl/server v2.50.0 Phase D R4 slice 2: apply_event handles step.skipped (noetl/server#145; closes server#144; tracks noetl/ai-meta#49 Phase D R4). Closes the gap exposed by slice 1's #[ignore] test (test_reduce_step_treats_skipped_upstream_as_done). New `"step.skipped"
2026-06-07 noetl/worker v5.14.0 Container Tool Callback umbrella #43 Round 4: worker-side pending_callback adoption (noetl/worker#60; closes worker#59; tracks noetl/ai-meta#43). The last follow-up after the closed Container Tool Callback umbrella. executor::command checks tool_result.pending_callback after a successful tool execution. When Some(true) (today only Tool::Container sets this), the worker logs INFO (with execution_id), bumps the new noetl_worker_call_done_skipped_pending_callback_total{tool_kind} counter, and skips its own call.done emit. Terminal call.done then arrives via the server's /api/internal/container-callback/... endpoint (Round 2, v2.48.0) driven by noetl-k8s-watcher (Round 1, ops@8892043). When None (every existing tool today, the default) the existing emit path is preserved bit-for-bit. Cargo.toml: noetl-tools = "2.18" → "2.21", noetl-executor = "0.3" → "0.4" (Cargo.lock resolves the freshly-published 0.4.1). New unit test (call_done_skipped_pending_callback_counter_increments_per_tool_kind) covers the tool_kind label + per-label series isolation + Prometheus text-format encoding round-trip. 126/0 worker lib tests pass. Expected dashboard fingerprint after a fresh worker image rolls in kind: server's noetl_container_callback_stale_total{state} stops moving (the race window closes); worker's noetl_worker_call_done_skipped_pending_callback_total{tool_kind="container"} ≈ server's noetl_container_callback_total{state=...}. Kind validation follows (Cloud Build → load image → e2e/scripts/kind_validate_container_callback.sh).
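The worker-side decision described for v5.14.0 can be sketched as follows. The `SkipCounter` type is a hypothetical stand-in for the Prometheus counter, and treating `Some(false)` like `None` is an assumption, since the notes only specify the `Some(true)` and `None` cases.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for
/// noetl_worker_call_done_skipped_pending_callback_total{tool_kind}.
#[derive(Default)]
pub struct SkipCounter {
    by_tool_kind: HashMap<String, u64>,
}

impl SkipCounter {
    pub fn get(&self, tool_kind: &str) -> u64 {
        self.by_tool_kind.get(tool_kind).copied().unwrap_or(0)
    }
}

/// Returns true when the worker should emit call.done itself.
/// Some(true) means an external callback will deliver the terminal event.
pub fn should_emit_call_done(
    tool_kind: &str,
    pending_callback: Option<bool>,
    skipped: &mut SkipCounter,
) -> bool {
    if pending_callback == Some(true) {
        // One series per tool_kind label, bumped only on the skip path.
        *skipped
            .by_tool_kind
            .entry(tool_kind.to_string())
            .or_insert(0) += 1;
        return false;
    }
    // None (and, by assumption, Some(false)) keeps the existing emit path.
    true
}
```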
2026-06-07 noetl/server v2.49.0 Phase D Round 4 first slice: fan-in / reduce barrier (noetl/server#143; closes server#142; tracks noetl/ai-meta#49 Phase D R4). Orchestrator now defers dispatch of any step with more than one incoming arc until ALL upstream steps reach a terminal state (`Completed
2026-06-07 noetl/cli v4.10.0 noetl-executor 0.4.1: propagate ToolResult.pending_callback (noetl/cli#56; closes cli#55; tracks noetl/ai-meta#43 Round 4). Patch-level bridge fix unblocking the worker-side pending_callback adoption (noetl/worker#60). noetl-tools 2.21.0 added a non-Default ToolResult.pending_callback: Option<bool> field; the executor's reshape_duckdb_result adapter rebuilds a ToolResult by-field at executor/src/tools_bridge.rs:570, and the HTTP test fixture at line 1386 constructs one too; both broke with E0063 "missing field" when downstream consumers bumped noetl-tools = "2.21". Patch: production bridge propagates result.pending_callback unchanged through reshape_duckdb_result (DuckDB has no async-callback semantics, so it just echoes whatever the inner tool emitted); HTTP test pins pending_callback: None. noetl-tools dep bumped to ^2.21 so the same struct definition flows transitively to consumers. Executor crate bumped to 0.4.1 (patch; no executor API surface change). 102/0 unit tests pass. Top-level noetl bin version bumped to 4.10.0 via the release-please rolling-MINOR convention. After this lands + noetl-executor 0.4.1 publishes to crates.io, the worker PR's cargo build flips to green automatically.
2026-06-07 noetl/tools v2.21.0 Container Tool Callback umbrella #43 Round 3: Tool::Container + ToolResult.pending_callback marker (noetl/tools#37; closes tools#36; tracks noetl/ai-meta#43). Tool::Container creates a labeled K8s Job and returns immediately; the worker slot frees as soon as api.create() returns. src/tools/container.rs (~570 lines): ContainerConfig mirrors the catalog YAML shape (image + command + args + env with literal value XOR value_from { secret_name, secret_key } + resources + timeout_seconds + service_account + namespace + backoff_limit + restart_policy). build_job translates to K8s Job: labels noetl.execution-id + noetl.step-name + noetl.tool-kind=container on BOTH Job.metadata.labels AND PodTemplateSpec.metadata.labels; generateName: noetl-container-<slug>-<eid>- where the slug strips chars outside [a-zA-Z0-9-] (DNS-1123-safe), truncates to 20 chars, lowercases (empty slug → literal "step"); default namespace noetl; default backoffLimit: 0 (playbook's own retry: block is the right place to express retry semantics; the Job controller's built-in retry would muddle the watcher's terminal-state mapping); default restartPolicy: Never; value and value_from mutually exclusive at build time. execute() builds the kube client via Client::try_default() (reads in-cluster SA token + cluster CA), POSTs via api.create(), returns immediately with pending_callback: Some(true) and the Job handle in data. Additive ToolResult.pending_callback: Option<bool> marker with skip_serializing_if; existing consumers see no change. Set by Tool::Container to signal "I created an external work item; suppress your normal call.done emit". Worker-side adoption (suppressing emit when marker set) is a coordinated follow-up tracked under the same umbrella; until then the worker emits call.done immediately, and the watcher's later callback is treated as stale by the server (recorded by noetl_container_callback_stale_total).
That race is harmless during the transition: playbooks just see early completion; the stale-counter dashboard is the migration signal. 10 existing struct-literal sites backfilled with pending_callback: None (result.rs ×5, http.rs ×1, python.rs ×1, rhai.rs ×1, task_sequence.rs ×2). 17 new unit tests covering label propagation on Job + Pod template, default + explicit namespace, generateName shape + slug stripping + empty fallback + lowercase + truncation, command + args propagation, env literal + secret-ref propagation, value + value_from XOR enforcement, empty image rejection, resources requests + limits propagation, empty resources → None, backoff + activeDeadlineSeconds, defaults (backoff = 0; restartPolicy = Never), service_account propagation. Lib 258/0 (241 + 17 new). Backward compatible: new tool kind; existing tools untouched; the additive pending_callback field doesn't change existing serialised shapes. This closes the last code round of the umbrella. Worker bumps to noetl-tools v2.21.0 in a follow-up commit; only Round 5 (e2e kind-val rig, e2e#29) remains to close the umbrella.
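The generateName slug rule described for Tool::Container (strip, truncate to 20, lowercase, fall back to "step") can be sketched like this. The function names are hypothetical, and the order of operations follows the order given in the release note.

```rust
/// Build the slug portion of the Job's generateName from a step name.
pub fn job_name_slug(step: &str) -> String {
    // 1. Drop every character outside [a-zA-Z0-9-].
    let stripped: String = step
        .chars()
        .filter(|c| c.is_ascii_alphanumeric() || *c == '-')
        .collect();
    // 2. Truncate to 20 characters (all remaining chars are ASCII).
    let truncated: String = stripped.chars().take(20).collect();
    // 3. Lowercase.
    let lowered = truncated.to_ascii_lowercase();
    // 4. An empty slug falls back to the literal "step".
    if lowered.is_empty() {
        "step".to_string()
    } else {
        lowered
    }
}

/// generateName prefix of the form noetl-container-<slug>-<eid>-.
pub fn generate_name(step: &str, execution_id: i64) -> String {
    format!("noetl-container-{}-{}-", job_name_slug(step), execution_id)
}
```

The trailing hyphen matters: Kubernetes appends its own random suffix to a generateName prefix.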
2026-06-07 noetl/server v2.48.0 Container Tool Callback umbrella #43 Round 2: POST /api/internal/container-callback/{execution_id}/{step} (noetl/server#141; closes server#140; tracks noetl/ai-meta#43). External K8s watcher (Round 1, ops#166) POSTs Job terminal-state events here when a Job carrying the noetl.execution-id label transitions to a terminal state. Handler validates path params (execution_id parseable as i64; step non-empty; these are the few legitimate 4xx cases), checks staleness via a single indexed SELECT on noetl.event for the execution_id, and emits a call.done event with the structured terminal state on match (or bumps noetl_container_callback_stale_total{state} + logs INFO + returns 202 if no events exist for the execution). Six TerminalState variants matching the umbrella's failure-mode taxonomy: succeeded / failed / failed_image_pull / failed_oom / failed_node_lost / failed_timeout. Each survives in meta.terminal_state so the playbook can branch on the specific failure reason. Returns 202 unconditionally on path-param validation success (the watcher is idempotent + may race with retries; the server should never 4xx on a stale callback). Auth via the existing RequireInternalApiToken extractor (same shape as the rest of /api/internal/*). Observability per agents/rules/observability.md Principle 1: span container_callback carrying execution_id + step + state; counters noetl_container_callback_total{state} + noetl_container_callback_stale_total{state} (latter alert-worthy when sustained; indicates a mis-namespaced watcher or out-of-band Job); structured INFO on emit + stale paths. 7 new unit tests in handlers::container_callback::tests covering TerminalState::as_str round-trip for all six variants (drift guard against the watcher's docs), is_success matches only Succeeded, request deserialisation (minimal body / full body / unknown state rejection), response serialisation (in-flight with event_id / stale with skip_serializing_if elision).
Lib 487/0 (480 + 7 new). Round 2 lands first per the umbrella's recommended ordering: smallest blast radius; unblocks Round 1 (watcher Deployment, ops#166) + Round 3 (Tool::Container with PendingCallback marker, tools#36).
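The six-variant terminal-state taxonomy and its as_str / is_success helpers can be sketched as a plain enum. The `ALL` constant and the `parse` helper are assumptions added for illustration; the wire strings are the ones listed in the v2.48.0 note.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TerminalState {
    Succeeded,
    Failed,
    FailedImagePull,
    FailedOom,
    FailedNodeLost,
    FailedTimeout,
}

impl TerminalState {
    /// Every variant, so tests can check the string mapping exhaustively.
    pub const ALL: [TerminalState; 6] = [
        TerminalState::Succeeded,
        TerminalState::Failed,
        TerminalState::FailedImagePull,
        TerminalState::FailedOom,
        TerminalState::FailedNodeLost,
        TerminalState::FailedTimeout,
    ];

    /// Wire string, also usable as the {state} metric label.
    pub fn as_str(self) -> &'static str {
        match self {
            TerminalState::Succeeded => "succeeded",
            TerminalState::Failed => "failed",
            TerminalState::FailedImagePull => "failed_image_pull",
            TerminalState::FailedOom => "failed_oom",
            TerminalState::FailedNodeLost => "failed_node_lost",
            TerminalState::FailedTimeout => "failed_timeout",
        }
    }

    /// Inverse of as_str; unknown strings are rejected.
    pub fn parse(s: &str) -> Option<TerminalState> {
        Self::ALL.iter().copied().find(|v| v.as_str() == s)
    }

    /// Only Succeeded counts as success.
    pub fn is_success(self) -> bool {
        matches!(self, TerminalState::Succeeded)
    }
}
```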
2026-06-07 noetl/tools v2.20.0 artifact tool kind added to Rust registry: get-only result_fetch alias (noetl/tools#35; closes tools#34; closes noetl/ai-meta#64). Surfaced by the noetl/ai-meta#54 e2e regression sweep: test_output_select.yaml (+ two others) FAILED with Tool not found: artifact because the Rust registry didn't carry the Python-era kind: artifact surface. Picked the aliasing branch of the #64 design: new src/tools/artifact.rs ArtifactTool impls Tool (name() = "artifact"); holds a delegate ResultFetchTool + a TemplateEngine; execute() template-renders the raw config first (so input.result_ref: "{{ start._ref }}" resolves before deserialisation), translates to a synthetic result_fetch-shaped JSON, wraps in a ToolConfig with kind: "result_fetch", and delegates. Pass-throughs honoured: prefer, flight_endpoint, bearer_token, tls_ca_path, client_cert_path, client_key_path all copy through unchanged. Action handling: get translates (default when action: unset, matching the Python worker's default); put returns ToolError::Configuration pointing the operator at the worker's call.done emit path (R-2.1) per agents/rules/execution-model.md (the playbook-side push surface is intentionally absent in the Rust path because a step's result lands in the result store via the worker's emit, not via a tool kind invoked by the playbook author); unknown actions rejected with the unknown name surfaced; missing input: block → typed deserialiser names the missing field. 8 new unit tests covering: translate get with ref-only / get with all six pass-throughs / defaults action to get / put returns config error pointing at emit path / unknown action rejected / missing input returns config error / tool name is "artifact" / ToolConfig round-trip translation. Lib 241/0 (8 new). Backward compatible: new tool kind; existing tools untouched; existing fixtures using kind: result_fetch keep working.
Worker bump (separate ai-task) re-runs the #54 e2e sweep on the three previously-failing fixtures (test_output_select.yaml, test_gcs_storage.yaml, test_storage_tiers.yaml).
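The artifact tool's action handling (get as default, put rejected, unknown rejected by name) can be sketched as a small resolver. The enum, function name, and error strings are hypothetical; the real tool returns ToolError::Configuration rather than a plain String.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ArtifactAction {
    Get,
}

/// Resolve the optional `action:` field of an artifact step.
pub fn resolve_action(action: Option<&str>) -> Result<ArtifactAction, String> {
    match action {
        // Unset defaults to get, matching the Python worker's default.
        None | Some("get") => Ok(ArtifactAction::Get),
        // put is intentionally unsupported: results reach the result store
        // through the worker's call.done emit, not through a tool kind.
        Some("put") => Err(
            "artifact action `put` is not supported; results are stored via the worker's call.done emit path"
                .to_string(),
        ),
        // Surface the unknown name so the playbook author can spot the typo.
        Some(other) => Err(format!("unknown artifact action `{other}`")),
    }
}
```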
2026-06-07 noetl/server v2.47.0 Secrets Wallet #61 Phase 6d.2: GCP iamcredentials.generateAccessToken provider (noetl/server#138; closes server#133; tracks noetl/ai-meta#61). Second cloud-specific dynamic-secret provider: mints short-lived OAuth2 access tokens for a target service account by impersonating it via the IAM Credentials API. src/secrets/gcp_iam.rs: GcpIamProvider impls SecretProvider; provider id gcp_iam (alias gcp_iamcredentials). Reads the caller's Workload-Identity token from the GKE metadata server (same source as src/secrets/gcp.rs + src/crypto/kms_gcp.rs); shares the env override NOETL_GCP_METADATA_TOKEN_URL. POSTs JSON { scope: [...], lifetime: "<secs>s" } to https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/<sa-email>:generateAccessToken with the caller token as bearer. Returns SecretValue { value: <access_token>, expires_at: <expireTime from response> }. Reference shape <target-sa-email>[#<scope>]; empty ref falls back to NOETL_GCP_IMPERSONATE_SA. Env: NOETL_GCP_IMPERSONATE_SA (default target SA email), NOETL_GCP_IAM_CREDENTIALS_ENDPOINT (override), NOETL_GCP_METADATA_TOKEN_URL (override), NOETL_GCP_IAM_LIFETIME_SECS (default 3600). 10 new unit tests: parse_ref (empty / bare-SA / with-scope / trailing-hash-ignored); endpoint_for (builds the iamcredentials URL with project="-"; honours override); build_body (wraps scope in array; formats lifetime as "<secs>s", a common foot-gun pin); response parser (ISO-string expireTime round-trips; missing access_token returns parse error). Lib green. Cloud-only backend: kind-val at the unit-test layer (full E2E needs a GKE cluster with Workload-Identity bindings). Resolution-conflict round: two same-shape one-line string conflicts in src/secrets/mod.rs (the supported-providers diagnostic message); git auto-merged the additive lines correctly; merged commits f326d11 then feeb31b to keep up with #137 then #139 landing first.
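The reference shape `<target-sa-email>[#<scope>]` and the `"<secs>s"` lifetime format can be sketched as below. The `GcpIamRef` struct and the function signatures are assumptions for illustration; the endpoint override and the default scope are deliberately left out because the notes don't specify how they compose.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct GcpIamRef {
    pub service_account: String,
    pub scope: Option<String>,
}

/// Parse `<target-sa-email>[#<scope>]`.
/// An empty SA part falls back to `default_sa` (e.g. NOETL_GCP_IMPERSONATE_SA);
/// a trailing `#` with nothing after it is ignored.
pub fn parse_ref(raw: &str, default_sa: Option<&str>) -> Option<GcpIamRef> {
    let (sa_part, scope_part) = match raw.split_once('#') {
        Some((sa, scope)) => (sa, Some(scope)),
        None => (raw, None),
    };
    let service_account = if sa_part.is_empty() {
        default_sa?.to_string()
    } else {
        sa_part.to_string()
    };
    let scope = scope_part.filter(|s| !s.is_empty()).map(str::to_string);
    Some(GcpIamRef { service_account, scope })
}

/// The API expects the lifetime as a string of seconds with an `s` suffix.
pub fn lifetime_field(secs: u64) -> String {
    format!("{secs}s")
}

/// Default endpoint with the wildcard project `-`.
pub fn endpoint_for(service_account: &str) -> String {
    format!(
        "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/{service_account}:generateAccessToken"
    )
}
```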
2026-06-07 noetl/server v2.46.0 Secrets Wallet #61 Phase 6d.3: Azure AAD client-credentials provider (noetl/server#139; closes server#134; tracks noetl/ai-meta#61). Third (and final) cloud-specific dynamic-secret provider: off-cluster (non-IMDS) AAD client_credentials flow. For in-cluster AKS deployments use the existing azure (Key Vault) provider which already does IMDS Managed Identity; this provider is the generic AAD token source for deployments running outside AKS (other clouds, on-prem, worker box) that need to call Azure APIs. src/secrets/azure_oauth.rs: AzureOAuthProvider impls SecretProvider; provider id azure_oauth (alias azure_aad). Reads AZURE_TENANT_ID / AZURE_CLIENT_ID / AZURE_CLIENT_SECRET from env (canonical service-principal triple). POSTs grant_type=client_credentials&client_id=&client_secret=&scope= to https://login.microsoftonline.com/<tenant>/oauth2/v2.0/token (or sovereign-cloud equivalent via NOETL_AZURE_AAD_HOST: https://login.microsoftonline.us Gov, https://login.chinacloudapi.cn China). Returns SecretValue { value: <access_token>, expires_at: now + expires_in }. Reference shape [<tenant>:]<scope>; parser only treats the :-prefix as a tenant if it doesn't look like a URL scheme (https/http), so bare scope URLs parse correctly. Empty ref → falls back to NOETL_AZURE_OAUTH_SCOPE (default https://graph.microsoft.com/.default). 14 new unit tests covering parse_ref (4 shapes), token_url_for (uses tenant + sovereign-cloud override), build_body (form-urlencoded shape; specials in client_secret + scope percent-encoded), compute_expires_at (adds seconds; zero edge), response parser (full AAD shape; minimal-without-token_type tolerance; missing access_token returns parse error), percent_encode (unreserved preserved; specials escaped). Lib green. Backward compatible: new provider id; existing azure / azure_kv keep resolving to AzureKeyVaultProvider.
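The `[<tenant>:]<scope>` parsing rule, including the URL-scheme guard, can be sketched as follows. The signatures are hypothetical, timestamps are plain epoch seconds for simplicity, and only the http/https schemes named in the note are recognised, so a scope with another scheme would be split as if it had a tenant prefix.

```rust
/// True when a `:`-prefix is a URL scheme rather than a tenant id.
fn looks_like_url_scheme(prefix: &str) -> bool {
    prefix.eq_ignore_ascii_case("https") || prefix.eq_ignore_ascii_case("http")
}

/// Parse `[<tenant>:]<scope>` into (tenant override, scope).
/// An empty reference falls back to `default_scope`.
pub fn parse_ref(raw: &str, default_scope: &str) -> (Option<String>, String) {
    let raw = raw.trim();
    if raw.is_empty() {
        return (None, default_scope.to_string());
    }
    match raw.split_once(':') {
        Some((prefix, rest)) if !prefix.is_empty() && !looks_like_url_scheme(prefix) => {
            (Some(prefix.to_string()), rest.to_string())
        }
        // Bare scope URLs (https://...) and colon-free scopes land here.
        _ => (None, raw.to_string()),
    }
}

/// expires_at = now + expires_in, in epoch seconds.
pub fn compute_expires_at(now_epoch_secs: u64, expires_in_secs: u64) -> u64 {
    now_epoch_secs.saturating_add(expires_in_secs)
}
```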
2026-06-07 noetl/server v2.45.0 Secrets Wallet #61 Phase 6d.1: AWS STS AssumeRoleWithWebIdentity provider (noetl/server#137; closes server#132; tracks noetl/ai-meta#61). First cloud-specific dynamic-secret provider; closes the EKS-IRSA gap. A pod running in EKS with workload-identity has no static AWS access keys, only a projected ServiceAccount JWT; this provider exchanges that JWT for short-lived AWS temporary credentials via the STS AssumeRoleWithWebIdentity action. src/secrets/aws_sts.rs: AwsStsProvider impls SecretProvider; provider id aws_sts (alias aws_iam). Reads AWS_WEB_IDENTITY_TOKEN_FILE (default /var/run/secrets/eks.amazonaws.com/serviceaccount/token), AWS_ROLE_ARN, AWS_REGION / AWS_DEFAULT_REGION. Reference shape [<region>:]<role-arn>[#<session-name>]. POSTs form-urlencoded body Action=AssumeRoleWithWebIdentity&Version=2011-06-15&RoleArn=&RoleSessionName=&DurationSeconds=&WebIdentityToken= to https://sts.<region>.amazonaws.com/. No SigV4: the WebIdentityToken IS the credential (STS anonymous action), so no static AWS_ACCESS_KEY_ID needed. Re-reads the token file on every fetch (kubelet rotates projected tokens every ~hour by default). Response parser handles BOTH JSON (modern STS) AND XML (legacy / VPC endpoints); no new XML dep; reuses the existing regex crate. Returns SecretValue { value: <JSON of access_key_id + secret_access_key + session_token>, version: Some(role_id), expires_at: Some(Expiration) }. 15 new unit tests covering parse_ref (6 shapes), build_body (percent-encodes role ARN; preserves unreserved), endpoint_for (uses region; honours NOETL_AWS_STS_ENDPOINT_OVERRIDE), XML response parser, JSON response parser (epoch-float + ISO-string Expiration), percent_encode. Lib 456/0. Backward compatible: new provider id; existing aws / aws_sm keep resolving to AwsSmSecretProvider.
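The form-urlencoded request body for AssumeRoleWithWebIdentity can be sketched with a small percent-encoder. The function signatures are assumptions; the parameter names and the `Version=2011-06-15` value come from the release note, and the unreserved set is the RFC 3986 one.

```rust
/// Percent-encode everything except RFC 3986 unreserved characters
/// (ALPHA / DIGIT / "-" / "." / "_" / "~").
pub fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for b in input.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Build the POST body; no SigV4 signing is involved because the
/// web identity token itself is the credential.
pub fn build_body(
    role_arn: &str,
    session_name: &str,
    duration_secs: u32,
    web_identity_token: &str,
) -> String {
    format!(
        "Action=AssumeRoleWithWebIdentity&Version=2011-06-15&RoleArn={}&RoleSessionName={}&DurationSeconds={}&WebIdentityToken={}",
        percent_encode(role_arn),
        percent_encode(session_name),
        duration_secs,
        percent_encode(web_identity_token),
    )
}
```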
2026-06-07 noetl/server v2.44.0 Secrets Wallet #61 Phase 7c.3: resolver-side stampede mutex + background re-resolve (noetl/server#136; closes server#135; tracks noetl/ai-meta#61). Wires the Phase-7c decision primitive (server#125) + the Phase-7c.2 cache-side companion (server#131) into the resolver's cache-hit path. When CredentialService::try_resolve_keychain hits a fresh-but-aging row, the cached value returns to the caller IMMEDIATELY (worker fetches stay on the fast path) and a background tokio::spawn re-resolves via the Phase-3b SecretProvider + updates the cache via KeychainService::set. Stampede collapse: new src/services/keychain_refresh.rs RefreshInflight wraps Arc<tokio::sync::Mutex<HashSet<(i64, String)>>> with try_claim (atomic insert returning whether the slot was free) + release. The struct is Clone (cheap Arc clone) so every CredentialService instance derived from the same root shares the same inflight set. N workers crossing the refresh threshold for the same (catalog_id, alias) collapse to one provider call; concurrent callers piggy-back via noetl_secret_refresh_total{outcome="stampede_collapsed"}. Refactor: extracted the cache-miss provider resolution (catalog → playbook → provider → cache write) into a separate resolve_via_provider method. Both the cache-miss inline path AND the background-refresh task call it: identical code path, no behavior drift between cold-miss and proactive-refresh. Background task lifecycle: cache hit → maybe_spawn_refresh → should_refresh check (single indexed cache read; awaited) → try_claim → tokio::spawn with cloned service state → background: resolve_via_provider → record outcome metric (succeeded | failed) + duration histogram → release slot. On stampede: bump stampede_collapsed + return.
Failure modes: should_refresh read errors → log + skip (cached value already went out, never fail the credential lookup); provider failure in background → log + bump outcome="failed" + release slot; stampede → bump outcome="stampede_collapsed". Six new unit tests in services::keychain_refresh::tests: single-claim succeeds; second-claim for same key returns false (stampede signal); release allows re-claim; distinct keys don't collide (alias-level + catalog-level independence); release is idempotent; clone shares inner state (load-bearing: verifies the stampede-collapse invariant works across CredentialService clones). Lib 441/0 passing. Backward compatible: the refresh-decision lookup short-circuits to false when the row has no expires_at (long-lived secret), is already expired, or is outside the refresh window. Existing deployments without dynamic-secret providers see no change. Phase 7c series is now wire-complete (7c primitive + 7c.2 cache companion + 7c.3 resolver integration). Remaining work on the umbrella is the three cloud-specific dynamic-secret providers (6d.1 AWS STS server#132 · 6d.2 GCP iamcredentials server#133 · 6d.3 Azure AAD server#134), each its own sub-issue.
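The stampede-collapse set can be sketched with a synchronous mutex. This is a simplified illustration: it swaps the `tokio::sync::Mutex` named in the release note for `std::sync::Mutex` so the example needs no async runtime, and the method signatures are assumptions.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

/// Tracks which (catalog_id, alias) pairs have a refresh in flight.
/// Clone is a cheap Arc clone, so all clones share one set.
#[derive(Clone, Default)]
pub struct RefreshInflight {
    inner: Arc<Mutex<HashSet<(i64, String)>>>,
}

impl RefreshInflight {
    /// Atomic insert: true if the slot was free (the caller owns the refresh),
    /// false if another caller already claimed it (stampede collapsed).
    pub fn try_claim(&self, catalog_id: i64, alias: &str) -> bool {
        self.inner
            .lock()
            .expect("inflight set poisoned")
            .insert((catalog_id, alias.to_string()))
    }

    /// Idempotent: releasing a key that is not claimed is a no-op.
    pub fn release(&self, catalog_id: i64, alias: &str) {
        self.inner
            .lock()
            .expect("inflight set poisoned")
            .remove(&(catalog_id, alias.to_string()));
    }
}
```

A caller that gets `false` from `try_claim` returns immediately and relies on the in-flight refresh, which is how N concurrent workers collapse to one provider call.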
2026-06-07 noetl/server v2.43.0 Secrets Wallet #61 Phase 7b.2 + 7c.2 follow-up rounds: noetl.secret_audit table + query endpoint + cache-refresh primitive (single release covering both PRs). Phase 7b.2 (noetl/server#129; closes server#128; tracks noetl/ai-meta#61): durable storage path for the Phase-7b in-process audit service. noetl.secret_audit table provisioned via CREATE TABLE IF NOT EXISTS at server startup (server-owned, no out-of-band migration step): audit_id PK + credential / execution_id / occurred_at indexes; secret value column NEVER declared. DbAuditSink impl of AuditSink writes via db::queries::secret_audit::insert with ON CONFLICT (audit_id) DO NOTHING for idempotency. New GET /api/internal/secret-audit?credential=&execution_id=&from=&to=&limit= returns rows ORDER BY occurred_at DESC (hard cap 10_000). NoopAuditSink stays the default when NOETL_SECRET_AUDIT_REQUIRED is unset. Two merge conflicts with the Phase-7a.2 branch (db/queries/mod.rs + main.rs) resolved as additive (both modules + both route blocks coexist). Phase 7c.2 (noetl/server#131; closes server#130; tracks noetl/ai-meta#61): cache-layer companion of the Phase-7c decision primitive. KeychainService::should_refresh(catalog_id, keychain_name, execution_id, scope_type, now) reads the cache row's expires_at, asks secrets::dynamic::should_refresh_default (honours KEYCHAIN_CACHE_REFRESH_WINDOW_SECS), bumps noetl_secret_refresh_total{outcome="triggered"} on a true return. Falls through to false when the row is missing, has no expires_at, is already expired (eviction path, not refresh), or is outside the refresh window. Backward compatible: new method, existing call sites unchanged. Resolver-side wire-up (per-(catalog_id, alias) tokio::sync::Mutex stampede collapse + tokio::spawn background re-resolve + KeychainService::set) deferred to Phase 7c.3. All three Phase-7 named rounds (rotation / audit / auto-renewal) now have functional endpoints + DB storage.
2026-06-06 noetl/server v2.42.0 Secrets Wallet #61 Phase 7a.2: KEK rotation endpoint + key-status + DB scans (noetl/server#127; closes server#126; tracks noetl/ai-meta#61). Operator-facing wrap of the Phase-7a rewrap_storage_string primitive. POST /api/internal/wallet/rotate-kek?batch_size=&max_batches=&table= runs a batched cursor scan across noetl.credential + noetl.keychain, calls rewrap_storage_string per row (per-row UPDATE + per-version count queries), returns RotateSummary { processed, rewrapped, skipped, failed, last_id } for progress checkpointing across runs. GET /api/internal/wallet/key-status reports per-version row counts so an operator can confirm completion before retiring the old KEK version. classify_failure heuristic maps thrown errors to `parse_error
2026-06-06 noetl/server v2.41.0 Secrets Wallet #61 Phase 7c: token auto-renewal primitives (closes Phase 7; all named rounds 1–7 done) (noetl/server#125; closes server#124; tracks noetl/ai-meta#61). Final named round of the Secrets Wallet umbrella. OAuth2 / JWT access tokens with expires_in are the dominant short-lived-credential shape in production. Phase 6d's cache-TTL plumbing kept dead tokens out; Phase 7c adds proactive refresh-before-expiry so the cache renews when the remaining lifetime drops below a threshold. Worker fetches stay on the cached-fresh-token fast path; the auth playbook runs at most once per natural token lifetime instead of once per worker burst. Tail latency stays flat across rotations. secrets::dynamic::should_refresh(expires_at, refresh_window, now) decision primitive: returns true iff expires_at is set, still valid (expires_at > now), and inside the refresh window (expires_at - refresh_window <= now). Pure function. secrets::dynamic::should_refresh_default(expires_at, now): convenience wrapper reading the window from env. KEYCHAIN_CACHE_REFRESH_WINDOW_SECS env (default 60 s). noetl_secret_refresh_total{outcome} counter per agents/rules/observability.md Principle 1. outcome ∈ {triggered, succeeded, failed, stampede_collapsed}. failed at sustained rate is alert-worthy: the provider is unreachable AND a cached token is about to expire. Aliases are NOT a label (cardinality); per-alias detail rides the secret.refresh tracing span per Principle 4. noetl_secret_refresh_duration_seconds histogram: buckets [0.05, 0.1, 0.25, 0.5, 1, 2, 5] (auth round-trips dominate), observed regardless of outcome. Five new unit tests in secrets::dynamic::tests: returns false when no expires_at; returns false when already expired (defensive; that's the eviction path); returns false when outside window; returns true inside window; boundary case (expires_at = now + window) returns true. Lib 427 / 0 passing. Lib-only; no schema migration.
All named phases (1–7) of the Secrets Wallet umbrella are now complete. Remaining queue: 7a.2 (rotation endpoint + DB scans), 7b.2 (audit table + endpoint + handler wire), 7c.2 (cache + resolver wire-up + stampede mutex), 6d.1 / 6d.2 / 6d.3 (AWS STS / GCP iamcredentials / Azure AAD dynamic-secret providers), all discrete follow-up sub-issues that plug into the existing primitives without further trait / schema surgery. Server wiki: deployment-specification § Token auto-renewal (Phase 7c primitives).
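The should_refresh decision primitive from Phase 7c is a pure function, so it can be sketched directly. Timestamps here are plain epoch seconds, which is an assumption made to keep the example dependency-free; the server's actual time types may differ.

```rust
/// True iff `expires_at` is set, still valid (expires_at > now), and
/// inside the refresh window (expires_at - refresh_window <= now).
pub fn should_refresh(expires_at: Option<u64>, refresh_window_secs: u64, now: u64) -> bool {
    match expires_at {
        // Long-lived secret: nothing to refresh.
        None => false,
        Some(exp) => {
            // Already-expired rows belong to the eviction path, not refresh.
            let still_valid = exp > now;
            // saturating_sub avoids underflow when the window exceeds exp.
            let inside_window = exp.saturating_sub(refresh_window_secs) <= now;
            still_valid && inside_window
        }
    }
}
```

With the default 60 s window, a token expiring at `now + 60` already qualifies for refresh, which matches the boundary case in the unit tests.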
2026-06-06 noetl/server v2.40.0 Secrets Wallet #61 Phase 7b primitives: secret-resolution audit service (noetl/server#123; closes server#122; tracks noetl/ai-meta#61). Phase 7 round 2. Today the wallet has no durable record of "who accessed credential X at time Y, on which execution, with what outcome." The tracing-span surface evaporates with log retention; compliance regimes (SOC 2, ISO 27001, FedRAMP, PCI-DSS) require a queryable trail with retention measured in years. This round ships the in-process service primitives so the actual table + endpoint + handler integration (7b.2) can land cleanly. AuditEvent struct: audit_id (application-side snowflake) + occurred_at + credential + bounded operation + bounded outcome + worker_id / execution_id / server_region / broker_region / kek_version / notes. NEVER contains the secret value. Operation + Outcome bounded enums with as_str(): drift guard against the strings used in the noetl_secret_audit_writes_total{operation, outcome, ...} metric labels. AuditSink trait + NoopAuditSink default impl + SecretAuditService wrapper with three calls: record_async (fire-and-forget; spawns a tokio task; never blocks the resolver; failed writes log + drop + dropped_async), record_strict (awaits the result; used when compliance requires the row exist before the value releases; failed writes propagate to the handler), and record (branches by strict; the typical handler call). NOETL_SECRET_AUDIT_REQUIRED env (default false; 1/true/TRUE/yes/YES enable strict mode). noetl_secret_audit_writes_total{operation, outcome, status} counter per agents/rules/observability.md Principle 1. status ∈ {written, dropped_async, failed_strict}; failed_strict is alert-worthy: it means the wallet refused to release a credential because the audit couldn't be recorded.
Eight new unit tests in services::secret_audit::tests: builder fills audit_id + occurred_at; Operation + Outcome as_str round-trip (drift guard); noop sink always succeeds; record_strict blocks on sink failure; record_strict persists on success; record dispatches async when not strict; noop service records without blocking; from_env respects truthy values. Lib 422 / 0 passing. Lib-only: no schema migration. Backward compatible (existing deployments get NoopAuditSink + non-strict mode; no behavior change until 7b.2 wires the production sink + the DB table). Server wiki: deployment-specification § Secret-resolution audit service (Phase 7b primitives).
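The strict versus non-strict branching described for Phase 7b can be sketched roughly as follows. This is a simplified, synchronous illustration rather than the server's code: the real `record_async` spawns a tokio task, whereas here the write happens inline, and `FailingSink`, `WriteStatus`, and the function signatures are hypothetical names chosen for the example.

```rust
/// Truthy spellings the release note lists for NOETL_SECRET_AUDIT_REQUIRED.
pub fn audit_required(raw: Option<&str>) -> bool {
    matches!(raw, Some("1" | "true" | "TRUE" | "yes" | "YES"))
}

/// Mirrors the `status` label values of noetl_secret_audit_writes_total.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WriteStatus {
    Written,
    DroppedAsync,
    FailedStrict,
}

impl WriteStatus {
    pub fn as_str(self) -> &'static str {
        match self {
            WriteStatus::Written => "written",
            WriteStatus::DroppedAsync => "dropped_async",
            WriteStatus::FailedStrict => "failed_strict",
        }
    }
}

/// Simplified sink: the real trait is async and takes a full AuditEvent.
pub trait AuditSink {
    fn write(&self, credential: &str) -> Result<(), String>;
}

/// Default sink: accepts every event and stores nothing.
pub struct NoopAuditSink;
impl AuditSink for NoopAuditSink {
    fn write(&self, _credential: &str) -> Result<(), String> {
        Ok(())
    }
}

/// Hypothetical sink that always fails, used only to exercise the branches.
pub struct FailingSink;
impl AuditSink for FailingSink {
    fn write(&self, _credential: &str) -> Result<(), String> {
        Err("audit store unavailable".to_string())
    }
}

/// Branch on `strict`: a failed write either blocks the release
/// (FailedStrict) or is dropped without blocking (DroppedAsync).
pub fn record(sink: &dyn AuditSink, strict: bool, credential: &str) -> WriteStatus {
    match (sink.write(credential), strict) {
        (Ok(()), _) => WriteStatus::Written,
        (Err(_), true) => WriteStatus::FailedStrict,
        (Err(_), false) => WriteStatus::DroppedAsync,
    }
}

/// The credential value may be released unless a strict write failed.
pub fn may_release(status: WriteStatus) -> bool {
    status != WriteStatus::FailedStrict
}
```

With `NoopAuditSink` and non-strict mode, every call yields `Written`, which matches the backward-compatible default described above.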
2026-06-06 noetl/server v2.39.0 Secrets Wallet #61 Phase 7a: KEK rotation primitives (noetl/server#121; closes server#120; tracks noetl/ai-meta#61). Starts Phase 7 of the Secrets Wallet umbrella: rotation, audit, and auto-renewal. Phase 6 closed with full residency coverage; Phase 7 hardens the wallet for the operational lifecycle. This round ships the rotation primitives so the actual rotation endpoint + table scans (7a.2) can land cleanly. KeyManager::current_key_version() trait accessor with a safe default ("unknown"); LocalDevKms reports its own version string (defaults "v1"). EnvelopeCipher::rewrap_storage_string(raw) primitive: parses raw as a stored envelope; if wrapped.key_version == current_key_version() returns RewrapOutcome::Skipped { key_version } (no KMS call); otherwise unwraps the DEK under the historical KEK version and re-wraps under the current version, returning RewrapOutcome::Rewrapped { old_key_version, new_key_version, new_storage_string }. Plaintext payload is NEVER reconstructed; this is a pure DEK re-wrap: AES-GCM ciphertext bytes stay byte-identical; only the dek field of the stored envelope changes. noetl_wallet_rotate_total{table, status} counter per observability.md Principle 1. table ∈ {credential, keychain}; status ∈ {skipped, rewrapped, failed_unwrap, failed_wrap, parse_error}. failed_unwrap is alert-worthy: it means the KMS deleted the historical key version and the rotation can't complete without operator intervention. Four new unit tests in crypto::envelope::tests: rewrap skips records already on current version; rewrap emits new envelope under current version when older; rewrap rejects non-envelope storage value (forward-only contract preserved); LocalDevKms reports its key version (drift guard). Lib 414 / 0 passing. Lib-only: no schema migration. Backward compatible (existing records resolve unchanged; only an explicit rotation pass touches them). Server wiki: deployment-specification § Wallet KEK rotation primitives (Phase 7a).
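The skip-or-rewrap decision for Phase 7a might look like the sketch below. It assumes a simplified in-memory `Envelope` instead of the stored string format, returns the new envelope where the real primitive returns `new_storage_string`, and uses a `ToyKms` that XORs bytes purely so the example runs; none of that is real cryptography or the server's actual types.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Envelope {
    pub key_version: String,
    pub wrapped_dek: Vec<u8>,
    pub ciphertext: Vec<u8>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum RewrapOutcome {
    Skipped { key_version: String },
    Rewrapped { old_key_version: String, new_key_version: String, envelope: Envelope },
}

pub trait KeyManager {
    /// Safe default, as described in the release note.
    fn current_key_version(&self) -> String {
        "unknown".to_string()
    }
    fn unwrap_dek(&self, version: &str, wrapped: &[u8]) -> Result<Vec<u8>, String>;
    fn wrap_dek(&self, dek: &[u8]) -> Result<Vec<u8>, String>;
}

/// Re-wrap only the DEK; the payload ciphertext is copied through untouched.
pub fn rewrap(km: &dyn KeyManager, stored: &Envelope) -> Result<RewrapOutcome, String> {
    let current = km.current_key_version();
    if stored.key_version == current {
        // Already on the current KEK version: no KMS call needed.
        return Ok(RewrapOutcome::Skipped { key_version: current });
    }
    let dek = km.unwrap_dek(&stored.key_version, &stored.wrapped_dek)?;
    let wrapped_dek = km.wrap_dek(&dek)?;
    Ok(RewrapOutcome::Rewrapped {
        old_key_version: stored.key_version.clone(),
        new_key_version: current.clone(),
        envelope: Envelope {
            key_version: current,
            wrapped_dek,
            ciphertext: stored.ciphertext.clone(),
        },
    })
}

/// Toy stand-in for a KMS: "wraps" by XOR with the version number.
/// NOT cryptography; it exists only to make the sketch executable.
pub struct ToyKms {
    pub current: u8,
}

impl KeyManager for ToyKms {
    fn current_key_version(&self) -> String {
        format!("v{}", self.current)
    }
    fn unwrap_dek(&self, version: &str, wrapped: &[u8]) -> Result<Vec<u8>, String> {
        let n: u8 = version
            .trim_start_matches('v')
            .parse()
            .map_err(|_| "unknown key version".to_string())?;
        Ok(wrapped.iter().map(|b| b ^ n).collect())
    }
    fn wrap_dek(&self, dek: &[u8]) -> Result<Vec<u8>, String> {
        Ok(dek.iter().map(|b| b ^ self.current).collect())
    }
}
```

The point of the shape is that `ciphertext` is never decrypted on this path, which is what keeps the stored payload bytes identical across a rotation.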
2026-06-06 noetl/server v2.38.0 Secrets Wallet #61 Phase 6e: cross-region broker (closes Phase 6) (noetl/server#119; closes server#118; tracks noetl/ai-meta#61). Final named round of Phase 6. Phase 6c's residency gate is fail-closed; Phase 6e is the policy-corollary: when a server in region A is denied a credential whose home is B, route the request through a broker in B that re-seals to the requesting worker via the Phase-5 sealing primitives. Cleartext stays in the home region; only the sealed envelope crosses the wire; only the addressed worker can open it. src/secrets/broker.rs: BrokerRegistry (region → broker_url from NOETL_SECRET_BROKER_REGISTRY env, JSON object) + BrokerClient (forwards a sealed-credential request to a peer, maps every failure mode to AppError::CrossRegionUnreachable) + CrossRegionResolveRequest wire-shape struct. src/handlers/cross_region.rs: POST /api/internal/cross-region/resolve peer-server endpoint validates expected_entry_region == server_region() (defensive against misconfigured peer registries: a stale registry can't silently coerce a server into serving credentials for the wrong region; returns 403 on mismatch), resolves locally, seals via Phase-5a primitives to the requesting worker's pubkey, returns the SealedEnvelope. KeychainDef.no_broker_fallback: bool per-credential opt-out for credentials whose policy says "this data physically cannot leave its home region, full stop." AppError::CrossRegionUnreachable { broker_url, cause } new variant → HTTP 502 (distinguishes "policy says no" from "policy says yes via broker but the broker is down"). get_sealed handler: on AppError::ResidencyViolation, look up the entry's region in the BrokerRegistry; when configured, forward to broker via BrokerClient; return the broker's envelope directly. Otherwise propagate the violation per Phase-6c semantics.
noetl_secret_broker_call_total{broker_region, outcome} counter; outcomes: ok / unreachable / denied_by_broker / wrong_region / bad_pubkey / resolve_error / serialize_error / seal_error. wrong_region is the alert-worthy combination: it means a peer's broker registry is out of date. noetl_secret_broker_call_duration_seconds{broker_region} histogram: buckets [0.05, 0.1, 0.25, 0.5, 1, 2, 5] s, observed regardless of outcome. NOETL_SECRET_BROKER_TIMEOUT_SECS env (default 10 s). Ten new unit tests; lib 410 / 0 passing. Lib-only: no schema migration. Broker registry is opt-in via env; deployments without a broker keep pre-6e fail-closed behaviour. After Phase 6, both residency shapes are operational: hard isolation (residency: strict + no broker → fail-closed HTTP 403) + soft federation (residency: strict + broker registered → transparent cross-region routing). That covers the original umbrella goal G7 in full. Server wiki: deployment-specification § Cross-region broker (Phase 6e).
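The routing choice after a residency violation, plus the peer-side region check, can be illustrated with a small sketch. It assumes the registry has already been parsed from the NOETL_SECRET_BROKER_REGISTRY JSON into a map; `Route`, `route_after_violation`, and `peer_guard` are hypothetical names, and whether `no_broker_fallback` is checked before or after the registry lookup is an assumption.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    /// Forward the sealed-credential request to the broker in the home region.
    ForwardToBroker(String),
    /// Keep the Phase-6c fail-closed behaviour (HTTP 403).
    PropagateViolation,
}

/// Requesting-server side: decide what to do once the residency gate denied.
pub fn route_after_violation(
    registry: &HashMap<String, String>, // region -> broker_url
    entry_region: &str,
    no_broker_fallback: bool,
) -> Route {
    if no_broker_fallback {
        // Per-credential opt-out: never leave the home region, even sealed.
        return Route::PropagateViolation;
    }
    match registry.get(entry_region) {
        Some(url) => Route::ForwardToBroker(url.clone()),
        None => Route::PropagateViolation,
    }
}

/// Peer (broker) side: refuse to serve a region this server is not in.
/// The Err value is the HTTP status the release note describes.
pub fn peer_guard(expected_entry_region: &str, server_region: &str) -> Result<(), u16> {
    if expected_entry_region == server_region {
        Ok(())
    } else {
        Err(403)
    }
}
```

A deployment with an empty registry always lands on `PropagateViolation`, which is the pre-6e behaviour.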
2026-06-06 noetl/server v2.37.0 Secrets Wallet #61 Phase 6d primitives: dynamic-secret support + cache honors issuer TTL (noetl/server#117; closes server#116; tracks noetl/ai-meta#61). Phase 6 round 4. Some providers return secrets the issuer expires on a clock: AWS STS bearer tokens (15 min to 12 h), AAD access tokens (1 h), GCP iamcredentials.generateAccessToken (1 h default), OAuth2 access tokens with expires_in. The Phase-3c keychain cache used a fixed 600 s TTL; caching a token past expires_at means the next worker fetch gets a 401 and the playbook step fails. This round ships the primitives + cache plumbing so the concrete cloud-specific dynamic providers can land as separate clean rounds (6d.1 / 6d.2 / 6d.3). SecretValue.expires_at: Option<DateTime<Utc>> field; existing providers pass None since they return long-lived secrets. src/secrets/dynamic.rs::cache_decision returns CacheFor(secs) for normal-case secrets, or SkipCacheAlreadyExpired when the deadline is already past (or inside the operator's safety margin). Honours min(default_ttl, expires_at - now - safety_margin), floored at MIN_EFFECTIVE_TTL_SECS = 5. KEYCHAIN_CACHE_DYNAMIC_SAFETY_MARGIN_SECS env (default 60 s): buffer for clock skew + wall-clock between cache write and next worker fetch. resolve_keychain_entry_with_meta companion to the existing resolve_keychain_entry that also returns the bundle's expires_at. For a map-shaped keychain entry the bundle TTL is the earliest of any contributing secret's expiry (caching past the soonest expiry would mean the next worker fetch gets a 401 from whichever member already expired). CredentialService::resolve_via_provider consumes the helper; the cache-write site honours the issuer-reported TTL. SkipCacheAlreadyExpired → WARN log, bump the skip counter, return the freshly-resolved value without writing the cache.
noetl_secret_dynamic_ttl_seconds histogram (buckets [60, 300, 900, 3600, 14400, 43200] = 1 min / 5 min / 15 min / 1 h / 4 h / 12 h); observed when issuer reports TTL. noetl_secret_cache_skip_total{reason} counter: already_expired for now; future reasons may follow in 6d.x rounds. Seven new unit tests in secrets::dynamic::tests: no expires_at uses default TTL; expires_at far future capped at default TTL; expires_at near future uses expires_at - margin; expires_at already past skips cache; expires_at inside safety margin skips cache; very-small remaining clamped to 5 s; zero safety margin treats expires_at as hard deadline. Lib 398 / 0 passing. Backward compatible: existing public resolve_keychain_entry signature unchanged, providers without expires_at keep the 600 s default cache TTL. Lib-only: no schema migration. Server wiki: deployment-specification § Dynamic short-lived secrets (Phase 6d primitives).
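The TTL rule above, min(default_ttl, expires_at - now - safety_margin) floored at 5 s, can be sketched as a pure function. This version uses epoch seconds as `i64` instead of `DateTime<Utc>` to stay dependency-free, and it assumes a remaining budget of exactly zero counts as "inside the margin"; the real boundary handling is not stated in the release note.

```rust
pub const MIN_EFFECTIVE_TTL_SECS: i64 = 5;

#[derive(Debug, PartialEq, Eq)]
pub enum CacheDecision {
    CacheFor(i64),
    SkipCacheAlreadyExpired,
}

/// `now` and `expires_at` are epoch seconds (a simplification for the sketch).
pub fn cache_decision(
    now: i64,
    expires_at: Option<i64>,
    default_ttl: i64,
    safety_margin: i64,
) -> CacheDecision {
    let deadline = match expires_at {
        // Long-lived secret: keep the fixed default TTL.
        None => return CacheDecision::CacheFor(default_ttl),
        Some(d) => d,
    };
    let remaining = deadline - now - safety_margin;
    if remaining <= 0 {
        // Already past, or inside the operator's safety margin.
        return CacheDecision::SkipCacheAlreadyExpired;
    }
    // Cap at the default TTL, then floor at the minimum effective TTL.
    CacheDecision::CacheFor(remaining.min(default_ttl).max(MIN_EFFECTIVE_TTL_SECS))
}
```

For example, with the defaults from the entry (600 s TTL, 60 s margin), a token expiring in 300 s is cached for 240 s, and one expiring in 30 s is not cached at all.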
2026-06-06 noetl/server v2.36.0 Secrets Wallet #61 Phase 6c: residency-policy gate (noetl/server#115; closes server#114; tracks noetl/ai-meta#61). Phase 6 round 3: the policy gate in front of the routing primitives + provider registry. A credential tagged region: eu-central-1 exists for a reason, usually GDPR-class data-residency obligations or contractual constraints. Phases 6a/6b ensured the fetch routes to the EU endpoint; the cleartext still landed in this server's memory if the server ran elsewhere. This round adds the fail-closed gate that prevents the crossing. KeychainDef.residency: Residency enum, default None: Advisory checks + records violations + lets resolution through (migration-window surface); Strict fail-closes the resolver before any provider call so the cleartext never enters this server's memory; None is back-compat. KeychainDef.allowed_regions: Vec<String> per-credential allowlist for regions other than the home region. Defensive: empty string in the list never matches an empty server region (a misconfigured allowlist can't silently disable the gate). src/secrets/residency.rs: evaluate(&KeychainDef, &str) returns ResidencyDecision (Allow(label) / AllowWithViolationLogged / Deny(AppError)); resolver runs the gate at the top of resolve_keychain_entry; strict-mode denial does NOT call secrets::get_provider, does NOT trigger a ProviderRegistry build, does NOT increment the 6b duration histogram. AppError::ResidencyViolation { credential, entry_region, server_region } new variant → HTTP 403 with a clear "credential X is region-locked to Y; this server is in Z" message that NEVER includes the value. noetl_secret_residency_check_total{policy, decision} counter per agents/rules/observability.md Principle 1. policy ∈ {none, advisory, strict}; decision ∈ {allowed_no_policy, allowed_same_region, allowed_in_allowlist, violation_allowed, violation_blocked}. strict + violation_blocked is alert-worthy.
advisory + violation_allowed is the migration-window signal for finding existing cross-region flows before flipping to strict. Eight new unit tests in secrets::residency::tests: none policy allows everything; strict same-region allows; strict mismatch denies with ResidencyViolation; strict allowlist hit allows; advisory mismatch allows + records the violation; empty entry_region short-circuits to allowed_no_policy under any policy; empty string in allowlist does NOT falsely match empty server region; to_result propagates Deny. Lib 391 / 0 passing. Lib-only: no schema migration (residency + allowed_regions ride the existing keychain JSON blob via the flattened extra map). Backward compatible. Server wiki: deployment-specification § Residency policy (Phase 6c).
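The residency gate's decision table can be sketched as below. The function takes plain fields rather than `&KeychainDef`, and it returns the metric `decision` label directly instead of the `ResidencyDecision` type named above; both simplifications, and the order of the checks, are assumptions made for the example.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Residency {
    None,
    Advisory,
    Strict,
}

/// Mirrors the `decision` label values of noetl_secret_residency_check_total.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    AllowedNoPolicy,
    AllowedSameRegion,
    AllowedInAllowlist,
    ViolationAllowed,
    ViolationBlocked,
}

pub fn evaluate(
    policy: Residency,
    entry_region: &str,
    allowed_regions: &[&str],
    server_region: &str,
) -> Decision {
    // No policy, or no declared home region: nothing to enforce.
    if policy == Residency::None || entry_region.is_empty() {
        return Decision::AllowedNoPolicy;
    }
    if entry_region == server_region {
        return Decision::AllowedSameRegion;
    }
    // An empty server region must never match an empty allowlist entry,
    // so a misconfigured allowlist cannot disable the gate.
    let in_allowlist =
        !server_region.is_empty() && allowed_regions.iter().any(|r| *r == server_region);
    if in_allowlist {
        return Decision::AllowedInAllowlist;
    }
    match policy {
        Residency::Advisory => Decision::ViolationAllowed,
        Residency::Strict => Decision::ViolationBlocked,
        // Unreachable in practice: handled by the early return above.
        Residency::None => Decision::AllowedNoPolicy,
    }
}
```

Only `ViolationBlocked` stops resolution; in the real resolver that is the branch that returns before any provider call is made.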
2026-06-06 noetl/server v2.35.0 Secrets Wallet #61 Phase 6b: ProviderRegistry + per-(provider, region) metrics (noetl/server#113; closes server#112; tracks noetl/ai-meta#61). Phase 6 round 2. Phase 6a (server#111) plumbed region through the resolver; the cache-miss path still rebuilt the provider from env on every fetch: re-reading AWS_ACCESS_KEY_ID, rebuilding the reqwest::Client (TLS bundle reparse on the rustls path), reparsing IMDS / token state. This round memoises that work behind a (provider_id, region)-keyed cache. src/secrets/registry.rs ProviderRegistry: RwLock-protected HashMap<(provider_id, region), Arc<dyn SecretProvider>>, double-checked locking on the build path so concurrent get_or_build for the same key only builds once. Optional TTL via NOETL_SECRET_PROVIDER_TTL_SECONDS env (default 0 = process lifetime); operator escape hatch before Phase 6d's dynamic-secret refresh path lands (short-lived AWS STS / Azure IMDS creds expire faster than the registry's default). Process-global singleton (OnceLock). secrets::get_provider(provider_id, region) public entry point for callers (the resolver + future direct API endpoints). noetl_secret_provider_build_total{provider, region, status="cache_hit|ok|error"} counter; together with the Phase 6a noetl_secret_resolve_total it answers "is the cache effective?" (cache_hit / (ok + cache_hit) ratio) and "is a region's provider down?" (error rate per region). noetl_secret_resolve_duration_seconds{provider, region} histogram: resolver records around fetch; buckets [5 ms, 10 ms, 25 ms, 50 ms, 100 ms, 250 ms, 500 ms, 1 s, 2 s, 5 s] span the range where cloud secret managers + Vault clusters actually live. Observed regardless of outcome so a dashboard surfaces "slow" + "failing" independently (timeouts dominate failure-mode wall-clock). execution_id is NOT a label; it stays on the matching secret.resolve tracing span per Principle 4.
Seven new unit tests in secrets::registry::tests: cache hit returns same Arc; different region / provider → different Arcs; TTL doesn't expire freshly-built entries; TTL=0 treated as no-TTL; concurrent get_or_build returns same Arc with no duplicate insertions under contention; missing entry attempts build and records error without caching the failure. Lib 383 / 0 passing. Lib-only. Server wiki: deployment-specification § Provider registry caching (Phase 6b).
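A minimal sketch of the double-checked locking pattern behind a `(provider_id, region)`-keyed registry, using only the standard library. It is generic over the cached value rather than `Arc<dyn SecretProvider>`, omits the TTL and metrics, and holds the write lock while building, which is one simple way to guarantee a single build per key; the server's actual locking granularity is not described in the release note.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

type Key = (String, String); // (provider_id, region)

pub struct ProviderRegistry<P> {
    inner: RwLock<HashMap<Key, Arc<P>>>,
}

impl<P> ProviderRegistry<P> {
    pub fn new() -> Self {
        Self { inner: RwLock::new(HashMap::new()) }
    }

    /// Return the cached provider, or build and cache it exactly once.
    /// Build failures are returned to the caller and never cached.
    pub fn get_or_build<F>(&self, provider_id: &str, region: &str, build: F) -> Result<Arc<P>, String>
    where
        F: FnOnce() -> Result<P, String>,
    {
        let key = (provider_id.to_string(), region.to_string());

        // First check: shared read lock, the common cache-hit path.
        {
            let map = self.inner.read().unwrap();
            if let Some(hit) = map.get(&key) {
                return Ok(Arc::clone(hit));
            }
        }

        // Second check: another thread may have built the entry while
        // this one was waiting for the write lock.
        let mut map = self.inner.write().unwrap();
        if let Some(hit) = map.get(&key) {
            return Ok(Arc::clone(hit));
        }
        let built = Arc::new(build()?);
        map.insert(key, Arc::clone(&built));
        Ok(built)
    }
}
```

Because a failed build inserts nothing, the next call for the same key simply tries again, matching the "records error without caching the failure" test named above.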
2026-06-06 noetl/server v2.34.0 Secrets Wallet #61 Phase 6a: region tag on keychain entries + per-region routing (noetl/server#111; closes server#110; tracks noetl/ai-meta#61). Starts Phase 6 (residency-aware distributed resolution). Workers + secrets sometimes live in different regions; a credential tagged eu-central-1 should never round-trip through us-east-1 even when the requesting server pod happens to run there. This round establishes the routing primitives; the next rounds (registry caching, residency enforcement, dynamic secrets, cross-region broker) build on top. KeychainDef.region: Option<String>, an optional field on each keychain entry (region: us-east-1). No schema migration; it lives in the existing keychain JSON blob via the extra flattened map. SecretRef.region: Option<String>, provider-agnostic. AWS uses it as the regional endpoint host (secretsmanager.<region>.amazonaws.com) with explicit precedence: <region>: ref prefix > SecretRef.region (Phase 6a) > legacy SecretRef.project overload > default_region (AWS_REGION env). secrets::server_region(): reads NOETL_SERVER_REGION once at startup (OnceLock); used as the resolver's fallback when an entry doesn't declare its own region. resolver::effective_region(&KeychainDef): entry's own region wins, otherwise server_region(), otherwise empty (legacy). noetl_secret_resolve_total{provider, region, status} counter per agents/rules/observability.md Principle 1 (region is bounded-cardinality; execution_id stays on the matching span per Principle 4). Five new unit tests in secrets::resolver::tests: region propagates into SecretRef for both map and map-less shapes; effective_region prefers entry's own field over env; literal "" is treated as unset. Lib 376 / 0 passing. Lib-only and backward compatible: entries without region: resolve exactly as before; the new metric records region="-" for the legacy path.
Azure / Vault / GCP already pick region from their own ref shape today; the SecretRef.region override for those slots into follow-up rounds. Server wiki: deployment-specification § Region routing (Phase 6a).
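The two precedence chains in the Phase 6a entry can be written out as small pure functions. The signatures here take plain `Option<&str>` values rather than `&KeychainDef` / `SecretRef`, and `metric_region_label` is a hypothetical helper illustrating the `region="-"` label for the legacy path.

```rust
/// Treat a literal "" the same as an absent value.
fn non_empty(v: Option<&str>) -> Option<&str> {
    v.filter(|s| !s.is_empty())
}

/// Entry's own region wins, then the server region, then empty (legacy).
pub fn effective_region(entry_region: Option<&str>, server_region: Option<&str>) -> String {
    non_empty(entry_region)
        .or(non_empty(server_region))
        .unwrap_or("")
        .to_string()
}

/// AWS endpoint region precedence:
/// ref prefix > SecretRef.region > legacy project overload > default region.
pub fn aws_region<'a>(
    ref_prefix: Option<&'a str>,
    secret_ref_region: Option<&'a str>,
    legacy_project: Option<&'a str>,
    default_region: &'a str,
) -> &'a str {
    non_empty(ref_prefix)
        .or(non_empty(secret_ref_region))
        .or(non_empty(legacy_project))
        .unwrap_or(default_region)
}

/// Metric label for the region dimension; the legacy path records "-".
pub fn metric_region_label(region: &str) -> &str {
    if region.is_empty() { "-" } else { region }
}
```

Keeping both chains as pure functions makes the "literal empty string is unset" rule easy to pin with unit tests, which is what the five resolver tests named above appear to do.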
2026-06-06 noetl/worker v5.13.0 Secrets Wallet #61 Phase 5c: worker integration: sealed credential delivery (noetl/worker#58; closes worker#57; tracks noetl/ai-meta#61). The worker half of Phase 5: a long-lived X25519 StaticSecret generated once at startup, wrapped in Arc so dispatch clones share the same recipient identity for the worker's lifetime. register_worker includes the base64 public key in the runtime JSON blob, picked up by Phase-5b's RuntimeService::get_worker_public_key (no schema migration). ControlPlaneClient::get_sealed_credential calls GET /api/credentials/{alias}/sealed?worker_id=<name>, decodes the envelope, runs ECDH + HKDF + AEAD via worker-side primitives mirroring the server's src/crypto/sealed.rs (open-only), returns the same Option<Credential> shape get_credential returns. Constants drift-guard test pins SEAL_ALG="x25519-hkdf-sha256-chacha20-poly1305", SEAL_V=1, KDF_INFO=b"noetl-sealed-v1" against the server. auth_alias::fetch_credential_maybe_sealed env-gates the transition: `NOETL_SEALED_CREDENTIALS=true
2026-06-06 noetl/server v2.33.0 Secrets Wallet #61 Phase 5b: wire format + sealing endpoint (noetl/server#109; closes server#108; tracks noetl/ai-meta#61). Wires the Phase-5a primitives into the request path. New GET /api/credentials/{identifier}/sealed?worker_id=<name> returns a SealedEnvelope JSON addressed to the named worker: the server looks up the worker's registered X25519 public key, fetches the credential with include_data=true, serialises the standard CredentialResponse JSON, seals it with the Phase-5a primitives, returns the envelope. Worker opt-in is schema-migration-free: RuntimeService::get_worker_public_key(name) reads runtime["worker_public_key"] (base64) out of the existing noetl.runtime.runtime JSONB column; workers just include {"worker_public_key":"<b64>"} in their register payload's runtime blob. 400 BadRequest with a clear "did not register a sealing pubkey" message when the worker_pool row exists but didn't register a pubkey (covers workers that haven't restarted with the 5c branch yet). Observability per agents/rules/observability.md Principle 1: `noetl_credentials_sealed_total{status="ok"
2026-06-06 noetl/server v2.32.0 Secrets Wallet #61 Phase 5a: sealed payload crypto primitives (noetl/server#107; closes server#106; tracks noetl/ai-meta#61). Starts Phase 5 (sealed payload delivery): defense-in-depth on top of the now-merged Phase-4 mTLS transport. mTLS encrypts the wire; sealing encrypts the credential payload to a key only the recipient worker holds: cleartext never enters the response body, and an operator with kubectl exec on the server pod sees only ciphertext. src/crypto/sealed.rs: standard ephemeral-static sealed box shape: fresh ephemeral X25519 keypair per call → ECDH shared secret → HKDF-SHA256 derives a 32-byte AEAD key + 12-byte nonce via a domain-separated info string → ChaCha20-Poly1305 encrypts with AAD = `
2026-06-06 noetl/server v2.31.0 Secrets Wallet #61 providers 3.x: AWS Secrets Manager + Azure Key Vault (noetl/server#105; closes server#104; tracks noetl/ai-meta#61). Fourth + fifth backends behind the secrets::SecretProvider trait; the 5-provider matrix is complete (GCP / K8s / Vault / AWS / Azure). AwsSmSecretProvider (provider: aws / aws_sm): JSON-over-POST against secretsmanager.<region>.amazonaws.com/ action secretsmanager.GetSecretValue with hand-rolled AWS Signature Version 4 signing (hmac + sha2 + hex, no aws-sdk dep tree); derive_signing_key chain verified by a unit test against AWS's published reference vector (20150830 / us-east-1 / iam); the final HMAC key bytes match exactly. Ref shape [<region>:]<secret-id>[#<json-key>] with JSON-key extraction for AWS's recommended multi-field convention. Creds from env: AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN (the IRSA-injected triple). AzureKeyVaultProvider (provider: azure / azure_kv): REST GET https://<vault>.vault.azure.net/secrets/<name>[/<version>]?api-version=7.4 authenticated by IMDS Managed Identity (AKS / VMs) bearer with TTL caching. Ref shape [<vault>/]<secret-name>[#<version>]; sovereign clouds via NOETL_AZURE_KEYVAULT_DNS_SUFFIX. 21 new unit tests; lib 357/0; clippy lib-clean. Cloud-only backends: like GCP, full e2e resolution kind-validates at the unit-test layer. Kind probe: server starts cleanly with AWS+Azure env vars set. Follow-ups: STS AssumeRoleWithWebIdentity (AWS), AAD client-credentials (Azure). Server wiki: deployment-specification § Secret providers.
2026-06-06 noetl/worker v5.12.0 Secrets Wallet #61 Phase 4b: worker control-plane mTLS client (noetl/worker#56; closes worker#55; tracks noetl/ai-meta#61). The worker half of Phase-4 transport mTLS: the worker's ControlPlaneClient (credential fetch, command claims, event posts) was a plain reqwest client. New src/client/tls.rs: plain HTTP by default; NOETL_TLS_CLIENT_CERT+NOETL_TLS_CLIENT_KEY present a client cert (reqwest::Identity) for mTLS, NOETL_TLS_CA trusts a private-CA server (Certificate::from_pem_bundle). Built on the rustls-tls reqwest backend already in use; built once in ControlPlaneClient::new, shared across with_server_url dispatch clones. Fail-fast on partial/bad config. 5 unit tests, lib 117/0. Cross-repo kind-val (server in Phase-4a mTLS mode + this worker with a clientAuth cert): client::tls: TLS enabled mtls=true ca=true → Worker registered over https+mTLS, 0 heartbeat failures; a fixtures/playbooks/hello_world execution → COMPLETED (command claim + events all over mTLS), rust worker the sole worker. Finding (→ 4c): the wait-for-api init container curls plain HTTP → blocks against an mTLS server. Worker wiki: deployment-specification § Transport security.
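The worker's env-driven TLS mode selection, including the fail-fast rule for a half-configured client certificate, can be sketched without any TLS library. The function takes the three env values as parameters so it stays testable; `ClientTls` and `client_tls` are hypothetical names, and the real module goes on to build `reqwest::Identity` / `Certificate` values, which is out of scope here.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ClientTls {
    /// No TLS env set: plain HTTP, the default.
    Plain,
    /// `mtls`: a client cert is presented; `ca`: a private CA is trusted.
    Tls { mtls: bool, ca: bool },
}

/// Arguments correspond to NOETL_TLS_CLIENT_CERT, NOETL_TLS_CLIENT_KEY
/// and NOETL_TLS_CA respectively.
pub fn client_tls(
    cert: Option<&str>,
    key: Option<&str>,
    ca: Option<&str>,
) -> Result<ClientTls, String> {
    let mtls = match (cert, key) {
        (Some(_), Some(_)) => true,
        (None, None) => false,
        // Fail fast on a partial client-cert configuration.
        _ => {
            return Err(
                "NOETL_TLS_CLIENT_CERT and NOETL_TLS_CLIENT_KEY must be set together".to_string(),
            )
        }
    };
    if !mtls && ca.is_none() {
        return Ok(ClientTls::Plain);
    }
    Ok(ClientTls::Tls { mtls, ca: ca.is_some() })
}
```

With all three values set this yields `Tls { mtls: true, ca: true }`, the combination reported in the kind-validation log line above.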
2026-06-06 noetl/server v2.30.0 Secrets Wallet #61 Phase 4a: opt-in TLS/mTLS listener for the control-plane API (noetl/server#103; closes server#102; tracks noetl/ai-meta#61). The transport half of sealed secret delivery: the worker-server credential channel (GET /api/credentials/<alias>) was plain HTTP. New src/tls.rs: plain HTTP unchanged by default; NOETL_TLS_CERT+NOETL_TLS_KEY → HTTPS; +NOETL_TLS_CLIENT_CA → mTLS (client cert required + verified, WebPkiClientVerifier). Built on the ring rustls provider the stack already uses (no aws-lc-rs clash); axum-server bind_rustls + graceful-shutdown Handle. 5 unit tests, lib 335/0. Kind-validated (CA + server cert (SAN localhost/svc-DNS) + clientAuth cert via a Secret): Server listening (TLS) tls=true mtls=true; curl --cacert + --cert/--key → 200, no client cert → TLS-rejected, plain HTTP → refused. Operational finding: mTLS breaks HTTP(S) K8s probes → tcpSocket (deployment-spec wiki documents it). Phases 4/5 reordered: transport mTLS = Phase 4 (4a server / 4b worker / 4c ops), payload sealing = Phase 5. Server wiki: deployment-specification § Transport security.
2026-06-05 noetl/server v2.29.0 Secrets Wallet #61 providers 3.x: HashiCorp Vault (KV v2) provider (noetl/server#101; closes server#100; tracks noetl/ai-meta#61). VaultSecretProvider behind the secrets::SecretProvider trait: a provider: vault keychain alias resolves from a Vault KV v2 secret: GET <addr>/v1/<mount>/data/<path> with X-Vault-Token; reference shape [<mount>/]<path>#<key> (bare → single-key), metadata.version carried. Config from VAULT_* / NOETL_VAULT_* env. Like Kubernetes Secrets, Vault runs in-cluster; it is the second backend fully kind-validatable end-to-end. 12 unit tests, lib 330/0. Kind-validated: a dev-mode Vault + vault kv put secret/duffel token=…, a provider: vault alias resolved the live value via GET /api/credentials/<alias> (keychain.resolve provider="vault" → secrets::vault secret.fetch). Follow-up: Vault K8s auth method (SA JWT → token) to drop the static VAULT_TOKEN. Server wiki: deployment-specification § Secret providers.
2026-06-05 noetl/server v2.28.1 /api/executions list query candidate-first rewrite + status-drift fix (noetl/server#99; closes server#98; #62). The list query GROUP BY'd the entire noetl.event table to find the N most-recent executions: a parallel seq scan of ~3.2M rows on the kind DB (6.5 s), scaling with event volume. Rewritten candidate-first: stage recent picks the N most-recent executions from their per-execution start event (event_type IN ('playbook.initialized','playbook_started'), indexed), then stage stats aggregates over only those candidates' events (indexed by execution_id); the start event is the first event so MIN(created_at) is unchanged, giving an identical list + order. Also fixes the list-vs-detail status drift: the old MAX(CASE … ELSE 'RUNNING') string-MAX always reported RUNNING ('RUNNING' sorts after the terminal states); replaced with a bool_or terminal-state priority. Kind-validated (3.2M events / 2805 execs): no-filter list 6.5 s → 0.015 s (~430×), identical list; statuses corrected (COMPLETED 41 / FAILED 6 / RUNNING 3, was all-RUNNING); status= filters return correct rows in <0.12 s (were 0 rows / 9 s).
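The status-drift bug is easy to reproduce outside SQL, since string MAX and Rust's `&str` ordering are both lexicographic. The sketch below models the old and new aggregations over a list of events; the `EventKind` variants are hypothetical, and ranking FAILED above COMPLETED in the fixed version is an assumption about the priority order, which the release note does not spell out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EventKind {
    Started,
    StepDone,
    Completed,
    Failed,
}

/// Per-event label, like the CASE expression: terminal events map to
/// their state, everything else falls through to 'RUNNING'.
fn label(e: EventKind) -> &'static str {
    match e {
        EventKind::Completed => "COMPLETED",
        EventKind::Failed => "FAILED",
        _ => "RUNNING",
    }
}

/// Old behaviour: lexicographic MAX over the labels. Any non-terminal
/// event contributes "RUNNING", which sorts after both terminal states.
pub fn status_string_max(events: &[EventKind]) -> &'static str {
    events.iter().map(|e| label(*e)).max().unwrap_or("RUNNING")
}

/// Fixed behaviour: bool_or-style flags with an explicit terminal priority.
pub fn status_terminal_priority(events: &[EventKind]) -> &'static str {
    let any_failed = events.iter().any(|e| *e == EventKind::Failed);
    let any_completed = events.iter().any(|e| *e == EventKind::Completed);
    if any_failed {
        "FAILED"
    } else if any_completed {
        "COMPLETED"
    } else {
        "RUNNING"
    }
}
```

Every execution has at least one non-terminal event (its start event), which is why the old aggregation reported RUNNING for all of them.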
2026-06-05 noetl/server v2.28.0 Secrets Wallet #61 providers 3.x: Kubernetes Secrets provider (noetl/server#97; closes server#96; tracks noetl/ai-meta#61). K8sSecretProvider behind the existing secrets::SecretProvider trait: a provider: k8s (or kubernetes) keychain alias resolves from an in-cluster Secret, read from the API server with the pod's ServiceAccount bearer token + cluster CA. Reference shape [<namespace>/]<secret>/<key>; config from NOETL_K8S_* env with projected-SA defaults; token re-read per fetch. The first secret backend fully kind-validatable end-to-end: no cloud / metadata server (GCP needs GKE). 11 unit tests, lib 318/0. Kind-validated: a provider: k8s alias resolved a real Secret via GET /api/credentials/<alias> (keychain.resolve provider="k8s" → secrets::k8s secret.fetch). Needs secrets: [get, list] RBAC on the server SA (ops follow-up). Server wiki: deployment-specification § Secret providers.
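Parsing the `[<namespace>/]<secret>/<key>` reference shape might look like the following. `K8sRef` and `parse_k8s_ref` are hypothetical names, the default namespace is passed in rather than read from the NOETL_K8S_* env, and rejecting empty segments is an assumption about validation the release note does not describe.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct K8sRef {
    pub namespace: String,
    pub secret: String,
    pub key: String,
}

/// Accepts "<secret>/<key>" (namespace defaulted) or "<namespace>/<secret>/<key>".
pub fn parse_k8s_ref(raw: &str, default_namespace: &str) -> Result<K8sRef, String> {
    let parts: Vec<&str> = raw.split('/').collect();
    let (namespace, secret, key) = match parts.as_slice() {
        [secret, key] => (default_namespace, *secret, *key),
        [namespace, secret, key] => (*namespace, *secret, *key),
        _ => {
            return Err(format!(
                "expected [<namespace>/]<secret>/<key>, got {:?}",
                raw
            ))
        }
    };
    if namespace.is_empty() || secret.is_empty() || key.is_empty() {
        return Err(format!("empty segment in secret reference {:?}", raw));
    }
    Ok(K8sRef {
        namespace: namespace.to_string(),
        secret: secret.to_string(),
        key: key.to_string(),
    })
}
```

For instance, `parse_k8s_ref("db-creds/password", "noetl")` resolves to namespace `noetl`, while a three-segment reference overrides the namespace explicitly.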
2026-06-05 noetl/tools v2.19.3 python tool accepts nested script.source.code (inline) shape (noetl/tools#33; closes tools#32; #63 round 1). Canonical v10 fixtures express Python source via script: { uri, source: { type, code } }, not only flat code:, so test_script_loading FAILED in the #54 sweep with missing field code. PythonConfig.code is now optional; resolve_code() accepts flat code: OR script.source.code (inline, also the type-less default; flat wins when both present). External sources (type: file/gcs/http) error clearly ("not yet supported") instead of a confusing decode failure. 5 unit tests, lib 233/0. Round 2 (#63): the file/gcs/http loaders + worker adoption + kind-val.
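The resolution order described for `resolve_code()` can be sketched with plain structs. The real `PythonConfig` is deserialized with serde and nests `source` under `script`; here the nesting is flattened into a hypothetical `script_source` field, and the error strings are illustrative.

```rust
#[derive(Debug, Default)]
pub struct ScriptSource {
    /// The `type` field; None means the type-less default (inline).
    pub kind: Option<String>,
    pub code: Option<String>,
}

#[derive(Debug, Default)]
pub struct PythonConfig {
    /// Flat `code:` form; now optional.
    pub code: Option<String>,
    /// Stand-in for the nested `script.source` block.
    pub script_source: Option<ScriptSource>,
}

pub fn resolve_code(cfg: &PythonConfig) -> Result<&str, String> {
    // Flat `code:` wins when both shapes are present.
    if let Some(code) = cfg.code.as_deref() {
        return Ok(code);
    }
    let src = cfg
        .script_source
        .as_ref()
        .ok_or_else(|| "no code: and no script.source provided".to_string())?;
    match src.kind.as_deref() {
        // Inline is also the default when `type` is omitted.
        None | Some("inline") => src
            .code
            .as_deref()
            .ok_or_else(|| "script.source.code is missing".to_string()),
        // file / gcs / http loaders are deferred to a later round.
        Some(other) => Err(format!("script source type '{}' is not yet supported", other)),
    }
}
```

Returning an explicit "not yet supported" error for external source types is what replaces the earlier, less helpful decode failure.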
2026-06-05 noetl/server v2.27.2 Orchestrator emits terminal playbook.failed on a deterministic evaluate failure (noetl/server#95; closes server#94; surfaced by noetl/ai-meta#54 e2e sweep). A deterministic evaluate failure (an invalid template in a step code body, e.g. {{ ctx.* }} rendered by engine::commands::build_tool_command; an unknown step in a next arc; malformed routing) was caught in a WARN-only arm and emitted no terminal event, stranding the execution in RUNNING forever (GET /api/executions/{id} never resolved). trigger_orchestrator now emits a terminal playbook.failed (status FAILED, error surfaced in result.context.error, parented on the trigger) on an evaluate Err; transient/infra errors before evaluate stay WARN-only / retryable. Same stall class as the #58 command.failed fix. Unit test locks the precondition (invalid template body → evaluate returns Err); lib suite 307/0. Kind-validated: bad-template playbook → FAILED (was RUNNING-forever); companion fixture fix noetl/e2e#28. Server wiki: event-envelope.
2026-06-05 noetl/server v2.27.1 Parser: NextSpec untagged-variant order: list-form next no longer drops arcs (noetl/server#93; closes server#92). NextSpec is #[serde(untagged)] with the struct Router first; serde deserializes a YAML sequence into a struct positionally, so next: [{step: x}] was eaten by Router (spec from element 0, arcs → []), silently dropping targets, which also made validate_next_refs miss unknown-step references (test_parse_invalid_next_reference failed). Reordered so the sequence-shaped variants precede the struct. Full lib suite 306/0 (was 305/1); canonical { spec, arcs } map form unaffected.
2026-06-05 noetl/server v2.27.0 Secrets Wallet Phase 3c: execution-scoped keychain cache + keychain storage fix (noetl/server#91; closes server#90; tracks noetl/ai-meta#61). Write-through cache so a provider:-backed keychain alias isn't re-fetched from its secret manager per credential lookup: CredentialService composes KeychainService; read before the provider fetch (best-effort; degrades to a fresh resolution, never fails the lookup) + write after (envelope-encrypted, 600s TTL, execution-scoped). Keychain storage fix the cache surfaced: the queries never matched noetl.keychain (id/data/updated_at vs the real cache_key PK / data_encrypted TEXT / NOT-NULL credential_type,cache_type CHECK `secret
2026-06-05 noetl/server v2.26.0 Secrets Wallet Phase 3b R3b: keychain alias resolves from GCP Secret Manager (Phase 3 functionally complete) (noetl/server#89; closes server#88; tracks noetl/ai-meta#61). Wires the resolver into the credential path: CredentialService::get takes execution_id (the worker already sent it); on a credential-store miss with include_data, try_resolve_keychain loads the execution's playbook (start event → catalog_id → content) + workload, find_keychain(alias), and for a provider:-backed entry runs build_secret_provider + resolve_keychain_entry → the live secret, masked downstream. Non-keychain aliases fall through to not-found; store-hit path unchanged. Kind-validated end-to-end: GET /api/credentials/<gcp-keychain-alias>?execution_id=<eid> logged keychain.resolve with provider="gcp" and dispatched through find_keychain → build_secret_provider → resolve_keychain_entry → GcpSecretManager.fetch → Workload-Identity token (fails on kind with no metadata server; succeeds on GKE). auth: "{{ alias }}" against a provider: gcp keychain entry now resolves from GCP Secret Manager; the secret never enters the workflow data flow.
2026-06-05 noetl/server v2.25.0 Secrets Wallet Phase 3b R3a: keychain secret-source resolver logic (noetl/server#87; closes server#86; tracks noetl/ai-meta#61). The pure resolution algorithm: resolve_keychain_entry(kc, workload, provider) renders each map secret-path template against the execution workload (TemplateRenderer), fetches via the provider, and assembles { key: secret_value } (the "auth object as a map" shape: several secrets in one entry); a map-less entry resolves to a single value. Plus build_secret_provider(provider) factory (gcp → GcpSecretManager; mirrors crypto::build_key_manager). 4 async unit tests vs a fake provider; unwired pending the R3b cache-miss hook.
2026-06-05 noetl/server v2.24.0 Secrets Wallet Phase 3b R2: model keychain secret-source defs (noetl/server#85; closes server#84; tracks noetl/ai-meta#61). The playbook keychain: block is already persisted (catalog raw YAML), so this is modeling, not new storage: KeychainDef gains typed provider / auth / map (lifted out of the unused extra flatten; map is the "auth object as a map" multi-secret shape), and Playbook::find_keychain(alias) looks up how an auth: "{{ alias }}" reference is sourced. 3 unit tests; unwired pending the R3 resolver hook.
2026-06-05 noetl/server v2.23.0 Secrets Wallet Phase 3b R1: server-side GCP Secret Manager client (noetl/server#83; closes server#82; tracks noetl/ai-meta#61). The secret-resolution engine moves off the worker (the standalone tool was removed in tools v2.19.2) onto the server, so the keychain/credential resolver can fetch a secret on a cache miss, cache it envelope-encrypted (Phase 1), and return it masked. Adds secrets::{SecretProvider, SecretRef, SecretValue} + secrets::GcpSecretManager: Secret Manager REST :access + ambient GKE Workload-Identity token (shares the metadata-token pattern with crypto::GcpKms), token cached; short id or fully-qualified projects/.../secrets/... path. 8 unit tests. Unwired this round (like GcpKms was added before the KMS factory wired it); the R3 resolver hook validates the live GCP fetch on GKE.
2026-06-05 noetl/tools v2.19.2 Standalone secrets tool removed: resolution moves server-side (noetl/tools#31; tracks noetl/ai-meta#61). The secrets workflow tool returned the resolved secret as step output, putting the raw value into the data flow / shared cache / event log, sidestepping the auth:-path masking (worker scrub.rs) and the response-redaction contract. Removed the tool + its orphaned secrets/ provider module; the GCP Secret Manager client is re-homing into the noetl-server keychain resolver (Phase 3b, next to the existing GcpKms). No functional regression: the server only ever emitted secrets for this tool, and no real playbook resolved secrets through it (keychain entries are credential-source declarations, not tool steps). e2e actions_test's test_secrets step removed in lockstep (noetl/e2e#27). 228 lib tests pass.
2026-06-05 noetl/tools v2.19.1 Single secrets tool kind: drop the secret_manager alias (noetl/tools#30; tracks noetl/ai-meta#61). One formal name for secret resolution: removed the behavior-identical secret_manager tool alias + collapsed the gcp provider match to a single gcp spelling. Safe: the server's ToolKind only ever emitted secrets, so the alias was unreachable via dispatch. Deviating playbooks (all keychain entries) migrated to kind: secrets across e2e#26 / ops#161 / travel#58.
2026-06-05 noetl/tools v2.19.0 Secrets Wallet Phase 3a: GCP Secret Manager provider (noetl/tools#29; closes tools#28; tracks noetl/ai-meta#61). The secrets / secret_manager tools now resolve a secret reference to its value at step-execution time, dispatching on a provider field to a backend behind a new SecretProvider trait (noetl-tools::secrets). First provider: gcp, Google Secret Manager via the REST :access endpoint + ambient GKE Workload-Identity token (same token source as the Phase 2 Cloud KMS KeyManager); no SA key material in worker env (execution-model.md's secrets rule). env provider unchanged. SecretsConfig gains project + version; short id or fully-qualified projects/.../secrets/... path. 8 unit tests (URL shaping + base64 response parsing); live GCP fetch validated on GKE (no Workload Identity on kind). AWS SM / Azure KV / Vault / K8s follow behind the same trait.
2026-06-05 noetl/server v2.22.0 Secrets Wallet Phase 2: GCP Cloud KMS KeyManager + runtime provider selection (noetl/server#81; closes server#80; tracks noetl/ai-meta#61). GcpKms implements KeyManager: wrap_dek/unwrap_dek call Cloud KMS REST :encrypt/:decrypt, authenticated by ambient GKE Workload-Identity token (metadata server, cached), over reqwest (no gRPC dep). Runtime factory NOETL_KMS_PROVIDER selects local (default; LocalDevKms over the master key) or gcp-kms (NOETL_GCP_KMS_KEY = cryptoKey resource). Services take the shared EnvelopeCipher; stored record format unchanged across providers. 24 crypto tests. Kind-validated on local (no regression: kms_provider=local, creds decrypt, fixtures green); GCP KMS round-trip is an operator GKE step. The KEK can now leave the process.
2026-06-05 noetl/server v2.21.0 Secrets Wallet Phase 1c/1d: envelope-encrypt credential + keychain storage (live) (noetl/server#79; closes server#78; tracks noetl/ai-meta#61). Credentials + keychain entries now store under envelope encryption: a per-record DEK (AES-256-GCM), the DEK wrapped by the KEK (LocalDevKms over the master key today; GCP Cloud KMS in Phase 2). Self-describing storage blob {v,alg,ct,dek:{p,kid,kv,ct}} in the existing column; no schema migration. Forward-only: no legacy master-key path (legacy records rejected; re-register). Kind-validated: raw stored form is the envelope JSON, 21 creds re-registered + decrypt, iterator_save_test / json_serialization_save green end-to-end. Phase 1 (envelope encryption) functionally complete with v2.20.0 (1b core) + v2.19.8 (1a fail-closed key).
2026-06-05 noetl/server v2.20.0 Secrets Wallet Phase 1b: envelope-encryption core (noetl/server#77; closes server#76; tracks noetl/ai-meta#61). KeyManager async trait (KEK boundary) + WrappedDek + LocalDevKms (wraps DEKs with the AES-256-GCM master key; GCP/AWS/Azure/Vault KMS land in Phase 2 behind the same trait) + EnvelopeCipher (per-record DEK → AES-256-GCM → wrap; DEKs zeroized). Library-only; 8 unit tests.
2026-06-05 noetl/server v2.19.8 Secrets Wallet Phase 1a: remove all-zeros default encryption key, fail closed (noetl/server#75; closes server#74; tracks noetl/ai-meta#61). The server silently fell back to a hardcoded all-zeros AES key when NOETL_ENCRYPTION_KEY was unset, equivalent to no encryption. Removed; resolve_encryption_key now fails closed (refuses to start) unless a key is configured, with an explicit NOETL_ALLOW_INSECURE_DEFAULT_KEY dev escape hatch (random ephemeral key + loud warning). 5 unit tests. Kind-validated: server reads the key from noetl-secret, existing creds still decrypt, fixtures green. First step of the Secrets Wallet initiative.
2026-06-05 noetl/tools v2.18.5 Dollar-quote-aware statement splitter (noetl/tools#27; tracks noetl/ai-meta#54). The multi-statement splitter (2.18.3) split on ; inside PostgreSQL $$ … $$ dollar-quoted blocks, shredding plpgsql function bodies / DO blocks (unterminated dollar-quoted string). Splitter now tracks dollar-quote state ($$/$tag$; $1 params excluded). Surfaced by postgres_jsonb_test on the regression sweep.
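The dollar-quote tracking described in this entry can be illustrated with a minimal sketch. It is not the noetl-tools implementation: `split_statements` and its helpers are hypothetical names, and the sketch assumes only single-quoted strings and dollar-quoted blocks need protecting (SQL comments and double-quoted identifiers are ignored).

```rust
/// Returns the dollar-quote tag starting at `start` (e.g. "$$" or "$body$"),
/// or None when the `$` is something else, such as a `$1` parameter.
fn dollar_tag_at(chars: &[char], start: usize) -> Option<String> {
    let mut j = start + 1;
    while j < chars.len() {
        let c = chars[j];
        if c == '$' {
            return Some(chars[start..=j].iter().collect());
        }
        // A tag is identifier-like and must not begin with a digit.
        let is_first = j == start + 1;
        let ok = c == '_' || c.is_alphabetic() || (!is_first && c.is_ascii_digit());
        if !ok {
            return None;
        }
        j += 1;
    }
    None
}

fn starts_with_at(chars: &[char], at: usize, pat: &str) -> bool {
    let pat: Vec<char> = pat.chars().collect();
    chars.len() >= at + pat.len() && chars[at..at + pat.len()] == pat[..]
}

/// Splits on `;` only when outside single quotes and dollar-quoted blocks.
pub fn split_statements(sql: &str) -> Vec<String> {
    let chars: Vec<char> = sql.chars().collect();
    let mut statements = Vec::new();
    let mut current = String::new();
    let mut in_single_quote = false;
    let mut dollar_tag: Option<String> = None; // Some(tag) while inside a block
    let mut i = 0;

    while i < chars.len() {
        let c = chars[i];
        if let Some(tag) = dollar_tag.clone() {
            // Inside a dollar-quoted block only the matching tag closes it.
            if c == '$' && starts_with_at(&chars, i, &tag) {
                current.push_str(&tag);
                i += tag.chars().count();
                dollar_tag = None;
            } else {
                current.push(c);
                i += 1;
            }
            continue;
        }
        if in_single_quote {
            // A doubled '' escape closes and immediately reopens the string.
            if c == '\'' {
                in_single_quote = false;
            }
            current.push(c);
            i += 1;
            continue;
        }
        match c {
            '\'' => {
                in_single_quote = true;
                current.push(c);
                i += 1;
            }
            '$' => match dollar_tag_at(&chars, i) {
                Some(tag) => {
                    current.push_str(&tag);
                    i += tag.chars().count();
                    dollar_tag = Some(tag);
                }
                None => {
                    current.push(c);
                    i += 1;
                }
            },
            ';' => {
                if !current.trim().is_empty() {
                    statements.push(current.trim().to_string());
                }
                current.clear();
                i += 1;
            }
            _ => {
                current.push(c);
                i += 1;
            }
        }
    }
    if !current.trim().is_empty() {
        statements.push(current.trim().to_string());
    }
    statements
}
```

With this sketch a plpgsql body such as `$$ BEGIN RETURN 1; END; $$` stays inside a single statement, while a `$1` parameter is treated as ordinary text.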
2026-06-05 noetl/tools v2.18.4 Canonical v10 config shapes: duckdb command: alias + http param/header coercion (noetl/tools#26; tracks noetl/ai-meta#54). duckdb accepts command: as an alias for query: (parity with postgres; 9+ fixtures use it). http params/headers/form coerce non-string template-rendered values: integers (offset: "{{ ctx.offset }}") and null (undefined ref) were rejected by the HashMap<String,String> fields (invalid type: integer/null, expected a string); numbers/bools stringified, null entries dropped. Adopted by worker#51 (chore(deps)). Validated on the Rust-only kind stack: pagination/{offset,cursor,max_iterations,pipeline} + duckdb_retry_query + retry_simple_config reach playbook.completed.
2026-06-05 noetl/tools v2.18.3 Multi-statement SQL + postgres command: alias (noetl/tools#24; closes tools#23; tracks noetl/ai-meta#54). Canonical v10 postgres steps use command: (incl. v10_canonical_example.yaml's own step); the config required query. Both postgres (query()/execute() extended protocol) and duckdb (prepare()) rejected multi-statement blocks (CREATE …; INSERT …; SELECT …). Added the command alias + a quote-aware statement splitter (leading statements batched via batch_execute/execute_batch, final statement keeps the typed single-statement path so a trailing SELECT returns typed rows; param-free SQL only). noetl/tools#25 added a task_sequence→duckdb regression test (no release). Adopted by worker#50 (chore(deps)). Validated: duckdb_test + json_serialization_save reach playbook.completed.
2026-06-05 noetl/tools v2.18.2 postgres tool surfaces the real SQL error instead of generic "db error" (noetl/tools#22; closes tools#21; tracks noetl/ai-meta#54). tokio_postgres::Error's Display renders just db error for server-side failures; the real SQLSTATE + message is in the attached DbError. format_pg_error surfaces severity: message (SQLSTATE code) + DETAIL/HINT, with a source-chain fallback for connection/type errors. Wired into the Query + Execute sites. Picked up by the worker via worker#49 (dep bump). Validated on the Rust-only kind stack: a bad query reports ERROR: relation "..." does not exist (SQLSTATE 42P01) in the call.error event.
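The `severity: message (SQLSTATE code)` layout in this entry can be sketched on a plain struct. `DbErrorInfo` and `format_db_error` are hypothetical stand-ins rather than the tokio_postgres types or the real `format_pg_error`, and the way DETAIL and HINT are appended is an assumption.

```rust
/// Hypothetical stand-in for the fields a server-side database error carries.
pub struct DbErrorInfo {
    pub severity: String,
    pub message: String,
    pub code: String, // SQLSTATE, e.g. "42P01"
    pub detail: Option<String>,
    pub hint: Option<String>,
}

/// Renders "SEVERITY: message (SQLSTATE code)" and appends DETAIL / HINT
/// when present. The "; DETAIL: ..." separator is an assumed layout.
pub fn format_db_error(e: &DbErrorInfo) -> String {
    let mut out = format!("{}: {} (SQLSTATE {})", e.severity, e.message, e.code);
    if let Some(detail) = &e.detail {
        out.push_str(&format!("; DETAIL: {}", detail));
    }
    if let Some(hint) = &e.hint {
        out.push_str(&format!("; HINT: {}", hint));
    }
    out
}
```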
2026-06-05 noetl/server v2.19.7 Defer task_sequence _prev/_results refs at command build (noetl/server#73; closes server#72; tracks noetl/ai-meta#54). The server pre-rendered the entire task_sequence pipeline config at command-build time, so {{ _prev.* }} / {{ _results.* }} (task_sequence RUNTIME values) rendered against an undefined _prev and, since the v2.19.5 Chainable change, silently became empty strings → malformed sub-task configs (empty SQL VALUES). New render_value_deferring masks {{ ... }} blocks referencing a deferred root with a null-delimited placeholder, renders the rest, then restores verbatim; the Pipeline branch defers _prev/_results. Keeps worker keychain-alias resolution working ({{ pg_auth }} still resolves at build time). Validated: iterator_save_test reaches playbook.completed and writes 3 rows (item1/100, item2/200, item3/300) to the real demo_noetl DB, the deepest v10 path (iterator → task_sequence pipeline → _prev chaining → nested credential → postgres write). 3 new template unit tests.
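The mask, render, restore sequence can be sketched as two helpers. This is an illustration, not the server's `render_value_deferring`: the function names, the `\0<index>\0` placeholder shape, and the way a block is matched against a deferred root are all assumptions.

```rust
/// True when any dotted path inside the expression starts with a deferred root.
fn references_root(expr: &str, roots: &[&str]) -> bool {
    expr.split(|c: char| !(c.is_alphanumeric() || c == '_' || c == '.'))
        .filter_map(|path| path.split('.').next())
        .any(|root| roots.contains(&root))
}

/// Replaces every `{{ ... }}` block that references a deferred root with a
/// null-delimited placeholder and returns the original blocks for later.
pub fn mask_deferred(template: &str, deferred_roots: &[&str]) -> (String, Vec<String>) {
    let mut masked = String::new();
    let mut saved: Vec<String> = Vec::new();
    let mut rest = template;
    while let Some(open) = rest.find("{{") {
        let close_rel = match rest[open..].find("}}") {
            Some(pos) => pos,
            None => break, // unterminated block: leave the tail untouched
        };
        let end = open + close_rel + 2;
        let block = &rest[open..end];
        let inner = &block[2..block.len() - 2];
        masked.push_str(&rest[..open]);
        if references_root(inner, deferred_roots) {
            masked.push_str(&format!("\0{}\0", saved.len()));
            saved.push(block.to_string());
        } else {
            masked.push_str(block);
        }
        rest = &rest[end..];
    }
    masked.push_str(rest);
    (masked, saved)
}

/// Puts the saved blocks back verbatim after the rest has been rendered.
pub fn restore_deferred(rendered: &str, saved: &[String]) -> String {
    let mut out = rendered.to_string();
    for (idx, block) in saved.iter().enumerate() {
        out = out.replace(&format!("\0{}\0", idx), block);
    }
    out
}
```

In use, the masked string goes through the normal template renderer (so `{{ pg_auth }}` resolves at build time) and `restore_deferred` then reinstates `{{ _prev.value }}` for the worker to render at run time.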
2026-06-05 noetl/worker v5.11.3 Resolve keychain aliases on task_sequence sub-tasks (noetl/worker#48; closes worker#47; tracks noetl/ai-meta#54). task_sequence pipelines carry the keychain alias on each SUB-task (credential: inside a pipeline entry), not the outer envelope; the task_sequence tool dispatches sub-tasks through noetl-tools (no ControlPlaneClient), so nested aliases never resolved; nested postgres/http steps fell back to a default unreachable connection. Split resolve_auth_alias into a dispatcher + single-tool helper; for a task_sequence envelope the worker walks the tool_config pipeline array and pre-resolves each sub-task's alias before dispatch. Validated: iterator_save_test's process_items.save_item (postgres-in-a-pipeline-in-an-iterator) connects to demo_noetl on the Rust-only kind stack.
2026-06-05 noetl/server v2.19.6 Credential store base64-armors encrypted data for the TEXT column (noetl/server#71; closes server#70; tracks noetl/ai-meta#54). noetl.credential.data_encrypted is TEXT, but the Rust credential path bound the AES-GCM ciphertext as Vec<u8> (BYTEA) and modelled CredentialEntry.data as Vec<u8>; POST /api/credentials 500'd with Rust type Vec<u8> (BYTEA) is not compatible with SQL type TEXT, blocking every keychain credential (and thus any playbook step resolving an auth:/credential: alias). Base64-encode the ciphertext on write + decode on read so it round-trips through the TEXT column. Validated on the Rust-only kind stack: pg_k8s postgres credential registers + decrypts correctly.
2026-06-05 noetl/worker v5.11.2 Resolve keychain alias under the v10 credential: key (noetl/worker#46; closes worker#45). resolve_auth_alias read only the auth: key, but canonical v10 playbook YAML carries the keychain alias under credential:, so every v10 postgres/http step that referenced an alias got no connection fields injected and fell back to a default (unreachable) connection. Accept the alias from auth OR credential; strip both. Validated: iterator_save_test's create_table connects + runs DDL against the real demo_noetl DB on the Rust-only kind stack.
2026-06-05 noetl/server v2.19.5 v10 control-flow end-to-end on Rust-only (noetl/server#69, 6 commits; closes server#68; tracks noetl/ai-meta#54). Phase F R5 Tier 4 re-probe found six server-side gaps that kept v10 control-flow playbooks from completing: (1) catalog get_next_version SQL: smallint+1 promotes to INT4, sqlx rejected the i16 decode so POST /api/catalog/register 500'd; cast the whole expression ::smallint; (2) insert_catalog_entry RETURNING id → catalog_id (no id column); (3) ToolSpec Option fields serialized absent values as JSON null, breaking PythonConfig's HashMap decode (invalid type: null, expected a map); skip_serializing_if; (4) orchestrator end step with a tool: block now dispatches its action (was skipped as a sentinel) + flatten_task_sequence_data merges task_sequence label-keyed results so {{ step.data.x }} resolves + intra-pass dispatch dedup for parallel branches converging on end; (5) template UndefinedBehavior::Chainable so {{ ctx.x | default(step.y) }} resolves when ctx is undefined (default Lenient failed attribute access before default() ran); (6) drop the step != "end" orchestrator trigger gate so end's command.completed makes check_completion emit playbook.completed. Validated: four v10 fixtures (start_with_action, end_with_action, loop_test, control_flow_workbook) reach playbook.completed on the Rust-only kind stack; actions_test correct-fails on a missing TEST_SECRET env.
2026-06-05 noetl/worker v5.11.1 Preserve array tool_config for task_sequence (noetl/worker#44; closes worker#43). The command-dispatch path replaced any non-object tool_config with {}, silently dropping the pipeline array the server sends for every v10 tool: [...] step (routed as task_sequence), so each such step failed at the registry with "config must be array or object with tasks field". Wrap arrays under a tool_config key so noetl-tools v2.18.1's parse_tasks receives its worker-envelope shape. Resolves to noetl-tools 2.18.1.
2026-06-05 noetl/tools v2.18.1 task_sequence parse_tasks accepts worker-envelope (noetl/tools#20; tracks noetl/tools#19). Priority-ordered three-shape parser: bare array, {tool_config: [...]} (the Rust worker's {args, tool_config, render_context} envelope), {tasks: [...]} (forward-compat). Without the tool_config key the worker's rendered pipeline config failed to decode.
2026-06-04 noetl/server v2.19.4 Orchestrator template context: expose step data at top level + capture call.done (noetl/server#67; tracks noetl/ai-meta#60, noetl/server#66). Two interlocking gaps in engine/state.rs that prevented v10 playbooks from advancing past steps whose next.arcs / step.when reference {{ step_name.field }}. (1) build_context only exposed step results under steps.<name>; v10 YAML omits the prefix, so the arcs' when expressions resolved to undefined. (2) apply_event overwrote step.result with command.completed's bare {status, command_id} envelope; the rich payload that call.done carries was silently dropped. Fix: new extract_user_data helper unwraps the standard outer.context.result.context.data envelope; build_context exposes each step's unwrapped data at the top level alongside back-compat steps.<name>; apply_event adds a call.done/action_done arm + guards command.completed against overwriting prior results. 4 new state tests; 14/14 engine::state tests pass. Validated end-to-end in local kind on the Rust-only stack: control_flow_workbook now runs fully end-to-end (start → eval_flag → hot_path → parallel hot_task_a + hot_task_b → playbook.completed), exercising every prior Rust-only fix in a single execution: EE-5 lax decode (v2.19.1) + workload + input alias (v2.19.2) + pipeline shape / failure termination / workbook resolution (v2.19.3) + PythonTool result capture (noetl-tools 2.17.1) + TaskSequenceTool (noetl-tools 2.18.0). Same iteration: kind cluster legacy-Python-deployment cleanup (noetl-server, noetl-worker, noetl-outbox-publisher deployments + noetl-projector statefulset + noetl/noetl-ext/noetl-projector/noetl-worker-metrics services + associated configmaps/SAs); only Rust components remain.
2026-06-04 noetl/worker (no tag; chore(deps)) Bump noetl-tools 2.16 → 2.18 (noetl/worker#42). Picks up PythonTool result-global capture (v2.17.1) + TaskSequenceTool (v2.18.0) without worker-side code changes; both fixes dispatch through the existing dynamic-registry path. cargo update -p noetl-tools resolves to 2.18.0; 110/110 worker tests still pass.
2026-06-04 noetl/tools v2.18.0 TaskSequenceTool (noetl/tools#18; tracks noetl/tools#15). Runtime for the v10 list-shaped tool: [...] pipeline. noetl-server emits tool.kind = task_sequence for any pipeline-shape step (per noetl/server#61); without this tool the worker errored with "Tool not found: task_sequence". Accepts both wire-config shapes (bare array per server's current emit, plus future-proof {tasks: [...]} wrap). Dispatches sub-tasks through a fresh create_default_registry() instance. Threads results across tasks via _prev (last task's data) + _results (full labeled map) injected into the Jinja context; renders each task's config against the augmented context before dispatch. First-failure-stops semantics: failed_task: <idx> + labeled partial results when a sub-task fails. 8 new tests including end-to-end two-python-task pipeline with {{ _prev.value }} cross-task reference + failure-short-circuit semantics.
2026-06-04 noetl/tools v2.17.1 PythonTool result-global capture (noetl/tools#17; tracks noetl/tools#16). User code's result = {...} global is captured via a sentinel-marker line (@@__NOETL_RESULT__@@) on stdout and exposed as the tool's data field. Without this every playbook whose next.arcs / step.when guards read step results stalled: the orchestrator couldn't resolve {{ step_name.field }} references to a real value. User print(...) calls before the marker line stay visible in cleaned stdout for debugging. Legacy back-compat path (parse-stdout-as-JSON) preserved when no marker is present. 5 new tests (helper unit + subprocess-level).
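A rough sketch of separating the sentinel line from user output follows. The marker string comes from this entry; everything else is assumed for illustration, including the `split_result` name and the guess that the payload follows the marker on the same line.

```rust
/// Marker string taken from the release note above.
pub const RESULT_MARKER: &str = "@@__NOETL_RESULT__@@";

/// Splits captured stdout into (cleaned stdout, optional result payload).
/// Assumption: the payload is the text after the marker on the marker line.
/// Lines without the marker are kept so user prints stay visible.
pub fn split_result(stdout: &str) -> (String, Option<String>) {
    let mut cleaned: Vec<&str> = Vec::new();
    let mut payload: Option<String> = None;
    for line in stdout.lines() {
        match line.strip_prefix(RESULT_MARKER) {
            Some(rest) => payload = Some(rest.trim().to_string()),
            None => cleaned.push(line),
        }
    }
    (cleaned.join("\n"), payload)
}
```

When `split_result` returns `None` for the payload, a caller could fall back to the legacy path of parsing the whole stdout as JSON.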
2026-06-04 noetl/server v2.19.3 Rust-only e2e expansion: three orchestrator + parser fixes shipped together (noetl/server#61 + #63 + #65; tracks noetl/ai-meta#57 + #58 + #59). All three surfaced during the same e2e probe sequence on the Rust-only stack (Python deployments scaled to 0). (1) Pipeline flat shape (#61): ToolDefinition::Pipeline accepts both YAML shapes: flat ({name: label, kind: ...}, the v10 dominant form: 446 e2e occurrences vs 37 for the nested form) and nested ({label: {kind: ...}}). Untagged enum tries flat first; custom Serialize normalises to nested on the wire so the worker's task_sequence consumer is unchanged. Unblocked 3+ catalog fixtures from 400 parse rejection. (2) Failure termination (#63): four interlocking gaps fixed so a failing step emits playbook.failed instead of stalling forever: handler trigger gate extended to command.failed; trigger_orchestrator derives trigger_event_type from the actual trigger event instead of hardcoding completion; process_in_progress short-circuits on command.failed using durable StepState::Failed signal (not error-string presence); state.rs::apply_event extracts error from result.context.error fallback so step.error gets populated. Parallel-branch deferral preserved (wait for siblings before terminal emit). (3) Workbook resolution (#65): parser substitutes tool.kind: workbook references with the inline action from playbook.workbook[] before validate; merges step input over workbook input defaults (step wins). Error messages list available action labels for discoverability. Together these unblock the canonical v10 control-flow / parallel / iterator / workbook patterns end-to-end on the Rust-only stack. Two follow-up gaps surfaced + filed for noetl-tools work: task_sequence runtime (noetl/cli#53) + PythonTool result capture (noetl/cli#54); both noetl-tools changes, not server.
2026-06-04 noetl/server v2.19.2 v10 playbook workload + input alias (noetl/server#59). Two interlocking YAML-decode gaps surfaced by the hello_world e2e probe on the Rust-only stack (noetl/ai-meta#56, noetl/server#58). (1) ToolSpec.args accepts input as a serde alias: the canonical v10 playbook YAML writes tool.input: { ... } (446 occurrences across e2e fixtures vs. 37 for args:), but Rust silently dropped the unknown field at decode and any script referencing the workload by name raised NameError. (2) emit_playbook_started_event merges playbook.workload (YAML defaults) with request.payload (overrides) before writing the event; without this, downstream steps' build_context() returned {} because ExecutionState hydrates state.workload from the playbook_started event, and only the start step's command context had the right merge. Validation: hello_world prints HELLO_WORLD: Hello World end-to-end (was HELLO_WORLD: None after fix 1 alone; was NameError before either). Three states map to three failure modes, a handy diagnostic for the next playbook to land. Unblocks all v10-shape playbooks on Rust e2e.
2026-06-04 noetl/server v2.19.1 EE-5 follow-up: lax decode of integer execution_id / event_id (noetl/server#57). Closes the wire-type drift that blocked every Rust worker → Rust server event emission (noetl/ai-meta#55, noetl/server#56). Worker emits noetl-events::ExecutorEvent whose execution_id / event_id are i64, serializing as JSON integers; server's EventRequest / BatchEventRequest strict-decoded String and 422'd with invalid type: integer, expected a string. Python had hidden the drift for over a year via Pydantic v2's lax int→str coercion; serde does not. Fix: deserialize_string_or_i64 + deserialize_optional_string_or_i64 adapters applied to EventRequest.execution_id, EventRequest.event_id, BatchEventRequest.execution_id. Outbound encoding stays String (browser JSON-number precision intent preserved). 6 new unit tests pin the dual-shape contract; bogus shapes (arrays / objects) still reject. Validated end-to-end in local kind on the Rust-only stack (Python deployments scaled to 0): noetl exec tests/fixtures/playbooks/hello_world runs both steps to playbook.completed, the same scenario that returned Failed to emit event after 3 retries on v2.19.0. Wiki update lands in lockstep on noetl/server wiki event-envelope § EE-5. Unblocks Phase F R5 Rust-only e2e tier; pure input-side change so no client breaks.
2026-06-04 noetl/ops n/a (script) Phase F R4-5: R4 complete (noetl/ops#160). Kind validation script for N=2 shard routing. Creates noetl_shard_0 + noetl_shard_1 + noetl_cluster databases on the existing postgres pod (idempotent WHERE NOT EXISTS; per-pod isolation is a Phase G concern), applies the noetl schema DDL to each via the already-mounted schema_ddl.sql.norun, patches noetl-server-rust deployment with NOETL_SHARDS + NOETL_CLUSTER_DSN env vars (DSN shape per ShardConnection::parse: host=...;port=...;user=...;password=...;database=...), probes R3b-1 endpoint to confirm shard_count=2, spawns N executions (default 10) via POST /api/execute, asserts each landed on the predicted shard per shard_for(execution_id, 2) by querying each per-shard DB directly, re-runs validate-shard-drift-guard.sh (R3b-3) against the sharded server, and trap-EXIT reverts the deployment patch on any exit path; idempotent and safe to re-run. R4 cutover sequence shipped end-to-end across noetl/server v2.13.0 → v2.19.0 (R4-1 pool layer + R4-2 AppState wiring + R4-3a/b/c handler cutover + R4-4 fan-out helpers + R4-4b ExecutionService refactor) + noetl/ops kind validation. Closes noetl/server#48 (R4 sub-issue). Next + only remaining Phase F round: R5 (production cutover; separate ops decision after the kind validation passes on the operator's live cluster). Phase G (deferred): keychain-backed noetl.shard_endpoint + result_ref.credential_alias, design + 5-round decomposition recorded in noetl/server wiki sharding-design § Phase G.
2026-06-04 noetl/server v2.19.0 Phase F R4-4b: ExecutionService refactor + list fan-out (noetl/server#55). Refactors ExecutionService to hold a DbPoolMap instead of a single DbPool; rewrites the cluster-wide list() endpoint as a per-shard fan-out with cluster-master catalog stitching. Service constructor: ExecutionService::new(pools: DbPoolMap, snowflake) (was (db: DbPool, ...)); new new_legacy(db, snowflake) shim for test / example callers without a pool map in scope. 9 per-execution sites (across get, get_status, cancel, is_cancelled, finalize) move to pool_for(execution_id); the catalog path lookup inside get() moves to pools.cluster(). list() rewritten: per-shard for_each_shard runs the execution_stats CTE filtered by catalog_id + status (no catalog JOIN) with (limit + offset) per-shard over-fetch since any single shard could contribute every row in the merged window after sorting by started_at DESC; merge rows, sort, single cluster query SELECT catalog_id, path FROM noetl.catalog WHERE catalog_id = ANY($1), stitch paths, apply path-LIKE filter post-merge (case-insensitive substring in Rust), skip(offset) + take(limit). Path filter quirk documented inline: when path is set the effective row count after the post-merge filter can be smaller than limit; future R4-5+ optimisation could push the filter into the cluster lookup as a pre-filter. main.rs wires state.pools.clone() into ExecutionService::new. cargo build clean; 250/251 full suite passes (one pre-existing parser failure carried over from R3b-1, unrelated). No new dedicated unit tests: R4-4's for_each_shard tests + the routing-math pins guard correctness at the helper layer; end-to-end behaviour validates against live multi-shard Postgres in R4-5. R4-3 + R4-4 + R4-4b together complete the in-server migration to DbPoolMap. R4-5 (kind validation N=2 shards in noetl/ops) is the next + final R4 round.
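The over-fetch-then-merge pagination described for list() can be sketched in a few lines. `ExecutionRow`, `per_shard_fetch_size`, and `merge_shard_pages` are hypothetical names for illustration, `started_at` is simplified to an integer timestamp, and the catalog stitching and path filter are left out.

```rust
/// Simplified row: only the fields the merge needs.
#[derive(Debug, Clone, PartialEq)]
pub struct ExecutionRow {
    pub execution_id: i64,
    pub started_at: i64, // assumed to be an epoch timestamp for this sketch
}

/// Each shard must return up to limit + offset rows, because a single shard
/// could own every row of the requested window after the global sort.
pub fn per_shard_fetch_size(limit: usize, offset: usize) -> usize {
    limit + offset
}

/// Merges per-shard results, sorts by started_at descending, then applies
/// the caller's offset and limit to the merged list.
pub fn merge_shard_pages(
    per_shard: Vec<Vec<ExecutionRow>>,
    limit: usize,
    offset: usize,
) -> Vec<ExecutionRow> {
    let mut merged: Vec<ExecutionRow> = per_shard.into_iter().flatten().collect();
    merged.sort_by(|a, b| b.started_at.cmp(&a.started_at));
    merged.into_iter().skip(offset).take(limit).collect()
}
```

Fetching only `limit` rows per shard would be wrong whenever `offset > 0`: the rows skipped globally may all live on one shard, which is why the over-fetch is `limit + offset`.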
2026-06-04 noetl/server v2.18.0 Phase F R4-4: cross-shard fan-out + event_id resolver (noetl/server#54). Builds on R4-3's complete handler cutover. Two new DbPoolMap helpers: for_each_shard<F, Fut, T, E>(f) -> Result<Vec<(u32, T)>, E> (sequential per-shard fan-out; collects outputs in shard-index order) and find_first<F, Fut, T, E>(f) -> Result<Option<(u32, T)>, E> (probes every shard, returns first Some). Sequential await (no futures crate dep, no tokio::spawn 'static bounds); parallelism is a Phase G concern. Closes the 2 event_id-keyed TODO(R4-4) sites from R4-3a: get_command uses find_first to locate the shard owning the event_id; claim_command resolves event_id → execution_id via find_first, then opens the tx on pool_for(execution_id) keeping shard-locality on the claim tx (inner SELECT + subsequent INSERTs all hit the same shard). state.db is now absent from the entire src/ tree; R4-3 + R4-4 together complete the handler-side migration to DbPoolMap. 4 new tokio pool tests: for_each_shard ordering + first-error propagation; find_first None/Some paths. 9/9 db::pool tests pass; 250/251 full suite passes (one pre-existing parser failure carried over from R3b-1, unrelated). Full ExecutionService refactor (DbPoolMap ownership + per-execution method migration + GET /api/executions list fan-out with the catalog join split into per-shard events aggregation + cluster-master catalog lookup) is the natural next slice (R4-4b), deliberately deferred to keep this PR diff reviewable.
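The two helpers can be pictured with a synchronous analogue. The real helpers are async and take futures; this sketch uses plain closures, a hypothetical `ShardMap<P>` type generic over whatever stands in for a pool, and a `find_first` that stops at the first hit, which is an assumption about the probing order rather than a statement about the server code.

```rust
/// Hypothetical stand-in for a pool map: one entry per shard, index = shard id.
pub struct ShardMap<P> {
    shards: Vec<P>,
}

impl<P> ShardMap<P> {
    pub fn new(shards: Vec<P>) -> Self {
        Self { shards }
    }

    /// Runs `f` on every shard in index order and collects (shard, output).
    /// The first error aborts the fan-out and is returned to the caller.
    pub fn for_each_shard<T, E>(
        &self,
        mut f: impl FnMut(u32, &P) -> Result<T, E>,
    ) -> Result<Vec<(u32, T)>, E> {
        let mut out = Vec::with_capacity(self.shards.len());
        for (idx, pool) in self.shards.iter().enumerate() {
            out.push((idx as u32, f(idx as u32, pool)?));
        }
        Ok(out)
    }

    /// Probes shards in index order and returns the first `Some` together
    /// with the index of the shard that produced it.
    pub fn find_first<T, E>(
        &self,
        mut f: impl FnMut(u32, &P) -> Result<Option<T>, E>,
    ) -> Result<Option<(u32, T)>, E> {
        for (idx, pool) in self.shards.iter().enumerate() {
            if let Some(hit) = f(idx as u32, pool)? {
                return Ok(Some((idx as u32, hit)));
            }
        }
        Ok(None)
    }
}
```

A lookup keyed only by event_id maps onto `find_first`: probe each shard for the id, then route follow-up work to the shard index that answered.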
2026-06-04 noetl/server v2.17.0 Phase F R4-3c: health.rs cutover; R4-3 complete (noetl/server#53). Final slice of R4-3 (per-execution handler cutover) on top of R4-3a (events.rs, v2.15.0) + R4-3b (execute.rs, v2.16.0). 3 cluster-wide sites in health.rs migrate to state.pools.cluster(): db_health_check for GET /api/health, pool.size() + pool.num_idle() for GET /api/pool/status. Inline comments at both handlers call out the future per-shard health surface (/api/health/shards iterating state.pools.all_shards() + per-shard pool_size / pool_idle gauge labels on /metrics), not blocking the basic readiness signal. R4-3 complete across events.rs + execute.rs + health.rs. Only 2 state.db sites remain in the entire codebase, both in events.rs (get_command + claim_command), both event_id-keyed, both explicitly TODO(R4-4)-tagged. R4-4 closes those out + adds the cluster-wide GET /api/executions list fan-out. cargo build clean; 20/20 R4 unit tests still pass; 246/247 full suite (one pre-existing parser failure carried over from R3b-1, unrelated).
2026-06-04 noetl/server v2.16.0 Phase F R4-3b: execute.rs per-execution handler cutover (noetl/server#52). Second slice of R4-3 on top of R4-3a (v2.15.0). Clean 3+3 split: 3 cluster-wide noetl.catalog reads (resolve_catalog by catalog_id, resolve_catalog by path, get_playbook_yaml content/payload fetch) move to state.pools.cluster(); 3 per-execution writes (playbook_started event INSERT, command.issued event INSERT, insert_command_row command INSERT) move to state.pools.pool_for(execution_id). state.db no longer appears in execute.rs. Same single-pool fallback verification as R4-3a: when NOETL_SHARDS is empty, all three accessors resolve to the same DbPool handle. cargo build clean; 20/20 R4 unit tests still pass; 246/247 full suite (one pre-existing parser failure carried over from R3b-1, unrelated). R4-3c next: health.rs (3 sites, all cluster-wide); then R4-4 (cluster-wide list fan-out + the event_id-keyed deferrals from R4-3a).
2026-06-04 noetl/server v2.15.0 Phase F R4-3a: events.rs per-execution handler cutover (noetl/server#51). First slice of R4-3 (per-execution handler cutover) on top of R4-2's AppState.pools wiring. 9 sites in events.rs migrate from state.db to state.pools.pool_for(execution_id): handle_event_inner event INSERT, handle_batch_events tx begin, check_already_claimed SELECT, get_catalog_id SELECT, trigger_orchestrator events fetch_all + 2 INSERTs. 1 cluster-wide site (trigger_orchestrator reading noetl.catalog) moves to state.pools.cluster(). 2 sites deferred to R4-4 (both event_id-keyed; execution_id not in scope at request time): get_command (GET /api/commands/{event_id}) and claim_command (POST /api/commands/{event_id}/claim). Both carry inline TODO(R4-4) markers; they work as-is in single-pool fallback mode; in sharded mode they need either a path-param redesign (/api/commands/{execution_id}/{event_id}/...) or a cross-shard probe helper R4-4 will pick one of. No new tests in this slice: the cutover is a mechanical pool-handle swap and R4's existing routing pins guard the contract. 246/247 full suite passes; one pre-existing parser failure carried over from R3b-1, unrelated. Mid-round design pivot: user raised the question whether shard endpoints should be DB-resident rows with credential references into the keychain (esp. for multi-cloud + per-shard credential rotation). Decision: deferred to Phase G (noetl/server wiki sharding-design § Phase G records the full design + 5-round decomposition). R4 continues with env-var DSN model through R4-5.
2026-06-04 noetl/server v2.14.0 Phase F R4-2: DbPoolMap wired into AppState (noetl/server#50). Builds on R4-1's pool layer (v2.13.0). New AppState.pools: DbPoolMap field alongside the existing db: DbPool; both populated. No handler call-site changes; handlers continue reading state.db unchanged. R4-3 (next round) does the disciplined per-execution cutover (~12 files) so each PR's diff stays reviewable rather than landing a ~70-call-site swap in one go. New AppState::new_legacy(db, config, nats) shim wraps an already-created pool in a single-pool DbPoolMap for test / example code that doesn't have a ShardingConfig. New sync constructor DbPoolMap::from_single_pool(pool) lets the shim skip the async DbPoolMap::new path. main.rs loads ShardingConfig::from_env(), builds the map (sharded when NOETL_SHARDS set, fallback otherwise), logs shard count + single-pool mode at startup so operators see the routing topology in the first log lines. Doc example in src/lib.rs updated to new_legacy since the 3-arg new signature is gone. 2 new tokio pool tests using sqlx connect_lazy_with to fabricate a dummy pool (single-pool mode flag, all_shards cardinality, i64-extreme inputs -1/i64::MAX/0 don't panic). 20/20 R4 tests pass; one pre-existing parser failure carried over from R3b-1, unrelated.
2026-06-04 noetl/server v2.13.0 Phase F R4-1: per-shard DB pool layer (noetl/server#49). First slice of R4 (DB sharding via per-shard physical Postgres, NOT Citus; explicit R4 decision after weighing Citus' Postgres-extension dep against per-shard physical isolation). Adds ShardConnection (host/port/user/password/database parsed from semicolon-separated DSNs; chose semicolons over & to keep DSNs obviously distinct from URL query strings since the outer NOETL_SHARDS separator is the comma) + ShardingConfig { shards: Vec<ShardConnection>, cluster: Option<ShardConnection> } loaded from NOETL_SHARDS + NOETL_CLUSTER_DSN. New DbPoolMap holding N per-shard pools + 1 cluster pool with three accessors: pool_for(execution_id) (picks via existing sharding::shard_for so R3b drift-guard contract holds), cluster() (always-master pool for catalog/credential/keychain/runtime/schedule/resource/manifest), all_shards() (iterator for R4-4 fan-out). Single-pool fallback when NOETL_SHARDS empty: degenerate one-shard map whose cluster handle IS the only shard pool; every accessor returns the legacy pool; behaviour bit-identical to today (this is how R4 ships dormant). 12 new config tests + 2 new pool tests pinning shard_for(1, 2/4/16/64/1024) → 1/1/5/21/405 against the R3b drift-guard pairs; 18/18 R4-1 tests pass. No handler changes: R4-2 wires the map into AppState next, R4-3 cuts over per-execution handlers, R4-4 adds the cluster-wide list fan-out, R4-5 lands the kind manifest split. Closes noetl/server#48 for R4-1 specifically (umbrella stays open through R4-5).
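The DSN shape (semicolon-separated key=value pairs, with commas separating shards in NOETL_SHARDS) can be parsed with a short sketch. `ShardDsn` and `parse_shards` are hypothetical names, not the server's `ShardConnection::parse`; the sketch assumes values never contain `;` or `,` and reports errors as plain strings.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of the five DSN fields named in the release note.
#[derive(Debug, Clone, PartialEq)]
pub struct ShardDsn {
    pub host: String,
    pub port: u16,
    pub user: String,
    pub password: String,
    pub database: String,
}

impl ShardDsn {
    /// Parses "host=...;port=...;user=...;password=...;database=...".
    pub fn parse(dsn: &str) -> Result<Self, String> {
        let mut fields: HashMap<&str, &str> = HashMap::new();
        for pair in dsn.split(';').filter(|p| !p.trim().is_empty()) {
            let (key, value) = pair
                .split_once('=')
                .ok_or_else(|| format!("missing '=' in `{pair}`"))?;
            fields.insert(key.trim(), value.trim());
        }
        let get = |key: &str| -> Result<String, String> {
            fields
                .get(key)
                .map(|v| v.to_string())
                .ok_or_else(|| format!("missing field `{key}`"))
        };
        Ok(Self {
            host: get("host")?,
            port: get("port")?
                .parse()
                .map_err(|e| format!("invalid port: {e}"))?,
            user: get("user")?,
            password: get("password")?,
            database: get("database")?,
        })
    }
}

/// Splits a NOETL_SHARDS-style value on commas; an empty value yields an
/// empty list, which corresponds to the single-pool fallback.
pub fn parse_shards(raw: &str) -> Result<Vec<ShardDsn>, String> {
    raw.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(ShardDsn::parse)
        .collect()
}
```

Because the inner separator is `;` and the outer one is `,`, the two levels split cleanly without any escaping, which matches the rationale given in the entry.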
2026-06-04 noetl/ops n/a (script) Phase F R3b-3: shard-drift-guard integration test (noetl/ops#158). Closes out the R3b drift-guard sequence: automation/development/validate-shard-drift-guard.sh posts to BOTH the noetl-server R3b-1 endpoint AND the noetl-gateway R3b-2 endpoint across a battery of 30 (execution_id, shard_count) pairs and asserts shard_index agreement. Battery: execution_id ∈ {1, 42, 9999999999, 320816801799737344, 9223372036854775807, -1} × shard_count ∈ {2, 4, 16, 64, 1024}. Per-pair extraction via python3 JSON parser (NOT sed; server's response carries a nested server_config.shard_index field a greedy regex would match instead of the top-level field). Exit code 0 = all pairs agree; exit code 1 + diagnostic table on disagreement. Idempotent: trap cleanup EXIT kills port-forwards on any exit path. Catches the four drift modes the per-side unit tests can't see: twox-hash crate version split, SHARD_HASH_SEED divergence, i64 → bytes endianness flip, hash crate internal algo change with non-major bump. Server side sanity-validated in local kind against noetl-server-rust v2.12.0 (30/30 pairs return expected varying shard_index distribution); full cross-side execution deferred to operator since gateway v3.2.0 not currently in local kind. Wiki update lands in lockstep per Rule 2a: noetl/ops wiki Home "Kind-cluster validation rigs" table gains a row. Phase F R3b sequence (R3b-1 + R3b-2 + R3b-3) is now complete; next is R4 (DB sharding) or R5 (cutover). Closes noetl/ops#157.
2026-06-04 noetl/gateway v3.2.0 Phase F R3b-2: gateway shard-info twin endpoint (noetl/gateway#26). Adds GET /sharding/preview?execution_id=<i64>&shard_count=<u32> on the gateway, returning the gateway's own shard_for() result. Computes LOCALLY, NOT as a proxy passthrough. The drift-guard's whole point is that both sides compute independently so the integration test (R3b-3) can verify agreement. Routed outside the /noetl/* proxy catchall (which would defeat the purpose by forwarding to the server) and outside the auth middleware (deterministic, no data leak). Response mirrors the server's R3b-1 shape with source: "noetl-gateway"; omits the server's server_config field since the gateway's ShardMap shape (Vec) differs from the server's ShardConfig (single index + count); R3b-3's integration test compares only shard_index across sources. Validation: 400 for non-numeric execution_id, shard_count == 0, or shard_count > 1024 (mirrors server-side limits). Full suite 44/44 (no regressions on prior R3a + R3a-2 tests). Wiki update lands in lockstep per Rule 2a: configuration page gains "Diagnostic endpoint (Phase F R3b-2)" subsection. Pair fully assembled: R3b-3 (integration test in noetl/ops POSTs to both, asserts shard_index agreement across a battery of (eid, N) pairs) closes out the drift-guard. Closes noetl/gateway#25.
2026-06-04 noetl/server v2.12.0 Phase F R3b-1: shard-info diagnostic endpoint (noetl/server#47). First slice of R3b (end-to-end drift-guard between gateway and server). Adds GET /api/runtime/shard-info?execution_id=<i64>&shard_count=<u32> returning the server's shard_for() result + diagnostic context. Public (no auth gate: pure math, deterministic, no DB access, no state mutation, no secret leak; different from /api/internal/* which is gated for the system pool). Response: { execution_id, shard_count, shard_index, source: "noetl-server", hash_function: "twox_hash::XxHash64", seed: 0, server_config: { shard_index, shard_count } }. Validation: 400 for non-numeric execution_id, shard_count == 0, or shard_count > 1024 (10-bit machine_id ceiling). 3 new handler tests + 229/230 full suite (one pre-existing parser failure unrelated). Wiki update lands in lockstep per Rule 2a: deployment-spec page gains "Diagnostic endpoint (Phase F R3b-1)" subsection. Pairs with R3b-2 (gateway twin endpoint) + R3b-3 (integration test in noetl/ops POSTs to both, asserts agreement across (eid, N) pairs); catches drift that unit-test pinning can't see, e.g. one side's twox-hash crate version bumped without the other's. Closes noetl/server#46.
2026-06-04 noetl/gateway v3.1.0 Phase F R3a-2: body-param JSON routing for events endpoints (noetl/gateway#24). Closes the routing gap left by R3a (path-param only): POST /noetl/events + /noetl/events/batch carry execution_id inside the JSON body, so R3a's path-extractor wouldn't reach them. Two new helpers in src/sharding.rs: path_carries_execution_id_in_body(path) (predicate gating the body-parse cost; matches exactly events + events/batch); extract_execution_id_from_body(body_bytes) -> Option<i64> (parses JSON, reads top-level execution_id, accepts both string + number encodings). proxy::proxy_request restructured via req.into_parts() so body reads BEFORE the target URL builds; ProxyState::resolve_upstream(path, body_bytes) gained an optional second parameter. Fall-back semantics unchanged: when the shard map is empty or neither path nor body yields a parseable id, gateway forwards to the default NOETL_BASE_URL. 12 new sharding tests (path predicate + body extraction edge cases: string/number/negative/batch envelope/missing/non-numeric/invalid JSON/empty/array root/nested id ignored). 30 sharding-module tests + 44 full-suite tests, no regressions. Wiki: configuration page split "Routes covered" into path-param + body-param subsections (Rule 2a in lockstep). Commit-body discipline: used "non-disruptive" instead of the literal BREAKING CHANGE token, so semantic-release correctly minor-bumped (no false-positive major like R3a's v3.0.0). Closes noetl/gateway#23.
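
The body-extraction rules in the R3a-2 entry (top-level `execution_id` only, string or number encoding, everything else falls through) can be sketched like this. It is a Python approximation of the behaviour described, not the Rust helper; the exact path strings the predicate compares against are an assumption.

```python
import json
from typing import Optional

I64_MIN, I64_MAX = -(2**63), 2**63 - 1


def path_carries_execution_id_in_body(path: str) -> bool:
    """Gate the body parse: only the two events endpoints qualify."""
    return path in ("/noetl/events", "/noetl/events/batch")


def extract_execution_id_from_body(body: bytes) -> Optional[int]:
    """Return the top-level execution_id as an i64, or None if unusable."""
    try:
        doc = json.loads(body)
    except ValueError:  # invalid JSON, empty body, or undecodable bytes
        return None
    if not isinstance(doc, dict):  # array root and scalars are ignored
        return None
    raw = doc.get("execution_id")  # nested ids are never inspected
    if isinstance(raw, bool):  # bool is an int subclass in Python; reject it
        return None
    if isinstance(raw, str):
        try:
            raw = int(raw)
        except ValueError:
            return None
    if not isinstance(raw, int):
        return None
    return raw if I64_MIN <= raw <= I64_MAX else None
```

Returning `None` corresponds to the fall-back case in the entry, where the gateway forwards to the default upstream.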
2026-06-04 noetl/gateway v3.0.0 Phase F R3a: gateway-side shard routing infrastructure (noetl/gateway#21). First slice of R3 (gateway-side dispatch). Mirrors the sharding-design doc + the server-side shard_for() helper from noetl-server v2.11.0. Ships as dormant infrastructure: when the new optional shards config is absent (current single-replica deployments), behavior is unchanged. New src/sharding.rs (18 unit tests) with shard_for(execution_id, N) using twox_hash::XxHash64 + fixed seed 0; IDENTICAL implementation to noetl-server's shard_for(). Both repos pin the same (eid, N) → shard expected values; drift on either side breaks the build. ShardEndpoint, ShardMap, extract_execution_id_from_path helpers; NoetlConfig.shards: Vec<ShardEndpoint> opt-in field; ProxyState.resolve_upstream(path) helper. Routes covered: path-param execution_id (/noetl/executions/{id}/..., /noetl/vars/{id}/...). Routes deferred: body-param (R3a-2), commands DB lookup (R3b), cluster-wide (any shard answers). Wiki update lands in lockstep per Rule 2a: configuration page gains "NoETL shard map (Phase F R3a)" subsection. Closes noetl/gateway#20. Note on the major bump: semantic-release misparsed the commit body: the phrase "no breaking change to wire shape" triggered its BREAKING CHANGES heuristic and bumped v2.14.1 → v3.0.0 instead of the intended minor. Cargo.toml on the merging PR said 2.10.0 (stale local view); a tiny follow-up patch will align it to 3.0.0. Lesson: avoid the literal "BREAKING CHANGE" / "breaking change" token in commit bodies, even with a negation.
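
A rough sketch of the path-param routing and its dormant fallback as the R3a entry describes them. The regex and signatures are illustrative, and the shard function is passed in as a parameter because the real one (XxHash64, seed 0) is not reproduced here.

```python
import re
from typing import Callable, Optional, Sequence

# Path-param routes named in the release note; the regex itself is illustrative.
_ID_IN_PATH = re.compile(r"^/noetl/(?:executions|vars)/(-?\d+)(?:/|$)")


def extract_execution_id_from_path(path: str) -> Optional[int]:
    """Pull the execution id out of /noetl/executions/{id}/... style paths."""
    match = _ID_IN_PATH.match(path)
    return int(match.group(1)) if match else None


def resolve_upstream(
    path: str,
    shards: Sequence[str],
    default_base_url: str,
    shard_for: Callable[[int, int], int],
) -> str:
    """Pick a shard URL for the request, or fall back to the default upstream."""
    execution_id = extract_execution_id_from_path(path)
    if not shards or execution_id is None:
        return default_base_url  # dormant mode, or no id in the path
    return shards[shard_for(execution_id, len(shards))]
```

With an empty shard list every request resolves to the default upstream, which matches the "behavior is unchanged" claim for single-replica deployments.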
2026-06-04 noetl/server v2.11.0 Phase F R2: server-side shard_id() helper + ShardConfig (noetl/server#45). Implements the routing-key derivation called out in the sharding-design doc. Ships as infrastructure only: helper available to handlers but no live request path enforces shard membership yet. Default behavior (NOETL_SHARD_COUNT=1, NOETL_SHARD_INDEX=0, both unset) is a no-op for current deployments. New src/sharding.rs module with ShardConfig, shard_for(execution_id, N), 12 unit tests pinning stability + distribution + partition semantics. Hash: twox_hash::XxHash64 with fixed seed 0, stable across Rust releases, server restarts, replicas (the property R4 needs to never re-shuffle data); alternatives rejected (DefaultHasher unstable; ahash randomized; FNV-1a weak avalanche on sequential i64s). Two new env vars NOETL_SHARD_INDEX + NOETL_SHARD_COUNT on AppConfig; AppState.shard: Arc<ShardConfig> initialized at startup with shard_index < shard_count validation. Wiki update lands in lockstep per wiki-maintenance.md Rule 2a: the deployment-specification page gains both env vars + a new ## Sharding section. Closes noetl/server#44. Phase F R3 (gateway-side dispatch + LB config) is the next round.
2026-06-06 noetl/server v2.10.1 Phase F R1.5 follow-on (noetl/server#43). Rule 2a (wiki-maintenance) drift catch: the R1.5 PR named the config field machine_id, which under envy::prefixed("NOETL_") resolves to env var NOETL_MACHINE_ID, NOT NOETL_SERVER_MACHINE_ID as the deployment-spec page documented. Caught when starting the ops manifest update. Renamed field to server_machine_id so envy reads NOETL_SERVER_MACHINE_ID (matching docs + more specific name for shared deployment manifests). Patch bump. Pairs with noetl/ops#156 which adds the env var to the server-rust Deployment with value: "1" for the current single-replica setup + inline comments on per-replica derivation patterns for scaling beyond 1.
2026-06-06 noetl/server v2.10.0 Phase F R1.5: application-side snowflake ID generation (noetl/server#42). Load-bearing prerequisite for R4 (DB sharding) per agents/rules/observability.md Principle 3. execution_id / event_id / command_id now minted by a new src/snowflake.rs module instead of the DB-side noetl.snowflake_id() Postgres function. New env var NOETL_SERVER_MACHINE_ID (10-bit, per-replica) drives the machine-id portion of the snowflake; falls back to HOSTNAME hashed via FNV-1a for local dev. 12 call sites flipped from generate_snowflake_id(state).await? → state.snowflake.generate()? across execute.rs / events.rs / runtime.rs / execution.rs. Local snowflake helpers dropped. RuntimeService::new + ExecutionService::new gain an Arc<SnowflakeGenerator> param. 10 new unit tests for the snowflake module (machine_id validation, monotonic ids within a thread, sequence rollover on 4096/ms, concurrent generators stay unique across 8 threads × 1000 ids, FNV-1a derivation stability). DB-side noetl.snowflake_id() function STAYS as the ad-hoc admin fallback per observability.md's per-component migration order. Wire format unchanged. Follow-up: noetl/ops manifest update to set NOETL_SERVER_MACHINE_ID per-replica in kind + GKE. Companion wiki update: new deployment-specification page captures the full env-var catalogue (including this new one), runtime contract, network surface, health probes, secrets handling, kind-validation procedure. Merged via #42, closes noetl/server#41.
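
A sketch of a snowflake generator with the properties the R1.5 entry lists: a 10-bit machine id, 4096 ids per millisecond, and an FNV-1a hostname fallback. Several details are assumptions not stated in the entry: the timestamp-above-machine-above-sequence bit order, the epoch value, the 64-bit FNV-1a variant, the modulo fold into 10 bits, and waiting for the next millisecond on rollover.

```python
import threading
import time

MACHINE_BITS = 10             # from the release note: 10-bit machine id
SEQUENCE_BITS = 12            # 4096 ids per millisecond before rollover
EPOCH_MS = 1_700_000_000_000  # hypothetical custom epoch, not the project's


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a with the standard offset basis and prime."""
    h = 0xCBF29CE484222325
    for byte in data:
        h ^= byte
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h


def machine_id_from_hostname(hostname: str) -> int:
    """Fold a hostname into the 10-bit machine-id space (local-dev fallback)."""
    return fnv1a_64(hostname.encode()) % (1 << MACHINE_BITS)


class SnowflakeGenerator:
    def __init__(self, machine_id: int, clock=lambda: int(time.time() * 1000)):
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id must fit in 10 bits")
        self._machine_id = machine_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def generate(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                now = self._last_ms  # clock went backwards: stay on the last ms
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) % (1 << SEQUENCE_BITS)
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (MACHINE_BITS + SEQUENCE_BITS))
                | (self._machine_id << SEQUENCE_BITS)
                | self._sequence
            )
```

Because the machine id occupies its own bit range, two replicas with distinct ids cannot collide even when they mint in the same millisecond, which is why the entry treats this as a prerequisite for sharding.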
2026-06-05 noetl/cli noetl-events 0.1.0 (workspace member, first release) + noetl-executor 0.4.0 + noetl 4.9.0 EE-4 PR 2 crates.io publish landed (cli release workflow run 26927431551). Manual gh workflow run release-cli on cli@d6e2432 after the PR 2 prep commits were on main. noetl-events 0.1.0 is the first release of the shared event-envelope workspace member: ExecutorEvent + EventSink + EventEmitter + NoopSink carved out of noetl-executor::events in noetl/cli#49 so noetl-server can link the canonical envelope without dragging in the rest of the executor's surface. noetl-executor 0.4.0 re-publishes with the new noetl-events = "0.1" dep + bumps the dep-graph version (no public-API change; the executor's events module is a verbatim re-export of the new crate). noetl 4.9.0 finally landed on crates.io; the previous v4.9.0 release run had failed at bin verification with cannot find function gcs_upload in module noetl_executor::tools_bridge (pre-existing drift between live 0.3.1 and current source); the 0.4.0 re-publish exposed the symbol and unblocked the bin verification.
2026-06-05 noetl/server v2.9.0 EE-4 PR 3: noetl-server adopts the canonical noetl-events envelope (noetl/server#38). Direct dep on noetl-events = "0.1" (just published on the row above). Adds From<noetl_events::ExecutorEvent> for EventRequest + TryFrom<&EventRequest> for noetl_events::ExecutorEvent impls in src/handlers/events.rs. Server's EventRequest keeps its 5 server-only fields (result_kind, result_uri, event_ids, actionable, informative) and its String wire format for execution_id / event_id (browser JSON-number precision); the SHARED SUBSET is what's now anchored to the canonical envelope, replacing the hand-aligned doc-comment promise. Four wire-compat tests pin the shared-subset round-trip semantics so a future change to either type breaks the build here instead of in a kind-val cycle. Minor-semver bump (new dep, no public-API change, no runtime behavior change in the live POST /api/events handlers; the new conversions are infrastructure callers can use to thread the canonical envelope through downstream code in follow-up work). EE-4 sequence fully shipped: PR 1 (noetl/cli#49) extracted the workspace crate, PR 2 (noetl/cli#50) prepped the publish, PR 3 wired the server adoption. Merged via #38.
2026-06-05 noetl/server v2.8.3 Phase D R3 e2e fully closed: R3b iterators reach playbook.completed (noetl/server#37). Two fixes that together drain the iterator path end-to-end: (1) engine/commands.rs::build_iteration_command now injects iteration variables (item_var, _index, _total) into tool.config.args so the worker's Python tool sees them as globals via globals().update(args); previously these went only into the render context (Jinja templating) and the Python code raised NameError: name 'item' is not defined; (2) engine/state.rs step.enter handler now reads iterations_expected from event.result.context.iterations_expected (the canonical storage shape after trigger_orchestrator wraps EventToEmit.context in the constraint-compliant {status, context} envelope per #29), in addition to the original event.context fallback; without this, iterations_expected was never populated in state, so command.completed for each iteration fell into the "plain step" branch and marked the step Completed on the FIRST iteration, causing playbook.completed to fire prematurely after only 1 of 3 iterations. New reproducer unit test simulates the exact kind-val event sequence; 32/32 engine tests pass. Kind validation: tests/fixtures/r3b_iterator (3 iterations over [1, 2, 3]): all 3 iterations complete BEFORE the terminal playbook.completed. Phase D R3 end-to-end is now fully closed across all four fixtures: R2 (linear) + R3a (conditional, both branches) + R3b (iterator, 3 iterations) + R3c (parallel) all reach playbook.completed cleanly with both Rust + Python worker pools running. Merged via #37.
2026-06-05 noetl/server v2.8.2 Phase D R3 end-to-end gap closed: orchestrator skip-chain re-entry guard (noetl/server#36). Surfaced during the R3 end-to-end re-validation after noetl/ai-meta#53 unblocked the dual-trigger path (worker now routes lifecycle events to the publishing server, so orchestrator re-triggers fire on every command.completed). The R3a skip-chain had "already done / already running" guards on the direct next_step transition path but NOT on the chain target after the inline walk forward. Consequence: on every subsequent command.completed after the first, the orchestrator re-evaluated start's next.arcs → middle (skipped) → tail, and emitted a fresh step.enter(tail) + command.issued(tail), producing ~7 extra step.skipped(middle) events and ~7 redundant tail dispatches before the playbook eventually completed. This PR adds the same state.is_step_done + running_steps() guards on the chain target after the skip-chain lands. 31/31 engine tests still pass. End-to-end kind validation: R2 (2-step linear), R3a (both skip_middle branches), and R3c (parallel) all hit playbook.completed cleanly with both Rust + Python worker pools running (no scale-to-zero workarounds). Phase D R3 e2e is now fully closed except for R3b iterators (gated on the separate worker-side iteration-variable injection, the R3b-2 follow-up). Merged via #36.
2026-06-05 noetl/worker v5.11.0 Closes Gap 1 of noetl/ai-meta#53: the Rust worker now honors the server_url field embedded in each NATS command notification rather than the global NOETL_SERVER_URL env var. Multi-server deployments (kind running both noetl-server Python and noetl-server-rust side by side) now correctly route per-command lifecycle events back to the server that PUBLISHED the command, so the publishing server's trigger_orchestrator fires on every command.completed. Implementation: ControlPlaneClient::with_server_url returns a fresh client (Arc-shared inner reqwest::Client so the clone is cheap); NatsCommandSource::next builds a per-notification client for claim_command; new CommandExecutor::execute_with_server_url(command, Option<&str>) threads a per-dispatch URL through every previously-self.client-using site (resolve_auth_alias, build_call_done_result, set_variable, every lifecycle event via the renamed emit_event_via). Backwards-compatible: passing None keeps the previous behaviour for callers outside the per-dispatch path. 2 new unit tests; 108/108 lib tests pass. Pairs with noetl/server#35 which made the Rust server accept the worker's component_type-style register payload. Together these unblock R3a/R3c/R2 end-to-end terminal playbook.completed events without dual-worker workarounds. Merged via #41.
2026-06-05 noetl/server v2.8.1 Closes Gap 2 of noetl/ai-meta#53: the Rust noetl-worker sends component_type (matching the Python broker's wire shape from which the worker contract was derived), not kind. The Rust server's /api/worker/pool/register, /heartbeat, and /deregister handlers rejected the worker's payload with Failed to deserialize the JSON body into the target type: missing field 'kind', putting the worker into CrashLoopBackOff whenever NOETL_SERVER_URL was pointed at the Rust server. This PR makes kind on all three request types optional with a worker_pool default and accepts component_type as an alias; also captures the hostname field the worker sends (currently unused, reserved for future label promotion). 2 new unit tests; 7/7 runtime tests pass. Merged via #35.
2026-06-04 noetl/server v2.8.0 Phase D Round 3c of noetl/ai-meta#49: Phase D R3 closes. Defers end-step completion so parallel branches can converge correctly. Multi-target transitions (inclusive-mode next.arcs) already dispatched correctly via the existing evaluator + orchestrator per-result loop, but if next_step_name == "end" short-circuited the whole workflow on the FIRST branch to hit end while siblings were still in flight. New reached_end flag + reached_end_quiescent clause at the end of process_in_progress: completion fires only when reached_end is set AND no new commands queued AND no other branches running. Same fix covers both the direct-arc case and the R3a skip-chain hit_end case. 3 new orchestrator unit tests (31/31 engine pass): dispatch-both-in-one-pass, defer-on-partial-completion, complete-on-full-convergence. Kind validation: tests/fixtures/r3c_parallel (4-step start → [branch_a, branch_b] → end inclusive): orchestrator triggered exactly 3 times as designed (fan-out + defer + final). Final playbook.completed blocked by the dual-worker NATS subject race (Python worker re-claims completed commands), same gap as R3b, not an R3c issue. Merged via #34. Phase D R3 control-flow rounds complete: R3a conditionals (#32, v2.6.0) + R3b iterators (#33, v2.7.0) + R3c parallel (#34, v2.8.0).
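
The quiescence condition from the R3c entry reduces to a three-way conjunction. A minimal sketch, with a hypothetical `PassState` standing in for whatever the orchestrator tracks per pass:

```python
from dataclasses import dataclass, field


@dataclass
class PassState:
    reached_end: bool = False                          # some branch hit the end arc
    new_commands: list = field(default_factory=list)   # queued in this pass
    running_steps: set = field(default_factory=set)    # branches still in flight


def should_complete(state: PassState) -> bool:
    """Complete only when an end arc was hit AND nothing is queued or running."""
    return state.reached_end and not state.new_commands and not state.running_steps
```

The three orchestrator passes named in the entry (fan-out, defer, final) map onto the three cases: commands queued, a sibling still running, and full convergence.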
2026-06-04 noetl/server v2.7.0 Phase D Round 3b of noetl/ai-meta#49: step.loop iterator fan-out + state aggregation infrastructure. When the orchestrator's transition path picks a step with step.loop set, it now evaluates loop.in_expr via the existing Evaluator::evaluate_loop, emits ONE step.enter (carrying iterations_expected in context), and dispatches one CommandBuilder::build_iteration_command per item with IteratorMetadata filled in. Empty collection short-circuits with step.enter + a synthetic step.exit. StepInfo gains iterations_expected + iteration_command_ids (HashSet for dedup) + iteration_results; apply_event for command.completed deduplicates by command_id so dual-worker races count each iteration once, and only flips the step to Completed once all N distinct command_ids land. persist_engine_command threads iteration meta into cmd_meta and makes command_id per-iteration unique (<exec>:<step>:<event>:i<index>). 5 new unit tests; 28/28 engine tests pass. Kind validation: orchestrator path validated end-to-end (fan-out + NATS publish + worker claim of all 3 iterations); full e2e completion gated on two follow-ups: worker-side injection of iteration variables (item, _index, _total) into the Python tool's runtime scope (R3b-2), and an orchestrator re-trigger investigation for worker batch lifecycle events that didn't fire trigger_orchestrator after the initial start.command.completed (R3b-3). Merged via #33.
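
The dedup-then-complete rule in the R3b entry can be sketched with a set keyed on the per-iteration command id. The Python `StepInfo` below is a simplified stand-in for the Rust struct, keeping only the fields needed to show the counting rule.

```python
from dataclasses import dataclass, field


@dataclass
class StepInfo:
    iterations_expected: int
    iteration_command_ids: set = field(default_factory=set)
    completed: bool = False


def iteration_command_id(execution_id: int, step: str, event_id: int, index: int) -> str:
    """Per-iteration id in the <exec>:<step>:<event>:i<index> shape."""
    return f"{execution_id}:{step}:{event_id}:i{index}"


def apply_command_completed(step: StepInfo, command_id: str) -> None:
    """Count each iteration once; complete after N distinct command ids."""
    step.iteration_command_ids.add(command_id)  # duplicates are absorbed by the set
    if len(step.iteration_command_ids) >= step.iterations_expected:
        step.completed = True
```

A duplicate `command.completed` from a second worker adds nothing to the set, so the step cannot complete early on a race.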
2026-06-04 noetl/server v2.6.0 Phase D Round 3a of noetl/ai-meta#49: step.when enable guard wired into the orchestrator. When the transition path picks a next_step, it now evaluates next_step.when via the existing Evaluator::evaluate_step_when; if false, emits step.skipped and walks forward through the skipped step's own next.arcs in the same orchestrator pass until landing on a step whose guard passes or a terminal/end transition. Inline iteration rather than re-trigger because step.skipped has no command.completed to fire the next round on. Companion fix in engine/state.rs::build_context: workload now exposed BOTH at top level ({{ skip_middle }}) AND under the workload namespace ({{ workload.skip_middle }}); the namespace form was missing from the orchestrator's eval context and raised an undefined-value template error on the first kind-val pass. Two new orchestrator unit tests (skip / pass branches); 22/22 engine tests pass. Kind validation: tests/fixtures/r3a_conditional (4-step playbook with conditional middle) runs both branches end-to-end: skip_middle=false dispatches middle, skip_middle=true emits step.skipped(middle) and walks directly to tail. Both terminate at playbook.completed. Merged via #32.
2026-06-04 noetl/server v2.5.0 Phase D Round 2 of noetl/ai-meta#49: orchestrator engine wired into the event-ingest pipeline. trigger_orchestrator (events.rs) rewritten from stub to full impl: loads events, parses playbook, calls WorkflowOrchestrator::evaluate, persists events_to_emit + commands via the new persist_engine_command helper, emits terminal playbook.completed / playbook.failed. persist_engine_command extracted as pub(crate) from generate_initial_commands so /api/execute and the orchestrator share one wire-format helper. Both handle_event AND handle_batch_events invoke the orchestrator on command.completed when step != "end" (the batch arm matters because the Rust worker batches terminal events; without it, command.completed never advanced state). trigger_orchestrator SELECT sources attempt from NULLIF(meta->>'attempt','')::int; noetl.event has no attempt column today. claim_command result envelope flipped to {status, context} shape (chk_event_result_shape compliance, #29 follow-through). Kind validation: tests/fixtures/r2_two_step 2-step linear playbook runs end-to-end on noetl-server-rust; event log terminates at playbook.completed. Phase D materially complete; remaining work is the conditional / iterator / parallel coverage rounds (R3a now landed; R3b/c queued). Merged via #31.
2026-06-04 noetl/server v2.4.3 Closes noetl/server#29, and with it closes Phase B of noetl/server#21 / noetl/ai-meta#49. Refactors build_result_object to emit constraint-compliant {status, reference} / {status, context} envelopes matching the DB's chk_event_result_shape rule. Previous {kind, data} / {kind, store_tier, …} / {kind, event_ids, …} shapes all violated the constraint; every POST /api/events that reached the INSERT 500'd, just unnoticed until the Round 4 load smoke exercised the Rust path at volume. After deploy: Round 3 harness produces clean command.completed lifecycle with {status, context} result rows; Round 4 load smoke flips from 0/60,000 status=ok to 60,000/60,000 status=ok at ~920 req/s sustained, p50 48ms, p95 92ms, p99 164ms on un-tuned single-replica kind. Merged via #30.
2026-06-04 noetl/ops (untagged @ cd2182f) Phase B Round 4: load-smoke harness automation/development/load-smoke-events.sh. Drives ApacheBench at POST /api/events with synthetic but schema-valid event payloads, snapshots /metrics before + after, prints throughput / p50/p95/p99 / counter+histogram deltas. Baseline on kind: 1858 req/s sustained over 32s, p50 25ms, p99 77ms, p100 186ms. All 60k requests recorded under status=error because the Rust server's build_result_object emits result shapes that violate the DB's chk_event_result_shape constraint; filed as follow-up noetl/server#29. The metric instrumentation correctly captures real handler latency regardless of the constraint failure (the wrapper times Ok and Err paths the same), so the throughput numbers are the true write-path performance. Closes Phase B Round 4 of noetl/server#21. #155.
2026-06-03 noetl/server v2.4.2 Phase B Round 3 follow-up to v2.4.1: fixes a call.error ("Invalid python config: invalid type: null, expected a map") that surfaced when the Round-3 harness ran the routing_test playbook end-to-end. generate_initial_commands was serialising Step.args (Option<HashMap<…>>) directly, which produced args: null when unset; the worker's executor/command.rs copies a missing tool-side args from the top-level key, so null ended up inside the python tool config and serde rejected it. Mapped None → {}, Some(m) → serde_json::to_value(m). After this fix the Round-3 harness produces a clean lifecycle (playbook_started → command.issued → command.claimed → command.started → call.done → command.completed). Merged via #28.
2026-06-03 noetl/server v2.4.1 Closes noetl/server#26: Phase B Round 3 fix. POST /api/execute now publishes a {execution_id, event_id, command_id, step, server_url} notification to NATS on noetl.commands.<segment>.<execution_id> (segment = system for system/* paths, shared otherwise; matches Python's route_subject scheme); inserts the matching noetl.command row for replay parity (best-effort). Pre-existing NATS auth bug fixed: async_nats::connect(url) silently dropped the user:pass@ segment, so the Rust server had been failing to connect to NATS since it was first deployed (just unnoticed because prior rounds didn't need a Rust-side publish). Mirrored the worker's ConnectOptions::with_user_and_password shape. New config field AppConfig.public_server_url (envy NOETL_PUBLIC_SERVER_URL), embedded in the notification so workers know where to call back; companion noetl/ops#154 wires it in the kind manifest. Kind-validated end-to-end: worker claims the published command, runs the step, POSTs the full command.claimed → started → call.* → completed/failed lifecycle back. Unblocks Phase B Rounds 3 + 4 of noetl/server#21. Merged via #27.
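
The subject scheme and notification shape from the v2.4.1 entry, sketched in Python. The function names are illustrative, and the field value types (integers versus strings on the wire) are not stated in the entry and are left as passed in.

```python
def command_subject(catalog_path: str, execution_id: int) -> str:
    """Build noetl.commands.<segment>.<execution_id> for a playbook path."""
    segment = "system" if catalog_path.startswith("system/") else "shared"
    return f"noetl.commands.{segment}.{execution_id}"


def build_notification(execution_id, event_id, command_id, step, server_url) -> dict:
    """Assemble the five-field notification; server_url is the callback target."""
    return {
        "execution_id": execution_id,
        "event_id": event_id,
        "command_id": command_id,
        "step": step,
        "server_url": server_url,
    }
```

Carrying `server_url` inside each notification is what lets a worker call back to the server that published the command rather than to a globally configured one.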
2026-06-03 noetl/ops (untagged @ ec492dd) Sets NOETL_PUBLIC_SERVER_URL=http://noetl-server-rust.noetl.svc.cluster.local:8082 on the kind server-rust deployment so the new Rust-side NATS publish carries a callable callback URL. Pairs with noetl/server#27. #154.
2026-06-03 noetl/ops (untagged @ e1832b3) Phase B Round 3 validation script: validate-rust-worker-to-rust-server.sh flips the existing Rust worker pool's NOETL_SERVER_URL to point at the Rust server, submits a single-step playbook (tests/fixtures/routing_test) via POST /api/execute, watches noetl.event + /metrics, restores the env on exit. First run surfaced noetl/server#26: Rust server's /api/execute records playbook_started + command.issued events but doesn't insert into noetl.command or enqueue the noetl.outbox row, so the worker has nothing to claim. Round 3's acceptance ("Rust worker → Rust server completes full playbook with identical event log") is blocked on that fix; the harness in this script IS the validation rig that #26 will pass. #153.
2026-06-03 noetl/server v2.4.0 Phase B Round 2: fans the v2.3.0 instrumentation pattern across the remaining 5 Phase B POST endpoints (/api/catalog/register, /credentials, /keychain/{id}/{name}, /worker/pool/register, /worker/pool/heartbeat) using one shared counter + histogram pair (noetl_write_requests_total{endpoint, status} + noetl_write_request_duration_seconds{endpoint}) rather than 5 separate metrics; each endpoint has a single mode so per-endpoint metric names would inflate the registry without adding signal. endpoint label values defined as &'static str constants under crate::metrics::endpoint::* so a typo at a call site is a compile error. Kind-validated: 5 distinct endpoint=* series populate after one POST each, mix of status=ok / status=error recorded. Merged via #25.
2026-06-03 noetl/server v2.3.0 Phase B Round 1: Prometheus metrics surface. Adds the global Registry (lazy OnceLock) + GET /metrics exposition (gated by AppConfig.disable_metrics). First instrumented write boundary: POST /api/events records noetl_events_ingested_total{event_type, status} + noetl_event_ingest_duration_seconds{event_type} on every dispatch (handler split into wrapper + inner so both Ok and Err paths record). Per agents/rules/observability.md Principles 1+2+4: _total / _seconds suffixes, low-cardinality labels only, execution_id stays on spans. Foundation for "verify under load": Round 2 of noetl/server#21 fans the same instrumentation across the other 5 Phase B POST endpoints; Round 3 wires a Rust worker pool pointed at the Rust server; Round 4 is the load smoke. Merged via #24.
2026-06-03 noetl/server v2.2.1 Phase A drive-by: closes the catalog null-vs-omit drift. CatalogEntryResponse adds the missing layout field and drops skip_serializing_if = "Option::is_none" from content / payload / meta / layout so optional JSON fields serialise as explicit null, matching Python pydantic's no-exclude_none default. Surfaced in the parity harness for POST /api/catalog/list after v2.2.0 landed; same null-vs-omit pattern as the v2.2.0 UiSchemaField fix. Kind-validated PASS byte-identical for both POST /api/catalog/list and GET /api/catalog/{path}/ui_schema. Merged via #23.
2026-06-03 noetl/tools v2.17.0 Adds bounded js_consume operation to the nats tool kind: pulls up to batch (≤1000) messages from a pre-existing durable pull consumer with a hard timeout_ms cap (≤5000ms). Honors the execution-model "never hold a worker slot indefinitely" rule. Unblocks Phase 2.b of noetl/ai-meta#46 (system pool: projector replacement playbook will drain NOETL_EVENTS via js_consume). Closes noetl/ai-meta#52. Merged via #14.
2026-06-03 noetl/server v2.2.0 Phase A round 5: closes Phase A. Ports /api/catalog/{path}/ui_schema (~720 LoC including 26 unit tests): YAML inference + inline # ui: directive scanner + field-kind branching, byte-identical to the Python pydantic wire shape after timestamp normalisation. Drive-by schema fixes: catalog.version aligned to Postgres smallint (i16, not i32) across model + queries + service; get_catalog_by_id corrected from WHERE id to WHERE catalog_id (same alias-vs-column drift as v2.1.5). Closes noetl/server#18. Merged via #20. Pairs with noetl/ops#152 (parity harness normalises generated_at).
2026-06-03 noetl/ops (untagged @ 3ef4013) Parity harness adds generated_at to NORMALIZE_JQ so the new ui_schema probe shows PASS instead of DIFF on the ephemeral per-request timestamp. #152.
2026-06-02 noetl/server v2.1.6 Phase A round 4 parity fix: removes /api/runtimes (Rust-side innovation with no Python equivalent; no callers across worker/cli/tools/gateway). Handler retained with #[allow(dead_code)] for the eventual Python backport. #19. Pairs with noetl/ops#151. Follow-up noetl/server#18 tracks the full /api/catalog/{path}/ui_schema port (~260 LoC YAML inference).
2026-06-02 noetl/ops (untagged @ 881e8db) Parity harness drops the /api/runtimes probe after noetl/server#19. #151.
2026-06-02 noetl/server v2.1.5 event_type literals + catalog column-name fixes from #49 Phase A round 3. Same root cause as v2.1.3 + v2.1.4: Rust written against imagined schema. #17.
2026-06-02 noetl/ops (untagged @ bef4784) Parity harness round 2: adds 8 more read endpoints (executions, vars, catalog, runtimes) with kubectl-based auto-discovery of live fixtures. #150. Pairs with noetl/server#16.
2026-06-02 noetl/server v2.1.4 TIMESTAMP fix for /api/executions + /api/executions/{id}: noetl.event.created_at is naive TIMESTAMP but Rust expected TIMESTAMPTZ; added AT TIME ZONE 'UTC' in 3 query sites. Surfaced by #150 round-2 harness. #16.
2026-06-02 noetl/ops (untagged @ 7f7a624) Read-endpoint parity diff harness: validate-server-parity.sh hits 6 read endpoints on Python + Rust servers and surfaces shape drift. #149. Pairs with noetl/server#15.
2026-06-02 noetl/server v2.1.3 Credential SQL fix: all 8 queries referenced column data but real schema is data_encrypted. Surfaced by #149 parity harness. #15.
2026-06-02 noetl/ops (untagged @ 6849e3c) system/outbox_publisher.yaml auth-block fix: switches both tool: http steps to the standard auth: {type: bearer, credential: NOETL_INTERNAL_API_TOKEN} block. Closes #51; kind-validated 200 OK on /api/internal/outbox/claim from the system pool worker. #148.
2026-06-02 noetl/ops (untagged @ 54c1446) Rust server kind deployment manifest. #147. Pairs with noetl/server#14 (#49 Phase C kind-validation).
2026-06-02 noetl/server v2.1.2 axum 0.8 path-syntax fix: migrates 18 /:name route captures to /{name}. Without this fix v2.1.1 panics in Router::route() at startup. Surfaced + fixed during #49 Phase C kind-validation. #14.
2026-06-02 noetl/ops (untagged @ 6e82ff2) Kind validation rig for credential alias resolution. #146. Pairs with noetl/worker#40.
2026-06-02 noetl/worker v5.10.0 Resolve credential aliases in tool config dispatch; auth: "<alias>" strings now look up the keychain at dispatch time and substitute the noetl-tools AuthConfig shape (or merge postgres connection fields). Closes #48: Rust worker no longer fails with invalid type: string ..., expected struct AuthConfig. #40.
2026-06-02 noetl/ops (untagged @ d103443) Kind validation rig for task_sequence � python routing � #145. Pairs with noetl/noetl#662.
2026-06-02 noetl/noetl v4.12.0 Route task_sequence to python pool � POOL_FILTER_MAP gains "task_sequence": "python". Closes #47 � Rust worker no longer fails with Tool not found: task_sequence. #662.
2026-06-02 noetl/ops (untagged @ 66282e7) Kind validation rig for system-pool path-based routing � #144. Pairs with noetl/noetl#661.
2026-06-02 noetl/noetl v4.11.0 Path-based pool routing for system/* playbooks (POOL_PATH_PREFIX_MAP + catalog_path_for LRU cache + 6-tuple publish path) � #661. #46 Phase 2.a.2 complete.
2026-06-02 noetl/ops (untagged @ b6dd205) System pool deployment env wiring � NOETL_KEYCHAIN_ENV_VARS=NOETL_INTERNAL_API_TOKEN � #143. #46 Phase 2.a.3 complete.
2026-06-02 noetl/ops (untagged @ b77244e) playbooks/system/outbox_publisher.yaml � first system playbook � #142. #46 Phase 2.a.1 complete.
2026-06-02 noetl/ops (untagged @ a529e55) System worker pool deployment manifests + /api/internal/* validation rig � #141, #139. #46 Phase 1.b complete.
2026-06-02 noetl/server v2.1.1 events/project schema-match fix (partitioned table, NOT NULL cols, ON CONFLICT DO NOTHING) � #13
2026-06-02 noetl/noetl v4.10.1 Internal API kind-validation fixes (router prefix, dict-row, noetl.event schema) � #660
2026-06-02 noetl/server v2.1.0 Rust mirror of /api/internal/* endpoints � #12. Closes #49 Phase C on the Rust side.
2026-06-02 noetl/noetl v4.10.0 Internal API endpoints (/api/internal/outbox/* + /events/project) for system worker pool � #659
2026-06-02 noetl/worker v5.9.0 republish Native multi-arch (amd64 + arm64); replaces failed QEMU build
2026-06-01 noetl/noetl v4.8.0 KEDA multi-trigger ScaledObject generator; pool_routing.py
2026-05-31 noetl/tools v2.16.0 NATS MCP dispatch tool kind
2026-05-31 noetl/server v2.0.1 Event envelope EE-3 (snowflake event_id, meta.attempts)
2026-05-30 noetl/cli v4.8.0 Port-conflict probe; global --context flag
2026-05-29 noetl/gateway v2.14.1 Runtime contract Auth0 exposure
2026-05-29 noetl/server v1.0.3 Skeleton at ~1,967 LoC engine
2026-05-28 noetl/tools v2.12.0 result_fetch tool kind (Flight client → server)
2026-05-28 noetl/tools v2.11.0 Tabular encoder helper
2026-05-27 noetl/cli v4.7.0 noetl context init --from-gateway
2026-05-27 noetl/worker v5.1.0 Initial noetl-tools registry adoption (PR-2c sweep)

noetl/worker

Version Date Notes GitHub Release
v5.43.0 2026-06-23 #104 Phase D — the minting flip (#129, be6863a). NOETL_RESULT_MINT_AUTHORITATIVE (default off) makes the result materializer the authoritative tier writer (implies the Phase B flag) + resolve-by-URN the primary consume read path (implies the Phase C flag), with the dual-written result_store as the fail-safe fallback; new noetl_worker_result_mint_authoritative_total{path} (tier | legacy_fallback). No Cargo.toml change (resolves from the registry); 247 lib tests + clippy; default-off true no-op → inert in prod. release
v5.40.4 2026-06-22 #127 — adopt noetl-tools 3.14.1 (task_sequence per-sub-task context optimization) + ship to prod (#125; closes noetl/ai-meta#127). Deps-only Cargo.lock pin 3.14 → 3.14.1, no worker source change, clippy clean. Built the prod image via Cloud Build (noetl-worker-rust:0afbf5c) and rolled it onto prod noetl-worker-rust + noetl-worker-system-pool (server stays v3.39.5/.6, CPU req 250m/limit 2 kept) — rolling restart clean, off-server CQRS cutover stayed healthy. release
v5.40.2 2026-06-20 #119 — off-server WAL state-builder drain rebuilds the in-memory index from the retained WAL on every boot (#123; closes noetl/ai-meta#119). The authoritative drain defaults to an ephemeral DeliverPolicy::All consumer (was a durable consumer whose persisted cursor outran the empty post-restart WalEventIndex → permanent Incomplete → off-server execs stalled). Rebuilds the full index from the retained noetl_events WAL on every boot; instant revert NOETL_STATE_BUILDER_DURABLE=1; new noetl_worker_state_builder_indexed_executions gauge + index rehydrated… log. Gate-ON kind: pod restart → indexed_executions=17 wal_events=200; single-replica 6/6 stress + multi-replica clean; never re-introduces a noetl.event scan. 224 lib tests + clippy green; default-safe (PROD runs the in-server drive). release
v5.36.0 2026-06-19 #115 Phase 1 — selective render-time ref resolution (refs-in-state consume side) (#117; closes noetl/ai-meta#113 + #114 with server#243). resolve_context_references resolves a noetl:// ref only when this command's tool input binds the step's bulk; predicate/scalar/_ref access reads off the bounded extracted summary, unconsumed refs stay references — foreign bulk never inflates the render. Closed all 9 #113 stalls gate-ON. release
v5.22.0 2026-06-14 #99: transfer endpoint credential-alias resolution (#87; closes noetl/ai-meta#99). Pre-resolves source.auth/target.auth keychain aliases before dispatch; bumps noetl-tools to 3.10.0. Bidirectional data_transfer/snowflake_postgres fixture green on kind. release
v5.18.0 2026-06-12 #90 Phase 5: Cloud Run parity (#77; closes noetl/worker#76). spool.backend: gcs wired into the run-loop (ADC/Workload Identity; in-memory circuit out-of-cluster); optional NOETL_INTERNAL_API_TOKEN bearer auth; $PORT-aware health bind. Live out-of-cluster proof green. release
v5.17.0 2026-06-12 #90 Phase 4: spool wired into the subscription run-loop (#75; closes noetl/worker#74). SpoolRuntime: probe → circuit → spool-or-dispatch → ack, NATS-KV circuit persistence, drain-on-recovery, 6 spool/circuit events + spool-bytes gauge. buffer_and_ack/hybrid loss-safe. Live outage proof green on kind. release
v5.10.0 2026-06-02 Resolve credential aliases in tool config dispatch. New src/executor/auth_alias.rs (~430 LoC, 11 unit tests + 3 HTTP-integration tests) runs before serde_json::from_value and handles bare auth: "<alias>" strings. 4 credential types supported (postgres / bearer / api_key / basic); playbook overrides win over keychain defaults; missing alias → clear Credential alias '<name>' not found in keychain error. Closes #48. Merged via #40. release
v5.9.0 2026-06-02 Multi-arch image republish: native amd64 + arm64 runners (fixed QEMU timeout from initial v5.9.0 publish). KEDA multi-trigger scaler. Per-pool routing scheme integration. release
v5.8.0 2026-05-31 Snowflake event_id + meta.attempts; NATS consumer-lag gauge; observability harness release
v5.6.0 2026-05-29 Worker tabular encoder (Arrow IPC fallback) + credential scrubbing release
v5.1.0 2026-05-27 Initial noetl-tools registry adoption (PR-2c sweep) release
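
The v5.10.0 alias resolution rule ("playbook overrides win over keychain defaults", with a clear error for a missing alias) can be sketched as below. This is a minimal std-only illustration, not the code in src/executor/auth_alias.rs: the `resolve_auth_alias` name, its signature, and the flat string-to-string `Fields` map are assumptions made for the example (the real worker operates on JSON values before `serde_json::from_value`).

```rust
use std::collections::HashMap;

/// Simplified credential record: flat string fields (an assumption for this sketch).
type Fields = HashMap<String, String>;

/// Hypothetical helper: look up `alias` in the keychain, then layer the
/// playbook's own fields on top so that playbook values win.
fn resolve_auth_alias(
    alias: &str,
    keychain: &HashMap<String, Fields>,
    playbook_overrides: &Fields,
) -> Result<Fields, String> {
    // Start from the keychain defaults; a missing alias is a hard error
    // using the message quoted in the release notes.
    let mut resolved = keychain
        .get(alias)
        .cloned()
        .ok_or_else(|| format!("Credential alias '{alias}' not found in keychain"))?;

    // Playbook-level fields overwrite keychain defaults key by key.
    for (key, value) in playbook_overrides {
        resolved.insert(key.clone(), value.clone());
    }
    Ok(resolved)
}
```

For instance, a keychain entry `pg_main` with `port = 5432` combined with a playbook override `port = 6432` would resolve to port 6432 while keeping the remaining keychain fields.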

noetl/cli

Version Date Notes GitHub Release
v4.8.0 2026-05-30 Port-conflict probe; global --context flag; Auth0 dashboard URL fix (regional segment) release
v4.7.0 2026-05-27 noetl context init --from-gateway for runtime contract bootstrap release

noetl/tools

Version Date Notes GitHub Release
v3.14.1 2026-06-22 #127 — behavior-preserving task_sequence per-sub-task context optimization (#74; closes noetl/ai-meta#127). TemplateEngine::render_value builds the proxied minijinja context ONCE and threads it through the recursion (render_value_with/render_with; Value is Arc-backed); new build_context_with_overlay(&variables, overlay) skips the redundant to_template_context() HashMap deep-clone + per-block ExecutionContext clones in the set/policy paths. Isolated micro-bench: per-sub-task context cost 2988.9µs→1147.1µs (−61.6%, 2.6×). 407 lib tests + 2 new equivalence pins + clippy clean. Worker adopts via worker#125 (v5.40.4). release
v3.10.0 2026-06-14 #99: Snowflake↔Postgres transfer arms + flatten credential config (#65; closes noetl/ai-meta#99). (Snowflake,Postgres) and (Postgres,Snowflake) transfer arms implemented. SF→PG: SnowflakeTool::query_rows + information_schema type lookup + $n::text::<udt> coercion + RFC3339 timestamp reformat. PG→SF: generated INSERT statements. SourceConfig/TargetConfig add #[serde(flatten)] extra for worker-injected credential fields. Kind-validated bidirectionally against the live sf_test account. release
v3.5.0 2026-06-12 #90 Phase 5: gcs spool backend (#56; closes noetl/tools#55). GcsBackend (GCS impl of the Phase-4 SpoolBackend trait, ADC + reqwest, no new dep); prefix-shared bucket, live+dlq split, recv_seq-ordered, idempotent put/delete; gcs feature (default-on). Live GCS round-trip proven. Filed #57 (real Pub/Sub needs timeout_ms >= 10s). release
v3.4.0 2026-06-12 #90 Phase 4: store-and-forward spool engine + per-downstream circuit breaker (#54; closes noetl/tools#53). noetl_tools::spool: circuit breaker, SpoolItem (SHA-256 + noetl://spool ref), nats_object/local_disk backends, ordered-replay engine (idempotency / dead-letter / retention+GC). 44 unit tests + real-NATS integration test. Live outage proof green on kind. release
v3.1.1 2026-06-10 Multi-tool sibling references (#48; closes noetl/ai-meta#87). TaskSequenceTool injects each sub-tool's result under its label so a later sub-tool resolves {{ <label>.<field> }} (was rendering empty → syntax error at or near "," in unquoted numeric SQL). Synthetic .data self-ref mirrors build_context. 2 new unit tests; 300/0 lib. Worker adopts via worker#69. Kind-validated (save_edge_cases/save_delegation_test). release
v2.17.0 2026-06-03 Bounded js_consume operation on the nats tool kind: pulls up to batch (≤1000) messages from a pre-existing durable pull consumer with a hard timeout_ms cap (≤5000ms). Honors the execution-model "never hold a worker slot indefinitely" rule: JS_CONSUME_BATCH_MAX = 1000, JS_CONSUME_TIMEOUT_MAX_MS = 5_000 enforced at the function call site so a misconfigured playbook can't keep a slot for minutes. Returns immediately with an empty messages array when the stream is idle. Unblocks Phase 2.b of noetl/ai-meta#46: the projector replacement system playbook will drain NOETL_EVENTS via js_consume. Closes noetl/ai-meta#52. Merged via #14. release
v2.16.0 2026-05-31 NATS MCP dispatch tool kind release
v2.12.0 2026-05-28 result_fetch tool kind (Flight client → server) release
v2.11.0 2026-05-28 Tabular encoder helper release
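
The v2.17.0 js_consume caps can be illustrated with a small bounds helper. The two constants and their values come from the notes above; the `bounded_js_consume_params` function is hypothetical, and clamping is an assumption, since the notes only say the caps are "enforced at the function call site" and the real tool may reject out-of-range values instead.

```rust
/// Caps quoted in the v2.17.0 release notes.
const JS_CONSUME_BATCH_MAX: usize = 1000;
const JS_CONSUME_TIMEOUT_MAX_MS: u64 = 5_000;

/// Hypothetical helper: clamp playbook-supplied values to the hard caps so a
/// misconfigured playbook cannot hold a worker slot for minutes.
/// (Assumption: out-of-range values are clamped rather than rejected.)
fn bounded_js_consume_params(batch: usize, timeout_ms: u64) -> (usize, u64) {
    (
        batch.min(JS_CONSUME_BATCH_MAX),
        timeout_ms.min(JS_CONSUME_TIMEOUT_MAX_MS),
    )
}
```

Under this sketch, a playbook asking for `batch = 50_000` and `timeout_ms = 600_000` would be served with a batch of 1000 and a 5000 ms timeout, while in-range values pass through unchanged.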

noetl/server

Version Date Notes GitHub Release
v3.43.0 2026-06-23 #104 Phase D — mint-authoritative flag + result_store dual-write counter (#263, 6f6b9ef). NOETL_RESULT_MINT_AUTHORITATIVE (default off); the result_store PUT records each write on the new noetl_result_store_dual_write_total counter as the reversible dual-write fallback leg of the minting flip. No Cargo.toml change (resolves noetl-locator 0.1.1 from the registry); 613 lib tests; heavy graph stays absent from the control plane; default-off byte-identical → inert in prod. release
v3.39.1 2026-06-20 #118 — exactly-one-terminal-per-execution FinalizedGuard (#253; closes noetl/ai-meta#118). A bounded process-local FinalizedGuard suppresses a duplicate terminal event at emit_events before the chain linker, so a straggler/duplicate finalize (off-server + PUBLISH_ONLY materializer-lag, single replica) can't orphan the chain with a NULL-prev_event_id second root. First terminal wins; gate-off byte-identical; metric noetl_terminal_dedup_total{suppressed}. Gate-ON kind (unblocked by worker #119 v5.40.2): single-replica 6/6 stress / ~126 execs all roots=1(incl. terminal)/terminals=1/zero-scan; multi-replica 21 execs clean. 597 lib tests + clippy green; no prod default changed. release
v3.31.0 2026-06-19 #115 Phase 2 — one-level prev_event_id event chain (#244; companion DDL noetl/noetl#667; refs noetl/ai-meta#115). Each noetl.event/noetl.command carries the chain link, stamped at emit_events from a per-execution chain-head watermark (ChainHeads), covering both gate paths + the materializer. Additive — nothing reads the link yet (Phase 3, in progress). Kind-proven walkable / 1-root / no-gap / no-scan across 6 gate-ON topologies; 573 lib tests + clippy green. No prod default changed. release
v3.30.0 2026-06-19 #115 Phase 1 — surface _ref/_store on kept refs + refs_in_state default true (#243; closes noetl/ai-meta#113 + #114 with worker#117). hydrate_result_references keep_refs branch merges ref/store/uri onto the bounded summary (so {{ step._ref }} lazy-load + {{ step._ref is defined }}/{{ step._store }} predicates resolve without bulk); refs_in_state now defaults true now that the worker consume side holds — references stay out of state + commands. Kind gate-ON: all 9 #113 stalls COMPLETE, max command ctx 412KB, 0 __orchestrate__ event rows. release
v3.4.2 2026-06-12 #90 Phase 5: gcs/s3 spool credential optional (#187; closes noetl/server#186). credential optional for gcs/s3: absent → ADC/Workload Identity (Cloud Run platform bucket), present → tenant-bucket keychain alias; bucket still required. release
v3.4.1 2026-06-12 #90 Phase 4: spool validation + subscription lifecycle-status fix (#184+#185; closes noetl/server#183). Validate spec.spool at registration; lifecycle queries match only the six lifecycle event types so spool/circuit events can't 500 subscription_get/activate. release
v2.4.3 2026-06-04 Closes noetl/server#29, and with it closes Phase B of noetl/server#21 / noetl/ai-meta#49. handlers/events.rs::build_result_object refactored to emit constraint-compliant {status, reference: {…}} / {status, context: <payload>} envelopes matching the DB's chk_event_result_shape rule (which only allows status/reference/context at the top level). Previous {kind, data} / {kind, store_tier, logical_uri} / {kind, event_ids, total_parts} shapes all violated the constraint; every POST /api/events that reached the INSERT 500'd silently in production (only surfaced when Phase B Round 4's ApacheBench-driven load smoke at noetl/ops#155 exercised the path at volume and found 60,000/60,000 status=error). Function signature now takes status: &str; derived_status computation moves earlier in handle_event_inner so the same value feeds both the result envelope and the column binding. Body shape: result_kind="ref"+result_uri → {status, reference: {store_tier, logical_uri}}; result_kind="refs"+event_ids → {status, reference: {event_ids, total_parts}}; default (object payload) → {status, context: <payload>}; default (null/primitive payload) → {status} (context key omitted since the constraint requires it be an object when present). 5 new unit tests including a top_level_keys_only walker that asserts no shape ever emits anything outside the allowed set. Kind-validated: Round 3 harness produces clean command.completed lifecycle with {status, context} worker-POSTed result rows (e.g. {"status":"COMPLETED","context":{"data":{},"exit_code":0,…}}); Round 4 load smoke flips from pre-fix 0/60,000 status=ok to 60,000/60,000 status=ok at ~920 req/s sustained, p50 48ms, p95 92ms, p99 164ms on un-tuned single-replica kind cluster (pre-fix p99 of 77ms was constraint-rejection latency, not write-path latency). Merged via #30. release
v2.4.2 2026-06-03 Phase B Round 3 follow-up to v2.4.1. The end-to-end kind validation after #27 ran the tests/fixtures/routing_test (single-step python) playbook and hit a call.error with Configuration error: Invalid python config: invalid type: null, expected a map. Root cause: generate_initial_commands was emitting start_step.args (Option<HashMap<…>>) via serde_json::json! direct interpolation, producing args: null when unset. The worker's executor/command.rs re-hydrates the tool config and copies a missing tool-side args from the top-level key (if !map.contains_key("args") { if let Some(args) = command.input.get("args") { … } }); the null therefore ended up inside the python tool config, where PythonConfig.args: HashMap<String, serde_json::Value> rejected it with "expected a map". Fix: match-arm None → serde_json::json!({}), Some(m) → serde_json::to_value(m). Mirrors Python's input: {} empty-map default. After this fix the Round-3 harness produces the clean terminal command.completed lifecycle rather than call.error → command.failed. Merged via #28. release
v2.4.1 2026-06-03 Closes noetl/server#26: fixes the gap Phase B Round 3 surfaced (noetl/ai-meta#49). POST /api/execute's generate_initial_commands now: (1) builds the same CommandNotification shape Python's NATSEventPublisher.publish_command() uses and publishes it to NATS on noetl.commands.<segment>.<execution_id> (segment routes via path: system for system/* playbook paths, shared for everything else; mirrors pool_routing.route_subject); (2) inserts the matching row into the partitioned noetl.command table for replay/dashboard parity with the Python server (best-effort; failure logged at WARN since the event-log row is the source of truth). Drive-by: pre-existing NATS-auth bug fixed in main.rs::connect_nats: async_nats::connect(url) silently drops the URL's user:pass@ segment, so the Rust server had been failing to connect to NATS since it was first deployed (just unnoticed because Phases A and B Rounds 1+2 didn't need a Rust-side publish). Mirrored the worker's subscriber.rs shape: new strip_nats_userinfo helper splits userinfo off the URL, builds ConnectOptions::with_user_and_password, connects against the cleaned URL. New config field AppConfig.public_server_url: Option<String> (envy maps NOETL_PUBLIC_SERVER_URL), embedded in the NATS notification so workers know where to call back for /api/commands/{event_id} and result POSTs; the companion ops PR (#154) wires it on the kind deployment. Kind-validated end-to-end via the Round-3 harness (noetl/ops#153): worker claims the published command, runs the step, POSTs the full lifecycle (command.claimed → command.started → call.* → command.completed/failed) back through POST /api/events. Unblocks Phase B Rounds 3 (acceptance) and 4 (load smoke) of noetl/server#21. Merged via #27.
v2.4.0 2026-06-03 Phase B Round 2 of noetl/server#21 (noetl/ai-meta#49): fans the v2.3.0 instrumentation pattern across the remaining 5 Phase B POST endpoints. Adds one shared counter (noetl_write_requests_total{endpoint, status}) + one shared histogram (noetl_write_request_duration_seconds{endpoint}, same 1ms to 10s buckets) rather than 5 separate metric definitions; each endpoint has a single mode of operation (catalog register = upsert, credentials = upsert, keychain = set, runtime register, runtime heartbeat) so per-endpoint metrics would inflate the registry without adding signal. endpoint label values declared as &'static str constants under crate::metrics::endpoint::{CATALOG_REGISTER, CREDENTIALS_UPSERT, KEYCHAIN_SET, RUNTIME_REGISTER, RUNTIME_HEARTBEAT} so a typo at a call site is a compile error rather than a runtime cardinality leak. Each handler split into a thin wrapper + _inner body (identical to the v2.3.0 handle_event pattern); wrapper records on both Ok and Err paths. Kind-validated against live cluster: 5 distinct endpoint=* series populate /metrics after one POST per endpoint; mix of status=ok (runtime endpoints with valid payloads) and status=error (catalog/credentials/keychain returning 4xx/500 from intentionally-thin synthetic payloads) recorded; the wrapper records both, surfacing failure visibility. Round 3 will spawn a second Rust worker pool pointed at the Rust server for the end-to-end playbook test; Round 4 is the load smoke. Merged via #25. release
v2.3.0 2026-06-03 Phase B Round 1 of noetl/server#21 (noetl/ai-meta#49): Prometheus metrics surface, the foundation for "verify under load." New module src/metrics.rs (~180 LoC + 5 unit tests) provides a global Registry via OnceLock, lazy-init counters and histograms via OnceLock<IntCounterVec> / OnceLock<HistogramVec>, and a gather_text() helper that renders the standard Prometheus text-exposition format. GET /metrics route added to health_routes in main.rs, gated by the existing AppConfig.disable_metrics knob (default on); handler in handlers/health.rs::metrics returns text/plain; version=0.0.4; charset=utf-8. First instrumented write boundary: POST /api/events is split into a thin wrapper (handle_event) + body (handle_event_inner) so the wrapper can record noetl_events_ingested_total{event_type, status} (counter: one increment per call, Ok or Err) and noetl_event_ingest_duration_seconds{event_type} (histogram, buckets 1ms to 10s) without coupling the body to the metrics module. Per agents/rules/observability.md Principles 1+2+4: _total / _seconds suffix conventions, event_type + status are low-cardinality enum labels, execution_id is NOT a Prometheus label (cardinality blow-up; it stays on tracing spans only per Principle 4). Kind-validated against live cluster: counter increments and histogram buckets populate correctly after sample POST /api/events traffic. Merged via #24. release
v2.2.1 2026-06-03 Phase A drive-by: closes the catalog null-vs-omit drift surfaced by the parity harness after v2.2.0 landed. CatalogEntryResponse (in src/db/models/catalog.rs) gets the missing pub layout: Option<serde_json::Value> field, and the skip_serializing_if = "Option::is_none" attrs are dropped from content / layout / payload / meta so optional JSON fields serialise as explicit null, matching Python pydantic's default (no exclude_none config on the source model at repos/noetl/noetl/server/api/catalog/schema.py:133). Same null-vs-omit pattern as the v2.2.0 UiSchemaField fix; the catalog/list response had the bug for the same reason, just not exercised by Phase A round 5. Kind-validated PASS byte-identical for POST /api/catalog/list (was DIFF on layout: null missing); ui_schema probe also still PASS, no regression. Merged via #23. release
v2.2.0 2026-06-03 Phase A round 5: closes Phase A of noetl/ai-meta#49. Ports /api/catalog/{path}/ui_schema (~720 LoC including 26 unit tests in src/services/ui_schema.rs): YAML parse + inline # ui: directive scanner (secret / enum / credential / description) + immediate-child-of-workload indent detection + field-kind branching (boolean/integer/number/null/object/array/string/enum), byte-identical to the Python pydantic wire shape (optional fields serialised as explicit null, not omitted) after harness normalisation of the per-request generated_at timestamp via noetl/ops#152. Route mounted as /api/catalog/{*tail} (axum's equivalent of FastAPI's {path:path} for slash-bearing catalog paths) with handler suffix-matching /ui_schema. Drive-by schema fixes flushed during kind validation: catalog.version aligned to Postgres smallint (i16, not i32) across CatalogEntry / CatalogEntryResponse / CatalogRegisterResponse + the three query function signatures + the service-side parse (without this, the new route returned mismatched types; Rust type i32 (as SQL type INT4) is not compatible with SQL type INT2); get_catalog_by_id corrected from the never-matching WHERE id = $1 to WHERE catalog_id = $1 (same alias-vs-column drift as the v2.1.5 catalog list fix). Closes noetl/server#18. Merged via #20. release
v2.1.6 2026-06-02 Phase A round 4: removes /api/runtimes (Rust-side innovation; no Python equivalent and no callers across worker/cli/tools/gateway) for byte-identical contract per noetl/ai-meta#49 constraint #2. Handler runtime::list_all retained with #[allow(dead_code)] for the eventual Python backport (re-wiring is a one-line main.rs edit). Pairs with noetl/ops#151 (harness probe drop). The companion /api/catalog/{path}/ui_schema gap is tracked separately as noetl/server#18 (full ~260 LoC YAML inference helper port; deserves its own PR + kind validation, not a one-shot fix). Merged via #19. release
v2.1.5 2026-06-02 event_type literals + catalog column-name fixes: Rust queries used underscore-style event_types (playbook_started/completed/failed) but real values are dot-style (playbook.initialized/completed/failed); catalog SELECTs referenced id but real PK is catalog_id; both bugs same root cause as the v2.1.3 credential and v2.1.4 timestamp fixes (Rust written against an imagined schema rather than the live one). 6 sites in ExecutionService + 5 sites in db/queries/catalog.rs updated. Surfaced by noetl/ai-meta#49 Phase A round 3. Merged via #17. release
v2.1.4 2026-06-02 noetl.event.created_at is naive TIMESTAMP (no tz) but Rust services decoded into DateTime<Utc> (TIMESTAMPTZ); both /api/executions and /api/executions/{id} returned 500. Added AT TIME ZONE 'UTC' to 3 query sites in ExecutionService::list + get. No struct change. Surfaced by noetl/ops#150 round-2 parity harness. Merged via #16. release
v2.1.3 2026-06-02 Credential SQL fix: all 8 queries in src/db/queries/credential.rs referenced column data (which doesn't exist); real schema has data_encrypted. Fixed by data_encrypted AS data in SELECTs + using the real column name in INSERT/UPDATE. Surfaced by the noetl/ops#149 parity diff harness; GET /api/credentials no longer 500s on column lookup (deeper TEXT/BYTEA type mismatch surfaces underneath; tracked as follow-up). Merged via #15. release
v2.1.2 2026-06-02 axum 0.8 path-syntax fix: migrates all 18 path captures from the v0.7 :name style to the v0.8 {name} style. Without this fix the v2.1.1 binary panics in Router::route() at startup before binding the HTTP listener, so the server had never actually run in any deployment until now. Surfaced during #49 Phase C kind-validation. Merged via #14. release
v2.1.1 2026-06-02 events/project schema-match fix (partitioned noetl.event table, NOT NULL cols, ON CONFLICT DO NOTHING). Merged via #13. release
v2.1.0 2026-06-02 Rust mirror of /api/internal/* endpoints (5 outbox + projector routes; bearer-token auth gate; tracing spans). Merged via #12. release
v2.0.1 2026-05-31 Event envelope EE-3 (snowflake event_id, meta.attempts) release
v1.0.3 2026-05-29 Skeleton at ~1,967 LoC engine release
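
The v2.4.1 NATS-auth drive-by describes a strip_nats_userinfo helper that splits the user:pass@ segment off the URL before connecting. A minimal std-only sketch of that idea follows. The helper name comes from the notes, but the signature, return type, and parsing details here are assumptions, not the server's actual implementation; in the real fix the extracted pair feeds `ConnectOptions::with_user_and_password` from async_nats, which is left out to keep the example dependency-free.

```rust
/// Sketch (assumed signature): split `user:pass@` off a NATS URL.
/// Returns the cleaned URL plus the credentials, if any were embedded.
/// Note: no percent-decoding is attempted in this simplified version.
fn strip_nats_userinfo(url: &str) -> (String, Option<(String, String)>) {
    // Separate the scheme ("nats") from the rest, if a scheme is present.
    let (scheme, rest) = match url.split_once("://") {
        Some((s, r)) => (Some(s), r),
        None => (None, url),
    };

    // Userinfo can only appear in the authority, i.e. before the first '/'.
    let (authority, path) = match rest.find('/') {
        Some(i) => rest.split_at(i),
        None => (rest, ""),
    };

    // Everything before the last '@' in the authority is userinfo.
    let (creds, host) = match authority.rsplit_once('@') {
        Some((userinfo, host)) => {
            // A missing ':' means a user with an empty password.
            let (user, pass) = userinfo.split_once(':').unwrap_or((userinfo, ""));
            (Some((user.to_string(), pass.to_string())), host)
        }
        None => (None, authority),
    };

    let cleaned = match scheme {
        Some(s) => format!("{s}://{host}{path}"),
        None => format!("{host}{path}"),
    };
    (cleaned, creds)
}
```

With this sketch, `nats://noetl:secret@nats.svc:4222` yields the cleaned URL `nats://nats.svc:4222` and the pair `("noetl", "secret")`, while a URL without userinfo is returned unchanged with no credentials.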

noetl/gateway

Version Date Notes GitHub Release
v2.14.1 2026-05-29 Runtime contract Auth0 exposure release
v2.12.0 2026-05-25 Subscription routing rework release

noetl/noetl

Version Date Notes GitHub Release
v4.12.0 2026-06-02 Route task_sequence to python pool: POOL_FILTER_MAP gains "task_sequence": "python". task_sequence is an engine-level pipeline construct (~1.2k LoC TaskSequenceExecutor + ~11 special-case engine sites), not a tool kind in the noetl-tools sense; routing closes the regression gap #47 without multi-week Rust port work. Merged via #662. release
v4.11.0 2026-06-02 Path-based pool routing for system/* playbooks. Extends pool_routing.py with POOL_PATH_PREFIX_MAP (privileged catalog path → pool segment); precedence rule: path wins over tool kind. Adds catalog_path_for(catalog_id) LRU cache. Threads playbook_path through the publish pipeline (4 emit call sites + _publish_commands_with_recovery + reaper). Closes #46 Phase 2.a.2. Merged via #661. release
v4.10.1 2026-06-02 Kind-validation fixes for v4.10.0: router prefix /api/internal → /internal; dict-row tuple subscript in pending-count; noetl.event schema match (partitioned table, NOT NULL cols, ON CONFLICT DO NOTHING). Merged via #660. release
v4.10.0 2026-06-02 Internal API endpoints (/api/internal/outbox/{claim,mark-published,mark-failed,pending-count} + /api/internal/events/project) for the system worker pool; bearer-token gated. Unblocks #46 Phase 2 system playbooks. release
v4.8.0 2026-06-01 KEDA multi-trigger ScaledObject generator; pool routing module (pool_routing.py) release
v4.7.0 2026-05-30 terminal-event fast path + handle_event perf tuning rounds 1-5 release
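
The v4.11.0 precedence rule ("path wins over tool kind") combined with the v4.12.0 task_sequence entry can be sketched as follows. The real module is pool_routing.py (Python); this Rust version is only an illustration kept in the same language as the other examples on this page. The map contents beyond the `system/` prefix and the `task_sequence` entry, the `shared` default, and the `route_segment` name are assumptions drawn from the notes above rather than the module's actual API.

```rust
/// Privileged catalog path prefix → pool segment (only the system/* entry
/// is confirmed by the release notes).
const POOL_PATH_PREFIX_MAP: &[(&str, &str)] = &[("system/", "system")];

/// Tool kind → pool segment (v4.12.0 adds the task_sequence entry).
const POOL_FILTER_MAP: &[(&str, &str)] = &[("task_sequence", "python")];

/// Assumed fallback segment when neither map matches.
const DEFAULT_SEGMENT: &str = "shared";

/// Hypothetical routing function: the playbook path is checked first,
/// so a system/* playbook lands on the system pool regardless of tool kind.
fn route_segment(playbook_path: Option<&str>, tool_kind: &str) -> &'static str {
    if let Some(path) = playbook_path {
        for &(prefix, segment) in POOL_PATH_PREFIX_MAP {
            if path.starts_with(prefix) {
                return segment;
            }
        }
    }
    for &(kind, segment) in POOL_FILTER_MAP {
        if kind == tool_kind {
            return segment;
        }
    }
    DEFAULT_SEGMENT
}
```

Under these assumptions, a task_sequence step in `system/outbox_publisher` routes to `system`, the same step in `tests/fixtures/routing_test` routes to `python`, and an unmapped tool kind with no privileged path falls back to `shared`.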

Untagged / continuous

These repos don't tag formally; track via submodule pointer SHAs.

  • noetl/docs: Docusaurus site, deploys on push to main
  • noetl/ops: Helm chart + manifests, deploys via the automation/development/noetl.yaml playbook
  • noetl/travel: Cloudflare Pages deploys on push to main
  • noetl/e2e: test fixtures, no release artefact
