Umbrella Rust Server Port
ai-task: noetl/ai-meta#49
· Opened: 2026-06-02
· Last update: 2026-06-02
· Priority: PRIMARY (interlocked with Umbrella: System Pool Design — #46)
· Target crate: noetl/server (currently v2.0.1, early skeleton)
· Source: noetl/noetl/server/ Python FastAPI
Port the Python noetl-server (FastAPI / uvicorn, ~15-20k LoC, 87 route decorators) to the existing Rust noetl/server crate. Full HTTP API parity so the gateway + workers + CLI don't notice a swap. Cutover via strangler-fig at the ingress layer.
┌──────────────────────┐ ┌──────────────────────────┐
│ noetl/ai-meta #49 │ │ noetl/ai-meta #46 │
│ Rust server port │ │ System pool playbooks │
│ (THIS UMBRELLA) │ │ │
│ │ │ │
│ Phase A: reads │ │ │
│ Phase B: writes │ │ │
│ Phase C: internal ├─────────►│ Phase 2: │
│ endpoints │ unblocks │ system/outbox_publisher│
│ │ │ system/projector │
│ Phase D: engine │ │ │
│ Phase E: SSE etc │ │ Phase 1.b: deployment │
│ Phase F: shards │ │ │
└──────────┬───────────┘ └──────────┬───────────────┘
│ │
│ produces │ consumes
▼ ▼
┌──────────────────────────────────────────────────────────────┐
│ HTTP API surface (server is the data gatekeeper) │
│ │
│ POST /api/internal/outbox/claim │
│ POST /api/internal/outbox/mark-published │
│ POST /api/internal/outbox/mark-failed │
│ GET /api/internal/outbox/pending-count │
│ POST /api/internal/events/project │
└──────────────────────────────────────────────────────────────┘
Phase C lands on both Python (noetl/noetl) and Rust (noetl/server) in parallel — system playbooks call HTTP, not DB, so they don't care which server is responding. This is the single PR that unblocks both umbrellas' next phases.
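Because system playbooks only see HTTP, the internal surface in the diagram can be written down as a small route catalogue that is independent of which server answers. The sketch below is illustrative only: the `InternalEndpoint` type and its methods are hypothetical names, not code from either server; only the methods and paths come from the diagram.

```rust
/// The five internal endpoints listed in the diagram above.
/// The type and method names are hypothetical; only the HTTP methods
/// and paths are taken from the document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InternalEndpoint {
    OutboxClaim,
    OutboxMarkPublished,
    OutboxMarkFailed,
    OutboxPendingCount,
    EventsProject,
}

impl InternalEndpoint {
    pub const ALL: [InternalEndpoint; 5] = [
        InternalEndpoint::OutboxClaim,
        InternalEndpoint::OutboxMarkPublished,
        InternalEndpoint::OutboxMarkFailed,
        InternalEndpoint::OutboxPendingCount,
        InternalEndpoint::EventsProject,
    ];

    /// HTTP method as shown in the diagram.
    pub fn method(self) -> &'static str {
        match self {
            InternalEndpoint::OutboxPendingCount => "GET",
            _ => "POST",
        }
    }

    /// Path as shown in the diagram.
    pub fn path(self) -> &'static str {
        match self {
            InternalEndpoint::OutboxClaim => "/api/internal/outbox/claim",
            InternalEndpoint::OutboxMarkPublished => "/api/internal/outbox/mark-published",
            InternalEndpoint::OutboxMarkFailed => "/api/internal/outbox/mark-failed",
            InternalEndpoint::OutboxPendingCount => "/api/internal/outbox/pending-count",
            InternalEndpoint::EventsProject => "/api/internal/events/project",
        }
    }

    /// Join the path onto whichever server base URL (Python or Rust) the caller targets.
    pub fn url(self, base: &str) -> String {
        format!("{}{}", base.trim_end_matches('/'), self.path())
    }
}
```

A caller would pass the base URL of whichever server is deployed; nothing else in the playbook needs to change when the backend is swapped.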
Three architectural decisions in the same 2026-06-02 session converge to make this a top priority:
- System worker pool requires `/api/internal/*` endpoints (per the data-access-boundary rule): workers don't touch `noetl.*` directly; they call the server. Those endpoints don't exist yet in Python OR Rust.
- Sharding readiness: the platform's path to multi-region / multi-tenant scale runs through a sharded server. Re-engineering the Python FastAPI server for sharding is comparable in cost to a Rust port that does sharding correctly from day one.
- Python footprint reduction: after the publisher + projector retire via #46 system playbooks, the FastAPI server is the largest remaining Python service. Porting it closes the loop on the runtime hot path.
| Rule | Implication for this port |
|---|---|
| Data access boundary | Rust server is the only thing that talks to noetl.* directly; new /api/internal/* endpoints land here for the system pool |
| Execution model | Server stays the gatekeeper for data + the orchestrator of state machines; doesn't move to playbooks |
| Observability | Every endpoint ships with span + metric + execution_id correlation in the same change set |
| Deployment validation | Kind-first per endpoint port; production cutover via ingress flip on prod-shaped env |
| API contract preserved | Rust request/response shapes are byte-identical to Python's during migration; no drift; no "new and improved" during port |
| Sharding-first | Every endpoint that touches per-execution state derives execution_id; routing layer built in from day one |
| Strangler-fig cutover | Endpoint-by-endpoint flip via ingress, never big-bang |
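The strangler-fig rule in the table amounts to a per-prefix routing decision at the ingress. As a rough sketch of that idea (the real cutover is done in ingress configuration, not in Rust; `CutoverTable` and `Backend` are hypothetical names used only for illustration):

```rust
/// Which server answers a request during the migration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Python,
    Rust,
}

/// Hypothetical model of the ingress rules: the set of path prefixes
/// that have already been flipped to the Rust server.
#[derive(Debug, Default)]
pub struct CutoverTable {
    flipped: Vec<String>,
}

impl CutoverTable {
    /// Flip one endpoint (or endpoint family) to Rust.
    pub fn flip(&mut self, prefix: &str) {
        self.flipped.push(prefix.trim_end_matches('/').to_string());
    }

    /// Everything not explicitly flipped keeps going to Python.
    /// Matching is segment-aware so `/api/health` does not capture `/api/healthz`.
    pub fn backend_for(&self, path: &str) -> Backend {
        let hit = self.flipped.iter().any(|p| {
            path == p.as_str()
                || path
                    .strip_prefix(p.as_str())
                    .map_or(false, |rest| rest.starts_with('/'))
        });
        if hit {
            Backend::Rust
        } else {
            Backend::Python
        }
    }
}
```

Defaulting to Python keeps the cutover reversible: removing a prefix from the table sends that endpoint back to the old server without touching the others.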
Routes that just read DB state. Lowest risk; biggest test of the read path. Many already scaffolded in repos/server/src/handlers/.
- `GET /api/health`, `/api/pool/status` (already wired)
- `GET /api/catalog/{path}/ui_schema` (already wired)
- `POST /api/catalog/list` (already wired)
- `GET /api/catalog/resource`
- `GET /api/executions/{id}`
- `GET /api/executions/{id}/events`
- `GET /api/events/{id}/result`
- `GET /api/runtime/contract`
- `GET /api/variables/...`
- `GET /api/credentials/...`
- `GET /api/keychain/...`
Acceptance: every read endpoint returns byte-identical JSON to the Python version against the same DB state. Diff harness in kind validation.
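The byte-identical acceptance check can be reduced to a single comparison per endpoint. A minimal sketch of that comparison, assuming the harness has already fetched both response bodies (the function name `first_divergence` is hypothetical and is not taken from the actual diff harness):

```rust
/// Compare the Python and Rust response bodies byte for byte.
/// Returns `None` when they are identical, otherwise the offset of the
/// first differing byte (or the shorter length when one body is a
/// strict prefix of the other).
pub fn first_divergence(python_body: &[u8], rust_body: &[u8]) -> Option<usize> {
    let common = python_body.len().min(rust_body.len());
    for i in 0..common {
        if python_body[i] != rust_body[i] {
            return Some(i);
        }
    }
    if python_body.len() != rust_body.len() {
        Some(common)
    } else {
        None
    }
}
```

Reporting an offset rather than a boolean makes drift such as key ordering or whitespace differences easier to locate; for example, `{"a":1}` versus `{"a": 1}` diverges at offset 5.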
Endpoints the Rust worker uses to emit results. Must be solid.
- `POST /api/events` (worker's `put_result`): already wired; verify under load
- `POST /api/catalog/register`: already wired
- `POST /api/credentials` (encrypted-at-rest write)
- `POST /api/keychain`
- `POST /api/runtime/heartbeat`
- `POST /api/runtime/register`
Acceptance: Rust worker pointed at Rust server completes a full playbook execution against kind with event log identical to Python-server-pointed run.
NEW endpoints — Python doesn't have them today. Lands on BOTH Python AND Rust so the system pool can deploy against either during migration.
Python side ✅ landed + kind-validated 2026-06-02 via #659 (v4.10.0) + #660 (v4.10.1 with kind-validation fixes).
Rust side ✅ landed + kind-validated 2026-06-02 via #12 (v2.1.0) + #13 (v2.1.1 with schema fix) + #14 (axum 0.8 route-syntax fix — without this v2.1.1 panics in Router::route() at startup before binding the HTTP listener) + noetl/ops#147 (kind deployment manifest).
All 11 assertions of automation/development/validate-internal-api.sh pass against the Rust server (same harness that validated Python; identical pass rate).
| Endpoint | Python | Rust |
|---|---|---|
| `POST /api/internal/outbox/claim?limit=N` | ✅ kind-validated | ✅ kind-validated |
| `POST /api/internal/outbox/mark-published` | ✅ kind-validated | ✅ kind-validated |
| `POST /api/internal/outbox/mark-failed` | ✅ kind-validated (backoff confirmed) | ✅ kind-validated (1s backoff for attempts=1) |
| `GET /api/internal/outbox/pending-count` (KEDA scaler source) | ✅ kind-validated | ✅ kind-validated |
| `POST /api/internal/events/project` | ✅ kind-validated (fresh + idempotent) | ✅ kind-validated (fresh + idempotent) |
| ServiceAccount bearer-token auth gate | ✅ kind-validated (403 + 503 paths) | ✅ kind-validated (403 on no-auth / wrong-token / Basic-scheme) |
| Span (`tracing::instrument`) + execution_id per endpoint | ✅ tracing spans (Prometheus metrics deferred) | |
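The auth-gate row lists three rejection cases for the Rust side (no auth, wrong token, Basic scheme) and a 503 path on the Python side. The sketch below shows one way such a gate could be structured. It is not the server's implementation: `check_bearer` is a hypothetical name, and mapping "no token configured on the server" to 503 is an assumption, since the document does not say what triggers the 503 path.

```rust
pub const FORBIDDEN: u16 = 403;
pub const SERVICE_UNAVAILABLE: u16 = 503;

/// Decide whether a request may reach an internal endpoint.
/// `authorization` is the raw `Authorization` header value, if any.
/// `expected_token` is the token the server was configured with, if any.
pub fn check_bearer(authorization: Option<&str>, expected_token: Option<&str>) -> Result<(), u16> {
    // ASSUMPTION: an unconfigured server cannot authenticate anyone, so it answers 503.
    let expected = match expected_token {
        Some(t) if !t.is_empty() => t,
        _ => return Err(SERVICE_UNAVAILABLE),
    };
    // No header at all: 403.
    let header = authorization.ok_or(FORBIDDEN)?;
    // Any scheme other than Bearer (for example Basic): 403.
    let presented = header.strip_prefix("Bearer ").ok_or(FORBIDDEN)?;
    // Wrong token: 403.
    if constant_time_eq(presented.as_bytes(), expected.as_bytes()) {
        Ok(())
    } else {
        Err(FORBIDDEN)
    }
}

/// Compare without returning early on the first mismatching byte.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}
```

Returning a bare status code keeps the sketch free of any web-framework types; in a real handler the same decision would sit in middleware in front of the `/api/internal/*` routes.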
Three real-world bugs found + fixed during kind validation (see Sessions Log 2026-06-02 (late evening)):
- Python router prefix double-prefix: `/api/internal` → `/internal` (Python-only).
- Python dict-row tuple subscript in `pending-count` (Python-only).
- `noetl.event` schema mismatch: `timestamp` column missing, NOT NULL columns absent, partitioned table doesn't support `ON CONFLICT (event_id)` (both Python + Rust).
Validation harness: noetl/ops automation/development/validate-internal-api.sh (PR awaiting merge).
Acceptance: system worker pool on kind runs system/outbox_publisher end-to-end against either the Python or Rust server's internal endpoints. Python side validated; full pipeline lands when #46 Phase 1.b deploys the system pool.
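The table above records only one data point for `mark-failed` backoff: 1s when attempts=1. The schedule for later attempts is not stated in this document. Purely as an illustration of how a publisher retry delay could grow from that starting point, here is a capped exponential schedule; the doubling factor, the 300s cap, and the name `backoff_for` are all assumptions.

```rust
use std::time::Duration;

/// Hypothetical retry delay after `attempts` failed publishes.
/// Only the attempts=1 -> 1s point is taken from the document;
/// doubling and the cap are illustrative assumptions.
pub fn backoff_for(attempts: u32) -> Duration {
    const BASE_SECS: u64 = 1;
    const CAP_SECS: u64 = 300; // assumed ceiling, not from the source
    // Clamp the exponent so the shift can never overflow a u64.
    let exp = attempts.saturating_sub(1).min(20);
    Duration::from_secs((BASE_SECS << exp).min(CAP_SECS))
}
```

Whatever schedule the servers actually use, pinning the attempts=1 case in a test keeps it aligned with what the kind validation observed.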
Python's catalog → command-generation → state-machine logic. ~5-8k LoC Python in repos/noetl/noetl/server/. Rust skeleton at repos/server/src/engine/ (~1,967 LoC).
- `POST /api/execute`: full port. Persists initial command, publishes NATS notification, snowflake-generated event_id (noetl/server#27, #28, #29)
- State-machine orchestrator wired into event ingest: `trigger_orchestrator` loads events, calls `WorkflowOrchestrator::evaluate`, persists generated events + commands, emits terminal `playbook.completed` / `playbook.failed` (noetl/server#31, Phase D R2)
- `persist_engine_command` extracted as shared helper for `/api/execute` and orchestrator paths
- `noetl-executor` crate already feeds `WorkflowOrchestrator` (Phase D R1 survey closed the gap)
Acceptance: full execution lifecycle handled by Rust server, replayable against the same event log as Python. Kind validation: the 2-step linear playbook in tests/fixtures/r2_two_step runs end-to-end and terminates at playbook.completed (Phase D R2, ai-meta@TBD). Phase D is materially complete; the remaining work is conditional/iterator/parallel control-flow coverage (future rounds), not the core engine wiring.
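For the iterator coverage mentioned above, the sessions log (Phase D R3b) describes a per-iteration `command_id` of the shape `<exec>:<step>:<event>:i<index>` and a step that completes only after all N distinct command_ids land. The sketch below illustrates that aggregation rule. Field names follow the log entry, but the struct layout, the `i64` id types, and the method names are assumptions rather than the engine's actual code.

```rust
use std::collections::HashSet;

/// Build a per-iteration command id in the `<exec>:<step>:<event>:i<index>` shape.
/// The integer types are assumed for illustration.
pub fn iteration_command_id(exec: i64, step: &str, event: i64, index: usize) -> String {
    format!("{exec}:{step}:{event}:i{index}")
}

/// Simplified stand-in for the engine's per-step bookkeeping.
#[derive(Debug, Default)]
pub struct StepInfo {
    pub iterations_expected: usize,
    pub iteration_command_ids: HashSet<String>,
    pub completed: bool,
}

impl StepInfo {
    pub fn expecting(iterations_expected: usize) -> Self {
        StepInfo {
            iterations_expected,
            ..StepInfo::default()
        }
    }

    /// Record one `command.completed`. Duplicate deliveries of the same
    /// command_id are absorbed by the set, so the step flips to completed
    /// only once every distinct iteration has reported.
    pub fn record_completion(&mut self, command_id: &str) -> bool {
        self.iteration_command_ids.insert(command_id.to_string());
        if self.iteration_command_ids.len() >= self.iterations_expected {
            self.completed = true;
        }
        self.completed
    }
}
```

Counting distinct ids rather than events is what makes the aggregation safe under redelivery: a replayed completion for iteration 0 cannot stand in for a missing iteration 2.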
- `GET /api/executions/{id}/events/stream`: SSE for the gateway (axum has SSE support built-in)
- Remaining ~20-30 Python `@router` routes triaged; port the ones with callers; drop the ones without
- `shard_id = hash(execution_id) % N`
- StatefulSet deployment (replaces today's Deployment)
- Inter-shard coordination (catalog + credentials shared; executions sharded)
- Gateway/load-balancer extension to route by `execution_id` header
- Migration path: single-replica StatefulSet → scale to N → cutover
- Helm chart values: `server.replicas` → `server.shards`
- Production cutover: flip ingress; Python server retires
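The first item, `shard_id = hash(execution_id) % N`, can be sketched with the standard library alone. The sessions log indicates the real implementation uses the twox-hash crate with a seed; to stay dependency-free, the sketch below substitutes FNV-1a, so its shard assignments will not match the server's. The function name `shard_for` appears in the log, but this signature is an assumption.

```rust
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// FNV-1a over a byte slice. This is a stand-in hash for illustration only;
/// it is NOT the hash the server or gateway uses.
fn fnv1a64(bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(FNV_OFFSET, |h, b| (h ^ u64::from(*b)).wrapping_mul(FNV_PRIME))
}

/// Map an execution id onto one of `shard_count` shards.
/// The id is hashed as little-endian bytes; both sides of the wire must
/// agree on byte order, or they will route the same id differently.
pub fn shard_for(execution_id: i64, shard_count: u32) -> u32 {
    assert!(shard_count > 0, "shard_count must be at least 1");
    (fnv1a64(&execution_id.to_le_bytes()) % u64::from(shard_count)) as u32
}
```

With `shard_count = 1` every id maps to shard 0, which is why an N=1 deployment behaves like the unsharded server.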
Client Gateway Server-Shard-0 Server-Shard-1 Postgres
│ │ │ │ │
│ POST /api/execute │ │ │
│ (no execution_id yet) │ │ │
├───────────────►│ │ │ │
│ │ pick any shard │ │ │
│ │ (load-balance) │ │ │
│ ├──────────────────►│ │ │
│ │ │ generate eid │ │
│ │ │ (snowflake) │ │
│ │ │ INSERT execution│ │
│ │ ├─────────────────┼───────────────►│
│ │ ◄─────────────────┤ │ │
│ ◄──────────────┤ {execution_id: 12345} │ │
│ │ │ │
│ GET /api/executions/12345 │ │
│ X-Execution-ID: 12345 │ │
├───────────────►│ │ │
│ │ shard = 12345 % 2 = 1 │ │
│ ├─────────────────────────────────────►│ │
│ │ │ SELECT state │
│ │ ├────────────────►│
│ │ ◄───────────────────────────────────┤ │
│ ◄──────────────┤ {status: RUNNING, ...} │ │
│ │
│ Phase C internal endpoints (system pool calls) │
│ │
│ POST /api/internal/outbox/claim │
│ X-Execution-ID: not required (claim is shard-aware via worker pod) │
├───────────────►│ │
│ │ outbox claim fans out across all shards via │
│ │ per-shard `outbox-publisher-<shard>` subscription │
│ │ (each system pool worker pod owns a shard slice │
│ │ like the projector does today) │
▼ ▼ ▼
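The gateway's part in the sequence above is a two-way decision: with no `X-Execution-ID` header it may pick any shard, and with one it routes to `id % N`. A minimal sketch of that decision, using the plain modulo shown in the diagram; the `Route` type, the `route` function, and rejecting a malformed header are assumptions.

```rust
/// Outcome of the gateway's routing decision.
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    /// No execution id yet (for example `POST /api/execute`): load-balance freely.
    AnyShard,
    /// Route to the shard that owns this execution.
    Shard(u64),
    /// ASSUMPTION: a header that is not a valid id is rejected rather than guessed at.
    BadRequest,
}

/// Decide where a request goes from the raw `X-Execution-ID` header value.
pub fn route(x_execution_id: Option<&str>, shard_count: u64) -> Route {
    match x_execution_id {
        None => Route::AnyShard,
        Some(raw) => match raw.trim().parse::<u64>() {
            // Plain modulo, as in the diagram's `12345 % 2 = 1` example.
            Ok(id) if shard_count > 0 => Route::Shard(id % shard_count),
            _ => Route::BadRequest,
        },
    }
}
```

For the diagram's example, `route(Some("12345"), 2)` yields `Route::Shard(1)`, matching the request that lands on Server-Shard-1.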
Sharding rules:
| Resource | Strategy |
|---|---|
| `noetl.execution` / `noetl.event` / `noetl.command` / `noetl.outbox` | Per-execution_id sharding (write to owning shard) |
| `noetl.catalog` / `noetl.credential` / `noetl.keychain` | Shared (read from any shard, write to designated leader shard) |
| `noetl.runtime` (worker heartbeats) | Per-pool sharding (worker_id hash) |
Migration to sharded mode is N=1 first (no functional change), then scale to N=3 with `hash(execution_id) % N` routing.
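The sharding rules and the N=1 migration step can be expressed as a lookup plus a write-target function. This is only a sketch of the table: the `Strategy` enum and both functions are hypothetical, `key` stands for an already-hashed execution_id or worker_id, and using shard 0 as the designated leader is an assumption the document does not confirm.

```rust
/// The three strategies from the sharding rules table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Strategy {
    PerExecution,
    Shared,
    PerPool,
}

/// Look up the strategy for a table named in the rules above.
pub fn strategy_for(table: &str) -> Option<Strategy> {
    match table {
        "noetl.execution" | "noetl.event" | "noetl.command" | "noetl.outbox" => {
            Some(Strategy::PerExecution)
        }
        "noetl.catalog" | "noetl.credential" | "noetl.keychain" => Some(Strategy::Shared),
        "noetl.runtime" => Some(Strategy::PerPool),
        _ => None,
    }
}

/// Pick the shard a write goes to. `key` is an already-hashed id.
pub fn write_shard(strategy: Strategy, key: u64, shard_count: u64) -> u64 {
    assert!(shard_count > 0, "shard_count must be at least 1");
    match strategy {
        // ASSUMPTION: shard 0 plays the designated leader for shared tables.
        Strategy::Shared => 0,
        Strategy::PerExecution | Strategy::PerPool => key % shard_count,
    }
}
```

With `shard_count = 1` every write resolves to shard 0 regardless of strategy, which is the "no functional change" property of the N=1 step.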
- System worker pool runtime + system playbooks → Umbrella: System Pool Design (#46). This umbrella provides the endpoints #46 needs; #46 builds the deployment + playbooks.
- Rust worker tool-kind gaps → Umbrella: Rust Worker Parity Gaps (#47 + #48). Orthogonal.
- Container tool kind callback → Umbrella: Container Tool Callback (#43). Orthogonal.
- DSL parser / `noetl-tools` / `noetl-executor`: separate Rust crates; not part of the server port.
Per the issue-tracking convention, file these against noetl/server when work begins:
- Phase A read-endpoint parity audit + diff harness: surfaces drift between Rust and Python responses for already-wired endpoints.
- Phase C internal endpoints (LANDS FIRST, even before Phase A finishes): `/api/internal/outbox/*` + `/api/internal/events/project` on BOTH Python and Rust. Unblocks #46 Phase 2.
- Event envelope crate (EE-4 from TaskList #51): `noetl-events` shared crate that worker + executor + server depend on.
| Date | Event |
|---|---|
| 2026-06-02 | Umbrella filed during the architecture-pivot session. Priority PRIMARY alongside #46. |
| 2026-06-02 | Cross-linked from #46 — the two umbrellas interlock (#49 provides API endpoints #46's playbooks consume). |
| 2026-06-02 | No code work started. Phase 1 plan + first three sub-issues defined; ready to pick up. |
| 2026-06-02 | Phase A read-endpoint parity harness landed (noetl/server#18 + #20). Phase C internal endpoints wired (#17, #19, #25). v2.2.0 shipped. |
| 2026-06-03 (morning) | Phase B write-boundary parity complete — Prometheus surface + all 6 write endpoints instrumented (noetl/server#21, #23). Rust worker → Rust server e2e validated; 60k/60k load smoke at 920 req/s, p99 164ms. v2.4.0 shipped. |
| 2026-06-03 (afternoon) | /api/execute full port (noetl/server#27) + args:null fix (#28) + result-envelope shape compliance (#29 → noetl/server#30). Phase D R1 survey closed; template resolution already wired via existing TemplateRenderer.render_value. |
| 2026-06-03 (evening) | Phase E SSE port (GET /api/executions/{id}/events/stream) shipped. Phase D R2 orchestrator wired into event ingest (noetl/server#31, v2.5.0). Kind validation: 2-step linear playbook (tests/fixtures/r2_two_step) runs end-to-end on Rust server, event log terminates at playbook.completed. |
| 2026-06-04 (evening) | Phase D R3a — step.when enable guard wired through orchestrator (noetl/server#32, v2.6.0). Skip-chain walks inline through skipped steps' next.arcs until landing on a guard-passing step or end. Companion fix in state.build_context exposes workload under workload.* namespace. Two new unit tests + kind validation on tests/fixtures/r3a_conditional (both skip_middle=true and =false terminate at playbook.completed). |
| 2026-06-04 (late evening) | Phase D R3b — step.loop iterator fan-out + state aggregation (noetl/server#33, v2.7.0). Orchestrator detects step.loop, evaluates loop.in_expr, emits ONE step.enter (with iterations_expected in context) + N iteration commands via build_iteration_command with per-iteration command_id shape <exec>:<step>:<event>:i<index>. StepInfo tracks iterations_expected + dedup iteration_command_ids HashSet + iteration_results; step flips to Completed only after all N distinct command_ids land. 5 new unit tests (28/28 engine). Kind validation: orchestrator path validated (fan-out + NATS publish + worker claim); full e2e completion gated on R3b-2 (worker-side Python iter-variable injection) + dual-worker NATS subject race. |
| 2026-06-05 (early morning) | Phase D R3c — parallel branches via next.arcs mode:inclusive (noetl/server#34, v2.8.0). Phase D R3 closes. Multi-target transitions already dispatched correctly via existing evaluator path; this PR fixes the orchestrator's if next_step_name == "end" early-return that falsely completed workflows when ONE parallel branch hit end while siblings were still in flight. New reached_end flag + reached_end_quiescent clause at end of process_in_progress: completion fires only when reached_end AND no new commands queued AND no other branches running. 3 new orchestrator unit tests (31/31 engine). Kind validation: orchestrator triggered exactly 3 times as designed; final playbook.completed blocked by dual-worker NATS subject race (same gap as R3b — not an R3c issue). |
| 2026-06-05 (midday) | noetl/ai-meta#53 closes: Phase D R3 e2e drained for R3a/R3c/R2. Three PRs landed: noetl/server#35 (v2.8.1) makes /api/worker/pool/register accept component_type alias the Rust worker actually sends; noetl/worker#41 (v5.11.0) makes the worker honor server_url from each NATS command notification (new ControlPlaneClient::with_server_url + CommandExecutor::execute_with_server_url); noetl/server#36 (v2.8.2) adds the R3a skip-chain re-entry guard surfaced once the multi-trigger path worked. End-to-end kind validation with both Rust + Python worker pools running: R2 (2-step linear), R3a (both skip_middle branches), R3c (parallel) all reach playbook.completed cleanly without dual-worker workarounds. R3b iterators still gated on R3b-2 (worker-side Python iter-variable injection). |
| 2026-06-05 (late afternoon) | Phase D R3 e2e fully closed (noetl/server#37, v2.8.3): R3b iterators reach playbook.completed after all N iterations. Two fixes: (1) build_iteration_command injects iteration vars into tool.config.args (worker Python sees them as globals via globals().update(args); this fixes the long-standing NameError: name 'item' is not defined); (2) state.rs step.enter handler now reads iterations_expected from event.result.context (the canonical storage shape after trigger_orchestrator wraps via the {status, context} envelope), so the iteration aggregation actually fires and command.completed no longer falls into the "plain step" branch that prematurely marked the step Completed. New reproducer unit test; 32/32 engine tests pass. All four fixtures end-to-end on kind: R2 (linear), R3a (both branches), R3b (3 iterations), R3c (parallel) → playbook.completed. |
| 2026-06-04 | Phase F R3b sequence complete (drift-guard end-to-end). Three rounds shipped in sequence: R3b-1 (noetl/server#47, v2.12.0) adds GET /api/runtime/shard-info returning the server's shard_for() result; R3b-2 (noetl/gateway#26, v3.2.0) adds the gateway twin GET /sharding/preview (local compute, NOT proxy); R3b-3 (noetl/ops#158) ships automation/development/validate-shard-drift-guard.sh posting to both endpoints across 30 (execution_id, shard_count) pairs and asserting shard_index agreement. Catches drift modes the per-side unit tests can't see: twox-hash crate version split, SHARD_HASH_SEED divergence, i64→bytes endianness flip, hash crate algo change with non-major bump. Server side sanity-validated against noetl-server-rust v2.12.0 in kind (30/30 pairs return expected varying shard_index distribution); full cross-side execution deferred to operator since gateway v3.2.0 not yet in local kind. Phase F status: R1 + R1.5 + R2 + R3a + R3a-2 + R3b all ✅; next is R4 (DB sharding) or R5 (cutover). |
| 2026-06-04 | Phase F R4 kicked off. Decision: per-shard physical Postgres, NOT Citus. Tracked under sub-issue noetl/server#48 with 5-round decomposition (R4-1 pool layer, R4-2 AppState wiring, R4-3 per-execution handler cutover, R4-4 cluster-wide list fan-out, R4-5 kind validation with N=2 shards). R4-1 shipped (noetl/server#49, v2.13.0): ShardConnection DSN type + ShardingConfig env loader + DbPoolMap N+1 pool container with pool_for(execution_id) / cluster() / all_shards() accessors. Single-pool fallback when NOETL_SHARDS empty: degenerate one-shard map whose cluster handle IS the only shard pool; behaviour bit-identical to today. 12 + 2 new tests pin shard_for routing against R3b drift-guard pairs so any future change to pool_for keeps the wire contract honest. No handler changes; R4 ships dormant until operator sets NOETL_SHARDS. |
| 2026-06-04 | Phase F R4-2 shipped (noetl/server#50, v2.14.0). DbPoolMap wired into AppState as a new pools field alongside the existing db: DbPool. Handler call-sites untouched; R4-3 does the disciplined per-execution cutover (~12 files) so each PR diff stays reviewable rather than landing a ~70-call-site swap in one go. New AppState::new_legacy shim for test / example callers without a ShardingConfig; new sync DbPoolMap::from_single_pool constructor lets the shim skip the async path. main.rs loads ShardingConfig::from_env(), builds the map (sharded when NOETL_SHARDS set, fallback otherwise), logs shard topology at startup. Doc example updated to new_legacy. 2 new tokio pool tests pin the single-pool path against the i64-extreme inputs the R3b drift-guard exercises. |
| 2026-06-04 | Phase F R4-3a shipped (noetl/server#51, v2.15.0). First slice of R4-3 (per-execution handler cutover). 9 sites in events.rs migrated from state.db to state.pools.pool_for(execution_id); 1 cluster-wide read (catalog content in trigger_orchestrator) moves to state.pools.cluster(). 2 event_id-keyed sites (get_command, claim_command) deferred to R4-4 with inline TODO(R4-4) markers: execution_id isn't in scope at request time so neither shard selector nor tx begin can pick the per-execution pool up front; R4-4 picks between a path-param redesign and a cross-shard probe helper. Mid-round design pivot: user raised whether shard endpoints should be DB-resident keychain-backed records (multi-cloud + per-shard rotation). Decision: deferred to Phase G; full design + 5-round Phase G decomposition recorded in server wiki sharding-design § Phase G. R4 continues with env-var DSN through R4-5. |
| 2026-06-04 | Phase F R4-3b shipped (noetl/server#52, v2.16.0). Second slice of R4-3 (per-execution handler cutover). execute.rs cutover with a clean 3+3 split: 3 cluster-wide noetl.catalog reads (resolve_catalog × 2 + get_playbook_yaml) move to state.pools.cluster(); 3 per-execution writes (playbook_started event INSERT, command.issued event INSERT, insert_command_row command INSERT) move to state.pools.pool_for(execution_id). state.db no longer appears in execute.rs. No new tests (mechanical pool-handle swap). 246/247 full suite passes (one pre-existing parser failure carried over from R3b-1, unrelated). R4-3c next (health.rs, 3 sites all cluster-wide); then R4-4 (cluster-wide list fan-out + event_id-keyed deferrals) → R4-5 (kind validation N=2). |
| 2026-06-04 | Phase F R4-3c shipped, R4-3 complete (noetl/server#53, v2.17.0). Final slice of R4-3. health.rs cutover: 3 cluster-wide sites (db_health_check for GET /api/health, pool.size() + pool.num_idle() for GET /api/pool/status) move to state.pools.cluster(). Inline comments at both handlers point at the future per-shard health surface (/api/health/shards + per-shard /metrics gauge labels) without blocking the basic readiness signal. R4-3 cutover complete across events.rs + execute.rs + health.rs. Only 2 state.db sites remain in the entire codebase: both in events.rs (get_command + claim_command), both event_id-keyed, both explicitly TODO(R4-4)-tagged. R4-4 closes those out + adds the cluster-wide GET /api/executions list fan-out. |
| 2026-06-04 | Phase F R4-4 shipped (noetl/server#54, v2.18.0). Cross-shard fan-out + event_id resolver. Two new DbPoolMap helpers: for_each_shard (sequential per-shard fan-out; outputs in shard-index order) and find_first (probes every shard, returns first Some). Sequential await: no futures crate dep, no tokio::spawn 'static bounds; parallelism is a Phase G concern. Closes the 2 event_id-keyed TODO(R4-4) sites from R4-3a (get_command + claim_command); claim_command resolves event_id → execution_id via find_first then opens the tx on pool_for(execution_id) to keep shard-locality on the claim tx. state.db is now absent from the entire src/ tree; R4-3 + R4-4 complete the handler-side migration to DbPoolMap. 4 new tokio pool tests; 250/251 full suite passes (pre-existing parser failure unrelated). Full ExecutionService refactor (DbPoolMap ownership + per-execution method migration + GET /api/executions list fan-out with the catalog join split into per-shard events aggregation + cluster-master catalog lookup) is the natural next slice (R4-4b), deliberately deferred to keep this PR diff reviewable. |
| 2026-06-04 | Phase F R4-4b shipped, in-server migration complete (noetl/server#55, v2.19.0). ExecutionService::new(pools: DbPoolMap, snowflake) (was (db: DbPool, ...)). 9 per-execution sites across get/get_status/cancel/is_cancelled/finalize → pool_for(execution_id); catalog path lookup → pools.cluster(). list() rewritten as fan-out: per-shard execution_stats CTE (no catalog JOIN) with (limit + offset) over-fetch, merge sorted by started_at DESC, cluster-master SELECT path FROM noetl.catalog WHERE catalog_id = ANY($1), stitch paths, post-merge path-LIKE filter (case-insensitive), skip(offset).take(limit). New ExecutionService::new_legacy(db, snowflake) shim wraps a single pool for test/example callers without a pool map. main.rs wires state.pools.clone() into the constructor. Path filter quirk documented inline (post-merge filter can reduce effective row count below limit). cargo build clean; 250/251 full suite passes. R4-3 + R4-4 + R4-4b together complete the in-server migration to DbPoolMap. R4-5 (kind validation N=2 shards in noetl/ops) is the final R4 round. |
| 2026-06-04 | Rust-only e2e complete: orchestrator template context + noetl-tools chain + legacy cleanup (noetl/server#67 (v2.19.4) + noetl/tools#17 (v2.17.1) + noetl/tools#18 (v2.18.0) + noetl/worker#42). control_flow_workbook runs FULLY end-to-end on the Rust-only stack 🎉, the first NoETL playbook exercising the complete control-flow surface (workbook → python with result capture → conditional next.arcs → parallel branches → terminal) without Python. Five fixes shipped across three repos: PythonTool result capture; TaskSequenceTool runtime; worker noetl-tools dep bump; orchestrator template context (expose step data at top level + capture call.done so command.completed's bare envelope doesn't clobber the rich payload). Kind cluster legacy cleanup: deleted noetl-server, noetl-worker, noetl-outbox-publisher deployments + noetl-projector statefulset + noetl/noetl-ext/noetl-projector/noetl-worker-metrics services + associated configmaps + noetl-worker SA. Patched noetl-worker-system-pool to point at noetl-server-rust before the legacy noetl service deletion. Cluster is now Rust-only: noetl-server-rust + noetl-worker-rust + noetl-worker-system-pool. Closes noetl/ai-meta#60, noetl/tools#15, noetl/tools#16, noetl/server#66. |
| 2026-06-04 | Rust-only e2e expansion: pipeline + failure + workbook shipped together (noetl/server#61, #63, #65; v2.19.3). Three orchestrator + parser gaps surfaced by the same e2e probe sequence on the Rust-only stack right after the workload + input alias fix. (Gap C: pipeline flat shape) ToolDefinition::Pipeline accepts both YAML shapes via an untagged PipelineItem enum (flat name-as-field vs nested label-as-key); custom Serialize normalises on the wire so the worker's task_sequence consumer is unchanged. Flat form is the v10 dominant (446 vs 37 occurrences in e2e fixtures). Unblocked iterator_save_test, start_with_action, end_with_action from 400 parse rejection. (Gap B: failure termination) Four interlocking gaps fixed: handler trigger gate extended to command.failed; trigger_orchestrator derives trigger_event_type from the actual event; process_in_progress short-circuits on command.failed using durable StepState::Failed signal; state.rs::apply_event extracts failure error from result.context.error fallback so step.error gets populated. Parallel-branch deferral preserved. Validated: control_flow_workbook now emits `playbook.failed
| 2026-06-04 | Rust e2e canonical v10 playbook compat: workload + input alias (noetl/server#59, v2.19.2). Two interlocking YAML-decode gaps surfaced by hello_world e2e on the Rust-only stack right after the EE-5 fix unblocked event emission. (1) ToolSpec.args accepts input as a serde alias: the canonical v10 playbook YAML writes tool.input: { ... } (446 occurrences across e2e fixtures vs. 37 for args:), but Rust silently dropped the unknown field at decode and any script referencing the workload by name raised NameError. (2) emit_playbook_started_event merges playbook.workload defaults with request.payload overrides; without this, downstream steps' build_context() returned {} because ExecutionState::handle_event hydrates state.workload from an event that carried only the request payload (the generate_initial_commands path had its own merge for the start step's command context: divergent sites, silent drift). Three-state diagnostic on hello_world: no fix → NameError; input alias only → HELLO_WORLD: None; both fixes → HELLO_WORLD: Hello World ✅. Validated end-to-end in local kind with Python deployments scaled to 0. 3 new unit tests pin both YAML shapes. Closes noetl/ai-meta#56 + noetl/server#58. Unblocks all v10-shape playbooks on Rust e2e. |
| 2026-06-04 | Phase F R5 follow-up: EE-5 lax decode of integer execution_id / event_id shipped (noetl/server#57, v2.19.1). Rust-only e2e validation surfaced a year-old contract drift hidden by Pydantic v2's lax int→str coercion on the Python stack: worker emits noetl-events::ExecutorEvent.execution_id: i64 over .json(&event) (JSON integer), server's EventRequest.execution_id: String strict-decoded and 422'd with invalid type: integer, expected a string. Three worker-→-server inbound id fields gained custom serde adapters (deserialize_string_or_i64 + Option<String> variant) so both the worker's i64 shape and the documented browser-facing String shape decode cleanly: EventRequest.execution_id, EventRequest.event_id, BatchEventRequest.execution_id. Outbound encoding stays String. 6 new unit tests pin the dual-shape contract; bogus shapes (arrays / objects) still 422. Validated end-to-end in local kind on the Rust-only stack (Python deployments scaled to 0): noetl exec tests/fixtures/playbooks/hello_world runs both steps to playbook.completed, the same scenario that previously stalled after 3 retry attempts. Closes noetl/ai-meta#55 + noetl/server#56; unblocks Phase F R5 Rust-only e2e tier. Wiki update lands in lockstep on noetl/server wiki event-envelope § EE-5. |
| 2026-06-04 | Phase F R4-5 shipped, R4 complete (noetl/ops#160). Kind validation script automation/development/validate-shard-routing-n2.sh (~358 lines). Creates 3 databases on existing postgres pod (idempotent), patches noetl-server-rust with NOETL_SHARDS + NOETL_CLUSTER_DSN, probes R3b-1 → asserts shard_count=2, spawns N executions (default 10), asserts each landed on the predicted shard per shard_for(execution_id, 2) by querying each per-shard DB directly, re-runs R3b-3 drift-guard against the sharded server, trap-EXIT reverts the deployment patch. Three databases on one pod (not three pods): exercises the in-server routing code path without process-level isolation overhead; per-pod isolation is the Phase G concern. R4 cutover end-to-end shipped across noetl/server v2.13.0 → v2.19.0 + noetl/ops kind validation. Closes noetl/server#48. Next + only remaining Phase F round: R5 (production cutover, separate ops decision). |
| 2026-06-05 | Phase F R5 Tier 4: v10 control-flow runs end-to-end on Rust-only (noetl/server#69 v2.19.5, 6 commits, closes server#68; noetl/worker#44 v5.11.1, closes worker#43; noetl/tools v2.18.1). R5's e2e-playbook tier re-probe found + fixed 7 distinct bugs across the Rust stack: catalog get_next_version INT4-vs-i16 SQL decode; insert_catalog_entry RETURNING id→catalog_id; ToolSpec Option fields serialized null breaking PythonConfig decode (skip_serializing_if); orchestrator end-step-with-action dispatch + task_sequence label-keyed result flatten + intra-pass dispatch dedup; minijinja UndefinedBehavior::Chainable so {{ ctx.x \| default(step.y) }} resolves when ctx is undefined; drop the step != "end" orchestrator trigger gate so end's command.completed emits playbook.completed; worker preserves array tool_config for task_sequence (every v10 tool: [...] step was dropping its config to {}). Four v10 fixtures reach playbook.completed on the Rust-only kind stack (Python at 0): start_with_action (conditional arcs + set-blocks, from_start data threaded), end_with_action (terminal step's cleanup action ran, ctx.temp_table threaded), loop_test (iterator fan-out, 41 events), control_flow_workbook (workbook→eval→parallel→terminal, 37 events). actions_test correct-fails on a missing TEST_SECRET env (failure-termination fired cleanly). Pointers bumped: server 52043a8, worker 912678a, tools ff5b6e6. R5 stays open; this is the v10 control-flow slice of Tier 4; remaining: sharded-N=2 mode + 60-fixture regression rig + production cutover. |
| 2026-06-05 | Phase F R5 Tier 4: keychain-credential path validated (noetl/server#71 v2.19.6, closes server#70; noetl/worker#46 v5.11.2, closes worker#45). Registered the pg_k8s postgres credential and probed the DB-backed fixtures; this surfaced a 3-bug chain in the keychain subsystem. (1) Credential store bound the AES-GCM ciphertext as Vec<u8> (BYTEA) to the TEXT data_encrypted column → POST /api/credentials 500'd, blocking every keychain credential. Fix: base64-armor on write + decode on read (server#71). (2) resolve_auth_alias read only the auth: key, but v10 playbooks carry the alias under credential: → no connection fields injected → tool fell back to a default unreachable connection. Fix: accept auth OR credential, strip both (worker#46). Proven end-to-end: iterator_save_test's create_table (top-level credential: pg_k8s) registers → decrypts → field-maps (db_host→host, …) → connects → runs a real CREATE TABLE against demo_noetl. (3, filed) Nested task_sequence sub-task credentials don't resolve: process_items's save_item is a postgres task inside a pipeline; task_sequence dispatches sub-tasks through noetl-tools (no ControlPlaneClient), bypassing worker-side resolution (worker#47; recommended fix = worker pre-resolves nested aliases). Pointers bumped: server 7a845da, worker 4144ea0. postgres_test's query_catalog has no credential: (fixture-design gap, not a code bug). |
| 2026-06-05 |
Phase F R5 Tier 4 — nested-pipeline credentials resolved + template-timing finding (noetl/worker#48 v5.11.3, closes worker#47). Bug #3 from the credential chain above: split resolve_auth_alias into a dispatcher + single-tool helper; for a task_sequence envelope the worker now walks the tool_config pipeline array and pre-resolves each sub-task's credential:/auth: alias before dispatch (task_sequence dispatches sub-tasks through noetl-tools, which has no ControlPlaneClient). Validated: iterator_save_test's process_items.save_item (postgres-in-a-pipeline-in-an-iterator) now connects to demo_noetl. Credential-path chain complete + validated: store TEXT/base64 (server#71) → credential:-key alias (worker#46) → nested resolution (worker#48). Last iterator_save_test blocker — filed noetl/server#72: the server pre-renders the entire task_sequence pipeline config at command-build time, so {{ _prev.* }} (task_sequence runtime values) render against an undefined _prev and — since the v2.19.5 UndefinedBehavior::Chainable change — silently become empty strings → malformed SQL (VALUES ('exec:', 'exec', '', )). Evidence read from noetl.event.context. Recommended fix: defer task_sequence sub-task rendering to the task_sequence tool. Also flagged: postgres tool swallows the real SQL error behind a generic "db error" (noetl-tools observability gap). Pointer bumped: worker fbefd57. |
| 2026-06-05 |
Phase F R5 Tier 4 — iterator_save_test GREEN; full v10 + credential + iterator-pipeline surface validated (noetl/server#73 v2.19.7, closes server#72). Added render_value_deferring(value, ctx, deferred_roots): renders every {{ ... }} block EXCEPT those referencing a deferred root (masked with a null-delimited placeholder, restored verbatim). The Pipeline branch defers _prev/_results, so {{ pg_auth }} / {{ execution_id }} / {{ item }} resolve at command-build (worker keychain-alias resolution still sees a resolved alias) while {{ _prev.* }} survives to task_sequence, which renders it per-iteration with the real previous-task output. 3 new template unit tests (18/18 template::jinja pass). Validated — the data is the proof: iterator_save_test reaches playbook.completed and writes 3 rows to the real demo_noetl DB (item1/100, item2/200, item3/300); 35-event chain start → create_table → process_items (×3: transform → save_item) → end → playbook.completed. The complete credential + iterator + pipeline chain is now merged + validated end-to-end on Rust-only: store TEXT/base64 (server#71) → credential:-key alias (worker#46) → nested resolution (worker#48) → _prev/_results defer-render (server#73). This exercises the deepest v10 surface NoETL has — an iterator whose body is a multi-tool pipeline that chains data between tools, resolves a keychain credential on a nested sub-task, and writes to an external DB. Pointer bumped: server 24a1966. Remaining R5: sharded-N=2 mode, 60-fixture regression rig, production cutover. Still filed (noetl-tools): postgres tool "db error" observability gap. |
| 2026-06-05 |
Phase F R5 Tier 4 — postgres-tool observability (real SQLSTATE errors) (noetl/tools#22 v2.18.2, closes tools#21; noetl/worker#49 dep bump). Closed the last follow-up from the credential/iterator saga. tokio_postgres::Error's Display renders just db error for server-side failures — the real SQLSTATE + message lives in the attached DbError. format_pg_error(context, &e) surfaces severity: message (SQLSTATE code) + DETAIL/HINT, with a std::error::Error::source chain fallback for connection/type errors; wired into the Query + Execute sites. Worker bumped to noetl-tools 2.18.2 (Cargo.lock-only; ^2.18 already covered it; chore(deps) non-releasing so worker stays semver v5.11.3 on 64db698). Validated on the Rust-only kind stack: a bad query (SELECT * FROM <nonexistent>) reports Database error: Query failed: ERROR: relation "..." does not exist (SQLSTATE 42P01) in the call.error event (was: Query failed: db error). Diagnosing failing-SQL playbook steps no longer requires reading noetl.event.context. Pointers bumped: tools e636845, worker 64db698. |
| 2026-06-05 |
Phase F R5 Tier 4 — regression rig: canonical v10 SQL + http config shapes (noetl/tools#24 v2.18.3 closes tools#23; noetl/tools#25 test; noetl/tools#26 v2.18.4; noetl/worker#50 + #51). Started the fixture regression sweep (~30 self-contained fixtures) on the Rust-only kind stack; fixed three config-shape classes the canonical v10 fixtures exercise. (1) postgres command: alias + multi-statement SQL (v2.18.3): canonical v10 postgres steps use command: (incl. v10_canonical_example.yaml) — config required query; both postgres (extended protocol) and duckdb (prepare()) rejected CREATE …; INSERT …; SELECT …. Added the alias + a quote-aware splitter (leading statements batched, final keeps the typed path; param-free only). (2) task_sequence→duckdb regression test (tools#25) after a rollout claim-race briefly masqueraded as a fix bug. (3) duckdb command: alias + http coercion (v2.18.4): duckdb gains the alias; http params/headers/form were HashMap<String,String> but templated values render to integers/null (invalid type: integer/null, expected a string) → lenient coercion (numbers/bools→string, null dropped). Newly GREEN: duckdb_test, json_serialization_save, duckdb_retry_query, pagination/{offset,cursor,max_iterations,pipeline}, retry_simple_config. Cluster recovery first (server latched into NATS not configured after a podman restart — fixed by a server rollout). Server-side follow-up: loop_with_pagination renders {{ execution_id }} empty inside a multi-statement postgres command → = ; syntax error (render-scoping at command-build). Pointers bumped: tools 3e1df11 (v2.18.4), worker 4759bd1. |
| 2026-06-05 |
Phase F R5 Tier 4 — e2e sweep + orchestrator-strand fix (noetl/server#95 v2.27.2, closes server#94; fixture noetl/e2e#28). Re-ran the sweep against a fresh v2.27.1 kind build of the Tier-1 self-contained set: 16 PASS, 3 correct-FAIL (should_error_tool_is_required negative test, v10_canonical_example fake URL, spike/* experimental). Found a real orchestrator-strand bug: vars_test/test_vars_template_access hung in RUNNING forever after set_variables. Root cause — trigger_orchestrator caught a deterministic evaluate() error (an invalid {{ ctx.* }} Jinja expression in a downstream step's code body, rendered by engine::commands::build_tool_command) in a WARN-only arm and emitted no terminal event. Fix: emit a terminal playbook.failed (FAILED, error surfaced, parented on the trigger) on an evaluate Err; transient/infra errors before evaluate stay retryable. Same stall class as the #58 command.failed fix. Unit test locks the precondition (invalid template body → evaluate returns Err); lib suite 307/0. Kind-validated: a bad-template playbook → FAILED (was RUNNING-forever); template_access (fixture fixed), hello_world, test_transient, loop_test → COMPLETED. Server wiki: event-envelope terminal-events section. 3 deferred findings filed: #62 (/api/executions list status query O(all events), 7–8 s + list-vs-detail status drift), #63 (python script.source.code shape, ~18 fixtures), #64 (artifact tool kind). Pointers bumped: server 5808486 (v2.27.2), e2e f3a49b2. Remaining R5: clear #62/#63/#64, external-infra fixtures, sharded-N=2 mode, production cutover. |
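
The defer-render behaviour described in the server#73 entry above can be illustrated with a small stand-in. This is a minimal sketch, not the server's code: it replaces minijinja with a toy renderer that only resolves bare `{{ name }}` lookups from a map, and it copies deferred blocks through verbatim in a single pass instead of masking them with a placeholder and restoring them afterwards. The function signature and the `references_root` helper are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// True when `root` appears in `expr` as a whole identifier
/// (`_prev.amount` matches `_prev`; `my_prev` does not).
fn references_root(expr: &str, root: &str) -> bool {
    let is_ident = |c: char| c.is_ascii_alphanumeric() || c == '_';
    expr.match_indices(root).any(|(start, _)| {
        let before = expr[..start].chars().next_back();
        let after = expr[start + root.len()..].chars().next();
        !before.map_or(false, is_ident) && !after.map_or(false, is_ident)
    })
}

/// Toy stand-in for defer-rendering: resolve every `{{ name }}` block from `ctx`,
/// except blocks that mention a deferred root, which are left untouched.
fn render_value_deferring(
    template: &str,
    ctx: &HashMap<&str, &str>,
    deferred_roots: &[&str],
) -> String {
    let mut out = String::new();
    let mut rest = template;
    while let Some(open) = rest.find("{{") {
        // An unterminated block is copied through as plain text.
        let Some(len) = rest[open..].find("}}") else { break };
        let close = open + len + 2;
        let block = &rest[open..close];
        let expr = block[2..block.len() - 2].trim();
        out.push_str(&rest[..open]);
        if deferred_roots.iter().any(|root| references_root(expr, root)) {
            out.push_str(block); // deferred: survives for a later render pass
        } else {
            // Unknown names render as empty strings in this toy renderer.
            out.push_str(ctx.get(expr).copied().unwrap_or(""));
        }
        rest = &rest[close..];
    }
    out.push_str(rest);
    out
}
```

With `_prev` and `_results` as deferred roots, `{{ execution_id }}` and `{{ item }}` resolve at build time while `{{ _prev.amount }}` reaches the later stage unchanged, which is the property the iterator_save_test fix depends on.
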
| Round | Code | Kind E2E |
|---|---|---|
| R1 (template resolution) | ✅ shipped in Phase B | ✅ |
| R2 (orchestrator wired into event ingest, #31) | ✅ v2.5.0 | ✅ |
| R3a (`step.when` enable guard, #32 + #36) | ✅ v2.6.0 + v2.8.2 | ✅ |
| R3b (`step.loop` iterator, #33 + #37) | ✅ v2.7.0 + v2.8.3 | ✅ |
| R3c (parallel branches, #34) | ✅ v2.8.0 | ✅ |
EE-4 fully shipped ✅ — all three rounds merged + crates.io publish landed:
| Round | PR | Outcome |
|---|---|---|
| 1 | noetl/cli#49 | `noetl-events` workspace crate extracted from `noetl-executor::events`; executor 1-line re-exports. |
| 2 | noetl/cli#50 | Crates.io publish prep + actual publish: noetl-events 0.1.0, noetl-executor 0.4.0, noetl 4.9.0 all on crates.io after manual workflow dispatch on cli@d6e2432. |
| 3 | noetl/server#38 | Server takes noetl-events = "0.1" direct dep, adds From<ExecutorEvent> + TryFrom<&EventRequest> impls, 4 wire-compat tests pinning the shared-subset round-trip. Server v2.9.0. |
Design call: server's EventRequest legitimately retains its 5 server-only fields (result_kind, result_uri, event_ids, actionable, informative) and String wire format for execution_id / event_id (JSON-number precision for browser clients) — the two types stay distinct but the SHARED SUBSET is now anchored to the canonical noetl_events::ExecutorEvent.
Forward:
- Phase F (sharding): R4 complete (shipped 2026-06-04 across noetl/server v2.13.0 → v2.19.0 + noetl/ops kind validation; noetl/server#48 closed). Design doc + endpoint inventory live on the noetl/server wiki (sharding-design). In-server `DbPoolMap` routing is end-to-end kind-validated; it ships dormant unless the operator sets `NOETL_SHARDS`. The only remaining Phase F round is R5 (production cutover: a separate ops decision after the operator runs `validate-shard-routing-n2.sh` against their live kind cluster and confirms acceptance criteria; tracked separately whenever that rollout window opens).
- Phase G (deferred): keychain-backed shard endpoints + `result_ref.credential_alias`. Design + 5-round decomposition recorded in noetl/server wiki sharding-design § Phase G. Triggers when multi-cloud or per-shard credential rotation becomes a real production requirement.
- Remaining Python-only endpoints: scope a sweep through the Phase A parity-harness output to identify the long tail of routes the Rust server doesn't yet cover (none currently block production traffic; the system pool + workers already operate on the Rust surface for all observed call patterns).
- R5 fixture regression rig (in progress, 2026-06-05). ~30 self-contained e2e fixtures swept against the Rust-only kind stack; three canonical-v10 config-shape classes fixed in noetl-tools v2.18.4 (see Recent activity). Remaining sweep buckets: save_storage / regression_test / vars-API / pft / batch_execution fixtures (many need `pg_local` or external resources; triage PASS vs correct-FAIL vs bug as they run). Open server-side finding: `loop_with_pagination` renders `{{ execution_id }}` empty inside a multi-statement postgres `command:` at command-build time → `WHERE execution_id = ;` syntax error. It needs the same render-scoping diagnosis as the server#72/#73 defer-render work, scoped to single-tool postgres `command` steps (not just task_sequence pipelines). Porting gap noted: the `artifact` tool kind (`test_output_select`, `test_storage_tiers`) is not in noetl-tools; confirm whether it's canonical v10 before porting.
Read-only investigation of the Rust server (repos/server/src/) to scope the sharding boundary. The architecture is ready: every per-execution operation is already keyed by execution_id, NATS subjects are already execution-aware (noetl.commands.{system|shared}.<execution_id> derived in publish_command_notification at execute.rs:535), and the orchestrator is stateless re: execution (loads events fresh from the DB on every trigger). Sharding is an orchestration problem, not a redesign.
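
To make the execution-aware subject shape concrete, here is a minimal sketch of building and parsing `noetl.commands.{system|shared}.<execution_id>`. The `Pool` enum and both helper functions are hypothetical names for illustration; the server's actual derivation lives in `publish_command_notification` and may differ.

```rust
/// The `{system|shared}` segment of a command subject (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Pool {
    System,
    Shared,
}

impl Pool {
    fn as_str(self) -> &'static str {
        match self {
            Pool::System => "system",
            Pool::Shared => "shared",
        }
    }
}

/// Build `noetl.commands.<pool>.<execution_id>`.
fn command_subject(pool: Pool, execution_id: i64) -> String {
    format!("noetl.commands.{}.{}", pool.as_str(), execution_id)
}

/// Recover the execution_id from a command subject, or `None` when the
/// subject does not have the expected shape.
fn execution_id_from_subject(subject: &str) -> Option<i64> {
    let rest = subject.strip_prefix("noetl.commands.")?;
    let (pool, id) = rest.split_once('.')?;
    if !matches!(pool, "system" | "shared") {
        return None;
    }
    id.parse::<i64>().ok()
}
```

Because the `execution_id` is recoverable from the subject alone, a consumer can work out which shard owns a command without a database lookup.
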
The unit of sharding is the execution. All state belonging to a single `execution_id` (events, commands, variables, outbox rows) lives on one shard. Cluster-wide state (catalog, credentials, keychain, runtime registration) is read-only from the execution's perspective and replicates across shards or lives behind a shared read-only path.
Per-execution (shardable by execution_id):
| Endpoint | execution_id source |
|---|---|
| `POST /api/execute` | body (server generates snowflake) |
| `POST /api/events` | body (string-wire i64) |
| `POST /api/events/batch` | body |
| `GET /api/commands/{event_id}` | path → DB lookup |
| `POST /api/commands/{event_id}/claim` | path → DB lookup |
| `GET /api/executions/{execution_id}` | path |
| `GET /api/executions/{execution_id}/status` | path |
| `POST /api/executions/{execution_id}/cancel` | path |
| `GET /api/executions/{execution_id}/cancellation-check` | path |
| `POST /api/executions/{execution_id}/finalize` | path |
| `GET /api/executions/{execution_id}/events/stream` (SSE) | path |
| `GET` / `POST` / `DELETE /api/vars/{execution_id}` | path |
| `POST /api/internal/outbox/claim` | per-execution batch |
| `POST /api/internal/outbox/mark-published` / `mark-failed` | body |
| `POST /api/internal/events/project` | body |
Cluster-wide (NOT shardable):
| Endpoint | Reason |
|---|---|
| `GET /api/executions` (list) | cluster-wide query |
| `GET/POST /api/catalog/*` | global playbook catalog |
| `POST/GET /api/credentials/*` | global credential store |
| `POST/GET /api/keychain/{catalog_id}/*` | keyed by `catalog_id`, not `execution_id` |
| `POST /api/worker/pool/{register,deregister,heartbeat}` | worker pools are global |
| `GET /api/worker/pools` | cluster-wide read |
| `GET /api/internal/outbox/pending-count` | cluster-wide count (KEDA scaler trigger) |
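
The two endpoint inventories above can be expressed as a path classifier. This is a minimal sketch under assumptions: `Scope` and `classify` are hypothetical names, matching is by path only (HTTP method and query string are ignored), and anything outside the inventories returns `None` rather than guessing.

```rust
#[derive(Debug, PartialEq)]
enum Scope {
    /// Routed to the shard that owns the execution.
    PerExecution,
    /// Served from the shared, non-sharded path.
    ClusterWide,
}

/// Classify a request path according to the endpoint inventory.
fn classify(path: &str) -> Option<Scope> {
    let path = path.trim_end_matches('/');

    // Exact paths that a per-execution prefix below would otherwise shadow.
    if path == "/api/executions" || path == "/api/internal/outbox/pending-count" {
        return Some(Scope::ClusterWide);
    }

    // True when `path` is `prefix` itself or a sub-path of it.
    let under = |prefix: &str| {
        path.strip_prefix(prefix)
            .map_or(false, |rest| rest.is_empty() || rest.starts_with('/'))
    };

    let cluster_wide = ["/api/catalog", "/api/credentials", "/api/keychain", "/api/worker"];
    if cluster_wide.iter().any(|p| under(*p)) {
        return Some(Scope::ClusterWide);
    }

    let per_execution = [
        "/api/execute",
        "/api/events",
        "/api/commands",
        "/api/executions",
        "/api/vars",
        "/api/internal/outbox",
        "/api/internal/events",
    ];
    if per_execution.iter().any(|p| under(*p)) {
        return Some(Scope::PerExecution);
    }

    None
}
```

The two exact-match checks come first because `GET /api/executions` (list) and `pending-count` share a prefix with per-execution routes.
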
Shardable by execution_id:
- `noetl.event`: append-only event log
- `noetl.command`: command queue
- `noetl.execution`: execution records
- `noetl.outbox`: transactional outbox
- `noetl.variables`: ephemeral step-scoped cache
Cluster-wide:
- `noetl.catalog`: playbook catalog
- `noetl.credential`: credential store
- `noetl.keychain`: keychain entries
- `noetl.runtime`: worker pool registration/heartbeat
- Gateway-aware vs server-aware shard routing. Either the ingress (gateway / load balancer) inspects `execution_id` and routes to the owning shard, or any server replica accepts any request and proxies internally. Trade-off: gateway-aware is one network hop and centralizes the routing logic; server-aware is more resilient but adds inter-shard mTLS + an N-hop worst case. Lean recommendation: gateway-aware for Phase F (simpler); server-aware as a Phase G optimization if the gateway becomes a bottleneck.
- Modulo on full snowflake vs timestamp / machine_id. The snowflake i64 layout is `[timestamp: 41][machine_id: 10][sequence: 12]`. `hash(execution_id) % N` (full 64-bit) distributes evenly; `timestamp % N` creates time-based hotspots (all playbooks started in the same second hit one shard); `machine_id % N` clusters by generating machine. Decision: full i64 modulo. Cheap and even.
- Catalog / credential / keychain sync strategy. These must be consistent across shards. Options: (a) shared single-master Postgres (simple; write bottleneck on heavy catalog churn); (b) multi-master with conflict resolution (risky); (c) read-only replicas per shard from a single master (highest ops cost, cleanest separation). Lean recommendation: (a) single-master for Phase F; revisit (c) as Phase G if replication lag becomes the limiting factor.
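
The difference between the three modulo candidates can be sketched directly from the bit layout. This is a minimal illustration under assumptions: the function names and the `compose` helper are hypothetical, and `shard_id` applies a plain modulo to the raw i64 (if `hash(execution_id)` is meant to include a mixing step, it would go before the modulo).

```rust
// Bit widths from the layout: [timestamp: 41][machine_id: 10][sequence: 12].
const MACHINE_BITS: u32 = 10;
const SEQUENCE_BITS: u32 = 12;

/// Timestamp component: the 41 bits above machine_id and sequence.
fn timestamp_part(id: i64) -> i64 {
    id >> (MACHINE_BITS + SEQUENCE_BITS)
}

/// Machine component: the 10 bits above the sequence.
fn machine_part(id: i64) -> i64 {
    (id >> SEQUENCE_BITS) & ((1 << MACHINE_BITS) - 1)
}

/// Chosen strategy: modulo over the full i64. Assumes `shards > 0`.
fn shard_id(execution_id: i64, shards: u32) -> u32 {
    execution_id.rem_euclid(i64::from(shards)) as u32
}

/// Rejected: every id minted in the same timestamp tick lands on one shard.
fn shard_by_timestamp(execution_id: i64, shards: u32) -> u32 {
    timestamp_part(execution_id).rem_euclid(i64::from(shards)) as u32
}

/// Rejected: every id minted by the same machine lands on one shard.
fn shard_by_machine(execution_id: i64, shards: u32) -> u32 {
    machine_part(execution_id).rem_euclid(i64::from(shards)) as u32
}

/// Hypothetical helper that packs components into an id for the example.
fn compose(timestamp: i64, machine: i64, sequence: i64) -> i64 {
    (timestamp << (MACHINE_BITS + SEQUENCE_BITS)) | (machine << SEQUENCE_BITS) | sequence
}
```

For four ids minted in the same tick on machine 3 with sequences 0 to 3 and N=2, `shard_id` alternates between shards 0 and 1, while the timestamp and machine variants each pin all four to a single shard. One caveat of the plain modulo: when N is a power of two up to 4096, the result depends only on the sequence bits, so evenness relies on the sequence counter actually advancing.
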
| Round | Scope | Output |
|---|---|---|
| R1 | Sharding design doc + endpoint inventory codification | Sub-issue with the design call (gateway-aware, full-i64 modulo, single-master cluster-wide tables) and the 40+ endpoint inventory tagged shardable / metashard. ADR added to noetl.dev/docs. |
| R2 | Server-side `shard_id()` helper + (optional) `route_to_shard()` proxy | New `repos/server/src/state.rs` helpers; HTTP proxy fallback for mis-routed requests (gives a safety net while gateway routing rolls out). |
| R3 | Gateway-side dispatch + load balancer config | Wire ingress (Kong / nginx / Cloud LB) to extract execution_id and route by consistent hashing. Replicate the shard assignment in LB config or a small plugin. |
| R4 | DB sharding + per-shard schema migration | Partition event / command / execution / outbox / variables by execution_id via Citus or per-shard separate schemas. Replicate catalog / credential / keychain / runtime (or keep on shared read-only path). Schema migration dry-run on kind first. |
| R5 | Cutover + validation | Spin up N=2 shard replicas in kind, run the Phase D e2e harness across shards, confirm execution affinity (all events for one execution hit one shard), then prod canary + traffic split. |
Per agents/rules/observability.md Principle 3, execution_id should be generated in the application using a Rust snowflake library, not via the DB-side noetl.snowflake_id() function. Reasons listed in the rule: spans need the id at span-creation time; retries are idempotent only with a stable id across attempts; cross-component publish doesn't have to wait for the INSERT round trip; tests need deterministic ids; multi-cluster sharding can't agree on a single DB-side sequence.
Currently the Rust server still calls the DB function on every event / command create. This should flip before R4 (DB sharding): once tables are partitioned, the DB-side sequence either becomes per-partition (clusters within a shard) or has to be globally synchronized across shards (which defeats the point). Likely an R1.5 / R2 prerequisite; track it explicitly in the R1 design doc.
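
A minimal sketch of application-side id generation follows, assuming the `[timestamp: 41][machine_id: 10][sequence: 12]` layout used elsewhere on this page. It is hand-rolled on the standard library only to stay self-contained; the rule calls for a Rust snowflake library, and nothing here is the server's implementation. `SnowflakeGenerator`, `next_id_at`, and the `EPOCH_MS` value are hypothetical.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

const MACHINE_BITS: u32 = 10;
const SEQUENCE_BITS: u32 = 12;
const MACHINE_MASK: u64 = (1 << MACHINE_BITS) - 1;
const SEQUENCE_MASK: u64 = (1 << SEQUENCE_BITS) - 1;
const TIMESTAMP_MASK: u64 = (1 << 41) - 1;
// Assumed custom epoch: 2024-01-01T00:00:00Z in Unix milliseconds.
const EPOCH_MS: u64 = 1_704_067_200_000;

struct SnowflakeGenerator {
    machine_id: u64,
    last_ms: u64,
    sequence: u64,
}

impl SnowflakeGenerator {
    fn new(machine_id: u16) -> Self {
        Self { machine_id: u64::from(machine_id) & MACHINE_MASK, last_ms: 0, sequence: 0 }
    }

    /// Generate an id for an explicit clock reading, which keeps tests deterministic.
    fn next_id_at(&mut self, now_ms: u64) -> i64 {
        // Never step backwards, even if the wall clock does.
        let mut ts = now_ms.max(self.last_ms);
        if ts == self.last_ms {
            self.sequence = (self.sequence + 1) & SEQUENCE_MASK;
            if self.sequence == 0 {
                ts += 1; // sequence exhausted in this millisecond: borrow the next one
            }
        } else {
            self.sequence = 0;
        }
        self.last_ms = ts;
        let elapsed = ts.saturating_sub(EPOCH_MS) & TIMESTAMP_MASK;
        ((elapsed << (MACHINE_BITS + SEQUENCE_BITS))
            | (self.machine_id << SEQUENCE_BITS)
            | self.sequence) as i64
    }

    /// Generate an id from the system clock.
    fn next_id(&mut self) -> i64 {
        let now_ms = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_millis() as u64)
            .unwrap_or(0);
        self.next_id_at(now_ms)
    }
}
```

The id exists before any INSERT, so it can be attached to a span at creation time and reused unchanged across retries. The generator takes `&mut self`, so sharing it across request handlers would need a `Mutex` or a per-task instance.
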
- Multi-region: that's Phase G or later.
- Per-tenant sharding: orthogonal concern; Phase F is plain `execution_id` modulo.
- Dynamic re-sharding (adding shards on the fly): Phase F lands a fixed N; resizing is later.
- Umbrella: System Pool Design (#46) — interlocks with this umbrella.
- Umbrella: Container Tool Callback (#43) — orthogonal.
- Umbrella: Rust Worker Parity Gaps (#47, #48) — orthogonal.
- ADR: System Worker Pool and WASM Plug-in Surface — covers the data access boundary that shapes Phase C.
- noetl-server wiki: Runtime shape — implementation-level companion.
- Data Access Boundary — the rule that shapes Phase C.