Umbrella Orchestrator Scaling
Issue: noetl/ai-meta#101
Status: In progress — block a + b landed in noetl/server v3.9.0; block-b
stage 2 (references-in-state + frame pruning) + the GKE-tier benchmark still ahead.
Primary repo: noetl/server (+ noetl/worker)
See also: Design note — CQRS event log + batched projection (the write-path scaling direction).
⤴ Reframed 2026-06-19 by #115 (RFC: Decoupled Context + Event Chain). #115 names the root cause that this umbrella's incremental-state + results-by-reference work bounded but did not remove: orchestrator context grows unbounded because every step passes the entire accumulated context to the next (17.4MB drive-state / 1.32MB command observed). It inverts #101's "resolve over-budget refs back into a rebuilt WorkflowState" to "never put data in state, never scan noetl.event to build state": the schema carries references only, state is reconstructed by walking a one-level event chain (no event-table scan), and the walk-and-cache moves off-server onto a system-pool system/state_builder. #101's remaining consume side (worker selective render-time ref-resolution) becomes #115 Phase 1, the immediate unblock for the 3 stuck #114 fixtures + the off-server-drive prod cutover. The incremental OrchStateCache here is superseded by #115's immutable-chain cache.
noetl/server#197 → v3.9.0 (1760c19):
projection-snapshot bounded rebuild (memory flat — 167KB snapshot at 200k
events, was OOM at ~19k), throttled consistency COUNT (O(events) per-trigger
COUNT off the hot path), a background reconcile poller that force-advances
every active execution every 8s so a missed non-triggering straggler can't
permanently deadlock the cursor, results-by-reference resolution, and the
GET /api/executions/{id} memory-bomb fix (was loading all events).
Validated: kind 10×1000 (flat memory, 0 OOMs across 200k events); GKE
db-g1-small + PgBouncer 10×200 (cleared the prior fetch_vital_signs deadlock,
poller observed advancing a stuck execution, 0 fails / 0 restarts, Cloud SQL
~15 backends). Connection topology: server keeps its own small bounded noetl
pool; workers multiplex demo_noetl through a separate PgBouncer pool.
Make the Rust orchestrator hold up under high cursor concurrency and large tool
results. Two coupled problems surfaced while validating test_pft_flow_v2 at scale
on kind and were fixed together.
trigger_orchestrator reloaded and replayed the whole noetl.event log on
every completion — O(n) per trigger, O(n²) over a run. Under high cursor
concurrency the simultaneous full loads spiked memory and OOM-killed the server
(1Gi limit).
A per-execution OrchStateCache (src/state.rs) now holds the reconstructed
WorkflowState and advances it by applying only events newer than
last_event_id, behind a per-execution lock — one execution's completion
triggers serialise; different executions never contend. A count mismatch (a late
straggler event) falls back to a full rebuild for correctness; terminal
executions are evicted to free memory. Cursor frame tracking moved into
StepInfo (CursorFrame) so it survives incremental application instead of
being reconstructed by a full scan each trigger.
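The incremental advance described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in src/state.rs: `EventLog`, `ExecState`, `StateCache`, and `Advance` are hypothetical names, the log is an in-memory vector standing in for noetl.event, and event ids are assumed to start at 1.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// One row of the event log (hypothetical shape for this sketch).
#[derive(Clone, Debug)]
pub struct Event {
    pub event_id: u64,
    pub execution_id: u64,
    pub step: String,
}

/// In-memory stand-in for the noetl.event table.
#[derive(Default)]
pub struct EventLog {
    pub rows: Vec<Event>,
}

impl EventLog {
    pub fn push(&mut self, event_id: u64, execution_id: u64, step: &str) {
        self.rows.push(Event { event_id, execution_id, step: step.to_string() });
    }

    /// Events of one execution with an id above `after`, ordered by id.
    pub fn events_after(&self, execution_id: u64, after: u64) -> Vec<Event> {
        let mut out: Vec<Event> = self
            .rows
            .iter()
            .filter(|e| e.execution_id == execution_id && e.event_id > after)
            .cloned()
            .collect();
        out.sort_by_key(|e| e.event_id);
        out
    }

    pub fn count(&self, execution_id: u64) -> usize {
        self.rows.iter().filter(|e| e.execution_id == execution_id).count()
    }
}

/// Stand-in for the reconstructed WorkflowState.
#[derive(Default, Debug)]
pub struct ExecState {
    pub last_event_id: u64,
    pub applied: usize,
    pub steps: Vec<String>,
}

impl ExecState {
    fn apply(&mut self, ev: &Event) {
        self.steps.push(ev.step.clone());
        self.last_event_id = ev.event_id;
        self.applied += 1;
    }
}

#[derive(Debug, PartialEq)]
pub enum Advance {
    Incremental(usize),
    FullRebuild,
}

#[derive(Default)]
pub struct StateCache {
    entries: Mutex<HashMap<u64, Arc<Mutex<ExecState>>>>,
}

impl StateCache {
    pub fn advance(&self, execution_id: u64, log: &EventLog) -> Advance {
        // Hold the map lock only long enough to fetch this execution's slot,
        // so different executions do not contend on each other's state.
        let slot = self.entries.lock().unwrap().entry(execution_id).or_default().clone();
        let mut state = slot.lock().unwrap();
        let newer = log.events_after(execution_id, state.last_event_id);
        if state.applied + newer.len() != log.count(execution_id) {
            // Count mismatch: a straggler landed below the cursor, so replay everything.
            let mut fresh = ExecState::default();
            for ev in log.events_after(execution_id, 0) {
                fresh.apply(&ev);
            }
            *state = fresh;
            return Advance::FullRebuild;
        }
        for ev in &newer {
            state.apply(ev);
        }
        Advance::Incremental(newer.len())
    }

    /// Drop the cached state of a terminal execution.
    pub fn evict(&self, execution_id: u64) {
        self.entries.lock().unwrap().remove(&execution_id);
    }
}
```

The count comparison is what catches the straggler: an event whose id falls below `last_event_id` never shows up in the "newer" slice, so the only visible symptom is that applied + newer no longer equals the total.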
The worker stages tool results over its inline budget in the durable result
store and emits a {data:{_ref}} placeholder + a sibling reference block,
nested under the call.done envelope at result.context.result.reference. The
orchestrator never resolved those when reading events to build state — so a
cursor claim that returned its rows by reference looked like it returned zero
rows → 0-row frames → the loop re-claimed the same pending rows forever
(observed: 20,844 events, work-queue stuck at pending=5). Any result over the
default 100 KB budget would have hit this in production.
hydrate_result_references (src/handlers/events.rs) resolves the reference
(both nested and top-level envelope shapes) from noetl.result_store and
splices the data in place before from_events / evaluate_state see the
events, so extract_user_data / the cursor drive / build_context read it like
an inline result.
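A simplified sketch of the hydrate-before-read step, and of why an unresolved reference looked like an empty claim. The real envelope is JSON and the real function lives in src/handlers/events.rs; `ResultPayload`, `CallDone`, `visible_rows`, and `hydrate` here are hypothetical names, and the result store is modeled as a plain map.

```rust
use std::collections::HashMap;

/// Result carried by a call.done event (simplified; the real envelope is JSON).
#[derive(Clone, Debug, PartialEq)]
pub enum ResultPayload {
    /// Rows small enough to travel inline.
    Inline(Vec<String>),
    /// Placeholder: the rows live in the result store under this id.
    Reference(String),
}

#[derive(Clone, Debug, PartialEq)]
pub struct CallDone {
    pub step: String,
    pub result: ResultPayload,
}

/// What a reader that ignores references would see.
pub fn visible_rows(ev: &CallDone) -> usize {
    match &ev.result {
        ResultPayload::Inline(rows) => rows.len(),
        ResultPayload::Reference(_) => 0,
    }
}

/// Replace every reference with the stored rows, in place.
/// Returns how many events were hydrated, or an error naming the first missing id.
pub fn hydrate(
    events: &mut [CallDone],
    store: &HashMap<String, Vec<String>>,
) -> Result<usize, String> {
    let mut hydrated = 0;
    for ev in events.iter_mut() {
        if let ResultPayload::Reference(id) = &ev.result {
            let rows = store
                .get(id)
                .ok_or_else(|| format!("missing result {}", id))?
                .clone();
            ev.result = ResultPayload::Inline(rows);
            hydrated += 1;
        }
    }
    Ok(hydrated)
}
```

Running `hydrate` before any state-building code sees the events keeps the downstream readers unchanged: they only ever encounter the inline shape.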
The worker side (worker#89) makes the
inline budget env-configurable (NOETL_EVENT_RESULT_CONTEXT_MAX_BYTES, default
100 KB) so ops can tune it and tests can force the reference path.
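Reading the budget from the environment might look like the sketch below. Only the variable name and the 100 KB default come from this page; the helper names, the reading of 100 KB as 100 * 1024 bytes, and the fallback to the default on an unparsable or zero value are assumptions.

```rust
use std::env;

/// Assumption: the documented 100 KB default is read as 100 * 1024 bytes.
pub const DEFAULT_INLINE_BUDGET_BYTES: usize = 100 * 1024;

/// Parse a raw setting; fall back to the default when it is absent,
/// unparsable, or zero (the fallback policy is an assumption of this sketch).
pub fn parse_budget(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .filter(|n| *n > 0)
        .unwrap_or(DEFAULT_INLINE_BUDGET_BYTES)
}

/// Read the budget from the worker's environment.
pub fn inline_budget_from_env() -> usize {
    parse_budget(env::var("NOETL_EVENT_RESULT_CONTEXT_MAX_BYTES").ok().as_deref())
}

/// A result is staged by reference only when it is over the budget.
pub fn must_stage(serialized_len: usize, budget: usize) -> bool {
    serialized_len > budget
}
```

A test that wants every result to take the reference path can set the variable to a very small value, as the 2026-06-15 validation did with budget=256.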
Note: the table actually written is noetl.result_store (columns result_id, name, scope, source_step, data, …). The richer noetl.result_ref schema (extracted, preview, store tiers) exists in DDL but is currently unused/empty, which is relevant to the follow-up below.
| Date | What | Pointers |
|---|---|---|
| 2026-06-19 | 🐞 #114: oversized command.issued offload shipped (server v3.29.5); refs_in_state consume side (#101) is the remaining off-server-drive cutover blocker. Under the publish-only gate the off-server drive (refs_in_state=false) embeds the full resolved upstream context into the next step's command, so its command.issued event reached ~1.32MB > NATS max_payload (1MB) → publish ack-timeout → step.enter persisted, command never issued, wedge. Fix (server#242 → v3.29.5): a context over NOETL_COMMAND_CONTEXT_MAX_BYTES (512KB) is offloaded to noetl.result_store with a {__context_ref__} marker; get_command/claim_command resolve it before the worker sees the command (metrics context_offloaded/context_ref_resolved). apply_event reads only command.issued meta → rebuild/sole-writer/idempotency hold; within-budget commands unchanged. Rig phase 8 (e2e#64). Kind gate-ON: off-server rig PASS (new test_oversize_command_context COMPLETED, max command.issued ctx 585B, offload+resolve fired, 0 __orchestrate__ event rows, lag 0); all command.issued <1MB; 6 of #113's 9 fixtures COMPLETE. Chose ref-on-oversize over refs_in_state=true (candidate #1): a kind experiment proved the latter fixes the state-bloat (lease_expiry COMPLETES, drive-state 29KB) but breaks the bulk-consuming fixtures (storage_tiers/output_select fail at the bulk step because worker render-time ref-resolution is not implemented), so the default stays false. Remaining 3 fixtures + the off-server-drive cutover (#107/#111) now hinge on the refs_in_state consume side (#101): __orchestrate__ drive-state bloat (17.4MB for storage_tiers) + the _ref/bulk-resolve gap. No prod default flipped. | server#242 · e2e#64 · #114 · #101 |
| 2026-06-19 | 🚀 GKE pre-flip PREP: prod images pushed, GMP monitoring LIVE, roll-forward manifests staged. NO traffic flip, NO PUBLISH_ONLY. Read-only verification reconciled the prep brief against the live prod cluster: prod already runs the full Rust stack (the #49 cutover is done; noetl Service already selects app=noetl-server-rust; live images are pre-#103 batch-dispatch-v1/cursor-100), both flip secrets already exist (NOETL_ENCRYPTION_KEY + noetl-internal-api-token), and prod monitoring is Google Managed Prometheus, not VictoriaMetrics. Built + pushed the post-#103 images to the prod AR (server v3.29.3 @sha256:6d2de32… + worker v5.35.0, linux/amd64). Applied + verified GMP monitoring (ci/manifests/noetl/gmp/): PodMonitoring for the worker pools + server /metrics (the noetl namespace had none; GMP ignores prometheus.io/scrape annotations, so app metrics weren't scraped at all) + materializer-lag Rules (identical PromQL/thresholds to the kind VMRule); proven live via the Managed Prometheus query API (up{namespace="noetl"}=4 series, worker+server metrics flowing). Staged the roll-forward manifests (server→v3.29.3 PUBLISH_ONLY=false; system-pool→v5.35.0 MATERIALIZER_ENABLED=false) in a PR, not applied (they roll live workloads). Runbook gained a "Production (GKE)" section (GMP not VM, secrets present, image-roll prerequisite, exact operator sequence) + a GMP managedAlertmanager pager stub. Operator-gated remainder (surfaced, not done): roll the live images, enable the materializer shadow, wire the pager, then flip, one revert away. No prod default changed. | ops PR · #103 · #49 |
| 2026-06-19 | 🛡️ Materializer-lag GUARDRAIL shipped: the pre-flip observability gate for PUBLISH_ONLY. The server was FLIP-READY; the one remaining operator gate was a materializer-lag metric + alert. Worker #116 → v5.35.0 extends the JetStream lag poller to track the noetl_events/noetl_materializer consumer on an independent task → noetl_worker_nats_consumer_pending{consumer="noetl_materializer"} (+_ack_pending) climbs even when the materializer loop is stalled/dead (it can't report its own lag). consumer_lag_for(stream,consumer) reuses the same JetStream connection + the existing gauge (no new metric). Ops #195+#196: VMRule (backlog warning>200/critical>2000/growing + stall-under-gate [guarded on backlog>0] + project-errors + absent-under-gate), worker /metrics VMServiceScrape (was unscraped), VMAlert enabled, Grafana dashboard, flip runbook noetl-cqrs-publish-only-flip.md (pre-flip green-baseline check + one-command revert). Kind-proven full cycle on the VM stack: green baseline (backlog 0, published==projected==acked) → induced lag (materializer fault-injected, events publishing under the gate) → gauge climbs 0→684 via the independent poller, alerts fire (backlog warning+critical + stall) → recover → drains→0 idempotently (0 dup/loss), alerts clear. Flip-readiness now includes the monitoring gate. Default-off. | worker#116 · ops#195 · ops#196 · #103 |
| 2026-06-19 | 🎯 CQRS server cutover COMPLETE: FLIP-READY. The 2 ExecutionService cancel/finalize sites now route through the emit_event chokepoint (server#240 → v3.29.3, the third and final flip blocker). cancel (playbook_cancelled) + finalize (playbook_completed/playbook_failed) were the last synchronous server noetl.event writers under the gate. ExecutionService now carries AppState; resolve_catalog_id falls back noetl.event→noetl.command (mirrors #236, since noetl.event is empty under the gate for a fresh exec); require_state guards the pool-less test shim. Kind-proven both modes: gate-off cancel/finalize INSERT synchronously, byte-identical columns (error preserved), published delta +0, natural completion still COMPLETED; gate-on noetl_event_ingest_published_total{playbook_cancelled}=1/{playbook_failed}=1 (PUBLISHED, not inserted), materializer sole writer, both execs reach correct terminal state, rows==distinct, 0 catalog_id=0, no loss/dup. Dual-mode e2e rig kind_validate_cancel_finalize_gate.sh (e2e#62). No remaining synchronous server noetl.event writers under the gate. All three flip blockers closed → flipping PUBLISH_ONLY on is a staged operator decision. Default-off; no prod default changed. | server#240 · e2e#62 · #103 |
| 2026-06-19 | CQRS materializer ack-after-materialize durability RESOLVED + fault-tested (the last correctness item before the PUBLISH_ONLY flip). Deferred (ack-after-processing) ack in the subscription SourceClient/NATS source: AckMode::Defer surfaces the $JS.ACK reply subject as a durable handle, SourceClient::ack(ids, Ack/Nack/Term) disposes it later (tools#71). An in-process worker materializer consume-loop (worker#115, NOETL_MATERIALIZER_ENABLED, default off) drains noetl_events with deferred ack, POSTs events/project, and acks only on 2xx; on failure the batch redelivers (no loss). Chosen over playbook deferred-ack (step model can't hold an ack handle across pods/steps). System-pool wiring ops#194. Kind fault-injection (gate-on, sole writer): happy drained==projected==acked zero-dup; fault before ack → redeliver → materialize, loss=0, idempotent. Default-off; pointer bumps staged on the tools→worker crate-publish cascade. | tools#71 · worker#115 · ops#194 · #103 |
| 2026-06-18 | CQRS 2d-3 sole-writer cutover implemented, default-off. emit_event chokepoint + NOETL_EVENT_INGEST_PUBLISH_ONLY gate (server#235): 13 producer sites PUBLISH to noetl_events instead of INSERT under the gate → materializer is the sole writer; trigger relocates to events_project (read-your-writes); system/* drainers exempt (write synchronously, else deadlock). Kind: a gate-on exec writes 0 noetl.event rows (server no longer writes the log); gate-off byte-identical (25/25 regression + off-server e2e PASS). End-to-end single-exec completion needs a clean-cluster soak (shared kind saturated by accumulated test execs). 557 tests + clippy green. Operator-gated; no prod default flipped. | server#235 · #103 |
| 2026-06-18 | CQRS 2d shadow validated on kind; materializer proven sole-writer-capable; 2d-3 cutover designed + staged. The system/event_materializer + system/projector system playbooks land (ops#192, built on the in-flight feat/cqrs-2d1-materializer-playbook branch). Live on kind (tailer on): materializer reproduces the log byte-identically + idempotently: events/project of 25 real off-server-orchestrate rows → {projected:0,duplicates:25} (zero double-writes), a full playbook cycle → {projected:0,duplicates:20}; off-server orchestrate e2e green with the tailer on. Fixed a stall (batch: 500→25; drains over the 100 KiB inline budget stage to a _ref, breaking {{ drain_events.count }}). 2d-3 cutover (sole writer) mapped to a server-wide ~18-site event-write chokepoint refactor + default-off NOETL_EVENT_INGEST_PUBLISH_ONLY gate + orchestrator-trigger relocation (read-your-writes): designed, operator-gated, not flipped. | ops#192 · #103 |
| 2026-06-16 | CQRS 2d-2 (server) DONE + live-validated. Shared normalize_event_to_row (extracted from handle_event_inner) + POST /api/internal/events/materialize (normalizes native producer events → idempotent batch-insert noetl.event, byte-identical to the synchronous path): the foundation for the 2d-3 cutover. Plus a tailer fix: pre-skip events over NATS max_payload instead of wedging the cursor (caught live: orchestrator command.issued events are 5.4MB, the cursor fan-out's rendered context; flagged for 2d-3 + as a write-path cost). PFT 217 events green; 669 tests. | server#206 · #103 |
| 2026-06-16 | CQRS step 2, phase 2b-2 (playbook) PR open: the system/projector. Bounded drain of noetl_events (tool: subscription, ack on_success) → aggregate distinct execution_ids → POST /api/internal/projection/advance; never touches noetl.* directly, routed to the system worker pool, driven by a CronJob that loops bounded drains ~55s/run (KEDA-on-pending-count noted as the production keep-up path). Server side extended to ensure the durable noetl_projector pull consumer on the stream. The full CQRS read path now exists end-to-end (gated off): worker/server → noetl.event → tailer → noetl_events → projector playbook → /projection/advance → projection_snapshot → orchestrator reads it. | ops#189 · server#204 · #103 |
| 2026-06-16 | CQRS step 2, phase 2b-1 (server) PR open: projector owns projection_snapshot. Chosen path (vs a shadow noetl.projection): the system/projector owns the snapshot the orchestrator reads. POST /api/internal/projection/advance recomputes + saves each execution's snapshot via the block-b bounded-rebuild machinery (rebuild_state + orch_snapshot::save, no dispatch, idempotent); NOETL_PROJECTOR_OWNS_SNAPSHOT (default off) makes the orchestrator stop self-writing the snapshot and only read it. Default off preserves block-b/the OOM fix exactly; ownership transfer is a deliberate reversible flag, not a code-coupled atomic change. Stacked on the 2a tailer #202. 669 tests + clippy green. | server#204 · server#203 · #103 |
| 2026-06-16 | CQRS step 2, phase 2a (producer) PR open (reworked from a DB trigger after review). A background tailer reads committed noetl.event rows by a persisted cursor (noetl.stream_cursor) and batch-publishes them onto a noetl_events JetStream stream (dedup by event_id) for the system/projector playbook (2b) to fold. Not a trigger (a trigger welds the producer to Postgres internals + doesn't survive a storage-type change) and not an in-process channel fed at the 17 insert sites (couples every emit path, loses in-flight on crash); the tailer reads committed rows so a restart re-scans an overlap the stream dedup collapses. Env-gated NOETL_EVENT_STREAM_ENABLED, default off, no-op without NATS. At the 2d cutover the worker publishes to the same stream and the tailer is deleted. 669 tests + clippy green; metrics noetl_event_stream_published_total + _cursor. | server#202 · server#200 · #103 |
| 2026-06-16 | (superseded) First 2a attempt used a noetl.event → noetl.outbox DB trigger (server#201); closed after review because the trigger couples the producer to Postgres internals + per-event granularity is wrong. Replaced by the tailer above. | server#201 (closed) |
| 2026-06-15 | Both PRs opened + kind-validated green. PFT test_pft_flow_v2 (1 facility × 5 patients, worker budget=256 forcing every result by reference) COMPLETED: 34 references / 9 steps stored+resolved, 292 events (vs 20,844 runaway pre-fix), all 25 work-queue rows done. server 666 tests pass, worker 19 pass, clippy clean. | server#197 · worker#89 |
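The #114 entry in the table above describes offloading an oversized command context and resolving it before the worker sees the command. A minimal sketch of that ref-on-oversize shape follows. It is not the server#242 implementation: `CommandContext`, `ResultStore`, the `ctx-N` id format, and the reading of 512KB as 512 * 1024 bytes are assumptions, and the two counters only mirror the metric names mentioned in that entry.

```rust
use std::collections::HashMap;

/// Assumption: the documented 512KB cap is read as 512 * 1024 bytes.
pub const DEFAULT_COMMAND_CONTEXT_MAX_BYTES: usize = 512 * 1024;

/// Context as carried on a command: inline, or a marker pointing at the store.
#[derive(Clone, Debug, PartialEq)]
pub enum CommandContext {
    Inline(String),
    ContextRef(String),
}

/// In-memory stand-in for noetl.result_store plus two counters.
#[derive(Default)]
pub struct ResultStore {
    rows: HashMap<String, String>,
    next_id: u64,
    pub context_offloaded: u64,
    pub context_ref_resolved: u64,
}

impl ResultStore {
    /// Keep within-budget contexts unchanged; stage larger ones and return a marker.
    pub fn offload_if_oversized(&mut self, context: String, max_bytes: usize) -> CommandContext {
        if context.len() <= max_bytes {
            return CommandContext::Inline(context);
        }
        self.next_id += 1;
        let id = format!("ctx-{}", self.next_id); // hypothetical id format
        self.rows.insert(id.clone(), context);
        self.context_offloaded += 1;
        CommandContext::ContextRef(id)
    }

    /// Resolve a marker back to the full context before handing the command out.
    pub fn resolve(&mut self, ctx: &CommandContext) -> Option<String> {
        match ctx {
            CommandContext::Inline(s) => Some(s.clone()),
            CommandContext::ContextRef(id) => {
                let found = self.rows.get(id).cloned();
                if found.is_some() {
                    self.context_ref_resolved += 1;
                }
                found
            }
        }
    }
}
```

Because the marker is tiny, the published event stays far below the transport's payload limit regardless of how large the staged context is.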
- ✅ CQRS PUBLISH_ONLY flip-readiness: DONE, incl. the monitoring gate. The server is FLIP-READY. All three flip blockers are closed: (a) ack-after-materialize durability (deferred ack + worker materializer loop, done + fault-tested), (b) off-server-drive × gate reconciliation (#104, server v3.29.2 cold_rebuild, proven), (c) the 2 ExecutionService cancel/finalize sites (server#240 → v3.29.3, kind-proven both modes). No remaining synchronous server noetl.event writers under the gate. ✅ Materializer-lag guardrail SHIPPED (worker #116 v5.35.0 lag gauge on an independent poller + ops #195/#196 VMRule + worker scrape + VMAlert + dashboard + flip runbook noetl-cqrs-publish-only-flip.md); kind-proven induce→fire→recover→clear. ✅ GKE pre-flip PREP landed (2026-06-19, staged): prod verified already-Rust (the #49 cutover is done; live images pre-#103), both flip secrets present, prod monitoring = Google Managed Prometheus (not VM). Pushed the post-#103 images to the prod AR (server v3.29.3 @sha256:6d2de32… + worker v5.35.0); applied + verified GMP monitoring (PodMonitoring + materializer-lag Rules, translated from the kind VMRule); staged the roll-forward manifests (not applied). The remaining steps are all operator-gated, in order: (1) roll the live system pool → v5.35.0 then the live server → v3.29.3; (2) enable the materializer shadow (NOETL_MATERIALIZER_ENABLED=true, gate still off) as the green-baseline check; (3) wire the GMP managedAlertmanager pager (templated stub in the runbook); (4) flip NOETL_EVENT_INGEST_PUBLISH_ONLY=true, one revert away. No prod default changed; the gate stays default-off until that staged operator flip.
- 🐞 Off-server-drive cutover (#107/#111): oversized-event class CLEARED (#114, server v3.29.5); now blocked on the refs_in_state consume side (#101). The #114 oversized-command.issued-event offload shipped (server#242 → v3.29.5): every command.issued event is < 1MB and 6 of #113's 9 large-context fixtures COMPLETE under the gate. The remaining 3 (test_output_select, test_storage_tiers, kind_playbook_lease_expiry) progress past the oversized-event wedge but hit DEEPER refs_in_state=false issues: the __orchestrate__ drive-state bloats (the drive command's WorkflowState reached 17.4MB for storage_tiers; ~1MB + a non-convergent loop for lease_expiry) and the _ref/bulk-resolve lazy-load gap (output_select). A kind experiment confirmed refs_in_state=true fixes the bloat (lease_expiry completes) but breaks the bulk-consuming fixtures because the worker render-time ref-resolution (the consume side) is not implemented. So the concrete next work is exactly #101's consume side: resolve noetl:// refs in render_context at the worker's tool-dispatch render time (+ cursor-claim handling) so over-budget results stay as refs in state. That unblocks the last 3 fixtures and the off-server-drive prod cutover. The #114 offload is an orthogonal safety cap that stays regardless.
- Merge server#197 + worker#89; bump ai-meta pointers; update server wiki (results-by-reference resolution behavior) at pointer-bump time.
- Follow-up (separate work, user-steered): extend the contract so event.result and command.context carry references only, never inline data, with an extracted predicate-fields block on the reference for round-trip-free when:/set: evaluation (the result_ref.extracted column was designed for this). Open design questions: predicate eval via extracted fields vs. full resolve; field selection via explicit output_select: vs. auto-extract. Part of the user's projection + reference + step-container / heterogeneous-runtime model.
- Umbrella: Cursor / Claim Loop Mode — the cursor loop these fixes scale.
- agents/rules/execution-model.md: "workers hydrate inputs from the shared cache" is the boundary results-by-reference resolution implements.
- noetl/server deployment-specification · noetl/worker deployment-specification (NOETL_EVENT_RESULT_CONTEXT_MAX_BYTES).
Captured 2026-06-15 after the GKE db-g1-small benchmark + the session-vs-transaction pool-mode test. The write path — not compute — is the scaling wall.
At the small tier the PFT runs ~1.5 items/s, and the cost is round-trip count,
not CPU. Per patient: several throttled API calls + a data store + ~5 noetl.event
writes (lifecycle), all synchronous round-trips through PgBouncer to a 1-vCPU
Cloud SQL. The noetl.event log is the write-hot bottleneck, and beyond a
point batching can't fix it — event volume (per-patient, per-API-call) is
intrinsic to the workload when per-call granularity is needed for replay/audit/
error-handling (the whole point of test_pft_flow_v2). A small synchronous
Postgres is the wrong tool for high-volume append.
- Playbook data (collected patient info → demo_noetl). Batchable: hold a frame's results in the shared cache / GCS, write as one multi-row INSERT.
- noetl.event log (platform event sourcing). The bottleneck: needs a log-optimized store, not bigger batches.
Split the write model (append-only event log) from the read model (state/projections), each in the store it's good at:
- Write side → NATS JetStream (already in the stack; no Kafka unless we outgrow it). Workers publish events here — durable, replicated, built for high-throughput sequential append. JetStream-ack is the commit point.
- Projector → consumes JetStream in batches, folds into noetl.projection/noetl.projection_snapshot (indexed, queryable) and archives cold events (Postgres archive table or GCS) for replay/audit at a relaxed cadence.
- Read side → the orchestrator reads the projection, never the raw event stream. v3.9.0's snapshot-cached WorkflowState (read bounded state, not full replay) is already the read-model half of this.
State + projections are not lost — they're derived by the projector.
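A sketch of the projector's batch fold, showing how a read model can be derived from the log and why re-delivering a batch is harmless. `Projection`, `FoldReport`, and the "latest event kind per execution" read model are illustrative assumptions; the report fields simply echo the {projected, duplicates} shape quoted in the timeline.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Clone, Debug)]
pub struct Event {
    pub event_id: u64,
    pub execution_id: u64,
    pub kind: String,
}

#[derive(Debug, Default, PartialEq)]
pub struct FoldReport {
    pub projected: usize,
    pub duplicates: usize,
}

/// Read model: latest event kind per execution, plus the ids already folded.
#[derive(Default)]
pub struct Projection {
    folded: HashSet<u64>,
    pub latest: HashMap<u64, String>,
}

impl Projection {
    /// Fold one batch (assumed to arrive in event_id order).
    /// Events seen before are counted as duplicates and skipped.
    pub fn fold_batch(&mut self, batch: &[Event]) -> FoldReport {
        let mut report = FoldReport::default();
        for ev in batch {
            if !self.folded.insert(ev.event_id) {
                report.duplicates += 1;
                continue;
            }
            self.latest.insert(ev.execution_id, ev.kind.clone());
            report.projected += 1;
        }
        report
    }
}
```

Folding the same batch twice reports every event as a duplicate the second time and leaves the read model unchanged, which is the property that makes redelivery safe.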
The truth never lives in the worker. A worker-local buffer is fine as staging, but the durable commit is:
- Worker batch-publishes its events to JetStream; acks the command only after JetStream confirms. Crash after publish → events survive in the stream.
- Crash before publish → the command was never acked → the command queue re-delivers it → the body re-runs, safely, because apply is idempotent (cursor cursor_issued/cursor_completed id-sets dedup).
So a step-instance runtime error costs a re-execution, never data.
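The two crash cases above can be walked through with a small model. This is a sketch under assumptions: `attempt`, `Crash`, and `Outcome` are hypothetical, the stream is a plain vector, and only the `cursor_completed` id-set is modeled (the `cursor_issued` set would dedup the same way).

```rust
use std::collections::HashSet;

/// Where a simulated worker crash happens during one attempt.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Crash {
    None,
    BeforePublish,
    AfterPublish,
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Acked,
    Redeliver,
}

/// One attempt at a command whose body completes `items`.
/// `stream` stands in for JetStream: whatever is pushed survives a crash.
pub fn attempt(items: &[u64], stream: &mut Vec<u64>, crash: Crash) -> Outcome {
    if crash == Crash::BeforePublish {
        return Outcome::Redeliver; // nothing published, command never acked
    }
    stream.extend_from_slice(items); // batch publish, confirmed by the stream
    if crash == Crash::AfterPublish {
        return Outcome::Redeliver; // events are durable, but the ack was lost
    }
    Outcome::Acked // ack the command only after the publish is confirmed
}

/// Cursor frame with an id-set, so applying the same completion twice is a no-op.
#[derive(Default)]
pub struct CursorFrame {
    pub cursor_completed: HashSet<u64>,
}

impl CursorFrame {
    /// Apply published ids; returns how many were new.
    pub fn apply(&mut self, published: &[u64]) -> usize {
        published
            .iter()
            .filter(|id| self.cursor_completed.insert(**id))
            .count()
    }
}
```

A crash after publish followed by a successful retry leaves each id in the stream twice, yet the frame still records it once, so the cost is a re-execution and never lost or double-counted work.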
- Eventual consistency — the projection lags the log by the projector's batch interval; the orchestrator must tolerate read-your-writes lag (or the projector commits the slice it just folded before evaluate).
- Operational: a projector to run + JetStream retention / cold-event archive policy. More moving parts, but the established CQRS shape, and NoETL already has the substrate (noetl.outbox, noetl.projection, system/projector).
- Step 1: cheap wins (no re-architecture). Frame-batch the cursor body (one bulk data write + one control-plane event-set per frame, not per patient); batch the server's noetl.event INSERTs (group-commit / POST /api/events/batch); a bigger Cloud SQL tier. Targets ~5–10× on the small tier. Scoped as the next concrete work.
- Step 2: CQRS event-log split (#103, in progress). Events → JetStream (write log) → batch projector → projection tables (read model) → orchestrator reads projection. The natural extension of block-b stage 2; where "the event log is the bottleneck" stops being true. Refined shape (per data-access-boundary.md): the projector is a system/* catalog playbook on the system worker pool, not bespoke Rust; it consumes the noetl_events JetStream stream in batches and folds the read model via the server internal API (/events/project, already shipped). Phased: 2a producer: a background event-log → JetStream tailer (not a DB trigger; storage-agnostic queue, batched), PR server#202, default off · 2b system/projector playbook (JetStream batch consumer → projection transaction) · 2c orchestrator reads projection · 2d cutover (worker publishes to the same stream, drop the synchronous noetl.event INSERT + the tailer).
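The phase 2a tailer (read committed rows past a persisted cursor, publish in batches, let stream-side dedup collapse any overlap after a restart) can be sketched as below. `Stream` and `Tailer` are hypothetical stand-ins: the stream is an in-memory vector with a seen-set keyed on event_id, and the persisted cursor is just a field.

```rust
use std::collections::HashSet;

/// Stand-in for the noetl_events stream with dedup keyed on event_id.
#[derive(Default)]
pub struct Stream {
    seen: HashSet<u64>,
    pub messages: Vec<u64>,
}

impl Stream {
    /// Returns false when the id was already published (duplicate collapsed).
    pub fn publish(&mut self, event_id: u64) -> bool {
        if !self.seen.insert(event_id) {
            return false;
        }
        self.messages.push(event_id);
        true
    }
}

/// Background tailer; `cursor` models the persisted position in the event log.
pub struct Tailer {
    pub cursor: u64,
    pub batch_size: usize,
}

impl Tailer {
    /// Publish the next batch of committed ids above the cursor.
    /// `committed` holds committed event ids in ascending order.
    /// Returns how many messages were new to the stream.
    pub fn tick(&mut self, committed: &[u64], stream: &mut Stream) -> usize {
        let batch: Vec<u64> = committed
            .iter()
            .copied()
            .filter(|id| *id > self.cursor)
            .take(self.batch_size)
            .collect();
        let mut fresh = 0;
        for id in &batch {
            if stream.publish(*id) {
                fresh += 1;
            }
        }
        // Advance the cursor only after the whole batch has been published.
        if let Some(last) = batch.last() {
            self.cursor = *last;
        }
        fresh
    }
}
```

Advancing the cursor after the publish means a crash in between re-scans an overlap on restart; the dedup on event_id is what turns that overlap into a no-op instead of duplicate events.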