Umbrella Container Tool Callback
ai-task: noetl/ai-meta#43 · Opened: 2026-06-02 · Last update: 2026-06-02 · Status: In flight (design conversation; no implementation) · Parent umbrella: Rust Worker Migration (specifically R-3 Phase C-2)
Design the callback pattern that lets the Rust worker dispatch a container tool kind as a Kubernetes Job and resume the playbook when the Job's container completes — without holding a worker slot for the duration of the container run.
This is the canonical instance of the Callback / hook rule from execution-model.md:
A block must not hold a worker slot waiting for an external operation that takes more than a few seconds.
A `container` tool kind is fundamentally different from `python`, `http`, `postgres`:
- Container startup is 5-30 seconds (image pull + scheduling).
- Container runtime can be seconds to hours (training jobs, long-running tasks).
- Holding a worker slot for that duration breaks the atomic-block model and starves real-time playbooks.
The fix: dispatch the K8s Job, release the worker slot
immediately, capture execution_id + step in the Job's annotations,
and arrange a callback that fires when the Job's container
exits.
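The dispatch step can be sketched as follows. This is a minimal sketch, not the worker's actual API: the annotation keys `noetl.io/execution-id` and `noetl.io/step`, the `DispatchContext` type, and both function names are hypothetical. It shows that everything needed to resume can travel on the Job itself, so the worker keeps no state once it releases its slot.

```rust
use std::collections::BTreeMap;

// Hypothetical annotation keys; the real names are an open design choice.
const ANN_EXECUTION_ID: &str = "noetl.io/execution-id";
const ANN_STEP: &str = "noetl.io/step";

/// What the worker knows at dispatch time (illustrative type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DispatchContext {
    pub execution_id: String,
    pub step: String,
}

/// Build the annotations to stamp on the Job's metadata before the worker
/// releases its slot.
pub fn job_annotations(ctx: &DispatchContext) -> BTreeMap<String, String> {
    let mut ann = BTreeMap::new();
    ann.insert(ANN_EXECUTION_ID.to_string(), ctx.execution_id.clone());
    ann.insert(ANN_STEP.to_string(), ctx.step.clone());
    ann
}

/// Recover the resume target from a finished Job's annotations.
/// Returns `None` when either annotation is missing, so a watcher can skip
/// Jobs that were not dispatched by a worker.
pub fn resume_target(ann: &BTreeMap<String, String>) -> Option<DispatchContext> {
    Some(DispatchContext {
        execution_id: ann.get(ANN_EXECUTION_ID)?.clone(),
        step: ann.get(ANN_STEP)?.clone(),
    })
}
```

Whichever component ends up watching Jobs would call something like `resume_target` on completion and ignore any Job for which it returns `None`.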
- Who watches the Job? Options:
  - A separate operator pod with `watch` access on Jobs in the cluster.
  - A sidecar in the worker that subscribes to a K8s informer.
  - The Job itself emits the completion event back to the server (e.g. via a `wait-and-post` init container or a postStop hook).
- What's the resume signal? The server's `/api/events` POST with the Job's outcome → orchestrator picks up and continues the playbook.
- How to handle Job failure modes? OOMKilled, image pull error, node lost, timeout. Each maps to a structured `call.done` status.
- Image source. Catalog stores image reference; how is it pinned / signed / cached?
- Resource requests + limits. Per-step config or per-tool defaults?
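The failure-mode question can be made concrete with a small classifier. This is a sketch under assumptions: the `CallDoneStatus` variants and their wire names are hypothetical, and the inputs are whatever exit code and reason string the watcher manages to extract. `OOMKilled`, `ErrImagePull`, `ImagePullBackOff`, and `DeadlineExceeded` are reason strings Kubernetes reports; `NodeLost` stands in for however the watcher ends up detecting a lost node.

```rust
/// Structured outcome carried in the `call.done` event (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CallDoneStatus {
    Succeeded,
    OutOfMemory,
    ImagePullFailed,
    NodeLost,
    TimedOut,
    Failed,
}

impl CallDoneStatus {
    /// Wire name for the status field (assumed snake_case convention).
    pub fn as_str(self) -> &'static str {
        match self {
            CallDoneStatus::Succeeded => "succeeded",
            CallDoneStatus::OutOfMemory => "out_of_memory",
            CallDoneStatus::ImagePullFailed => "image_pull_failed",
            CallDoneStatus::NodeLost => "node_lost",
            CallDoneStatus::TimedOut => "timed_out",
            CallDoneStatus::Failed => "failed",
        }
    }
}

/// Classify a finished (or stuck) Job from the container exit code and the
/// reason string observed by the watcher. A known reason wins over the exit
/// code; anything unrecognised with a non-zero or missing code is `Failed`.
pub fn classify(exit_code: Option<i32>, reason: Option<&str>) -> CallDoneStatus {
    match (exit_code, reason) {
        (_, Some("OOMKilled")) => CallDoneStatus::OutOfMemory,
        (_, Some("ErrImagePull")) | (_, Some("ImagePullBackOff")) => {
            CallDoneStatus::ImagePullFailed
        }
        (_, Some("DeadlineExceeded")) => CallDoneStatus::TimedOut,
        (_, Some("NodeLost")) => CallDoneStatus::NodeLost,
        (Some(0), _) => CallDoneStatus::Succeeded,
        _ => CallDoneStatus::Failed,
    }
}
```

Keeping the mapping in one pure function means the same table can be unit-tested without a cluster, regardless of which "who watches" option is chosen.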
| Date | Event |
|---|---|
| 2026-06-02 | Issue filed as part of R-3 phase C-2 (after #42 closed with the agent-tool-kind routing decision). |
| 2026-06-02 | No implementation yet; design conversation ongoing. Connects to Umbrella: Python Services to Rust — the container watcher is a candidate for the four-binary noetl/server crate layout. |
- Pick the "who watches" model from the options above. Recommended: a separate `noetl-container-watcher` Deployment that's part of the `noetl/server` crate (same image, `--mode=container-watcher` flag, similar to the four-binary shape proposed in Umbrella: Python Services to Rust).
- Sketch the catalog entry shape for a container tool kind:

      tool:
        kind: container
        image: gcr.io/my-project/long-running:v1.2.3
        command: ["./run.sh"]
        env: [...]
        resources:
          requests: { cpu: 500m, memory: 1Gi }
          limits: { cpu: 2, memory: 4Gi }
        timeout_seconds: 3600

- Define the event-shape contract for callback resume.
- Validate on a kind cluster with a trivial `sleep + echo` container playbook before tackling real workloads.
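As a starting point for the event-shape contract, one possible body for the resume POST to `/api/events` is sketched below. Every field name here (`event`, `execution_id`, `step`, `status`, `exit_code`) is an assumption for discussion, not an agreed schema. The JSON is built by hand only to keep the sketch free of dependencies; a real implementation would more likely derive serialization with a library such as serde.

```rust
/// Hypothetical body of the POST that resumes a playbook after a container
/// Job finishes. Field names are placeholders for the contract discussion.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CallDoneEvent {
    pub execution_id: String,
    pub step: String,
    pub status: String,
    /// `None` when the container never ran (e.g. image pull error).
    pub exit_code: Option<i32>,
}

/// Escape a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl CallDoneEvent {
    /// Serialize to a compact JSON object; a missing exit code becomes `null`.
    pub fn to_json(&self) -> String {
        let exit_code = match self.exit_code {
            Some(code) => code.to_string(),
            None => "null".to_string(),
        };
        format!(
            "{{\"event\":\"call.done\",\"execution_id\":\"{}\",\"step\":\"{}\",\"status\":\"{}\",\"exit_code\":{}}}",
            json_escape(&self.execution_id),
            json_escape(&self.step),
            json_escape(&self.status),
            exit_code
        )
    }
}
```

Making `exit_code` nullable is deliberate in this sketch: failure modes such as an image pull error never produce an exit code, and the orchestrator should be able to tell that apart from a container that ran and failed.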
- Umbrella: Rust Worker Migration — parent.
- Execution model rule — the design constraint.
- noetl-worker wiki: https://github.com/noetl/worker/wiki.
- noetl-tools wiki: https://github.com/noetl/tools/wiki.