Anvil v0.4.0 — Reliable runtime & resilient resume
Every agent call now rides one reliability engine — retry with backoff, model fallback, and a per-provider circuit breaker — and durable runs resume cleanly across process restarts, even across providers.
Reliability — one router for every agent call
LlmRouter.runAgentis now the single reliability engine for every agent spawn (CLI + dashboard), via a shared process-wide router: per-error-class retry with backoff + jitter, a per-provider circuit breaker, unified error classification, and a spend ledger. Replaces the ad-hocrunWithChainFallback.- Chain fallback that backs off. A fall-back-eligible failure (
rate_limit/server_5xx/timeout/model_unavailable) burns that model, backs off per error class, trips the breaker if it keeps failing, and walks to the next model in the chain. Terminal failures (auth/content_policy/invalid_request) surface immediately. - New
model_unavailableclass — a phantom model id or a credit-balance error now burns one chain rung instead of failing the whole stage. - Tunable via
~/.anvil/models.yaml→walker.retry/walker.circuit_breaker. - Failure observability — model burns surface in the dashboard activity feed, e.g.
minimax-m2.5: timeout (status 0) — backing off 0.5s before next model.
Durable cross-vendor resume
- Turn-level durable recording — each agent turn is recorded as sub-effects, so resuming a run replays prior turns (no fresh upstream calls) and continues from where it stopped — even on a different provider (prefill continuation).
- Resume actually works now. Fixed a
DeterminismViolationErrorthat aborted every resume at turn 0: the turn prompt embeds live-queried Knowledge-Base/memory content that isn't byte-stable across a process restart, so it's no longer hashed into a fatal determinism guard — the recorded transcript is authoritative on replay. - Bounded durable log — tool-result payloads are capped (64 KiB default;
ANVIL_DURABLE_MAX_TOOL_RESULT_BYTESto override) so large file/bash dumps don't block the event loop while persisting.
Dashboard
- Clarify Q&A feels instant — your answer renders immediately on submit (optimistic echo) instead of appearing bundled with the next question.
- Clarify no longer hangs after the first answer — the WebSocket send-channel is restored on reconnect.
- Faster pipeline start — provider liveness eagerly probes the selected model first and sweeps the rest in the background, removing the startup stall.
Internal / breaking
runWithChainFallback/isRetryableUpstreamErrorwere removed from@esankhan3/anvil-core-pipeline; the agentic chain walk now lives in@esankhan3/anvil-agent-coreasLlmRouter.runAgent. All eight model adapters are presented to the router uniformly via alegacyAdapterToLanguageModelshim. (Relevant only if you imported those symbols directly.)- New ADR:
docs/TURN-LEVEL-DURABLE-RESUME-ADR.md. Comment-hygiene convention added to every package'sCLAUDE.md.
Published packages
@esankhan3/anvil-agent-core@0.4.0@esankhan3/anvil-core-pipeline@0.4.0@anvil-dev/dashboard@0.4.0@esankhan3/anvil-cli@0.4.0
knowledge-core, memory-core, convention-core, and code-search-mcp stay at 0.3.0 (no functional changes this release).
Upgrade
npm install -g @esankhan3/anvil-cli@0.4.0Full Changelog: v0.3.0...v0.4.0