Release v0.4.0 · esanmohammad/Anvil

Anvil v0.4.0 — Reliable runtime & resilient resume

Every agent call now rides one reliability engine — retry with backoff, model fallback, and a per-provider circuit breaker — and durable runs resume cleanly across process restarts, even across providers.

Reliability — one router for every agent call

LlmRouter.runAgent is now the single reliability engine for every agent spawn (CLI + dashboard), via a shared process-wide router: per-error-class retry with backoff + jitter, a per-provider circuit breaker, unified error classification, and a spend ledger. Replaces the ad-hoc runWithChainFallback.
Chain fallback that backs off. A fall-back-eligible failure (rate_limit / server_5xx / timeout / model_unavailable) burns that model, backs off per error class, trips the breaker if it keeps failing, and walks to the next model in the chain. Terminal failures (auth / content_policy / invalid_request) surface immediately.
New model_unavailable class — a phantom model id or a credit-balance error now burns one chain rung instead of failing the whole stage.
Tunable via ~/.anvil/models.yaml → walker.retry / walker.circuit_breaker.
Failure observability — model burns surface in the dashboard activity feed, e.g. minimax-m2.5: timeout (status 0) — backing off 0.5s before next model.

Durable cross-vendor resume

Turn-level durable recording — each agent turn is recorded as sub-effects, so resuming a run replays prior turns (no fresh upstream calls) and continues from where it stopped — even on a different provider (prefill continuation).
Resume actually works now. Fixed a DeterminismViolationError that aborted every resume at turn 0: the turn prompt embeds live-queried Knowledge-Base/memory content that isn't byte-stable across a process restart, so it's no longer hashed into a fatal determinism guard — the recorded transcript is authoritative on replay.
Bounded durable log — tool-result payloads are capped (64 KiB default; ANVIL_DURABLE_MAX_TOOL_RESULT_BYTES to override) so large file/bash dumps don't block the event loop while persisting.

Dashboard

Clarify Q&A feels instant — your answer renders immediately on submit (optimistic echo) instead of appearing bundled with the next question.
Clarify no longer hangs after the first answer — the WebSocket send-channel is restored on reconnect.
Faster pipeline start — provider liveness eagerly probes the selected model first and sweeps the rest in the background, removing the startup stall.

Internal / breaking

runWithChainFallback / isRetryableUpstreamError were removed from @esankhan3/anvil-core-pipeline; the agentic chain walk now lives in @esankhan3/anvil-agent-core as LlmRouter.runAgent. All eight model adapters are presented to the router uniformly via a legacyAdapterToLanguageModel shim. (Relevant only if you imported those symbols directly.)
New ADR: docs/TURN-LEVEL-DURABLE-RESUME-ADR.md. Comment-hygiene convention added to every package's CLAUDE.md.

Published packages

@esankhan3/anvil-agent-core@0.4.0
@esankhan3/anvil-core-pipeline@0.4.0
@anvil-dev/dashboard@0.4.0
@esankhan3/anvil-cli@0.4.0

knowledge-core, memory-core, convention-core, and code-search-mcp stay at 0.3.0 (no functional changes this release).

Upgrade

npm install -g @esankhan3/anvil-cli@0.4.0

Full Changelog: v0.3.0...v0.4.0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.4.0

Choose a tag to compare

Sorry, something went wrong.