v0.9.5

github-actions released this 06 Mar 22:06

ea076fb

Changed

Behavior soak hardening: scripts/run-agent-behavior-soak.py now includes regression checks for filesystem capability truthfulness, subagent capability response quality, and affirmative continuation quality, with rubric updates to score substantive outcomes over brittle phrase matching.
Roadmap/release traceability: docs/releases/v0.9.5.md and docs/ROADMAP.md updated with current v0.9.5 prep status for speculative execution, browser runtime support, CLI skill roadmap slice, and behavior continuity validation.
Architecture documentation: Added explicit v0.9.5-prep control/dataflow coverage for deterministic execution shortcuts and guarded response sanitization in docs/architecture/ironclad-dataflow.md and docs/architecture/ironclad-sequences.md.
Browser runtime continuity: Browser action execution now attempts a single stop/start session recovery when CDP disconnect/closed-socket errors are detected, limited to idempotent actions to avoid duplicate side effects on replay.
Autonomy turn-budget controls: Added configurable agent-level ReAct budget controls (autonomy_max_react_turns, autonomy_max_turn_duration_seconds) and wired enforcement into the runtime loop.
CLI adapter response contract: run_script now emits stable typed metadata (adapter, schema_version, status, error_class) and normalized script error classes for downstream handling.
Speculative policy invariants: Added explicit test coverage enforcing Safe-only speculative eligibility (Caution/Dangerous/Forbidden remain excluded from speculative execution).
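
The Safe-only eligibility rule in the last item can be illustrated with a minimal sketch. The tier names (Safe, Caution, Dangerous, Forbidden) come from the note above; the `RiskLevel` enum, `is_speculation_eligible`, and `check_invariant` are hypothetical names chosen for illustration and are not the project's actual types or tests.

```python
from enum import Enum


class RiskLevel(Enum):
    # Tier names are taken from the release note; this enum is hypothetical.
    SAFE = "Safe"
    CAUTION = "Caution"
    DANGEROUS = "Dangerous"
    FORBIDDEN = "Forbidden"


def is_speculation_eligible(level: RiskLevel) -> bool:
    """Allow speculative execution only for the Safe tier."""
    return level is RiskLevel.SAFE


# Expected eligibility is spelled out per tier, so the check below does not
# simply restate the implementation.
EXPECTED_ELIGIBILITY = {
    RiskLevel.SAFE: True,
    RiskLevel.CAUTION: False,
    RiskLevel.DANGEROUS: False,
    RiskLevel.FORBIDDEN: False,
}


def check_invariant() -> None:
    """Fail if a tier is unclassified or a non-Safe tier becomes eligible."""
    # A newly added tier must be classified explicitly before this passes.
    assert set(EXPECTED_ELIGIBILITY) == set(RiskLevel)
    for level, expected in EXPECTED_ELIGIBILITY.items():
        assert is_speculation_eligible(level) == expected, level


check_invariant()
```

Keeping the expected table separate from the predicate means that adding a fifth tier fails the check until someone decides whether it may be speculated on.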

Fixed

Internal protocol fallback leakage: response sanitization no longer surfaces protocol-placeholder fallback text; empty/degraded sanitized content now resolves through deterministic user-facing quality fallback.
Markdown count execution reliability: the execution shortcut path now handles recursive markdown-file count prompts deterministically, including strict numeric-only responses when the prompt asks for them (for example, "count only" or "only the number").
Delegation shortcut boundary: markdown-count shortcut no longer hijacks explicitly delegated prompts, preserving delegation intent handling.
Speculative branch cleanup safety: introduced RAII speculation slot guards and abort-path tests to guarantee no slot leakage when speculative tasks are canceled.
CLI skill sandbox isolation coverage: added explicit tests that secret env vars are stripped while only allowlisted runtime vars are propagated under skills.sandbox_env=true.
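
The sandbox isolation behavior in the last item could be sketched roughly as follows. Only the skills.sandbox_env setting is taken from the note; the `RUNTIME_ALLOWLIST` contents, the `build_sandbox_env` helper, and the pass-through behavior when the setting is off are assumptions made for illustration.

```python
from typing import Dict, Mapping

# Hypothetical allowlist; the real set of propagated runtime vars is not
# given in the release note.
RUNTIME_ALLOWLIST = frozenset({"PATH", "HOME", "LANG"})


def build_sandbox_env(parent_env: Mapping[str, str], sandbox_env: bool) -> Dict[str, str]:
    """Return the environment a skill script would be launched with.

    With sandbox_env enabled, only allowlisted variables survive, so secrets
    are dropped because they are not allowlisted rather than by name matching.
    With it disabled, this sketch assumes the parent environment is passed
    through unchanged.
    """
    if not sandbox_env:
        return dict(parent_env)
    return {k: v for k, v in parent_env.items() if k in RUNTIME_ALLOWLIST}


parent = {
    "PATH": "/usr/bin",
    "HOME": "/home/agent",
    "EXAMPLE_API_KEY": "not-a-real-key",
    "EXAMPLE_SECRET_TOKEN": "not-a-real-token",
}
child = build_sandbox_env(parent, sandbox_env=True)
```

An allowlist fails closed: a secret stored under an unexpected variable name is still stripped, which a denylist of known secret names would not guarantee.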
