Skip to content

Agentao 0.4.7

Choose a tag to compare

@jin-bo jin-bo released this 17 May 02:19
· 64 commits to main since this release
6611f70

Agentao 0.4.7

A diagnostics + hook decisions + observability release on top of 0.4.6.
No public Python API or wire-format change — all telemetry additions are
optional fields on existing transport/replay events, and the new CLI surface
is two non-interactive subcommands that never instantiate an agent. The
PreToolUse plugin hook gains decision capability but reuses the existing
PermissionDecisionEvent and confirmation path; no host-schema bump.
Everything upgrades in place via pip install -U agentao.

The headline:

  • agentao doctor + agentao config validate — two new
    non-interactive subcommands that aggregate or validate the harness's
    existing signals. --json is the contract surface for CI/hosts; both are
    read-only (probing an absent memory.db reports "absent" instead of
    bootstrapping it) and reject unknown flags with exit 2. Output:
    {"ok": bool, "sections": {...}, "findings": [...]}. Errors exit 1,
    warnings keep exit 0.
  • PreToolUse plugin hooks become decision-capable. A hook returning
    hookSpecificOutput.permissionDecision: "deny" cancels the tool call;
    "ask" flips the plan to the existing confirmation path; "allow" is a
    no-op. The dispatch moved to Phase 1.5 of ToolRunner so the
    PermissionDecisionEvent precedes any tool started event — the host
    ordering contract holds without after-the-fact cancelled events.
  • Telemetry P1 increment. LLM_CALL_COMPLETED carries
    model_latency_ms (stable alias of duration_ms) and first_token_ms
    (TTFT). TURN_END carries tool_count. All optional, all backwards-
    compatible — no schema bump.
  • /compact manual-compaction REPL command. Triggers
    compress_messages(is_auto=False) on demand instead of waiting for the
    auto-compaction threshold.
  • AGENTAO_WEB_FETCH_FALLBACK opt-in for JS-rendering fallback in
    web_fetch. Previous behavior auto-routed through crawl4ai whenever it
    was installed; that broke the audit contract (the confirmation prompt
    showed a different URL than the actual outbound request). Now: explicit
    none (default) / jina / crawl4ai.

Why this release

Two threads landed in the 2026-05-09 → 2026-05-16 window:

  1. The Codex reverse-review pull
    (docs/design/codex-reverse-review.md). A reverse review of the
    2026-05-12 / 2026-05-17 Codex changes against Agentao's current
    architecture. The conclusion was that most of Codex's apparent
    architectural work is driven by Codex-specific scale (app-server, remote
    environments, multiple workspaces, Windows enterprise sandboxing, broader
    extension surface) and Agentao should not copy that weight. The narrow,
    actionable scope:

    • P0 (resolved 2026-05-12): make PreToolUse hooks minimally
      decision-capable — close the existing fire-and-forget hole without
      introducing a new hook policy subsystem.
    • P1 (resolved 2026-05-12): a small telemetry increment
      (model_latency_ms, first_token_ms, tool_count, stable
      compaction-duration placement) — fields hosts need to debug
      integrations without a new SQLite telemetry subsystem.
    • P0 follow-up (resolved 2026-05-16): agentao doctor and
      agentao config validate — two small wiring PRs over existing
      signals, not a new app-server diagnostics subsystem.
  2. Audit-surface tightening for web_fetch. A [full] install used to
    silently rewire web_fetch to route through crawl4ai. That violated
    the principle that the confirmation prompt and host-visible URL must
    match the actual outbound request (it's how an embedded host audits the
    agent's network activity). 0.4.7 makes the fallback explicit via
    AGENTAO_WEB_FETCH_FALLBACK and surfaces the actual destination on
    every fallback result.

The five rounds of /codex:review validation on the diagnostics CLI also
caught real validation gaps that ship in this release as part of
config validate: MCP per-server shape checking, MCP nested
env/headers/args string validation, user/project MCP name-collision
warnings, replay shape validation, and rejection of max_instances ≤ 0
that from_mapping silently coerces away.

agentao doctor — health snapshot

agentao doctor
agentao doctor --json

Aggregates every health signal the harness already produces into one
operator-facing report — without instantiating an agent.

Sections in the JSON output:

  • settings.agentao/settings.json shape and parse status (path,
    keys, status)
  • providerLLM_PROVIDER, API-key presence (boolean, never the
    value), model, base URL; parse status of LLM_TEMPERATURE /
    LLM_MAX_TOKENS when set
  • permissions — user-scope rule count, loaded sources; warns when a
    stray project-scope permissions.json exists (ignored at runtime)
  • mcp — server counts per scope, plus shape validation (non-object
    entries, non-string env/headers/args) and user/project name-
    collision warnings
  • replay — effective ReplayConfig after coercion; also flags raw
    values that from_mapping would silently swallow
  • acp_schemahost events + ACP schema export succeeds (regression
    signal for the contract-snapshot path)
  • memory — project + user SQLite memory stores opened read-only
    (file:<path>?mode=ro&uri=true); reports "absent" instead of
    bootstrapping a missing DB
  • plugins — full plugin diagnostics via the new
    collect_full_plugin_diagnostics() helper (runs the same post-load
    registration simulation that agentao plugin list uses, so the two
    cannot drift on which plugins they consider failed)
  • optional_depsfind_spec probe of rich, prompt_toolkit,
    readchar, mcp, openai, httpx

Output contract:

  • {"ok": bool, "sections": {...}, "findings": [...]}
  • ok=false iff ≥1 finding has "level": "error"; that case exits 1
  • Warnings keep ok=true and exit 0 (so a missing API key on a fresh
    clone doesn't trip CI)
  • Each finding carries level ("info" / "warning" / "error"),
    area, message, and source (path or env-var label)
  • Unknown flags fail with exit 2 (mirrors agentao run), so a CI typo
    surfaces immediately

Redaction: the API-key value never appears in any output — tests assert
this against OPENAI_API_KEY=sk-test-....

agentao config validate — configuration-only check

agentao config validate
agentao config validate --json

The narrower companion to doctor. Validates only user-editable
configuration (no plugin / ACP / optional-dep section). Surfaces problems
the runtime would otherwise swallow silently:

  • malformed JSON or non-object top-level shapes in settings.json,
    permissions.json, mcp.json
  • LLM_TEMPERATURE / LLM_MAX_TOKENS that fail to parse
  • per-server MCP entries that aren't objects ({"bad": "oops"} would
    crash _expand_config_env at runtime)
  • non-string MCP env / headers / args values (expand_env_vars
    raises TypeError on them)
  • user/project mcp.json name collisions (the runtime drops project
    entries silently with a log warning — validate surfaces the same)
  • replay block values that ReplayConfig.from_mapping silently coerces
    away: non-object replay, max_instances < 1, non-bool capture flags,
    unknown capture-flag keys

Runtime behavior is unchanged — the factory stays best-effort so an
embedded host with a bad optional config isn't blocked from running.
config validate is the explicit surface for users who want their config
problems to be loud.

Decision-capable PreToolUse hooks

Before 0.4.7, PreToolUse hooks were observational only — the dispatcher
returned attachment records that the executor discarded, and the dispatcher
never parsed permissionDecision out of hook stdout (only Stop did).
Hooks could not deny a tool call, ask for confirmation, or have host /
replay consumers see a permission decision caused by the hook.

0.4.7 closes that hole. Supported output shape (Claude Code-compatible):

{
  "hookSpecificOutput": {
    "permissionDecision": "allow",
    "reason": "optional human-readable reason",
    "additionalContext": "optional structured context"
  }
}

MVP behavior:

  • "deny" cancels the current tool call; produces a
    PermissionDecisionEvent with reason = "pre-tool-hook[: <hook reason>]"
    and matched_rule = None
  • "ask" flips an ALLOW plan to ASK and goes through the existing
    Phase 2 confirmation path (subject to allow_all_tools like any other
    prompt — a hook ask does not bypass full-access)
  • "allow" is a no-op; never downgrades an engine deny/ask
  • Multi-hook merge: first deny wins, then first ask, otherwise
    continue
  • additionalContext is parsed and recorded as hook output but is not
    injected into the model prompt (out of scope for the MVP)

Event ordering — the meaningful change beyond decision parsing:

  • Hook dispatch moved to Phase 1.5 of ToolRunner, before the
    PermissionDecisionEvent emit and before any tool started event
  • The host ordering contract (PermissionDecisionEvent precedes
    ToolLifecycleEvent(started) for the same tool_call_id) holds without
    an after-the-fact cancelled event
  • A PLUGIN_HOOK_FIRED replay event with hook_name: "PreToolUse" is
    emitted for parity with the other hook sites

Out of scope for 0.4.7:

  • Exit-code-2 "block" parity (Claude Code honors stderr → model on
    exit 2; MVP supports only the JSON permissionDecision shape)
  • additionalContext model-prompt injection
  • updatedInput / argument rewriting

Telemetry P1 increment

Three small additions, all as optional fields on existing transport /
replay event payloads — no new event types, no public host-schema
bump
:

  • LLM_CALL_COMPLETED.model_latency_ms — stable intent-named alias of
    the existing duration_ms already on the event. Same number; the
    intent-named field is what hosts should subscribe to going forward.
  • LLM_CALL_COMPLETED.first_token_ms — TTFT. The monotonic timestamp
    of the first streamed text chunk reaching on_text_chunk, minus call
    start. None for tool-only responses or failures before the first
    delta. Both the ok and error emit paths populate it. The streaming
    callback was promoted from a lambda to a named closure so it can record
    the first-chunk timestamp.
  • TURN_END.tool_count — per-turn count of tool invocations.
    run_turn resets a _turn_tool_count counter; the chat loop bumps it
    by len(clean_tool_calls) after each tool batch; run_turn reads the
    value on completion and includes it on the TURN_END payload. The
    replay adapter mirrors it onto the TURN_COMPLETED replay record.
  • CONTEXT_COMPRESSED.duration_ms was already covered (populated by
    every microcompaction / full-compaction call site, mirrored by the
    replay adapter). 0.4.7 documents it as the stable compaction-duration
    field for public/replay consumers.

Hosts subscribe via the transport LLM_CALL_COMPLETED /
TURN_END events — the same path cost/usage tracking already uses. The
fields are not re-exposed on agentao.host Pydantic models yet: the
host contract does not currently project LLM-call events at all, and
adding that surface is a larger decision than this telemetry increment.
Add a cross-layer trace id only if host integrations need LLM-to-MCP
stitching — not for 0.4.7.

/compact manual-compaction command

A new REPL command that triggers full history compaction
(compress_messages(is_auto=False)) on demand, without waiting for the
auto-compaction threshold. Useful when:

  • the conversation has accumulated many tool results and the user wants to
    reclaim context budget before the next turn
  • the user wants to test compaction shape during plugin / skill
    development
  • the user is preparing the session for --resume and wants a smaller
    serialized state

Handler in agentao/cli/commands/compact.py; documented in CLAUDE.md
under "CLI Commands" and in
developer-guide/{en,zh}/cli/7-context-status.md.

AGENTAO_WEB_FETCH_FALLBACKweb_fetch audit-surface fix

Behavior change. Before 0.4.7, installing the [full] extra silently
rewired web_fetch to route JS-rendered or failing fetches through a
local crawl4ai headless browser. Two problems:

  • A [full] install changed runtime behavior just by being present.
    Embedded hosts that approved a fetch based on the URL shown in the
    confirmation prompt could be making a different actual outbound request.
  • The principle the audit surface depends on — the confirmation prompt
    and host-visible URL must match the actual outbound destination — was
    violated.

0.4.7 makes the fallback explicit:

Value Behavior
none (default) Direct httpx fetch only. On a JS-detected page the static shell is returned with a Note: line telling the LLM the content is likely incomplete.
jina Falls back to Jina Reader. The URL is sent to a third-party service; this is disclosed in the tool description and in a Fallback: jina reader (https://r.jina.ai/<url>) line on every result. JINA_API_KEY (optional) adds Authorization: Bearer <key> for higher rate limits.
crawl4ai Falls back to the local headless browser (requires pip install 'agentao[crawl4ai]' and playwright install chromium).

Read once at WebFetchTool construction; invalid values warn and degrade
to none. Implementation note: the crawl4ai import moved from module
top to inside _fetch_with_crawl4ai() so [full] users no longer pay
the Playwright/Chromium import cost on web.py import.

Empty-content guard

#42 — the chat loop used to append an empty assistant message when the
model produced only a tool call and no text on the final iteration,
leaving a malformed history entry that some providers reject on the next
turn. Fixed by guarding the final assistant content push against empty
text in agentao/runtime/chat_loop/_runner.py.

What did not change

  • Public Python API (Agentao(...), events(), active_permissions(),
    add_event_observer / remove_event_observer, the harness contract from
    0.4.4): all unchanged.
  • Host event schema (docs/schema/host.events.v1.json,
    docs/schema/host.acp.v1.json): telemetry additions live on transport
    events that the host contract does not project; no schema bump.
  • CLI exit-code table for agentao run and agentao -p: unchanged
    from 0.4.6 (0/1/2/3/4/130). The new diagnostics commands use the same
    meanings (0 ok, 1 errors, 2 invalid usage).
  • Permission engine semantics: hooks are layered over the engine via
    the existing plan, not a parallel decision pipeline. A hook "allow"
    never overrides an engine deny/ask.
  • MCP runtime loader: load_mcp_config() still silently ignores
    collisions / malformed entries. The new validator surfaces them in
    config validate; the runtime stays best-effort so an embedded host
    with a bad optional config isn't blocked from running.

Migration notes

For most users this release upgrades in place. Two cases need attention:

  1. If you relied on the implicit crawl4ai fallback in web_fetch
    the [full] install no longer triggers it. Set
    AGENTAO_WEB_FETCH_FALLBACK=crawl4ai in your .env (or shell env) to
    restore the previous behavior. Embedded hosts that audit fetches based
    on the confirmation-prompt URL should not enable this — the audit
    surface now reflects the actual destination on every result, and the
    default none mode keeps the prompt-vs-actual match strict.

  2. If you wrote a PreToolUse plugin hook expecting its stdout to be
    discarded
    — that's still the default. The hook continues to be
    observational unless it emits the
    hookSpecificOutput.permissionDecision JSON shape, in which case the
    new decision pipeline kicks in. Existing hooks that print arbitrary
    text are unaffected.

CI / host integrations that want to verify these surfaces before shipping
should add agentao doctor --json to their pipeline. It runs in <1s, has
no side effects on disk, and surfaces config drift before a runtime
failure.

Tests

  • tests/test_diagnostics_cli.py — 39 cases covering shape validation,
    redaction (no API-key leakage), exit-code gates, the read-only memory
    probe (regression test against bootstrapping an absent DB),
    unknown-flag rejection, and the user/project MCP collision warning.
  • tests/test_hooks_pre_tool_use_decision.py — dispatcher parsing/merge
    • runner Phase 1.5 wiring, including the deny/ask/allow precedence
      and the no-rules fast path.
  • tests/test_telemetry_increment.py — TTFT/latency on the ok, error,
    and tool-only paths; tool_count on TURN_END and the replay mirror.
  • Full suite: 2609 passed, 2 skipped locally; CI matrix on
    Python 3.10 / 3.11 / 3.12.
  • mypy --strict --package agentao.host clean.
  • Replay + host schema-drift checks pass.

Upgrade

pip install -U agentao

No public Python API change, no wire-format break.

Out of scope (deferred)

These appear in docs/design/codex-reverse-review.md as deliberate
non-goals for 0.4.7. They will be reconsidered only when a real consumer
or bug report creates pressure:

  • Permission profile rewrite. Agentao already has the lightweight
    ActivePermissions snapshot it needs today.
  • Structured ToolOutput overhaul. The current tool count and
    consumers do not justify artifact/visibility metadata across all tools.
  • Large trace ID refactor. Existing session_id / turn_id /
    tool_call_id / call_id covers most debugging.
  • Plugin marketplace / share checkout. The plugin system should
    stabilize locally before distribution is expanded.
  • Exit-code-2 PreToolUse block parity. JSON permissionDecision
    shape only in the MVP.
  • PreToolUse additionalContext injection into the model prompt.
  • OS-level Windows deny-read implementation. Local-first and
    macOS-first remains the sandbox focus.
  • Read-deny first-class modeling in PermissionEngine. P2 work, not
    blocking 0.4.7.