Agentao 0.4.7

A diagnostics + hook decisions + observability release on top of 0.4.6.
No public Python API or wire-format change — all telemetry additions are
optional fields on existing transport/replay events, and the new CLI surface
is two non-interactive subcommands that never instantiate an agent. The
PreToolUse plugin hook gains decision capability but reuses the existing
PermissionDecisionEvent and confirmation path; no host-schema bump.
Everything upgrades in place via pip install -U agentao.

The headline:

agentao doctor + agentao config validate — two new
non-interactive subcommands that aggregate or validate the harness's
existing signals. --json is the contract surface for CI/hosts; both are
read-only (probing an absent memory.db reports "absent" instead of
bootstrapping it) and reject unknown flags with exit 2. Output:
{"ok": bool, "sections": {...}, "findings": [...]}. Errors exit 1,
warnings keep exit 0.
PreToolUse plugin hooks become decision-capable. A hook returning
hookSpecificOutput.permissionDecision: "deny" cancels the tool call;
"ask" flips the plan to the existing confirmation path; "allow" is a
no-op. The dispatch moved to Phase 1.5 of ToolRunner so the
PermissionDecisionEvent precedes any tool started event — the host
ordering contract holds without after-the-fact cancelled events.
Telemetry P1 increment. LLM_CALL_COMPLETED carries
model_latency_ms (stable alias of duration_ms) and first_token_ms
(TTFT). TURN_END carries tool_count. All optional, all backwards-
compatible — no schema bump.
/compact manual-compaction REPL command. Triggers
compress_messages(is_auto=False) on demand instead of waiting for the
auto-compaction threshold.
AGENTAO_WEB_FETCH_FALLBACK opt-in for JS-rendering fallback in
web_fetch. Previous behavior auto-routed through crawl4ai whenever it
was installed; that broke the audit contract (the confirmation prompt
showed a different URL than the actual outbound request). Now: explicit
none (default) / jina / crawl4ai.

Why this release

Two threads landed in the 2026-05-09 → 2026-05-16 window:

The Codex reverse-review pull
(docs/design/codex-reverse-review.md). A reverse review of the
2026-05-12 / 2026-05-17 Codex changes against Agentao's current
architecture. The conclusion was that most of Codex's apparent
architectural work is driven by Codex-specific scale (app-server, remote
environments, multiple workspaces, Windows enterprise sandboxing, broader
extension surface) and Agentao should not copy that weight. The narrow,
actionable scope:
- P0 (resolved 2026-05-12): make PreToolUse hooks minimally
  decision-capable — close the existing fire-and-forget hole without
  introducing a new hook policy subsystem.
- P1 (resolved 2026-05-12): a small telemetry increment
  (model_latency_ms, first_token_ms, tool_count, stable
  compaction-duration placement) — fields hosts need to debug
  integrations without a new SQLite telemetry subsystem.
- P0 follow-up (resolved 2026-05-16): agentao doctor and
  agentao config validate — two small wiring PRs over existing
  signals, not a new app-server diagnostics subsystem.
Audit-surface tightening for web_fetch. A [full] install used to
silently rewire web_fetch to route through crawl4ai. That violated
the principle that the confirmation prompt and host-visible URL must
match the actual outbound request (it's how an embedded host audits the
agent's network activity). 0.4.7 makes the fallback explicit via
AGENTAO_WEB_FETCH_FALLBACK and surfaces the actual destination on
every fallback result.

The five rounds of /codex:review validation on the diagnostics CLI also
caught real validation gaps that ship in this release as part of
config validate: MCP per-server shape checking, MCP nested
env/headers/args string validation, user/project MCP name-collision
warnings, replay shape validation, and rejection of max_instances ≤ 0
that from_mapping silently coerces away.

`agentao doctor` — health snapshot

agentao doctor
agentao doctor --json

Aggregates every health signal the harness already produces into one
operator-facing report — without instantiating an agent.

Sections in the JSON output:

settings — .agentao/settings.json shape and parse status (path,
keys, status)
provider — LLM_PROVIDER, API-key presence (boolean, never the
value), model, base URL; parse status of LLM_TEMPERATURE /
LLM_MAX_TOKENS when set
permissions — user-scope rule count, loaded sources; warns when a
stray project-scope permissions.json exists (ignored at runtime)
mcp — server counts per scope, plus shape validation (non-object
entries, non-string env/headers/args) and user/project name-
collision warnings
replay — effective ReplayConfig after coercion; also flags raw
values that from_mapping would silently swallow
acp_schema — host events + ACP schema export succeeds (regression
signal for the contract-snapshot path)
memory — project + user SQLite memory stores opened read-only
(file:<path>?mode=ro&uri=true); reports "absent" instead of
bootstrapping a missing DB
plugins — full plugin diagnostics via the new
collect_full_plugin_diagnostics() helper (runs the same post-load
registration simulation that agentao plugin list uses, so the two
cannot drift on which plugins they consider failed)
optional_deps — find_spec probe of rich, prompt_toolkit,
readchar, mcp, openai, httpx

Output contract:

{"ok": bool, "sections": {...}, "findings": [...]}
ok=false iff ≥1 finding has "level": "error"; that case exits 1
Warnings keep ok=true and exit 0 (so a missing API key on a fresh
clone doesn't trip CI)
Each finding carries level ("info" / "warning" / "error"),
area, message, and source (path or env-var label)
Unknown flags fail with exit 2 (mirrors agentao run), so a CI typo
surfaces immediately

Redaction: the API-key value never appears in any output — tests assert
this against OPENAI_API_KEY=sk-test-....

`agentao config validate` — configuration-only check

agentao config validate
agentao config validate --json

The narrower companion to doctor. Validates only user-editable
configuration (no plugin / ACP / optional-dep section). Surfaces problems
the runtime would otherwise swallow silently:

malformed JSON or non-object top-level shapes in settings.json,
permissions.json, mcp.json
LLM_TEMPERATURE / LLM_MAX_TOKENS that fail to parse
per-server MCP entries that aren't objects ({"bad": "oops"} would
crash _expand_config_env at runtime)
non-string MCP env / headers / args values (expand_env_vars
raises TypeError on them)
user/project mcp.json name collisions (the runtime drops project
entries silently with a log warning — validate surfaces the same)
replay block values that ReplayConfig.from_mapping silently coerces
away: non-object replay, max_instances < 1, non-bool capture flags,
unknown capture-flag keys

Runtime behavior is unchanged — the factory stays best-effort so an
embedded host with a bad optional config isn't blocked from running.
config validate is the explicit surface for users who want their config
problems to be loud.

Decision-capable `PreToolUse` hooks

Before 0.4.7, PreToolUse hooks were observational only — the dispatcher
returned attachment records that the executor discarded, and the dispatcher
never parsed permissionDecision out of hook stdout (only Stop did).
Hooks could not deny a tool call, ask for confirmation, or have host /
replay consumers see a permission decision caused by the hook.

0.4.7 closes that hole. Supported output shape (Claude Code-compatible):

{
  "hookSpecificOutput": {
    "permissionDecision": "allow",
    "reason": "optional human-readable reason",
    "additionalContext": "optional structured context"
  }
}

MVP behavior:

"deny" cancels the current tool call; produces a
PermissionDecisionEvent with reason = "pre-tool-hook[: <hook reason>]"
and matched_rule = None
"ask" flips an ALLOW plan to ASK and goes through the existing
Phase 2 confirmation path (subject to allow_all_tools like any other
prompt — a hook ask does not bypass full-access)
"allow" is a no-op; never downgrades an engine deny/ask
Multi-hook merge: first deny wins, then first ask, otherwise
continue
additionalContext is parsed and recorded as hook output but is not
injected into the model prompt (out of scope for the MVP)

Event ordering — the meaningful change beyond decision parsing:

Hook dispatch moved to Phase 1.5 of ToolRunner, before the
PermissionDecisionEvent emit and before any tool started event
The host ordering contract (PermissionDecisionEvent precedes
ToolLifecycleEvent(started) for the same tool_call_id) holds without
an after-the-fact cancelled event
A PLUGIN_HOOK_FIRED replay event with hook_name: "PreToolUse" is
emitted for parity with the other hook sites

Out of scope for 0.4.7:

Exit-code-2 "block" parity (Claude Code honors stderr → model on
exit 2; MVP supports only the JSON permissionDecision shape)
additionalContext model-prompt injection
updatedInput / argument rewriting

Telemetry P1 increment

Three small additions, all as optional fields on existing transport /
replay event payloads — no new event types, no public host-schema
bump:

LLM_CALL_COMPLETED.model_latency_ms — stable intent-named alias of
the existing duration_ms already on the event. Same number; the
intent-named field is what hosts should subscribe to going forward.
LLM_CALL_COMPLETED.first_token_ms — TTFT. The monotonic timestamp
of the first streamed text chunk reaching on_text_chunk, minus call
start. None for tool-only responses or failures before the first
delta. Both the ok and error emit paths populate it. The streaming
callback was promoted from a lambda to a named closure so it can record
the first-chunk timestamp.
TURN_END.tool_count — per-turn count of tool invocations.
run_turn resets a _turn_tool_count counter; the chat loop bumps it
by len(clean_tool_calls) after each tool batch; run_turn reads the
value on completion and includes it on the TURN_END payload. The
replay adapter mirrors it onto the TURN_COMPLETED replay record.
CONTEXT_COMPRESSED.duration_ms was already covered (populated by
every microcompaction / full-compaction call site, mirrored by the
replay adapter). 0.4.7 documents it as the stable compaction-duration
field for public/replay consumers.

Hosts subscribe via the transport LLM_CALL_COMPLETED /
TURN_END events — the same path cost/usage tracking already uses. The
fields are not re-exposed on agentao.host Pydantic models yet: the
host contract does not currently project LLM-call events at all, and
adding that surface is a larger decision than this telemetry increment.
Add a cross-layer trace id only if host integrations need LLM-to-MCP
stitching — not for 0.4.7.

`/compact` manual-compaction command

A new REPL command that triggers full history compaction
(compress_messages(is_auto=False)) on demand, without waiting for the
auto-compaction threshold. Useful when:

the conversation has accumulated many tool results and the user wants to
reclaim context budget before the next turn
the user wants to test compaction shape during plugin / skill
development
the user is preparing the session for --resume and wants a smaller
serialized state

Handler in agentao/cli/commands/compact.py; documented in CLAUDE.md
under "CLI Commands" and in
developer-guide/{en,zh}/cli/7-context-status.md.

`AGENTAO_WEB_FETCH_FALLBACK` — `web_fetch` audit-surface fix

Behavior change. Before 0.4.7, installing the [full] extra silently
rewired web_fetch to route JS-rendered or failing fetches through a
local crawl4ai headless browser. Two problems:

A [full] install changed runtime behavior just by being present.
Embedded hosts that approved a fetch based on the URL shown in the
confirmation prompt could be making a different actual outbound request.
The principle the audit surface depends on — the confirmation prompt
and host-visible URL must match the actual outbound destination — was
violated.

0.4.7 makes the fallback explicit:

Value	Behavior
`none` (default)	Direct `httpx` fetch only. On a JS-detected page the static shell is returned with a `Note:` line telling the LLM the content is likely incomplete.
`jina`	Falls back to Jina Reader. The URL is sent to a third-party service; this is disclosed in the tool description and in a `Fallback: jina reader (https://r.jina.ai/<url>)` line on every result. `JINA_API_KEY` (optional) adds `Authorization: Bearer <key>` for higher rate limits.
`crawl4ai`	Falls back to the local headless browser (requires `pip install 'agentao[crawl4ai]'` and `playwright install chromium`).

Read once at WebFetchTool construction; invalid values warn and degrade
to none. Implementation note: the crawl4ai import moved from module
top to inside _fetch_with_crawl4ai() so [full] users no longer pay
the Playwright/Chromium import cost on web.py import.

Empty-content guard

#42 — the chat loop used to append an empty assistant message when the
model produced only a tool call and no text on the final iteration,
leaving a malformed history entry that some providers reject on the next
turn. Fixed by guarding the final assistant content push against empty
text in agentao/runtime/chat_loop/_runner.py.

What did not change

Public Python API (Agentao(...), events(), active_permissions(),
add_event_observer / remove_event_observer, the harness contract from
0.4.4): all unchanged.
Host event schema (docs/schema/host.events.v1.json,
docs/schema/host.acp.v1.json): telemetry additions live on transport
events that the host contract does not project; no schema bump.
CLI exit-code table for agentao run and agentao -p: unchanged
from 0.4.6 (0/1/2/3/4/130). The new diagnostics commands use the same
meanings (0 ok, 1 errors, 2 invalid usage).
Permission engine semantics: hooks are layered over the engine via
the existing plan, not a parallel decision pipeline. A hook "allow"
never overrides an engine deny/ask.
MCP runtime loader: load_mcp_config() still silently ignores
collisions / malformed entries. The new validator surfaces them in
config validate; the runtime stays best-effort so an embedded host
with a bad optional config isn't blocked from running.

Migration notes

For most users this release upgrades in place. Two cases need attention:

If you relied on the implicit crawl4ai fallback in web_fetch —
the [full] install no longer triggers it. Set
AGENTAO_WEB_FETCH_FALLBACK=crawl4ai in your .env (or shell env) to
restore the previous behavior. Embedded hosts that audit fetches based
on the confirmation-prompt URL should not enable this — the audit
surface now reflects the actual destination on every result, and the
default none mode keeps the prompt-vs-actual match strict.
If you wrote a PreToolUse plugin hook expecting its stdout to be
discarded — that's still the default. The hook continues to be
observational unless it emits the
hookSpecificOutput.permissionDecision JSON shape, in which case the
new decision pipeline kicks in. Existing hooks that print arbitrary
text are unaffected.

CI / host integrations that want to verify these surfaces before shipping
should add agentao doctor --json to their pipeline. It runs in <1s, has
no side effects on disk, and surfaces config drift before a runtime
failure.

Tests

tests/test_diagnostics_cli.py — 39 cases covering shape validation,
redaction (no API-key leakage), exit-code gates, the read-only memory
probe (regression test against bootstrapping an absent DB),
unknown-flag rejection, and the user/project MCP collision warning.
tests/test_hooks_pre_tool_use_decision.py — dispatcher parsing/merge
- runner Phase 1.5 wiring, including the deny/ask/allow precedence
  and the no-rules fast path.
tests/test_telemetry_increment.py — TTFT/latency on the ok, error,
and tool-only paths; tool_count on TURN_END and the replay mirror.
Full suite: 2609 passed, 2 skipped locally; CI matrix on
Python 3.10 / 3.11 / 3.12.
mypy --strict --package agentao.host clean.
Replay + host schema-drift checks pass.

Upgrade

pip install -U agentao

No public Python API change, no wire-format break.

Out of scope (deferred)

These appear in docs/design/codex-reverse-review.md as deliberate
non-goals for 0.4.7. They will be reconsidered only when a real consumer
or bug report creates pressure:

Permission profile rewrite. Agentao already has the lightweight
ActivePermissions snapshot it needs today.
Structured ToolOutput overhaul. The current tool count and
consumers do not justify artifact/visibility metadata across all tools.
Large trace ID refactor. Existing session_id / turn_id /
tool_call_id / call_id covers most debugging.
Plugin marketplace / share checkout. The plugin system should
stabilize locally before distribution is expanded.
Exit-code-2 PreToolUse block parity. JSON permissionDecision
shape only in the MVP.
PreToolUse additionalContext injection into the model prompt.
OS-level Windows deny-read implementation. Local-first and
macOS-first remains the sandbox focus.
Read-deny first-class modeling in PermissionEngine. P2 work, not
blocking 0.4.7.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Agentao 0.4.7

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

Agentao 0.4.7

Why this release

`agentao doctor` — health snapshot

`agentao config validate` — configuration-only check

Decision-capable `PreToolUse` hooks

Telemetry P1 increment

`/compact` manual-compaction command

`AGENTAO_WEB_FETCH_FALLBACK` — `web_fetch` audit-surface fix

Empty-content guard

What did not change

Migration notes

Tests

Upgrade

Out of scope (deferred)

Uh oh!

Agentao 0.4.7

Agentao 0.4.7

Why this release

agentao doctor — health snapshot

agentao config validate — configuration-only check

Decision-capable PreToolUse hooks

Telemetry P1 increment

/compact manual-compaction command

AGENTAO_WEB_FETCH_FALLBACK — web_fetch audit-surface fix

Empty-content guard

What did not change

Migration notes

Tests

Upgrade

Out of scope (deferred)

Uh oh!

`agentao doctor` — health snapshot

`agentao config validate` — configuration-only check

Decision-capable `PreToolUse` hooks

`/compact` manual-compaction command

`AGENTAO_WEB_FETCH_FALLBACK` — `web_fetch` audit-surface fix