Agentao 0.4.7
Agentao 0.4.7
A diagnostics + hook decisions + observability release on top of 0.4.6.
No public Python API or wire-format change — all telemetry additions are
optional fields on existing transport/replay events, and the new CLI surface
is two non-interactive subcommands that never instantiate an agent. The
PreToolUse plugin hook gains decision capability but reuses the existing
PermissionDecisionEvent and confirmation path; no host-schema bump.
Everything upgrades in place via pip install -U agentao.
The headline:
agentao doctor+agentao config validate— two new
non-interactive subcommands that aggregate or validate the harness's
existing signals.--jsonis the contract surface for CI/hosts; both are
read-only (probing an absentmemory.dbreports"absent"instead of
bootstrapping it) and reject unknown flags with exit2. Output:
{"ok": bool, "sections": {...}, "findings": [...]}. Errors exit1,
warnings keep exit0.PreToolUseplugin hooks become decision-capable. A hook returning
hookSpecificOutput.permissionDecision: "deny"cancels the tool call;
"ask"flips the plan to the existing confirmation path;"allow"is a
no-op. The dispatch moved to Phase 1.5 ofToolRunnerso the
PermissionDecisionEventprecedes any toolstartedevent — the host
ordering contract holds without after-the-factcancelledevents.- Telemetry P1 increment.
LLM_CALL_COMPLETEDcarries
model_latency_ms(stable alias ofduration_ms) andfirst_token_ms
(TTFT).TURN_ENDcarriestool_count. All optional, all backwards-
compatible — no schema bump. /compactmanual-compaction REPL command. Triggers
compress_messages(is_auto=False)on demand instead of waiting for the
auto-compaction threshold.AGENTAO_WEB_FETCH_FALLBACKopt-in for JS-rendering fallback in
web_fetch. Previous behavior auto-routed throughcrawl4aiwhenever it
was installed; that broke the audit contract (the confirmation prompt
showed a different URL than the actual outbound request). Now: explicit
none(default) /jina/crawl4ai.
Why this release
Two threads landed in the 2026-05-09 → 2026-05-16 window:
-
The Codex reverse-review pull
(docs/design/codex-reverse-review.md). A reverse review of the
2026-05-12 / 2026-05-17 Codex changes against Agentao's current
architecture. The conclusion was that most of Codex's apparent
architectural work is driven by Codex-specific scale (app-server, remote
environments, multiple workspaces, Windows enterprise sandboxing, broader
extension surface) and Agentao should not copy that weight. The narrow,
actionable scope:- P0 (resolved 2026-05-12): make
PreToolUsehooks minimally
decision-capable — close the existing fire-and-forget hole without
introducing a new hook policy subsystem. - P1 (resolved 2026-05-12): a small telemetry increment
(model_latency_ms,first_token_ms,tool_count, stable
compaction-duration placement) — fields hosts need to debug
integrations without a new SQLite telemetry subsystem. - P0 follow-up (resolved 2026-05-16):
agentao doctorand
agentao config validate— two small wiring PRs over existing
signals, not a new app-server diagnostics subsystem.
- P0 (resolved 2026-05-12): make
-
Audit-surface tightening for
web_fetch. A[full]install used to
silently rewireweb_fetchto route throughcrawl4ai. That violated
the principle that the confirmation prompt and host-visible URL must
match the actual outbound request (it's how an embedded host audits the
agent's network activity). 0.4.7 makes the fallback explicit via
AGENTAO_WEB_FETCH_FALLBACKand surfaces the actual destination on
every fallback result.
The five rounds of /codex:review validation on the diagnostics CLI also
caught real validation gaps that ship in this release as part of
config validate: MCP per-server shape checking, MCP nested
env/headers/args string validation, user/project MCP name-collision
warnings, replay shape validation, and rejection of max_instances ≤ 0
that from_mapping silently coerces away.
agentao doctor — health snapshot
agentao doctor
agentao doctor --jsonAggregates every health signal the harness already produces into one
operator-facing report — without instantiating an agent.
Sections in the JSON output:
settings—.agentao/settings.jsonshape and parse status (path,
keys, status)provider—LLM_PROVIDER, API-key presence (boolean, never the
value), model, base URL; parse status ofLLM_TEMPERATURE/
LLM_MAX_TOKENSwhen setpermissions— user-scope rule count, loaded sources; warns when a
stray project-scopepermissions.jsonexists (ignored at runtime)mcp— server counts per scope, plus shape validation (non-object
entries, non-stringenv/headers/args) and user/project name-
collision warningsreplay— effectiveReplayConfigafter coercion; also flags raw
values thatfrom_mappingwould silently swallowacp_schema—hostevents + ACP schema export succeeds (regression
signal for the contract-snapshot path)memory— project + user SQLite memory stores opened read-only
(file:<path>?mode=ro&uri=true); reports"absent"instead of
bootstrapping a missing DBplugins— full plugin diagnostics via the new
collect_full_plugin_diagnostics()helper (runs the same post-load
registration simulation thatagentao plugin listuses, so the two
cannot drift on which plugins they consider failed)optional_deps—find_specprobe ofrich,prompt_toolkit,
readchar,mcp,openai,httpx
Output contract:
{"ok": bool, "sections": {...}, "findings": [...]}ok=falseiff ≥1 finding has"level": "error"; that case exits1- Warnings keep
ok=trueand exit0(so a missing API key on a fresh
clone doesn't trip CI) - Each finding carries
level("info"/"warning"/"error"),
area,message, andsource(path or env-var label) - Unknown flags fail with exit
2(mirrorsagentao run), so a CI typo
surfaces immediately
Redaction: the API-key value never appears in any output — tests assert
this against OPENAI_API_KEY=sk-test-....
agentao config validate — configuration-only check
agentao config validate
agentao config validate --jsonThe narrower companion to doctor. Validates only user-editable
configuration (no plugin / ACP / optional-dep section). Surfaces problems
the runtime would otherwise swallow silently:
- malformed JSON or non-object top-level shapes in
settings.json,
permissions.json,mcp.json LLM_TEMPERATURE/LLM_MAX_TOKENSthat fail to parse- per-server MCP entries that aren't objects (
{"bad": "oops"}would
crash_expand_config_envat runtime) - non-string MCP
env/headers/argsvalues (expand_env_vars
raisesTypeErroron them) - user/project
mcp.jsonname collisions (the runtime drops project
entries silently with a log warning — validate surfaces the same) - replay block values that
ReplayConfig.from_mappingsilently coerces
away: non-objectreplay,max_instances < 1, non-bool capture flags,
unknown capture-flag keys
Runtime behavior is unchanged — the factory stays best-effort so an
embedded host with a bad optional config isn't blocked from running.
config validate is the explicit surface for users who want their config
problems to be loud.
Decision-capable PreToolUse hooks
Before 0.4.7, PreToolUse hooks were observational only — the dispatcher
returned attachment records that the executor discarded, and the dispatcher
never parsed permissionDecision out of hook stdout (only Stop did).
Hooks could not deny a tool call, ask for confirmation, or have host /
replay consumers see a permission decision caused by the hook.
0.4.7 closes that hole. Supported output shape (Claude Code-compatible):
{
"hookSpecificOutput": {
"permissionDecision": "allow",
"reason": "optional human-readable reason",
"additionalContext": "optional structured context"
}
}MVP behavior:
"deny"cancels the current tool call; produces a
PermissionDecisionEventwithreason = "pre-tool-hook[: <hook reason>]"
andmatched_rule = None"ask"flips anALLOWplan toASKand goes through the existing
Phase 2 confirmation path (subject toallow_all_toolslike any other
prompt — a hookaskdoes not bypass full-access)"allow"is a no-op; never downgrades an enginedeny/ask- Multi-hook merge: first
denywins, then firstask, otherwise
continue additionalContextis parsed and recorded as hook output but is not
injected into the model prompt (out of scope for the MVP)
Event ordering — the meaningful change beyond decision parsing:
- Hook dispatch moved to Phase 1.5 of
ToolRunner, before the
PermissionDecisionEventemit and before any toolstartedevent - The host ordering contract (
PermissionDecisionEventprecedes
ToolLifecycleEvent(started)for the sametool_call_id) holds without
an after-the-factcancelledevent - A
PLUGIN_HOOK_FIREDreplay event withhook_name: "PreToolUse"is
emitted for parity with the other hook sites
Out of scope for 0.4.7:
- Exit-code-2 "block" parity (Claude Code honors stderr → model on
exit2; MVP supports only the JSONpermissionDecisionshape) additionalContextmodel-prompt injectionupdatedInput/ argument rewriting
Telemetry P1 increment
Three small additions, all as optional fields on existing transport /
replay event payloads — no new event types, no public host-schema
bump:
LLM_CALL_COMPLETED.model_latency_ms— stable intent-named alias of
the existingduration_msalready on the event. Same number; the
intent-named field is what hosts should subscribe to going forward.LLM_CALL_COMPLETED.first_token_ms— TTFT. The monotonic timestamp
of the first streamed text chunk reachingon_text_chunk, minus call
start.Nonefor tool-only responses or failures before the first
delta. Both the ok and error emit paths populate it. The streaming
callback was promoted from a lambda to a named closure so it can record
the first-chunk timestamp.TURN_END.tool_count— per-turn count of tool invocations.
run_turnresets a_turn_tool_countcounter; the chat loop bumps it
bylen(clean_tool_calls)after each tool batch;run_turnreads the
value on completion and includes it on theTURN_ENDpayload. The
replay adapter mirrors it onto theTURN_COMPLETEDreplay record.CONTEXT_COMPRESSED.duration_mswas already covered (populated by
every microcompaction / full-compaction call site, mirrored by the
replay adapter). 0.4.7 documents it as the stable compaction-duration
field for public/replay consumers.
Hosts subscribe via the transport LLM_CALL_COMPLETED /
TURN_END events — the same path cost/usage tracking already uses. The
fields are not re-exposed on agentao.host Pydantic models yet: the
host contract does not currently project LLM-call events at all, and
adding that surface is a larger decision than this telemetry increment.
Add a cross-layer trace id only if host integrations need LLM-to-MCP
stitching — not for 0.4.7.
/compact manual-compaction command
A new REPL command that triggers full history compaction
(compress_messages(is_auto=False)) on demand, without waiting for the
auto-compaction threshold. Useful when:
- the conversation has accumulated many tool results and the user wants to
reclaim context budget before the next turn - the user wants to test compaction shape during plugin / skill
development - the user is preparing the session for
--resumeand wants a smaller
serialized state
Handler in agentao/cli/commands/compact.py; documented in CLAUDE.md
under "CLI Commands" and in
developer-guide/{en,zh}/cli/7-context-status.md.
AGENTAO_WEB_FETCH_FALLBACK — web_fetch audit-surface fix
Behavior change. Before 0.4.7, installing the [full] extra silently
rewired web_fetch to route JS-rendered or failing fetches through a
local crawl4ai headless browser. Two problems:
- A
[full]install changed runtime behavior just by being present.
Embedded hosts that approved a fetch based on the URL shown in the
confirmation prompt could be making a different actual outbound request. - The principle the audit surface depends on — the confirmation prompt
and host-visible URL must match the actual outbound destination — was
violated.
0.4.7 makes the fallback explicit:
| Value | Behavior |
|---|---|
none (default) |
Direct httpx fetch only. On a JS-detected page the static shell is returned with a Note: line telling the LLM the content is likely incomplete. |
jina |
Falls back to Jina Reader. The URL is sent to a third-party service; this is disclosed in the tool description and in a Fallback: jina reader (https://r.jina.ai/<url>) line on every result. JINA_API_KEY (optional) adds Authorization: Bearer <key> for higher rate limits. |
crawl4ai |
Falls back to the local headless browser (requires pip install 'agentao[crawl4ai]' and playwright install chromium). |
Read once at WebFetchTool construction; invalid values warn and degrade
to none. Implementation note: the crawl4ai import moved from module
top to inside _fetch_with_crawl4ai() so [full] users no longer pay
the Playwright/Chromium import cost on web.py import.
Empty-content guard
#42 — the chat loop used to append an empty assistant message when the
model produced only a tool call and no text on the final iteration,
leaving a malformed history entry that some providers reject on the next
turn. Fixed by guarding the final assistant content push against empty
text in agentao/runtime/chat_loop/_runner.py.
What did not change
- Public Python API (
Agentao(...),events(),active_permissions(),
add_event_observer/remove_event_observer, the harness contract from
0.4.4): all unchanged. - Host event schema (
docs/schema/host.events.v1.json,
docs/schema/host.acp.v1.json): telemetry additions live on transport
events that the host contract does not project; no schema bump. - CLI exit-code table for
agentao runandagentao -p: unchanged
from 0.4.6 (0/1/2/3/4/130). The new diagnostics commands use the same
meanings (0ok,1errors,2invalid usage). - Permission engine semantics: hooks are layered over the engine via
the existing plan, not a parallel decision pipeline. A hook"allow"
never overrides an enginedeny/ask. - MCP runtime loader:
load_mcp_config()still silently ignores
collisions / malformed entries. The new validator surfaces them in
config validate; the runtime stays best-effort so an embedded host
with a bad optional config isn't blocked from running.
Migration notes
For most users this release upgrades in place. Two cases need attention:
-
If you relied on the implicit
crawl4aifallback inweb_fetch—
the[full]install no longer triggers it. Set
AGENTAO_WEB_FETCH_FALLBACK=crawl4aiin your.env(or shell env) to
restore the previous behavior. Embedded hosts that audit fetches based
on the confirmation-prompt URL should not enable this — the audit
surface now reflects the actual destination on every result, and the
defaultnonemode keeps the prompt-vs-actual match strict. -
If you wrote a
PreToolUseplugin hook expecting its stdout to be
discarded — that's still the default. The hook continues to be
observational unless it emits the
hookSpecificOutput.permissionDecisionJSON shape, in which case the
new decision pipeline kicks in. Existing hooks that print arbitrary
text are unaffected.
CI / host integrations that want to verify these surfaces before shipping
should add agentao doctor --json to their pipeline. It runs in <1s, has
no side effects on disk, and surfaces config drift before a runtime
failure.
Tests
tests/test_diagnostics_cli.py— 39 cases covering shape validation,
redaction (no API-key leakage), exit-code gates, the read-only memory
probe (regression test against bootstrapping an absent DB),
unknown-flag rejection, and the user/project MCP collision warning.tests/test_hooks_pre_tool_use_decision.py— dispatcher parsing/merge- runner Phase 1.5 wiring, including the deny/ask/allow precedence
and the no-rules fast path.
- runner Phase 1.5 wiring, including the deny/ask/allow precedence
tests/test_telemetry_increment.py— TTFT/latency on the ok, error,
and tool-only paths;tool_countonTURN_ENDand the replay mirror.- Full suite: 2609 passed, 2 skipped locally; CI matrix on
Python 3.10 / 3.11 / 3.12. - mypy
--strict --package agentao.hostclean. - Replay + host schema-drift checks pass.
Upgrade
pip install -U agentaoNo public Python API change, no wire-format break.
Out of scope (deferred)
These appear in docs/design/codex-reverse-review.md as deliberate
non-goals for 0.4.7. They will be reconsidered only when a real consumer
or bug report creates pressure:
- Permission profile rewrite. Agentao already has the lightweight
ActivePermissionssnapshot it needs today. - Structured
ToolOutputoverhaul. The current tool count and
consumers do not justify artifact/visibility metadata across all tools. - Large trace ID refactor. Existing
session_id/turn_id/
tool_call_id/call_idcovers most debugging. - Plugin marketplace / share checkout. The plugin system should
stabilize locally before distribution is expanded. - Exit-code-2
PreToolUseblock parity. JSONpermissionDecision
shape only in the MVP. PreToolUseadditionalContextinjection into the model prompt.- OS-level Windows deny-read implementation. Local-first and
macOS-first remains the sandbox focus. - Read-deny first-class modeling in
PermissionEngine. P2 work, not
blocking 0.4.7.