Skip to content

Agentao 0.4.9

Choose a tag to compare

@jin-bo jin-bo released this 10 Jun 09:56
· 13 commits to main since this release
cc0eb1c

A host tool injection + lifecycle robustness release on top of 0.4.8. The
headline is the completed construction-and-runtime tool-injection contract for
embedding hosts (extra_tools / disable_tools / enabled_tools +
add_tool / remove_tool), backed by a wave of lifecycle fixes that came out
of real embedding deployments: subprocess timeouts that actually kill the
whole process tree, ACP sessions that survive server shutdown and resume on
startup, sub-agents that construct correctly again, and skills whose relative
paths self-heal from any working directory. Everything upgrades in place via
pip install -U agentao; no public API, schema, or config break.

The headline:

  • Host tool injection, complete. Construction-time
    Agentao(extra_tools=..., disable_tools=...) and the additive dual
    enabled_tools= (allowlist), plus runtime add_tool() / remove_tool()
    for long-lived sessions — all routed through the same validation +
    capability binding, with reserved namespaces (mcp_, plan-mode tools)
    closed off.
  • Subprocess hardening. A shared run_captured() runner gives search,
    plugin hooks, and the shell executor own-process-group execution, explicit
    stdin handling, and kill-the-whole-tree-on-timeout semantics — no more
    wedged ACP turns from a grandchild holding the captured pipe.
  • ACP session lifecycle. Sessions persist on server shutdown (client
    closes stdin first so the save path runs), and agentao --acp --resume [SESSION_ID] reattaches the first session/new to a persisted session.
  • Skills work from any cwd. Activation reports the skill directory and
    the relative-path resolution rule, and enumerates scripts/ — fixing the
    "no scripts/ocr.py in my cwd" failure for every skill, including
    third-party ones (#83).

Why this release

0.4.8 finished the DeepChat/ACP integration arc; 0.4.9 consolidates what
embedding hosts hit next: controlling the tool surface and surviving the
process lifecycle
. The tool-injection trio came from hosts needing to ship
their own domain tools (or remove ours) without monkey-patching the registry.
The robustness fixes all trace to one theme — agentao embedded in another
process (ACP subprocess, chahua rooms, CI) cannot assume the CLI's cozy
single-cwd, interactive-restart world.

Host tool injection (extra_tools / disable_tools / enabled_tools / add_tool / remove_tool)

The embedded-host contract gains first-class tool-surface control
(#64, #65/#67, #68):

  • extra_tools=[...] registers pre-built Tool / AsyncToolBase instances
    as the true final pass — a same-named entry overrides a built-in or agent
    tool; injected tools inherit the same working-directory / filesystem /
    shell capability binding as built-ins. mcp_-prefixed names raise.
  • disable_tools={...} skips built-in registration with construction-time
    typo validation against BUILTIN_TOOL_NAMES.
  • enabled_tools={...} is the additive dual — declare the minimal set to
    keep; a built-in added in a future release can't silently leak in. Empty
    set is honored (is not None semantics); mutually exclusive with
    disable_tools. Scope is agentao-owned tools only (extra/MCP/plan always
    kept).
  • add_tool(tool, replace=False) / remove_tool(name) are the
    post-construction duals for hosts that mutate the surface between turns
    (long ACP sessions). The schema snapshot is per-call: changes apply on the
    next chat() / arun(), never mid-turn.
  • Tool, AsyncToolBase, RegistrableTool re-export from agentao.host;
    WebSearchTool(backend=..., api_key=...) constructor args beat env vars so
    two in-process instances can use different backends.

User-facing doc home: developer-guide §5.8 "Host Tool Injection" (EN+ZH).

Subprocess hardening (#73#74#75)

subprocess.run(timeout=) kills only the direct child. A grandchild that
inherits the captured pipe (Windows git credential helpers, a user hook
backgrounding a process) kept communicate() blocked past the timeout;
five parallel hangs saturated the tool pool and wedged the ACP-stdio turn
until the client dropped the connection.

capabilities/process.run_captured() is now the shared hardened runner:

  • child runs in its own process group / session;
  • stdin is explicit — input= over a pipe (hook payloads) or DEVNULL
    (search) so a child can never read the host's JSON-RPC channel;
  • timeout kills the whole tree via kill_process_tree() (taskkill /T on
    Windows, killpg(pid) elsewhere — never getpgid, which races a zombie
    child), then re-raises TimeoutExpired;
  • output decodes with errors="replace".

search_file_content and the plugin hook dispatcher route through it;
LocalShellExecutor.run keeps its streaming + inactivity-timeout loop but
shares the same kill_process_tree teardown.

ACP session lifecycle

  • Persist on shutdown (#78). AcpSessionState.close() saves the
    conversation (keyed by ACP sessionId, so session/load resumes it)
    before cancelling the turn token. The bundled acp_client closes the
    server's stdin first — EOF lets the read loop reach the save path — and
    only then escalates SIGTERM → SIGKILL. The CLI and ACP teardown now share
    one persist_agent_session() helper.
  • Resume on startup (#76). agentao --acp --resume [SESSION_ID]: a
    one-shot ResumeDirective makes the first session/new hydrate and
    replay the saved history and return the persisted sessionId. Permissive
    fallback — unknown/corrupt/already-active ids degrade to a fresh session
    with a WARNING, never a failed handshake.
  • Resume keeps the process model (#81). Sessions store only the model
    name, never its provider; rebinding the name onto the current provider
    could produce a (provider, model) pair that fails on the next LLM call.
    Both resume paths now keep the process-default model and show the saved
    name for reference.

Skills: relative paths self-heal (#83 / #85)

Any skill whose SKILL.md says uv run scripts/foo.py broke as soon as
cwd ≠ skill directory — which is the normal case (~/.agentao/skills/,
<project>/.agentao/skills/). Fixed once at the activation layer:

  • activate_skill and the per-turn skills context report
    Skill directory: <abs path> plus the rule that relative paths resolve
    against it, NOT the cwd;
  • scripts/ is enumerated alongside references/ / assets/ — absolute
    paths, hidden files and __pycache__ skipped, long listings truncated
    with an explicit marker;
  • plugin-command entries are exempted (a shared commands/ folder is not a
    skill directory); skill paths are resolved to absolute at load;
  • the bundled ocr example carries PEP 723 inline metadata, so uv run
    resolves its deps from any cwd (its requirements.txt is gone).

Skill authors don't need absolute paths or boilerplate — plain relative
references now work everywhere, including third-party skills.

Other fixes and improvements

  • Sub-agent construction fixed (#80). AgentToolWrapper._run_sync built
    the sub-Agentao(...) without the (required since 0.3.0)
    working_directory, so every sub-agent invocation raised TypeError. The
    parent's project root is threaded through, with a regression test.
  • MCP connect preflight (#71). A url in mcp.json pointing at a plain
    web page now fails in ≤5 s with an actionable NonMcpEndpointError
    instead of stalling connect_all() for the full 60 s SSE timeout.
    Allow-list based and best-effort — the real handshake stays authoritative.
  • Vision degradation format (#84). When a model rejects image input, the
    retry rewrites the turn with one <attachment uri= mimetype=/> tag per
    image instead of bracketed prose. Canonical format: dev-guide appendix A.1.
  • Six oversized modules split (#63). Agentao.__init__ phase helpers,
    cli/diagnostics/ package, ACP transport mixins, LLM logging mixin,
    acp_client errors module, hook-dispatcher output parsing — all
    behavior-preserving with every historical import re-exported.
  • docs/ reorganized (#80). Seven audience-oriented dirs (start/,
    guides/, reference/, design/, releases/, migration/,
    history/), kebab-case filenames; schema/ untouched (tests hardcode it).

What did not change

  • The agentao.host Python contract and the ACP schema snapshot
    (docs/schema/host.acp.v1.json) are unchanged.
  • Deprecations stay on schedule: agentao.harness (→ agentao.host) and
    the 8 legacy constructor callbacks still warn and are removed in 0.5.0.
  • Permission semantics, memory, replay, plugins: untouched.

Tests

Full suite green across the Python 3.10/3.11/3.12 CI matrix (2943 passed at
cut time), plus the clean-install smoke matrix, mypy --strict typing gate
on agentao.host, schema-drift check, and the 7 example-host smoke jobs.

Upgrade

pip install -U agentao        # library
pip install -U 'agentao[cli]' # interactive CLI

No migration needed from 0.4.8. If you embed agentao and were poking tools
into agent.tools post-construction, switch to extra_tools= /
add_tool() — the registry pokes keep working but bypass capability
binding and validation.

Out of scope (deferred)

  • MCP resources/prompts surfacing (demand-gated; codex-style host surface is
    the reference if demand lands).
  • ACP availableModes / currentModeId + current_mode_update (own design).
  • The 0.5.0 deprecation removals.