Agentao 0.4.8

jin-bo released this 31 May 01:53 · 40 commits to main since this release

A DeepChat ACP-integration + multimodal release on top of 0.4.7. The
headline is the upstreamable core of the DeepChat / TensorChat (Electron)
ACP-backend patch — runtime provider/model switching that keeps credentials
off the ACP wire, a standards-track session/set_mode, and image input
that flows end-to-end from an ACP client (or the CLI) through chat() into
the model. All new ACP surface is additive; the legacy session/set_model /
session/list_models endpoints are kept unchanged for one release. The host
ACP schema snapshot (docs/schema/host.acp.v1.json) is bumped; the
agentao.host Python contract is unchanged. Everything upgrades in place
via pip install -U agentao.

The headline:

  • Secret-free ACP model/provider switching. A new ACP-standard
    session/set_config_option (configId="model") switches model/provider by
    identifier ("openai/gpt-4o"); credentials resolve server-side through
    a host-injectable provider_resolver and never travel on the wire or into
    agentao.log. Two independent layers reject any credential-bearing field
    (apiKey / baseUrl / _meta): a handler allowlist and extra="forbid"
    on the request schema.
  • Standards-track session/set_mode. The handler now reads the
    ACP-standard modeId (not the old non-standard mode), applies a
    PermissionEngine preset only on an exact match
    (read-only / workspace-write / full-access / plan), and
    persists + echoes any other value — so a client mode like DeepChat's
    code / ask round-trips instead of being rejected with -32602.
  • Multimodal image input through the turn. chat() / arun() accept
    images=[{"data": <base64>, "mimeType": <type>}, …]; ACP session/prompt
    accepts image content blocks; the CLI gains /image. Payloads are
    validated at the boundary (image/* only, valid base64, 20 MB/image, ≤16
    images/prompt) and the request logger summarizes image parts instead of
    dumping base64.
  • Structured ask_user. The tool gains optional
    header / options / multiple / allow_custom hints, so the model can
    offer a choice list while still allowing a typed custom answer. Fully
    backward-compatible with plain ask_user(question) and legacy 1-arg
    callbacks.
  • initialize extensions move under _meta. The initialize response
    now advertises _agentao.cn/ask_user under
    _meta["_agentao.cn/extensions"] instead of a non-standard top-level
    extensions array.
  • Robustness fixes. $HOME-unset environments no longer crash at import
    (agentao.paths.user_home()), and jieba's SyntaxWarning noise on the
    first Chinese-text recall is silenced at the import chokepoint.

Why this release

The work originates in the DeepChat ACP integration patch triage
(docs/design/deepchat-acp-patch-revision.md, drafted 2026-05-29). A ~405 KB
local-changes patch wired Agentao up as an ACP subprocess backend for a
DeepChat / TensorChat desktop chat UI. That patch bundled four unrelated
concerns — genuine harness capabilities, one wrongly-exposed capability,
DeepChat-only glue, and repo deletions that regressed main. The design
record separates them; only the A-series (upstream), B-series (rework +
upstream), and D1 (restore the deleted acp_client tests) are main
actions. 0.4.8 ships exactly those.

The single load-bearing decision was credential handling for runtime
provider/model switching: the patch sent apiKey / baseUrl over the ACP
wire (and into agentao.log). 0.4.8 replaces that with the ACP-standard
session/set_config_option mechanism plus a server-side resolver, so the
secret never leaves the host process.

PR sequencing that landed:

  • PR-1/PR-2/PR-3 (A-series + paths) — multimodal image input, structured
    ask_user, and robust home-directory resolution.
  • PR-4/PR-5/PR-6 (B1/B2/B3) — the ACP core rework: secret-free
    set_config_option, standards-track set_mode (modeId), and
    _meta-scoped initialize extensions.
  • D1 — restored the tests/test_acp_client_* suite the patch had deleted
    (acp_client is live in main).

Secret-free model/provider switching

// client → agent
{"method": "session/set_config_option",
 "params": {"sessionId": "", "configId": "model", "value": "openai/gpt-4o"}}
  • The value is split on the first / (huggingface/meta-llama/Llama-3
    → provider huggingface, model meta-llama/Llama-3); a bare value with no
    / is a model-only switch that keeps the current provider.
  • Credentials are resolved server-side by a host-injectable
    provider_resolver; they never appear on the wire or in agentao.log.
  • The default resolver accepts only the configured LLM_PROVIDER
    ({PROVIDER}_API_KEY / _BASE_URL); any other provider id →
    INVALID_REQUEST. It never scans the environment for a provider list —
    multi-provider switching requires a host-injected resolver paired with a
    host-injected catalog
    (AcpServer(provider_resolver=…, model_catalog=…)).
  • session/new / session/load advertise the model configOptions
    (default catalog is the single current provider/model); a successful
    switch returns the refreshed configOptions in its response only — no
    config_option_update notification.
  • A vendor _agentao.cn/set_model ({sessionId, model}, free-form,
    secret-free, model-only) covers "type any model" UX a select can't
    express; it shares the core agent.set_model() path.
  • session/set_model and session/list_models are kept unchanged as
    one-release compatibility endpoints (retirement is PR-7, deferred).

New schema types: AcpSessionSetConfigOptionRequest/Response,
AcpConfigOption, AcpConfigOptionChoice,
AcpAgentaoSetModelRequest/Response, and configOptions on the
session/new / session/load responses.
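
To illustrate the split and default-resolver rules above, here is a minimal Python sketch. It is not Agentao's implementation: split_model_value, default_resolver, ModelSelection, and ProviderCredentials are hypothetical names, the case-insensitive provider comparison is an assumption, and the real handler reports an unknown provider as an ACP INVALID_REQUEST error rather than a ValueError.

```python
import os
from typing import Mapping, NamedTuple, Optional


class ModelSelection(NamedTuple):
    provider: Optional[str]  # None: keep the session's current provider
    model: str


class ProviderCredentials(NamedTuple):
    api_key: str
    base_url: Optional[str]


def split_model_value(value: str) -> ModelSelection:
    """Split on the first "/" only, so model ids may contain slashes."""
    provider, sep, model = value.partition("/")
    if not sep:
        # Bare value: model-only switch.
        return ModelSelection(None, value)
    return ModelSelection(provider, model)


def default_resolver(
    provider: str, env: Optional[Mapping[str, str]] = None
) -> ProviderCredentials:
    """Resolve credentials server-side for the single configured provider.

    Hypothetical stand-in for the default resolver: it looks only at
    LLM_PROVIDER and that provider's own variables, and never scans the
    environment for other providers.
    """
    env = os.environ if env is None else env
    configured = env.get("LLM_PROVIDER", "")
    if not configured or provider.lower() != configured.lower():
        raise ValueError(f"unknown provider: {provider!r}")
    prefix = configured.upper()
    api_key = env.get(f"{prefix}_API_KEY")
    if not api_key:
        # The message names the variable, never a secret value.
        raise ValueError(f"{prefix}_API_KEY is not set")
    return ProviderCredentials(api_key, env.get(f"{prefix}_BASE_URL"))
```

Because the client only ever sends the identifier string, nothing in this path needs a credential-bearing request field.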

session/set_mode — standard modeId, open values

A modeId is a UI/behavioural selector that need not be an Agentao
permission preset:

  • Exact match on read-only / workspace-write / full-access / plan
    applies the PermissionEngine preset (unchanged posture semantics).
  • Any other value → persisted and echoed without changing permission
    posture, so a client mode (code / ask) round-trips instead of a
    -32602 rejection.

AcpSessionSetModeRequest.modeId / AcpSessionSetModeResponse.modeId are now
open strings. The permission-axis split and
availableModes / currentModeId + current_mode_update remain deferred
to their own design.
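
The preset-or-echo rule can be sketched as follows. This is an illustrative outline under stated assumptions, not the actual handler: handle_set_mode, session_state, and the apply_preset callback are hypothetical stand-ins for the session object and the PermissionEngine.

```python
from typing import Callable, Dict

# The four preset names listed above; any other modeId is client-defined.
PERMISSION_PRESETS = frozenset(
    {"read-only", "workspace-write", "full-access", "plan"}
)


def handle_set_mode(
    session_state: Dict[str, str],
    mode_id: str,
    apply_preset: Callable[[str], None],
) -> Dict[str, str]:
    """Apply a permission preset only on an exact match; always persist and echo."""
    if mode_id in PERMISSION_PRESETS:
        apply_preset(mode_id)
    # Non-preset values (for example "code" or "ask") leave posture untouched.
    session_state["modeId"] = mode_id
    return {"modeId": mode_id}
```

With this shape a client mode such as "code" is stored and returned unchanged, while "read-only" additionally triggers the preset callback.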

Multimodal image input

End-to-end, decoupled from any specific ACP client:

  • Engine. chat() / arun() accept
    images=[{"data": <base64>, "mimeType": <type>}, …]. The user turn is
    emitted as an OpenAI-style multimodal content list (a text part plus one
    image_url part per image with an inline data: URL); text-only turns are
    unchanged. The request logger summarizes parts
    (image_url (N chars, inline base64)) rather than dumping the blob.
  • ACP. session/prompt accepts image content blocks; initialize
    advertises promptCapabilities.image: true. The untrusted payload is
    validated at the boundary: mimeType must be image/*, data must be
    valid base64 within a 20 MB per-image cap, and a prompt may carry at
    most 16 images — each violation is a clean -32602 INVALID_PARAMS.
    Only inline {data, mimeType} is accepted; any other key (uri, path,
    apiKey, _meta, …) is rejected both at the schema layer (extra="forbid")
    and in the runtime _parse_prompt allowlist, so the handler can never be
    coaxed into dereferencing a host path or smuggling a secret.
  • CLI. /image enforces the same size/count caps and re-validates the
    bytes it actually read (closing a TOCTOU gap against the earlier
    stat()); /clear and /new drop staged images so they cannot leak into
    the next session.
  • Model fallback. When a model rejects image input, the turn falls back
    to a text reference to the image instead of failing the call (#59).
  • Saving a session whose first user message is image+text derives its title
    from the text part instead of persisting an empty title.

New schema type: AcpImageContentBlock.
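
The boundary checks and the resulting user turn can be sketched in Python as below. This is a simplified illustration, not the shipped code: validate_images, build_user_content, and InvalidParams are hypothetical names, the cap is assumed to be 20 * 1024 * 1024 bytes measured on the decoded payload, and the sketch validates plain {data, mimeType} dicts rather than full ACP content blocks.

```python
import base64
import binascii
from typing import Dict, List

MAX_IMAGE_BYTES = 20 * 1024 * 1024  # assumed reading of "20 MB per image"
MAX_IMAGES_PER_PROMPT = 16
ALLOWED_KEYS = frozenset({"data", "mimeType"})


class InvalidParams(ValueError):
    """Stand-in for a JSON-RPC -32602 INVALID_PARAMS error."""


def validate_images(images: List[Dict[str, str]]) -> None:
    if len(images) > MAX_IMAGES_PER_PROMPT:
        raise InvalidParams("too many images")
    for image in images:
        # Allowlist: reject uri, path, apiKey, _meta, and anything else.
        if set(image) - ALLOWED_KEYS:
            raise InvalidParams("unexpected key in image")
        mime, data = image.get("mimeType"), image.get("data")
        if not isinstance(mime, str) or not mime.startswith("image/"):
            raise InvalidParams("mimeType must be image/*")
        if not isinstance(data, str):
            raise InvalidParams("data must be a base64 string")
        try:
            raw = base64.b64decode(data, validate=True)
        except (binascii.Error, ValueError):
            raise InvalidParams("data is not valid base64") from None
        if len(raw) > MAX_IMAGE_BYTES:
            raise InvalidParams("image exceeds size cap")


def build_user_content(text: str, images: List[Dict[str, str]]) -> List[dict]:
    """Build an OpenAI-style content list: one text part, one image_url part per image."""
    parts: List[dict] = [{"type": "text", "text": text}]
    for image in images:
        url = f"data:{image['mimeType']};base64,{image['data']}"
        parts.append({"type": "image_url", "image_url": {"url": url}})
    return parts
```

Validating before building the content list means a rejected payload never reaches the model call or the request log.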

Structured ask_user

The ask_user tool accepts optional header / options / multiple /
allow_custom hints alongside the free-form question. The hints flow
through the Transport.ask_user contract to every transport:

  • CLI renders a numbered menu and accepts a number, comma-separated
    numbers (when multiple), or custom text — re-prompting when
    allow_custom is false and the entry isn't one of the options.
  • ACP forwards them on _agentao.cn/ask_user (host-agnostic plain-string
    options, not option-cards); AcpAskUserParams gains the matching fields.
  • The replay recorder captures them.

The reply stays a single string. Backward-compatible: a plain
ask_user(question) keeps its original wire/recording shape, and legacy 1-arg
Callable[[str], str] callbacks keep working — structured kwargs are
forwarded only to callbacks whose signature accepts them.
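
One way to forward structured kwargs only to callbacks that accept them is signature inspection. The sketch below assumes that approach; call_ask_user is a hypothetical helper, and Agentao's actual dispatch may differ.

```python
import inspect
from typing import Any, Callable


def call_ask_user(callback: Callable[..., str], question: str, **hints: Any) -> str:
    """Invoke an ask_user callback, passing only the hints its signature accepts."""
    params = inspect.signature(callback).parameters
    takes_var_kwargs = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if takes_var_kwargs:
        accepted = dict(hints)
    else:
        # A legacy Callable[[str], str] has no matching names, so it gets none.
        accepted = {name: value for name, value in hints.items() if name in params}
    return callback(question, **accepted)
```

A legacy one-argument callback is therefore called exactly as before, while a callback that declares options or allow_custom receives those hints.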

initialize extensions under _meta

The ACP-standard initialize response carries only protocolVersion /
agentCapabilities / agentInfo / authMethods; extension data belongs
under _meta. Agentao now returns its _agentao.cn/ask_user advertisement
under _meta["_agentao.cn/extensions"] (vendor-namespaced) instead of a
non-standard top-level extensions array. AcpInitializeResponse replaces the
extensions field with an open _meta object (AcpInitializeMeta); a
schema-following host that still sends a top-level extensions is now rejected
(extra="forbid"). Agentao's own acp_client never read extensions, so no
client code changes are needed.
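
A third-party client that talks to both older and newer Agentao versions might read the advertisement as sketched below. read_extensions is a hypothetical helper, and the shape of the value stored under the _meta key is not specified here, so the function returns it as-is.

```python
from typing import Any, Optional

EXTENSIONS_META_KEY = "_agentao.cn/extensions"


def read_extensions(initialize_result: dict) -> Optional[Any]:
    """Prefer the _meta-scoped advertisement; fall back to the legacy top-level field."""
    meta = initialize_result.get("_meta") or {}
    if EXTENSIONS_META_KEY in meta:
        return meta[EXTENSIONS_META_KEY]
    # Older releases used a non-standard top-level "extensions" array.
    return initialize_result.get("extensions")
```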

Robustness fixes

  • Robust home-directory resolution when $HOME is unset.
    agentao.paths.user_home() is Path.home() with a fallback to
    $HOME / USERPROFILE and finally a private, per-user temp subdirectory
    (created 0700, ownership/permission-validated — a pre-existing
    world/group-accessible path is abandoned for a fresh mkdtemp). This fixes
    the import-time RuntimeError risk on stripped service accounts, some
    container/CI sandboxes, and headless ACP launches, where module-level
    ~/.agentao constants resolved Path.home() at import. Subsystem
    constructors still take explicit roots (unchanged from the Issue 5
    no-implicit-fallback contract).
  • Silenced jieba SyntaxWarning on first CJK recall. jieba 0.42.1 uses
    non-raw regex literals that Python 3.12 flags as
    SyntaxWarning: invalid escape sequence on first compile. Routed every
    jieba import in memory/retriever.py through a single _import_jieba()
    helper that mutes SyntaxWarning at the import chokepoint; no behavior
    change, other warnings untouched.
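
Both fixes follow well-known patterns, sketched here in simplified form. These are not the shipped functions: the sketch's user_home always creates a fresh mkdtemp directory as its last resort and skips the ownership/permission validation of a reused path, the "agentao-home-" prefix is made up, and import_quietly is a hypothetical generic version of _import_jieba().

```python
import importlib
import os
import tempfile
import warnings
from pathlib import Path
from types import ModuleType


def user_home() -> Path:
    """Resolve a home directory without raising when none can be determined."""
    try:
        return Path.home()
    except RuntimeError:
        pass
    for var in ("HOME", "USERPROFILE"):
        value = os.environ.get(var)
        if value:
            return Path(value)
    # mkdtemp creates a directory accessible only by the creating user.
    return Path(tempfile.mkdtemp(prefix="agentao-home-"))


def import_quietly(name: str) -> ModuleType:
    """Import a module while muting only SyntaxWarning raised during the import."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", SyntaxWarning)
        return importlib.import_module(name)
```

Scoping the filter with catch_warnings restores the previous warning configuration on exit, which is why other warnings and later imports are unaffected.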

What did not change

  • Public Python API (Agentao(...), events(),
    active_permissions(), the harness contract): unchanged.
  • session/set_model / session/list_models: kept as one-release
    compatibility endpoints.
  • Permission posture semantics: a non-preset modeId needs no permission
    engine; a recognized preset still applies one.
  • CLI exit-code table for agentao run: unchanged (0/1/2/3/4/130).

Tests

  • Default suite: 2840 passed, 2 skipped locally; CI matrix on
    Python 3.10 / 3.11 / 3.12.
  • ACP coverage: test_acp_set_config_option.py,
    test_acp_session_set_model.py, test_acp_session_new/load,
    test_acp_initialize.py, plus the restored test_acp_client_* suite (D1).
  • The slow [full]-extras closure baseline
    (tests/data/full_extras_baseline.txt) was refreshed for transitive
    dependency version bumps — same package set, 122 packages.
  • Host + replay schema-drift checks pass; docs/schema/host.acp.v1.json
    re-generates with no diff.

Upgrade

pip install -U agentao

The agentao.host Python contract is unchanged. ACP clients gain new
optional methods; existing session/set_model / session/list_models
clients keep working.

Out of scope (deferred)

Per docs/design/deepchat-acp-patch-revision.md, these are deliberate
non-goals for 0.4.8:

  • PR-7: retire the legacy session/set_model / session/list_models
    endpoints. Deferred to a later release; 0.4.8 keeps them for one-release
    compatibility.
  • Permission-axis split + availableModes / currentModeId +
    current_mode_update. A separate set_mode design, not required by
    DeepChat.
  • DeepChat-only glue and packaging (C-series) and the patch's repo
    deletions (D3). Fork-keep / drop verdicts; they do not come into main.