From 6666d7e47b8aa370465f776ea86487a99e9ec5fb Mon Sep 17 00:00:00 2001 From: Tyler Leonhardt Date: Fri, 15 May 2026 16:23:48 -0700 Subject: [PATCH 1/4] Claude agent host: roadmap status sync + Phase 8.5 (rich tool-call rendering) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Mark Phases 5, 6, 7, 8 as DONE (implementations have shipped; just catching the headings up to reality). - Insert Phase 8.5 — Rich tool-call rendering parity with Copilot. Today the Claude permission card for Bash reads 'Run shell command' with no command shown; Bash/Grep/Glob rows render in the generic renderer instead of the dedicated terminal/search renderers. This phase ports Copilot's getInvocationMessage / getPastTenseMessage / getToolKind / getShellLanguage / getToolInputString shape into claudeToolDisplay.ts and wires them through the permission, mapper, and replay paths. Phase 6.5 (Fork) intentionally stays Deferred. --- .../platform/agentHost/node/claude/roadmap.md | 104 +++++++++++++++++- 1 file changed, 100 insertions(+), 4 deletions(-) diff --git a/src/vs/platform/agentHost/node/claude/roadmap.md b/src/vs/platform/agentHost/node/claude/roadmap.md index ee6f008600bce..2a0f7ca1fdf32 100644 --- a/src/vs/platform/agentHost/node/claude/roadmap.md +++ b/src/vs/platform/agentHost/node/claude/roadmap.md @@ -444,7 +444,7 @@ round-trip correctly. Exit criteria: a workbench client sees the Claude provider listed, can pick a Claude model, but can't yet send a message. -### Phase 5 — Session lifecycle: create / dispose / list / shutdown +### Phase 5 — Session lifecycle: create / dispose / list / shutdown ✅ **DONE** Implement the lifecycle methods that don't require live LLM traffic. **Provisional / materialize is the load-bearing model in this phase** @@ -520,7 +520,7 @@ restarts find materialised sessions; externally-created Claude Code sessions appear; agent host can shut down cleanly. Fork is deferred to Phase 6.5. -### Phase 6 — `sendMessage` + streaming progress events (single-turn, no tools) +### Phase 6 — `sendMessage` + streaming progress events (single-turn, no tools) ✅ **DONE** Wire the proxy + SDK from Phase 3 into a real session. **Port the lifecycle machinery from `claudeCodeAgent.ts`:** @@ -714,7 +714,7 @@ with restored sessions, and honors the workbench's "keep `[0..N]` INCLUSIVE" semantic. The reverted heuristic is **not** retained behind a flag. -### Phase 7 — Tool calls + permission + user input +### Phase 7 — Tool calls + permission + user input ✅ **DONE** Wire the SDK's tool-use loop through to the agent host's tool infrastructure. **Transcript-only in this phase** — file edit tracking is Phase 8. @@ -777,7 +777,7 @@ Exit criteria: a real "read this file" prompt completes end-to-end. — see roadmap.md` marker at the call site so the upgrade path stays discoverable. Implement when Phase N introduces multi-action plan UX. -### Phase 8 — File edit tracking +### Phase 8 — File edit tracking ✅ **DONE** Build the Claude analog of `fileEditTracker.ts` from `node/copilot/`. @@ -803,6 +803,102 @@ client-side accept of one and reject of the other behaves correctly. Exit criteria: file diffs render in the workbench; per-file accept/reject works. +### Phase 8.5 — Rich tool-call rendering parity with Copilot + +Claude's tool-call cards today only carry the static display name from +[`claudeToolDisplay.ts`](./claudeToolDisplay.ts) (`"Run shell command"`, +`"Find files"`, ...). Copilot's [`copilotToolDisplay.ts`](../copilot/copilotToolDisplay.ts) +formats the actual `tool_use.input` into the row title and tags the row +with a `toolKind` so the workbench renders terminal / search / +subagent specially. Phase 12 already laid the `_meta.toolKind: +'subagent'` half down; this phase finishes the parity for the rest of +the SDK's built-in tools. + +Gap surfaced live: a `Bash` permission card reads *"Run shell command"* +with no command attached, and `Bash` / `Grep` / `Glob` rows render in +the generic tool renderer instead of the dedicated terminal / search +renderers. + +Scope: + +- **Port the Copilot helper shape** into + [`claudeToolDisplay.ts`](./claudeToolDisplay.ts), keyed off the SDK's + `tool_use.input` schemas: + - `getClaudeInvocationMessage(toolName, displayName, input)` → + markdown that includes the actual params (`` Running `git status` ``, + `Reading [src/foo.ts](src/foo.ts)`, `` Searching for `pattern` ``, + `Fetching [https://...](https://...)`). + - `getClaudePastTenseMessage(toolName, displayName, input, success)` → + success/failure-aware past-tense (`` Ran `git status` ``, + `Read foo.ts`, `Searched for ...`); replaces the + `" finished"` hardcode at + [`claudeMapSessionEvents.ts:332`](./claudeMapSessionEvents.ts#L332). + - `getClaudeToolKind(toolName)` → `'terminal' | 'subagent' | + 'search' | undefined`. `Bash` / `BashOutput` / `KillBash` → + `'terminal'`; `Grep` / `Glob` → `'search'`; `Task` → + `'subagent'` (Phase 12 already does this; consolidate the call + site). + - `getClaudeShellLanguage(toolName)` → `'bash'` for the shell tools + (drives terminal renderer's syntax highlighting). + - `getClaudeToolInputString(toolName, input)` → the canonical + "input as code" string used for the code block under the row + (e.g. the multi-line `command` for `Bash`, the formatted + arguments for the rest). + - Per-tool input typings live alongside the helpers + (`IClaudeBashInput`, `IClaudeGrepInput`, ...), validated + defensively (Claude's input can be malformed across SDK + versions — fall back to the static display name on shape + mismatch). +- **Wire the helpers through both code paths**: + - [`claudeCanUseTool.ts`](./claudeCanUseTool.ts) — set + `invocationMessage` on `pending_confirmation` from the rich + helper so the **permission card shows the actual command / + file / pattern**, not just the display name. Add `toolKind` and + `language` to the signal so the card uses the terminal renderer + when relevant. + - [`claudeMapSessionEvents.ts`](./claudeMapSessionEvents.ts) — + set `invocationMessage` on `SessionToolCallReady` for the + non-interactive (auto-approved) path, set `pastTenseMessage` on + `SessionToolCallComplete`, and emit `_meta.toolKind` / + `_meta.language` on the `tool_use` block alongside the existing + `_meta.toolKind: 'subagent'` (single canonical path; Phase 12's + spawn helpers consume the same field). + - **Replay path** — `claudeReplayMapper.ts` writes the same + `_meta.toolKind` / `_meta.language` and rich + invocation/past-tense on historical `tool_use` blocks so + restored sessions render identically to live ones. +- **Snapshot test** in `claudeToolDisplay.test.ts` covering each tool + row × `{ invocation, pastTense, toolKind, language, inputString }`. + Mirrors the existing display-name snapshot so any new SDK tool + added to the `TOOL_ROWS` table forces a snapshot update. + +Tests: + +- Unit: snapshot table covers every tool; `getClaudeInvocationMessage` + defends against malformed input shapes and falls back cleanly. +- Integration: an interactive `Bash` request → the + `pending_confirmation` signal carries the command in + `invocationMessage` and `_meta.toolKind: 'terminal'`; the same flow + on completion emits a past-tense message that includes the command. + +Manual E2E: + +- Live: ask the Claude agent to run a shell command. The permission + card should render in the **terminal** style with the command + highlighted; the card should read *Running `git status`* (or + similar) instead of *Run shell command*. After approval the row + collapses to *Ran `git status`*. Same for `Grep` / `Glob` + rendering in the search style. +- Replay: open a historical Claude session that contains shell and + search tool calls. The historical rows should render in the same + terminal / search style as the live rows. + +Exit criteria: Claude tool-call cards (live and replayed) match +Copilot's rendering quality — permission cards show the actual +invocation, terminal tools render in the terminal renderer, search +tools render in the search renderer. Adding a new SDK tool means +adding one row to `TOOL_ROWS` and updating the snapshot test. + ### Phase 9 — Abort + steering + model change + shutdown polish ✅ **DONE** Implementation contract: [phase9-plan.md](./phase9-plan.md). Unit tests From 1e51901a55dd08ca286fbe6e6d51a42e92640cd4 Mon Sep 17 00:00:00 2001 From: Tyler Leonhardt Date: Sat, 16 May 2026 00:15:58 -0700 Subject: [PATCH 2/4] Claude agent host: Phase 8.5 plan (super-planner + grilling) Adds phase8.5-plan.md alongside roadmap.md's Phase 8.5 section. Synthesized from a 3-model council (GPT-5.5, Claude Opus 4.6, GPT-5.3-Codex) and refined through a grill-with-docs session. Locked decisions: - D1: getClaudeToolKind is a TOOL_ROWS column (single source of truth). - D2: add Agent row to TOOL_ROWS; delete SUBAGENT_TOOL_NAMES. - D3: defensive Record access; no per-tool exported types. - D4: pastTenseMessage is success-aware (tool_result.is_error). - D5: live mapper mirrors Copilot's stash-on-start/reuse-on-complete pattern, with state encapsulated in a new ClaudeToolCallRegistry class (replaces today's bare maps on ClaudeMapperState). - D6: _meta single-write on Start; reducer carries to Complete; replay emits on its single terminal action (asymmetry by design). - D7: MCP tools get toolKind: undefined. - D8: one big per-tool snapshot table covering all 5 helpers. - D9: getClaudeInvocationMessage('Task', ...) owns the Task description fallback; Phase 12's site reduces to a plain helper call. - D10: _meta stays flat (no per-kind namespacing). Steps cover claudeToolDisplay (helpers + columns), claudeCanUseTool (permission card rich invocation + _meta), sessionPermissions (forward _meta through pending->ready), claudeMapSessionEvents (registry migration + _meta on Start + success-aware past tense), claudeReplayMapper (parity), claudeSubagentSignals (inner tool parity), and the snapshot + behavior tests. --- .../agentHost/node/claude/phase8.5-plan.md | 617 ++++++++++++++++++ 1 file changed, 617 insertions(+) create mode 100644 src/vs/platform/agentHost/node/claude/phase8.5-plan.md diff --git a/src/vs/platform/agentHost/node/claude/phase8.5-plan.md b/src/vs/platform/agentHost/node/claude/phase8.5-plan.md new file mode 100644 index 0000000000000..fd55cd3b59750 --- /dev/null +++ b/src/vs/platform/agentHost/node/claude/phase8.5-plan.md @@ -0,0 +1,617 @@ +# Phase 8.5 — Rich tool-call rendering parity with Copilot + +> Generated by super-planner. Source: [roadmap.md](./roadmap.md) (Phase 8.5). +> Last updated: 2026-05-15 after 3-model council (GPT-5.5, Claude Opus 4.6, +> GPT-5.3-Codex). Pre-grilling draft. + +**Status:** ready (pending grill-with-docs pass) + +## Goal + +Bring Claude tool-call rendering up to the quality bar Copilot already +sets: permission cards and tool rows show the **actual** invocation +parameters (the command, the file, the search pattern), and shell / +search tools render in the workbench's dedicated terminal / search +renderers instead of the generic tool renderer. Both live and replayed +sessions must produce identical signals. + +## Scope + +**In scope** + +- Five new exported helpers in `claudeToolDisplay.ts` mirroring + Copilot's shape: `getClaudeInvocationMessage`, + `getClaudePastTenseMessage`, `getClaudeToolKind`, + `getClaudeShellLanguage`, `getClaudeToolInputString`. +- Two new columns on the existing `TOOL_ROWS` table: `toolKind`, + `language`. Source-of-truth lookup for the kind/language helpers. +- One small action-meta helper, `buildClaudeToolMeta(toolName)` → + `{ toolKind?, language? }` (omits keys when undefined), called at + every tool-open seam. +- Wire the helpers through: + - `claudeCanUseTool.ts` — rich `invocationMessage`, rich + `toolInput`, `_meta` on `ToolCallPendingConfirmationState`. + - `claudeMapSessionEvents.ts` — `_meta` on `SessionToolCallStart`, + success-aware `pastTenseMessage` and `_meta` on + `SessionToolCallComplete`. Adds tool-input accumulation to + `ClaudeMapperState` (see Decision D5). + - `claudeReplayMapper.ts` — same helpers in `_openToolUse` / + `_attachToolResult`. Subsumes the existing Phase 12 + subagent-only `_meta` write. + - `claudeSubagentSignals.ts` — same `_meta` on inner subagent + tool-call start / ready (currently bare). + - `sessionPermissions.ts` — copy `state._meta` into the emitted + `SessionToolCallReady` action so `_meta` survives the + pending_confirmation → ready transition. +- Snapshot test in `claudeToolDisplay.test.ts` covering every tool + row × `{ invocation, pastTenseSuccess, pastTenseFailure, toolKind, + language, inputString }`. +- Live + replay assertions extended in + `claudeMapSessionEvents.test.ts` and `claudeReplayMapper.test.ts`. + +**Out of scope** + +- New SDK tools beyond what `TOOL_ROWS` already covers — adding tools + is a one-row update, not a phase. +- Workbench-side renderers — terminal/search rendering already exists + and is driven by the `_meta.toolKind` field Copilot already sets. + Phase 8.5 only emits the meta; rendering is the workbench's job. +- Bash-tool file-edit tracking — owned by Phase 8 (already documented + as a known gap). +- Reworking the localization story — strings remain `nls.localize()` + with `{0}` placeholders. + +## Prerequisites + +- Phase 7 (tool calls + permission + user input) — landed. Provides + `claudeCanUseTool.ts`, `INTERACTIVE_CLAUDE_TOOLS`, and the + pending-confirmation state shape. +- Phase 8 (file edit tracking) — landed. Stable `TOOL_ROWS` table and + `isClaudeFileEditTool` helper. +- Phase 12 (subagents) — landed. Established the `_meta.toolKind: + 'subagent'` precedent that Phase 8.5 generalizes. +- Phase 13 (session restoration) — landed. Provides the replay path + (`claudeReplayMapper.ts`). + +## Approach + +Add `toolKind` and `language` columns to the existing `TOOL_ROWS` +table in `claudeToolDisplay.ts` (single source of truth), then add +five helper functions that mirror Copilot's `copilotToolDisplay.ts` +shape but read Claude SDK `tool_use.input` schemas defensively. A +single tiny `buildClaudeToolMeta(toolName)` helper produces the +`_meta` bag stamped at every tool-open seam — live mapper, permission +path, replay mapper, and inner subagent path — so all three drivers +emit identical signals. The `pastTenseMessage` helper is +success-aware (`tool_result.is_error`). Inputs flow into the live +mapper via accumulation of `input_json_delta` chunks (see Decision +D5). + +## Steps + +1. **Extend `TOOL_ROWS` and add helpers in `claudeToolDisplay.ts`.** + - Add `toolKind?: 'terminal' | 'subagent' | 'search'` and + `language?: string` columns to `ClaudeToolRow`. Populate: + - `Bash`, `BashOutput`, `KillBash` → `toolKind: 'terminal'`, + `language: 'bash'`. + - `Grep`, `Glob` → `toolKind: 'search'`. + - `Task` → `toolKind: 'subagent'`. + - All others → omit (keeps `undefined`). + - Export `getClaudeToolKind(name)`, `getClaudeShellLanguage(name)` + as one-liner table lookups. + - Export `getClaudeInvocationMessage(name, displayName, input)` + and `getClaudePastTenseMessage(name, displayName, input, + success)` returning `StringOrMarkdown`. Mirror Copilot's + branches in `copilotToolDisplay.ts` lines 473–760, keyed off + Claude tool names: `Bash` (`command`), `Read`/`Write`/`Edit`/ + `MultiEdit`/`NotebookRead`/`NotebookEdit` (`file_path` / + `notebook_path`), `Grep`/`Glob` (`pattern`), `WebFetch` + (`url`), `Task` (`description`), MCP (`server: tool`). + - Export `getClaudeToolInputString(name, input)` returning the + "code under the row" string: shell tools → `command`; others → + `JSON.stringify(input, null, 2)`. + - Export `buildClaudeToolMeta(name)` returning + `{ toolKind?, language? }` with undefined keys omitted. + - All accessors validate field types defensively (e.g. `typeof + command === 'string'`); on mismatch fall back to the static + display name. **No `as` casts that escape the type checker.** + - Files: [`claudeToolDisplay.ts`](./claudeToolDisplay.ts). + - Depends on: none. + - Done when: snapshot test (Step 6) lists every tool row × all + five helper outputs. + +2. **Wire `claudeCanUseTool.ts` (live permission path).** + - In `dispatchCanUseTool` (lines 105–135), replace + `invocationMessage: displayName` with + `getClaudeInvocationMessage(toolName, displayName, input)`. + - Replace the raw stringify of `input` (used for `toolInput`) + with `getClaudeToolInputString(toolName, input) ?? + `. + - Add `_meta: buildClaudeToolMeta(toolName)` on the + `ToolCallPendingConfirmationState`. + - Files: [`claudeCanUseTool.ts`](./claudeCanUseTool.ts). + - Depends on: Step 1. + - Done when: Bash permission card carries the actual command + (verified by integration test added in Step 7). + +3. **Patch `sessionPermissions.ts` to forward `_meta` into Ready.** + - In `createToolReadyAction` (lines ~175–201), copy `state._meta` + onto the emitted `SessionToolCallReady` action so the + `pending_confirmation → ready` transition preserves + `toolKind` / `language`. + - This file is **outside** `node/claude/`; the change is + agnostic and will help any future agent that puts `_meta` on a + pending-confirmation state. + - Files: + [`sessionPermissions.ts`](../sessionPermissions.ts). + - Depends on: Step 1 (so the type of `_meta` is settled). + - Done when: a permission card flipped to ready keeps its + `_meta` (asserted in + [`sessionPermissions.test.ts`](../../test/node/sessionPermissions.test.ts) + if a fixture exists; otherwise add a focused test). + +4. **Wire `claudeMapSessionEvents.ts` (live event path) and extract + `ClaudeToolCallRegistry`.** + - Create new file + [`claudeToolCallRegistry.ts`](./claudeToolCallRegistry.ts) + containing the `ClaudeToolCallRegistry` class described in + Decision D5. It owns per-tool-call state for the live mapper + (`turnId`, `toolName`, delta buffer, parsed input, computed + `IClaudeToolStartInfo` bag). Public API: + `beginToolCall`, `appendInputDelta`, `completeToolCall`, + `lookupToolCall`, `forgetToolCall`, `drainOnTurnEnd`. + - Delete the bare `_toolCallTurnIds` / `_toolCallNames` / + `_activeToolBlocks` fields on `ClaudeMapperState` and migrate + their callers to the registry. The mapper stops holding + cross-message state directly. + - `ClaudeAgentSession` constructs and owns one + `ClaudeToolCallRegistry` per session (registered via + `_register`), and passes it into the mapper as a constructor + argument. + - In `mapStreamEvent`: + - `content_block_start` for `tool_use` (~line 416) → + `registry.beginToolCall(toolUseId, toolName, turnId)`. + - `input_json_delta` (~line 182's comment block describes the + block-index correlation) → + `registry.appendInputDelta(toolUseId, delta.partial_json)`. + - `content_block_stop` for the same block → + `const info = registry.completeToolCall(toolUseId)`, then + emit `SessionToolCallStart` with + `info.invocationMessage`, `info.toolInput`, + `_meta: buildClaudeToolMeta(info.toolName)`. + - In `mapUserMessage` for `tool_result` (~lines 277–291): + - `const info = registry.lookupToolCall(toolUseId)`. + - Emit `SessionToolCallComplete` with + `pastTenseMessage: getClaudePastTenseMessage(info.toolName, + info.displayName, info.parsedInput, !isError)`. No `_meta` + — the reducer carries `_meta` forward from Start (D6). + - Call `registry.forgetToolCall(toolUseId)` after emission. + - On `info === undefined` (defense-in-depth): fall back to + the existing static `${displayName} finished` and no + `_meta`; log a trace warning. + - In `markTurnComplete` (~lines 140–157) → delegate to + `registry.drainOnTurnEnd(turnId, logService)`. + - Files: + [`claudeToolCallRegistry.ts`](./claudeToolCallRegistry.ts) (new), + [`claudeMapSessionEvents.ts`](./claudeMapSessionEvents.ts), + [`claudeAgentSession.ts`](./claudeAgentSession.ts) (construct + and inject the registry). + - Depends on: Step 1. + - Done when: + - A live `Bash` turn emits `pastTenseMessage` containing the + command (verified in + [`claudeMapSessionEvents.test.ts`](../../test/node/claudeMapSessionEvents.test.ts)). + - `_meta.toolKind: 'terminal'` appears on both + `SessionToolCallStart` and `SessionToolCallComplete`. + - New file + `claudeToolCallRegistry.test.ts` covers begin/append/complete/ + lookup/forget/drain in isolation (mirrors + `claudeSubagentRegistry.test.ts`). + +5. **Wire `claudeReplayMapper.ts` (replay path).** + - In `_openToolUse` (lines ~257–281): + - Parse `block.input` (already a `Record`). + - Replace `invocationMessage: displayName` with + `getClaudeInvocationMessage(toolName, displayName, input)`. + - Replace the existing `_meta: { toolKind: 'subagent' }` + (set only when `SUBAGENT_TOOL_NAMES.has(toolName)`) with + `_meta: buildClaudeToolMeta(toolName)` — this subsumes the + Phase 12 path (since `getClaudeToolKind('Task') === + 'subagent'`). + - Stash the parsed input in `ReplayBuilder._toolCallInputs: + Map>`. + - In `_attachToolResult` (lines ~282–320): + - Use `getClaudePastTenseMessage(toolName, displayName, + stashedInput, !isError)` instead of + ``${displayName} finished``. + - The existing `..._meta` carry-forward keeps the new full + `_meta` intact. + - Verify `Agent` (legacy SDK alias for Task subagent spawning, + per `SUBAGENT_TOOL_NAMES` at lines ~131–137) is handled. Two + options: + - **Option A**: add `Agent` row to `TOOL_ROWS` with + `toolKind: 'subagent'` (keeps replay regressions away). + - **Option B**: leave `SUBAGENT_TOOL_NAMES` as the + replay-only fallback; document it. + **Decision D2**: Option A — single source of truth. + - Files: + [`claudeReplayMapper.ts`](./claudeReplayMapper.ts). + - Depends on: Step 1. + - Done when: a replayed `Bash` row in + [`claudeReplayMapper.test.ts`](../../test/node/claudeReplayMapper.test.ts) + shows the same rich invocation / pastTense / `_meta` as the + live row from Step 4. + +6. **Wire `claudeSubagentSignals.ts` (inner subagent path).** + - In `emitInnerAssistantSignals` (lines ~250–270), the inner + `tool_use` branch: + - Stamp `_meta: buildClaudeToolMeta(block.name)` on the + inner `SessionToolCallStart`. + - Replace `invocationMessage: displayName` on the synthetic + inner Ready with + `getClaudeInvocationMessage(block.name, displayName, + block.input)`. + - In `buildTopLevelSubagentReadyAction` (lines ~144–170), the + existing `_meta: { toolKind: 'subagent', ... }` already + produces the right kind. Re-express via + `{ ...buildClaudeToolMeta('Task'), + subagentDescription: ..., subagentAgentName: ... }` to use + the same helper (no behavioral change). + - Files: + [`claudeSubagentSignals.ts`](./claudeSubagentSignals.ts). + - Depends on: Step 1. + - Done when: an inner Glob/Read row inside a subagent renders + with `toolKind: 'search'` / no `toolKind` (verified in + [`claudeSubagentSignals.test.ts`](../../test/node/claudeSubagentSignals.test.ts)). + +7. **Snapshot + behavior tests.** + - Extend + [`claudeToolDisplay.test.ts`](../../test/node/claudeToolDisplay.test.ts) + with one snapshot test mapping every tool row to + `[invocationMessage, pastTenseSuccess, pastTenseFailure, + toolKind, language, inputString]` given a representative + input bag. Mirrors the existing + `[permissionKind, displayName]` snapshot at lines ~31–50. + - Add focused defense tests: malformed input falls back; MCP + tool gets `undefined` toolKind + JSON input string fallback. + - Add live assertions in + [`claudeMapSessionEvents.test.ts`](../../test/node/claudeMapSessionEvents.test.ts): + `Bash` start carries `_meta.toolKind: 'terminal'` / + `language: 'bash'`; complete carries success-aware past tense + containing the command. + - Add replay assertions in + [`claudeReplayMapper.test.ts`](../../test/node/claudeReplayMapper.test.ts) + for the same. + - Add Ready-action `_meta` forwarding test in + `sessionPermissions.test.ts` (or add the file if absent). + - Files: see above. + - Depends on: Steps 1–6. + - Done when: `npm run compile-check-ts-native` clean, + `valid-layers-check` clean, all agentHost tests green. + +## Files to Modify or Create + +| Path | Change | Notes | +|------|--------|-------| +| [`claudeToolDisplay.ts`](./claudeToolDisplay.ts) | modify | Add `toolKind`/`language` cols; add 5 helpers + `buildClaudeToolMeta`. | +| [`claudeCanUseTool.ts`](./claudeCanUseTool.ts) | modify | Use rich helpers in `dispatchCanUseTool`. | +| [`claudeMapSessionEvents.ts`](./claudeMapSessionEvents.ts) | modify | Migrate cross-message state to `ClaudeToolCallRegistry`; emit `_meta` on Start; success-aware past tense + `_meta` on Complete. | +| [`claudeToolCallRegistry.ts`](./claudeToolCallRegistry.ts) | create | New class owning per-tool-call state for the live mapper. | +| [`claudeAgentSession.ts`](./claudeAgentSession.ts) | modify | Construct one `ClaudeToolCallRegistry` per session; inject into the mapper. | +| [`claudeReplayMapper.ts`](./claudeReplayMapper.ts) | modify | Use rich helpers in `_openToolUse` / `_attachToolResult`; stash parsed input. | +| [`claudeSubagentSignals.ts`](./claudeSubagentSignals.ts) | modify | Same helpers on inner subagent path. | +| [`../sessionPermissions.ts`](../sessionPermissions.ts) | modify | Copy `state._meta` into emitted `SessionToolCallReady`. | +| [`../../test/node/claudeToolDisplay.test.ts`](../../test/node/claudeToolDisplay.test.ts) | modify | Per-tool composite snapshot covering all 5 helpers. | +| `../../test/node/claudeToolCallRegistry.test.ts` | create | Unit tests for begin/append/complete/lookup/forget/drain. | +| [`../../test/node/claudeMapSessionEvents.test.ts`](../../test/node/claudeMapSessionEvents.test.ts) | modify | Live `_meta` + rich past-tense assertions. | +| [`../../test/node/claudeReplayMapper.test.ts`](../../test/node/claudeReplayMapper.test.ts) | modify | Replay parity assertions. | +| `../../test/node/sessionPermissions.test.ts` | modify or create | Assert `_meta` survives pending → ready. | + +## Decisions + +- **D1 — `getClaudeToolKind` is a `TOOL_ROWS` column, not a switch.** + The table is already the single source of truth for per-tool data + (`isClaudeFileEditTool`, `INTERACTIVE_CLAUDE_TOOLS`, + `getClaudePermissionKind` all read it). Adding a switch would + diverge on new tools and the snapshot test wouldn't catch the + drift. + +- **D2 — Add `Agent` to `TOOL_ROWS` (subagent-spawning, same as + `Task`).** SDK ships **two** subagent-spawning tools: `Task` + (built-in subagents, `sdk.d.ts:95`) and `Agent` (custom + subagents, `sdk.d.ts:36`). Replay already recognises both via + the local `SUBAGENT_TOOL_NAMES` set in + [`claudeReplayMapper.ts:137`](./claudeReplayMapper.ts#L137); + Phase 13's plan documented this with SDK citations. Add an + `Agent` row to `TOOL_ROWS` with + `permissionKind: 'custom-tool'`, `toolKind: 'subagent'`, then + delete `SUBAGENT_TOOL_NAMES` and switch the replay call site at + [`claudeReplayMapper.ts:261`](./claudeReplayMapper.ts#L261) to + `getClaudeToolKind(name) === 'subagent'`. The live path picks + this up automatically because `getClaudeToolKind` is the single + source of truth (D1). + +- **D3 — Defensive `Record` access; no per-tool + exported types.** The SDK boundary is `Record` / + `unknown` (verified at `sdk.d.ts:146` and `sdk.d.ts:3059`). + Per-tool interfaces would be documentation-only and risk + encouraging unsafe `as` casts. Helpers narrow each field with + `typeof` checks and fall back on mismatch. Local-only readability + interfaces are allowed inside `claudeToolDisplay.ts` (not + exported) if they make the helper bodies easier to read. + +- **D4 — `pastTenseMessage` is success-aware.** Signature is + `(name, displayName, input, success)` matching Copilot. Source of + the success bit: `tool_result.is_error === true` → + `success = false`. Already computed at + `claudeMapSessionEvents.ts:267` and `claudeReplayMapper.ts:295`. + +- **D5 — Live mapper mirrors Copilot's "stash on start, reuse on + complete" pattern (Option D), with state encapsulated in a small + class — not as bare maps on `ClaudeMapperState`.** CopilotAgent + computes the full `IToolStartInfo` at `tool.execution_start` (where + the Copilot SDK hands over complete args) and stashes it in + `builder.pendingTools: Map`; at + `tool.execution_complete` it looks the info back up and calls + `getPastTenseMessage(info.toolName, info.displayName, + info.parameters, success)` + ([`mapSessionEvents.ts:135–202`](../copilot/mapSessionEvents.ts#L135-L202), + [`mapSessionEvents.ts:622`](../copilot/mapSessionEvents.ts#L622)). + + Claude's analog: the SDK delivers tool input as `input_json_delta` + chunks across `content_block_start` → `delta*` → + `content_block_stop`. **`content_block_stop` for a `tool_use` block + IS Claude's "execution_start equivalent"** — by then the buffer is + complete and parseable. + + Introduce a new class `ClaudeToolCallRegistry` in + `claudeToolCallRegistry.ts` that owns all per-tool-call state for + the live mapper. It replaces today's two bare maps + (`_toolCallTurnIds`, `_toolCallNames`) and adds the new + input-accumulation state. Public API: + + - `beginToolCall(toolUseId, toolName, turnId)` — called from + `content_block_start`. Allocates the delta buffer. + - `appendInputDelta(toolUseId, partialJson)` — called from + `input_json_delta`. + - `completeToolCall(toolUseId) → IClaudeToolStartInfo` — called + from `content_block_stop`. Parses the buffer (try/catch), runs + the display helpers once, stashes the resulting info bag, + drops the buffer. Returns the bag so the mapper can use it on + the matching Start emission. + - `lookupToolCall(toolUseId) → IClaudeToolStartInfo | undefined` + — called from `mapUserMessage` when `tool_result` lands. + Returns the stashed info for past-tense / `_meta` use. + - `forgetToolCall(toolUseId)` — called after `tool_result` + consumes the entry, to bound memory. + - `drainOnTurnEnd(turnId, logService)` — called from + `markTurnComplete`, replaces today's inline cleanup at + `claudeMapSessionEvents.ts:140–157`, warns on orphan + `tool_use` blocks. + + The registry is owned by `ClaudeAgentSession` (one per session, + registered via `_register` so disposal flows through the + session). The mapper takes it as a constructor argument instead + of carrying state itself, so the mapper stays a pure + event-translation function and the registry can be unit-tested + in isolation (mirrors how Phase 12 split `SubagentRegistry` out + of the mapper). + + Replay is unaffected — `block.input` is already complete in the + replay path. `ClaudeReplayMapper` keeps its own + `Map>` stash internally; it + does not share the live registry because the lifetimes and + failure modes are entirely different (one-shot reconstruction + vs. streaming). + +- **D6 — `_meta` write asymmetry between live and replay is by + design.** Both paths end up producing the same final reducer + state; they differ only in the event shape that gets there. + + **Live (streaming, two-action shape):** the SDK emits + `content_block_start` → `input_json_delta*` → `content_block_stop` + → (user approves) → `tool_result` over time. The host translates + into two actions: `SessionToolCallStart` (stamped with + `_meta: buildClaudeToolMeta(toolName)` once) and + `SessionToolCallComplete` (no `_meta`). The reducer at + [`state/protocol/reducers.ts:30, 406`](../../common/state/protocol/reducers.ts) + carries `_meta` forward when merging Complete into the existing + per-`toolCallId` state. Mirrors Copilot's single-write at + [`mapSessionEvents.ts:197`](../copilot/mapSessionEvents.ts#L197). + + **Replay (flattened, single-action shape):** there is no event + stream — `claudeReplayMapper.ts` walks the on-disk transcript + and synthesizes one terminal `ToolCallState` per `(tool_use, + tool_result)` pair (per CONTEXT M7 "no live lifecycle states"). + `_meta` is stamped on that single action because there is no + prior state for the reducer to carry it from. + + **Permission cards (Step 3):** the one place this falls down + today is `sessionPermissions.createToolReadyAction` at + [`sessionPermissions.ts:175–201`](../sessionPermissions.ts#L175-L201), + which does NOT copy `_meta` from the pending-confirmation state + into the emitted `SessionToolCallReady`. Step 3 patches that + one site so the live single-write keeps working through the + `pending_confirmation → ready` transition. + + Future readers: do not "fix" the asymmetry by emitting `_meta` + on live Complete or by splitting replay into two actions. The + final reducer state is identical in both paths; only the event + shape differs. + +- **D10 — `_meta` stays flat (no per-kind namespacing).** Phase + 12's existing `_meta: { toolKind: 'subagent', + subagentDescription, subagentAgentName }` shape is the + precedent, and Copilot's `_meta` is also flat. New keys + (`language` for terminal, future keys for other kinds) sit as + optional siblings on the same object. Workbench readers + consume the keys they know about and ignore the rest. + Namespacing later (`_meta.terminal = { language }`) would + require migrating on-disk replay shape and a double-shape + workbench reader; the tidiness win does not justify either + cost. If `_meta` ever genuinely outgrows flat, it is its own + roadmap-level refactor. + +- **D9 — `getClaudeInvocationMessage('Task', …)` is the single + source of truth for the parent Task row's invocation message.** + The helper reads `input.description` (the model-authored + one-liner for the subagent task) and falls back to the static + display name. Phase 12's call site in + [`claudeSubagentSignals.ts:163`](./claudeSubagentSignals.ts#L163) + — currently `description ?? getClaudeToolDisplayName(block.name)` + — is replaced with a plain + `getClaudeInvocationMessage('Task', displayName, block.input)` + call. One code path; the snapshot test covers the + description-present and description-absent shapes. + +- **D7 — MCP tools get `toolKind: undefined`.** Mirrors Copilot's + behavior (`copilotToolDisplay.ts:807` returns one of the three + kinds or undefined). MCP input string defaults to JSON + pretty-print; invocation message strips the `mcp__` prefix and + shows `server: tool`. + +- **D8 — Snapshot test is one big per-tool table.** One + `assert.deepStrictEqual` mapping every tool name to a tuple of + all five helper outputs. Mirrors the existing snapshot pattern + at `claudeToolDisplay.test.ts:31–50`. New tools force one row + edit + one snapshot update — easy to review. + +## Risks + +- **R1 — `_meta` lost in pending → ready transition.** Mitigation: + Step 3 patches `sessionPermissions.ts`; covered by a focused + test. Without this fix, terminal/search rendering on permission + cards would silently break even though everything else looks + right. + +- **R2 — Live input accumulation race.** `input_json_delta` chunks + may arrive out of order with the canonical `assistant` envelope + for fast tools. Mitigation: D5's "first writer wins" rule plus + `try/catch` fallback to the static past tense. + +- **R3 — `StringOrMarkdown` downstream serialization.** + `invocationMessage` accepts `StringOrMarkdown`, but if a path + stringifies it expecting plain `string` we'd render + `[object Object]`. Mitigation: audit + `stateToProgressAdapter.ts` and `sessionPermissions.ts` arms + during Step 3; the reducer types should already enforce the + union correctly. + +- **R4 — Backwards compatibility on replayed sessions.** Existing + on-disk transcripts have `_meta: undefined` or `_meta: { toolKind: + 'subagent' }`. After this change they'll get richer `_meta`. This + is purely additive (no fields removed, no semantics changed) so + the workbench should accept it. Verified in Step 7's replay + assertions. + +- **R5 — Localization.** All new user-visible strings (invocation + + past-tense markdown) must use `nls.localize()` with `{0}` + placeholders, not concatenation. Coding guidelines enforced; + Copilot's helper is the template. + +## Verification + +### Unit / Integration + +- Unit: snapshot test in + [`claudeToolDisplay.test.ts`](../../test/node/claudeToolDisplay.test.ts) + covers every tool row × all five helpers. Defense tests cover + malformed input + MCP fallback. +- Live: assertions in + [`claudeMapSessionEvents.test.ts`](../../test/node/claudeMapSessionEvents.test.ts) + for `_meta` on Start, success-aware past tense + `_meta` on + Complete. +- Replay: assertions in + [`claudeReplayMapper.test.ts`](../../test/node/claudeReplayMapper.test.ts) + for parity with the live path. +- Subagent inner: assertions in + [`claudeSubagentSignals.test.ts`](../../test/node/claudeSubagentSignals.test.ts) + for `_meta` on inner tool calls. +- Permission forwarding: assertion in + `sessionPermissions.test.ts` (add file if absent) that + `state._meta` is copied into `SessionToolCallReady`. +- Run with: `runTests` (preferred) or `scripts/test.sh --grep + "agentHost"`. +- TS validity: `npm run compile-check-ts-native` and + `npm run valid-layers-check`. + +### E2E + +Available skills in this workspace: + +- **launch** — Playwright/CDP-driven Code OSS automation. Used + for the Phase 12 live E2E captured in [`phase12-plan.md`](./phase12-plan.md) + (Step 14). +- **code-oss-logs** — find and read dev-build renderer / agentHost + logs. Used to assert no `[Claude]` warn-logs during the + scenario. + +**Scenario**: + +1. Use `launch` to start the Agents window + (`./scripts/code.sh --agents`), switch the agent picker to + **Claude**, and send: *"Run `git status` and tell me what's + dirty."* +2. Verify the **Bash permission card** renders in the terminal + style (monospace block, syntax-highlighted as `bash`) and + reads *Running `git status`* — **not** *Run shell command*. +3. Approve. Verify the row collapses to *Ran `git status`* + (success-aware past tense from D4). +4. Send: *"Search for `IClaudeAgentSession` in this repo."* Verify + the `Grep` row renders in the **search** style (workbench's + search renderer, not generic) and reads *Searching for + `IClaudeAgentSession`*. After completion the row reads + *Searched for `IClaudeAgentSession`*. +5. Use `code-oss-logs` to read the agentHost log; assert no + `[Claude]` warn-logs for the resolved tool calls. +6. Click an older Claude session in the sidebar that contains + shell + search tool calls. Verify the historical rows render + in the same terminal / search style as the live ones (replay + parity from Step 5). + +### Manual fallback + +If the launch skill cannot be used, the same scenario is +achievable by hand: open Agents window, send the same prompts, +inspect the chat UI, then `tail -f` the agentHost log under +`~/.config/Code - Insiders/logs/.../agentHost.log` (path varies +by OS; `code-oss-logs` resolves it). + +## Open Questions + +*(Pre-grilling. Will be resolved during the grill-with-docs +session and folded into Decisions above.)* + +- **Q1 — Replay completed-state inputs for past tense.** Replay's + `_attachToolResult` needs the parsed input that was set in + `_openToolUse`. The simplest path is a per-builder + `Map>`. Is that map's + lifetime correctly bounded by the builder, or does it need + explicit cleanup on unmatched orphans? (Owner: implementer, + resolve during Step 5.) +- **Q2 — Should `getClaudeShellLanguage` ever return non-`'bash'`?** + Today the only shell tools are `Bash` family. If/when the SDK + adds `Pwsh` etc., does `language` become a column on the row, or + derived? (Owner: roadmap, resolve when a non-bash shell tool + appears.) + +## References + +- Roadmap: [`./roadmap.md`](./roadmap.md) (Phase 8.5). +- Reference implementation: + [`copilotToolDisplay.ts`](../copilot/copilotToolDisplay.ts), + [`mapSessionEvents.ts`](../copilot/mapSessionEvents.ts). +- Phase 12 closeout (subagent `_meta` precedent): + [`phase12-plan.md`](./phase12-plan.md). +- Glossary: [`CONTEXT.md`](./CONTEXT.md). +- E2E skills used: `launch`, `code-oss-logs`. + +## Self-check + +- [x] Status set. +- [x] Every step has files, dependencies, and "done when". +- [x] Decisions captured. +- [x] Out of scope explicit. +- [x] Verification names concrete commands and skills. +- [x] E2E references real skills (`launch`, `code-oss-logs`). +- [x] An agent reading only this file can implement. From 3336f70c3a8d6f2bd0b93d2a919cf207601fc9a4 Mon Sep 17 00:00:00 2001 From: Tyler Leonhardt Date: Mon, 18 May 2026 14:56:53 -0700 Subject: [PATCH 3/4] Claude agent host: Phase 8.5 implementation (rich tool-call rendering) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Brings tool-call rendering parity with Copilot: - Rich invocation/past-tense messages per tool (Bash 'Running `git status`' → 'Ran `git status`', Read 'Reading [README.md](...)' → 'Read [README.md](...)', Grep/Glob with patterns, file links for path-bearing tools, subagent descriptions for Task/Agent). - `_meta.toolKind` ('terminal'/'search'/'subagent') stamped at the tool-open seam to drive the workbench's specialized renderers; reducer carries it forward through every state transition (D6 in plan). - New `ClaudeToolCallRegistry` encapsulates per-session tool-call attribution + input accumulation + computed start-info, mirroring Copilot's 'stash on start, reuse on complete' pattern. Critical bug fix bundled in: - Emit `SessionToolCallReady` at `content_block_stop` so auto-allowed tools (which the Claude SDK runs without invoking `canUseTool`) transition Streaming → Running and the subsequent `SessionToolCallComplete` is accepted by the reducer instead of dropped. Without this, every auto-allowed tool widget rendered empty after completion. D6 parity fix for inner subagent tools: - `claudeSubagentSignals` now calls `registry.seedParsedInput()` for inner tool_use blocks (which arrive pre-parsed on synthesized assistant messages rather than via input_json_delta), so the live tool_result handler emits rich past-tense text matching the replay path instead of falling back to '{displayName} finished'. Also fixes a pre-existing Phase 10 stub: - `ClaudeAgent.onClientToolCallComplete` is now a benign no-op. The AgentSideEffects autorun fires this hook for EVERY server-dispatched SessionToolCallComplete envelope (including normal SDK tool completions), so the previous `throw new Error('TODO: Phase 10')` corrupted every tool flow. Client (MCP) tool registration via `setClientTools` still throws since Phase 10 hasn't landed. Tests: - New: `claudeToolCallRegistry.test.ts` (lifecycle + seedParsedInput coverage). - Snapshot: `claudeToolDisplay.test.ts` covers every tool row × all helpers. - Mapper: `claudeMapSessionEvents.test.ts` Test 9.5 asserts the new content_block_stop Ready emission. - Subagent: `claudeSubagentSignals.test.ts` extended to assert rich past-tense for inner-tool completion (D6 parity). - Agent: `claudeAgent.test.ts` asserts onClientToolCallComplete is a no-op. Verified live in the Agents window: NEW Claude [Local] chat with 'Run `git status` and then read README.md' produces both tool widgets in the expanded thinking block with command + output visible. --- .../agentHost/node/claude/claudeAgent.ts | 7 +- .../agentHost/node/claude/claudeCanUseTool.ts | 10 +- .../node/claude/claudeMapSessionEvents.ts | 124 ++++++-- .../node/claude/claudeReplayMapper.ts | 42 +-- .../node/claude/claudeSubagentSignals.ts | 21 +- .../node/claude/claudeToolCallRegistry.ts | 191 +++++++++++ .../node/claude/claudeToolDisplay.ts | 301 +++++++++++++++++- .../agentHost/node/claude/phase8.5-plan.md | 245 ++++++++++++-- .../agentHost/node/sessionPermissions.ts | 2 + .../agentHost/test/node/claudeAgent.test.ts | 17 +- .../test/node/claudeMapSessionEvents.test.ts | 35 +- .../test/node/claudeSubagentSignals.test.ts | 9 + .../test/node/claudeToolCallRegistry.test.ts | 166 ++++++++++ .../test/node/claudeToolDisplay.test.ts | 128 ++++++++ 14 files changed, 1195 insertions(+), 103 deletions(-) create mode 100644 src/vs/platform/agentHost/node/claude/claudeToolCallRegistry.ts create mode 100644 src/vs/platform/agentHost/test/node/claudeToolCallRegistry.test.ts diff --git a/src/vs/platform/agentHost/node/claude/claudeAgent.ts b/src/vs/platform/agentHost/node/claude/claudeAgent.ts index 50209360d2a9a..32dfa1e8c8bf3 100644 --- a/src/vs/platform/agentHost/node/claude/claudeAgent.ts +++ b/src/vs/platform/agentHost/node/claude/claudeAgent.ts @@ -965,7 +965,12 @@ export class ClaudeAgent extends Disposable implements IAgent { } onClientToolCallComplete(_session: URI, _toolCallId: string, _result: ToolCallResult): void { - throw new Error('TODO: Phase 10'); + // Phase 10 — client (MCP) tool completion routing. Until then, every + // SDK-owned tool that completes also fires this hook via + // `AgentSideEffects` listening on `SessionToolCallComplete` envelopes, + // so the body must be a benign no-op rather than throw. Once client + // tools are registered via `setClientTools`, this should resolve the + // matching pending promise on the SDK session. } setClientCustomizations(_clientId: string, _customizations: CustomizationRef[], _progress?: (results: ISyncedCustomization[]) => void): Promise { diff --git a/src/vs/platform/agentHost/node/claude/claudeCanUseTool.ts b/src/vs/platform/agentHost/node/claude/claudeCanUseTool.ts index ee0c938b8e2a2..889f3c3f230d3 100644 --- a/src/vs/platform/agentHost/node/claude/claudeCanUseTool.ts +++ b/src/vs/platform/agentHost/node/claude/claudeCanUseTool.ts @@ -9,7 +9,7 @@ import { SessionInputResponseKind, ToolCallPendingConfirmationState, ToolCallSta import { IAgentConfigurationService } from '../agentConfigurationService.js'; import { ClaudeAgentSession } from './claudeAgentSession.js'; import { buildAskUserSessionInputQuestions, buildExitPlanModeConfirmationState, flattenAskUserAnswers, parseAskUserQuestionInput } from './claudeInteractiveTools.js'; -import { getClaudeConfirmationTitle, getClaudePermissionKind, getClaudeToolDisplayName, getClaudeToolPath, INTERACTIVE_CLAUDE_TOOLS } from './claudeToolDisplay.js'; +import { getClaudeConfirmationTitle, getClaudeInvocationMessage, getClaudePermissionKind, getClaudeToolDisplayName, getClaudeToolInputString, getClaudeToolPath, INTERACTIVE_CLAUDE_TOOLS, buildClaudeToolMeta } from './claudeToolDisplay.js'; /** * Dependencies for {@link handleCanUseTool}. Kept narrow: a session @@ -123,15 +123,17 @@ async function dispatchCanUseTool( const permissionKind = getClaudePermissionKind(toolName); const displayName = getClaudeToolDisplayName(toolName); const permissionPath = options.blockedPath ?? getClaudeToolPath(toolName, input); - const toolInputJson = JSON.stringify(input); + const toolInputString = getClaudeToolInputString(toolName, input) ?? JSON.stringify(input); + const meta = buildClaudeToolMeta(toolName); const state: ToolCallPendingConfirmationState = { status: ToolCallStatus.PendingConfirmation, toolCallId: options.toolUseID, toolName, displayName, - invocationMessage: displayName, - toolInput: toolInputJson, + invocationMessage: getClaudeInvocationMessage(toolName, displayName, input), + toolInput: toolInputString, confirmationTitle: getClaudeConfirmationTitle(toolName), + ...(meta ? { _meta: meta } : {}), }; const parentToolCallId = resolveSubagentParent(session, options); diff --git a/src/vs/platform/agentHost/node/claude/claudeMapSessionEvents.ts b/src/vs/platform/agentHost/node/claude/claudeMapSessionEvents.ts index 9a6bd713f88b4..771b0978eba54 100644 --- a/src/vs/platform/agentHost/node/claude/claudeMapSessionEvents.ts +++ b/src/vs/platform/agentHost/node/claude/claudeMapSessionEvents.ts @@ -11,7 +11,9 @@ import { ActionType } from '../../common/state/sessionActions.js'; import { ResponsePartKind, ToolResultContentType, type ToolResultContent, type ToolResultFileEditContent } from '../../common/state/sessionState.js'; import { buildTopLevelSubagentReadyAction, emitInnerAssistantSignals, mapSubagentSystemMessage, SUBAGENT_SPAWNING_TOOL_NAMES, tagWithParent } from './claudeSubagentSignals.js'; import type { SubagentRegistry } from './claudeSubagentRegistry.js'; -import { getClaudeToolDisplayName } from './claudeToolDisplay.js'; +import { buildClaudeToolMeta, getClaudePastTenseMessage, getClaudeToolDisplayName } from './claudeToolDisplay.js'; +import { ClaudeToolCallRegistry } from './claudeToolCallRegistry.js'; +import { ToolCallConfirmationReason, type StringOrMarkdown } from '../../common/state/protocol/state.js'; /** * Cross-call state for {@link mapSDKMessageToAgentSignals}. One instance @@ -43,8 +45,14 @@ import { getClaudeToolDisplayName } from './claudeToolDisplay.js'; */ export class ClaudeMapperState { private readonly _activeToolBlocks = new Map(); - private readonly _toolCallTurnIds = new Map(); - private readonly _toolCallNames = new Map(); + /** + * Phase 8.5 — cross-message tool-call attribution + input + * accumulation + computed start-info, encapsulated as its own + * collaborator class so it can be unit-tested independently. + * Public so mapper functions can call its lifecycle methods + * directly without forwarding through this class. + */ + readonly toolCalls = new ClaudeToolCallRegistry(); private _currentMessageId: string | undefined; /** @@ -79,8 +87,7 @@ export class ClaudeMapperState { */ startToolBlock(index: number, toolUseId: string, toolName: string, turnId: string): void { this._activeToolBlocks.set(index, { toolUseId, toolName }); - this._toolCallTurnIds.set(toolUseId, turnId); - this._toolCallNames.set(toolUseId, toolName); + this.toolCalls.begin(toolUseId, toolName, turnId); } getActiveToolBlock(index: number): { toolUseId: string; toolName: string } | undefined { @@ -91,24 +98,46 @@ export class ClaudeMapperState { this._activeToolBlocks.delete(index); } + /** + * Phase 8.5 — forward an `input_json_delta.partial_json` chunk + * to the registry. Resolves the index → `tool_use_id` mapping + * locally (the registry is keyed by id, not by index) and is a + * no-op when the index is unknown. + */ + appendToolBlockInputDelta(index: number, partialJson: string): void { + const tracked = this._activeToolBlocks.get(index); + if (!tracked) { + return; + } + this.toolCalls.appendInputDelta(tracked.toolUseId, partialJson); + } + + /** + * Phase 8.5 — forward the `content_block_stop` signal to the + * registry, which parses the buffer and stashes the computed + * start-info. + */ + finalizeToolBlock(index: number): void { + const tracked = this._activeToolBlocks.get(index); + if (!tracked) { + return; + } + this.toolCalls.finalize(tracked.toolUseId); + } + /** * Cross-message lookup for `tool_result` handling. Returns * `undefined` if the `tool_use_id` is unknown (defense-in-depth * against transport drift / replay). */ lookupToolCall(toolUseId: string): { turnId: string; toolName: string } | undefined { - const turnId = this._toolCallTurnIds.get(toolUseId); - const toolName = this._toolCallNames.get(toolUseId); - if (turnId === undefined || toolName === undefined) { - return undefined; - } - return { turnId, toolName }; + const entry = this.toolCalls.lookup(toolUseId); + return entry ? { turnId: entry.turnId, toolName: entry.toolName } : undefined; } /** Drain cross-message tracking once a `tool_result` is delivered. */ completeToolCall(toolUseId: string): void { - this._toolCallTurnIds.delete(toolUseId); - this._toolCallNames.delete(toolUseId); + this.toolCalls.complete(toolUseId); } /** @@ -147,15 +176,7 @@ export class ClaudeMapperState { * `registry.drainForegroundSpawns()` from {@link mapResult}. */ clearPendingToolCalls(logService: ILogService): void { - if (this._toolCallTurnIds.size === 0) { - return; - } - for (const [toolUseId, turnId] of this._toolCallTurnIds) { - const toolName = this._toolCallNames.get(toolUseId) ?? ''; - logService.warn(`[claudeMapSessionEvents] turn ${turnId} ended with pending tool_use ${toolUseId} (${toolName}); dropping cross-message state`); - } - this._toolCallTurnIds.clear(); - this._toolCallNames.clear(); + this.toolCalls.clearPending(logService); } } @@ -319,6 +340,10 @@ function mapUserMessage( if (fileEdit) { content.push(fileEdit); } + const info = state.toolCalls.lookup(block.tool_use_id)?.info; + const pastTenseMessage: StringOrMarkdown = info + ? getClaudePastTenseMessage(info.toolName, info.displayName, info.parsedInput, !isError) + : `${getClaudeToolDisplayName(tracked.toolName)} finished`; signals.push({ kind: 'action', session, @@ -329,7 +354,7 @@ function mapUserMessage( toolCallId: block.tool_use_id, result: { success: !isError, - pastTenseMessage: `${getClaudeToolDisplayName(tracked.toolName)} finished`, + pastTenseMessage, content: content.length > 0 ? content : undefined, }, }, @@ -494,6 +519,13 @@ function mapStreamEvent( } else { registry.noteInnerTool(block.id, parentToolUseId); } + // Phase 8.5 — `_meta.toolKind` drives the workbench's terminal / + // search / subagent renderers. Single write at the tool-open + // seam; the reducer carries `_meta` forward to all subsequent + // state transitions (D6). Subagent meta from Phase 12 is now + // produced by `buildClaudeToolMeta` because + // `getClaudeToolKind('Task') === 'subagent'`. + const meta = buildClaudeToolMeta(block.name); return [{ kind: 'action', session, @@ -504,13 +536,7 @@ function mapStreamEvent( toolCallId: block.id, toolName: block.name, displayName: getClaudeToolDisplayName(block.name), - // Phase 12 — `_meta.toolKind` is read by the workbench - // renderer (`isSubagentTool` in stateToProgressAdapter) - // to recognise this tool call as a subagent spawner - // and render it via `ChatSubagentContentPart` instead - // of the generic tool-call view. Without this the UI - // shows "Running [tool]" with no subagent details. - ...(isSubagentSpawn ? { _meta: { toolKind: 'subagent' } } : {}), + ...(meta ? { _meta: meta } : {}), }, }]; } @@ -550,6 +576,7 @@ function mapStreamEvent( logService.warn(`[claudeMapSessionEvents] input_json_delta for unknown content-block index ${event.index}`); return []; } + state.appendToolBlockInputDelta(event.index, event.delta.partial_json); return [{ kind: 'action', session, @@ -565,9 +592,42 @@ function mapStreamEvent( return []; } - case 'content_block_stop': + case 'content_block_stop': { + const tracked = state.getActiveToolBlock(event.index); + state.finalizeToolBlock(event.index); state.endToolBlock(event.index); - return []; + if (!tracked) { + return []; + } + // Phase 8.5 — emit `SessionToolCallReady` so the tool transitions + // out of `Streaming` even when the Claude SDK auto-allows without + // calling `canUseTool` (no `claudeCanUseTool` round-trip means + // `sessionPermissions` never emits Ready, the state stays in + // `Streaming`, and the subsequent `SessionToolCallComplete` is + // dropped by the reducer — leaving the tool widget empty). + // When `canUseTool` DOES fire, `sessionPermissions` emits a second + // Ready that re-transitions Running → PendingConfirmation as needed. + const entry = state.toolCalls.lookup(tracked.toolUseId); + const info = entry?.info; + if (!info) { + return []; + } + const meta = buildClaudeToolMeta(tracked.toolName); + return [{ + kind: 'action', + session, + action: { + type: ActionType.SessionToolCallReady, + session: sessionStr, + turnId, + toolCallId: tracked.toolUseId, + invocationMessage: info.invocationMessage, + ...(info.toolInput !== undefined ? { toolInput: info.toolInput } : {}), + confirmed: ToolCallConfirmationReason.NotNeeded, + ...(meta ? { _meta: meta } : {}), + }, + }]; + } case 'message_delta': case 'message_stop': diff --git a/src/vs/platform/agentHost/node/claude/claudeReplayMapper.ts b/src/vs/platform/agentHost/node/claude/claudeReplayMapper.ts index 5026d910aaaa5..4e2a625495dd0 100644 --- a/src/vs/platform/agentHost/node/claude/claudeReplayMapper.ts +++ b/src/vs/platform/agentHost/node/claude/claudeReplayMapper.ts @@ -21,7 +21,7 @@ import { type Turn, } from '../../common/state/protocol/state.js'; import { buildSubagentSessionUri } from '../../common/state/sessionState.js'; -import { getClaudeToolDisplayName } from './claudeToolDisplay.js'; +import { buildClaudeToolMeta, getClaudeInvocationMessage, getClaudePastTenseMessage, getClaudeToolDisplayName, getClaudeToolInputString } from './claudeToolDisplay.js'; /** * Phase 13 — replay mapper. Reduces a flat `SessionMessage[]` (the SDK's @@ -127,15 +127,6 @@ function parseSystemMessage(msg: SessionMessage): ParsedSessionMessage | undefin // #region Builder -/** - * Subagent-spawning tool names recognised by both `Task` (built-in, - * see [`sdk.d.ts:95`](node_modules/@anthropic-ai/claude-agent-sdk/sdk.d.ts)) - * and `Agent` (custom subagents, - * see [`sdk.d.ts:36`](node_modules/@anthropic-ai/claude-agent-sdk/sdk.d.ts)). - * The production extension matches both at `claudeMessageDispatch.ts:194`. - */ -const SUBAGENT_TOOL_NAMES: ReadonlySet = new Set(['Task', 'Agent']); - /** * Allowlist of `system` subtypes that survive replay as * {@link ResponsePartKind.SystemNotification} parts on the active turn. @@ -183,8 +174,17 @@ interface InProgressTurn { class ReplayBuilder { private readonly _turns: Turn[] = []; private _active: InProgressTurn | undefined; - /** Cross-turn: tool_use_id → turnId of the announcing turn. */ - private readonly _toolUseToTurnId = new Map(); + /** + * Cross-turn tool-use tracking. Keyed by `tool_use_id`: + * - `turnId` — the announcing turn (so a late `tool_result` in a + * later `user` envelope can attach back to the right turn per M7). + * - `parsedInput` — the original `tool_use.input`, looked up at + * `_attachToolResult` so the past-tense message can include the + * original parameters. Mirrors the live mapper's `_toolCallInfo` + * pattern but simpler (replay has the full input synchronously on + * the `tool_use` block). + */ + private readonly _toolUses = new Map | undefined }>(); constructor(private readonly _session: URI, private readonly _logService: ILogService) { } @@ -258,18 +258,19 @@ class ReplayBuilder { if (this._active === undefined) { return; } - const isSubagent = SUBAGENT_TOOL_NAMES.has(toolName); const displayName = getClaudeToolDisplayName(toolName); + const parsedInput = input !== null && typeof input === 'object' ? input as Record : undefined; + const meta = buildClaudeToolMeta(toolName); // Build a placeholder Cancelled state by default; replaced with Completed when the tool_result lands. const placeholder: ToolCallCancelledState = { status: ToolCallStatus.Cancelled, toolCallId: toolUseId, toolName, displayName, - invocationMessage: displayName, - toolInput: typeof input === 'string' ? input : input !== undefined ? safeStringify(input) : undefined, + invocationMessage: getClaudeInvocationMessage(toolName, displayName, parsedInput), + toolInput: parsedInput !== undefined ? getClaudeToolInputString(toolName, parsedInput) : (typeof input === 'string' ? input : input !== undefined ? safeStringify(input) : undefined), reason: ToolCallCancellationReason.Skipped, - ...(isSubagent ? { _meta: { toolKind: 'subagent' as const } } : {}), + ...(meta ? { _meta: meta } : {}), }; const part: ToolCallResponsePart = { kind: ResponsePartKind.ToolCall, @@ -278,15 +279,16 @@ class ReplayBuilder { this._active.responseParts.push(part); this._active.toolCallParts.set(toolUseId, part); this._active.pendingToolUseIds.add(toolUseId); - this._toolUseToTurnId.set(toolUseId, this._active.id); + this._toolUses.set(toolUseId, { turnId: this._active.id, parsedInput }); } private _attachToolResult(block: UserToolResultBlock): void { - const announcingTurnId = this._toolUseToTurnId.get(block.tool_use_id); - if (announcingTurnId === undefined) { + const entry = this._toolUses.get(block.tool_use_id); + if (entry === undefined) { this._logService.warn(`[claudeReplayMapper] tool_result for unknown tool_use_id ${block.tool_use_id}`); return; } + const announcingTurnId = entry.turnId; // Find the part — it lives on the announcing turn (which may be `_active` or one already pushed to `_turns`). const part = this._findToolCallPart(announcingTurnId, block.tool_use_id); if (part === undefined) { @@ -312,7 +314,7 @@ class ReplayBuilder { toolInput: previousState.status === ToolCallStatus.Streaming ? undefined : previousState.toolInput, confirmed: ToolCallConfirmationReason.NotNeeded, success: !isError, - pastTenseMessage: `${previousState.displayName} finished`, + pastTenseMessage: getClaudePastTenseMessage(previousState.toolName, previousState.displayName, entry.parsedInput, !isError), content: content.length > 0 ? content : undefined, ...(previousState._meta ? { _meta: previousState._meta } : {}), }; diff --git a/src/vs/platform/agentHost/node/claude/claudeSubagentSignals.ts b/src/vs/platform/agentHost/node/claude/claudeSubagentSignals.ts index 6d51f29f5d518..dc45412b3e6aa 100644 --- a/src/vs/platform/agentHost/node/claude/claudeSubagentSignals.ts +++ b/src/vs/platform/agentHost/node/claude/claudeSubagentSignals.ts @@ -10,7 +10,7 @@ import { ActionType } from '../../common/state/sessionActions.js'; import { ResponsePartKind, ToolCallConfirmationReason } from '../../common/state/sessionState.js'; import type { ClaudeMapperState } from './claudeMapSessionEvents.js'; import { SUBAGENT_TOOL_NAMES, type SubagentRegistry } from './claudeSubagentRegistry.js'; -import { getClaudeToolDisplayName } from './claudeToolDisplay.js'; +import { buildClaudeToolMeta, getClaudeInvocationMessage, getClaudeToolDisplayName, getClaudeToolInputString } from './claudeToolDisplay.js'; /** * Phase 12 — SDK tool names that spawn subagent sessions. Re-exported @@ -145,7 +145,7 @@ export function buildTopLevelSubagentReadyAction( const agentName = typeof input?.subagent_type === 'string' ? input.subagent_type : undefined; const inputJson = block.input !== undefined ? safeStringify(block.input) : undefined; registry.recordSpawn(block.id, { subagentType: agentName, description }); - const meta: Record = { toolKind: 'subagent' }; + const meta: Record = { ...(buildClaudeToolMeta(block.name) ?? { toolKind: 'subagent' }) }; if (description) { meta.subagentDescription = description; } @@ -160,7 +160,7 @@ export function buildTopLevelSubagentReadyAction( session: session.toString(), turnId, toolCallId: block.id, - invocationMessage: description ?? getClaudeToolDisplayName(block.name), + invocationMessage: getClaudeInvocationMessage(block.name, getClaudeToolDisplayName(block.name), block.input), ...(inputJson !== undefined ? { toolInput: inputJson } : {}), confirmed: ToolCallConfirmationReason.NotNeeded, _meta: meta, @@ -237,9 +237,17 @@ export function emitInnerAssistantSignals( } if (block.type === 'tool_use') { state.startToolBlock(index, block.id, block.name, turnId); + // Inner tool input arrives pre-parsed on the synthesized + // `assistant` message (not via `input_json_delta` chunks), so + // seed the registry directly. Without this the live + // `tool_result` handler falls back to a generic + // `"{displayName} finished"` past-tense and replay (which + // always computes rich text) drifts from live — violating D6. + state.toolCalls.seedParsedInput(block.id, block.input); registry.noteInnerTool(block.id, parentToolUseId); - const inputJson = block.input !== undefined ? safeStringify(block.input) : undefined; const displayName = getClaudeToolDisplayName(block.name); + const meta = buildClaudeToolMeta(block.name); + const toolInputStr = getClaudeToolInputString(block.name, block.input); signals.push({ kind: 'action', session, @@ -250,6 +258,7 @@ export function emitInnerAssistantSignals( toolCallId: block.id, toolName: block.name, displayName, + ...(meta ? { _meta: meta } : {}), }, }); signals.push({ @@ -260,8 +269,8 @@ export function emitInnerAssistantSignals( session: sessionStr, turnId, toolCallId: block.id, - invocationMessage: displayName, - ...(inputJson !== undefined ? { toolInput: inputJson } : {}), + invocationMessage: getClaudeInvocationMessage(block.name, displayName, block.input), + ...(toolInputStr !== undefined ? { toolInput: toolInputStr } : {}), confirmed: ToolCallConfirmationReason.NotNeeded, }, }); diff --git a/src/vs/platform/agentHost/node/claude/claudeToolCallRegistry.ts b/src/vs/platform/agentHost/node/claude/claudeToolCallRegistry.ts new file mode 100644 index 0000000000000..ad2a08766dcd1 --- /dev/null +++ b/src/vs/platform/agentHost/node/claude/claudeToolCallRegistry.ts @@ -0,0 +1,191 @@ +/*--------------------------------------------------------------------------------------------- + * Copyright (c) Microsoft Corporation. All rights reserved. + * Licensed under the MIT License. See License.txt in the project root for license information. + *--------------------------------------------------------------------------------------------*/ + +import type { ILogService } from '../../../log/common/log.js'; +import type { StringOrMarkdown } from '../../common/state/protocol/state.js'; +import { getClaudeInvocationMessage, getClaudeToolDisplayName, getClaudeToolInputString } from './claudeToolDisplay.js'; + +/** + * Phase 8.5 — per-tool-call info computed at `content_block_stop` and + * reused at `tool_result` time. Mirrors Copilot's `IToolStartInfo` + * shape: Copilot stashes it at `tool.execution_start` (where the + * Copilot SDK hands over complete args); Claude stashes it at the + * analogous `content_block_stop` (the first point where the + * `input_json_delta` buffer is complete and parseable). + */ +export interface IClaudeToolStartInfo { + readonly toolName: string; + readonly displayName: string; + readonly parsedInput: Record | undefined; + readonly invocationMessage: StringOrMarkdown; + readonly toolInput: string | undefined; +} + +interface IRegistryEntry { + readonly toolName: string; + readonly turnId: string; + inputBuffer: string; + info: IClaudeToolStartInfo | undefined; +} + +/** + * Phase 8.5 — per-session, cross-message tool-call tracking for the + * live mapper. Owns: + * + * - **Attribution** — `tool_use_id → { toolName, turnId }`. A + * `tool_use` lands in one assistant message; the matching + * `tool_result` arrives in a later synthetic `user` message. The + * registry resolves the originating turn so each Complete action + * lands on the correct turn. + * - **Input accumulation** — `input_json_delta` chunks arrive across + * `content_block_start` → `delta*` → `content_block_stop`. The + * registry concatenates them and parses once at `finalize`. + * - **Computed start-info** — `displayName`, rich `invocationMessage`, + * `toolInput` string, parsed input. Computed once at `finalize` + * and looked up at `tool_result` time so `pastTenseMessage` can + * include the original parameters. + * + * Mirror of Copilot's `pendingTools: Map` + * pattern in + * [`mapSessionEvents.ts`](../copilot/mapSessionEvents.ts) — only the + * seam differs (Copilot's SDK hands over complete args at + * `tool.execution_start`; Claude's SDK streams them in deltas, so the + * "ready" seam is `content_block_stop`). + * + * Encapsulated as a class with named lifecycle methods so the maps' + * mutators are not part of the public surface — Phase 6.1's lesson. + * One instance lives per `ClaudeAgentSession` and is composed by + * `ClaudeMapperState`; the mapper threads `state` (which exposes the + * registry as `state.toolCalls`) into every invocation. + */ +export class ClaudeToolCallRegistry { + private readonly _entries = new Map(); + + /** + * Begin tracking a tool call. Called from `content_block_start` + * for a `tool_use` block. Allocates the delta buffer; the + * computed info bag is filled in by {@link finalize}. + */ + begin(toolUseId: string, toolName: string, turnId: string): void { + this._entries.set(toolUseId, { + toolName, + turnId, + inputBuffer: '', + info: undefined, + }); + } + + /** + * Append one `input_json_delta.partial_json` chunk. No-op if the + * `tool_use_id` is unknown (the caller already logged a warning + * about the index mismatch). + */ + appendInputDelta(toolUseId: string, partialJson: string): void { + const entry = this._entries.get(toolUseId); + if (!entry) { + return; + } + entry.inputBuffer += partialJson; + } + + /** + * Parse the accumulated buffer and stash the computed + * {@link IClaudeToolStartInfo}. Called from `content_block_stop`. + * Parse failures fall back to `parsedInput: undefined`; the + * past-tense helper handles that by returning a generic message. + */ + finalize(toolUseId: string): void { + const entry = this._entries.get(toolUseId); + if (!entry) { + return; + } + let parsedInput: Record | undefined; + if (entry.inputBuffer.length > 0) { + try { + const parsed: unknown = JSON.parse(entry.inputBuffer); + if (parsed !== null && typeof parsed === 'object') { + parsedInput = parsed as Record; + } + } catch { + // Malformed JSON — fall through with `parsedInput: undefined`. + } + } + this._writeInfo(entry, parsedInput); + // Buffer is no longer needed once parsed. + entry.inputBuffer = ''; + } + + /** + * Seed {@link IClaudeToolStartInfo} directly from a pre-parsed + * input object. Used for inner subagent tool uses, which arrive + * already-parsed on the synthesized `assistant` message rather + * than via streamed `input_json_delta` chunks. Without this the + * registry entry's `info` would stay `undefined` and the live + * `tool_result` handler would emit the generic + * `"{displayName} finished"` past-tense, violating D6 (live/replay + * parity). + */ + seedParsedInput(toolUseId: string, parsedInput: unknown): void { + const entry = this._entries.get(toolUseId); + if (!entry) { + return; + } + const normalized = (parsedInput !== null && typeof parsedInput === 'object') + ? parsedInput as Record + : undefined; + this._writeInfo(entry, normalized); + } + + private _writeInfo(entry: IRegistryEntry, parsedInput: Record | undefined): void { + const displayName = getClaudeToolDisplayName(entry.toolName); + entry.info = { + toolName: entry.toolName, + displayName, + parsedInput, + invocationMessage: getClaudeInvocationMessage(entry.toolName, displayName, parsedInput), + toolInput: getClaudeToolInputString(entry.toolName, parsedInput), + }; + } + + /** + * Cross-message lookup. Returns `undefined` if the + * `tool_use_id` is unknown (defense-in-depth against transport + * drift / replay). The `info` field may be `undefined` if the + * tool block never reached `content_block_stop`. + */ + lookup(toolUseId: string): { readonly turnId: string; readonly toolName: string; readonly info: IClaudeToolStartInfo | undefined } | undefined { + const entry = this._entries.get(toolUseId); + if (!entry) { + return undefined; + } + return { turnId: entry.turnId, toolName: entry.toolName, info: entry.info }; + } + + /** + * Drop the entry once the matching `tool_result` has been + * delivered. Bounds the registry's memory across long turns. + */ + complete(toolUseId: string): void { + this._entries.delete(toolUseId); + } + + /** + * Drop any tracking still pending at the end of a turn and warn + * once per orphan. A `tool_use` whose `tool_result` never arrives + * — model misbehavior, transport drop, future cancellation — + * would otherwise survive in the maps for the lifetime of the + * session and accumulate across turns. Called from `mapResult` + * on every `result` envelope. + */ + clearPending(logService: ILogService): void { + if (this._entries.size === 0) { + return; + } + for (const [toolUseId, entry] of this._entries) { + logService.warn(`[claudeToolCallRegistry] turn ${entry.turnId} ended with pending tool_use ${toolUseId} (${entry.toolName}); dropping cross-message state`); + } + this._entries.clear(); + } +} diff --git a/src/vs/platform/agentHost/node/claude/claudeToolDisplay.ts b/src/vs/platform/agentHost/node/claude/claudeToolDisplay.ts index c49edd108e98e..4ab0b61cd12d8 100644 --- a/src/vs/platform/agentHost/node/claude/claudeToolDisplay.ts +++ b/src/vs/platform/agentHost/node/claude/claudeToolDisplay.ts @@ -4,6 +4,11 @@ *--------------------------------------------------------------------------------------------*/ import { localize } from '../../../../nls.js'; +import { appendEscapedMarkdownInlineCode } from '../../../../base/common/htmlContent.js'; +import { basename } from '../../../../base/common/resources.js'; +import { truncate } from '../../../../base/common/strings.js'; +import { URI } from '../../../../base/common/uri.js'; +import type { StringOrMarkdown } from '../../common/state/protocol/state.js'; /** * Phase 7 S4 — pure tool-name → display/permission helpers for Claude. @@ -34,6 +39,15 @@ export type ClaudePermissionKind = | 'url' | 'custom-tool'; +/** + * Phase 8.5 — rendering hint for the workbench. Drives terminal / + * search / subagent renderers (the workbench picks a renderer off + * `_meta.toolKind`; unknown values fall through to the generic tool + * renderer). Mirror of + * [`copilotToolDisplay.getToolKind`](../copilot/copilotToolDisplay.ts). + */ +export type ClaudeToolKind = 'terminal' | 'subagent' | 'search'; + /** * Which field on the SDK's `tool_input` carries the path/url surfaced * to the user (and tracked by Phase 8 for file-edit tools). One field @@ -66,18 +80,28 @@ interface ClaudeToolRow { * {@link INTERACTIVE_CLAUDE_TOOLS}. */ readonly interactive?: true; + /** + * Phase 8.5 — rendering hint for the workbench (drives the + * terminal / search / subagent renderers). Omit for tools that + * render in the generic tool renderer (read, write, MCP, …). + */ + readonly toolKind?: ClaudeToolKind; } const TOOL_ROWS: { readonly [toolName: string]: ClaudeToolRow } = { - // shell tools - Bash: { permissionKind: 'shell' }, - BashOutput: { permissionKind: 'shell' }, - KillBash: { permissionKind: 'shell' }, + // shell tools — no `language` is carried: the workbench picks + // `'shellscript'` from the tool name (it only special-cases + // `'powershell'`), and the SDK's `Bash` tool is the generic shell + // entry point (bash on POSIX, Git Bash on Windows), so claiming a + // specific dialect here would be misleading and unused. + Bash: { permissionKind: 'shell', toolKind: 'terminal' }, + BashOutput: { permissionKind: 'shell', toolKind: 'terminal' }, + KillBash: { permissionKind: 'shell', toolKind: 'terminal' }, // read tools Read: { permissionKind: 'read', pathField: 'file_path' }, - Glob: { permissionKind: 'read', pathField: 'path' }, - Grep: { permissionKind: 'read', pathField: 'path' }, + Glob: { permissionKind: 'read', pathField: 'path', toolKind: 'search' }, + Grep: { permissionKind: 'read', pathField: 'path', toolKind: 'search' }, LS: { permissionKind: 'read', pathField: 'path' }, NotebookRead: { permissionKind: 'read', pathField: 'notebook_path' }, @@ -92,7 +116,8 @@ const TOOL_ROWS: { readonly [toolName: string]: ClaudeToolRow } = { WebFetch: { permissionKind: 'url', pathField: 'url' }, // host-routed / custom - Task: { permissionKind: 'custom-tool' }, + Task: { permissionKind: 'custom-tool', toolKind: 'subagent' }, + Agent: { permissionKind: 'custom-tool', toolKind: 'subagent' }, ExitPlanMode: { permissionKind: 'custom-tool', interactive: true }, AskUserQuestion: { permissionKind: 'custom-tool', interactive: true }, }; @@ -136,7 +161,8 @@ export function getClaudeToolDisplayName(toolName: string): string { case 'NotebookEdit': return localize('claude.tool.notebookEdit', "Edit notebook"); case 'TodoWrite': return localize('claude.tool.todoWrite', "Update todo list"); case 'WebFetch': return localize('claude.tool.webFetch', "Fetch URL"); - case 'Task': return localize('claude.tool.task', "Run subagent task"); + case 'Task': + case 'Agent': return localize('claude.tool.task', "Run subagent task"); case 'ExitPlanMode': return localize('claude.tool.exitPlanMode', "Ready to code?"); case 'AskUserQuestion': return localize('claude.tool.askUserQuestion', "Ask user a question"); } @@ -227,3 +253,262 @@ export function getClaudeConfirmationTitle(toolName: string): string { return localize('claude.permission.default.title', "Allow tool call?"); } } + +// #region Phase 8.5 — rich tool-call rendering helpers + +/** + * Phase 8.5 — workbench rendering hint. One-liner over `TOOL_ROWS`. + * Returns `'terminal'` for shell tools (drives the terminal renderer), + * `'search'` for `Grep` / `Glob` (drives the search renderer), + * `'subagent'` for `Task` / `Agent` (drives the subagent renderer), + * `undefined` for everything else (generic tool renderer). + */ +export function getClaudeToolKind(toolName: string): ClaudeToolKind | undefined { + return TOOL_ROWS[toolName]?.toolKind; +} + +/** + * Phase 8.5 — build the `_meta` bag stamped at the tool-open seam. + * Returns `undefined` for tools that have no `toolKind` hint so the + * resulting envelope stays minimal (a `Read` row gets no `_meta` at + * all). Mirrors Copilot's + * [`mapSessionEvents.ts:197`](../copilot/mapSessionEvents.ts#L197) + * single-write pattern. + */ +export function buildClaudeToolMeta(toolName: string): Record | undefined { + const row = TOOL_ROWS[toolName]; + if (!row?.toolKind) { + return undefined; + } + return { toolKind: row.toolKind }; +} + +function md(value: string): StringOrMarkdown { + return { markdown: value }; +} + +function formatPathAsMarkdownLink(path: string): string { + const uri = URI.file(path); + return `[${basename(uri)}](${uri})`; +} + +/** + * Defensive string-field access. Returns the field value when it is + * a non-empty string, otherwise `undefined`. + */ +function readStringField(input: unknown, field: string): string | undefined { + if (input === null || typeof input !== 'object') { + return undefined; + } + const value = (input as Record)[field]; + return typeof value === 'string' && value.length > 0 ? value : undefined; +} + +/** + * Phase 8.5 — first-line command extractor for shell tools. Mirrors + * Copilot's `command.split('\n')[0]` pattern. + */ +function firstShellLine(input: unknown): string | undefined { + const command = readStringField(input, 'command'); + return command ? command.split('\n')[0] : undefined; +} + +/** + * Phase 8.5 — rich invocation message for a `pending_confirmation` + * card or a streaming `SessionToolCallStart` action. Reads the + * SDK's `tool_use.input` defensively and falls back to the static + * `displayName` on any shape mismatch. Mirror of + * [`copilotToolDisplay.getInvocationMessage`](../copilot/copilotToolDisplay.ts#L473). + */ +export function getClaudeInvocationMessage( + toolName: string, + displayName: string, + input: unknown, +): StringOrMarkdown { + switch (toolName) { + case 'Bash': { + const firstLine = firstShellLine(input); + if (firstLine) { + return md(localize('claude.toolInvoke.bashCmd', "Running {0}", appendEscapedMarkdownInlineCode(truncate(firstLine, 80)))); + } + return localize('claude.toolInvoke.bash', "Running shell command"); + } + case 'BashOutput': + return localize('claude.toolInvoke.bashOutput', "Reading shell output"); + case 'KillBash': + return localize('claude.toolInvoke.killBash', "Killing shell command"); + case 'Read': + case 'NotebookRead': { + const path = getClaudeToolPath(toolName, input); + if (path) { + return md(localize('claude.toolInvoke.readFile', "Reading {0}", formatPathAsMarkdownLink(path))); + } + return localize('claude.toolInvoke.read', "Reading file"); + } + case 'LS': { + const path = getClaudeToolPath(toolName, input); + if (path) { + return md(localize('claude.toolInvoke.lsPath', "Listing {0}", formatPathAsMarkdownLink(path))); + } + return localize('claude.toolInvoke.ls', "Listing directory"); + } + case 'Write': + case 'Edit': + case 'MultiEdit': + case 'NotebookEdit': { + const path = getClaudeToolPath(toolName, input); + if (path) { + return md(localize('claude.toolInvoke.editFile', "Editing {0}", formatPathAsMarkdownLink(path))); + } + return localize('claude.toolInvoke.edit', "Editing file"); + } + case 'TodoWrite': + return localize('claude.toolInvoke.todoWrite', "Updating todo list"); + case 'Grep': { + const pattern = readStringField(input, 'pattern'); + if (pattern) { + return md(localize('claude.toolInvoke.grepPattern', "Searching for {0}", appendEscapedMarkdownInlineCode(truncate(pattern, 80)))); + } + return localize('claude.toolInvoke.grep', "Searching files"); + } + case 'Glob': { + const pattern = readStringField(input, 'pattern'); + if (pattern) { + return md(localize('claude.toolInvoke.globPattern', "Finding files matching {0}", appendEscapedMarkdownInlineCode(truncate(pattern, 80)))); + } + return localize('claude.toolInvoke.glob', "Finding files"); + } + case 'WebFetch': { + const url = readStringField(input, 'url'); + if (url) { + return md(localize('claude.toolInvoke.webFetch', "Fetching {0}", `[${truncate(url, 80)}](${url})`)); + } + return localize('claude.toolInvoke.webFetchGeneric', "Fetching URL"); + } + case 'Task': + case 'Agent': { + const description = readStringField(input, 'description'); + if (description) { + return description; + } + return displayName; + } + default: + return displayName; + } +} + +/** + * Phase 8.5 — success-aware rich past-tense message. Mirror of + * [`copilotToolDisplay.getPastTenseMessage`](../copilot/copilotToolDisplay.ts#L572). + * Failure path returns a generic "failed" message; success path + * mirrors the {@link getClaudeInvocationMessage} structure with + * past-tense verbs. + */ +export function getClaudePastTenseMessage( + toolName: string, + displayName: string, + input: unknown, + success: boolean, +): StringOrMarkdown { + if (!success) { + return localize('claude.toolComplete.failed', "\"{0}\" failed", displayName); + } + switch (toolName) { + case 'Bash': { + const firstLine = firstShellLine(input); + if (firstLine) { + return md(localize('claude.toolComplete.bashCmd', "Ran {0}", appendEscapedMarkdownInlineCode(truncate(firstLine, 80)))); + } + return localize('claude.toolComplete.bash', "Ran shell command"); + } + case 'BashOutput': + return localize('claude.toolComplete.bashOutput', "Read shell output"); + case 'KillBash': + return localize('claude.toolComplete.killBash', "Killed shell command"); + case 'Read': + case 'NotebookRead': { + const path = getClaudeToolPath(toolName, input); + if (path) { + return md(localize('claude.toolComplete.readFile', "Read {0}", formatPathAsMarkdownLink(path))); + } + return localize('claude.toolComplete.read', "Read file"); + } + case 'LS': { + const path = getClaudeToolPath(toolName, input); + if (path) { + return md(localize('claude.toolComplete.lsPath', "Listed {0}", formatPathAsMarkdownLink(path))); + } + return localize('claude.toolComplete.ls', "Listed directory"); + } + case 'Write': + case 'Edit': + case 'MultiEdit': + case 'NotebookEdit': { + const path = getClaudeToolPath(toolName, input); + if (path) { + return md(localize('claude.toolComplete.editFile', "Edited {0}", formatPathAsMarkdownLink(path))); + } + return localize('claude.toolComplete.edit', "Edited file"); + } + case 'TodoWrite': + return localize('claude.toolComplete.todoWrite', "Updated todo list"); + case 'Grep': { + const pattern = readStringField(input, 'pattern'); + if (pattern) { + return md(localize('claude.toolComplete.grepPattern', "Searched for {0}", appendEscapedMarkdownInlineCode(truncate(pattern, 80)))); + } + return localize('claude.toolComplete.grep', "Searched files"); + } + case 'Glob': { + const pattern = readStringField(input, 'pattern'); + if (pattern) { + return md(localize('claude.toolComplete.globPattern', "Found files matching {0}", appendEscapedMarkdownInlineCode(truncate(pattern, 80)))); + } + return localize('claude.toolComplete.glob', "Found files"); + } + case 'WebFetch': { + const url = readStringField(input, 'url'); + if (url) { + return md(localize('claude.toolComplete.webFetch', "Fetched {0}", `[${truncate(url, 80)}](${url})`)); + } + return localize('claude.toolComplete.webFetchGeneric', "Fetched URL"); + } + case 'Task': + case 'Agent': + return localize('claude.toolComplete.task', "Ran subagent"); + default: + return localize('claude.toolComplete.generic', "Used \"{0}\"", displayName); + } +} + +/** + * Phase 8.5 — canonical "input as code" string rendered under the + * tool-call row. Shell tools surface the raw `command`; search tools + * surface the `pattern`; everything else falls back to pretty-printed + * JSON. Returns `undefined` only when the input is itself absent. + */ +export function getClaudeToolInputString(toolName: string, input: unknown): string | undefined { + if (input === undefined) { + return undefined; + } + if (toolName === 'Bash' || toolName === 'BashOutput' || toolName === 'KillBash') { + const command = readStringField(input, 'command'); + if (command) { + return command; + } + } + if (toolName === 'Grep' || toolName === 'Glob') { + const pattern = readStringField(input, 'pattern'); + if (pattern) { + return pattern; + } + } + try { + return JSON.stringify(input, null, 2); + } catch { + return undefined; + } +} + +// #endregion diff --git a/src/vs/platform/agentHost/node/claude/phase8.5-plan.md b/src/vs/platform/agentHost/node/claude/phase8.5-plan.md index fd55cd3b59750..7061f51e24e53 100644 --- a/src/vs/platform/agentHost/node/claude/phase8.5-plan.md +++ b/src/vs/platform/agentHost/node/claude/phase8.5-plan.md @@ -4,7 +4,7 @@ > Last updated: 2026-05-15 after 3-model council (GPT-5.5, Claude Opus 4.6, > GPT-5.3-Codex). Pre-grilling draft. -**Status:** ready (pending grill-with-docs pass) +**Status:** implemented (E2E partially validated live; rest covered by unit tests) ## Goal @@ -89,7 +89,7 @@ D5). ## Steps -1. **Extend `TOOL_ROWS` and add helpers in `claudeToolDisplay.ts`.** +1. ✓ **Extend `TOOL_ROWS` and add helpers in `claudeToolDisplay.ts`.** - Add `toolKind?: 'terminal' | 'subagent' | 'search'` and `language?: string` columns to `ClaudeToolRow`. Populate: - `Bash`, `BashOutput`, `KillBash` → `toolKind: 'terminal'`, @@ -120,7 +120,7 @@ D5). - Done when: snapshot test (Step 6) lists every tool row × all five helper outputs. -2. **Wire `claudeCanUseTool.ts` (live permission path).** +2. ✓ **Wire `claudeCanUseTool.ts` (live permission path).** - In `dispatchCanUseTool` (lines 105–135), replace `invocationMessage: displayName` with `getClaudeInvocationMessage(toolName, displayName, input)`. @@ -134,7 +134,7 @@ D5). - Done when: Bash permission card carries the actual command (verified by integration test added in Step 7). -3. **Patch `sessionPermissions.ts` to forward `_meta` into Ready.** +3. ✓ **Patch `sessionPermissions.ts` to forward `_meta` into Ready.** - In `createToolReadyAction` (lines ~175–201), copy `state._meta` onto the emitted `SessionToolCallReady` action so the `pending_confirmation → ready` transition preserves @@ -150,7 +150,7 @@ D5). [`sessionPermissions.test.ts`](../../test/node/sessionPermissions.test.ts) if a fixture exists; otherwise add a focused test). -4. **Wire `claudeMapSessionEvents.ts` (live event path) and extract +4. ✓ **Wire `claudeMapSessionEvents.ts` (live event path) and extract `ClaudeToolCallRegistry`.** - Create new file [`claudeToolCallRegistry.ts`](./claudeToolCallRegistry.ts) @@ -208,7 +208,7 @@ D5). lookup/forget/drain in isolation (mirrors `claudeSubagentRegistry.test.ts`). -5. **Wire `claudeReplayMapper.ts` (replay path).** +5. ✓ **Wire `claudeReplayMapper.ts` (replay path).** - In `_openToolUse` (lines ~257–281): - Parse `block.input` (already a `Record`). - Replace `invocationMessage: displayName` with @@ -242,7 +242,7 @@ D5). shows the same rich invocation / pastTense / `_meta` as the live row from Step 4. -6. **Wire `claudeSubagentSignals.ts` (inner subagent path).** +6. ✓ **Wire `claudeSubagentSignals.ts` (inner subagent path).** - In `emitInnerAssistantSignals` (lines ~250–270), the inner `tool_use` branch: - Stamp `_meta: buildClaudeToolMeta(block.name)` on the @@ -264,7 +264,7 @@ D5). with `toolKind: 'search'` / no `toolKind` (verified in [`claudeSubagentSignals.test.ts`](../../test/node/claudeSubagentSignals.test.ts)). -7. **Snapshot + behavior tests.** +7. ✓ **Snapshot + behavior tests.** - Extend [`claudeToolDisplay.test.ts`](../../test/node/claudeToolDisplay.test.ts) with one snapshot test mapping every tool row to @@ -546,28 +546,74 @@ Available skills in this workspace: logs. Used to assert no `[Claude]` warn-logs during the scenario. -**Scenario**: - -1. Use `launch` to start the Agents window - (`./scripts/code.sh --agents`), switch the agent picker to - **Claude**, and send: *"Run `git status` and tell me what's - dirty."* -2. Verify the **Bash permission card** renders in the terminal - style (monospace block, syntax-highlighted as `bash`) and - reads *Running `git status`* — **not** *Run shell command*. -3. Approve. Verify the row collapses to *Ran `git status`* - (success-aware past tense from D4). -4. Send: *"Search for `IClaudeAgentSession` in this repo."* Verify - the `Grep` row renders in the **search** style (workbench's - search renderer, not generic) and reads *Searching for - `IClaudeAgentSession`*. After completion the row reads - *Searched for `IClaudeAgentSession`*. -5. Use `code-oss-logs` to read the agentHost log; assert no +**Tool-call kinds covered by Phase 8.5.** Each kind exercises a +distinct branch of the new helpers and a distinct workbench +renderer. The E2E scenario hits every kind at least once across +the live + replay paths. + +| Kind | Tools | New behavior to verify | +|------|-------|------------------------| +| `terminal` | `Bash`, `BashOutput`, `KillBash` | Permission card + row use the terminal renderer (monospace + `bash` highlighting); invocation reads ``Running `` ``; past-tense reads ``Ran `` ``. | +| `search` | `Grep`, `Glob` | Row uses the search renderer; invocation reads ``Searching for `` `` / ``Finding files matching `` ``. | +| File link (no kind) | `Read`, `Write`, `Edit`, `MultiEdit`, `LS`, `NotebookRead`, `NotebookEdit` | Invocation contains a clickable markdown file link `[basename](file:///full/path)` instead of the bare display name. | +| URL link (no kind) | `WebFetch` | Invocation contains a clickable URL link. | +| `subagent` | `Task`, `Agent` | Parent row uses the subagent renderer; invocation is the model-authored `description` (D9), falling back to the display name. Inner Bash/Grep/Read rows nest under the parent and render in their own kind's renderer. | +| Generic / MCP (no kind) | `TodoWrite`, `ExitPlanMode`, `AskUserQuestion`, `mcp__*` | Render in the generic tool renderer; MCP `toolInput` is pretty-printed JSON. | +| Failure | any of the above with `is_error: true` | Past-tense reads ``"" failed`` regardless of input shape. | + +**Scenario** (Agents app, `[Local]` workspace with the Claude +agent): + +1. Use `launch` to start the Agents window with + `./scripts/code.sh --agents --remote-debugging-port=9224`, + pick a `[Local]` workspace, switch the agent picker to + **Claude**. +2. **Terminal + Read.** Send: *"Run `git status` and then read + `README.md`."* Verify: + - The Bash permission card renders in the **terminal** style + (monospace block, `bash` highlighting) and reads *Running + `git status`* — NOT *Run shell command*. + - Approve. Card collapses to *Ran `git status`* + (success-aware past tense). + - The subsequent Read row renders with a clickable + `[README.md](file://…)` link rather than just "Read file". +3. **Search.** Send: *"Search for `IClaudeAgentSession` in this + repo."* Verify the Grep/Glob row uses the **search** renderer + with ``Searching for `IClaudeAgentSession` `` ; on completion + it reads ``Searched for `IClaudeAgentSession` ``. +4. **Subagent.** Send: *"Spin up 2 subagents that summarize the + `src/vs/platform/agentHost/node/claude` folder."* Verify: + - Each `Task` parent row shows the model-authored + `description` text (D9) — NOT the literal "Run subagent + task" — and renders via the subagent renderer. + - Inner `Glob` / `Read` rows nest under each parent and + render in their own kind's style (search / file-link). +5. **Failure path.** Send: *"Run `false`."* Verify the Bash card + reads ``Running `false` `` → after execution, ``"Run shell + command" failed``. +6. **Replay parity.** Click an older Claude session in the + sidebar that contains shell + search + Read + Task tool calls + (e.g. one of the sessions from steps 2–4 after a reload). + Verify the historical rows render in the same renderers and + with the same rich messages as the live ones — replay + produces identical `_meta` and `pastTenseMessage` per D6. +7. Use `code-oss-logs` to read the agentHost log; assert no `[Claude]` warn-logs for the resolved tool calls. -6. Click an older Claude session in the sidebar that contains - shell + search tool calls. Verify the historical rows render - in the same terminal / search style as the live ones (replay - parity from Step 5). +8. **Auto-allowed tool visibility (regression guard for the + missing-Ready bug).** In a NEW Claude `[Local]` chat session, + send: *"Run `git status` and then read README.md."* Wait for + the chat to finish, then expand the response's thinking + block. Verify: + - Both tool calls are present as widgets inside the thinking + block (a `Ran git status` row and a `Read README.md` row). + - Each widget's input (the command / file path) and output + (stdout / file contents) are both visible when expanded. + + This catches the case where the Claude SDK auto-allows a tool + without invoking `canUseTool` — without a + `SessionToolCallReady` emission at `content_block_stop` the + tool stays in `Streaming` and the reducer drops the eventual + `SessionToolCallComplete`, leaving an empty widget. ### Manual fallback @@ -615,3 +661,144 @@ session and folded into Decisions above.)* - [x] Verification names concrete commands and skills. - [x] E2E references real skills (`launch`, `code-oss-logs`). - [x] An agent reading only this file can implement. + +## Implementation Notes + +### Files actually changed + +Production: +- [`claudeToolDisplay.ts`](./claudeToolDisplay.ts) — added `toolKind` / + `language` columns on `ClaudeToolRow`; new `Agent` row; new exported + helpers `getClaudeToolKind`, `getClaudeShellLanguage`, + `buildClaudeToolMeta`, `getClaudeInvocationMessage`, + `getClaudePastTenseMessage`, `getClaudeToolInputString`; private + helpers `truncate`, `md`, `formatPathAsMarkdownLink`, + `readStringField`, `firstShellLine`. +- [`claudeToolCallRegistry.ts`](./claudeToolCallRegistry.ts) — **new file**. + `ClaudeToolCallRegistry` owns per-tool-call attribution + input + accumulation + computed start-info. See Deviation D5′ below. +- [`claudeMapSessionEvents.ts`](./claudeMapSessionEvents.ts) — + `ClaudeMapperState` now composes a `ClaudeToolCallRegistry` (public + `toolCalls` field); cross-message methods forward to the registry; + added `appendToolBlockInputDelta` / `finalizeToolBlock` wiring on + `input_json_delta` / `content_block_stop`; `SessionToolCallStart` + emits `_meta: buildClaudeToolMeta(...)`; `SessionToolCallComplete` + uses `getClaudePastTenseMessage`; subagent Phase 12 path now + produces meta via the same helper. Class size 197 → 136 LOC, 13 → + 12 methods. +- [`claudeCanUseTool.ts`](./claudeCanUseTool.ts) — + `dispatchCanUseTool` uses `getClaudeInvocationMessage`, + `getClaudeToolInputString`, `buildClaudeToolMeta` on the + `ToolCallPendingConfirmationState`. +- [`claudeReplayMapper.ts`](./claudeReplayMapper.ts) — `_openToolUse` + and `_attachToolResult` use the rich helpers; added + `_toolUseInputs` stash so past-tense gets the parsed input; + removed local `SUBAGENT_TOOL_NAMES` (now table-driven via + `buildClaudeToolMeta`). +- [`claudeSubagentSignals.ts`](./claudeSubagentSignals.ts) — inner + subagent `SessionToolCallStart` now carries `_meta`; + `SessionToolCallReady` uses `getClaudeInvocationMessage`; + `buildTopLevelSubagentReadyAction` calls + `getClaudeInvocationMessage('Task'/`'Agent'`, …)` so the + description fallback lives in one place (D9). +- [`../sessionPermissions.ts`](../sessionPermissions.ts) — + `createToolReadyAction` forwards `state._meta` into the emitted + `SessionToolCallReady` action (D6 / R1). + +Tests: +- [`../../test/node/claudeToolDisplay.test.ts`](../../test/node/claudeToolDisplay.test.ts) — + per-tool composite snapshot (Phase 8.5 — rich rendering + snapshot) + defensive-input test + `Agent`-row test + MCP test. +- [`../../test/node/claudeToolCallRegistry.test.ts`](../../test/node/claudeToolCallRegistry.test.ts) + — **new file**. 8 focused tests for begin / appendInputDelta / + finalize / lookup / complete / clearPending. +- [`../../test/node/claudeMapSessionEvents.test.ts`](../../test/node/claudeMapSessionEvents.test.ts) + — one existing assertion updated (`Read file finished` → + `Read file`). +- [`../../test/node/claudeAgent.test.ts`](../../test/node/claudeAgent.test.ts) + — one existing assertion updated for the new pretty-printed + `toolInput` shape and rich markdown `invocationMessage`. + +### Deviation from plan + +- **D5′ — `ClaudeToolCallRegistry` was introduced as a second-pass + refactor.** The first implementation pass extended `ClaudeMapperState` + directly with input/info methods (pragmatic deviation: smaller diff, + reuse of existing class). The follow-up `/avoid-private-methods` + audit flagged the class as growing past the 7-method threshold, + with input-tracking / file-edit-cache / cross-message-attribution + / per-message-block as four distinct concerns. The registry was + then extracted as the plan originally specified, with + `ClaudeMapperState` composing it via a public `toolCalls` field so + the mapper's existing call sites keep working. Net result: 60 LOC + moved to a new file plus 8 focused unit tests; `ClaudeMapperState` + shrunk to 136 LOC / 12 methods. + +### Validation + +- 1154 tests pass in `**/agentHost/test/node/*.test.js` (was 1146; + +8 from the new registry suite). +- `npm run compile-check-ts-native` clean. +- `npm run valid-layers-check` clean. +- `/avoid-private-methods` audit re-run after D5′ extraction: + `ClaudeMapperState` now at 12 methods across 3 concerns + (per-message, file-edit cache, registry composition); no + individual private method fails the checklist. Remaining + low-severity finding (`cacheFileEdit`/`takeFileEdit` could move + to a tiny file-edit-cache collaborator) is **deferred** as + marginal value for the abstraction cost. + +### Pending + +- Live E2E (the two scenarios in Verification → E2E). To be run + before the PR moves out of draft. + +### Live E2E results (2026-05-16) + +Ran against a `claude-code [Local]` workspace using the `launch` +skill (Playwright/CDP into the Agents window) and the +`code-oss-logs` skill (agent-host action stream). The Claude agent +resolved to `claude-sonnet-4-6` for the validated turn. + +**Validated live (terminal + generic + file-link):** prompt *"Use +the Bash tool to run 'git status'... and then use the Read tool to +read README.md."* Captured agent-host action payloads: + +- `session/toolCallStart` for `Bash`: + `_meta: { toolKind: 'terminal', language: 'bash' }` — + confirms D1 (table-driven `toolKind`) + D6 (single-write on + Start). +- `session/toolCallComplete` for `Bash`: + `pastTenseMessage: { markdown: 'Ran ``git status``' }`, + `success: true` — confirms D4 (success-aware) + rich helper + output. +- `session/toolCallStart` for `Read`: no `_meta` — confirms + generic-renderer tools omit the field. +- `session/toolCallComplete` for `Read`: + `pastTenseMessage: { markdown: 'Read [README.md](file:///…/README.md)' }` + — confirms the file-link generic path. +- Workbench A11y label: `listitem "Ran terminal command"` — + confirms the workbench's `isTerminalTool` branch sees + `_meta.toolKind` and routes Bash into the terminal renderer. + +No `[claudeMapSessionEvents]` warn-logs for the resolved tool +calls. The Phase 12 subagent path used by the same `_meta` field +still works (regression-free). + +**Validated via unit tests (other kinds):** the remaining tool +kinds in the matrix above are covered by +[`claudeToolDisplay.test.ts`](../../test/node/claudeToolDisplay.test.ts)'s +Phase 8.5 rich-rendering snapshot — every tool row × +`{invocation, pastTense(success), pastTense(failure), toolKind, +language, inputString}`. The live emission code path that worked +for Bash + Read is the same path used for Grep, Glob, Write, +Edit, WebFetch, Task / Agent, and the failure branch; the +renderer-side dispatch is the same `_meta.toolKind` lookup. +Replay parity is covered by +[`claudeReplayMapper.test.ts`](../../test/node/claudeReplayMapper.test.ts). + +**Deferred for follow-up validation:** a wider live sweep across +Grep / Glob / Task (D9) / failure / replay, on a build that has +Phase 8.5 in the agent-host (the `claude-code` external session in +the target workspace is pinned to an older agent host installation +that predates these changes). diff --git a/src/vs/platform/agentHost/node/sessionPermissions.ts b/src/vs/platform/agentHost/node/sessionPermissions.ts index 254776accfc80..f8a2cbbf7914b 100644 --- a/src/vs/platform/agentHost/node/sessionPermissions.ts +++ b/src/vs/platform/agentHost/node/sessionPermissions.ts @@ -189,6 +189,7 @@ export class SessionPermissionManager extends Disposable { // `Approve`/`Deny`) by populating `state.options`. The standard // `Allow Once / Allow in this Session / Skip` set is the default. options: state.options ? state.options.slice() : CONFIRMATION_OPTIONS.slice(), + ...(state._meta ? { _meta: state._meta } : {}), }; } return { @@ -199,6 +200,7 @@ export class SessionPermissionManager extends Disposable { invocationMessage: state.invocationMessage, toolInput: state.toolInput, confirmed: ToolCallConfirmationReason.NotNeeded, + ...(state._meta ? { _meta: state._meta } : {}), }; } diff --git a/src/vs/platform/agentHost/test/node/claudeAgent.test.ts b/src/vs/platform/agentHost/test/node/claudeAgent.test.ts index d0e8690435498..340375a73d0cc 100644 --- a/src/vs/platform/agentHost/test/node/claudeAgent.test.ts +++ b/src/vs/platform/agentHost/test/node/claudeAgent.test.ts @@ -2810,6 +2810,19 @@ suite('ClaudeAgent', () => { }); }); + test('onClientToolCallComplete is a benign no-op (Phase 10 not yet implemented)', () => { + // `AgentSideEffects` fires `onClientToolCallComplete` for every + // server-dispatched `SessionToolCallComplete` envelope, including + // the ones the Claude mapper emits for normal SDK tool completions. + // Until Phase 10 wires client (MCP) tools through, the body must + // be a benign no-op — throwing here corrupts every tool flow. + const { agent } = createTestContext(disposables); + const session = URI.parse('claude:/sess-1'); + assert.doesNotThrow(() => { + agent.onClientToolCallComplete(session, 'toolu_unknown', { success: true, pastTenseMessage: 'ran' }); + }); + }); + // #endregion }); @@ -2979,8 +2992,8 @@ suite('ClaudeAgent (Phase 7 §3.4 — _handleCanUseTool)', () => { toolCallId: 'tu_shape', toolName: 'Read', displayName: 'Read file', - invocationMessage: 'Read file', - toolInput: '{"file_path":"/tmp/foo.txt"}', + invocationMessage: { markdown: 'Reading [foo.txt](file:///tmp/foo.txt)' }, + toolInput: '{\n "file_path": "/tmp/foo.txt"\n}', confirmationTitle: 'Read file?', }, permissionKind: 'read', diff --git a/src/vs/platform/agentHost/test/node/claudeMapSessionEvents.test.ts b/src/vs/platform/agentHost/test/node/claudeMapSessionEvents.test.ts index c50c6f28bf6ab..be0356c16c143 100644 --- a/src/vs/platform/agentHost/test/node/claudeMapSessionEvents.test.ts +++ b/src/vs/platform/agentHost/test/node/claudeMapSessionEvents.test.ts @@ -10,6 +10,7 @@ import { NullLogService } from '../../../log/common/log.js'; import type { AgentSignal } from '../../common/agentService.js'; import { ActionType } from '../../common/state/sessionActions.js'; import { ResponsePartKind, ToolResultContentType } from '../../common/state/sessionState.js'; +import { ToolCallConfirmationReason } from '../../common/state/protocol/state.js'; import { ClaudeMapperState, mapSDKMessageToAgentSignals } from '../../node/claude/claudeMapSessionEvents.js'; import { SubagentRegistry } from '../../node/claude/claudeSubagentRegistry.js'; import { @@ -229,6 +230,38 @@ suite('claudeMapSessionEvents — direct mapper tests', () => { }]); }); + test('Test 9.5 — content_block_stop emits SessionToolCallReady so auto-allowed tools leave Streaming', () => { + const log = new CapturingLogService(); + const state = new ClaudeMapperState(); + const resolver = r(); + + // Drive a Bash tool_use through start → input deltas → stop. The + // fix: `content_block_stop` must emit `SessionToolCallReady` with + // `confirmed: NotNeeded`, the parsed input as `toolInput`, the + // rich `invocationMessage`, and `_meta.toolKind` — otherwise an + // auto-allowed tool (SDK skips `canUseTool`) stays in Streaming + // and the reducer drops the subsequent Complete. + mapSDKMessageToAgentSignals(makeStreamEvent(SESSION_ID, makeContentBlockStartToolUse(0, 'tu_b', 'Bash')), SESSION, TURN_ID, state, log, resolver); + mapSDKMessageToAgentSignals(makeStreamEvent(SESSION_ID, makeInputJsonDelta(0, '{"command":"git status"}')), SESSION, TURN_ID, state, log, resolver); + const signals = mapSDKMessageToAgentSignals(makeStreamEvent(SESSION_ID, makeContentBlockStop(0)), SESSION, TURN_ID, state, log, resolver); + + assert.deepStrictEqual(signals, [{ + kind: 'action', + session: SESSION, + action: { + type: ActionType.SessionToolCallReady, + session: SESSION_STR, + turnId: TURN_ID, + toolCallId: 'tu_b', + invocationMessage: { markdown: 'Running `git status`' }, + toolInput: 'git status', + confirmed: ToolCallConfirmationReason.NotNeeded, + _meta: { toolKind: 'terminal' }, + }, + }]); + assert.deepStrictEqual(log.warns, []); + }); + test('Test 10 — synthetic user tool_result emits SessionToolCallComplete with the originating turnId', () => { const log = new CapturingLogService(); const state = new ClaudeMapperState(); @@ -261,7 +294,7 @@ suite('claudeMapSessionEvents — direct mapper tests', () => { toolCallId: 'tu_1', result: { success: true, - pastTenseMessage: 'Read file finished', + pastTenseMessage: 'Read file', content: [{ type: ToolResultContentType.Text, text: 'file contents' }], }, }, diff --git a/src/vs/platform/agentHost/test/node/claudeSubagentSignals.test.ts b/src/vs/platform/agentHost/test/node/claudeSubagentSignals.test.ts index 5ff4648119a00..2d6ff5a54d340 100644 --- a/src/vs/platform/agentHost/test/node/claudeSubagentSignals.test.ts +++ b/src/vs/platform/agentHost/test/node/claudeSubagentSignals.test.ts @@ -230,17 +230,26 @@ suite('claudeSubagentSignals — Phase 12 emission', () => { const kinds = fromAssistant.map(s => s.kind); const allParentIds = [...fromAssistant, ...fromToolResult].filter(s => s.kind === 'action').map(s => s.kind === 'action' ? s.parentToolCallId : null); const completeAction = fromToolResult.find(s => s.kind === 'action' && s.action.type === ActionType.SessionToolCallComplete); + const completePastTense = completeAction?.kind === 'action' && completeAction.action.type === ActionType.SessionToolCallComplete + ? completeAction.action.result.pastTenseMessage + : undefined; assert.deepStrictEqual({ fromAssistantKinds: kinds, toolUseEdge: registry.getParentSpawn('toolu_inner_glob')?.toolUseId, fromToolResultHasComplete: completeAction !== undefined, everyActionTaggedWithParent: allParentIds.every(p => p === PARENT), + // D6 parity: inner-tool past-tense must use the rich helper + // (seeded by `seedParsedInput` at start time), not fall back to + // the generic "{displayName} finished" — replay always renders + // rich text, so a generic live message would silently diverge. + completePastTense, }, { fromAssistantKinds: ['subagent_started', 'action', 'action', 'action'], toolUseEdge: PARENT, fromToolResultHasComplete: true, everyActionTaggedWithParent: true, + completePastTense: { markdown: 'Found files matching `**/*.ts`' }, }); }); diff --git a/src/vs/platform/agentHost/test/node/claudeToolCallRegistry.test.ts b/src/vs/platform/agentHost/test/node/claudeToolCallRegistry.test.ts new file mode 100644 index 0000000000000..39f4aa735a48d --- /dev/null +++ b/src/vs/platform/agentHost/test/node/claudeToolCallRegistry.test.ts @@ -0,0 +1,166 @@ +/*--------------------------------------------------------------------------------------------- + * Copyright (c) Microsoft Corporation. All rights reserved. + * Licensed under the MIT License. See License.txt in the project root for license information. + *--------------------------------------------------------------------------------------------*/ + +import assert from 'assert'; +import { ensureNoDisposablesAreLeakedInTestSuite } from '../../../../base/test/common/utils.js'; +import { LogLevel, type ILogService } from '../../../log/common/log.js'; +import { ClaudeToolCallRegistry } from '../../node/claude/claudeToolCallRegistry.js'; + +class CapturingLog implements Partial { + readonly warns: string[] = []; + warn(message: string): void { this.warns.push(message); } + error(): void { /* unused */ } + info(): void { /* unused */ } + trace(): void { /* unused */ } + debug(): void { /* unused */ } + getLevel(): LogLevel { return LogLevel.Off; } +} + +suite('claudeToolCallRegistry — Phase 8.5 input/info tracking', () => { + + ensureNoDisposablesAreLeakedInTestSuite(); + + test('begin → appendInputDelta → finalize stashes rich info and parsed input', () => { + const registry = new ClaudeToolCallRegistry(); + registry.begin('tu_1', 'Bash', 'turn-1'); + registry.appendInputDelta('tu_1', '{"comma'); + registry.appendInputDelta('tu_1', 'nd":"git status"}'); + registry.finalize('tu_1'); + + const entry = registry.lookup('tu_1'); + assert.deepStrictEqual( + { + turnId: entry?.turnId, + toolName: entry?.toolName, + parsedInput: entry?.info?.parsedInput, + displayName: entry?.info?.displayName, + invocationMessage: entry?.info?.invocationMessage, + toolInput: entry?.info?.toolInput, + }, + { + turnId: 'turn-1', + toolName: 'Bash', + parsedInput: { command: 'git status' }, + displayName: 'Run shell command', + invocationMessage: { markdown: 'Running `git status`' }, + toolInput: 'git status', + }, + ); + }); + + test('finalize with malformed JSON falls back to undefined parsedInput', () => { + const registry = new ClaudeToolCallRegistry(); + registry.begin('tu_2', 'Read', 'turn-1'); + registry.appendInputDelta('tu_2', '{not valid json'); + registry.finalize('tu_2'); + + const entry = registry.lookup('tu_2'); + assert.deepStrictEqual( + { + parsedInput: entry?.info?.parsedInput, + displayName: entry?.info?.displayName, + invocationMessage: entry?.info?.invocationMessage, + }, + { + parsedInput: undefined, + displayName: 'Read file', + invocationMessage: 'Reading file', + }, + ); + }); + + test('finalize with no deltas yields info with undefined parsedInput', () => { + const registry = new ClaudeToolCallRegistry(); + registry.begin('tu_3', 'Grep', 'turn-1'); + registry.finalize('tu_3'); + + assert.deepStrictEqual(registry.lookup('tu_3')?.info?.parsedInput, undefined); + }); + + test('lookup before finalize returns attribution with undefined info', () => { + const registry = new ClaudeToolCallRegistry(); + registry.begin('tu_4', 'Bash', 'turn-2'); + registry.appendInputDelta('tu_4', '{"command":"ls"}'); + + const entry = registry.lookup('tu_4'); + assert.deepStrictEqual( + { turnId: entry?.turnId, toolName: entry?.toolName, info: entry?.info }, + { turnId: 'turn-2', toolName: 'Bash', info: undefined }, + ); + }); + + test('lookup of unknown id returns undefined; appendInputDelta / finalize are no-ops on unknown id', () => { + const registry = new ClaudeToolCallRegistry(); + registry.appendInputDelta('nope', 'x'); + registry.finalize('nope'); + assert.strictEqual(registry.lookup('nope'), undefined); + }); + + test('complete removes the entry; subsequent lookup is undefined', () => { + const registry = new ClaudeToolCallRegistry(); + registry.begin('tu_5', 'Bash', 'turn-1'); + registry.finalize('tu_5'); + registry.complete('tu_5'); + assert.strictEqual(registry.lookup('tu_5'), undefined); + }); + + test('clearPending warns once per orphan and drains all entries', () => { + const registry = new ClaudeToolCallRegistry(); + const log = new CapturingLog(); + registry.begin('tu_6', 'Bash', 'turn-1'); + registry.begin('tu_7', 'Read', 'turn-1'); + registry.clearPending(log as unknown as ILogService); + + assert.strictEqual(registry.lookup('tu_6'), undefined); + assert.strictEqual(registry.lookup('tu_7'), undefined); + assert.strictEqual(log.warns.length, 2); + assert.ok(log.warns[0].includes('tu_6') && log.warns[0].includes('Bash')); + assert.ok(log.warns[1].includes('tu_7') && log.warns[1].includes('Read')); + }); + + test('clearPending is a silent no-op when nothing is pending', () => { + const registry = new ClaudeToolCallRegistry(); + const log = new CapturingLog(); + registry.clearPending(log as unknown as ILogService); + assert.deepStrictEqual(log.warns, []); + }); + + test('seedParsedInput populates info from a pre-parsed object (inner subagent path)', () => { + const registry = new ClaudeToolCallRegistry(); + registry.begin('tu_seed', 'Bash', 'turn-1'); + registry.seedParsedInput('tu_seed', { command: 'git status', description: 'check' }); + + const entry = registry.lookup('tu_seed'); + assert.deepStrictEqual({ + turnId: entry?.turnId, + toolName: entry?.toolName, + parsedInput: entry?.info?.parsedInput, + invocationMessage: entry?.info?.invocationMessage, + toolInput: entry?.info?.toolInput, + }, { + turnId: 'turn-1', + toolName: 'Bash', + parsedInput: { command: 'git status', description: 'check' }, + invocationMessage: { markdown: 'Running `git status`' }, + toolInput: 'git status', + }); + }); + + test('seedParsedInput with non-object input yields info with undefined parsedInput', () => { + const registry = new ClaudeToolCallRegistry(); + registry.begin('tu_seed_bad', 'Bash', 'turn-1'); + registry.seedParsedInput('tu_seed_bad', 'not an object'); + + const info = registry.lookup('tu_seed_bad')?.info; + assert.strictEqual(info?.parsedInput, undefined); + assert.strictEqual(info?.toolInput, undefined); + }); + + test('seedParsedInput on unknown id is a silent no-op', () => { + const registry = new ClaudeToolCallRegistry(); + registry.seedParsedInput('tu_unknown', { command: 'ls' }); + assert.strictEqual(registry.lookup('tu_unknown'), undefined); + }); +}); diff --git a/src/vs/platform/agentHost/test/node/claudeToolDisplay.test.ts b/src/vs/platform/agentHost/test/node/claudeToolDisplay.test.ts index 6f220137d55a1..3858816210464 100644 --- a/src/vs/platform/agentHost/test/node/claudeToolDisplay.test.ts +++ b/src/vs/platform/agentHost/test/node/claudeToolDisplay.test.ts @@ -7,10 +7,15 @@ import assert from 'assert'; import { ensureNoDisposablesAreLeakedInTestSuite } from '../../../../base/test/common/utils.js'; import { getClaudeConfirmationTitle, + getClaudeInvocationMessage, + getClaudePastTenseMessage, getClaudePermissionKind, getClaudeToolDisplayName, + getClaudeToolInputString, + getClaudeToolKind, getClaudeToolPath, INTERACTIVE_CLAUDE_TOOLS, + buildClaudeToolMeta, isClaudeFileEditTool, } from '../../node/claude/claudeToolDisplay.js'; @@ -173,4 +178,127 @@ suite('claudeToolDisplay — §4 mapping table', () => { }, ); }); + + test('Phase 8.5 — rich rendering snapshot covers every tool row', () => { + const SAMPLE_INPUT: Record = { + Bash: { command: 'git status' }, + BashOutput: { bash_id: 'b1' }, + KillBash: { bash_id: 'b1' }, + Read: { file_path: '/src/foo.ts' }, + Glob: { pattern: '**/*.ts' }, + Grep: { pattern: 'IClaudeAgentSession' }, + LS: { path: '/src' }, + NotebookRead: { notebook_path: '/nb.ipynb' }, + Write: { file_path: '/src/foo.ts', content: '...' }, + Edit: { file_path: '/src/foo.ts', old_string: 'a', new_string: 'b' }, + MultiEdit: { file_path: '/src/foo.ts', edits: [] }, + NotebookEdit: { notebook_path: '/nb.ipynb' }, + TodoWrite: { todos: [] }, + WebFetch: { url: 'https://example.com' }, + Task: { description: 'find the bug', subagent_type: 'Explore' }, + ExitPlanMode: { plan: '...' }, + AskUserQuestion: { question: 'why?' }, + }; + + const TOOLS = Object.keys(SAMPLE_INPUT) as readonly (keyof typeof SAMPLE_INPUT)[]; + + const snapshot = TOOLS.map(t => { + const input = SAMPLE_INPUT[t]; + const displayName = getClaudeToolDisplayName(t); + return [ + t, + getClaudeToolKind(t), + buildClaudeToolMeta(t), + getClaudeInvocationMessage(t, displayName, input), + getClaudePastTenseMessage(t, displayName, input, true), + getClaudePastTenseMessage(t, displayName, input, false), + getClaudeToolInputString(t, input), + ] as const; + }); + + assert.deepStrictEqual(snapshot, [ + ['Bash', 'terminal', { toolKind: 'terminal' }, { markdown: 'Running `git status`' }, { markdown: 'Ran `git status`' }, '"Run shell command" failed', 'git status'], + ['BashOutput', 'terminal', { toolKind: 'terminal' }, 'Reading shell output', 'Read shell output', '"Read shell output" failed', '{\n "bash_id": "b1"\n}'], + ['KillBash', 'terminal', { toolKind: 'terminal' }, 'Killing shell command', 'Killed shell command', '"Kill shell command" failed', '{\n "bash_id": "b1"\n}'], + ['Read', undefined, undefined, { markdown: 'Reading [foo.ts](file:///src/foo.ts)' }, { markdown: 'Read [foo.ts](file:///src/foo.ts)' }, '"Read file" failed', '{\n "file_path": "/src/foo.ts"\n}'], + ['Glob', 'search', { toolKind: 'search' }, { markdown: 'Finding files matching `**/*.ts`' }, { markdown: 'Found files matching `**/*.ts`' }, '"Find files" failed', '**/*.ts'], + ['Grep', 'search', { toolKind: 'search' }, { markdown: 'Searching for `IClaudeAgentSession`' }, { markdown: 'Searched for `IClaudeAgentSession`' }, '"Search files" failed', 'IClaudeAgentSession'], + ['LS', undefined, undefined, { markdown: 'Listing [src](file:///src)' }, { markdown: 'Listed [src](file:///src)' }, '"List directory" failed', '{\n "path": "/src"\n}'], + ['NotebookRead', undefined, undefined, { markdown: 'Reading [nb.ipynb](file:///nb.ipynb)' }, { markdown: 'Read [nb.ipynb](file:///nb.ipynb)' }, '"Read notebook" failed', '{\n "notebook_path": "/nb.ipynb"\n}'], + ['Write', undefined, undefined, { markdown: 'Editing [foo.ts](file:///src/foo.ts)' }, { markdown: 'Edited [foo.ts](file:///src/foo.ts)' }, '"Write file" failed', '{\n "file_path": "/src/foo.ts",\n "content": "..."\n}'], + ['Edit', undefined, undefined, { markdown: 'Editing [foo.ts](file:///src/foo.ts)' }, { markdown: 'Edited [foo.ts](file:///src/foo.ts)' }, '"Edit file" failed', '{\n "file_path": "/src/foo.ts",\n "old_string": "a",\n "new_string": "b"\n}'], + ['MultiEdit', undefined, undefined, { markdown: 'Editing [foo.ts](file:///src/foo.ts)' }, { markdown: 'Edited [foo.ts](file:///src/foo.ts)' }, '"Edit file" failed', '{\n "file_path": "/src/foo.ts",\n "edits": []\n}'], + ['NotebookEdit', undefined, undefined, { markdown: 'Editing [nb.ipynb](file:///nb.ipynb)' }, { markdown: 'Edited [nb.ipynb](file:///nb.ipynb)' }, '"Edit notebook" failed', '{\n "notebook_path": "/nb.ipynb"\n}'], + ['TodoWrite', undefined, undefined, 'Updating todo list', 'Updated todo list', '"Update todo list" failed', '{\n "todos": []\n}'], + ['WebFetch', undefined, undefined, { markdown: 'Fetching [https://example.com](https://example.com)' }, { markdown: 'Fetched [https://example.com](https://example.com)' }, '"Fetch URL" failed', '{\n "url": "https://example.com"\n}'], + ['Task', 'subagent', { toolKind: 'subagent' }, 'find the bug', 'Ran subagent', '"Run subagent task" failed', '{\n "description": "find the bug",\n "subagent_type": "Explore"\n}'], + ['ExitPlanMode', undefined, undefined, 'Ready to code?', 'Used "Ready to code?"', '"Ready to code?" failed', '{\n "plan": "..."\n}'], + ['AskUserQuestion', undefined, undefined, 'Ask user a question', 'Used "Ask user a question"', '"Ask user a question" failed', '{\n "question": "why?"\n}'], + ]); + }); + + test('Phase 8.5 — defensive input handling falls back to static display strings', () => { + assert.deepStrictEqual( + { + bashNoCommand: getClaudeInvocationMessage('Bash', 'Run shell command', {}), + bashWrongType: getClaudeInvocationMessage('Bash', 'Run shell command', { command: 42 }), + readMissingPath: getClaudeInvocationMessage('Read', 'Read file', {}), + grepMissingPattern: getClaudeInvocationMessage('Grep', 'Search files', {}), + nonObjectInput: getClaudeInvocationMessage('Bash', 'Run shell command', null), + undefinedInput: getClaudeInvocationMessage('Bash', 'Run shell command', undefined), + taskNoDescription: getClaudeInvocationMessage('Task', 'Run subagent task', {}), + bashFailed: getClaudePastTenseMessage('Bash', 'Run shell command', { command: 'x' }, false), + inputStringUndefined: getClaudeToolInputString('Bash', undefined), + inputStringBashNoCommand: getClaudeToolInputString('Bash', {}), + }, + { + bashNoCommand: 'Running shell command', + bashWrongType: 'Running shell command', + readMissingPath: 'Reading file', + grepMissingPattern: 'Searching files', + nonObjectInput: 'Running shell command', + undefinedInput: 'Running shell command', + taskNoDescription: 'Run subagent task', + bashFailed: '"Run shell command" failed', + inputStringUndefined: undefined, + inputStringBashNoCommand: '{}', + }, + ); + }); + + test('Phase 8.5 — Agent row mirrors Task (subagent kind, same display name)', () => { + assert.deepStrictEqual( + [ + getClaudeToolKind('Agent'), + buildClaudeToolMeta('Agent'), + getClaudeToolDisplayName('Agent'), + getClaudePermissionKind('Agent'), + getClaudeInvocationMessage('Agent', getClaudeToolDisplayName('Agent'), { description: 'review this' }), + ], + [ + 'subagent', + { toolKind: 'subagent' }, + 'Run subagent task', + 'custom-tool', + 'review this', + ], + ); + }); + + test('Phase 8.5 — MCP tools have no toolKind, JSON input fallback', () => { + assert.deepStrictEqual( + { + kind: getClaudeToolKind('mcp__github__listIssues'), + meta: buildClaudeToolMeta('mcp__github__listIssues'), + inputString: getClaudeToolInputString('mcp__github__listIssues', { owner: 'microsoft', repo: 'vscode' }), + invocation: getClaudeInvocationMessage('mcp__github__listIssues', 'Run MCP tool github__listIssues', { owner: 'microsoft' }), + }, + { + kind: undefined, + meta: undefined, + inputString: '{\n "owner": "microsoft",\n "repo": "vscode"\n}', + invocation: 'Run MCP tool github__listIssues', + }, + ); + }); }); From d4ed55c62577d033a10d03af73e666a21be082e1 Mon Sep 17 00:00:00 2001 From: Tyler Leonhardt Date: Mon, 18 May 2026 15:21:53 -0700 Subject: [PATCH 4/4] Address PR review feedback MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - claudeCanUseTool: drop the redundant '?? JSON.stringify(input)' fallback. getClaudeToolInputString already wraps stringify in try/catch and returns undefined on failure; the outer call was re-running the same stringify that just failed (would throw on non-serializable input). - sessionPermissions.createToolReadyAction: drop the state._meta forwarding. The reducer's SessionToolCallReady branch derives _meta via tcBase(tc) from the prior state, so the action-level _meta was never read. - claudeToolCallRegistry.finalize: preserve the raw inputBuffer as toolInput when JSON.parse fails or yields a non-object. Without this, malformed payloads rendered an empty input section in the UI. Test updated to assert raw-buffer preservation. - claudeToolDisplay.formatPathAsMarkdownLink + WebFetch link: escape the link label via escapeMarkdownLinkLabel. File names containing ']' or '\\' would otherwise break out of the [...] label and could cause malformed rendering / injection. - claudeAgent.integrationTest (Phase 7 §5.3 Read tool round-trip): expected signal sequence updated for the Phase 8.5 mapper's new SessionToolCallReady emission at content_block_stop. The mapper-side Ready (auto-allow path) now lands between toolCallDelta and the permission card's pending_confirmation. --- .../platform/agentHost/node/claude/claudeCanUseTool.ts | 2 +- .../agentHost/node/claude/claudeToolCallRegistry.ts | 10 +++++++--- .../agentHost/node/claude/claudeToolDisplay.ts | 8 ++++---- src/vs/platform/agentHost/node/sessionPermissions.ts | 2 -- .../agentHost/test/node/claudeAgent.integrationTest.ts | 5 +++++ .../agentHost/test/node/claudeToolCallRegistry.test.ts | 6 +++++- 6 files changed, 22 insertions(+), 11 deletions(-) diff --git a/src/vs/platform/agentHost/node/claude/claudeCanUseTool.ts b/src/vs/platform/agentHost/node/claude/claudeCanUseTool.ts index 889f3c3f230d3..5eb4794a3d9a7 100644 --- a/src/vs/platform/agentHost/node/claude/claudeCanUseTool.ts +++ b/src/vs/platform/agentHost/node/claude/claudeCanUseTool.ts @@ -123,7 +123,7 @@ async function dispatchCanUseTool( const permissionKind = getClaudePermissionKind(toolName); const displayName = getClaudeToolDisplayName(toolName); const permissionPath = options.blockedPath ?? getClaudeToolPath(toolName, input); - const toolInputString = getClaudeToolInputString(toolName, input) ?? JSON.stringify(input); + const toolInputString = getClaudeToolInputString(toolName, input); const meta = buildClaudeToolMeta(toolName); const state: ToolCallPendingConfirmationState = { status: ToolCallStatus.PendingConfirmation, diff --git a/src/vs/platform/agentHost/node/claude/claudeToolCallRegistry.ts b/src/vs/platform/agentHost/node/claude/claudeToolCallRegistry.ts index ad2a08766dcd1..83abf44a0f302 100644 --- a/src/vs/platform/agentHost/node/claude/claudeToolCallRegistry.ts +++ b/src/vs/platform/agentHost/node/claude/claudeToolCallRegistry.ts @@ -112,7 +112,11 @@ export class ClaudeToolCallRegistry { // Malformed JSON — fall through with `parsedInput: undefined`. } } - this._writeInfo(entry, parsedInput); + // Preserve the raw buffer as a fallback `toolInput` so a malformed + // or non-object payload still surfaces SOMETHING in the UI rather + // than leaving the input section empty. + const rawFallback = entry.inputBuffer.length > 0 ? entry.inputBuffer : undefined; + this._writeInfo(entry, parsedInput, rawFallback); // Buffer is no longer needed once parsed. entry.inputBuffer = ''; } @@ -138,14 +142,14 @@ export class ClaudeToolCallRegistry { this._writeInfo(entry, normalized); } - private _writeInfo(entry: IRegistryEntry, parsedInput: Record | undefined): void { + private _writeInfo(entry: IRegistryEntry, parsedInput: Record | undefined, rawFallback?: string): void { const displayName = getClaudeToolDisplayName(entry.toolName); entry.info = { toolName: entry.toolName, displayName, parsedInput, invocationMessage: getClaudeInvocationMessage(entry.toolName, displayName, parsedInput), - toolInput: getClaudeToolInputString(entry.toolName, parsedInput), + toolInput: getClaudeToolInputString(entry.toolName, parsedInput) ?? rawFallback, }; } diff --git a/src/vs/platform/agentHost/node/claude/claudeToolDisplay.ts b/src/vs/platform/agentHost/node/claude/claudeToolDisplay.ts index 4ab0b61cd12d8..7c670ca5c8e88 100644 --- a/src/vs/platform/agentHost/node/claude/claudeToolDisplay.ts +++ b/src/vs/platform/agentHost/node/claude/claudeToolDisplay.ts @@ -4,7 +4,7 @@ *--------------------------------------------------------------------------------------------*/ import { localize } from '../../../../nls.js'; -import { appendEscapedMarkdownInlineCode } from '../../../../base/common/htmlContent.js'; +import { appendEscapedMarkdownInlineCode, escapeMarkdownLinkLabel } from '../../../../base/common/htmlContent.js'; import { basename } from '../../../../base/common/resources.js'; import { truncate } from '../../../../base/common/strings.js'; import { URI } from '../../../../base/common/uri.js'; @@ -289,7 +289,7 @@ function md(value: string): StringOrMarkdown { function formatPathAsMarkdownLink(path: string): string { const uri = URI.file(path); - return `[${basename(uri)}](${uri})`; + return `[${escapeMarkdownLinkLabel(basename(uri))}](${uri})`; } /** @@ -381,7 +381,7 @@ export function getClaudeInvocationMessage( case 'WebFetch': { const url = readStringField(input, 'url'); if (url) { - return md(localize('claude.toolInvoke.webFetch', "Fetching {0}", `[${truncate(url, 80)}](${url})`)); + return md(localize('claude.toolInvoke.webFetch', "Fetching {0}", `[${escapeMarkdownLinkLabel(truncate(url, 80))}](${url})`)); } return localize('claude.toolInvoke.webFetchGeneric', "Fetching URL"); } @@ -470,7 +470,7 @@ export function getClaudePastTenseMessage( case 'WebFetch': { const url = readStringField(input, 'url'); if (url) { - return md(localize('claude.toolComplete.webFetch', "Fetched {0}", `[${truncate(url, 80)}](${url})`)); + return md(localize('claude.toolComplete.webFetch', "Fetched {0}", `[${escapeMarkdownLinkLabel(truncate(url, 80))}](${url})`)); } return localize('claude.toolComplete.webFetchGeneric', "Fetched URL"); } diff --git a/src/vs/platform/agentHost/node/sessionPermissions.ts b/src/vs/platform/agentHost/node/sessionPermissions.ts index f8a2cbbf7914b..254776accfc80 100644 --- a/src/vs/platform/agentHost/node/sessionPermissions.ts +++ b/src/vs/platform/agentHost/node/sessionPermissions.ts @@ -189,7 +189,6 @@ export class SessionPermissionManager extends Disposable { // `Approve`/`Deny`) by populating `state.options`. The standard // `Allow Once / Allow in this Session / Skip` set is the default. options: state.options ? state.options.slice() : CONFIRMATION_OPTIONS.slice(), - ...(state._meta ? { _meta: state._meta } : {}), }; } return { @@ -200,7 +199,6 @@ export class SessionPermissionManager extends Disposable { invocationMessage: state.invocationMessage, toolInput: state.toolInput, confirmed: ToolCallConfirmationReason.NotNeeded, - ...(state._meta ? { _meta: state._meta } : {}), }; } diff --git a/src/vs/platform/agentHost/test/node/claudeAgent.integrationTest.ts b/src/vs/platform/agentHost/test/node/claudeAgent.integrationTest.ts index 4407c4b56f246..476aa1fa007ec 100644 --- a/src/vs/platform/agentHost/test/node/claudeAgent.integrationTest.ts +++ b/src/vs/platform/agentHost/test/node/claudeAgent.integrationTest.ts @@ -839,6 +839,11 @@ suite('ClaudeAgent integration (proxy-backed)', function () { { kind: 'action', type: ActionType.SessionDelta, content: 'reading' }, { kind: 'action', type: ActionType.SessionToolCallStart, toolCallId: TOOL_USE_ID, toolName: 'Read' }, { kind: 'action', type: ActionType.SessionToolCallDelta, toolCallId: TOOL_USE_ID, content: '{"file_path":"/tmp/x"}' }, + // Phase 8.5 — mapper emits `SessionToolCallReady` at + // `content_block_stop` so auto-allowed tools transition out of + // `Streaming`; `sessionPermissions` then emits a second Ready + // for the pending_confirmation card below. + { kind: 'action', type: ActionType.SessionToolCallReady }, { kind: 'pending_confirmation', toolCallId: TOOL_USE_ID, toolName: 'Read', permissionKind: 'read', permissionPath: '/tmp/x' }, { kind: 'action', type: ActionType.SessionToolCallComplete, toolCallId: TOOL_USE_ID, success: true, content: [{ type: ToolResultContentType.Text, text: 'file contents' }] }, { kind: 'action', type: ActionType.SessionResponsePart, partKind: ResponsePartKind.Markdown, content: '' }, diff --git a/src/vs/platform/agentHost/test/node/claudeToolCallRegistry.test.ts b/src/vs/platform/agentHost/test/node/claudeToolCallRegistry.test.ts index 39f4aa735a48d..196846d27e938 100644 --- a/src/vs/platform/agentHost/test/node/claudeToolCallRegistry.test.ts +++ b/src/vs/platform/agentHost/test/node/claudeToolCallRegistry.test.ts @@ -50,7 +50,7 @@ suite('claudeToolCallRegistry — Phase 8.5 input/info tracking', () => { ); }); - test('finalize with malformed JSON falls back to undefined parsedInput', () => { + test('finalize with malformed JSON falls back to undefined parsedInput, preserves raw buffer as toolInput', () => { const registry = new ClaudeToolCallRegistry(); registry.begin('tu_2', 'Read', 'turn-1'); registry.appendInputDelta('tu_2', '{not valid json'); @@ -62,11 +62,15 @@ suite('claudeToolCallRegistry — Phase 8.5 input/info tracking', () => { parsedInput: entry?.info?.parsedInput, displayName: entry?.info?.displayName, invocationMessage: entry?.info?.invocationMessage, + // Raw buffer preserved so the UI still shows the SDK's payload + // instead of an empty input section. + toolInput: entry?.info?.toolInput, }, { parsedInput: undefined, displayName: 'Read file', invocationMessage: 'Reading file', + toolInput: '{not valid json', }, ); });