Skip to content

[luv-293] fix: canonicalize tool names across all agent CLIs so builtin policies fire#293

Merged
NiveditJain merged 1 commit intomainfrom
luv-293
May 5, 2026
Merged

[luv-293] fix: canonicalize tool names across all agent CLIs so builtin policies fire#293
NiveditJain merged 1 commit intomainfrom
luv-293

Conversation

@NiveditJain
Copy link
Copy Markdown
Member

@NiveditJain NiveditJain commented May 5, 2026

Summary

  • Builtin policies (["Bash"], ["Read","Glob","Grep","Bash"], …) match on tool names with case-sensitive Array.includes at policy-registry.ts:93-95, but Copilot's tool registry emits lowercase IDs (bash, read, …) — every Bash/Read/Write/Edit builtin silently no-ops under Copilot.
  • Adds COPILOT_TOOL_MAP (handler-side) plus OPENCODE_TOOL_MAP / PI_TOOL_MAP (shim-side; the OpenCode and Pi shims embed identical maps inline since they're loaded in-process by their host CLI). Replaces Pi's naive charAt(0).toUpperCase() heuristic that would have mis-canonicalized future multi-word tool IDs.
  • E2e fixture for Copilot now uses the real lowercase shape so the suite catches future regressions in this layer.

User-reported regression

Under GitHub Copilot CLI, an out-of-cwd ls piped through sed ran successfully despite an enabled block-read-outside-cwd policy. The same flow blocks correctly under Claude Code today.

Test plan

  • bun run test:run — 1457/1457 pass (added 4 unit tests covering the new Copilot canonicalization).
  • bun run test:e2e — 290/290 pass (added a pinned regression test in __tests__/e2e/hooks/copilot-integration.e2e.test.ts).
  • bunx tsc --noEmit — clean.
  • bun run lint — clean (only the pre-existing app/components/log-viewer/tool-input-output.tsx warning unrelated to this PR).
  • Manual repro pre-fix vs post-fix via the binary directly: payload with tool_name: bash and an out-of-cwd ls now returns permissionDecision: deny (pre-fix: allow).
  • Empirical capture of real-CLI stdin under Codex / Cursor / OpenCode / Pi to confirm tool-name shapes assumed in OPENCODE_TOOL_MAP / PI_TOOL_MAP (bash, read, write, edit, glob, grep, plus OpenCode's list, webfetch, todowrite, todoread). The handler/shim's ?? raw fallback keeps unknown tools functional even if the assumed set is incomplete; the regression test pins behaviour for bash specifically. Codex and Cursor remain passthrough until empirical evidence shows otherwise (per docs they're already PascalCase).

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes

  • Enhanced compatibility across Copilot, OpenCode, and Pi agents with improved tool handling, ensuring built-in policies function consistently
  • Corrected session log viewer vertical positioning to prevent list items from displaying at stale offsets after layout changes
  • Added installation diagnostics to detect shadowed or conflicting installs, providing version information and copy-pasteable cleanup commands

…in policies fire

Builtin policies match tool names in PascalCase (`["Bash"]`, `["Read","Glob","Grep","Bash"]`, …) via case-sensitive `Array.includes` at policy-registry.ts:93-95. But Copilot's tool registry emits lowercase IDs (`bash`, `read`, …) — confirmed by the session-log shape at `lib/copilot-sessions.ts:257` and `__tests__/lib/copilot-sessions.test.ts:87` — and OpenCode's plugin SDK exposes the same. Without canonicalization every Bash/Read/Write/Edit builtin silently no-ops under those CLIs.

User-reported regression: under GitHub Copilot CLI, `ls -la --almost-all $HOME | sed -n '1,200p'` ran successfully despite an enabled `block-read-outside-cwd` policy. The same command blocks correctly under Claude Code today.

Fix:
- `COPILOT_TOOL_MAP` in `src/hooks/types.ts` and a copilot branch in `handler.ts:canonicalizeToolName` (subprocess CLIs canonicalize handler-side).
- `OPENCODE_TOOL_MAP` and `PI_TOOL_MAP` in `src/hooks/types.ts`; the OpenCode and Pi shims embed identical maps inline (shims must be self-contained — they're loaded in-process by their host CLI). The Pi shim's previous `charAt(0).toUpperCase()` heuristic only worked for single-word tool IDs and would have mis-canonicalized future multi-word tools (`todo_write` to `Todo_write`, not `TodoWrite`).
- Update Copilot e2e payload fixture (`__tests__/e2e/helpers/payloads.ts`) to match the real lowercase shape; existing e2e assertions still pass because the handler now canonicalizes.

Tests:
- `__tests__/hooks/handler.test.ts` — three new Copilot canonicalization cases (bash to Bash regression, full COPILOT_TOOL_MAP coverage, unknown-tool passthrough), mirroring the existing Gemini coverage.
- `__tests__/hooks/opencode-plugin-shim.test.ts` — full OPENCODE_TOOL_MAP coverage + unknown-tool passthrough; updated existing assertion to expect `Bash` post-canonicalization.
- `__tests__/e2e/hooks/copilot-integration.e2e.test.ts` — pinned regression test reproducing the user-reported bug (block-read-outside-cwd plus an out-of-cwd ls denies under Copilot).

Manual verification: piping a stdin payload with `tool_name: "bash"` and an out-of-cwd `ls` to `bun bin/failproofai.mjs --hook PreToolUse --cli copilot` now returns `permissionDecision: "deny"` with `Bash read outside project directory blocked`. Pre-fix it returned no decision (allow).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 5, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 547d5bbf-5263-499c-a466-0bd03ce1a839

📥 Commits

Reviewing files that changed from the base of the PR and between 2497f5d and 0e87eaf.

📒 Files selected for processing (9)
  • CHANGELOG.md
  • __tests__/e2e/helpers/payloads.ts
  • __tests__/e2e/hooks/copilot-integration.e2e.test.ts
  • __tests__/hooks/handler.test.ts
  • __tests__/hooks/opencode-plugin-shim.test.ts
  • pi-extension/index.ts
  • src/hooks/handler.ts
  • src/hooks/integrations.ts
  • src/hooks/types.ts

📝 Walkthrough

Walkthrough

Tool-name canonicalization is introduced across Copilot, OpenCode, and Pi agent CLIs via new tool-name mapping constants (COPILOT_TOOL_MAP, OPENCODE_TOOL_MAP, PI_TOOL_MAP). These maps normalize lowercase tool IDs to Claude PascalCase names before policy evaluation. The handler, OpenCode shim, and Pi extension are updated to canonicalize incoming tool names. Comprehensive unit and end-to-end tests verify the canonicalization behavior.

Changes

Tool-Name Canonicalization Across Agent CLIs

Layer / File(s) Summary
Tool Mapping Constants
src/hooks/types.ts
Defines three new exported constants: COPILOT_TOOL_MAP, OPENCODE_TOOL_MAP, and PI_TOOL_MAP, each mapping lowercase/snake_case tool IDs to PascalCase canonical names (e.g., bashBash, str_replace_editorStrReplaceEditor).
Handler Canonicalization
src/hooks/handler.ts
Handler imports COPILOT_TOOL_MAP and extends canonicalizeToolName to canonicalize Copilot tool IDs via map lookup before policy evaluation and activity logging.
OpenCode Shim Integration
src/hooks/integrations.ts
OpenCode plugin shim template adds inline TOOL_NAME_MAP and canonicalizeTool helper; shim now canonicalizes tool IDs from input.tool before passing to PreToolUse, PostToolUse, and PermissionRequest hooks.
Pi Extension Integration
pi-extension/index.ts
Pi extension replaces generic first-character capitalization with explicit PI_TOOL_MAP lookup in canonicalizeToolName, supporting {bash, read, write, edit, glob, grep} tool IDs.
Test Fixtures
__tests__/e2e/helpers/payloads.ts
Copilot payload factories are updated to emit lowercase tool IDs (bash, write, read) in tool_name fields, reflecting the actual Copilot event structure.
Canonicalization Unit Tests
__tests__/hooks/handler.test.ts
Three new test cases verify Copilot tool canonicalization: (1) bash maps to Bash before policy evaluation, (2) all entries in COPILOT_TOOL_MAP are canonicalized correctly, (3) unknown Copilot tool names pass through unchanged.
Shim Canonicalization Tests
__tests__/hooks/opencode-plugin-shim.test.ts
Updated tool.execute.before assertion from "bash" to "Bash"; added two new tests verifying unknown tools pass through unchanged and all OPENCODE_TOOL_MAP entries canonicalize correctly.
E2E Integration Test
__tests__/e2e/hooks/copilot-integration.e2e.test.ts
New regression test verifies Copilot PreToolUse hook correctly blocks Bash commands targeting paths outside cwd under the block-read-outside-cwd policy.
Changelog
CHANGELOG.md
Documents tool-name canonicalization feature along with unrelated fixes for session log viewer vertical positioning and PATH shadow detection for failproofai.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰✨ Tool names canonicalized, now they're all the same,
Copilot, OpenCode, and Pi—each speaks Claude's name!
Maps and maps and maps galore, from lowercase to tall,
Policy denials sing so clear, when Bash answers the call.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically summarizes the main change: canonicalizing tool names across agent CLIs to fix builtin policy matching.
Description check ✅ Passed The PR description covers the problem statement, solution, user-reported regression, and test plan with comprehensive details matching the template's requirements.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch

Comment @coderabbitai help to get the list of available commands and usage tips.

@NiveditJain NiveditJain merged commit 86e5da6 into main May 5, 2026
9 checks passed
NiveditJain added a commit that referenced this pull request May 5, 2026
…atch + complete the PR #293 audit (#297)

PR #293 wired per-CLI tool-name canonicalization so builtin policies that match Claude PascalCase names (`Bash`, `Read`, `Write`, `Edit`, `Glob`, `Grep`) fire under non-Claude CLIs. The map for Copilot was incomplete: Copilot's `view` tool — used by the model both to read files and to list directory contents — was not mapped, so `block-read-outside-cwd` never fired on `view` calls. User-reported regression: with `block-read-outside-cwd` enabled under Copilot CLI, the model could still list `$HOME` via a `view` call (a single tool invocation with `tool_input: {path: "/home/user"}`) where PR #293 had only fixed the `bash ls -la` flow.

Empirical confirmation in this user's local session log at `~/.copilot/session-state/.../events.jsonl`: `{"type":"tool.execution_start","data":{"toolName":"view","arguments":{"path":"/home/nivedit"}}}` against Copilot CLI 1.0.39.

Auditing all seven supported CLIs against their public tool registries plus on-disk session evidence revealed three more gaps in the same class:

- Copilot was missing `view`, `create`, `apply_patch`, `web_fetch`, `powershell`, `*_bash`/`*_powershell` (the eight session-management tools), `rg`, `show_file` — directory/file reads, file creation, patches, PowerShell, web fetches all bypassed policies.
- Cursor (PR #293 left it as passthrough) — Cursor uses `Shell` for what Claude calls `Bash`, so every Bash builtin (`block-sudo`, `block-rm-rf`, `block-read-outside-cwd` Bash branch, …) silently no-op'd on Cursor sessions.
- Codex (PR #293 left it as passthrough) — Codex hooks report `tool_name: "apply_patch"` even when matchers say `Edit`/`Write`; live sessions also expose `write_stdin` which sends input to a running shell.
- OpenCode was missing `apply_patch` and `websearch`.

Fix:
- Extend `COPILOT_TOOL_MAP` in `src/hooks/types.ts` with the full Copilot CLI tool surface — `view`/`show_file` → `Read`, `create` → `Write`, `apply_patch` → `Edit`, `web_fetch` → `WebFetch`, `powershell` and the `*_bash`/`*_powershell` session-management tools → `Bash`, `rg` → `Grep`.
- Extend `OPENCODE_TOOL_MAP` with `apply_patch` → `Edit`, `websearch` → `WebSearch`. Mirror the same entries in the OpenCode shim template at `src/hooks/integrations.ts:734` (the shim must stay self-contained — it's loaded in-process by opencode).
- Add `CURSOR_TOOL_MAP` (`Shell` → `Bash`) and `CODEX_TOOL_MAP` (`apply_patch` → `Edit`, `write_stdin` → `Bash`) in `src/hooks/types.ts`, plus matching cursor/codex branches in `handler.ts:canonicalizeToolName`. `apply_patch` maps to `Edit` (not `Write`) for consistency with the existing `str_replace_editor` → `Edit` entry; the choice was confirmed via AskUserQuestion. The trade-off is documented: `Edit` preserves Claude semantics (Claude's own `Edit` tool doesn't trigger `block-write-outside-cwd` either), while `Write` would have been stricter but inconsistent with Claude.

Tests:
- `__tests__/hooks/handler.test.ts` — extend the existing per-Copilot canonicalization loop to cover every new entry (with `[view, Read]` listed first as the regression anchor); add new test blocks for Cursor (`Shell` → `Bash`, plus passthrough for `Read`/`Write`/`Grep`/`Delete`/`Task`/`MCP:*`) and Codex (`apply_patch` → `Edit`, `write_stdin` → `Bash`, plus passthrough for `Bash`/`mcp__*`).
- `__tests__/e2e/hooks/copilot-integration.e2e.test.ts` — pinned regression test "blocks `view` of a path outside cwd under Copilot (regression for #295)" mirroring the PR #293 `ls -la` test, plus a new `CopilotPayloads.preToolUse.view` factory in `__tests__/e2e/helpers/payloads.ts`.
- `__tests__/hooks/opencode-plugin-shim.test.ts` — extend the OPENCODE_TOOL_MAP coverage loop with `apply_patch` and `websearch`.

Verified: `bun run test:run` → 1461 passed, `bun run test:e2e` → 291 passed (Copilot e2e went 11 → 12), `bunx tsc --noEmit` → clean. Manual repro of all three: Copilot `view /etc` denies, Cursor `Shell sudo …` denies, Codex `write_stdin` denies (canonicalized to `Bash`).

Pre-existing item not in this PR: the dogfood `.opencode/plugins/failproofai.mjs` was never updated with `TOOL_NAME_MAP` after PR #293 (the template was updated but the dogfood file is hand-maintained). Production users get the correct map via the template; only contributors running OpenCode against this repo are affected. Reading or rewriting that file requires temporarily disabling the `block-read-outside-cwd` agent-settings guard — deferred to a follow-up PR.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
NiveditJain added a commit that referenced this pull request May 10, 2026
* [luv-338] fix: canonicalize OpenCode + Pi tool-input arg keys so path-checking builtins fire

OpenCode's `read` slipped past `block-read-outside-cwd`: the shim
canonicalized the tool name (`read` → `Read`) but forwarded `output.args`
verbatim. OpenCode delivers `{ filePath }`; the policy reads
`ctx.toolInput.file_path`, so `getFilePath()` returned "" and the
target-empty short-circuit at builtin-policies.ts:799 allowed the read.

Same family of bug on Pi with a different mismatch shape: Pi's
read/write/edit use `path`. `block-read-outside-cwd` happens to work
on Pi via the existing `tool_input.path` fallback at builtin-policies.ts:796,
but `block-env-files` and `block-secrets-write` only check `file_path`
and were silently no-op'd.

Mirrors the TOOL_NAME_MAP pattern from #293 with two new per-tool
input-key maps keyed by canonical PascalCase tool name:

  • OPENCODE_TOOL_INPUT_MAP — Read/Write/Edit: filePath→file_path,
    plus oldString/newString/replaceAll for Edit
  • PI_TOOL_INPUT_MAP — Read/Write/Edit: path→file_path
    (Pi's nested edits[{oldText,newText}] array doesn't flat-rename,
    no current builtin reads it)

Both maps are mirrored inline in their shims so .opencode/plugins/
failproofai.mjs and pi-extension/index.ts stay self-contained. MCP
mcp_* and any unmapped tool pass through unchanged.

Existing users must re-run `failproofai policies --install --cli opencode`
/ `--cli pi` to regenerate their shims and pick up the fix.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* [luv-338] docs: document OPENCODE_TOOL_INPUT_MAP and PI_TOOL_INPUT_MAP in configuration.mdx

Mirror the existing tool-name canonicalization wording in the OpenCode
and Pi sections. Calls out which builtin policies start firing as a
result (block-read-outside-cwd / block-env-files / block-secrets-write
on OpenCode; block-env-files / block-secrets-write on Pi).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* [luv-338] chore: stamp PR number in CHANGELOG entry

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant