refactor(dogfooding): extract classify logic and add 107 tests#16
refactor(dogfooding): extract classify logic and add 107 tests#16
Conversation
… tests Splits the classification heuristics out of `tools/claude-code-hook.mjs` into a pure, dependency-free module at `tools/lib/classify.mjs`. The hook becomes a thin stdin → classify → propose → log layer. Adds `tools/lib/classify.test.mjs` with 107 table-driven vitest cases covering every branch: git push / force / force-with-lease, rm -rf flag permutations (+ false-positive guards for `git commit -m "rm -rf"`), git read/write/reset/destructive buckets, package-manager invocations, gh CLI, MCP read/write detection, null/undefined tool names, fallback. Writing the tests surfaced two classification gaps in the current heuristics (kept as-is in this refactor, documented as KNOWN GAP): 1. `npx vitest` falls through to `shell:exec:generic` — the regex only catches `npx <test|typecheck|lint|build>`. 2. Notion's `notion-search` / `notion-fetch` are misclassified as write because the read-verb detector expects the op name to *start* with read/list/search/fetch/..., but Notion double-namespaces ops with a `notion-` prefix. Both gaps are tracked in Kanbi task `oTphpxViAqvBobM3WMUz` (classification table re-tuning, after real shadow-mode data is collected). `vitest.config.ts` include pattern extended to pick up `tools/**/*.test.mjs`. Full suite: 197 tests pass (was 90). Kanbi: `SRxlAOrknM6VaVxxavIa`. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughHook-internal classification logic was extracted into Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Code Review
This pull request refactors the tool classification logic by extracting it from claude-code-hook.mjs into a dedicated classify.mjs module and adding comprehensive unit tests. Feedback focuses on improving the robustness of command classification, specifically refining regex patterns for recursive deletions and MCP read operations, as well as expanding coverage for common commands like npm ci and wget.
| /^(?:sudo\s+)?rm\s+(?:-[a-z]*r[a-z]*|--recursive)(?:\s|$)/.test(c) || | ||
| /^(?:sudo\s+)?rm\s+-[a-z]+\s+-[a-z]*r/.test(c) |
There was a problem hiding this comment.
The current regexes for recursive deletion only handle up to two flag blocks (e.g., rm -rf or rm -f -r). Commands with more flags, such as rm -f -v -r, will bypass this check and fall through to a generic shell execution with a much lower risk score (30 instead of 85). Improving the regex to handle an arbitrary number of flag blocks ensures more robust detection of destructive commands.
/^(?:sudo\\s+)?rm\\s+(?:-[a-z]+\\s+)*(-[a-z]*r[a-z]*|--recursive)(?:\\s|$)/.test(c)| return { type: "shell:read:query", riskScore: 5 }; | ||
| if (/^(npm|pnpm|yarn|npx)\s+(run\s+)?(test|typecheck|lint|build)\b/.test(c)) | ||
| return { type: "shell:test:run", riskScore: 10 }; | ||
| if (/^(npm|pnpm|yarn)\s+(publish|install|i\b|add|uninstall|remove)\b/.test(c)) |
There was a problem hiding this comment.
npm ci (and its equivalents in pnpm/yarn) is a common command for clean installations in CI and local development. It should be classified as a package mutation action to ensure it receives the appropriate risk score.
| if (/^(npm|pnpm|yarn)\s+(publish|install|i\b|add|uninstall|remove)\b/.test(c)) | |
| if (/^(npm|pnpm|yarn)\\s+(publish|install|i\\b|ci\\b|add|uninstall|remove)\\b/.test(c)) |
| ) | ||
| ) | ||
| return { type: "shell:gh:read", riskScore: 10 }; | ||
| if (/^curl\b/.test(c)) return { type: "shell:net:curl", riskScore: 30 }; |
There was a problem hiding this comment.
wget is a common alternative to curl for network requests and should be classified similarly to ensure consistent risk scoring for outbound network activity.
| if (/^curl\b/.test(c)) return { type: "shell:net:curl", riskScore: 30 }; | |
| if (/^(curl|wget)\\b/.test(c)) return { type: "shell:net:curl", riskScore: 30 }; |
| const parts = toolName.split("__"); | ||
| const server = parts[1] ?? "unknown"; | ||
| const op = parts.slice(2).join("_") || "unknown"; | ||
| const isRead = /(^(read|list|search|fetch|get|ls|find))/i.test(op); |
There was a problem hiding this comment.
The current read-operation detector for MCP tools uses a simple prefix match, which can lead to false positives. For example, an operation named listen or find_and_replace would be incorrectly classified as a "read" operation (risk score 10) instead of a "write" operation (risk score 40). Using a negative lookahead to ensure the verb is not followed by other letters (while still allowing separators or camelCase) improves accuracy.
| const isRead = /(^(read|list|search|fetch|get|ls|find))/i.test(op); | |
| const isRead = /^(read|list|search|fetch|get|ls|find)(?![a-z])/i.test(op); |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 317e21c35f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| import { dirname, join, resolve } from "node:path"; | ||
| import { fileURLToPath, pathToFileURL } from "node:url"; | ||
|
|
||
| import { classify } from "./lib/classify.mjs"; |
There was a problem hiding this comment.
Preserve fail-open semantics for classify import
Importing classify at module top-level means any load-time problem in tools/lib/classify.mjs (missing file, syntax error, unreadable file) will cause Node to terminate before main() runs, so the hook never reaches the safeExit fail-open path documented in this file. In a PreToolUse hook context, that can turn a local packaging/runtime issue into tool-call failures for every invocation instead of gracefully allowing calls through.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tools/lib/classify.mjs`:
- Around line 46-51: The current branch in classify.mjs that tests the command
string c against the read-only shell regex and returns { type:
"shell:read:query", riskScore: 5 } must skip commands containing shell
redirection; update the condition around the regex test to also check that c
does NOT contain redirection tokens (e.g., unescaped >, >>, 2>, 2>>, >& , <, or
pipe with output redirection) before returning the read-only bucket. In practice
modify the if that uses
/^(ls|cat|head|tail|pwd|echo|which|whoami|hostname|uname|date|env|wc|file)\b/.test(c)
to also assert a negative test like !/[^\S\r\n]*([0-9]*>+|<&|>\&|<|>>|\|[^|])/
(or simply /[^\S\r\n]([0-9]*>+|<|>>|&>)/) to detect redirection and bail out
when present so commands such as "echo foo > ~/.bashrc" or "cat README.md >>
/tmp/out" do not get classified as shell:read:query.
- Around line 27-28: The current check only matches "git clean -f" but misses
common destructive variants like "-fd" or "-df"; update the regex in
tools/lib/classify.mjs that tests the command string variable c so the "clean"
branch also matches any flag token containing "f" (and optionally "d") in any
order or combined form (e.g., "-f", "-fd", "-df"), and continue returning {
type: "shell:git:destructive", riskScore: 75 } for those matches.
- Around line 58-63: The current rule treats all "gh api" invocations as reads;
add a prior check against the input string variable c to detect mutating "gh
api" usage (match /^gh\s+api\b/ and look for mutating indicators like -X or
--method with POST/PUT/PATCH/DELETE, and field/flag usage such as -f/--field/-F)
and return a write classification (e.g., { type: "shell:gh:write", riskScore:
high }) before the existing read branch; update the condition order in
classify.mjs so the new mutating-gh-api check runs before the
/^gh\s+(pr\s+view|...|api\s+)/ read rule.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9017f325-028a-4421-bd08-ecc35233aaaa
📒 Files selected for processing (4)
tools/claude-code-hook.mjstools/lib/classify.mjstools/lib/classify.test.mjsvitest.config.ts
Fixes 8 issues raised by reviewers on the classify.mjs refactor: P1 (fail-open regression): classify is now dynamically imported inside main() — a broken classify.mjs falls into the fail-open path instead of crashing the hook at module load time. Classification fixes: - rm: regex now handles arbitrary flag permutations (`rm -f -v -r`) - git clean: detects `-fd` / `-df` / `-fdx` via lookahead over the tail - read-query: bails out on shell redirection (`echo x > ~/.bashrc`) - gh api: `-X POST`, `-f`, `-F`, `--field`, `--raw-field` classify as write - npm/pnpm/yarn ci: added to pkg:mutate bucket - wget: classified alongside curl as shell:net:curl - MCP lookahead: `(?![a-z])` prevents `listen` matching `list`; dropped `/i` flag so camelCase boundaries (`getBoard`) still resolve as read Tests added for each fix; full suite is 226 passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
tools/claude-code-hook.mjshad ~20 regex branches with zero test coverage. Therm -rfflag-permutation bug caught during PR feat(dogfooding): Claude Code PreToolUse hook that runs Tegata on self #15 review would have been caught pre-merge by a table-driven test.classifyBash/classifyMcp/classifyinto a pure, dependency-free module attools/lib/classify.mjs(no IO, no Tegata import — reusable by LangGraph / OpenAI Agents SDK adapters later).tools/lib/classify.test.mjscovering every branch including the false-positive guards (git commit -m "rm -rf"must NOT match the recursive-delete bucket).vitest.config.tsinclude pattern withtools/**/*.test.mjs.Classification gaps surfaced by the tests (kept as-is, documented as KNOWN GAP)
Writing the tests surfaced two real misclassifications in the current heuristics. Both are intentionally NOT fixed in this PR (out of scope — refactor only) and are tracked in Kanbi task
oTphpxViAqvBobM3WMUzfor post-shadow-mode-data tuning:npx vitestfalls through toshell:exec:generic(riskScore 30) — the regex only catchesnpx <test|typecheck|lint|build>with a fixed verb list.notion-search/notion-fetchare misclassified as write, because the read-verb detector expects the op name to start with read/list/search/fetch/..., but Notion prefixes its ops withnotion-.Verification
pnpm run test— 197 tests pass (was 90; +107 new for classify)pnpm run typecheckgreenpnpm run lintgreen (prettier + eslint)echo '{"tool_name":"Bash",...}' | node tools/claude-code-hook.mjs→ audit log entry written, exit 0Related
SRxlAOrknM6VaVxxavIaoTphpxViAqvBobM3WMUz🤖 Generated with Claude Code
Summary by CodeRabbit
Refactor
Bug Fixes
Tests