
refactor(dogfooding): extract classify logic and add 107 tests #16

Merged
renasami merged 2 commits into main from
refactor/extract-classify-for-testing
Apr 20, 2026

Conversation

@renasami
Owner

@renasami renasami commented Apr 20, 2026

Summary

  • Addresses M1 from the review of PR #15 (feat(dogfooding): Claude Code PreToolUse hook that runs Tegata on self): the classification heuristics in tools/claude-code-hook.mjs had ~20 regex branches with zero test coverage. A table-driven test would have caught the rm -rf flag-permutation bug found during that review before merge.
  • Extracts classifyBash / classifyMcp / classify into a pure, dependency-free module at tools/lib/classify.mjs (no IO, no Tegata import — reusable by LangGraph / OpenAI Agents SDK adapters later).
  • Adds 107 table-driven vitest cases in tools/lib/classify.test.mjs covering every branch including the false-positive guards (git commit -m "rm -rf" must NOT match the recursive-delete bucket).
  • Extends vitest.config.ts include pattern with tools/**/*.test.mjs.
  • Hook body shrinks from 235 lines to 140 — stdin → classify → propose → log.
  • No behavior change. This is a pure refactor.
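As a rough illustration of the table-driven style described above, here is a minimal sketch. It assumes a stand-in classifier (`classifyBashSketch`) rather than the real `classifyBash` from tools/lib/classify.mjs, and the type name `shell:fs:recursive-delete` is hypothetical; only `shell:exec:generic` and the scores 85 and 30 appear in this thread. A plain loop is used to keep the example self-contained; in vitest the same rows could feed `it.each`.

```javascript
// Stand-in classifier for illustration only; not the code in tools/lib/classify.mjs.
// "shell:fs:recursive-delete" is a hypothetical type name.
function classifyBashSketch(command) {
  const c = command.trim();
  if (/^(?:sudo\s+)?rm\s+(?:-[a-z]+\s+)*(?:-[a-z]*r[a-z]*|--recursive)(?:\s|$)/.test(c)) {
    return { type: "shell:fs:recursive-delete", riskScore: 85 };
  }
  return { type: "shell:exec:generic", riskScore: 30 };
}

// Each row is [command, expected type]. Adding a case means adding a row.
const cases = [
  ["rm -rf build", "shell:fs:recursive-delete"],
  ["rm -fr build", "shell:fs:recursive-delete"],
  ["rm -f -r build", "shell:fs:recursive-delete"],
  ["sudo rm --recursive build", "shell:fs:recursive-delete"],
  // False-positive guards: "rm -rf" inside a commit message, and rm without -r.
  ['git commit -m "rm -rf"', "shell:exec:generic"],
  ["rm notes.txt", "shell:exec:generic"],
];

// Returns the commands whose classification differs from the expected type.
function runTable(classifier, rows) {
  return rows
    .filter(([command, expected]) => classifier(command).type !== expected)
    .map(([command]) => command);
}

const failures = runTable(classifyBashSketch, cases);
console.log(failures.length === 0 ? "all rows pass" : failures);
```

Because the classifier is pure (no IO, no Tegata import), a table like this needs no mocks or fixtures.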

Classification gaps surfaced by the tests (kept as-is, documented as KNOWN GAP)

Writing the tests surfaced two real misclassifications in the current heuristics. Both are intentionally NOT fixed in this PR (out of scope — refactor only) and are tracked in Kanbi task oTphpxViAqvBobM3WMUz for post-shadow-mode-data tuning:

  1. npx vitest falls through to shell:exec:generic (riskScore 30) — the regex only catches npx <test|typecheck|lint|build> with a fixed verb list.
  2. Notion's notion-search / notion-fetch are misclassified as write, because the read-verb detector expects the op name to start with read/list/search/fetch/..., but Notion prefixes its ops with notion-.
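Both gaps can be reproduced with the patterns quoted later in the review threads on tools/lib/classify.mjs. The sketch below assumes those quoted lines reflect the code at the time of this description; the tool names `mcp__notion__notion-search` and `mcp__docs__search_pages` are hypothetical examples of the `mcp__<server>__<op>` shape implied by the quoted `split("__")` logic.

```javascript
// Patterns as quoted in the review threads (pre-fix versions).
const testRunPattern = /^(npm|pnpm|yarn|npx)\s+(run\s+)?(test|typecheck|lint|build)\b/;
const readVerbPattern = /(^(read|list|search|fetch|get|ls|find))/i;

// Op-name extraction as quoted in the review thread.
function mcpOp(toolName) {
  const parts = toolName.split("__");
  return parts.slice(2).join("_") || "unknown";
}

// Gap 1: only the fixed verb list is recognised after npx.
console.log(testRunPattern.test("npx test"));   // true
console.log(testRunPattern.test("npx vitest")); // false, so it falls through

// Gap 2: the op name starts with "notion-", not with a read verb.
const notionOp = mcpOp("mcp__notion__notion-search"); // hypothetical tool name
console.log(notionOp);                       // notion-search
console.log(readVerbPattern.test(notionOp)); // false, so it is treated as write

// For contrast, an op that does start with a read verb (hypothetical name).
console.log(readVerbPattern.test(mcpOp("mcp__docs__search_pages"))); // true
```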

Verification

  • pnpm run test — 197 tests pass (was 90; +107 new for classify)
  • pnpm run typecheck green
  • pnpm run lint green (prettier + eslint)
  • Smoke test: echo '{"tool_name":"Bash",...}' | node tools/claude-code-hook.mjs → audit log entry written, exit 0

Related

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Refactor

    • Moved action classification into a dedicated runtime-loaded module for clearer separation and maintainability.
  • Bug Fixes

    • Hook now fails open if the classification module cannot be loaded at runtime, so tool calls proceed instead of the hook crashing.
  • Tests

    • Added comprehensive tests for classification and risk scoring.
    • Test suite configuration extended to include the new tests.

… tests

Splits the classification heuristics out of `tools/claude-code-hook.mjs`
into a pure, dependency-free module at `tools/lib/classify.mjs`. The
hook becomes a thin stdin → classify → propose → log layer.

Adds `tools/lib/classify.test.mjs` with 107 table-driven vitest cases
covering every branch: git push / force / force-with-lease, rm -rf
flag permutations (+ false-positive guards for `git commit -m "rm -rf"`),
git read/write/reset/destructive buckets, package-manager invocations,
gh CLI, MCP read/write detection, null/undefined tool names, fallback.

Writing the tests surfaced two classification gaps in the current
heuristics (kept as-is in this refactor, documented as KNOWN GAP):

1. `npx vitest` falls through to `shell:exec:generic` — the regex only
   catches `npx <test|typecheck|lint|build>`.
2. Notion's `notion-search` / `notion-fetch` are misclassified as write
   because the read-verb detector expects the op name to *start* with
   read/list/search/fetch/..., but Notion double-namespaces ops with a
   `notion-` prefix.

Both gaps are tracked in Kanbi task `oTphpxViAqvBobM3WMUz` (classification
table re-tuning, after real shadow-mode data is collected).

`vitest.config.ts` include pattern extended to pick up `tools/**/*.test.mjs`.
Full suite: 197 tests pass (was 90).

Kanbi: `SRxlAOrknM6VaVxxavIa`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai

coderabbitai bot commented Apr 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: eac2fb66-e93a-4488-bc59-8daccd30930a

📥 Commits

Reviewing files that changed from the base of the PR and between 317e21c and 9f030e9.

📒 Files selected for processing (3)
  • tools/claude-code-hook.mjs
  • tools/lib/classify.mjs
  • tools/lib/classify.test.mjs
🚧 Files skipped from review as they are similar to previous changes (2)
  • tools/claude-code-hook.mjs
  • tools/lib/classify.mjs

📝 Walkthrough

Walkthrough

Hook-internal classification logic was extracted into tools/lib/classify.mjs; the hook now dynamically imports classify at runtime and defers to it for type and riskScore decisions; comprehensive tests and test-config updates were added.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Hook & Classification Module: tools/claude-code-hook.mjs, tools/lib/classify.mjs | Removed in-file classifiers from the hook; added tools/lib/classify.mjs exporting classifyBash, classifyMcp, and classify. Hook performs dynamic import() of ./lib/classify.mjs, fails open on import error, and uses returned {type,riskScore} for tegata.propose/audit/enforce logic. |
| Tests & Test Config: tools/lib/classify.test.mjs, vitest.config.ts | Added Vitest suite covering bash, MCP, and tool-name classification behaviors and edge cases; expanded vitest.config.ts test.include to include tools/**/*.test.mjs. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 I hopped through code with curious eyes,
Pulled classification out to fresh blue skies,
Tests lined up like carrots in a row,
Now imports fetch where patterns grow,
A tidy burrow — oh what a prize! 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main changes: extracting classification logic into a separate module and adding a comprehensive test suite. It directly reflects the core refactoring work. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the tool classification logic by extracting it from claude-code-hook.mjs into a dedicated classify.mjs module and adding comprehensive unit tests. Feedback focuses on improving the robustness of command classification, specifically refining regex patterns for recursive deletions and MCP read operations, as well as expanding coverage for common commands like npm ci and wget.

Comment thread tools/lib/classify.mjs Outdated
Comment on lines +42 to +43
/^(?:sudo\s+)?rm\s+(?:-[a-z]*r[a-z]*|--recursive)(?:\s|$)/.test(c) ||
/^(?:sudo\s+)?rm\s+-[a-z]+\s+-[a-z]*r/.test(c)

security-high high

The current regexes for recursive deletion only handle up to two flag blocks (e.g., rm -rf or rm -f -r). Commands with more flags, such as rm -f -v -r, will bypass this check and fall through to a generic shell execution with a much lower risk score (30 instead of 85). Improving the regex to handle an arbitrary number of flag blocks ensures more robust detection of destructive commands.

    /^(?:sudo\s+)?rm\s+(?:-[a-z]+\s+)*(-[a-z]*r[a-z]*|--recursive)(?:\s|$)/.test(c)
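To make the difference concrete, the sketch below runs the two original patterns (as quoted in this thread) and the suggested single pattern against a few commands. The helper names `before` and `after` are illustrative only, and the sample commands are assumptions chosen to exercise each branch.

```javascript
// Original two-pattern check, as quoted from tools/lib/classify.mjs lines 42-43.
const before = (c) =>
  /^(?:sudo\s+)?rm\s+(?:-[a-z]*r[a-z]*|--recursive)(?:\s|$)/.test(c) ||
  /^(?:sudo\s+)?rm\s+-[a-z]+\s+-[a-z]*r/.test(c);

// Suggested pattern: any number of leading flag blocks before the one containing "r".
const after = (c) =>
  /^(?:sudo\s+)?rm\s+(?:-[a-z]+\s+)*(-[a-z]*r[a-z]*|--recursive)(?:\s|$)/.test(c);

const samples = [
  "rm -rf dist",            // before: true,  after: true
  "rm -f -r dist",          // before: true,  after: true
  "rm -f -v -r dist",       // before: false, after: true  (the reported bypass)
  'git commit -m "rm -rf"', // before: false, after: false (anchored at ^)
  "rm -f dist",             // before: false, after: false (no recursive flag)
];

for (const cmd of samples) {
  console.log(cmd, before(cmd), after(cmd));
}
```

The `^` anchor is what keeps `git commit -m "rm -rf"` out of the recursive-delete bucket in both versions.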

Comment thread tools/lib/classify.mjs Outdated
return { type: "shell:read:query", riskScore: 5 };
if (/^(npm|pnpm|yarn|npx)\s+(run\s+)?(test|typecheck|lint|build)\b/.test(c))
return { type: "shell:test:run", riskScore: 10 };
if (/^(npm|pnpm|yarn)\s+(publish|install|i\b|add|uninstall|remove)\b/.test(c))

medium

npm ci (and its equivalents in pnpm/yarn) is a common command for clean installations in CI and local development. It should be classified as a package mutation action to ensure it receives the appropriate risk score.

Suggested change
if (/^(npm|pnpm|yarn)\s+(publish|install|i\b|add|uninstall|remove)\b/.test(c))
if (/^(npm|pnpm|yarn)\s+(publish|install|i\b|ci\b|add|uninstall|remove)\b/.test(c))

Comment thread tools/lib/classify.mjs Outdated
)
)
return { type: "shell:gh:read", riskScore: 10 };
if (/^curl\b/.test(c)) return { type: "shell:net:curl", riskScore: 30 };

medium

wget is a common alternative to curl for network requests and should be classified similarly to ensure consistent risk scoring for outbound network activity.

Suggested change
if (/^curl\b/.test(c)) return { type: "shell:net:curl", riskScore: 30 };
if (/^(curl|wget)\b/.test(c)) return { type: "shell:net:curl", riskScore: 30 };

Comment thread tools/lib/classify.mjs Outdated
const parts = toolName.split("__");
const server = parts[1] ?? "unknown";
const op = parts.slice(2).join("_") || "unknown";
const isRead = /(^(read|list|search|fetch|get|ls|find))/i.test(op);

security-medium medium

The current read-operation detector for MCP tools uses a simple prefix match, which can lead to false positives. For example, an operation named listen or find_and_replace would be incorrectly classified as a "read" operation (risk score 10) instead of a "write" operation (risk score 40). Using a negative lookahead to ensure the verb is not followed by other letters (while still allowing separators or camelCase) improves accuracy.

Suggested change
const isRead = /(^(read|list|search|fetch|get|ls|find))/i.test(op);
const isRead = /^(read|list|search|fetch|get|ls|find)(?![a-z])/i.test(op);
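One subtlety worth checking is how the lookahead interacts with the `/i` flag. The sketch below compares three variants on a handful of op names; the variable names and the sample ops are illustrative assumptions, not taken from the module.

```javascript
// Original prefix match, the suggested lookahead (still with /i), and the same
// lookahead without /i for comparison.
const prefixOnly = /(^(read|list|search|fetch|get|ls|find))/i;
const lookaheadWithI = /^(read|list|search|fetch|get|ls|find)(?![a-z])/i;
const lookaheadNoI = /^(read|list|search|fetch|get|ls|find)(?![a-z])/;

const ops = ["list_pages", "listen", "getBoard", "find_and_replace"];

for (const op of ops) {
  console.log(op, prefixOnly.test(op), lookaheadWithI.test(op), lookaheadNoI.test(op));
}
// list_pages        true true  true
// listen            true false false
// getBoard          true false true   (under /i, [a-z] also matches "B")
// find_and_replace  true true  true   ("_" is a separator, so the lookahead passes)
```

Under these assumptions, the lookahead rejects `listen`, but keeping `/i` also rejects camelCase names such as `getBoard`, and `find_and_replace` still resolves as read in every variant, so it would need a separate rule.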


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 317e21c35f


Comment thread tools/claude-code-hook.mjs Outdated
import { dirname, join, resolve } from "node:path";
import { fileURLToPath, pathToFileURL } from "node:url";

import { classify } from "./lib/classify.mjs";

P1: Preserve fail-open semantics for classify import

Importing classify at module top-level means any load-time problem in tools/lib/classify.mjs (missing file, syntax error, unreadable file) will cause Node to terminate before main() runs, so the hook never reaches the safeExit fail-open path documented in this file. In a PreToolUse hook context, that can turn a local packaging/runtime issue into tool-call failures for every invocation instead of gracefully allowing calls through.
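One way to keep the fail-open behaviour is to load the module with a dynamic `import()` inside a try/catch. The following is a minimal sketch under that assumption, not the hook's actual code: `loadClassify` and `decide` are hypothetical names, and the returned object shape is invented for illustration (the real hook routes failures through its `safeExit` path).

```javascript
// Load the classifier lazily so that a missing or broken module rejects a
// promise instead of terminating Node at module load time.
// Returns null when the module cannot be loaded or has no classify export.
async function loadClassify(specifier) {
  try {
    const mod = await import(specifier);
    return typeof mod.classify === "function" ? mod.classify : null;
  } catch {
    return null;
  }
}

// Hypothetical decision step: when no classifier is available, allow the
// tool call through (fail open) rather than blocking every invocation.
async function decide(specifier) {
  const classify = await loadClassify(specifier);
  if (classify === null) {
    return { allowed: true, failOpen: true };
  }
  return { allowed: true, failOpen: false, classify };
}

decide("./lib/classify.mjs").then((decision) => {
  console.log(decision.failOpen ? "classifier unavailable, failing open" : "classifier loaded");
});
```

A static top-level `import` cannot be wrapped this way, which is why the load has to move inside the function that owns the error handling.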



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tools/lib/classify.mjs`:
- Around line 46-51: The current branch in classify.mjs that tests the command
string c against the read-only shell regex and returns { type:
"shell:read:query", riskScore: 5 } must skip commands containing shell
redirection; update the condition around the regex test to also check that c
does NOT contain redirection tokens (e.g., unescaped >, >>, 2>, 2>>, >& , <, or
pipe with output redirection) before returning the read-only bucket. In practice
modify the if that uses
/^(ls|cat|head|tail|pwd|echo|which|whoami|hostname|uname|date|env|wc|file)\b/.test(c)
to also assert a negative test like !/[^\S\r\n]*([0-9]*>+|<&|>\&|<|>>|\|[^|])/
(or simply /[^\S\r\n]([0-9]*>+|<|>>|&>)/) to detect redirection and bail out
when present so commands such as "echo foo > ~/.bashrc" or "cat README.md >>
/tmp/out" do not get classified as shell:read:query.
- Around line 27-28: The current check only matches "git clean -f" but misses
common destructive variants like "-fd" or "-df"; update the regex in
tools/lib/classify.mjs that tests the command string variable c so the "clean"
branch also matches any flag token containing "f" (and optionally "d") in any
order or combined form (e.g., "-f", "-fd", "-df"), and continue returning {
type: "shell:git:destructive", riskScore: 75 } for those matches.
- Around line 58-63: The current rule treats all "gh api" invocations as reads;
add a prior check against the input string variable c to detect mutating "gh
api" usage (match /^gh\s+api\b/ and look for mutating indicators like -X or
--method with POST/PUT/PATCH/DELETE, and field/flag usage such as -f/--field/-F)
and return a write classification (e.g., { type: "shell:gh:write", riskScore:
high }) before the existing read branch; update the condition order in
classify.mjs so the new mutating-gh-api check runs before the
/^gh\s+(pr\s+view|...|api\s+)/ read rule.
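The three findings above can be sketched together as guards placed ahead of the permissive rules. This is an illustration under stated assumptions, not the merged implementation: `classifySketch` is a hypothetical function, the redirection, `git clean`, and `gh api` patterns are simplified stand-ins, and only type strings are returned because the thread does not give a score for `shell:gh:write`.

```javascript
// Read-only command list as quoted in the finding above.
const READ_ONLY = /^(ls|cat|head|tail|pwd|echo|which|whoami|hostname|uname|date|env|wc|file)\b/;
// Assumption: treat any < or > as redirection, even inside quotes (errs toward higher risk).
const REDIRECT = /[<>]/;
// Assumption: a short flag block containing "f", or --force, anywhere after "git clean".
const GIT_CLEAN_FORCE = /^git\s+clean\b(?=.*\s(?:-[a-zA-Z]*f|--force\b))/;
const GH_API = /^gh\s+api\b/;
// Assumption: an explicit mutating method or any field flag marks a write.
// The /i flag lets "-f" also cover "-F".
const GH_API_MUTATING =
  /\s(?:-X|--method)[\s=]*(?:POST|PUT|PATCH|DELETE)\b|\s(?:-f|--field|--raw-field)(?=[\s=]|$)/i;

function classifySketch(command) {
  const c = command.trim();
  if (GIT_CLEAN_FORCE.test(c)) return "shell:git:destructive";
  // The mutating check runs before gh api is treated as a read.
  if (GH_API.test(c)) return GH_API_MUTATING.test(c) ? "shell:gh:write" : "shell:gh:read";
  // Read-only commands qualify only when nothing is redirected.
  if (READ_ONLY.test(c) && !REDIRECT.test(c)) return "shell:read:query";
  return "shell:exec:generic";
}

const samples = [
  "cat README.md",
  "echo foo > ~/.bashrc",
  "git clean -fd",
  "git clean -n",
  "gh api repos/o/r",
  "gh api repos/o/r/issues -f title=x",
];
for (const cmd of samples) {
  console.log(cmd, "=>", classifySketch(cmd));
}
```

The ordering matters more than the exact patterns: each riskier check has to run before the broader, lower-risk rule that would otherwise claim the command.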

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9017f325-028a-4421-bd08-ecc35233aaaa

📥 Commits

Reviewing files that changed from the base of the PR and between cf5e90a and 317e21c.

📒 Files selected for processing (4)
  • tools/claude-code-hook.mjs
  • tools/lib/classify.mjs
  • tools/lib/classify.test.mjs
  • vitest.config.ts

Fixes 8 issues raised by reviewers on the classify.mjs refactor:

P1 (fail-open regression): classify is now dynamically imported inside
main() — a broken classify.mjs falls into the fail-open path instead of
crashing the hook at module load time.

Classification fixes:
- rm: regex now handles arbitrary flag permutations (`rm -f -v -r`)
- git clean: detects `-fd` / `-df` / `-fdx` via lookahead over the tail
- read-query: bails out on shell redirection (`echo x > ~/.bashrc`)
- gh api: `-X POST`, `-f`, `-F`, `--field`, `--raw-field` classify as write
- npm/pnpm/yarn ci: added to pkg:mutate bucket
- wget: classified alongside curl as shell:net:curl
- MCP lookahead: `(?![a-z])` prevents `listen` matching `list`; dropped
  `/i` flag so camelCase boundaries (`getBoard`) still resolve as read

Tests added for each fix; full suite is 226 passing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@renasami renasami merged commit f86aab9 into main Apr 20, 2026
2 checks passed