refactor(dogfooding): extract classify logic and add 107 tests by renasami · Pull Request #16 · renasami/tegata

renasami · 2026-04-20T16:09:30Z

Summary

Addresses M1 from the review of PR #15 (feat(dogfooding): Claude Code PreToolUse hook that runs Tegata on self): the classification heuristics in tools/claude-code-hook.mjs had ~20 regex branches with zero test coverage. The rm -rf flag-permutation bug caught during that review would have been caught pre-merge by a table-driven test.
Extracts classifyBash / classifyMcp / classify into a pure, dependency-free module at tools/lib/classify.mjs (no IO, no Tegata import — reusable by LangGraph / OpenAI Agents SDK adapters later).
Adds 107 table-driven vitest cases in tools/lib/classify.test.mjs covering every branch including the false-positive guards (git commit -m "rm -rf" must NOT match the recursive-delete bucket).
Extends vitest.config.ts include pattern with tools/**/*.test.mjs.
Hook body shrinks from 235 lines to 140 — stdin → classify → propose → log.
No behavior change. This is a pure refactor.

Classification gaps surfaced by the tests (kept as-is, documented as KNOWN GAP)

Writing the tests surfaced two real misclassifications in the current heuristics. Both are intentionally NOT fixed in this PR (out of scope — refactor only) and are tracked in Kanbi task oTphpxViAqvBobM3WMUz for post-shadow-mode-data tuning:

npx vitest falls through to shell:exec:generic (riskScore 30) — the regex only catches npx <test|typecheck|lint|build> with a fixed verb list.
Notion's notion-search / notion-fetch are misclassified as write, because the read-verb detector expects the op name to start with read/list/search/fetch/..., but Notion prefixes its ops with notion-.
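
Both gaps can be reproduced in isolation. The sketch below reuses the test-run regex and the MCP op parsing as they are quoted in the review diffs further down this page (the pre-fix state). The tool name `mcp__notion__notion-search` is a hypothetical example of a double-namespaced op, and `mcpOp` is an illustrative helper name, not something defined in the PR.

```javascript
// Regexes copied from the review diffs on this page (pre-fix state).
const TEST_RUN = /^(npm|pnpm|yarn|npx)\s+(run\s+)?(test|typecheck|lint|build)\b/;
const READ_VERB = /(^(read|list|search|fetch|get|ls|find))/i;

// Mirrors the op extraction shown in the review: "<prefix>__<server>__<op>".
// The helper name is hypothetical.
function mcpOp(toolName) {
  const parts = toolName.split("__");
  return parts.slice(2).join("_") || "unknown";
}

// Gap 1: the verb list is fixed, so a runner invoked by its own name is missed.
const gap1 = {
  "pnpm run test": TEST_RUN.test("pnpm run test"),
  "npx vitest": TEST_RUN.test("npx vitest"),
};

// Gap 2: the op starts with "notion-", so the anchored read-verb match fails.
const notionOp = mcpOp("mcp__notion__notion-search"); // hypothetical tool name
const notionIsRead = READ_VERB.test(notionOp);

console.log(gap1, notionOp, notionIsRead);
```

Under these assumptions `npx vitest` does not hit the test-run bucket and `notion-search` is not detected as a read, which matches the two KNOWN GAP entries above.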

Verification

pnpm run test — 197 tests pass (was 90; +107 new for classify)
pnpm run typecheck green
pnpm run lint green (prettier + eslint)
Smoke test: echo '{"tool_name":"Bash",...}' | node tools/claude-code-hook.mjs → audit log entry written, exit 0

Summary by CodeRabbit

Refactor
- Moved action classification into a dedicated runtime-loaded module for clearer separation and maintainability.
Bug Fixes
- Hook now fails safe if classification cannot be loaded at runtime, avoiding crashes and preserving runtime flow.
Tests
- Added comprehensive tests for classification and risk scoring.
- Test suite configuration extended to include the new tests.

… tests

Splits the classification heuristics out of `tools/claude-code-hook.mjs` into a pure, dependency-free module at `tools/lib/classify.mjs`. The hook becomes a thin stdin → classify → propose → log layer.

Adds `tools/lib/classify.test.mjs` with 107 table-driven vitest cases covering every branch: git push / force / force-with-lease, rm -rf flag permutations (+ false-positive guards for `git commit -m "rm -rf"`), git read/write/reset/destructive buckets, package-manager invocations, gh CLI, MCP read/write detection, null/undefined tool names, fallback.

Writing the tests surfaced two classification gaps in the current heuristics (kept as-is in this refactor, documented as KNOWN GAP):

1. `npx vitest` falls through to `shell:exec:generic`: the regex only catches `npx <test|typecheck|lint|build>`.
2. Notion's `notion-search` / `notion-fetch` are misclassified as write because the read-verb detector expects the op name to *start* with read/list/search/fetch/..., but Notion double-namespaces ops with a `notion-` prefix.

Both gaps are tracked in Kanbi task `oTphpxViAqvBobM3WMUz` (classification table re-tuning, after real shadow-mode data is collected).

`vitest.config.ts` include pattern extended to pick up `tools/**/*.test.mjs`.

Full suite: 197 tests pass (was 90).

Kanbi: `SRxlAOrknM6VaVxxavIa`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai · 2026-04-20T16:09:43Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: eac2fb66-e93a-4488-bc59-8daccd30930a

📥 Commits

Reviewing files that changed from the base of the PR and between 317e21c and 9f030e9.

📒 Files selected for processing (3)

tools/claude-code-hook.mjs
tools/lib/classify.mjs
tools/lib/classify.test.mjs

🚧 Files skipped from review as they are similar to previous changes (2)

tools/claude-code-hook.mjs
tools/lib/classify.mjs

📝 Walkthrough

Walkthrough

Hook-internal classification logic was extracted into tools/lib/classify.mjs; the hook now dynamically imports classify at runtime and defers to it for type and riskScore decisions; comprehensive tests and test-config updates were added.

Changes

Cohort / File(s)	Summary
Hook & Classification Module `tools/claude-code-hook.mjs`, `tools/lib/classify.mjs`	Removed in-file classifiers from the hook; added `tools/lib/classify.mjs` exporting `classifyBash`, `classifyMcp`, and `classify`. Hook performs dynamic `import()` of `./lib/classify.mjs`, fails open on import error, and uses returned `{type,riskScore}` for tegata.propose/audit/enforce logic.
Tests & Test Config `tools/lib/classify.test.mjs`, `vitest.config.ts`	Added Vitest suite covering bash, MCP, and tool-name classification behaviors and edge cases; expanded `vitest.config.ts` test.include to include `tools/**/*.test.mjs`.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

feat(dogfooding): Claude Code PreToolUse hook that runs Tegata on self #15: Prior PR that introduced the Claude Code PreToolUse hook and its original in-file classification logic, which this change extracts and centralizes into tools/lib/classify.mjs.

Poem

🐰 I hopped through code with curious eyes,
Pulled classification out to fresh blue skies,
Tests lined up like carrots in a row,
Now imports fetch where patterns grow,
A tidy burrow — oh what a prize! 🥕

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely summarizes the main changes: extracting classification logic into a separate module and adding a comprehensive test suite. It directly reflects the core refactoring work.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


gemini-code-assist

Code Review

This pull request refactors the tool classification logic by extracting it from claude-code-hook.mjs into a dedicated classify.mjs module and adding comprehensive unit tests. Feedback focuses on improving the robustness of command classification, specifically refining regex patterns for recursive deletions and MCP read operations, as well as expanding coverage for common commands like npm ci and wget.

gemini-code-assist · 2026-04-20T16:11:40Z

+    /^(?:sudo\s+)?rm\s+(?:-[a-z]*r[a-z]*|--recursive)(?:\s|$)/.test(c) ||
+    /^(?:sudo\s+)?rm\s+-[a-z]+\s+-[a-z]*r/.test(c)


The current regexes for recursive deletion only handle up to two flag blocks (e.g., rm -rf or rm -f -r). Commands with more flags, such as rm -f -v -r, will bypass this check and fall through to a generic shell execution with a much lower risk score (30 instead of 85). Improving the regex to handle an arbitrary number of flag blocks ensures more robust detection of destructive commands.

/^(?:sudo\s+)?rm\s+(?:-[a-z]+\s+)*(-[a-z]*r[a-z]*|--recursive)(?:\s|$)/.test(c)
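
To make the difference concrete, the sketch below runs the two original patterns and the suggested single pattern against a few commands. The regexes are taken from this comment. The wrapper names `originalIsRecursiveRm` and `suggestedIsRecursiveRm` and the sample commands are illustrative only and do not come from the PR.

```javascript
// Original pair of checks, as quoted in the diff hunk above.
function originalIsRecursiveRm(c) {
  return (
    /^(?:sudo\s+)?rm\s+(?:-[a-z]*r[a-z]*|--recursive)(?:\s|$)/.test(c) ||
    /^(?:sudo\s+)?rm\s+-[a-z]+\s+-[a-z]*r/.test(c)
  );
}

// Suggested replacement: any number of leading flag blocks before the one containing "r".
function suggestedIsRecursiveRm(c) {
  return /^(?:sudo\s+)?rm\s+(?:-[a-z]+\s+)*(-[a-z]*r[a-z]*|--recursive)(?:\s|$)/.test(c);
}

// Sample commands (hypothetical inputs chosen for illustration).
const samples = [
  "rm -rf build",
  "rm -f -r build",
  "rm -f -v -r build",
  'git commit -m "rm -rf"',
  "rm notes.txt",
];

const results = samples.map((c) => ({
  command: c,
  original: originalIsRecursiveRm(c),
  suggested: suggestedIsRecursiveRm(c),
}));

console.table(results);
```

Only `rm -f -v -r build` differs between the two: the original pair misses it and the suggested pattern catches it. Both stay anchored at the start of the command, so the `git commit -m "rm -rf"` false-positive guard still holds.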

gemini-code-assist · 2026-04-20T16:11:40Z

+    return { type: "shell:read:query", riskScore: 5 };
+  if (/^(npm|pnpm|yarn|npx)\s+(run\s+)?(test|typecheck|lint|build)\b/.test(c))
+    return { type: "shell:test:run", riskScore: 10 };
+  if (/^(npm|pnpm|yarn)\s+(publish|install|i\b|add|uninstall|remove)\b/.test(c))


gemini-code-assist · 2026-04-20T16:11:41Z

+    )
+  )
+    return { type: "shell:gh:read", riskScore: 10 };
+  if (/^curl\b/.test(c)) return { type: "shell:net:curl", riskScore: 30 };


wget is a common alternative to curl for network requests and should be classified similarly to ensure consistent risk scoring for outbound network activity.

Suggested change

if (/^curl\b/.test(c)) return { type: "shell:net:curl", riskScore: 30 };

if (/^(curl|wget)\b/.test(c)) return { type: "shell:net:curl", riskScore: 30 };

gemini-code-assist · 2026-04-20T16:11:41Z

+  const parts = toolName.split("__");
+  const server = parts[1] ?? "unknown";
+  const op = parts.slice(2).join("_") || "unknown";
+  const isRead = /(^(read|list|search|fetch|get|ls|find))/i.test(op);


The current read-operation detector for MCP tools uses a simple prefix match, which can lead to false positives. For example, an operation named listen or find_and_replace would be incorrectly classified as a "read" operation (risk score 10) instead of a "write" operation (risk score 40). Using a negative lookahead to ensure the verb is not followed by other letters (while still allowing separators or camelCase) improves accuracy.

Suggested change

const isRead = /(^(read|list|search|fetch|get|ls|find))/i.test(op);

const isRead = /^(read|list|search|fetch|get|ls|find)(?![a-z])/i.test(op);
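
The behavior of the lookahead is easy to misjudge, so here is a small comparison. `original` and `suggested` are the two patterns from this comment. `caseSensitive` is an assumption: it is the suggested pattern with the `/i` flag removed, reconstructed from the follow-up commit message on this page, which says the flag was dropped. The op names are sample inputs.

```javascript
const original = /(^(read|list|search|fetch|get|ls|find))/i;
const suggested = /^(read|list|search|fetch|get|ls|find)(?![a-z])/i;
// Assumed shape of the later fix: same pattern without the case-insensitive flag.
const caseSensitive = /^(read|list|search|fetch|get|ls|find)(?![a-z])/;

const ops = ["list_pages", "listen", "find_and_replace", "getBoard"];

const verdicts = Object.fromEntries(
  ops.map((op) => [
    op,
    {
      original: original.test(op),
      suggested: suggested.test(op),
      caseSensitive: caseSensitive.test(op),
    },
  ])
);

console.table(verdicts);
```

Two things stand out. With `/i`, the class `[a-z]` also matches uppercase letters, so `getBoard` is rejected by `suggested` and only `caseSensitive` keeps it as a read. And because `_` is not a letter, `find_and_replace` still passes the lookahead in every variant, so the lookahead by itself addresses `listen` but not that case.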

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 317e21c35f


chatgpt-codex-connector · 2026-04-20T16:12:10Z

 import { dirname, join, resolve } from "node:path";
 import { fileURLToPath, pathToFileURL } from "node:url";

+import { classify } from "./lib/classify.mjs";


Preserve fail-open semantics for classify import

Importing classify at module top-level means any load-time problem in tools/lib/classify.mjs (missing file, syntax error, unreadable file) will cause Node to terminate before main() runs, so the hook never reaches the safeExit fail-open path documented in this file. In a PreToolUse hook context, that can turn a local packaging/runtime issue into tool-call failures for every invocation instead of gracefully allowing calls through.
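
One way to keep the fail-open behavior is to load the module with a dynamic `import()` inside a `try`/`catch`, so that a load failure becomes a value the hook can act on. This is a minimal sketch, not the hook's actual code: `loadClassify` is a hypothetical helper, and returning `null` stands in for whatever the real `safeExit` path does.

```javascript
// Hypothetical helper: resolves to the classify function, or null if it cannot be loaded.
async function loadClassify(specifier) {
  try {
    const mod = await import(specifier);
    // Guard against a module that loads but does not export what we expect.
    return typeof mod.classify === "function" ? mod.classify : null;
  } catch (err) {
    // Missing file, syntax error, unreadable file: all end up here instead of
    // terminating the process at module load time.
    return null;
  }
}

// Usage sketch inside an async main(): a null result means "allow the call through".
async function main() {
  const classify = await loadClassify("./lib/classify.mjs");
  if (classify === null) {
    return { failOpen: true };
  }
  return { failOpen: false, classify };
}
```

The design point is only that the import moves from module top level into code that runs under the hook's own error handling.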


coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tools/lib/classify.mjs`:
- Around line 46-51: The current branch in classify.mjs that tests the command
string c against the read-only shell regex and returns { type:
"shell:read:query", riskScore: 5 } must skip commands containing shell
redirection; update the condition around the regex test to also check that c
does NOT contain redirection tokens (e.g., unescaped >, >>, 2>, 2>>, >& , <, or
pipe with output redirection) before returning the read-only bucket. In practice
modify the if that uses
/^(ls|cat|head|tail|pwd|echo|which|whoami|hostname|uname|date|env|wc|file)\b/.test(c)
to also assert a negative test like !/[^\S\r\n]*([0-9]*>+|<&|>\&|<|>>|\|[^|])/
(or simply /[^\S\r\n]([0-9]*>+|<|>>|&>)/) to detect redirection and bail out
when present so commands such as "echo foo > ~/.bashrc" or "cat README.md >>
/tmp/out" do not get classified as shell:read:query.
- Around line 27-28: The current check only matches "git clean -f" but misses
common destructive variants like "-fd" or "-df"; update the regex in
tools/lib/classify.mjs that tests the command string variable c so the "clean"
branch also matches any flag token containing "f" (and optionally "d") in any
order or combined form (e.g., "-f", "-fd", "-df"), and continue returning {
type: "shell:git:destructive", riskScore: 75 } for those matches.
- Around line 58-63: The current rule treats all "gh api" invocations as reads;
add a prior check against the input string variable c to detect mutating "gh
api" usage (match /^gh\s+api\b/ and look for mutating indicators like -X or
--method with POST/PUT/PATCH/DELETE, and field/flag usage such as -f/--field/-F)
and return a write classification (e.g., { type: "shell:gh:write", riskScore:
high }) before the existing read branch; update the condition order in
classify.mjs so the new mutating-gh-api check runs before the
/^gh\s+(pr\s+view|...|api\s+)/ read rule.
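
The three findings above can be sketched as small predicates. This is one possible shape under stated assumptions, not the merged implementation: the helper names are hypothetical, the redirection check is deliberately conservative (any `<` or `>` disqualifies the read-only bucket, even inside quotes), and the predicates return booleans because the comment leaves the write risk score unspecified.

```javascript
// Any "<" or ">" means the command may write or read via redirection.
function hasRedirection(c) {
  return /[<>]/.test(c);
}

// "git clean" followed anywhere by a short-flag cluster containing "f", or by --force.
function isDestructiveGitClean(c) {
  return /^git\s+clean\b(?=.*\s(?:-[a-zA-Z]*f|--force\b))/.test(c);
}

// "gh api" with an explicit mutating method, or with field flags
// (-f/--raw-field, -F/--field), which this sketch treats as a write.
function isMutatingGhApi(c) {
  if (!/^gh\s+api\b/.test(c)) return false;
  const mutatingMethod = /\s(?:-X|--method)(?:\s+|=)?(?:POST|PUT|PATCH|DELETE)\b/i;
  const fieldFlag = /\s(?:-f|-F|--field|--raw-field)(?:\s|=)/;
  return mutatingMethod.test(c) || fieldFlag.test(c);
}

// Sample commands; repository paths are placeholders.
console.log(hasRedirection("echo foo > ~/.bashrc"));
console.log(isDestructiveGitClean("git clean -fd"));
console.log(isMutatingGhApi("gh api -X POST repos/o/r/issues"));
```

Each predicate would have to run before the corresponding read-only rule, as the comment notes, since the first matching branch decides the bucket.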


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9017f325-028a-4421-bd08-ecc35233aaaa

📥 Commits

Reviewing files that changed from the base of the PR and between cf5e90a and 317e21c.

📒 Files selected for processing (4)

tools/claude-code-hook.mjs
tools/lib/classify.mjs
tools/lib/classify.test.mjs
vitest.config.ts

Fixes 8 issues raised by reviewers on the classify.mjs refactor:

P1 (fail-open regression): classify is now dynamically imported inside main(), so a broken classify.mjs falls into the fail-open path instead of crashing the hook at module load time.

Classification fixes:
- rm: regex now handles arbitrary flag permutations (`rm -f -v -r`)
- git clean: detects `-fd` / `-df` / `-fdx` via lookahead over the tail
- read-query: bails out on shell redirection (`echo x > ~/.bashrc`)
- gh api: `-X POST`, `-f`, `-F`, `--field`, `--raw-field` classify as write
- npm/pnpm/yarn ci: added to pkg:mutate bucket
- wget: classified alongside curl as shell:net:curl
- MCP lookahead: `(?![a-z])` prevents `listen` matching `list`; dropped `/i` flag so camelCase boundaries (`getBoard`) still resolve as read

Tests added for each fix; full suite is 226 passing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

gemini-code-assist bot reviewed Apr 20, 2026

chatgpt-codex-connector bot reviewed Apr 20, 2026

coderabbitai bot reviewed Apr 20, 2026


renasami merged commit f86aab9 into main Apr 20, 2026
2 checks passed

coderabbitai bot mentioned this pull request Apr 20, 2026

docs: commit real shadow-mode audit log + analyzer #17

Open

4 tasks

	if (/^(npm|pnpm|yarn)\s+(publish|install|i\b|add|uninstall|remove)\b/.test(c))
	if (/^(npm|pnpm|yarn)\s+(publish|install|i\b|ci\b|add|uninstall|remove)\b/.test(c))
