Record tool usage using hooks instead #621

Merged
haoranpb merged 7 commits into main from bugs/fix-claude-code-tool-uage
Apr 24, 2026

@haoranpb haoranpb commented Apr 21, 2026

The previous metrics parsing relied on the debug logs, which no longer contain tool usage.

Instead of parsing logs, we use the PreToolUse hook to record all tool usage.
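For readers unfamiliar with hooks, here is a minimal sketch of what registering a PreToolUse hook could look like on the Claude Code side. It is not the code from this PR: the function names are hypothetical, the settings shape follows my understanding of Claude Code's documented hook configuration, and the Copilot configuration format is not shown here and may differ.

```python
import json
from pathlib import Path


def build_pre_tool_use_config(hook_command: str) -> dict:
    """Build a settings fragment that runs `hook_command` before every tool call.

    Assumption: the agent reads a Claude Code style settings file in which
    PreToolUse entries pair a tool-name matcher with a list of commands.
    """
    return {
        "hooks": {
            "PreToolUse": [
                {
                    "matcher": "*",  # assumed to match every tool
                    "hooks": [{"type": "command", "command": hook_command}],
                }
            ]
        }
    }


def write_hook_config(settings_path: Path, hook_command: str) -> None:
    """Write the fragment to `settings_path` (hypothetical helper, overwrites the file)."""
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    config = build_pre_tool_use_config(hook_command)
    settings_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
```

A caller might pass something like `pwsh -File log-tool-usage.ps1` as `hook_command`; the exact command line used by the PR is not shown in this conversation.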

@haoranpb haoranpb changed the title from "Update metrics parsing after Claude Code update" to "Record tool usage using hooks instead" Apr 23, 2026
@haoranpb haoranpb marked this pull request as ready for review April 23, 2026 10:43
Copilot AI left a comment

Pull request overview

This PR updates BC-Bench’s agent metrics collection to stop relying on Copilot/Claude debug logs for tool usage (which no longer include it) and instead records tool usage via PreToolUse hooks, then parses a dedicated tool_usage.jsonl output.
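To make the hook side concrete, the sketch below shows the general idea of a hook that appends one JSON line per tool call. The PR implements this as a PowerShell script (log-tool-usage.ps1); this Python version is only an illustrative stand-in. The `tool_name` payload field, the entry layout, and the function names are assumptions, not the PR's actual schema.

```python
import json
import sys
from datetime import datetime, timezone
from pathlib import Path


def append_tool_usage(payload: dict, log_path: Path) -> dict:
    """Append one JSONL entry describing a tool call and return the entry.

    Assumption: the hook payload carries the tool's name under "tool_name".
    """
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool_name": payload.get("tool_name", "unknown"),
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier entries; each call adds exactly one line.
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry


def main(log_path: Path = Path("tool_usage.jsonl")) -> None:
    """Hypothetical entry point: read the hook payload from stdin and log it."""
    raw = sys.stdin.read()
    payload = json.loads(raw) if raw.strip() else {}
    append_tool_usage(payload, log_path)
```

Appending one self-contained JSON object per line means a later parser can process the file line by line and skip any entry that was only partially written.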

Changes:

  • Add hook setup + a PowerShell hook script to log tool usage to tool_usage.jsonl.
  • Add a shared parser for the hooks output and wire it into Copilot/Claude agents.
  • Remove tool-usage-from-log parsing, keep Copilot turn counting from session logs, and update tests accordingly.
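The shared parser described above could be sketched roughly as follows. This is a hedged illustration, not the contents of hooks_parser.py: the function names are hypothetical, and it assumes each JSONL entry stores the tool's name under a `tool_name` key. Returning `None` when the file is missing mirrors the test description that tool usage remains None without hooks, but the real behavior may differ.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterable, Optional


def count_tool_usage(lines: Iterable[str]) -> dict:
    """Aggregate per-tool call counts from JSONL lines.

    Blank lines, malformed JSON, and entries without a "tool_name" are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a truncated or corrupted line
        name = entry.get("tool_name") if isinstance(entry, dict) else None
        if name:
            counts[name] += 1
    return dict(counts)


def parse_tool_usage_file(path: Path) -> Optional[dict]:
    """Return tool counts for `path`, or None if no hook output exists."""
    if not path.exists():
        return None
    with path.open(encoding="utf-8") as handle:
        return count_tool_usage(handle)
```

Separating the line-level aggregation from the file handling keeps the counting logic easy to unit test with plain lists of strings.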

Reviewed changes

Copilot reviewed 17 out of 18 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/bcbench/operations/hooks_operations.py | New operation to write Copilot/Claude hook configuration and define tool usage log destination. |
| src/bcbench/agent/shared/hooks_parser.py | New JSONL parser that aggregates tool usage counts from hook output. |
| src/bcbench/agent/shared/hooks/log-tool-usage.ps1 | New hook script that appends tool usage entries to a JSONL file. |
| src/bcbench/agent/copilot/metrics.py | Removes tool usage parsing from Copilot logs; keeps turn counting. |
| src/bcbench/agent/copilot/agent.py | Configures hooks and attaches parsed tool usage to collected metrics. |
| src/bcbench/agent/claude/metrics.py | Removes debug-log-based tool usage parsing from Claude metrics. |
| src/bcbench/agent/claude/agent.py | Configures hooks and attaches parsed tool usage to collected metrics. |
| src/bcbench/commands/run.py | Updates “copilot-inspector” to use hook-based tool usage parsing + log-based turn counting. |
| src/bcbench/config.py | Adds hook script path and new file-pattern config entries. |
| src/bcbench/operations/__init__.py | Exposes setup_hooks via operations package exports. |
| src/bcbench/agent/shared/__init__.py | Re-exports parse_tool_usage_from_hooks. |
| tests/test_hooks_parser.py | New unit tests for hook JSONL parsing behavior. |
| tests/test_hooks_operations.py | New unit tests for hook config generation for Copilot/Claude. |
| tests/test_tool_usage_parser.py | Replaces tool-usage-from-log tests with turn-count-from-log tests. |
| tests/test_copilot_metrics_parsing.py | Removes tool usage assertions from session logs; keeps turn count assertions. |
| tests/test_claude_code_metrics.py | Removes debug-log tool usage tests; asserts tool usage remains None without hooks. |
| pyproject.toml / uv.lock | Version bump to 0.5.2. |

Comment thread src/bcbench/commands/run.py Outdated
Comment thread src/bcbench/operations/hooks_operations.py
@haoranpb haoranpb enabled auto-merge (squash) April 24, 2026 07:24
@haoranpb haoranpb merged commit a9cd827 into main Apr 24, 2026
22 of 27 checks passed
@haoranpb haoranpb deleted the bugs/fix-claude-code-tool-uage branch April 24, 2026 10:37
3 participants