fix(grader): support 'arguments' key for tool calls in task_workflow#306
Merged
olearycrew merged 1 commit intopinchbench:mainfrom Apr 14, 2026
Merged
Conversation
OpenClaw and Claude Code serialize tool call parameters under the 'arguments' key, while Cursor and Windsurf use 'params'. The grader only checked 'params', so read_config always scored 0 for OpenClaw/ Claude Code agents even when they correctly read config.json. Fix: fall back to 'arguments' when 'params' is absent. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor
Code Review SummaryStatus: No Issues Found | Recommendation: Merge The fix correctly handles both Files Reviewed (1 file)
Reviewed by claude-4.6-sonnet-20260217 · 58,447 tokens |
Member
|
@mgoulart thanks for this fix! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
The grader in
task_workflow.mdchecks whether the agent readconfig.jsonby inspecting tool call parameters:This only works for agents that serialize tool parameters under the
"params"key (Cursor, Windsurf). OpenClaw and Claude Code use"arguments"instead — matching the OpenAI tool call spec. As a result,read_configalways scores0for those agents even when they correctly read the file.Fix
Fall back to
"arguments"when"params"is absent:Verified
Confirmed by running
task_10_workflow(pre-rename) againstkimi-k2p5on Fireworks. The agent correctly readconfig.json— visible in the transcript — but scored0onread_configbefore this fix and1.0after.🤖 Generated with Claude Code