Skip to content

refactor(llm): unified ToolError contract for tool arg validation#5807

Merged
longcw merged 7 commits into
mainfrom
longc/tool-error-contract
May 26, 2026
Merged

refactor(llm): unified ToolError contract for tool arg validation#5807
longcw merged 7 commits into
mainfrom
longc/tool-error-contract

Conversation

@longcw
Copy link
Copy Markdown
Contributor

@longcw longcw commented May 22, 2026

Summary

Unifies tool-argument validation under a single ToolError contract owned by prepare_function_arguments, and fixes a malformed-JSON propagation bug along the way.

  • prepare_function_arguments wraps ValidationError / ValueError / TypeError as ToolError("Error parsing arguments for ..."), so the message the LLM sees is owned in one place.
  • Accepts an optional fnc_call: FunctionCall | None parameter. When provided and json_arguments is a string, the canonical JSON (post json_repair) is written back to fnc_call.arguments before pydantic validation runs. This way, even when validation later fails, the conversation history holds valid JSON instead of the broken raw payload — without this, providers like Vertex/OpenAI reject the next request with a 5xx when re-serializing it (#5807 review).
  • execute_function_call drops its dedicated (ValidationError, ValueError) branch — one prepare_function_arguments(..., fnc_call=fnc_call) call covers both parse failures and validation failures.
  • ToolProxyToolset._handle_call drops its own try/except wrapper and passes the parameters dict directly (no extra json.dumps).
  • Pure-validation callers (judge.py, async_toolset.py, run_result.py) are unchanged — they just don't pass fnc_call.

Extracted from #5711.

- prepare_function_arguments wraps ValidationError/ValueError/TypeError as
  ToolError("Error parsing arguments for ..."), so the message the LLM sees
  is owned in one place.
- execute_function_call drops its dedicated validation branch and no longer
  logs a traceback for ToolError — it's intentional signal to the LLM.
- _execute_tools_task moves argument prep into a _execute(ctx) closure called
  inside the per-tool try block, so the new ToolError is routed via
  _tool_completed the same way a ToolError raised by the tool body is.
- ToolProxyToolset._handle_call drops its own try/except wrapper.

Extracted from #5711.
@chenghao-mou chenghao-mou requested a review from a team May 22, 2026 07:32
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.

Open in Devin Review

devin-ai-integration[bot]

This comment was marked as resolved.

longcw added 4 commits May 26, 2026 15:36
`prepare_function_arguments` now returns a `PreparedFunctionArguments`
dataclass exposing `canonical_arguments` alongside `args`/`kwargs`. The
dataclass iterates as `(args, kwargs)` so existing unpacking call sites
keep working.

`execute_function_call` and `_execute_tools_task` use the canonical form
to overwrite `fnc_call.arguments` when json_repair had to recover the
payload — without this, malformed JSON propagates into the next LLM turn
and providers like Vertex/OpenAI reject the request with 5xx.

ToolError wrapping stays in one place (`prepare_function_arguments`), so
raw JSON parse failures surface as descriptive errors to the LLM instead
of the generic "An internal error occurred" message.
`prepare_function_arguments` now runs before `first_tool_started_fut` and
`tool_execution_started_cb`, so a tool call that fails arg validation is
short-circuited via `_tool_completed` the same way an unknown function or
wrong tool type already is — no spurious "started" signals for tools that
never run.

Canonical args are written back to `fnc_call.arguments` before the started
callback fires, so subscribers and telemetry see the normalized payload,
not the broken raw JSON the model emitted.

Drops the `_execute` closure: `function_callable` is built directly with
`functools.partial` for mock vs real tool.
devin-ai-integration[bot]

This comment was marked as resolved.

`prepare_function_arguments` accepts an optional `fnc_call` parameter and
runs in two phases: parse → write canonical JSON to `fnc_call.arguments`
→ validate. Canonical args are persisted BEFORE pydantic validation, so
when validation fails the conversation history still contains valid JSON.

Without this, a malformed payload that json_repair fixed but pydantic
rejected (wrong types, missing required fields) would leave the broken
raw string on the FunctionCall — and providers like Vertex/OpenAI reject
the next request with a 5xx when re-serializing it.

Drops the `PreparedFunctionArguments` dataclass; back to returning
`tuple[args, kwargs]`. `execute_function_call` and `_execute_tools_task`
now pass `fnc_call=fnc_call` and handle both parse and validation
failures with a single `except ToolError`.

Pure-validation callers (judge.py, async_toolset.py, tool_proxy.py,
run_result.py) are unchanged.

Adds test_execute_function_call_canonicalizes_when_validation_fails
covering the bug scenario.
@longcw longcw merged commit ccdf2e0 into main May 26, 2026
24 checks passed
@longcw longcw deleted the longc/tool-error-contract branch May 26, 2026 11:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants