refactor(llm): unified ToolError contract for tool arg validation #5807
Merged
- prepare_function_arguments wraps ValidationError/ValueError/TypeError as
ToolError("Error parsing arguments for ..."), so the message the LLM sees
is owned in one place.
- execute_function_call drops its dedicated validation branch and no longer
logs a traceback for ToolError, since it is an intentional signal to the LLM.
- _execute_tools_task moves argument prep into a _execute(ctx) closure called
inside the per-tool try block, so the new ToolError is routed via
_tool_completed the same way a ToolError raised by the tool body is.
- ToolProxyToolset._handle_call drops its own try/except wrapper.
Extracted from #5711.
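
The single-owner wrapping described in the bullets above can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in: the `ToolError` class, `prepare_arguments`, `validate_weather`, and the message suffix after "Error parsing arguments for" are assumptions, not the library's code. It only shows the pattern of converting parse and validation failures into one error type at one call site.

```python
import json
from typing import Any, Callable


class ToolError(Exception):
    """Hypothetical stand-in for the framework's ToolError."""


def prepare_arguments(
    fnc_name: str,
    raw: str,
    validate: Callable[[dict[str, Any]], dict[str, Any]],
) -> dict[str, Any]:
    """Parse and validate tool arguments, wrapping any failure as ToolError."""
    try:
        parsed = json.loads(raw)  # json.JSONDecodeError subclasses ValueError
        if not isinstance(parsed, dict):
            raise TypeError("arguments must be a JSON object")
        return validate(parsed)
    except (ValueError, TypeError) as e:
        # The only place the LLM-facing message is built (suffix is illustrative).
        raise ToolError(f"Error parsing arguments for {fnc_name}: {e}") from e


def validate_weather(args: dict[str, Any]) -> dict[str, Any]:
    """Toy validator standing in for pydantic model validation."""
    if not isinstance(args.get("city"), str):
        raise TypeError("city must be a string")
    return args
```

Because callers only ever see `ToolError`, they need a single `except ToolError` branch regardless of whether parsing or validation failed.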
theomonnom approved these changes on May 26, 2026
`prepare_function_arguments` now returns a `PreparedFunctionArguments` dataclass exposing `canonical_arguments` alongside `args`/`kwargs`. The dataclass iterates as `(args, kwargs)` so existing unpacking call sites keep working. `execute_function_call` and `_execute_tools_task` use the canonical form to overwrite `fnc_call.arguments` when json_repair had to recover the payload — without this, malformed JSON propagates into the next LLM turn and providers like Vertex/OpenAI reject the request with 5xx. ToolError wrapping stays in one place (`prepare_function_arguments`), so raw JSON parse failures surface as descriptive errors to the LLM instead of the generic "An internal error occurred" message.
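
A dataclass that still unpacks like a two-tuple, as this commit describes, might look roughly like the sketch below. `PreparedArgs` is a hypothetical analogue written for illustration; the field names follow the commit message, but the real `PreparedFunctionArguments` definition is not shown in this PR text.

```python
from dataclasses import dataclass
from typing import Any, Iterator


@dataclass
class PreparedArgs:
    """Hypothetical analogue of the PreparedFunctionArguments dataclass."""

    args: tuple[Any, ...]
    kwargs: dict[str, Any]
    canonical_arguments: str  # normalized JSON text for the call

    def __iter__(self) -> Iterator[Any]:
        # Yield exactly two items so `args, kwargs = prepared` still works.
        yield self.args
        yield self.kwargs


prepared = PreparedArgs(
    args=(),
    kwargs={"city": "Paris"},
    canonical_arguments='{"city": "Paris"}',
)
args, kwargs = prepared  # legacy tuple-style unpacking
```

New call sites can read `prepared.canonical_arguments` directly, while older ones that unpack two values are unaffected.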
`prepare_function_arguments` now runs before `first_tool_started_fut` and `tool_execution_started_cb`, so a tool call that fails arg validation is short-circuited via `_tool_completed` the same way an unknown function or wrong tool type already is — no spurious "started" signals for tools that never run. Canonical args are written back to `fnc_call.arguments` before the started callback fires, so subscribers and telemetry see the normalized payload, not the broken raw JSON the model emitted. Drops the `_execute` closure: `function_callable` is built directly with `functools.partial` for mock vs real tool.
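
The ordering change in this commit (validate first, signal "started" only afterwards) can be sketched as follows. The names `run_tool`, `prepare`, and the `events` list are assumptions made for illustration; the real code uses `first_tool_started_fut`, `tool_execution_started_cb`, and `_tool_completed`, whose signatures are not shown here.

```python
import functools
from typing import Any, Callable


class ToolError(Exception):
    """Hypothetical stand-in for the framework's ToolError."""


def run_tool(
    name: str,
    raw_args: Any,
    tool: Callable[..., Any],
    prepare: Callable[[Any], tuple[tuple, dict]],
    events: list[tuple[str, Any]],
) -> None:
    """Validate first; emit 'started' only for tools that will actually run."""
    try:
        args, kwargs = prepare(raw_args)
    except ToolError as e:
        events.append(("completed", f"error: {e}"))  # short-circuit, no 'started'
        return

    events.append(("started", name))
    call = functools.partial(tool, *args, **kwargs)  # bind the validated arguments
    try:
        events.append(("completed", call()))
    except ToolError as e:
        events.append(("completed", f"error: {e}"))
```

With this ordering, a call that fails validation produces a single "completed" event carrying the error, and subscribers never see a "started" event for it.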
`prepare_function_arguments` accepts an optional `fnc_call` parameter and runs in two phases: parse → write canonical JSON to `fnc_call.arguments` → validate. Canonical args are persisted BEFORE pydantic validation, so when validation fails the conversation history still contains valid JSON. Without this, a malformed payload that json_repair fixed but pydantic rejected (wrong types, missing required fields) would leave the broken raw string on the FunctionCall — and providers like Vertex/OpenAI reject the next request with a 5xx when re-serializing it. Drops the `PreparedFunctionArguments` dataclass; back to returning `tuple[args, kwargs]`. `execute_function_call` and `_execute_tools_task` now pass `fnc_call=fnc_call` and handle both parse and validation failures with a single `except ToolError`. Pure-validation callers (judge.py, async_toolset.py, tool_proxy.py, run_result.py) are unchanged. Adds test_execute_function_call_canonicalizes_when_validation_fails covering the bug scenario.
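
The two-phase flow (parse, write canonical JSON, then validate) can be illustrated with a small standard-library sketch. All names are hypothetical: `FunctionCall` here is a two-field stand-in, `lenient_loads` is a toy substitute for json_repair that only handles a missing closing brace, and the `level` check stands in for pydantic validation.

```python
import json
from dataclasses import dataclass
from typing import Any


class ToolError(Exception):
    """Hypothetical stand-in for the framework's ToolError."""


@dataclass
class FunctionCall:
    """Minimal stand-in: only the fields this sketch needs."""

    name: str
    arguments: str


def lenient_loads(raw: str) -> Any:
    """Toy replacement for json_repair: retries once with a closing brace."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return json.loads(raw + "}")


def prepare(fnc_call: FunctionCall) -> tuple[tuple, dict[str, Any]]:
    try:
        # Phase 1: parse (with repair) and persist the canonical JSON.
        parsed = lenient_loads(fnc_call.arguments)
        if not isinstance(parsed, dict):
            raise TypeError("arguments must be a JSON object")
        fnc_call.arguments = json.dumps(parsed)
        # Phase 2: validate; a failure here leaves valid JSON on fnc_call.
        if not isinstance(parsed.get("level"), int):
            raise TypeError("level must be an integer")
    except (ValueError, TypeError) as e:
        raise ToolError(f"Error parsing arguments for {fnc_call.name}: {e}") from e
    return (), parsed
```

For example, a call whose arguments are the truncated string `{"level": "loud"` raises `ToolError` in phase 2, yet `fnc_call.arguments` has already been rewritten to well-formed JSON, which is the property the regression test in this commit targets.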
Summary
Unifies tool-argument validation under a single `ToolError` contract owned by `prepare_function_arguments`, and fixes a malformed-JSON propagation bug along the way.
- `prepare_function_arguments` wraps `ValidationError`/`ValueError`/`TypeError` as `ToolError("Error parsing arguments for ...")`, so the message the LLM sees is owned in one place.
- `prepare_function_arguments` accepts an optional `fnc_call: FunctionCall | None` parameter. When provided and `json_arguments` is a string, the canonical JSON (post json_repair) is written back to `fnc_call.arguments` before pydantic validation runs. This way, even when validation later fails, the conversation history holds valid JSON instead of the broken raw payload. Without this, providers like Vertex/OpenAI reject the next request with a 5xx when re-serializing it (#5807 review).
- `execute_function_call` drops its dedicated `(ValidationError, ValueError)` branch: one `prepare_function_arguments(..., fnc_call=fnc_call)` call covers both parse failures and validation failures.
- `ToolProxyToolset._handle_call` drops its own try/except wrapper and passes the parameters dict directly (no extra `json.dumps`).
- [...] `fnc_call`.

Extracted from #5711.