refactor(llm): unified ToolError contract for tool arg validation by longcw · Pull Request #5807 · livekit/agents

longcw · 2026-05-22T07:32:13Z

Summary

Unifies tool-argument validation under a single ToolError contract owned by prepare_function_arguments, and fixes a malformed-JSON propagation bug along the way.

- prepare_function_arguments wraps ValidationError / ValueError / TypeError as ToolError("Error parsing arguments for ..."), so the message the LLM sees is owned in one place.
- prepare_function_arguments accepts an optional fnc_call: FunctionCall | None parameter. When it is provided and json_arguments is a string, the canonical JSON (post json_repair) is written back to fnc_call.arguments before pydantic validation runs. Even when validation later fails, the conversation history then holds valid JSON instead of the broken raw payload. Without this, providers like Vertex/OpenAI reject the next request with a 5xx when re-serializing it (#5807 review).
- execute_function_call drops its dedicated (ValidationError, ValueError) branch: one prepare_function_arguments(..., fnc_call=fnc_call) call covers both parse failures and validation failures.
- ToolProxyToolset._handle_call drops its own try/except wrapper and passes the parameters dict directly (no extra json.dumps).
- Pure-validation callers (judge.py, async_toolset.py, run_result.py) are unchanged; they simply don't pass fnc_call.
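The single-contract idea in the list above can be sketched as follows. This is a minimal illustration and not the livekit implementation: `ToolError` is redefined locally, `prepare_arguments_sketch`, `execute_sketch`, and the `schema` dict are hypothetical stand-ins, plain `ValueError`/`TypeError` stand in for pydantic's `ValidationError`, and using the function name after "Error parsing arguments for" is an assumption.

```python
from __future__ import annotations

import json
from typing import Any, Callable


class ToolError(Exception):
    """Local stand-in: an error whose message is meant to be shown to the LLM."""


def prepare_arguments_sketch(
    fn: Callable[..., Any], json_arguments: str, schema: dict[str, type]
) -> dict[str, Any]:
    """Parse and validate arguments; every bad-input failure becomes ToolError."""
    try:
        raw = json.loads(json_arguments)  # JSONDecodeError is a ValueError subclass
        if not isinstance(raw, dict):
            raise TypeError("arguments must be a JSON object")
        kwargs: dict[str, Any] = {}
        for name, expected in schema.items():
            if name not in raw:
                raise ValueError(f"missing required argument: {name}")
            if not isinstance(raw[name], expected):
                raise TypeError(f"{name} must be {expected.__name__}")
            kwargs[name] = raw[name]
        return kwargs
    except (ValueError, TypeError) as e:
        # The LLM-facing message is built here and nowhere else.
        raise ToolError(f"Error parsing arguments for {fn.__name__}: {e}") from e


def execute_sketch(fn: Callable[..., Any], json_arguments: str, schema: dict[str, type]) -> str:
    """Caller needs one except clause for both parse and validation failures."""
    try:
        return str(fn(**prepare_arguments_sketch(fn, json_arguments, schema)))
    except ToolError as e:
        return str(e)


def get_weather(city: str) -> str:
    return f"sunny in {city}"
```

With this shape, callers never need to know which underlying exception occurred; they only distinguish "tool ran" from "ToolError".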

Extracted from #5711.

- prepare_function_arguments wraps ValidationError/ValueError/TypeError as ToolError("Error parsing arguments for ..."), so the message the LLM sees is owned in one place.
- execute_function_call drops its dedicated validation branch and no longer logs a traceback for ToolError, since a ToolError is an intentional signal to the LLM.
- _execute_tools_task moves argument prep into a _execute(ctx) closure called inside the per-tool try block, so the new ToolError is routed via _tool_completed the same way a ToolError raised by the tool body is.
- ToolProxyToolset._handle_call drops its own try/except wrapper.

Extracted from #5711.

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

`prepare_function_arguments` now returns a `PreparedFunctionArguments` dataclass exposing `canonical_arguments` alongside `args`/`kwargs`. The dataclass iterates as `(args, kwargs)` so existing unpacking call sites keep working. `execute_function_call` and `_execute_tools_task` use the canonical form to overwrite `fnc_call.arguments` when json_repair had to recover the payload — without this, malformed JSON propagates into the next LLM turn and providers like Vertex/OpenAI reject the request with 5xx. ToolError wrapping stays in one place (`prepare_function_arguments`), so raw JSON parse failures surface as descriptive errors to the LLM instead of the generic "An internal error occurred" message.
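The "dataclass iterates as `(args, kwargs)`" trick described in this commit can be sketched like this. The class name `PreparedArgsSketch` is hypothetical, and the field types are assumptions; only the field names `args`, `kwargs`, and `canonical_arguments` come from the commit message.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Iterator


@dataclass
class PreparedArgsSketch:
    """Illustrative result type: richer than a tuple, but still unpackable as one."""

    args: tuple[Any, ...]
    kwargs: dict[str, Any]
    canonical_arguments: str | None = None  # repaired JSON, if any (assumed type)

    def __iter__(self) -> Iterator[Any]:
        # Yield exactly two items so `args, kwargs = prepared` keeps working.
        yield self.args
        yield self.kwargs


prepared = PreparedArgsSketch(
    args=(), kwargs={"city": "Paris"}, canonical_arguments='{"city": "Paris"}'
)

# Existing call sites that unpack a 2-tuple are unaffected...
args, kwargs = prepared
# ...while new call sites can read the extra field by name.
canonical = prepared.canonical_arguments
```

Defining `__iter__` is what preserves backward compatibility here: tuple unpacking only requires an iterable of the right length.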

`prepare_function_arguments` now runs before `first_tool_started_fut` and `tool_execution_started_cb`, so a tool call that fails arg validation is short-circuited via `_tool_completed` the same way an unknown function or wrong tool type already is — no spurious "started" signals for tools that never run. Canonical args are written back to `fnc_call.arguments` before the started callback fires, so subscribers and telemetry see the normalized payload, not the broken raw JSON the model emitted. Drops the `_execute` closure: `function_callable` is built directly with `functools.partial` for mock vs real tool.
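The ordering described in this commit (prepare arguments first, then bind the callable with `functools.partial`, then signal "started") might look roughly like the sketch below. Everything here is a simplified stand-in: `run_tool_call_sketch`, the `events` list, and the dict-based mock lookup are hypothetical and replace the real `_tool_completed`, `first_tool_started_fut`, and `tool_execution_started_cb` machinery.

```python
from __future__ import annotations

import functools
from typing import Any, Callable


class ToolError(Exception):
    """Local stand-in for the framework's ToolError."""


def run_tool_call_sketch(
    name: str,
    prepare: Callable[[], dict[str, Any]],
    tools: dict[str, Callable[..., Any]],
    mock_tools: dict[str, Callable[..., Any]],
    events: list[str],
) -> Any:
    # 1. Argument prep runs first. On failure we complete immediately,
    #    so no "started" event is ever recorded for a tool that never runs.
    try:
        kwargs = prepare()
    except ToolError as e:
        events.append(f"completed:{name}:error")
        return e

    # 2. Prefer a mock if one is registered, then bind the arguments directly.
    target = mock_tools.get(name) or tools[name]
    function_callable = functools.partial(target, **kwargs)

    # 3. Only now emit the "started" signal and run the tool.
    events.append(f"started:{name}")
    result = function_callable()
    events.append(f"completed:{name}:ok")
    return result


def _failing_prepare() -> dict[str, Any]:
    raise ToolError("Error parsing arguments for lookup: bad JSON")


real = {"lookup": lambda q: q.upper()}

failed_events: list[str] = []
failed = run_tool_call_sketch("lookup", _failing_prepare, real, {}, failed_events)

ok_events: list[str] = []
ok = run_tool_call_sketch("lookup", lambda: {"q": "hi"}, real, {}, ok_events)
```

In the failing case, `failed_events` contains only the completion entry, which mirrors the "no spurious started signals" goal stated above.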

`prepare_function_arguments` accepts an optional `fnc_call` parameter and runs in two phases: parse → write canonical JSON to `fnc_call.arguments` → validate. Canonical args are persisted BEFORE pydantic validation, so when validation fails the conversation history still contains valid JSON. Without this, a malformed payload that json_repair fixed but pydantic rejected (wrong types, missing required fields) would leave the broken raw string on the FunctionCall — and providers like Vertex/OpenAI reject the next request with a 5xx when re-serializing it. Drops the `PreparedFunctionArguments` dataclass; back to returning `tuple[args, kwargs]`. `execute_function_call` and `_execute_tools_task` now pass `fnc_call=fnc_call` and handle both parse and validation failures with a single `except ToolError`. Pure-validation callers (judge.py, async_toolset.py, tool_proxy.py, run_result.py) are unchanged. Adds test_execute_function_call_canonicalizes_when_validation_fails covering the bug scenario.
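The parse → write back → validate ordering can be illustrated with a small stdlib-only sketch. It is not the actual implementation: `FunctionCallSketch` and `prepare_two_phase` are hypothetical, `lenient_loads` is a toy stand-in for json_repair that only strips trailing commas, and a plain `isinstance` check stands in for pydantic validation.

```python
from __future__ import annotations

import json
import re
from dataclasses import dataclass
from typing import Any


class ToolError(Exception):
    """Local stand-in for the framework's ToolError."""


@dataclass
class FunctionCallSketch:
    name: str
    arguments: str  # raw JSON string as emitted by the model


def lenient_loads(text: str) -> Any:
    """Toy repair step: drop trailing commas before } or ], then parse."""
    return json.loads(re.sub(r",\s*([}\]])", r"\1", text))


def prepare_two_phase(
    json_arguments: str,
    required: dict[str, type],
    fnc_call: FunctionCallSketch | None = None,
) -> tuple[tuple[Any, ...], dict[str, Any]]:
    label = fnc_call.name if fnc_call else "tool"
    try:
        # Phase 1: parse (with repair) and persist the canonical JSON right away.
        parsed = lenient_loads(json_arguments)
        if fnc_call is not None:
            fnc_call.arguments = json.dumps(parsed)
        # Phase 2: validate. A failure here no longer leaves broken JSON behind.
        if not isinstance(parsed, dict):
            raise TypeError("arguments must be a JSON object")
        for name, expected in required.items():
            if not isinstance(parsed.get(name), expected):
                raise TypeError(f"{name} must be {expected.__name__}")
        return (), parsed
    except (ValueError, TypeError) as e:
        raise ToolError(f"Error parsing arguments for {label}: {e}") from e


# Malformed JSON (trailing comma) that is repairable but fails validation.
call = FunctionCallSketch(name="set_volume", arguments='{"level": "loud",}')
try:
    prepare_two_phase(call.arguments, {"level": int}, fnc_call=call)
    raised = False
except ToolError:
    raised = True
```

After the failed call, `call.arguments` parses cleanly with `json.loads`, which is the property the commit's new test is described as covering.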

chenghao-mou requested a review from a team May 22, 2026 07:32

devin-ai-integration Bot reviewed May 22, 2026

theomonnom approved these changes May 26, 2026

Merge remote-tracking branch 'origin/main' into longc/tool-error-contract

d9ef20d

longcw added 4 commits May 26, 2026 15:36

fix type check

225245f

revert warning message

b2910ab

longcw merged commit ccdf2e0 into main May 26, 2026
24 checks passed

longcw deleted the longc/tool-error-contract branch May 26, 2026 11:20

rosetta-livekit-bot Bot mentioned this pull request May 26, 2026

refactor(llm): unify ToolError contract for arg validation livekit/agents-js#1607

Open

longcw merged 7 commits into main from longc/tool-error-contract
