fix(completions): pass unknown chat tool types through to upstream#676
Conversation
extract_tools_from_extra parses request.extra["tools"] as the typed
Vec<ToolDefinition>; on parse failure it called .ok() which silently
dropped the value, AND it removed the key from `extra` unconditionally
before parsing. The combination meant any chat-completions request
carrying a non-function-shaped tool (e.g. NEAR's namespaced
`{"type":"web_context_search"}` that inference-proxy expands into a
function tool inside the CVM) lost the entire tools array — cloud-api
forwarded the request without it, and the model never saw the tool.
Fix: only consume from `extra` on successful parse. If parsing fails,
leave the raw value in place so it flows through to the upstream
request body verbatim via the existing #[serde(flatten)] on
ChatCompletionParams.extra. Same treatment for tool_choice for symmetry.
This is a permissive-passthrough change at the cloud-api layer; the
upstream (vLLM/SGLang via inference-proxy) is the right place to decide
whether a tool type is supported. External providers (Gemini, Anthropic)
already only see typed tools today, so they're unaffected.
Tests:
- extract_tools_consumes_function_shaped_tools — regression cover for
the existing path: function tools parse, get consumed from extra
(no duplicate keys when ChatCompletionParams serializes).
- extract_tools_passes_through_unknown_tool_types — the
web_context_search regression: array remains in extra.
- extract_tools_passes_through_mixed_unknown_in_array — array with
both function and unknown stays opaque (conservative: don't strip
some entries from a mixed array).
- extract_tools_passes_through_unknown_tool_choice — symmetric for
tool_choice.
Verified: cargo test -p services --lib (315 + 4 new pass), cargo
clippy --all-targets, cargo fmt --check all clean.
There was a problem hiding this comment.
Code Review
This pull request refactors extract_tools_from_extra to only remove tools and tool_choice from the extra map if they successfully parse into the typed ToolDefinition and ToolChoice shapes. If parsing fails, they are left in extra to flow through to the upstream request body verbatim. It also adds comprehensive unit tests to verify this behavior. The reviewer suggested simplifying the nested match statements in extract_tools_from_extra using idiomatic Rust combinators like and_then and map to reduce boilerplate and improve readability.
| let tools = match extra.get("tools").cloned() { | ||
| Some(raw) => { | ||
| match serde_json::from_value::<Vec<inference_providers::ToolDefinition>>(raw) { | ||
| Ok(parsed) => { | ||
| extra.remove("tools"); | ||
| Some(parsed) | ||
| } | ||
| Err(_) => None, | ||
| } | ||
| } | ||
| None => None, | ||
| }; | ||
|
|
||
| let tool_choice = extra | ||
| .remove("tool_choice") | ||
| .and_then(|v| serde_json::from_value::<inference_providers::ToolChoice>(v).ok()); | ||
| let tool_choice = match extra.get("tool_choice").cloned() { | ||
| Some(raw) => match serde_json::from_value::<inference_providers::ToolChoice>(raw) { | ||
| Ok(parsed) => { | ||
| extra.remove("tool_choice"); | ||
| Some(parsed) | ||
| } | ||
| Err(_) => None, | ||
| }, | ||
| None => None, | ||
| }; |
There was a problem hiding this comment.
The nested match statements can be simplified and made more idiomatic by using and_then and map combinators. This improves readability and reduces boilerplate.
let tools = extra.get("tools").and_then(|raw| {
serde_json::from_value::<Vec<inference_providers::ToolDefinition>>(raw.clone())
.ok()
.map(|parsed| {
extra.remove("tools");
parsed
})
});
let tool_choice = extra.get("tool_choice").and_then(|raw| {
serde_json::from_value::<inference_providers::ToolChoice>(raw.clone())
.ok()
.map(|parsed| {
extra.remove("tool_choice");
parsed
})
});There was a problem hiding this comment.
Pull request overview
This PR fixes tool forwarding in the completions service by ensuring tools / tool_choice are only removed from extra when they successfully parse into the typed inference_providers shapes, preventing non-standard tool definitions from being silently dropped before reaching upstream.
Changes:
- Update
extract_tools_from_extrato only consumeextra.tools/extra.tool_choiceon successful typed parsing (otherwise leave raw JSON intact). - Add unit tests covering: standard function tools are consumed; unknown/mixed tool arrays and unknown
tool_choiceare preserved inextra.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /// function-tool shape — the original value is left in `extra` so it | ||
| /// flows through to the upstream request body verbatim via the | ||
| /// `#[serde(flatten)]` on `ChatCompletionParams.extra`. This means | ||
| /// cloud-api never silently drops tool definitions it doesn't | ||
| /// recognise; the upstream gets to decide. |
| // Symmetric to tools: a non-standard tool_choice string survives | ||
| // for upstream interpretation. |
|
Code Review Scope: extract_tools_from_extra in crates/services/src/completions/mod.rs - fix for silent drop of unknown chat tool types, plus 4 unit tests. Correctness The fix is right and the bug is real. tools on ChatCompletionParams has skip_serializing_if = Option::is_none (crates/inference_providers/src/models.rs:217) and extra is serde flatten (models.rs:243). So when parse fails and the raw JSON stays in extra, it flattens into the outbound body without a duplicate tools key. Successful parse removes the key from extra, avoiding a duplicate-key collision on serialization - the first test explicitly guards this. tools and tool_choice are handled independently, so partial success behaves correctly. Behavior change worth flagging For OpenAI-compatible external backends (crates/inference_providers/src/external/openai_compatible.rs:151) the entire ChatCompletionParams is serialized as JSON, so the flattened extra does reach upstream. The PR description says external providers only consume typed tools and are unaffected - that holds for Gemini (gemini/mod.rs:142 builds its own request struct) and Anthropic, but not for OpenAI-compatible providers (OpenAI, Together, Groq, Fireworks). Practical effect: before this PR a client sending a web_context_search tool through an OpenAI-compatible route got the tool silently dropped and a successful response. After this PR that request will surface an upstream 400. Arguably the right behavior (upstream owns the decision) but a user-visible change the PR description understates. Worth a one-line note; no code change required. Style nit (already raised by gemini-code-assist) The nested match blocks could collapse to an and_then + ok chain that removes only on success. Equivalent, half the lines. Not blocking - the current version is also readable. Minor: redundant clone extra.get(tools).cloned() clones the JSON value in both success and failure paths; the success path then calls remove, throwing away the clone. A remove-first / insert-back-on-failure pattern would avoid it, but the perf delta for a small tools array is negligible. Tests Four tests cover: function-shaped happy path, unknown type passthrough, mixed-array conservatism, and tool_choice symmetry. The mixed-array case is the right call - half-parsing arrays in this layer would be worse than passing through opaque. Privacy / logging No new log statements. Compliant with CLAUDE.md. Approved. One ask: add a sentence to the PR description noting that OpenAI-compatible upstreams will now see unknown tools that were previously absorbed silently. Everything else is optional. |
PierreLeGuen
left a comment
There was a problem hiding this comment.
Reviewed the live /v1/chat/completions service path plus the relevant outbound provider paths.
I do not see a blocking issue. The helper now only consumes tools / tool_choice when they parse into the typed shape; otherwise the raw values stay in extra, which is what the direct pass-through providers serialize. Provider-native adapters still rebuild their own request bodies from typed fields, so this does not unexpectedly enable unsupported raw tool shapes there.
Verified locally:
cargo fmt --checkgit diff --check origin/main...HEADcargo test -p services --lib extract_toolscargo test -p services --libcargo test -p inference_providers --lib strip_reasoning_effortcargo test -p inference_providers --lib flatten
GitHub checks are green. Non-blocking: the PR text should be explicit that pass-through providers may now surface upstream validation errors for tool shapes that were previously silently dropped.
`ChatDelta` is a typed struct with no `#[serde(flatten)]` catch-all, so any field the upstream emits that's not in our explicit field list is silently dropped on deserialize and never reaches the client. This bites the inference-proxy's server-side agent loop (nearai/inference-proxy#144): the proxy emits a synthetic `delta.nearai_tool_result` chunk between iterations carrying the tool's grounded output. Empirically, against a freshly-deployed staging running #676, the model successfully called `web_context_search` (we saw the tool_calls chunks on the wire), but the proxy's `nearai_tool_result` chunks never made it through to the client — cloud-api stripped them on chunk re-serialization. Same root cause as #676 on the request side: typed structs with no catch-all. Fix: add a flattened `extra: HashMap<String, serde_json::Value>` to `ChatDelta`. Unknown delta fields are now preserved verbatim on the deserialize-then-re-serialize round trip. Derived `Default` on the struct so the empty catch-all is `HashMap::new()` by default, then updated the handful of explicit `ChatDelta { ... }` literals across the workspace to set `extra: Default::default()`. Regression test: `chat_delta_preserves_unknown_fields_round_trip` constructs a chunk with `delta.nearai_tool_result`, deserializes it, asserts the catch-all contains the expected payload, and round-trips the re-serialized JSON to confirm the synthetic field survives. `cargo test --workspace --lib` clean (315 + 174 + 106 + 16 + 11 passing, including the new test). `cargo clippy --all-targets -- -D warnings` clean.
Problem
extract_tools_from_extraincrates/services/src/completions/mod.rshad two interacting bugs:extra.remove(\"tools\")happens unconditionally, before parsing..ok()swallows parse errors silently.Net effect: any chat-completions request whose
toolsarray doesn't fit our typedToolDefinitionshape (i.e. anything other than{type:\"function\", function:{name,description,parameters}}) is silently dropped — removed fromextraand parsed toNone. The upstream never sees the tool, and the model never knows it could call it.This blocked the
web_context_searchagent loop (nearai/inference-proxy#144) from running end-to-end through cloud-api. The inference-proxy expands{\"type\":\"web_context_search\"}into a proper function tool inside the CVM, but cloud-api was stripping it before the request ever got there.Empirical evidence:
POST /v1/chat/completionswithtools:[{\"type\":\"web_context_search\"}]against GLM-5.1 returnsprompt_tokens=19(just the user message). The tool schema alone would add ~100 tokens — so cloud-api forwarded the request without it.Fix
Only consume from
extraon successful parse. If parsing fails, the raw value stays inextraand flows through to the upstream request body verbatim via the existing#[serde(flatten)]onChatCompletionParams.extra. Same treatment fortool_choicefor symmetry.This is a permissive-passthrough change at the cloud-api layer: deciding whether a tool type is supported is the upstream's job (vLLM/SGLang via inference-proxy for our models, the provider for Gemini/Anthropic). External providers already only consume typed tools, so they're unaffected — non-function tools simply don't reach them.
Tests
cargo test -p services --lib extract_tools(4 new tests, all pass):extract_tools_consumes_function_shaped_tools— regression cover for the existing path: standard function tools parse, get consumed fromextra(so we don't end up with both the typedtoolsfield and a flattenedextra.tools, which would be a duplicate key in the upstream JSON).extract_tools_passes_through_unknown_tool_types— theweb_context_searchregression: array remains inextra.extract_tools_passes_through_mixed_unknown_in_array— an array mixing function and unknown stays fully opaque. Conservative: don't strip some entries; cloud-api isn't the right layer to half-parse arrays.extract_tools_passes_through_unknown_tool_choice— symmetric guard fortool_choice.Full suite:
cargo test -p services --lib→ 315 existing + 4 new pass.cargo clippy --all-targets -- -D warningsclean.cargo fmt --checkclean.Test plan after deploy
POST /v1/chat/completionstocloud-api.near.aiwithtools:[{\"type\":\"web_context_search\"}]against a GLM-5.1 deployment that hasWEB_CONTEXT_SEARCH_*env set (gpu23/04/03/26 — done in cvm-conf v0.0.191). Expect:nearai_tool_resultchunk +data: [DONE].X-Signing-Algo,X-Client-Pub-Key). Expect: encryptednearai_tool_result.outputon the wire, decryptable with the client key.function-typed tool — should be byte-identical to today (the regression test covers this in code).Related
web_context_searchloop in the inference-proxy.