Skip to content

fix(completions): pass unknown chat tool types through to upstream#676

Merged
Evrard-Nil merged 1 commit into
mainfrom
feat/passthrough-unknown-tool-types
May 26, 2026
Merged

fix(completions): pass unknown chat tool types through to upstream#676
Evrard-Nil merged 1 commit into
mainfrom
feat/passthrough-unknown-tool-types

Conversation

@Evrard-Nil
Copy link
Copy Markdown
Collaborator

Problem

extract_tools_from_extra in crates/services/src/completions/mod.rs had two interacting bugs:

let tools = extra.remove("tools").and_then(|v| {
    serde_json::from_value::<Vec<inference_providers::ToolDefinition>>(v).ok()
});
  1. extra.remove(\"tools\") happens unconditionally, before parsing.
  2. .ok() swallows parse errors silently.

Net effect: any chat-completions request whose tools array doesn't fit our typed ToolDefinition shape (i.e. anything other than {type:\"function\", function:{name,description,parameters}}) is silently dropped — removed from extra and parsed to None. The upstream never sees the tool, and the model never knows it could call it.

This blocked the web_context_search agent loop (nearai/inference-proxy#144) from running end-to-end through cloud-api. The inference-proxy expands {\"type\":\"web_context_search\"} into a proper function tool inside the CVM, but cloud-api was stripping it before the request ever got there.

Empirical evidence: POST /v1/chat/completions with tools:[{\"type\":\"web_context_search\"}] against GLM-5.1 returns prompt_tokens=19 (just the user message). The tool schema alone would add ~100 tokens — so cloud-api forwarded the request without it.

Fix

Only consume from extra on successful parse. If parsing fails, the raw value stays in extra and flows through to the upstream request body verbatim via the existing #[serde(flatten)] on ChatCompletionParams.extra. Same treatment for tool_choice for symmetry.

This is a permissive-passthrough change at the cloud-api layer: deciding whether a tool type is supported is the upstream's job (vLLM/SGLang via inference-proxy for our models, the provider for Gemini/Anthropic). External providers already only consume typed tools, so they're unaffected — non-function tools simply don't reach them.

Tests

cargo test -p services --lib extract_tools (4 new tests, all pass):

  • extract_tools_consumes_function_shaped_tools — regression cover for the existing path: standard function tools parse, get consumed from extra (so we don't end up with both the typed tools field and a flattened extra.tools, which would be a duplicate key in the upstream JSON).
  • extract_tools_passes_through_unknown_tool_types — the web_context_search regression: array remains in extra.
  • extract_tools_passes_through_mixed_unknown_in_array — an array mixing function and unknown stays fully opaque. Conservative: don't strip some entries; cloud-api isn't the right layer to half-parse arrays.
  • extract_tools_passes_through_unknown_tool_choice — symmetric guard for tool_choice.

Full suite: cargo test -p services --lib → 315 existing + 4 new pass. cargo clippy --all-targets -- -D warnings clean. cargo fmt --check clean.

Test plan after deploy

  • Plain (no-E2EE) POST /v1/chat/completions to cloud-api.near.ai with tools:[{\"type\":\"web_context_search\"}] against a GLM-5.1 deployment that has WEB_CONTEXT_SEARCH_* env set (gpu23/04/03/26 — done in cvm-conf v0.0.191). Expect: nearai_tool_result chunk + data: [DONE].
  • Same request with E2EE headers (X-Signing-Algo, X-Client-Pub-Key). Expect: encrypted nearai_tool_result.output on the wire, decryptable with the client key.
  • Existing chat completion + standard function-typed tool — should be byte-identical to today (the regression test covers this in code).

Related

extract_tools_from_extra parses request.extra["tools"] as the typed
Vec<ToolDefinition>; on parse failure it called .ok() which silently
dropped the value, AND it removed the key from `extra` unconditionally
before parsing. The combination meant any chat-completions request
carrying a non-function-shaped tool (e.g. NEAR's namespaced
`{"type":"web_context_search"}` that inference-proxy expands into a
function tool inside the CVM) lost the entire tools array — cloud-api
forwarded the request without it, and the model never saw the tool.

Fix: only consume from `extra` on successful parse. If parsing fails,
leave the raw value in place so it flows through to the upstream
request body verbatim via the existing #[serde(flatten)] on
ChatCompletionParams.extra. Same treatment for tool_choice for symmetry.

This is a permissive-passthrough change at the cloud-api layer; the
upstream (vLLM/SGLang via inference-proxy) is the right place to decide
whether a tool type is supported. External providers (Gemini, Anthropic)
already only see typed tools today, so they're unaffected.

Tests:
- extract_tools_consumes_function_shaped_tools — regression cover for
  the existing path: function tools parse, get consumed from extra
  (no duplicate keys when ChatCompletionParams serializes).
- extract_tools_passes_through_unknown_tool_types — the
  web_context_search regression: array remains in extra.
- extract_tools_passes_through_mixed_unknown_in_array — array with
  both function and unknown stays opaque (conservative: don't strip
  some entries from a mixed array).
- extract_tools_passes_through_unknown_tool_choice — symmetric for
  tool_choice.

Verified: cargo test -p services --lib (315 + 4 new pass), cargo
clippy --all-targets, cargo fmt --check all clean.
Copilot AI review requested due to automatic review settings May 26, 2026 17:37
@Evrard-Nil Evrard-Nil temporarily deployed to Cloud API test env May 26, 2026 17:37 — with GitHub Actions Inactive
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors extract_tools_from_extra to only remove tools and tool_choice from the extra map if they successfully parse into the typed ToolDefinition and ToolChoice shapes. If parsing fails, they are left in extra to flow through to the upstream request body verbatim. It also adds comprehensive unit tests to verify this behavior. The reviewer suggested simplifying the nested match statements in extract_tools_from_extra using idiomatic Rust combinators like and_then and map to reduce boilerplate and improve readability.

Comment on lines +577 to +599
let tools = match extra.get("tools").cloned() {
Some(raw) => {
match serde_json::from_value::<Vec<inference_providers::ToolDefinition>>(raw) {
Ok(parsed) => {
extra.remove("tools");
Some(parsed)
}
Err(_) => None,
}
}
None => None,
};

let tool_choice = extra
.remove("tool_choice")
.and_then(|v| serde_json::from_value::<inference_providers::ToolChoice>(v).ok());
let tool_choice = match extra.get("tool_choice").cloned() {
Some(raw) => match serde_json::from_value::<inference_providers::ToolChoice>(raw) {
Ok(parsed) => {
extra.remove("tool_choice");
Some(parsed)
}
Err(_) => None,
},
None => None,
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The nested match statements can be simplified and made more idiomatic by using and_then and map combinators. This improves readability and reduces boilerplate.

        let tools = extra.get("tools").and_then(|raw| {
            serde_json::from_value::<Vec<inference_providers::ToolDefinition>>(raw.clone())
                .ok()
                .map(|parsed| {
                    extra.remove("tools");
                    parsed
                })
        });

        let tool_choice = extra.get("tool_choice").and_then(|raw| {
            serde_json::from_value::<inference_providers::ToolChoice>(raw.clone())
                .ok()
                .map(|parsed| {
                    extra.remove("tool_choice");
                    parsed
                })
        });

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes tool forwarding in the completions service by ensuring tools / tool_choice are only removed from extra when they successfully parse into the typed inference_providers shapes, preventing non-standard tool definitions from being silently dropped before reaching upstream.

Changes:

  • Update extract_tools_from_extra to only consume extra.tools / extra.tool_choice on successful typed parsing (otherwise leave raw JSON intact).
  • Add unit tests covering: standard function tools are consumed; unknown/mixed tool arrays and unknown tool_choice are preserved in extra.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +566 to +570
/// function-tool shape — the original value is left in `extra` so it
/// flows through to the upstream request body verbatim via the
/// `#[serde(flatten)]` on `ChatCompletionParams.extra`. This means
/// cloud-api never silently drops tool definitions it doesn't
/// recognise; the upstream gets to decide.
Comment on lines +2852 to +2853
// Symmetric to tools: a non-standard tool_choice string survives
// for upstream interpretation.
@claude
Copy link
Copy Markdown

claude Bot commented May 26, 2026

Code Review

Scope: extract_tools_from_extra in crates/services/src/completions/mod.rs - fix for silent drop of unknown chat tool types, plus 4 unit tests.

Correctness

The fix is right and the bug is real. tools on ChatCompletionParams has skip_serializing_if = Option::is_none (crates/inference_providers/src/models.rs:217) and extra is serde flatten (models.rs:243). So when parse fails and the raw JSON stays in extra, it flattens into the outbound body without a duplicate tools key. Successful parse removes the key from extra, avoiding a duplicate-key collision on serialization - the first test explicitly guards this. tools and tool_choice are handled independently, so partial success behaves correctly.

Behavior change worth flagging

For OpenAI-compatible external backends (crates/inference_providers/src/external/openai_compatible.rs:151) the entire ChatCompletionParams is serialized as JSON, so the flattened extra does reach upstream. The PR description says external providers only consume typed tools and are unaffected - that holds for Gemini (gemini/mod.rs:142 builds its own request struct) and Anthropic, but not for OpenAI-compatible providers (OpenAI, Together, Groq, Fireworks).

Practical effect: before this PR a client sending a web_context_search tool through an OpenAI-compatible route got the tool silently dropped and a successful response. After this PR that request will surface an upstream 400. Arguably the right behavior (upstream owns the decision) but a user-visible change the PR description understates. Worth a one-line note; no code change required.

Style nit (already raised by gemini-code-assist)

The nested match blocks could collapse to an and_then + ok chain that removes only on success. Equivalent, half the lines. Not blocking - the current version is also readable.

Minor: redundant clone

extra.get(tools).cloned() clones the JSON value in both success and failure paths; the success path then calls remove, throwing away the clone. A remove-first / insert-back-on-failure pattern would avoid it, but the perf delta for a small tools array is negligible.

Tests

Four tests cover: function-shaped happy path, unknown type passthrough, mixed-array conservatism, and tool_choice symmetry. The mixed-array case is the right call - half-parsing arrays in this layer would be worse than passing through opaque.

Privacy / logging

No new log statements. Compliant with CLAUDE.md.

Approved. One ask: add a sentence to the PR description noting that OpenAI-compatible upstreams will now see unknown tools that were previously absorbed silently. Everything else is optional.

Copy link
Copy Markdown
Contributor

@PierreLeGuen PierreLeGuen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the live /v1/chat/completions service path plus the relevant outbound provider paths.

I do not see a blocking issue. The helper now only consumes tools / tool_choice when they parse into the typed shape; otherwise the raw values stay in extra, which is what the direct pass-through providers serialize. Provider-native adapters still rebuild their own request bodies from typed fields, so this does not unexpectedly enable unsupported raw tool shapes there.

Verified locally:

  • cargo fmt --check
  • git diff --check origin/main...HEAD
  • cargo test -p services --lib extract_tools
  • cargo test -p services --lib
  • cargo test -p inference_providers --lib strip_reasoning_effort
  • cargo test -p inference_providers --lib flatten

GitHub checks are green. Non-blocking: the PR text should be explicit that pass-through providers may now surface upstream validation errors for tool shapes that were previously silently dropped.

@Evrard-Nil Evrard-Nil merged commit f017653 into main May 26, 2026
6 checks passed
Evrard-Nil added a commit that referenced this pull request May 26, 2026
`ChatDelta` is a typed struct with no `#[serde(flatten)]` catch-all,
so any field the upstream emits that's not in our explicit field list
is silently dropped on deserialize and never reaches the client.

This bites the inference-proxy's server-side agent loop
(nearai/inference-proxy#144): the proxy emits a synthetic
`delta.nearai_tool_result` chunk between iterations carrying the
tool's grounded output. Empirically, against a freshly-deployed
staging running #676, the model successfully called
`web_context_search` (we saw the tool_calls chunks on the wire), but
the proxy's `nearai_tool_result` chunks never made it through to the
client — cloud-api stripped them on chunk re-serialization. Same root
cause as #676 on the request side: typed structs with no catch-all.

Fix: add a flattened `extra: HashMap<String, serde_json::Value>` to
`ChatDelta`. Unknown delta fields are now preserved verbatim on the
deserialize-then-re-serialize round trip. Derived `Default` on the
struct so the empty catch-all is `HashMap::new()` by default, then
updated the handful of explicit `ChatDelta { ... }` literals across
the workspace to set `extra: Default::default()`.

Regression test: `chat_delta_preserves_unknown_fields_round_trip`
constructs a chunk with `delta.nearai_tool_result`, deserializes it,
asserts the catch-all contains the expected payload, and round-trips
the re-serialized JSON to confirm the synthetic field survives.

`cargo test --workspace --lib` clean (315 + 174 + 106 + 16 + 11
passing, including the new test). `cargo clippy --all-targets
-- -D warnings` clean.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants