improvement(providers): harden OpenAI-compatible providers + add tests#4796
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub. |
PR SummaryMedium Risk Overview Shared behavior: Tool + stream fixes: Post–tool-loop streaming calls use Structured output: Fireworks stops sending Tool loop robustness (LiteLLM): Safer parsing of empty tool arguments, stub tool messages for unknown tool names, Errors: Ollama maps Reviewed by Cursor Bugbot for commit 0be3ca2. Configure here. |
Greptile SummaryThis PR hardens five OpenAI-compatible providers (vLLM, Ollama, OpenRouter, Fireworks, LiteLLM) for tool calling, streaming, and structured output, and adds 56 unit tests. Key improvements include
Confidence Score: 4/5Safe to merge with the OpenRouter streaming path fix outstanding — the bug only affects callers that use both streaming and structured output with active tools on OpenRouter. The OpenRouter streaming final call retains tool_choice:'auto' while the rest of the payload still carries the full tool list, meaning the model can emit another round of tool calls on what is supposed to be the final content-generating call. The bug is scoped to users who combine streaming, tools, and responseFormat on the OpenRouter provider, but it can produce empty or stale streamed content with no error surfaced to the caller. apps/sim/providers/openrouter/index.ts — the post-tool-loop streaming call at line 473 needs tool_choice:'none' to match the fix applied to fireworks and litellm in this PR. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Provider executeRequest] --> B{Has tools?}
B -- No --> C{stream?}
C -- Yes --> D[Direct streaming call]
C -- No --> E[Single completion call]
B -- Yes --> F[Tool loop: up to MAX_TOOL_ITERATIONS]
F --> G{Response has tool_calls?}
G -- Yes --> H[Execute tools in parallel]
H --> I[Push assistant + tool messages]
I --> J{Unknown tool ID?}
J -- litellm only --> K[Push stub error message]
J -- vllm/ollama --> L[No message pushed]
K --> F
L --> F
G -- No --> M{stream?}
M -- Yes --> N{Provider}
N -- fireworks/vllm/litellm --> O[tool_choice: none]
N -- openrouter --> P[tool_choice: auto]
O --> Q[Final streaming response]
P --> Q
M -- No --> R{responseFormat?}
R -- litellm/fireworks --> S[Deferred final call with response_format]
R -- openrouter --> T[Fresh finalPayload without tools]
S --> U[Return ProviderResponse]
T --> U
E --> U
D --> V[Return StreamingExecution]
Q --> V
Reviews (3): Last reviewed commit: "chore(providers/ollama): drop orphaned e..." | Re-trigger Greptile |
The deferred final call used tool_choice 'auto', so the model could emit another tool_calls round instead of the structured answer, leaving content stale. Use 'none' (matching vLLM/Fireworks) on both the streaming and non-streaming final calls so the model must return the structured response. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
@greptile |
|
@cursor review |
Ollama ignores tool_choice (not in its supported fields), so vLLM/Fireworks' tool_choice:'none' guard is a no-op here. Omit tools from the final streaming payload instead so the summarization turn can't emit dropped tool calls. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…fort carries over The non-streaming deferred finalPayload hand-picked fields and dropped reasoning_effort (and any future payload field), diverging from the streaming path which spreads ...payload. Spread payload here too for consistency. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Keeps parity with sibling Chat Completions providers (cerebras/mistral/xai). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Restore the TSDoc blocks on supportsNativeStructuredOutputs, createReadableStreamFromOpenAIStream, and checkForForcedToolUsage — TSDoc is the codebase documentation standard and should not have been stripped. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The block documented a function that now lives in trace-enrichment.ts, so it documents nothing in this file. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
@greptile |
|
@cursor review |
There was a problem hiding this comment.
✅ Bugbot reviewed your changes and found no new issues!
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit 0be3ca2. Configure here.
Summary
tool_choice: 'none'on the post-tool streaming call (prevents silently-dropped tool calls and an--enable-auto-tool-choice400); remove dead codetool_choice: 'none'on the final streaming call; clarify native json_schema vs json_object fallbackenforceStrictSchema; deferresponse_formatpast the tool loop (response_format + tools conflict on some backends); addreasoning_effortpassthrough; usemax_completion_tokensOpenAI.APIError(type/code/status)enforceStrictSchemainto the shared utils (litellm and the OpenAI provider now consume it; removes the duplicated local copy in openai/core)Type of Change
Testing
bunx vitest runon the five provider suites — 56/56 passing. Behavior verified against vLLM and OpenAI live API docs. (Note:tscis not installed in this workspace, so no type-check was run.)Checklist