improvement(providers): harden OpenAI-compatible providers + add tests by waleedlatif1 · Pull Request #4796 · simstudioai/sim

waleedlatif1 · 2026-05-29T20:03:38Z

Summary

Harden the OpenAI-compatible Chat Completions providers (vllm, ollama, openrouter, fireworks, litellm) for tool calling, streaming, structured output, and error handling — validated against vLLM/OpenAI live API docs
vLLM: consolidate forced-tool tracking onto the shared helper; wrap the tool loop in try/catch so a mid-loop failure returns partial results; set tool_choice: 'none' on the post-tool streaming call (prevents silently-dropped tool calls and an --enable-auto-tool-choice 400); remove dead code
fireworks: tool_choice: 'none' on the final streaming call; clarify native json_schema vs json_object fallback
litellm: use shared enforceStrictSchema; defer response_format past the tool loop (response_format + tools conflict on some backends); add reasoning_effort passthrough; use max_completion_tokens
ollama: only strip ```json fences when a response format is requested; structured error extraction from OpenAI.APIError (type/code/status)
openrouter: only record tool output when the call succeeded and returned output
providers/utils + openai/core: extract enforceStrictSchema into the shared utils (litellm and the OpenAI provider now consume it; removes the duplicated local copy in openai/core)
Add unit tests across all five providers (56 tests)
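
The LiteLLM item above (deferring response_format past the tool loop) could look roughly like the following. This is a minimal sketch, not the code from the PR: `ChatPayload`, `deferResponseFormat`, and `buildFinalPayload` are hypothetical names, and only the field names (`response_format`, `tool_choice`, `parallel_tool_calls`) come from the Chat Completions request shape mentioned in this thread.

```typescript
// Hypothetical payload shape; only the field names mirror the Chat Completions API.
interface ChatPayload {
  model: string
  messages: Array<{ role: string; content: string }>
  tools?: unknown[]
  tool_choice?: 'auto' | 'none' | 'required'
  response_format?: { type: string; [key: string]: unknown }
  parallel_tool_calls?: boolean
}

interface SplitPayload {
  loopPayload: ChatPayload
  deferredFormat?: ChatPayload['response_format']
}

/** While tools are active, keep response_format out of the tool-loop requests. */
function deferResponseFormat(payload: ChatPayload): SplitPayload {
  const hasTools = Array.isArray(payload.tools) && payload.tools.length > 0
  if (!hasTools || payload.response_format === undefined) {
    return { loopPayload: payload }
  }
  const { response_format, ...rest } = payload
  return { loopPayload: rest, deferredFormat: response_format }
}

/** Final call: re-apply the deferred format and forbid further tool calls. */
function buildFinalPayload(
  loopPayload: ChatPayload,
  messages: ChatPayload['messages'],
  deferredFormat: NonNullable<ChatPayload['response_format']>
): ChatPayload {
  return {
    ...loopPayload, // spread so fields such as reasoning_effort would carry over
    messages,
    response_format: deferredFormat,
    tool_choice: 'none',
    parallel_tool_calls: false,
  }
}
```

Spreading the loop payload rather than hand-picking fields keeps the final request aligned with whatever the streaming path sends.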

Type of Change

Improvement

Testing

bunx vitest run on the five provider suites — 56/56 passing. Behavior verified against vLLM and OpenAI live API docs. (Note: tsc is not installed in this workspace, so no type-check was run.)

Checklist

Code follows project style guidelines
Self-reviewed my changes
Tests added/updated and passing
No new warnings introduced
I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

vercel · 2026-05-29T20:03:44Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
docs	Skipped		May 29, 2026 8:35pm

cursor · 2026-05-29T20:03:50Z

PR Summary

Medium Risk
Changes affect live LLM request payloads (tool_choice, deferred response_format, schema strictness) across multiple providers; regressions could break tool+JSON workflows or streaming, though behavior is heavily covered by new unit tests.

Overview
This PR hardens OpenAI-compatible Chat Completions providers (Fireworks, LiteLLM, Ollama, OpenRouter, vLLM) around tool loops, streaming, structured output, and errors, and adds broad Vitest coverage (~56 tests) for those paths.

Shared behavior: enforceStrictSchema moves to providers/utils (replacing the duplicate in openai/core) so strict json_schema requests satisfy backends that require additionalProperties: false and full required lists—LiteLLM now uses it when strict is on.
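
To illustrate what "additionalProperties: false and full required lists" means in practice, here is a rough sketch of a strict-schema pass. It is an assumption about the general technique, not the actual enforceStrictSchema from providers/utils: the name `enforceStrictSchemaSketch`, the `JsonSchema` type, and the exact set of keys visited are illustrative.

```typescript
type JsonSchema = { [key: string]: unknown }

function isSchema(value: unknown): value is JsonSchema {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

/** Apply the strict pass to every schema in a name -> schema map. */
function mapSchemas(group: JsonSchema): JsonSchema {
  const next: JsonSchema = {}
  for (const [name, child] of Object.entries(group)) {
    next[name] = isSchema(child) ? enforceStrictSchemaSketch(child) : child
  }
  return next
}

/** Sketch: return a copy where every object schema is closed and fully required. */
function enforceStrictSchemaSketch(schema: JsonSchema): JsonSchema {
  const out: JsonSchema = { ...schema }

  const properties = out.properties
  if (isSchema(properties)) {
    const strictProps = mapSchemas(properties)
    out.properties = strictProps
    out.required = Object.keys(strictProps) // every property becomes required
    out.additionalProperties = false // no undeclared keys allowed
  }

  const items = out.items
  if (isSchema(items)) {
    out.items = enforceStrictSchemaSketch(items)
  }

  for (const key of ['anyOf', 'oneOf', 'allOf']) {
    const variants = out[key]
    if (Array.isArray(variants)) {
      out[key] = variants.map((v: unknown) => (isSchema(v) ? enforceStrictSchemaSketch(v) : v))
    }
  }

  // Assumption: definition maps live under these keys.
  for (const key of ['$defs', 'definitions']) {
    const defs = out[key]
    if (isSchema(defs)) {
      out[key] = mapSchemas(defs)
    }
  }

  return out
}
```

The sketch copies rather than mutates, so a caller's original schema object is left untouched.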

Tool + stream fixes: Post–tool-loop streaming calls use tool_choice: 'none' (Fireworks, LiteLLM, vLLM) so the model does not re-invoke tools on the final answer. Ollama drops tools / tool_choice from that final stream payload. vLLM routes forced-tool tracking through the shared checkForForcedToolUsage helper instead of an inline copy.
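
The two strategies described above for the post-tool-loop streaming call can be sketched side by side. This is an illustrative helper under stated assumptions: `buildFinalStreamPayload` and the `honorsToolChoice` flag are hypothetical and do not appear in the PR; the only grounded behavior is "set tool_choice: 'none'" for backends that honor it and "drop tools / tool_choice" for Ollama.

```typescript
// Hypothetical payload shape for the final streaming request.
interface StreamPayload {
  model: string
  messages: unknown[]
  tools?: unknown[]
  tool_choice?: unknown
  stream?: boolean
  [key: string]: unknown
}

/**
 * Build the request for the final, content-only streaming turn.
 * honorsToolChoice = true  -> keep tools but forbid calling them (vLLM/Fireworks/LiteLLM style)
 * honorsToolChoice = false -> remove tools entirely (Ollama style)
 */
function buildFinalStreamPayload(
  payload: StreamPayload,
  messages: unknown[],
  honorsToolChoice: boolean
): StreamPayload {
  if (honorsToolChoice) {
    return { ...payload, messages, tool_choice: 'none', stream: true }
  }
  const stripped: StreamPayload = { ...payload, messages, stream: true }
  delete stripped.tools
  delete stripped.tool_choice
  return stripped
}
```

Either way the goal is the same: the final turn should produce content, not another round of tool calls that nothing will execute.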

Structured output: Fireworks stops sending strict on native json_schema. LiteLLM defers response_format until after the tool loop when tools are active (avoids tool + schema conflicts), applies it on a final call or stream with parallel_tool_calls: false, and forwards reasoning_effort when not auto. Ollama only strips ```json fences when a responseFormat is set (plain text keeps fences).
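
The Ollama fence-stripping rule at the end of the paragraph above can be illustrated with a small helper. This is a sketch assuming a boolean "was a response format requested" flag; `stripJsonFence` is a hypothetical name, and the real provider may detect fences differently.

```typescript
// Built programmatically so the sketch does not embed a literal fence.
const FENCE = '`'.repeat(3)

/**
 * Remove a wrapping json fence only when structured output was requested.
 * Plain-text responses are returned untouched, fences included.
 */
function stripJsonFence(content: string, hasResponseFormat: boolean): string {
  if (!hasResponseFormat) return content

  const trimmed = content.trim()
  const opener = `${FENCE}json`
  const wrapped =
    trimmed.startsWith(opener) &&
    trimmed.endsWith(FENCE) &&
    trimmed.length >= opener.length + FENCE.length
  if (!wrapped) return content

  return trimmed.slice(opener.length, trimmed.length - FENCE.length).trim()
}
```

Gating on the flag matters because a plain-text answer may legitimately contain a fenced JSON snippet that the user wants to see verbatim.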

Tool loop robustness (LiteLLM): Safer parsing of empty tool arguments, stub tool messages for unknown tool names, name on tool results, and stricter success handling for tool outputs. OpenRouter only records toolResults when execution succeeded and output is present.
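
A rough sketch of the two rules in the paragraph above: every tool call gets a tool message back (a stub when the tool name is unknown), and an output is recorded only when execution succeeded and actually returned something. All names here (`collectToolMessages`, `ToolFn`, the error payload shape) are hypothetical, and the tools are synchronous purely to keep the example short.

```typescript
interface ToolCall {
  id: string
  function: { name: string; arguments: string }
}
interface ToolMessage {
  role: 'tool'
  tool_call_id: string
  name: string
  content: string
}
interface ToolResult {
  success: boolean
  output?: unknown
  error?: string
}
type ToolFn = (rawArguments: string) => ToolResult

function collectToolMessages(
  toolCalls: ToolCall[],
  registry: Map<string, ToolFn>
): { messages: ToolMessage[]; toolResults: unknown[] } {
  const messages: ToolMessage[] = []
  const toolResults: unknown[] = []

  for (const call of toolCalls) {
    const name = call.function.name
    const tool = registry.get(name)

    if (tool === undefined) {
      // Stub reply so the assistant's tool_call id is never left unanswered.
      const stub = { error: true, message: `Tool not found: ${name}` }
      messages.push({ role: 'tool', tool_call_id: call.id, name, content: JSON.stringify(stub) })
      continue
    }

    const result = tool(call.function.arguments)
    const succeeded = result.success && result.output !== undefined && result.output !== null
    if (succeeded) toolResults.push(result.output) // record real outputs only

    const body = succeeded
      ? result.output
      : { error: true, message: result.error ?? 'Tool execution failed' }
    messages.push({ role: 'tool', tool_call_id: call.id, name, content: JSON.stringify(body) })
  }

  return { messages, toolResults }
}
```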

Errors: Ollama maps OpenAI.APIError into clearer ProviderError / logging; LiteLLM improves proxy error envelope handling.
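
As an illustration of structured error extraction, the sketch below pulls `status`, `type`, and `code` off an error object and folds them into one message. It is deliberately dependency-free: it duck-types the error instead of checking against OpenAI.APIError, and `ProviderErrorSketch`, `ErrorDetails`, and `toProviderError` are hypothetical names, not the project's ProviderError.

```typescript
interface ErrorDetails {
  status?: number
  type?: string
  code?: string
}

class ProviderErrorSketch extends Error {
  details: ErrorDetails

  constructor(message: string, details: ErrorDetails) {
    super(message)
    this.name = 'ProviderErrorSketch'
    this.details = details
  }
}

/** Keep only well-typed status/type/code fields and append them to the message. */
function toProviderError(err: unknown): ProviderErrorSketch {
  if (!(err instanceof Error)) {
    return new ProviderErrorSketch(String(err), {})
  }

  const raw = err as Error & { status?: unknown; type?: unknown; code?: unknown }
  const details: ErrorDetails = {}
  if (typeof raw.status === 'number') details.status = raw.status
  if (typeof raw.type === 'string') details.type = raw.type
  if (typeof raw.code === 'string') details.code = raw.code

  const parts = [details.status, details.type, details.code].filter((p) => p !== undefined)
  const suffix = parts.length > 0 ? ` (${parts.join(', ')})` : ''
  return new ProviderErrorSketch(`${err.message}${suffix}`, details)
}
```

In a real provider the check would more likely be an `instanceof` test against the SDK's error class; the duck-typed version is used here only so the example runs without the SDK installed.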

Reviewed by Cursor Bugbot for commit 0be3ca2.

greptile-apps · 2026-05-29T20:10:02Z

Greptile Summary

This PR hardens five OpenAI-compatible providers (vLLM, Ollama, OpenRouter, Fireworks, LiteLLM) for tool calling, streaming, and structured output, and adds 56 unit tests. Key improvements include tool_choice:'none' on post-tool streaming calls to prevent silent tool-loop re-entry, deferred response_format past the tool loop in LiteLLM and Fireworks, stub tool-response messages for unknown tool IDs in LiteLLM, and extraction of enforceStrictSchema into shared utils.

LiteLLM received the most thorough hardening: deferred response_format, tool_choice:'none' on both streaming and non-streaming final calls, stub messages for unmatched tool call IDs, max_completion_tokens, and reasoning_effort passthrough.
Fireworks and vLLM had their post-tool streaming calls changed to tool_choice:'none', and vLLM's forced-tool tracking was consolidated onto the shared helper.
OpenRouter's streaming final call still uses tool_choice:'auto' (the same fix applied to fireworks/litellm was missed here), and both vLLM and Ollama still call JSON.parse(toolCall.function.arguments) without the empty-string guard that LiteLLM received in this PR.
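
The empty-string guard mentioned in the last item can be sketched as a small parsing helper. This is an assumption about its general shape, not LiteLLM's actual code: `parseToolArguments` is a hypothetical name, and rejecting non-object JSON is an extra choice made for this example.

```typescript
/**
 * Parse a tool call's arguments string.
 * Some backends send '' for zero-argument tools; JSON.parse('') throws,
 * which would wrongly mark a valid call as failed.
 */
function parseToolArguments(raw: string | null | undefined): Record<string, unknown> {
  if (raw === null || raw === undefined || raw.trim() === '') {
    return {}
  }
  const parsed: unknown = JSON.parse(raw) // still throws on genuinely malformed JSON
  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
    throw new Error('Tool arguments must be a JSON object')
  }
  return parsed as Record<string, unknown>
}
```

Malformed JSON still throws, so the surrounding try/catch keeps reporting real failures; only the "no arguments" case is normalized.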

Confidence Score: 4/5

Safe to merge with the OpenRouter streaming path fix outstanding — the bug only affects callers that use both streaming and structured output with active tools on OpenRouter.

The OpenRouter streaming final call retains tool_choice:'auto' while the rest of the payload still carries the full tool list, meaning the model can emit another round of tool calls on what is supposed to be the final content-generating call. The bug is scoped to users who combine streaming, tools, and responseFormat on the OpenRouter provider, but it can produce empty or stale streamed content with no error surfaced to the caller.

apps/sim/providers/openrouter/index.ts — the post-tool-loop streaming call at line 473 needs tool_choice:'none' to match the fix applied to fireworks and litellm in this PR.

Important Files Changed

Filename	Overview
apps/sim/providers/openrouter/index.ts	Tool-output recording correctly guarded; streaming final call still uses tool_choice:'auto' allowing the model to emit more tool calls instead of the structured answer.
apps/sim/providers/vllm/index.ts	tool_choice:'none' added to streaming final call; JSON.parse still lacks empty-string guard (caught by try-catch but incorrectly marks valid tool calls as failures).
apps/sim/providers/ollama/index.ts	Structured APIError extraction and conditional fence-stripping are correct; JSON.parse lacks empty-string guard, same as vllm.
apps/sim/providers/litellm/index.ts	Comprehensive hardening: deferred response_format past tool loop, tool_choice:'none' on both streaming/non-streaming final calls, empty-args guard, stub messages for unknown tool IDs, max_completion_tokens, reasoning_effort passthrough.
apps/sim/providers/fireworks/index.ts	Removes strict flag from json_schema (Fireworks doesn't support it), sets tool_choice:'none' on streaming final call, removes dead code.
apps/sim/providers/utils.ts	Extracts enforceStrictSchema into shared utils with correct recursive handling of object, array, anyOf/oneOf/allOf, and $defs/$definitions; now consumed by both litellm and openai/core.
apps/sim/providers/openrouter/utils.ts	Now delegates streaming and forced-tool-usage helpers to shared providers/utils, removing duplication.
apps/sim/providers/vllm/utils.ts	Delegates to shared OpenAI-compatible stream and forced-tool-usage utilities, no issues.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Provider executeRequest] --> B{Has tools?}
    B -- No --> C{stream?}
    C -- Yes --> D[Direct streaming call]
    C -- No --> E[Single completion call]
    B -- Yes --> F[Tool loop: up to MAX_TOOL_ITERATIONS]
    F --> G{Response has tool_calls?}
    G -- Yes --> H[Execute tools in parallel]
    H --> I[Push assistant + tool messages]
    I --> J{Unknown tool ID?}
    J -- litellm only --> K[Push stub error message]
    J -- vllm/ollama --> L[No message pushed]
    K --> F
    L --> F
    G -- No --> M{stream?}
    M -- Yes --> N{Provider}
    N -- fireworks/vllm/litellm --> O[tool_choice: none]
    N -- openrouter --> P[tool_choice: auto]
    O --> Q[Final streaming response]
    P --> Q
    M -- No --> R{responseFormat?}
    R -- litellm/fireworks --> S[Deferred final call with response_format]
    R -- openrouter --> T[Fresh finalPayload without tools]
    S --> U[Return ProviderResponse]
    T --> U
    E --> U
    D --> V[Return StreamingExecution]
    Q --> V
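
The streaming branch of the flowchart above can be approximated in code as follows. This is a simplified sketch with an injected fake model rather than a real client: `runToolLoop`, `CallModel`, `RunTool`, and the cap of 10 are all hypothetical, and the real providers carry more state (usage, timing, forced-tool tracking) than is shown here.

```typescript
interface Msg {
  role: 'user' | 'assistant' | 'tool'
  content: string
  tool_call_id?: string
}
interface ModelTurn {
  content: string
  toolCalls: Array<{ id: string; name: string }>
}
type CallModel = (messages: Msg[], toolChoice: 'auto' | 'none') => Promise<ModelTurn>
type RunTool = (name: string) => Promise<string>

const MAX_TOOL_ITERATIONS = 10 // hypothetical value for the sketch

async function runToolLoop(
  callModel: CallModel,
  runTool: RunTool,
  messages: Msg[],
  stream: boolean
): Promise<string> {
  const history: Msg[] = [...messages]
  let turn = await callModel(history, 'auto')
  let iterations = 0

  while (turn.toolCalls.length > 0 && iterations < MAX_TOOL_ITERATIONS) {
    history.push({ role: 'assistant', content: turn.content })

    // Execute all requested tools in parallel, then answer each call id.
    const results = await Promise.all(
      turn.toolCalls.map(async (call) => ({ id: call.id, output: await runTool(call.name) }))
    )
    for (const result of results) {
      history.push({ role: 'tool', content: result.output, tool_call_id: result.id })
    }

    turn = await callModel(history, 'auto')
    iterations++
  }

  if (stream) {
    // Final content-only turn: tool_choice 'none' so no tool call can be dropped.
    turn = await callModel(history, 'none')
  }
  return turn.content
}
```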

Reviews (3): Last reviewed commit: "chore(providers/ollama): drop orphaned e..."


The deferred final call used tool_choice 'auto', so the model could emit another tool_calls round instead of the structured answer, leaving content stale. Use 'none' (matching vLLM/Fireworks) on both the streaming and non-streaming final calls so the model must return the structured response. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

waleedlatif1 · 2026-05-29T20:20:41Z

@greptile

waleedlatif1 · 2026-05-29T20:20:45Z

@cursor review

Ollama ignores tool_choice (not in its supported fields), so vLLM/Fireworks' tool_choice:'none' guard is a no-op here. Omit tools from the final streaming payload instead so the summarization turn can't emit dropped tool calls. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…fort carries over

The non-streaming deferred finalPayload hand-picked fields and dropped reasoning_effort (and any future payload field), diverging from the streaming path which spreads ...payload. Spread payload here too for consistency. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>


Restore the TSDoc blocks on supportsNativeStructuredOutputs, createReadableStreamFromOpenAIStream, and checkForForcedToolUsage — TSDoc is the codebase documentation standard and should not have been stripped. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>


waleedlatif1 · 2026-05-29T20:36:25Z

@greptile

waleedlatif1 · 2026-05-29T20:36:28Z

@cursor review

cursor

✅ Bugbot reviewed your changes and found no new issues!

Reviewed by Cursor Bugbot for commit 0be3ca2.

improvement(providers): harden OpenAI-compatible providers + add tests

8988ac7

greptile-apps Bot reviewed May 29, 2026


Comment thread apps/sim/providers/litellm/index.ts

Comment thread apps/sim/providers/vllm/index.ts Outdated

cursor Bot reviewed May 29, 2026


Comment thread apps/sim/providers/vllm/index.ts Outdated

fix(vllm): let tool-loop errors propagate instead of returning silent…

4e286a2

… partial success

vercel Bot temporarily deployed to Preview May 29, 2026 20:14 Inactive

vercel Bot temporarily deployed to Preview May 29, 2026 20:16 Inactive

cursor Bot reviewed May 29, 2026


Comment thread apps/sim/providers/litellm/index.ts Outdated

vercel Bot temporarily deployed to Preview May 29, 2026 20:29 Inactive

vercel Bot temporarily deployed to Preview May 29, 2026 20:31 Inactive

chore(providers/ollama): restore enrichment TSDoc block

683c105

Keeps parity with sibling Chat Completions providers (cerebras/mistral/xai). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

vercel Bot temporarily deployed to Preview May 29, 2026 20:32 Inactive

vercel Bot temporarily deployed to Preview May 29, 2026 20:34 Inactive

chore(litellm): remove inline rationale comments (codebase uses TSDoc)

7d56656

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

vercel Bot temporarily deployed to Preview May 29, 2026 20:34 Inactive

chore(providers/ollama): drop orphaned enrichment TSDoc

0be3ca2

The block documented a function that now lives in trace-enrichment.ts, so it documents nothing in this file. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

vercel Bot temporarily deployed to Preview May 29, 2026 20:35 Inactive

cursor Bot reviewed May 29, 2026


waleedlatif1 merged commit e533f1b into staging May 29, 2026
14 checks passed

waleedlatif1 deleted the waleedlatif1/vllm-provider-audit branch May 29, 2026 21:22

waleedlatif1 mentioned this pull request May 29, 2026

v0.6.96: pinned table columns, sequence number in copilot messages, tables UI improvements, new slack scopes, model-level denylists, object storage tracespans #4798

Merged

waleedlatif1 merged 9 commits into staging from waleedlatif1/vllm-provider-audit
