improvement(providers): harden OpenAI-compatible providers + add tests #4796

Merged
waleedlatif1 merged 9 commits into staging from waleedlatif1/vllm-provider-audit
May 29, 2026

@waleedlatif1
Collaborator

Summary

  • Harden the OpenAI-compatible Chat Completions providers (vllm, ollama, openrouter, fireworks, litellm) for tool calling, streaming, structured output, and error handling — validated against vLLM/OpenAI live API docs
  • vLLM: consolidate forced-tool tracking onto the shared helper; wrap the tool loop in try/catch so a mid-loop failure returns partial results; set tool_choice: 'none' on the post-tool streaming call (prevents silently dropped tool calls and the 400 that vLLM returns when --enable-auto-tool-choice is not set); remove dead code
  • fireworks: tool_choice: 'none' on the final streaming call; clarify native json_schema vs json_object fallback
  • litellm: use shared enforceStrictSchema; defer response_format past the tool loop (response_format + tools conflict on some backends); add reasoning_effort passthrough; use max_completion_tokens
  • ollama: only strip ```json fences when a response format is requested; structured error extraction from OpenAI.APIError (type/code/status)
  • openrouter: only record tool output when the call succeeded and returned output
  • providers/utils + openai/core: extract enforceStrictSchema into the shared utils (litellm and the OpenAI provider now consume it; removes the duplicated local copy in openai/core)
  • Add unit tests across all five providers (56 tests)
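
The Ollama item above (strip json code fences only when a response format is requested) can be sketched roughly as follows. This is a minimal illustration, not the provider's actual code: `cleanContent`, `ResponseFormat`, and `FENCE` are hypothetical names, and the regular expression is an assumption about what "stripping a fence" means.

```typescript
// Hypothetical shape of a requested response format (assumption for illustration).
interface ResponseFormat {
  name: string
  schema: Record<string, unknown>
}

// Three backticks, built programmatically so the marker never appears literally.
const FENCE = '`'.repeat(3)

// Matches an optional "json" language tag and captures the fenced body.
const FENCED_JSON = new RegExp('^' + FENCE + '(?:json)?\\s*([\\s\\S]*?)\\s*' + FENCE + '$')

function cleanContent(content: string, responseFormat?: ResponseFormat): string {
  // Plain-text responses keep their fences: the caller may want the code block.
  if (!responseFormat) return content

  const trimmed = content.trim()
  const inner = trimmed.match(FENCED_JSON)?.[1]
  // Fall back to the trimmed text when the content is not wrapped in a fence.
  return inner !== undefined ? inner.trim() : trimmed
}
```

Keeping the check on `responseFormat` first means a plain chat answer that legitimately contains a fenced block passes through untouched.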

Type of Change

  • Improvement

Testing

bunx vitest run on the five provider suites — 56/56 passing. Behavior verified against vLLM and OpenAI live API docs. (Note: tsc is not installed in this workspace, so no type-check was run.)

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel

vercel Bot commented May 29, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped May 29, 2026 8:35pm

@cursor

cursor Bot commented May 29, 2026

PR Summary

Medium Risk
Changes affect live LLM request payloads (tool_choice, deferred response_format, schema strictness) across multiple providers; regressions could break tool+JSON workflows or streaming, though behavior is heavily covered by new unit tests.

Overview
This PR hardens OpenAI-compatible Chat Completions providers (Fireworks, LiteLLM, Ollama, OpenRouter, vLLM) around tool loops, streaming, structured output, and errors, and adds broad Vitest coverage (~56 tests) for those paths.

Shared behavior: enforceStrictSchema moves to providers/utils (replacing the duplicate in openai/core) so strict json_schema requests satisfy backends that require additionalProperties: false and full required lists—LiteLLM now uses it when strict is on.
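
To make the strict-schema requirement concrete, here is a simplified sketch of what such a helper might do. It is not the code in providers/utils: the name `enforceStrictSchemaSketch` is hypothetical, and it covers only objects, arrays, and anyOf/oneOf/allOf (it does not handle `$defs`).

```typescript
type JsonSchema = { [key: string]: unknown }

function isSchema(value: unknown): value is JsonSchema {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

// Returns a copy of the schema in which every object lists all of its
// properties as required and forbids additional properties.
function enforceStrictSchemaSketch(schema: JsonSchema): JsonSchema {
  const result: JsonSchema = { ...schema }

  const props = result.properties
  if (result.type === 'object' && isSchema(props)) {
    const strictProps: JsonSchema = {}
    for (const [key, value] of Object.entries(props)) {
      strictProps[key] = isSchema(value) ? enforceStrictSchemaSketch(value) : value
    }
    result.properties = strictProps
    result.required = Object.keys(strictProps)
    result.additionalProperties = false
  }

  // Recurse into array item schemas.
  const items = result.items
  if (isSchema(items)) {
    result.items = enforceStrictSchemaSketch(items)
  }

  // Recurse into schema combinators.
  for (const keyword of ['anyOf', 'oneOf', 'allOf']) {
    const branches = result[keyword]
    if (Array.isArray(branches)) {
      result[keyword] = branches.map((branch: unknown) =>
        isSchema(branch) ? enforceStrictSchemaSketch(branch) : branch
      )
    }
  }

  return result
}
```

The sketch copies rather than mutates, so the caller's original schema can still be sent to backends that do not need strict mode.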

Tool + stream fixes: Post–tool-loop streaming calls use tool_choice: 'none' (Fireworks, LiteLLM, vLLM) so the model does not re-invoke tools on the final answer. Ollama drops tools / tool_choice from that final stream payload. vLLM routes forced-tool tracking through the shared checkForForcedToolUsage helper instead of an inline copy.
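
A rough sketch of the post-tool-loop streaming payload described above. The types and the `buildFinalStreamPayload` name are assumptions made for illustration; the only detail taken from the PR is that the final call sets tool_choice to 'none'.

```typescript
// Illustrative types only; the real provider payload types are not shown here.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool'
  content: string | null
}

interface ChatPayload {
  model: string
  messages: ChatMessage[]
  tools?: Array<{ type: 'function'; function: { name: string } }>
  tool_choice?: 'auto' | 'none' | 'required'
  stream?: boolean
}

// Builds the final streaming request after the tool loop has finished.
// The tool list is left in place, but tool_choice 'none' tells the model
// to answer in text instead of requesting another tool round.
function buildFinalStreamPayload(payload: ChatPayload, messages: ChatMessage[]): ChatPayload {
  return { ...payload, messages, tool_choice: 'none', stream: true }
}
```

Because the function spreads the original payload, the loop payload itself is left unchanged and can still be inspected or logged afterwards.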

Structured output: Fireworks stops sending strict on native json_schema. LiteLLM defers response_format until after the tool loop when tools are active (avoids tool + schema conflicts), applies it on a final call or stream with parallel_tool_calls: false, and forwards reasoning_effort when not auto. Ollama only strips ```json fences when a responseFormat is set (plain text keeps fences).
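
The deferral of response_format can be pictured as a two-step split: remove the field before the tool loop, then reattach it on the final call. The sketch below assumes hypothetical helper names (`deferResponseFormat`, `buildFinalPayload`) and a simplified payload type; it is not LiteLLM provider code.

```typescript
type ResponseFormatSpec = Record<string, unknown>

interface LoopPayload {
  model: string
  messages: Array<{ role: string; content: string | null }>
  tools?: Array<{ type: 'function'; function: { name: string } }>
  tool_choice?: 'auto' | 'none' | 'required'
  parallel_tool_calls?: boolean
  response_format?: ResponseFormatSpec
}

// Step 1: when tools are active, hold response_format back so the tool loop
// never sends tools and a schema in the same request.
function deferResponseFormat(payload: LoopPayload): {
  loopPayload: LoopPayload
  deferred?: ResponseFormatSpec
} {
  const hasTools = (payload.tools?.length ?? 0) > 0
  const deferred = payload.response_format
  if (!hasTools || !deferred) return { loopPayload: payload }

  const loopPayload: LoopPayload = { ...payload }
  delete loopPayload.response_format
  return { loopPayload, deferred }
}

// Step 2: after the loop, reattach the format on one final call that cannot
// start another tool round.
function buildFinalPayload(
  loopPayload: LoopPayload,
  messages: LoopPayload['messages'],
  deferred: ResponseFormatSpec
): LoopPayload {
  return {
    ...loopPayload,
    messages,
    response_format: deferred,
    tool_choice: 'none',
    parallel_tool_calls: false,
  }
}
```

When no tools are configured the payload passes through untouched, so the simple single-call path keeps its response_format.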

Tool loop robustness (LiteLLM): Safer parsing of empty tool arguments, stub tool messages for unknown tool names, name on tool results, and stricter success handling for tool outputs. OpenRouter only records toolResults when execution succeeded and output is present.
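
Two of the LiteLLM safeguards above lend themselves to a short sketch: tolerating empty tool arguments, and answering an unknown tool name with a stub message. The names `parseToolArguments` and `stubForUnknownTool`, and the wording of the error text, are assumptions. The stub matters because OpenAI-style backends generally expect a tool message for every tool call id emitted by the assistant.

```typescript
interface ToolCall {
  id: string
  function: { name: string; arguments: string }
}

interface ToolMessage {
  role: 'tool'
  tool_call_id: string
  name: string
  content: string
}

// Models sometimes send "" for tools that take no parameters. JSON.parse("")
// throws, so treat an empty or whitespace-only string as "no arguments".
// Malformed non-empty JSON still throws and is left to the caller's try/catch.
function parseToolArguments(raw: string | null | undefined): Record<string, unknown> {
  if (!raw || raw.trim() === '') return {}
  const parsed: unknown = JSON.parse(raw)
  return typeof parsed === 'object' && parsed !== null && !Array.isArray(parsed)
    ? (parsed as Record<string, unknown>)
    : {}
}

// Produces a placeholder reply for a tool the workflow does not define, so the
// conversation still contains one tool message per tool call id.
function stubForUnknownTool(call: ToolCall): ToolMessage {
  return {
    role: 'tool',
    tool_call_id: call.id,
    name: call.function.name,
    content: JSON.stringify({ error: 'Tool "' + call.function.name + '" is not available' }),
  }
}
```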

Errors: Ollama maps OpenAI.APIError into clearer ProviderError / logging; LiteLLM improves proxy error envelope handling.
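
As an illustration of structured error extraction, the sketch below pulls status, type, and code from an error object and folds them into one message. It uses a structural `ApiErrorLike` type rather than importing the OpenAI SDK, and `describeProviderError` is a hypothetical name; how the provider actually builds its ProviderError is not shown in this PR page.

```typescript
// Structural stand-in for the fields this sketch reads from an API error.
interface ApiErrorLike {
  message: string
  status?: number
  type?: string | null
  code?: string | null
}

function isApiErrorLike(err: unknown): err is ApiErrorLike {
  return (
    typeof err === 'object' &&
    err !== null &&
    'message' in err &&
    ('status' in err || 'type' in err || 'code' in err)
  )
}

// Turns an unknown thrown value into a single readable message, adding the
// structured fields when they are available.
function describeProviderError(err: unknown): string {
  if (isApiErrorLike(err)) {
    const parts = [
      typeof err.status === 'number' ? 'status ' + err.status : null,
      err.type ? 'type ' + err.type : null,
      err.code ? 'code ' + err.code : null,
    ].filter((part): part is string => part !== null)
    return parts.length > 0 ? err.message + ' (' + parts.join(', ') + ')' : err.message
  }
  return err instanceof Error ? err.message : String(err)
}
```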

Reviewed by Cursor Bugbot for commit 0be3ca2. Configure here.

@greptile-apps
Contributor

greptile-apps Bot commented May 29, 2026

Greptile Summary

This PR hardens five OpenAI-compatible providers (vLLM, Ollama, OpenRouter, Fireworks, LiteLLM) for tool calling, streaming, and structured output, and adds 56 unit tests. Key improvements include tool_choice:'none' on post-tool streaming calls to prevent silent tool-loop re-entry, deferred response_format past the tool loop in LiteLLM and Fireworks, stub tool-response messages for unknown tool IDs in LiteLLM, and extraction of enforceStrictSchema into shared utils.

  • LiteLLM received the most thorough hardening: deferred response_format, tool_choice:'none' on both streaming and non-streaming final calls, stub messages for unmatched tool call IDs, max_completion_tokens, and reasoning_effort passthrough.
  • Fireworks and vLLM had their post-tool streaming calls changed to tool_choice:'none', and vLLM's forced-tool tracking was consolidated onto the shared helper.
  • OpenRouter's streaming final call still uses tool_choice:'auto' (the same fix applied to fireworks/litellm was missed here), and both vLLM and Ollama still call JSON.parse(toolCall.function.arguments) without the empty-string guard that LiteLLM received in this PR.

Confidence Score: 4/5

Safe to merge with the OpenRouter streaming path fix outstanding — the bug only affects callers that use both streaming and structured output with active tools on OpenRouter.

The OpenRouter streaming final call retains tool_choice:'auto' while the rest of the payload still carries the full tool list, meaning the model can emit another round of tool calls on what is supposed to be the final content-generating call. The bug is scoped to users who combine streaming, tools, and responseFormat on the OpenRouter provider, but it can produce empty or stale streamed content with no error surfaced to the caller.

apps/sim/providers/openrouter/index.ts — the post-tool-loop streaming call at line 473 needs tool_choice:'none' to match the fix applied to fireworks and litellm in this PR.

Important Files Changed

Filename Overview
apps/sim/providers/openrouter/index.ts Tool-output recording correctly guarded; streaming final call still uses tool_choice:'auto' allowing the model to emit more tool calls instead of the structured answer.
apps/sim/providers/vllm/index.ts tool_choice:'none' added to streaming final call; JSON.parse still lacks empty-string guard (caught by try-catch but incorrectly marks valid tool calls as failures).
apps/sim/providers/ollama/index.ts Structured APIError extraction and conditional fence-stripping are correct; JSON.parse lacks empty-string guard, same as vllm.
apps/sim/providers/litellm/index.ts Comprehensive hardening: deferred response_format past tool loop, tool_choice:'none' on both streaming/non-streaming final calls, empty-args guard, stub messages for unknown tool IDs, max_completion_tokens, reasoning_effort passthrough.
apps/sim/providers/fireworks/index.ts Removes strict flag from json_schema (Fireworks doesn't support it), sets tool_choice:'none' on streaming final call, removes dead code.
apps/sim/providers/utils.ts Extracts enforceStrictSchema into shared utils with correct recursive handling of object, array, anyOf/oneOf/allOf, and $defs/$definitions; now consumed by both litellm and openai/core.
apps/sim/providers/openrouter/utils.ts Now delegates streaming and forced-tool-usage helpers to shared providers/utils, removing duplication.
apps/sim/providers/vllm/utils.ts Delegates to shared OpenAI-compatible stream and forced-tool-usage utilities, no issues.
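
The guarded tool-output recording noted for OpenRouter in the table above might look something like this. `ToolExecutionResult` and `recordToolResult` are hypothetical, and treating "output is present" as "not null or undefined" is an assumption.

```typescript
// Hypothetical result shape returned by a tool executor.
interface ToolExecutionResult {
  success: boolean
  output?: unknown
  error?: string
}

// Appends the output only when the call succeeded and actually produced one,
// so failed or empty executions never show up as tool results.
function recordToolResult(toolResults: unknown[], result: ToolExecutionResult): void {
  if (result.success && result.output !== undefined && result.output !== null) {
    toolResults.push(result.output)
  }
}
```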

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Provider executeRequest] --> B{Has tools?}
    B -- No --> C{stream?}
    C -- Yes --> D[Direct streaming call]
    C -- No --> E[Single completion call]
    B -- Yes --> F[Tool loop: up to MAX_TOOL_ITERATIONS]
    F --> G{Response has tool_calls?}
    G -- Yes --> H[Execute tools in parallel]
    H --> I[Push assistant + tool messages]
    I --> J{Unknown tool ID?}
    J -- litellm only --> K[Push stub error message]
    J -- vllm/ollama --> L[No message pushed]
    K --> F
    L --> F
    G -- No --> M{stream?}
    M -- Yes --> N{Provider}
    N -- fireworks/vllm/litellm --> O[tool_choice: none]
    N -- openrouter --> P[tool_choice: auto]
    O --> Q[Final streaming response]
    P --> Q
    M -- No --> R{responseFormat?}
    R -- litellm/fireworks --> S[Deferred final call with response_format]
    R -- openrouter --> T[Fresh finalPayload without tools]
    S --> U[Return ProviderResponse]
    T --> U
    E --> U
    D --> V[Return StreamingExecution]
    Q --> V

Reviews (3): Last reviewed commit: "chore(providers/ollama): drop orphaned e..." | Re-trigger Greptile

Comment thread apps/sim/providers/litellm/index.ts
Comment thread apps/sim/providers/vllm/index.ts Outdated
Comment thread apps/sim/providers/vllm/index.ts Outdated
The deferred final call used tool_choice 'auto', so the model could emit
another tool_calls round instead of the structured answer, leaving content
stale. Use 'none' (matching vLLM/Fireworks) on both the streaming and
non-streaming final calls so the model must return the structured response.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review

Comment thread apps/sim/providers/litellm/index.ts Outdated
Ollama ignores tool_choice (not in its supported fields), so vLLM/Fireworks'
tool_choice:'none' guard is a no-op here. Omit tools from the final streaming
payload instead so the summarization turn can't emit dropped tool calls.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
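
A small sketch of the Ollama approach from the commit message above: rather than relying on tool_choice, the final streaming payload omits the tool fields entirely. The `OllamaPayload` type and the function name are assumptions for illustration.

```typescript
interface OllamaPayload {
  model: string
  messages: Array<{ role: string; content: string | null }>
  tools?: Array<{ type: 'function'; function: { name: string } }>
  tool_choice?: 'auto' | 'none' | 'required'
  stream?: boolean
}

// Builds the final streaming request without tools or tool_choice, so the
// summarization turn has no tools it could call.
function buildOllamaFinalStreamPayload(
  payload: OllamaPayload,
  messages: OllamaPayload['messages']
): OllamaPayload {
  const finalPayload: OllamaPayload = { ...payload, messages, stream: true }
  delete finalPayload.tools
  delete finalPayload.tool_choice
  return finalPayload
}
```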
…fort carries over

The non-streaming deferred finalPayload hand-picked fields and dropped
reasoning_effort (and any future payload field), diverging from the streaming
path which spreads ...payload. Spread payload here too for consistency.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
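
The difference between hand-picking fields and spreading the payload can be shown with a toy example. Both function names and the `RequestPayload` type are hypothetical; the point is only that a field such as reasoning_effort survives the spread but not the hand-picked copy.

```typescript
interface RequestPayload {
  model: string
  messages: Array<{ role: string; content: string }>
  reasoning_effort?: string
  response_format?: Record<string, unknown>
}

// Hand-picked copy: any field not listed here is silently dropped.
function handPickedFinalPayload(
  payload: RequestPayload,
  messages: RequestPayload['messages'],
  responseFormat: Record<string, unknown>
): RequestPayload {
  return { model: payload.model, messages, response_format: responseFormat }
}

// Spread copy: every existing (and future) payload field carries over.
function spreadFinalPayload(
  payload: RequestPayload,
  messages: RequestPayload['messages'],
  responseFormat: Record<string, unknown>
): RequestPayload {
  return { ...payload, messages, response_format: responseFormat }
}
```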
Keeps parity with sibling Chat Completions providers (cerebras/mistral/xai).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Restore the TSDoc blocks on supportsNativeStructuredOutputs,
createReadableStreamFromOpenAIStream, and checkForForcedToolUsage —
TSDoc is the codebase documentation standard and should not have been
stripped.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The block documented a function that now lives in trace-enrichment.ts, so it
documents nothing in this file.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review

@cursor cursor Bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 0be3ca2. Configure here.

@waleedlatif1 waleedlatif1 merged commit e533f1b into staging May 29, 2026
14 checks passed
@waleedlatif1 waleedlatif1 deleted the waleedlatif1/vllm-provider-audit branch May 29, 2026 21:22