
fix(streaming): comply with OpenAI usage / stream_options spec #9815

Merged
mudler merged 2 commits into master from fix/streaming-usage-spec-8546 on May 14, 2026

Conversation


@localai-bot localai-bot commented May 13, 2026

Summary

Closes #8546 and continuedev/continue#10113.

LocalAI emitted "usage":{"prompt_tokens":0,...} on every streamed chunk because OpenAIResponse.Usage was a value type without omitempty. Consumers of the official OpenAI Node SDK (continuedev/continue, Kilo Code, Roo Code, Zed, IntelliJ Continue) filter on a truthy result.usage to detect the trailing usage chunk — so the zero-but-non-null usage on every intermediate chunk made them swallow every content chunk and surface an empty chat response while the server log looked successful.

I confirmed the bug against a live LocalAI instance by replaying Continue's filter logic: 16/16 content chunks were swallowed.

Changes

  • core/schema/openai.go — make Usage a *OpenAIUsage with omitempty so intermediate chunks no longer carry a usage key. Add OpenAIRequest.StreamOptions mirroring OpenAI's request field with include_usage.
  • core/http/endpoints/openai/chat.go, completion.go — keep Usage on the struct as an in-process channel for the running cumulative, but strip it before JSON marshal. When the request set stream_options.include_usage: true, emit a dedicated trailing chunk with "choices": [] and the populated usage — matching the OpenAI spec and llama.cpp's server.
  • chat_emit.go — new streamUsageTrailerJSON helper; drop the usage parameter from buildNoActionFinalChunks since chunks no longer carry usage.
  • image.go, inpainting.go, edit.go — wrap Usage values with & for the new pointer field.
  • UI — send stream_options:{include_usage:true} from the React (core/http/react-ui/src/hooks/useChat.js) and legacy (core/http/static/chat.js) chat clients so the token-count badge keeps populating now that the server is spec-compliant.

How LocalAI compares now

| | Before | After |
|---|---|---|
| `usage` on every chunk | yes (`{0,0,0}`) | no |
| Honors `stream_options.include_usage` | no | yes |
| Final `choices:[]` trailer chunk | no | yes (when opted in) |
| Continue / OpenAI-SDK consumers | drop all content | work |
| llama.cpp server parity | no | yes |
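
The opted-in trailer row corresponds to a write along these lines. This is a sketch: the helper name `streamUsageTrailerJSON` comes from the PR, but its body and the SSE framing shown here are assumptions, not the actual implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

type OpenAIUsage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
	TotalTokens      int `json:"total_tokens"`
}

// streamUsageTrailerJSON builds the spec-shaped trailing chunk:
// empty choices array plus the populated cumulative usage.
// (Hypothetical body; the real helper lives in chat_emit.go.)
func streamUsageTrailerJSON(u OpenAIUsage) ([]byte, error) {
	trailer := struct {
		Choices []struct{}   `json:"choices"`
		Usage   *OpenAIUsage `json:"usage,omitempty"`
	}{Choices: []struct{}{}, Usage: &u}
	return json.Marshal(trailer)
}

func main() {
	b, err := streamUsageTrailerJSON(OpenAIUsage{PromptTokens: 12, CompletionTokens: 4, TotalTokens: 16})
	if err != nil {
		panic(err)
	}
	// SSE framing; the return values are dropped explicitly, matching the
	// errcheck note in the second commit below.
	_, _ = fmt.Fprintf(os.Stdout, "data: %s\n\n", b)
}
```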

Tests

  • New chat_stream_usage_test.go (Ginkgo) pins the spec invariants:
    • intermediate chunks have no usage key
    • the trailer JSON has "choices":[] and a populated usage
    • OpenAIRequest parses stream_options.include_usage
  • chat_emit_test.go updated to reflect that finals no longer embed usage.

Continue's filter logic, replayed against the new shape: 4/5 chunks yielded, 1 trailer carrying the usage totals — exactly what the OpenAI SDK expects.
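
That replay can be mimicked with a small sketch — a loose Go rendering of the SDK consumers' truthy-`usage` filter described above, not Continue's actual TypeScript:

```go
package main

import "fmt"

type Usage struct{ PromptTokens, CompletionTokens, TotalTokens int }

type Chunk struct {
	Content string
	Usage   *Usage
}

// filterChunks mimics the consumer behavior: any chunk carrying a non-nil
// usage is treated as the usage trailer, and its content is never yielded.
func filterChunks(chunks []Chunk) (yielded []string, usage *Usage) {
	for _, c := range chunks {
		if c.Usage != nil {
			usage = c.Usage
			continue // swallowed: treated as the usage trailer
		}
		yielded = append(yielded, c.Content)
	}
	return
}

func main() {
	// Old shape: zero-valued usage on every chunk, so everything is swallowed.
	old := []Chunk{{Content: "Hel", Usage: &Usage{}}, {Content: "lo", Usage: &Usage{}}}
	content, _ := filterChunks(old)
	fmt.Println("old shape yields:", len(content), "chunks") // 0

	// New shape: usage only on the dedicated trailer, so content flows through.
	newShape := []Chunk{{Content: "Hel"}, {Content: "lo"}, {Usage: &Usage{TotalTokens: 16}}}
	content, usage := filterChunks(newShape)
	fmt.Println("new shape yields:", len(content), "chunks; total tokens:", usage.TotalTokens)
}
```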

Test plan

  • CI: go test ./core/schema/ ./core/http/endpoints/openai/ ./core/http/middleware/ (passing locally)
  • Manual: stream a chat completion from VS Code + Continue against a build of this branch; UI should show streamed content (was empty before)
  • Manual: confirm React UI token-count badge still populates after the change

🤖 Generated with Claude Code

mudler added 2 commits May 13, 2026 23:12
LocalAI emitted `"usage":{"prompt_tokens":0,...}` on every streamed
chunk because `OpenAIResponse.Usage` was a value type without
`omitempty`. The official OpenAI Node SDK and its consumers
(continuedev/continue, Kilo Code, Roo Code, Zed, IntelliJ Continue)
filter on a truthy `result.usage` to detect the trailing usage chunk;
LocalAI's zero-but-non-null usage on every intermediate chunk made
that filter swallow every content chunk and surface an empty chat
response while the server log looked successful.

Changes:

- `core/schema/openai.go`: `Usage *OpenAIUsage \`json:"usage,omitempty"\``
  so intermediate chunks no longer carry a `usage` key. Add
  `OpenAIRequest.StreamOptions` with `include_usage` to mirror OpenAI's
  request field.
- `core/http/endpoints/openai/chat.go` and `completion.go`: keep using
  the `Usage` struct field as an in-process channel for the running
  cumulative, but strip it before JSON marshalling. When the request
  set `stream_options.include_usage: true`, emit a dedicated trailing
  chunk with `"choices": []` and the populated usage (matching the
  OpenAI spec and llama.cpp's server behavior).
- `chat_emit.go`: new `streamUsageTrailerJSON` helper; drop the
  `usage` parameter from `buildNoActionFinalChunks` since chunks no
  longer carry usage.
- Update `image.go`, `inpainting.go`, `edit.go` to wrap their Usage
  values with `&` for the new pointer field.
- UI: send `stream_options:{include_usage:true}` from the React
  (`useChat.js`) and legacy (`static/chat.js`) chat clients so the
  token-count badge keeps populating now that the server is
  spec-compliant.

Tests:

- New `chat_stream_usage_test.go` pins the spec invariants:
  intermediate chunks have no `usage` key, the trailer JSON has
  `"choices":[]` and a populated `usage`, and `OpenAIRequest` parses
  `stream_options.include_usage`.
- Update `chat_emit_test.go` to reflect that finals no longer embed
  usage.

Verified against the live LocalAI instance: before the fix Continue's
filter logic swallowed 16/16 token chunks; with the new shape it
yields 4/5 and routes usage through the dedicated trailer chunk.

Fixes #8546

Assisted-by: Claude:opus-4.7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
The new spec-compliant `stream_options.include_usage` trailer writes
were flagged by errcheck since they're new code (golangci-lint runs
new-from-merge-base on master); the surrounding `fmt.Fprintf` data:
writes are grandfathered. Drop the return values explicitly to match
the linter's contract without adding a nolint shim.

Assisted-by: Claude:opus-4.7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
@mudler mudler merged commit 8af963b into master May 14, 2026
57 checks passed
@mudler mudler deleted the fix/streaming-usage-spec-8546 branch May 14, 2026 06:53


Development

Successfully merging this pull request may close these issues.

OpenAI API implementation is broken - shows up only for agentic coding
