fix(streaming): comply with OpenAI usage / stream_options spec #9815
Merged
Conversation
LocalAI emitted `"usage":{"prompt_tokens":0,...}` on every streamed
chunk because `OpenAIResponse.Usage` was a value type without
`omitempty`. The official OpenAI Node SDK and its consumers
(continuedev/continue, Kilo Code, Roo Code, Zed, IntelliJ Continue)
filter on a truthy `result.usage` to detect the trailing usage chunk;
LocalAI's zero-but-non-null usage on every intermediate chunk made
that filter swallow every content chunk and surface an empty chat
response while the server log looked successful.
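The root cause in miniature: a minimal sketch of Go's marshalling behavior, with the schema types trimmed down to illustrative stand-ins rather than LocalAI's exact definitions. A zero-valued struct field always serializes; only a nil pointer with `omitempty` drops the key.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Usage is a trimmed-down stand-in for the real OpenAIUsage type.
type Usage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
	TotalTokens      int `json:"total_tokens"`
}

// chunkBefore mirrors the old shape: a value-type field serializes even
// when it is all zeros, so every chunk carried a zeroed "usage" object.
type chunkBefore struct {
	ID    string `json:"id"`
	Usage Usage  `json:"usage"`
}

// chunkAfter mirrors the fix: a nil pointer plus omitempty drops the key.
type chunkAfter struct {
	ID    string `json:"id"`
	Usage *Usage `json:"usage,omitempty"`
}

func main() {
	before, _ := json.Marshal(chunkBefore{ID: "chatcmpl-1"})
	fmt.Println(string(before)) // {"id":"chatcmpl-1","usage":{"prompt_tokens":0,...}}

	after, _ := json.Marshal(chunkAfter{ID: "chatcmpl-1"})
	fmt.Println(string(after)) // {"id":"chatcmpl-1"} — no usage key at all
}
```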
Changes:
- `core/schema/openai.go`: make the field `Usage *OpenAIUsage` with a
  `json:"usage,omitempty"` tag so intermediate chunks no longer carry a
  `usage` key. Add `OpenAIRequest.StreamOptions` with `include_usage` to
  mirror OpenAI's request field.
- `core/http/endpoints/openai/chat.go` and `completion.go`: keep using
the `Usage` struct field as an in-process channel for the running
cumulative, but strip it before JSON marshalling. When the request
set `stream_options.include_usage: true`, emit a dedicated trailing
  chunk with `"choices": []` and the populated usage, matching the
  OpenAI spec and llama.cpp's server behavior (a sketch of the trailer
  shape follows this list).
- `chat_emit.go`: new `streamUsageTrailerJSON` helper; drop the
`usage` parameter from `buildNoActionFinalChunks` since chunks no
longer carry usage.
- Update `image.go`, `inpainting.go`, `edit.go` to wrap their Usage
values with `&` for the new pointer field.
- UI: send `stream_options:{include_usage:true}` from the React
(`useChat.js`) and legacy (`static/chat.js`) chat clients so the
token-count badge keeps populating now that the server is
spec-compliant.
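A hedged sketch of what the trailer chunk could look like when built; `usageTrailer` here is a hypothetical stand-in, not the PR's actual `streamUsageTrailerJSON` helper, and the field layout is assumed from the OpenAI chunk format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Usage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
	TotalTokens      int `json:"total_tokens"`
}

// usageTrailer is a hypothetical stand-in for streamUsageTrailerJSON:
// a final chunk whose choices array is empty and whose usage holds the
// cumulative totals, as the OpenAI spec describes.
func usageTrailer(id, model string, created int64, u Usage) ([]byte, error) {
	return json.Marshal(map[string]any{
		"id":      id,
		"object":  "chat.completion.chunk",
		"created": created,
		"model":   model,
		"choices": []any{}, // the trailer carries usage only, never content
		"usage":   u,
	})
}

func main() {
	b, _ := usageTrailer("chatcmpl-1", "some-model", 1700000000, Usage{
		PromptTokens: 12, CompletionTokens: 34, TotalTokens: 46,
	})
	fmt.Printf("data: %s\n\n", b) // how the trailer would ride the SSE stream
}
```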
Tests:
- New `chat_stream_usage_test.go` pins the spec invariants:
  intermediate chunks have no `usage` key, the trailer JSON has
  `"choices":[]` and a populated `usage`, and `OpenAIRequest` parses
  `stream_options.include_usage` (a distilled version of the
  key-presence check follows this list).
- Update `chat_emit_test.go` to reflect that finals no longer embed
usage.
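A distilled, non-Ginkgo rendering of the key-presence invariant the new test pins; the real suite is Ginkgo, and the names here are illustrative only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hasUsageKey reports whether a chunk's JSON carries a "usage" key at all.
// The invariant: intermediate chunks must not, the trailer must.
func hasUsageKey(chunkJSON []byte) bool {
	var m map[string]json.RawMessage
	if err := json.Unmarshal(chunkJSON, &m); err != nil {
		return false
	}
	_, ok := m["usage"]
	return ok
}

func main() {
	intermediate := []byte(`{"id":"1","choices":[{"delta":{"content":"hi"}}]}`)
	trailer := []byte(`{"id":"1","choices":[],"usage":{"prompt_tokens":1,"completion_tokens":2,"total_tokens":3}}`)
	fmt.Println(hasUsageKey(intermediate)) // false: content chunks stay usage-free
	fmt.Println(hasUsageKey(trailer))      // true: only the trailer carries usage
}
```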
Verified against a live LocalAI instance: before the fix, Continue's
filter logic swallowed 16/16 token chunks; with the new shape it yields
4 of 5 chunks, the fifth being the dedicated trailer chunk that carries
the usage totals.
Fixes #8546
Assisted-by: Claude:opus-4.7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
The new spec-compliant `stream_options.include_usage` trailer writes were
flagged by errcheck since they are new code (golangci-lint runs
new-from-merge-base against master); the surrounding `fmt.Fprintf` `data:`
writes are grandfathered. Drop the return values explicitly to match the
linter's contract without adding a nolint shim.

Assisted-by: Claude:opus-4.7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
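For illustration, the explicit-drop pattern described above in a self-contained sketch; the actual call sites and writer in chat.go may differ.

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	trailerJSON := []byte(`{"choices":[],"usage":{"total_tokens":3}}`)
	// Discarding the (int, error) return values explicitly tells errcheck
	// the omission is deliberate; no nolint directive required.
	_, _ = fmt.Fprintf(os.Stdout, "data: %s\n\n", trailerJSON)
}
```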
Summary
Closes #8546 and continuedev/continue#10113.
LocalAI emitted `"usage":{"prompt_tokens":0,...}` on every streamed chunk because `OpenAIResponse.Usage` was a value type without `omitempty`. Consumers of the official OpenAI Node SDK (continuedev/continue, Kilo Code, Roo Code, Zed, IntelliJ Continue) filter on a truthy `result.usage` to detect the trailing usage chunk, so the zero-but-non-null usage on every intermediate chunk made them swallow every content chunk and surface an empty chat response while the server log looked successful. I confirmed the bug against a live LocalAI instance by replaying Continue's filter logic: 16/16 content chunks were swallowed.
Changes
- `core/schema/openai.go`: `Usage *OpenAIUsage` with `omitempty` so intermediate chunks no longer carry a `usage` key. Add `OpenAIRequest.StreamOptions` mirroring OpenAI's request field with `include_usage`.
- `core/http/endpoints/openai/chat.go`, `completion.go`: keep `Usage` on the struct as an in-process channel for the running cumulative, but strip it before JSON marshal. When the request sets `stream_options.include_usage: true`, emit a dedicated trailing chunk with `"choices": []` and the populated usage, matching the OpenAI spec and llama.cpp's server.
- `chat_emit.go`: new `streamUsageTrailerJSON` helper; drop the `usage` parameter from `buildNoActionFinalChunks` since chunks no longer carry usage.
- `image.go`, `inpainting.go`, `edit.go`: wrap `Usage` values with `&` for the new pointer field.
- UI: send `stream_options:{include_usage:true}` from the React (`core/http/react-ui/src/hooks/useChat.js`) and legacy (`core/http/static/chat.js`) chat clients so the token-count badge keeps populating now that the server is spec-compliant.
How LocalAI compares now
- `usage` on every chunk (zeroed `{0,0,0}`): gone; intermediate chunks carry no `usage` key.
- `stream_options.include_usage`: now parsed and honored, per the OpenAI spec.
- `choices:[]` trailer chunk: emitted with the cumulative usage, matching llama.cpp's server.
Tests
- `chat_stream_usage_test.go` (Ginkgo) pins the spec invariants: intermediate chunks have no `usage` key, the trailer JSON has `"choices":[]` and a populated `usage`, and `OpenAIRequest` parses `stream_options.include_usage`.
- `chat_emit_test.go` updated to reflect that finals no longer embed usage.
- Continue's filter logic, replayed against the new shape: 4/5 chunks yielded, 1 trailer carrying the usage totals, exactly what the OpenAI SDK expects.
Test plan
`go test ./core/schema/ ./core/http/endpoints/openai/ ./core/http/middleware/` (passing locally)

🤖 Generated with Claude Code