fix(streaming): comply with OpenAI usage / stream_options spec #9815
Merged
Conversation
LocalAI emitted `"usage":{"prompt_tokens":0,...}` on every streamed
chunk because `OpenAIResponse.Usage` was a value type without
`omitempty`. The official OpenAI Node SDK and its consumers
(continuedev/continue, Kilo Code, Roo Code, Zed, IntelliJ Continue)
filter on a truthy `result.usage` to detect the trailing usage chunk;
LocalAI's zero-but-non-null usage on every intermediate chunk made
that filter swallow every content chunk and surface an empty chat
response while the server log looked successful.
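The root cause in miniature: a minimal sketch of Go's marshalling behavior, with the schema types trimmed down to illustrative stand-ins rather than LocalAI's exact definitions. A zero-valued struct field always serializes; only a nil pointer with `omitempty` drops the key.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Usage is a trimmed-down stand-in for the real OpenAIUsage type.
type Usage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
	TotalTokens      int `json:"total_tokens"`
}

// chunkBefore mirrors the old shape: a value-type field serializes even
// when it is all zeros, so every chunk carried a zeroed "usage" object.
type chunkBefore struct {
	ID    string `json:"id"`
	Usage Usage  `json:"usage"`
}

// chunkAfter mirrors the fix: a nil pointer plus omitempty drops the key.
type chunkAfter struct {
	ID    string `json:"id"`
	Usage *Usage `json:"usage,omitempty"`
}

func main() {
	before, _ := json.Marshal(chunkBefore{ID: "chatcmpl-1"})
	fmt.Println(string(before)) // {"id":"chatcmpl-1","usage":{"prompt_tokens":0,...}}

	after, _ := json.Marshal(chunkAfter{ID: "chatcmpl-1"})
	fmt.Println(string(after)) // {"id":"chatcmpl-1"} — no usage key at all
}
```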
Changes:
- `core/schema/openai.go`: make the field `Usage *OpenAIUsage` with a
  `json:"usage,omitempty"` tag so intermediate chunks no longer carry a
  `usage` key. Add `OpenAIRequest.StreamOptions` with `include_usage` to
  mirror OpenAI's request field.
- `core/http/endpoints/openai/chat.go` and `completion.go`: keep using
the `Usage` struct field as an in-process channel for the running
cumulative, but strip it before JSON marshalling. When the request
set `stream_options.include_usage: true`, emit a dedicated trailing
  chunk with `"choices": []` and the populated usage, matching the
  OpenAI spec and llama.cpp's server behavior (a sketch of the trailer
  shape follows this list).
- `chat_emit.go`: new `streamUsageTrailerJSON` helper; drop the
`usage` parameter from `buildNoActionFinalChunks` since chunks no
longer carry usage.
- Update `image.go`, `inpainting.go`, `edit.go` to wrap their Usage
values with `&` for the new pointer field.
- UI: send `stream_options:{include_usage:true}` from the React
(`useChat.js`) and legacy (`static/chat.js`) chat clients so the
token-count badge keeps populating now that the server is
spec-compliant.
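A hedged sketch of what the trailer chunk could look like when built; `usageTrailer` here is a hypothetical stand-in, not the PR's actual `streamUsageTrailerJSON` helper, and the field layout is assumed from the OpenAI chunk format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Usage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
	TotalTokens      int `json:"total_tokens"`
}

// usageTrailer is a hypothetical stand-in for streamUsageTrailerJSON:
// a final chunk whose choices array is empty and whose usage holds the
// cumulative totals, as the OpenAI spec describes.
func usageTrailer(id, model string, created int64, u Usage) ([]byte, error) {
	return json.Marshal(map[string]any{
		"id":      id,
		"object":  "chat.completion.chunk",
		"created": created,
		"model":   model,
		"choices": []any{}, // the trailer carries usage only, never content
		"usage":   u,
	})
}

func main() {
	b, _ := usageTrailer("chatcmpl-1", "some-model", 1700000000, Usage{
		PromptTokens: 12, CompletionTokens: 34, TotalTokens: 46,
	})
	fmt.Printf("data: %s\n\n", b) // how the trailer would ride the SSE stream
}
```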
Tests:
- New `chat_stream_usage_test.go` pins the spec invariants:
  intermediate chunks have no `usage` key, the trailer JSON has
  `"choices":[]` and a populated `usage`, and `OpenAIRequest` parses
  `stream_options.include_usage` (a distilled version of the
  key-presence check follows this list).
- Update `chat_emit_test.go` to reflect that finals no longer embed
usage.
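A distilled, non-Ginkgo rendering of the key-presence invariant the new test pins; the real suite is Ginkgo, and the names here are illustrative only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hasUsageKey reports whether a chunk's JSON carries a "usage" key at all.
// The invariant: intermediate chunks must not, the trailer must.
func hasUsageKey(chunkJSON []byte) bool {
	var m map[string]json.RawMessage
	if err := json.Unmarshal(chunkJSON, &m); err != nil {
		return false
	}
	_, ok := m["usage"]
	return ok
}

func main() {
	intermediate := []byte(`{"id":"1","choices":[{"delta":{"content":"hi"}}]}`)
	trailer := []byte(`{"id":"1","choices":[],"usage":{"prompt_tokens":1,"completion_tokens":2,"total_tokens":3}}`)
	fmt.Println(hasUsageKey(intermediate)) // false: content chunks stay usage-free
	fmt.Println(hasUsageKey(trailer))      // true: only the trailer carries usage
}
```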
Verified against a live LocalAI instance: before the fix, Continue's
filter logic swallowed 16/16 token chunks; with the new shape it yields
4 of 5 chunks, the fifth being the dedicated trailer chunk that carries
the usage totals.
Fixes #8546
Assisted-by: Claude:opus-4.7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
The new spec-compliant `stream_options.include_usage` trailer writes were
flagged by errcheck since they are new code (golangci-lint runs
new-from-merge-base against master); the surrounding `fmt.Fprintf` `data:`
writes are grandfathered. Drop the return values explicitly to match the
linter's contract without adding a nolint shim.

Assisted-by: Claude:opus-4.7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
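For illustration, the explicit-drop pattern described above in a self-contained sketch; the actual call sites and writer in chat.go may differ.

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	trailerJSON := []byte(`{"choices":[],"usage":{"total_tokens":3}}`)
	// Discarding the (int, error) return values explicitly tells errcheck
	// the omission is deliberate; no nolint directive required.
	_, _ = fmt.Fprintf(os.Stdout, "data: %s\n\n", trailerJSON)
}
```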
Summary
Closes #8546 and continuedev/continue#10113.
LocalAI emitted `"usage":{"prompt_tokens":0,...}` on every streamed chunk because `OpenAIResponse.Usage` was a value type without `omitempty`. Consumers of the official OpenAI Node SDK (continuedev/continue, Kilo Code, Roo Code, Zed, IntelliJ Continue) filter on a truthy `result.usage` to detect the trailing usage chunk, so the zero-but-non-null usage on every intermediate chunk made them swallow every content chunk and surface an empty chat response while the server log looked successful. I confirmed the bug against a live LocalAI instance by replaying Continue's filter logic: 16/16 content chunks were swallowed.
Changes
- `core/schema/openai.go`: `Usage *OpenAIUsage` with `omitempty` so intermediate chunks no longer carry a `usage` key. Add `OpenAIRequest.StreamOptions` mirroring OpenAI's request field with `include_usage`.
- `core/http/endpoints/openai/chat.go`, `completion.go`: keep `Usage` on the struct as an in-process channel for the running cumulative, but strip it before JSON marshal. When the request sets `stream_options.include_usage: true`, emit a dedicated trailing chunk with `"choices": []` and the populated usage, matching the OpenAI spec and llama.cpp's server.
- `chat_emit.go`: new `streamUsageTrailerJSON` helper; drop the `usage` parameter from `buildNoActionFinalChunks` since chunks no longer carry usage.
- `image.go`, `inpainting.go`, `edit.go`: wrap `Usage` values with `&` for the new pointer field.
- UI: send `stream_options:{include_usage:true}` from the React (`core/http/react-ui/src/hooks/useChat.js`) and legacy (`core/http/static/chat.js`) chat clients so the token-count badge keeps populating now that the server is spec-compliant.
How LocalAI compares now
- `usage` on every chunk (zeroed `{0,0,0}`): gone; intermediate chunks carry no `usage` key.
- `stream_options.include_usage`: now parsed and honored, per the OpenAI spec.
- `choices:[]` trailer chunk: emitted with the cumulative usage, matching llama.cpp's server.
Tests
- `chat_stream_usage_test.go` (Ginkgo) pins the spec invariants: intermediate chunks have no `usage` key, the trailer JSON has `"choices":[]` and a populated `usage`, and `OpenAIRequest` parses `stream_options.include_usage`.
- `chat_emit_test.go` updated to reflect that finals no longer embed usage.
- Continue's filter logic, replayed against the new shape: 4/5 chunks yielded, 1 trailer carrying the usage totals, exactly what the OpenAI SDK expects.
Test plan
`go test ./core/schema/ ./core/http/endpoints/openai/ ./core/http/middleware/` (passing locally)

🤖 Generated with Claude Code