Copilot AI commented Nov 7, 2025

Fix SSE streaming format to comply with specification

  • Understand the current streaming implementation and identify issues
  • Fix SSE format in chat.go to explicitly use proper formatting
  • Fix SSE format in completion.go to explicitly use proper formatting
  • Ensure consistency across all streaming responses
  • Verify changes with format validation tests
  • Run code quality checks (go vet, gofmt)
  • Address code review feedback for error handling
  • Fix finish_reason to use pointer for proper null handling
  • Remove empty content from initial streaming chunks
  • Ensure proper stream termination with finish_reason
  • Improve code consistency
  • Move common finish reasons to constants

Complete Solution

This PR fixes the streaming response issues that were causing OpenAI API client failures:

SSE Format Fixes ✅

  1. Explicit SSE formatting: `data: {json}\n\n` with the proper `%s` format specifier
  2. Error message streaming: errors are now sent as proper JSON payloads in correct SSE format
  3. Robust error handling: explicit `json.Marshal` error checking with a plain-text fallback event

OpenAI Compliance Fixes ✅

  1. Empty finish_reason: changed to `*string` with `omitempty`

    • Streaming chunks: field omitted (null)
    • Final chunk: `"finish_reason": "stop"`
  2. Empty initial chunks: the initial message now sends only the role

    • Before: `{"delta": {"role": "assistant", "content": ""}}`
    • After: `{"delta": {"role": "assistant"}}`
  3. Proper stream termination: the final chunk has an empty delta with finish_reason

    • Before: `{"delta": {"content": ""}, "finish_reason": "stop"}`
    • After: `{"delta": {}, "finish_reason": "stop"}` + `data: [DONE]`

Code Quality Improvements ✅

  1. Constants for finish reasons: Centralized constants prevent typos and improve maintainability
    • FinishReasonStop = "stop"
    • FinishReasonToolCalls = "tool_calls"
    • FinishReasonFunctionCall = "function_call"
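The constants above, together with the "declare variable then take address" pattern the review asked for, could look like this (the `finishReasonPtr` helper is illustrative, not necessarily what the PR ships):

```go
package main

import "fmt"

// Finish reason constants, named as in the PR description.
const (
	FinishReasonStop         = "stop"
	FinishReasonToolCalls    = "tool_calls"
	FinishReasonFunctionCall = "function_call"
)

// finishReasonPtr returns a *string for a finish reason, since Go does
// not allow taking the address of an untyped constant directly.
func finishReasonPtr(reason string) *string {
	r := reason
	return &r
}

func main() {
	fmt.Println(*finishReasonPtr(FinishReasonStop)) // stop
}
```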

Streaming Flow (OpenAI Compliant)

```text
// 1. Initial chunk - only role
data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}

// 2. Content chunks - no finish_reason field
data: {"choices": [{"index": 0, "delta": {"content": "Hello"}}]}
data: {"choices": [{"index": 0, "delta": {"content": " world"}}]}

// 3. Final chunk - empty delta with finish_reason
data: {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}
data: [DONE]
```

Files Modified

  • core/http/endpoints/openai/constants.go: New file with finish reason constants
  • core/http/endpoints/openai/chat.go: Streaming fixes + use constants
  • core/http/endpoints/openai/completion.go: Streaming fixes + use constants
  • core/http/endpoints/openai/realtime.go: Consistency updates + use constants
  • core/schema/openai.go: FinishReason schema change

Validation

✅ Passes go vet and gofmt
✅ No security vulnerabilities (CodeQL)
✅ OpenAI API specification compliant
✅ Backward compatible for non-streaming responses

Fixes #7180

Original prompt

This section details the original issue you should resolve

<issue_title>Address streaming responses incongruences</issue_title>
<issue_description>LocalAI version:
3.7.0

Environment, CPU architecture, OS, and Version:

Describe the bug
It seems that the way streaming is implemented in LocalAI breaks some clients, such as sst/opencode#3994; see also #1194

To Reproduce
N/A yet

Expected behavior
Both opencode and other software that consumes streaming tokens should work

Logs

Additional context

Comments on the Issue (you are @copilot in this section)



@netlify

netlify bot commented Nov 7, 2025

Deploy Preview for localai ready!

| Name | Link |
|------|------|
| 🔨 Latest commit | 9c0c1ae |
| 🔍 Latest deploy log | https://app.netlify.com/projects/localai/deploys/690e5fdc1892a8000879ceab |
| 😎 Deploy Preview | https://deploy-preview-7182--localai.netlify.app |
To edit notification comments on pull requests, go to your Netlify project configuration.

Copilot AI and others added 3 commits November 7, 2025 19:14
- Replace json.Encoder with json.Marshal for explicit formatting
- Use explicit \n\n for all SSE messages (instead of relying on implicit newlines)
- Change %v to %s format specifier for proper string formatting
- Fix error message streaming to include proper SSE format
- Ensure consistency between chat.go and completion.go endpoints

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
- Handle json.Marshal errors explicitly in error response paths
- Add fallback simple error message if marshal fails
- Prevents sending 'data: <nil>' on marshal failures
- Addresses code review feedback

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Copilot AI changed the title [WIP] Address incongruences in streaming responses Fix SSE streaming format to comply with specification Nov 7, 2025
Copilot AI requested a review from mudler November 7, 2025 19:24
Copilot finished work on behalf of mudler November 7, 2025 19:24
@mudler
Copy link
Owner

mudler commented Nov 7, 2025

@copilot try to address additionally also these points:

  • Empty finish_reason: LocalAI sends "finish_reason": "" instead of "stop" or null
  • Empty initial chunks: Sends empty content chunks that cause streaming clients to terminate early
  • Missing proper stream termination: Doesn't properly signal completion to OpenCode's processor

- Change FinishReason from string to *string in Choice schema
- Streaming chunks now omit finish_reason (null) instead of empty string
- Final chunks properly set finish_reason to "stop", "tool_calls", etc.
- Remove empty content from initial streaming chunks (only send role)
- Final streaming chunk sends empty delta with finish_reason
- Addresses OpenAI API compliance issues causing client failures

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Copy link
Author

Copilot AI commented Nov 7, 2025

@copilot try to address additionally also these points:

  • Empty finish_reason: LocalAI sends "finish_reason": "" instead of "stop" or null
    ...

All three issues have been addressed in commit deefcb0:

  1. Empty finish_reason: changed the schema to use `*string` with `omitempty` - streaming chunks now omit the field (null) while final chunks properly set it to `"stop"`

  2. Empty initial chunks: the initial message now sends only the role, without a content field - removed `Content: &textContentToReturn`, which was empty

  3. Missing proper stream termination: the final chunk now sends an empty delta `{}` with finish_reason, followed by `data: [DONE]` per the OpenAI spec

The streaming format now matches OpenAI's specification:

  • Initial: `{"delta": {"role": "assistant"}}`
  • Chunks: `{"delta": {"content": "text"}}` (no finish_reason)
  • Final: `{"delta": {}, "finish_reason": "stop"}` + `data: [DONE]`

- Use consistent pattern: declare variable then take address
- Remove inline anonymous function for better readability
- Addresses code review feedback

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
```go
}
log.Error().Msgf("Stream ended with error: %v", err)

stopReason := "stop"
```
mudler (Owner) commented on these lines:

@copilot move common finish reasons to constants

Copilot AI (Author) replied:

Done in commit 9c0c1ae. Created constants.go with:

  • FinishReasonStop = "stop"
  • FinishReasonToolCalls = "tool_calls"
  • FinishReasonFunctionCall = "function_call"

All string literals replaced with these constants across chat.go, completion.go, and realtime.go.

Copilot AI requested a review from mudler November 7, 2025 20:48
Copilot finished work on behalf of mudler November 7, 2025 20:48
- Create constants.go with FinishReasonStop, FinishReasonToolCalls, FinishReasonFunctionCall
- Replace all string literals with constants in chat.go, completion.go, realtime.go
- Improves code maintainability and prevents typos

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Copilot finished work on behalf of mudler November 7, 2025 21:15
Development

Successfully merging this pull request may close these issues.

Address streaming responses incongruences
