realtime: honor output_modalities to skip TTS in text-only mode by localai-bot · Pull Request #9838 · mudler/LocalAI

localai-bot · 2026-05-15T10:20:46Z

Summary

The emulated realtime pipeline previously ignored the OpenAI Realtime spec field output_modalities — the field was declared on RealtimeSession and Response but never read, so the server always ran the TTS step and emitted response.output_audio.* events.

This PR gates the audio block in core/http/endpoints/openai/realtime.go on the resolved modalities. When a client requests ["text"] (session-level or per-response via response.create), the server emits response.output_text.delta + response.output_text.done with finalSpeech and skips TTS entirely.
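The resolution order and the gate described above can be sketched as follows. This is a minimal illustration, not the code from realtime.go: the `Modality` type, the slice-based helper signatures, and the `eventsFor` function are assumptions made for the example (the PR's helpers take the session and response objects), and the audio branch is reduced to two representative event names.

```go
package main

import "fmt"

// Modality is a stand-in for types.Modality; the real type lives in LocalAI.
type Modality string

const (
	ModalityText  Modality = "text"
	ModalityAudio Modality = "audio"
)

// resolveOutputModalities prefers the per-response override, then the
// session-level value, and falls back to audio when neither is set.
// (Hypothetical signature: slices instead of session/response structs.)
func resolveOutputModalities(session, response []Modality) []Modality {
	if len(response) > 0 {
		return response
	}
	if len(session) > 0 {
		return session
	}
	return []Modality{ModalityAudio}
}

// modalitiesContainAudio reports whether audio output was requested.
func modalitiesContainAudio(m []Modality) bool {
	for _, v := range m {
		if v == ModalityAudio {
			return true
		}
	}
	return false
}

// eventsFor is a hypothetical helper that names the event family a
// response would emit; the real audio branch runs TTS and emits more.
func eventsFor(session, response []Modality) []string {
	if modalitiesContainAudio(resolveOutputModalities(session, response)) {
		return []string{"response.output_audio.delta", "response.output_audio.done"}
	}
	return []string{"response.output_text.delta", "response.output_text.done"}
}

func main() {
	fmt.Println(eventsFor(nil, nil))                                            // default session
	fmt.Println(eventsFor([]Modality{ModalityText}, nil))                       // text-only session
	fmt.Println(eventsFor([]Modality{ModalityAudio}, []Modality{ModalityText})) // per-response override
}
```

Resolving at the gate, rather than once at session setup, is what lets a single response.create override an audio session without changing the session itself.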

This enables thin clients that want to use the realtime WebSocket for VAD + STT + LLM + tool-call parsing while running their own TTS pipeline (e.g., for client-side caching).

Changes

realtime.go: two new helpers resolveOutputModalities(session, response) and modalitiesContainAudio(m). The TTS / ResponseOutputAudio* block (lines ~1657-1755) is wrapped in an if modalitiesContainAudio(modalities) branch; the else branch emits the text events.
Plumbing: OutputModalities []types.Modality added on the local Session struct (mirrors MaxOutputTokens pattern), copied from SessionUpdate in updateSession, echoed back in the session.update server response, and resolved against overrides.OutputModalities from response.create.
realtime_modality_test.go: new Ginkgo spec, 6 cases covering default-to-audio, session-level text-only, response-level override, and modalitiesContainAudio truth table.
lint fix: the pre-existing defer os.Remove(audioFilePath) is rewritten as defer func() { _ = os.Remove(audioFilePath) }() to satisfy errcheck (the block is now inside the gated branch, so the linter re-examined it).
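The errcheck fix from the last item can be seen in isolation in the sketch below. The `writeTemp` and `fileExists` functions and the temp-file naming are hypothetical and exist only to make the example runnable; only the deferred closure matches the pattern described in the PR.

```go
package main

import (
	"fmt"
	"os"
)

// writeTemp writes data to a temporary file and removes the file before
// returning. It returns the path so a caller can confirm the cleanup ran.
func writeTemp(data []byte) (string, error) {
	f, err := os.CreateTemp("", "audio-*.wav")
	if err != nil {
		return "", err
	}
	path := f.Name()

	// A bare `defer os.Remove(path)` drops the returned error implicitly,
	// which errcheck reports. The closure makes the discard explicit.
	defer func() { _ = os.Remove(path) }()

	if _, err := f.Write(data); err != nil {
		_ = f.Close()
		return "", err
	}
	if err := f.Close(); err != nil {
		return "", err
	}
	return path, nil
}

// fileExists reports whether path can still be stat'ed.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	path, err := writeTemp([]byte("pcm"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("still on disk:", fileExists(path))
}
```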

Test plan

go test ./core/http/endpoints/openai/ — all 90 specs pass.
go vet clean.
Manual: connect a WS client with session.update {output_modalities: ["text"]}, send audio, confirm only response.output_text.* events arrive (no response.output_audio.*).
Manual: same with default ["audio"] — confirm existing audio-mode behavior is unchanged.
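For the manual checks, the sketch below builds the text-only session.update payload and scans a list of received event types for audio events. The JSON envelope (a `type` field plus a `session` object) follows the OpenAI Realtime event naming and is an assumption here; the struct names and the `audioEvents` helper are hypothetical, and a real client would send the payload over its own WebSocket connection.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// sessionConfig carries only the field this PR is concerned with.
type sessionConfig struct {
	OutputModalities []string `json:"output_modalities"`
}

// sessionUpdate is the assumed client event envelope.
type sessionUpdate struct {
	Type    string        `json:"type"`
	Session sessionConfig `json:"session"`
}

// textOnlySessionUpdate returns the JSON for a text-only session.update.
func textOnlySessionUpdate() (string, error) {
	msg := sessionUpdate{
		Type:    "session.update",
		Session: sessionConfig{OutputModalities: []string{"text"}},
	}
	b, err := json.Marshal(msg)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// audioEvents returns every received event type in the
// response.output_audio.* family; an empty result means the gate held.
func audioEvents(received []string) []string {
	var leaked []string
	for _, t := range received {
		if strings.HasPrefix(t, "response.output_audio.") {
			leaked = append(leaked, t)
		}
	}
	return leaked
}

func main() {
	payload, err := textOnlySessionUpdate()
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}
	fmt.Println(payload)

	// Event types as a client might log them during a text-only turn.
	received := []string{"response.output_text.delta", "response.output_text.done"}
	fmt.Println("audio events seen:", len(audioEvents(received)))
}
```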

Notes

Audio-mode behavior is preserved byte-for-byte (the gated block contents are unmodified).
Only the emulated pipeline is affected. Native any-to-any audio models (FLAG_REALTIME_AUDIO) use a different code path.
WebRTC and WebSocket transports both honor the gate.

The emulated realtime pipeline previously ignored the OpenAI Realtime spec field output_modalities and always synthesized TTS. Add resolveOutputModalities + modalitiesContainAudio helpers and gate the TTS / ResponseOutputAudio* emission so a client requesting ["text"] gets only ResponseOutputText* events. This lets thin clients (e.g. thing5-poc) cache TTS on the client side while still using the realtime WS for VAD + STT + LLM + tool-call parsing.

Assisted-by: Claude:claude-opus-4-7

Follow-up to the previous commit:
- Resolve response.create's output_modalities at the gate so a per-response override of an audio session is honored (the test asserted this contract but the production call site was passing nil).
- Mirror OutputModalities in the RealtimeSession echo so session.update round-trips the client-supplied value, matching MaxOutputTokens's pattern.

Assisted-by: Claude:claude-opus-4-7

CI's errcheck flagged the pre-existing `defer os.Remove(audioFilePath)` inside the audio-emission block (now wrapped by the modality gate). Wrap the call in a closure that explicitly discards the error: the canonical Go pattern for "I want to defer a cleanup whose error I genuinely don't care about."

Assisted-by: Claude:claude-opus-4-7 golangci-lint

richiejp · 2026-05-15T10:35:00Z

Looks good!

mudler force-pushed the feat/realtime-honor-output-modalities branch from e5dd7b4 to 8db5a54 Compare May 15, 2026 10:24

mudler added the bug Something isn't working label May 15, 2026

mudler added 3 commits May 15, 2026 10:31

mudler force-pushed the feat/realtime-honor-output-modalities branch from f3ef553 to 49027ee Compare May 15, 2026 10:32

mudler merged commit a39591f into master May 15, 2026
57 checks passed

mudler deleted the feat/realtime-honor-output-modalities branch May 15, 2026 10:39

BrewTestBot mentioned this pull request May 16, 2026

localai 4.2.5 Homebrew/homebrew-core#283184

Merged
