feat(api/transcription): include segments + duration + language on stream done event by localai-bot · Pull Request #9709 · mudler/LocalAI

localai-bot · 2026-05-07T15:08:10Z

Summary

Extends the transcript.text.done SSE payload emitted by streamTranscription to additively carry language, duration, and a segments array so streaming clients can build the same TranscriptionResultSeconds shape they get from the non-streaming JSON path.

Why

Streaming clients (e.g. notetaker / notary) want streaming semantics for one practical reason: on the JSON path, ResponseHeaderTimeout trips when whisper requests queue behind each other on a SingleThread backend. streamTranscription avoids that by flushing the 200 status and headers immediately, but until now the done event carried only text, so clients that needed per-utterance timings or audio duration had to fall back to the JSON path and accept the queue-induced timeouts again.
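The timeout behaviour described above can be reproduced in isolation. The following is a minimal sketch, not LocalAI's handler: `slowStream`, `fetch`, and `run` are hypothetical names, and the sleep stands in for a request waiting on a busy backend. It assumes only the standard library's documented behaviour that `http.Transport.ResponseHeaderTimeout` bounds the wait for response headers, not the time spent reading the body.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// slowStream is a hypothetical handler: it sends the status line and headers
// right away, then makes the client wait for the actual result.
func slowStream(delay time.Duration) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/event-stream")
		w.WriteHeader(http.StatusOK)
		if f, ok := w.(http.Flusher); ok {
			f.Flush() // headers reach the client before any work is done
		}
		time.Sleep(delay) // stands in for queueing behind other requests
		fmt.Fprint(w, "data: [DONE]\n\n")
	}
}

// fetch issues a GET with a client whose header timeout is shorter than the
// server-side delay and returns the full body.
func fetch(url string, headerTimeout time.Duration) (string, error) {
	client := &http.Client{
		Transport: &http.Transport{ResponseHeaderTimeout: headerTimeout},
	}
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	return string(b), err
}

// run wires the two together on a loopback test server.
func run() (string, error) {
	srv := httptest.NewServer(slowStream(700 * time.Millisecond))
	defer srv.Close()
	return fetch(srv.URL, 500*time.Millisecond)
}

func main() {
	body, err := run()
	fmt.Printf("body=%q err=%v\n", body, err)
}
```

Because the headers are flushed before the delay, the request completes even though the body arrives after the header timeout has elapsed. Removing the `Flush` call would be expected to make the same request fail with a header timeout.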

This change closes the loop: the streaming endpoint becomes a strict superset of the JSON endpoint's body, and clients can stay on SSE end-to-end.

Compatibility

OpenAI streaming spec: only text is mandated on transcript.text.done; spec-compliant clients ignore unknown fields.
Empty values are still omitted (Language == "", Duration == 0, len(Segments) == 0), so a transcription that came from a backend without these signals emits the same shape as before.
Wire format: start/end are float seconds (matching TranscriptionSegmentSeconds.Seconds() from the JSON path) so clients can share decoding logic.
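The compatibility rules above (optional fields omitted when empty, start/end as float seconds) can be sketched with `encoding/json` struct tags. The type and function names here (`rawSegment`, `wireSegment`, `donePayload`, `buildDone`) are hypothetical and are not taken from the LocalAI source; the `type` field is assumed from the OpenAI event naming.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// rawSegment is a hypothetical internal segment with time.Duration offsets.
type rawSegment struct {
	ID         int
	Start, End time.Duration
	Text       string
}

// wireSegment is the assumed on-the-wire form: start/end as float seconds.
type wireSegment struct {
	ID    int     `json:"id"`
	Start float64 `json:"start"`
	End   float64 `json:"end"`
	Text  string  `json:"text"`
}

// donePayload is an assumed shape for the transcript.text.done event.
// omitempty drops language, duration, and segments when they are zero-valued.
type donePayload struct {
	Type     string        `json:"type"`
	Text     string        `json:"text"`
	Language string        `json:"language,omitempty"`
	Duration float64       `json:"duration,omitempty"`
	Segments []wireSegment `json:"segments,omitempty"`
}

// buildDone converts durations to seconds and marshals the payload.
func buildDone(text, language string, duration time.Duration, segs []rawSegment) (string, error) {
	p := donePayload{
		Type:     "transcript.text.done",
		Text:     text,
		Language: language,
		Duration: duration.Seconds(),
	}
	for _, s := range segs {
		p.Segments = append(p.Segments, wireSegment{
			ID:    s.ID,
			Start: s.Start.Seconds(),
			End:   s.End.Seconds(),
			Text:  s.Text,
		})
	}
	b, err := json.Marshal(p)
	return string(b), err
}

func main() {
	// Backend without extra signals: same shape as before the change.
	bare, _ := buildDone("hello", "", 0, nil)
	fmt.Println(bare)

	// Backend with timings: the additive fields appear.
	full, _ := buildDone("hello world", "en", 2500*time.Millisecond, []rawSegment{
		{ID: 0, Start: 0, End: 1500 * time.Millisecond, Text: "hello"},
		{ID: 1, Start: 1500 * time.Millisecond, End: 2500 * time.Millisecond, Text: "world"},
	})
	fmt.Println(full)
}
```

In this sketch the first call yields a payload containing only `type` and `text`, which is the property that keeps older clients unaffected.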

Test plan

Manual: hit POST /v1/audio/transcriptions with stream=true against a model that emits segment timings (whisper-cpp). Verify the final data: {...} line before [DONE] includes segments and duration and matches the JSON-mode response for the same audio.
Manual: same request against a backend that doesn't produce segments. Verify the done event still includes text and the segments field is absent (omitempty preserved).
Existing non-streaming JSON path is untouched — no regression risk for spec-compliant clients.
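The manual checks above could be partly automated on the client side. The following is a rough sketch of reading an SSE body and picking out the done event that precedes `[DONE]`; `doneEvent`, `lastDone`, `parseSample`, and `sampleStream` are hypothetical, and the sample payload is made up for illustration rather than captured from a real server.

```go
package main

import (
	"bufio"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// segment and doneEvent mirror the fields named in this PR (assumed shape).
type segment struct {
	ID    int     `json:"id"`
	Start float64 `json:"start"`
	End   float64 `json:"end"`
	Text  string  `json:"text"`
}

type doneEvent struct {
	Type     string    `json:"type"`
	Text     string    `json:"text"`
	Language string    `json:"language"`
	Duration float64   `json:"duration"`
	Segments []segment `json:"segments"`
}

// lastDone scans "data:" lines until [DONE] and returns the done event.
func lastDone(r io.Reader) (*doneEvent, error) {
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 1<<20) // allow long segment arrays
	var done *doneEvent
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "data:") {
			continue // blank separators and other SSE fields
		}
		payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		if payload == "[DONE]" {
			break
		}
		var ev doneEvent
		if err := json.Unmarshal([]byte(payload), &ev); err != nil {
			return nil, err
		}
		if ev.Type == "transcript.text.done" {
			done = &ev
		}
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	if done == nil {
		return nil, errors.New("stream ended without a done event")
	}
	return done, nil
}

// sampleStream is illustrative data, not real server output.
const sampleStream = `data: {"type":"transcript.text.delta","delta":"hello "}

data: {"type":"transcript.text.done","text":"hello world","language":"en","duration":2.5,"segments":[{"id":0,"start":0,"end":2.5,"text":"hello world"}]}

data: [DONE]

`

// parseSample runs lastDone over the illustrative stream.
func parseSample() (*doneEvent, error) {
	return lastDone(strings.NewReader(sampleStream))
}

func main() {
	ev, err := parseSample()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("text=%q language=%q duration=%.1f segments=%d\n",
		ev.Text, ev.Language, ev.Duration, len(ev.Segments))
}
```

Against a backend that produces no segments, the same parser would leave `Segments` empty and `Duration` at zero, so a check for the second manual case reduces to asserting `len(ev.Segments) == 0`.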

Companion change

The notetaker repo's notary client switches its Transcribe to streaming and consumes these fields in the same shape as its existing JSON-path parser (internal/shared/localai/client.go::parseTranscriptionStream). Without this server change, the streaming client falls through to text-only results — so this PR completes the round trip but isn't a breaking dependency.

🤖 Generated with Claude Code

…ream done event

streamTranscription previously emitted a done event with just `text`, matching the OpenAI streaming spec exactly. Streaming clients that need per-utterance timings or audio duration had to fall back to the non-streaming JSON path, and that path is exactly the one that trips on ResponseHeaderTimeout when whisper requests queue behind each other on a SingleThread backend.

Extend the done event to additively carry `language`, `duration`, and a `segments` array (id, start, end, text; start/end as float seconds, matching TranscriptionSegmentSeconds). Empty / zero values are still omitted; spec-compliant clients ignore the new fields.

This unblocks notary's streaming Transcribe (companion change in the notary repo) so it produces the same TranscriptionResult shape as the JSON path while sidestepping the queue-induced header timeouts.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]

mudler merged commit 595b6fd into master May 7, 2026
51 checks passed

mudler deleted the feat/stream-transcription-include-segments branch May 7, 2026 15:28

localai-bot added the enhancement New feature or request label May 9, 2026

BrewTestBot mentioned this pull request May 11, 2026

localai 4.2.0 Homebrew/homebrew-core#282016

Merged
