feat(openai): stream input_audio_transcription delta events #5859
Merged
Wire conversation.item.input_audio_transcription.delta from the OpenAI Realtime API as InputTranscriptionCompleted(is_final=False) partials. Accumulators are keyed per (item_id, content_index) and cleared on .completed, .deleted, session reconnect, and on .failed (which now emits a closing is_final=True when partials had streamed so consumers don't hang).
theomonnom
approved these changes
May 27, 2026
Summary
Ports livekit/agents-js#1581: wires conversation.item.input_audio_transcription.delta from the OpenAI Realtime API so user transcripts surface word-by-word as InputTranscriptionCompleted(is_final=False) partials, instead of only firing once on .completed.

Enables streaming user transcripts with gpt-realtime-whisper (and any future delta-emitting transcription model). Previously the .delta branch was a pass statement because partials weren't useful from the legacy transcription pipeline; now that OpenAI streams them in realtime, we accumulate and emit.

Changes
- _input_transcript_accumulators: dict[str, dict[int, str]] keyed by (item_id, content_index).
- _handle_conversion_item_input_audio_transcription_delta handler accumulates and emits is_final=False.
- _handle_..._completed clears the matching accumulator before emitting the final, so a subsequent delta on the same item_id starts fresh.
- _handle_..._failed emits a closing is_final=True with the last accumulated partial so consumers waiting on a final don't hang. No-op when no partials had streamed.
- Accumulators are also cleared on conversation.item.deleted and session reconnect.
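
For reviewers who want the lifecycle in one place, the accumulator behavior described in the changes above can be sketched roughly as follows. This is a standalone illustration, not the code in this PR: TranscriptEvent and TranscriptAccumulator are hypothetical names, the event is a simplified stand-in for InputTranscriptionCompleted, and it assumes each partial carries the full text accumulated so far rather than only the latest delta.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TranscriptEvent:
    # Hypothetical, simplified stand-in for InputTranscriptionCompleted.
    item_id: str
    transcript: str
    is_final: bool


@dataclass
class TranscriptAccumulator:
    # item_id -> content_index -> text accumulated so far
    _acc: dict[str, dict[int, str]] = field(default_factory=dict)

    def _pop(self, item_id: str, content_index: int) -> Optional[str]:
        # Remove and return the accumulated text for one content part, if any.
        per_item = self._acc.get(item_id)
        if per_item is None:
            return None
        text = per_item.pop(content_index, None)
        if not per_item:
            del self._acc[item_id]
        return text

    def on_delta(self, item_id: str, content_index: int, delta: str) -> TranscriptEvent:
        # Append the delta and emit a partial (assumed to carry the running text).
        per_item = self._acc.setdefault(item_id, {})
        per_item[content_index] = per_item.get(content_index, "") + delta
        return TranscriptEvent(item_id, per_item[content_index], is_final=False)

    def on_completed(self, item_id: str, content_index: int, transcript: str) -> TranscriptEvent:
        # Clear first, so a later delta on the same item starts fresh.
        self._pop(item_id, content_index)
        return TranscriptEvent(item_id, transcript, is_final=True)

    def on_failed(self, item_id: str, content_index: int) -> Optional[TranscriptEvent]:
        # Close out with the last partial; no-op when nothing had streamed.
        text = self._pop(item_id, content_index)
        if text is None:
            return None
        return TranscriptEvent(item_id, text, is_final=True)

    def on_deleted(self, item_id: str) -> None:
        # Drop every content part accumulated for a deleted item.
        self._acc.pop(item_id, None)

    def on_reconnect(self) -> None:
        # A new session invalidates all in-flight partials.
        self._acc.clear()
```

In this sketch, two deltas followed by on_failed yield two partials and then one closing final with the combined text, while on_failed for an item that never streamed returns None.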