Bug Description
When using AssemblyAISTT, transcription text on the client occasionally "jumps": the last speech bubble's text shrinks to just a few words, then reappears in full on the next update. STT and voice interaction work correctly; the issue is purely visual.
Example sequence of updates for a single segment ID (all with final: false):
- "I need you to look beyond the eyes and be honest with me" (correct, growing)
- "And be honest with me." (text regresses to just the tail end)
- "I need you to look beyond the eyes and be honest with me, always" (full text restored)
Expected Behavior
Transcription text for a segment should only grow or be replaced by the final transcript. It should never shrink mid-utterance.
Reproduction Steps
- Set up a voice agent with AssemblyAISTT (streaming mode)
- Connect a frontend client that renders TranscriptionReceived segments by ID
- Speak continuously for 10+ seconds
- Observe the last speech bubble's text occasionally shrinking then restoring
Operating System
Linux (also observed on iOS/Safari client)
Models Used
AssemblyAI universal-streaming-multilingual
Package Versions
- livekit-agents[assemblyai] == 1.4.1
- livekit-client (JS) latest
Session/Room/Call IDs
No response
Proposed Solution
AssemblyAI Turn messages contain both a words array (cumulative) and an utterance field (chunk-based). It appears that the plugin emits INTERIM_TRANSCRIPT from the cumulative words and PREFLIGHT_TRANSCRIPT from the chunk-based utterance. AudioRecognition routes both through on_interim_transcript, and because the transcription uses replacement mode (is_delta_stream=False), the PREFLIGHT event's chunk text overwrites the INTERIM event's cumulative text on the same segment ID.
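A minimal Python sketch of the suspected interaction, replaying the example sequence from the bug description. `Turn`, `events_for`, and `replay` are hypothetical stand-ins, not the plugin's or AudioRecognition's actual code. The sketch assumes one INTERIM_TRANSCRIPT per Turn built from `words`, plus a PREFLIGHT_TRANSCRIPT only when `utterance` is non-empty.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """Hypothetical, simplified stand-in for an AssemblyAI Turn message."""
    words: list[str]      # cumulative: every word in the turn so far
    utterance: str = ""   # chunk-based: only the most recent chunk


def events_for(turn: Turn) -> list[tuple[str, str]]:
    """Mimic the suspected plugin behaviour (an assumption, not the real code)."""
    events = [("INTERIM_TRANSCRIPT", " ".join(turn.words))]
    if turn.utterance:
        events.append(("PREFLIGHT_TRANSCRIPT", turn.utterance))
    return events


def replay(turns: list[Turn], segment_id: str = "seg-1") -> list[str]:
    """Apply every event to one segment and record what the client would show."""
    segments: dict[str, str] = {}
    shown: list[str] = []
    for turn in turns:
        for _kind, text in events_for(turn):
            # Both event kinds take the interim path, and replacement mode
            # (is_delta_stream=False) overwrites the segment's text.
            segments[segment_id] = text
            shown.append(segments[segment_id])
    return shown


turns = [
    Turn("I need you to look beyond the eyes and be honest with me".split(),
         utterance="And be honest with me."),
    Turn("I need you to look beyond the eyes and be honest with me, always".split()),
]
for text in replay(turns):
    print(text)
```

Running it prints the three texts in order, with the regression to "And be honest with me." in the middle.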
A length guard in the framework (skipping non-final updates whose text is shorter than the current text for the same segment) would fix this generically. We're currently applying this workaround on the client side.
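The length guard could look roughly like this. It is a minimal Python sketch. `SegmentTextGuard` and `accept` are hypothetical names, not part of livekit-agents or livekit-client, and the same check could live either in the framework or in a client's TranscriptionReceived handler.

```python
class SegmentTextGuard:
    """Hypothetical helper that keeps a segment's displayed text from shrinking."""

    def __init__(self) -> None:
        # Latest accepted text per segment ID.
        self._latest: dict[str, str] = {}

    def accept(self, segment_id: str, text: str, final: bool) -> str:
        """Record an update and return the text that should be displayed."""
        previous = self._latest.get(segment_id, "")
        # A final transcript always replaces the text; a non-final update is
        # only accepted if it is at least as long as what is already shown.
        if final or len(text) >= len(previous):
            self._latest[segment_id] = text
        return self._latest[segment_id]


guard = SegmentTextGuard()
updates = [
    ("I need you to look beyond the eyes and be honest with me", False),
    ("And be honest with me.", False),  # shorter non-final update: ignored
    ("I need you to look beyond the eyes and be honest with me, always", False),
]
for text, final in updates:
    print(guard.accept("seg-1", text, final))
```

Because the check only compares lengths, a legitimate non-final revision that makes the text shorter (for example, a word re-recognized as a shorter one) would also be held back until the next longer or final update.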
Additional Context
This is independent of format_turns: the utterance field is always chunk-based, regardless of that setting.
Screenshots and Recordings
No response