Bug Description
Text transcripts are getting cut off mid-sentence. The LLM generates complete responses but users only see partial text on the frontend.
For example, a response ending with "...what do you consider to be the distinctive characteristics?" is truncated to "...what do you consider to be the distinctive".
Production logs show the LLM generated 475 tokens but the frontend didn't receive all of them.
Expected Behavior
All generated text should be delivered to the frontend without truncation. If the LLM generates 475 tokens, the frontend should receive all 475 tokens worth of text, including the complete final sentence.
Reproduction Steps
# 1. Start an agent session with text transcription enabled
# 2. Have the agent generate a response (e.g., initial greeting)
# 3. Check the agent logs - you'll see the full token count
# 4. Check what the user sees - text is truncated
# Example from our logs:
# Realtime ttft=1.21s duration=6.18s tokens=0in/475out tokens/s=76.9
# ^ Agent generated 475 tokens but user didn't get the last few words
Important: This issue is more evident with increased network latency. If your LiveKit server or deployment has higher latency (e.g., cross-region, Azure deployments), the race window widens and truncation happens more consistently.
Operating System
Linux (Docker)
Models Used
- LLM: Azure OpenAI (via Azure AI Foundry)
- TTS: Azure TTS
- Using both Voice Live and traditional pipeline
Package Versions
livekit-agents==1.3.12
livekit==0.17.4
Python==3.13
Proposed Solution
The issue is in _ParticipantStreamTranscriptionOutput.flush() in _output.py. Currently flush() is a synchronous method that creates a background task and returns immediately:
def flush(self) -> None:
    # ...
    self._flush_atask = asyncio.create_task(self._flush_task(curr_writer))
    # Returns immediately without waiting!
Fix: Make flush() async and await the task:
async def flush(self) -> None:
    # ...
    await self._flush_task(curr_writer)  # Wait for the flush to complete
This would require updating all callers to await flush().
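The race can be reproduced outside the SDK with a toy model. FakeWriter, buggy_flush, and fixed_flush below are illustrative stand-ins, not the real LiveKit API; aclose() simulates the network round trip to the server:

```python
import asyncio

class FakeWriter:
    """Stand-in for the text-stream writer; aclose() simulates network I/O."""

    def __init__(self, delivered: list[str]) -> None:
        self._delivered = delivered
        self._buffer = ""

    def write(self, text: str) -> None:
        self._buffer += text

    async def aclose(self) -> None:
        await asyncio.sleep(0.05)  # simulated ~50ms network round trip
        self._delivered.append(self._buffer)

async def buggy_flush(writer: FakeWriter) -> None:
    # Current behavior: fire-and-forget, returns before aclose() completes.
    asyncio.create_task(writer.aclose())

async def fixed_flush(writer: FakeWriter) -> None:
    # Proposed fix: await the close, so delivery is guaranteed on return.
    await writer.aclose()

async def demo(flush) -> list[str]:
    delivered: list[str] = []
    writer = FakeWriter(delivered)
    writer.write("...what do you consider to be the distinctive characteristics?")
    await flush(writer)
    return list(delivered)  # what the "frontend" has the moment flush() returns

print(asyncio.run(demo(buggy_flush)))  # [] -- text lost if playout ends here
print(asyncio.run(demo(fixed_flush)))  # full sentence delivered
```

With the fire-and-forget version, nothing has been delivered at the moment flush() returns, which is exactly the window in which playout can end and the final words are dropped.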
Additional Context
Why This Happens
The _flush_task() performs await writer.aclose() which is network I/O to the LiveKit server. Our testing shows this takes about 50-60ms in low-latency environments. But with higher latency (e.g., cross-region deployments, cloud-to-cloud communication), this can take 100ms or more.
The race window is the time between flush() returning and _flush_atask completing.
Higher latency → Longer race window → More likely to hit the race condition
This explains why some deployments see consistent truncation while others might not notice it.
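The latency dependence can be made concrete with a toy measurement; the sleep stands in for the writer.aclose() round trip, and the delays are illustrative rather than measured from LiveKit:

```python
import asyncio
import time

async def race_window(network_latency: float) -> float:
    """How long the background flush task keeps running after flush() returns."""
    done = asyncio.Event()

    async def fake_aclose() -> None:
        await asyncio.sleep(network_latency)  # simulated network I/O
        done.set()

    start = time.monotonic()
    asyncio.create_task(fake_aclose())  # "flush()" returns here; task still pending
    await done.wait()                   # the window closes when aclose() finishes
    return time.monotonic() - start

for latency in (0.05, 0.10):
    window = asyncio.run(race_window(latency))
    print(f"simulated latency {latency:.2f}s -> race window ~{window:.2f}s")
```

The window is bounded below by the network latency, so a cross-region deployment roughly doubles or triples the time in which truncation can occur.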
Workaround
For now we're waiting for _flush_atask manually after wait_for_playout():
await speech_handle.wait_for_playout()
if hasattr(session.output.transcription, '_flush_atask'):
    if session.output.transcription._flush_atask:
        await session.output.transcription._flush_atask
This fixes the issue but requires accessing private SDK attributes.
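The same workaround can be wrapped in a small helper that no-ops if the private attribute disappears in a future SDK release and bounds the wait with a timeout. The attribute name _flush_atask is taken from livekit-agents 1.3.12; the helper itself is hypothetical:

```python
import asyncio

async def await_transcription_flush(transcription_output, timeout: float = 2.0) -> None:
    """Best-effort wait for the SDK's private _flush_atask; no-op if absent."""
    task = getattr(transcription_output, "_flush_atask", None)
    if task is None:
        return
    try:
        # shield() so a timeout here doesn't cancel the SDK's own task
        await asyncio.wait_for(asyncio.shield(task), timeout)
    except asyncio.TimeoutError:
        pass  # don't block playout forever if the server is slow or unreachable
```

Called as await await_transcription_flush(session.output.transcription) right after speech_handle.wait_for_playout(), this keeps the brittleness of the private-attribute access in one place until the upstream fix lands.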
Related Issues
This was reported before in #4817 but was closed with the comment "flush is called after all text captured so the race condition shouldn't happen." While flush IS called after text capture, the problem is that flush() doesn't wait for the actual flush work (writer.aclose()) to complete before returning.