
Text transcripts truncated mid-sentence - flush() race condition #4854

@codeweft

Bug Description

Text transcripts are getting cut off mid-sentence. The LLM generates complete responses but users only see partial text on the frontend.

For example, text ending with "...what do you consider to be the distinctive characteristics?" gets truncated to "...what do you consider to be the distinctive"

Production logs show the LLM generated 475 tokens but the frontend didn't receive all of them.

Expected Behavior

All generated text should be delivered to the frontend without truncation. If the LLM generates 475 tokens, the frontend should receive the text for all 475 of them, including the complete final sentence.

Reproduction Steps

# 1. Start an agent session with text transcription enabled
# 2. Have the agent generate a response (e.g., initial greeting)
# 3. Check the agent logs - you'll see the full token count
# 4. Check what the user sees - text is truncated

# Example from our logs:
# Realtime ttft=1.21s duration=6.18s tokens=0in/475out tokens/s=76.9
# ^ Agent generated 475 tokens but user didn't get the last few words

Important: This issue is more evident with increased network latency. If your LiveKit server or deployment has higher latency (e.g., cross-region, Azure deployments), the race window widens and truncation happens more consistently.

Operating System

Linux (Docker)

Models Used

  • LLM: Azure OpenAI (via Azure AI Foundry)
  • TTS: Azure TTS
  • Using both Voice Live and traditional pipeline

Package Versions

livekit-agents==1.3.12
livekit==0.17.4
Python==3.13

Proposed Solution

The issue is in _ParticipantStreamTranscriptionOutput.flush() in _output.py. It is currently a synchronous method that only schedules a background task:

def flush(self) -> None:
    # ...
    self._flush_atask = asyncio.create_task(self._flush_task(curr_writer))
    # Returns immediately without waiting!

Fix: Make flush() async and await the task:

async def flush(self) -> None:
    # ...
    await self._flush_task(curr_writer)  # wait for writer.aclose() to finish before returning

This would require updating all callers to await flush().
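
A rough sketch of the full change, assuming the rest of flush() stays as it is today (the writer bookkeeping shown here is a guess at the surrounding internals, not the actual SDK code):

async def flush(self) -> None:
    # ... same writer bookkeeping as today ...
    curr_writer = self._writer  # assumption: whatever flush() currently snapshots
    if curr_writer is None:
        return
    # Await the flush work directly so the caller only resumes once
    # writer.aclose() has completed.
    await self._flush_task(curr_writer)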

Additional Context

Why This Happens

The _flush_task() performs await writer.aclose() which is network I/O to the LiveKit server. Our testing shows this takes about 50-60ms in low-latency environments. But with higher latency (e.g., cross-region deployments, cloud-to-cloud communication), this can take 100ms or more.

The race window is the gap between flush() returning and _flush_atask actually completing.

Higher latency → Longer race window → More likely to hit the race condition

This explains why some deployments see consistent truncation while others might not notice it.
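
To see the failure mode in isolation, here is a minimal, self-contained toy model of the race (FakeWriter and the 100ms sleep are stand-ins for the real writer and network latency, not LiveKit code):

import asyncio
import time


class FakeWriter:
    """Stand-in for the text-stream writer; aclose() simulates ~100ms of network I/O."""

    def __init__(self) -> None:
        self.closed = False

    async def aclose(self) -> None:
        await asyncio.sleep(0.1)  # simulated cross-region round trip
        self.closed = True


def flush_fire_and_forget(writer: FakeWriter) -> "asyncio.Task[None]":
    # Mirrors the current flush(): schedule the close and return immediately.
    return asyncio.create_task(writer.aclose())


async def main() -> None:
    writer = FakeWriter()
    start = time.monotonic()

    task = flush_fire_and_forget(writer)
    print(f"flush returned after {(time.monotonic() - start) * 1000:.1f}ms, closed={writer.closed}")
    # Everything the caller does in the next ~100ms races against aclose().
    # If the session is torn down inside that window, the final segment is lost.

    await task
    print(f"aclose finished after {(time.monotonic() - start) * 1000:.1f}ms, closed={writer.closed}")


asyncio.run(main())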

Workaround

For now we're waiting for _flush_atask manually after wait_for_playout():

await speech_handle.wait_for_playout()

if hasattr(session.output.transcription, '_flush_atask'):
    if session.output.transcription._flush_atask:
        await session.output.transcription._flush_atask

This fixes the issue but requires accessing private SDK attributes.
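
A slightly more defensive variant of the same workaround (a hypothetical helper, not part of the SDK; it still relies on the private _flush_atask attribute and may break between releases) adds a timeout so a stuck flush can't block the turn forever:

import asyncio


async def await_transcription_flush(session, timeout: float = 2.0) -> None:
    """Best-effort wait for the transcription output's pending flush task."""
    task = getattr(session.output.transcription, "_flush_atask", None)
    if task is None:
        return
    try:
        # shield() so a timeout here doesn't cancel the SDK's own flush work
        await asyncio.wait_for(asyncio.shield(task), timeout)
    except asyncio.TimeoutError:
        pass  # give up waiting rather than blocking indefinitely

Call it with await await_transcription_flush(session) right after speech_handle.wait_for_playout().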

Related Issues

This was reported before in #4817 but was closed with the comment "flush is called after all text captured so the race condition shouldn't happen." While flush IS called after text capture, the problem is that flush() doesn't wait for the actual flush work (writer.aclose()) to complete before returning.
