
Anthropic plugin: LLMStream._run reuses awaited coroutine on retry → all subsequent attempts fail with RuntimeError: cannot reuse already awaited coroutine #5805

@myonlinematters


Bug Description

TL;DR
livekit/plugins/anthropic/llm.py stores the awaitable that yields the Anthropic stream on the LLMStream instance. If the first await of that awaitable inside _run() raises, self._anthropic_stream is never assigned, and every subsequent retry awaits the same, already-exhausted coroutine object, which Python forbids. The session is then doomed: the framework makes 4 attempts, but every attempt after the first hits RuntimeError: cannot reuse already awaited coroutine, the original cause is lost, and the user-visible result is APIConnectionError: failed to generate LLM completion after 4 attempts.

One transient network/TLS blip during the very first LLM call of a session permanently breaks that session.

Environment
livekit-agents: 1.5.2
livekit-plugins-anthropic: 1.4.3
Python: 3.12.3
Platform: Ubuntu 24.04 (WSL2)
LLM: claude-sonnet-4-6 via Anthropic API

Actual behavior
1. The first await of self._awaitable_anthropic_stream raises an exception → self._anthropic_stream stays None and the coroutine is exhausted.
2. The retry handler calls _run() again on the same instance.
3. if not self._anthropic_stream: is True → await self._awaitable_anthropic_stream on the already-awaited coroutine → Python raises RuntimeError: cannot reuse already awaited coroutine.
4. Attempts 2, 3 and 4 therefore all fail with the same RuntimeError.
5. The final visible error is APIConnectionError: failed to generate LLM completion after 4 attempts, whose cause chain ends in RuntimeError: cannot reuse already awaited coroutine. The original transient failure that triggered the whole cascade does not appear in the final traceback.
6. The session is unrecoverable until the user reconnects and a fresh LLMStream instance is constructed.

The buggy code
livekit/plugins/anthropic/llm.py:280-320 (line numbers approximate; pinned at v1.4.3):

```python
async def _run(self) -> None:
    retryable = True
    try:
        if not self._anthropic_stream:
            self._anthropic_stream = await self._awaitable_anthropic_stream  # ← bug

        async with self._anthropic_stream as stream:
            ...
    except anthropic.APITimeoutError as e:
        raise APITimeoutError(retryable=retryable) from e
    except anthropic.APIStatusError as e:
        raise APIStatusError(...) from e
    except Exception as e:
        raise APIConnectionError(retryable=retryable) from e
```

self._awaitable_anthropic_stream holds the result of calling the Anthropic SDK's client.messages.stream(...): a coroutine object, not a factory. LLMStream.__init__ stores it once (line 263):

```python
def __init__(self, ...) -> None:
    super().__init__(...)
    self._awaitable_anthropic_stream = anthropic_stream  # ← stored once
    self._anthropic_stream: ... | None = None
```
A coroutine in Python can only be awaited once. After the first await either completes or raises, the coroutine is exhausted.
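A minimal, standalone illustration of that rule, assuming nothing from LiveKit or the Anthropic SDK: open_stream is a hypothetical stand-in that always fails like a network blip, and await_same_coroutine mimics the stored-coroutine pattern. The first await surfaces the real error; every later await of the same coroutine object raises RuntimeError instead.

```python
import asyncio


async def open_stream() -> str:
    # Hypothetical stand-in for the SDK call; always fails like a network blip.
    raise ConnectionError("simulated TLS/connection failure")


async def await_same_coroutine(attempts: int = 4) -> list[str]:
    """Await ONE coroutine object repeatedly, as the stored-awaitable pattern does on retry."""
    coro = open_stream()  # created once, like self._awaitable_anthropic_stream
    errors: list[str] = []
    for _ in range(attempts):
        try:
            await coro
        except Exception as e:  # record what each attempt actually saw
            errors.append(f"{type(e).__name__}: {e}")
    return errors


if __name__ == "__main__":
    for line in asyncio.run(await_same_coroutine()):
        print(line)
```

Only the first line of output mentions the ConnectionError; the remaining three are the RuntimeError, which is exactly the pattern in the log below.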

Failure mode
When the first await self._awaitable_anthropic_stream raises (e.g., httpx ConnectError, ReadTimeout, SSLError, anything from the network layer):

1. self._anthropic_stream stays None (the assignment never happened).
2. self._awaitable_anthropic_stream is now awaited and exhausted.
3. The exception bubbles up; except Exception as e: raise APIConnectionError(retryable=True) from e re-raises it as retryable (an anthropic.APITimeoutError takes the APITimeoutError branch instead, which is also raised with retryable=True).
4. The framework's retry handler calls _run() again on the same LLMStream instance.
5. if not self._anthropic_stream: is True (still None).
6. await self._awaitable_anthropic_stream → Python raises RuntimeError: cannot reuse already awaited coroutine.
7. except Exception as e: raise APIConnectionError(retryable=True) from e, but the chain is now RuntimeError → APIConnectionError, which masks the real original cause.
8. Attempts 3 and 4: same RuntimeError, same wrapping.
9. After 4 attempts the framework gives up and surfaces only APIConnectionError: failed to generate LLM completion after 4 attempts, caused by APIConnectionError: Connection error., caused in turn by RuntimeError: cannot reuse already awaited coroutine. The original network failure is gone.
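The cause-chain masking described in the failure mode can be reproduced without LiveKit or the Anthropic SDK. This is a minimal sketch under stated assumptions: BuggyStream, flaky_sdk_call, run_with_retries and the local APIConnectionError class are hypothetical stand-ins, and the retry loop is simplified (no backoff; only the last attempt's exception is chained into the final error, which is an assumption about the framework's internals).

```python
import asyncio


class APIConnectionError(Exception):
    """Hypothetical stand-in for livekit.agents' APIConnectionError."""

    def __init__(self, message: str, *, retryable: bool = True) -> None:
        super().__init__(message)
        self.retryable = retryable


async def flaky_sdk_call() -> object:
    # Hypothetical SDK call that fails like a transient network error.
    raise ConnectionError("simulated TLS/connection failure")


class BuggyStream:
    """Mimics the plugin's pattern: one coroutine created and stored up front."""

    def __init__(self) -> None:
        self._awaitable_stream = flaky_sdk_call()
        self._stream: object | None = None

    async def _run(self) -> None:
        try:
            if not self._stream:
                self._stream = await self._awaitable_stream  # re-awaited on retry
        except Exception as e:
            raise APIConnectionError("Connection error.", retryable=True) from e


async def run_with_retries(stream: BuggyStream, attempts: int = 4) -> None:
    # Simplified retry loop: only the LAST attempt's exception is chained.
    last: Exception | None = None
    for _ in range(attempts):
        try:
            await stream._run()
            return
        except APIConnectionError as e:
            if not e.retryable:
                raise
            last = e
    raise APIConnectionError(
        f"failed to generate LLM completion after {attempts} attempts"
    ) from last


def cause_chain(exc: BaseException) -> list[str]:
    """Type names from the outermost exception down through __cause__."""
    names: list[str] = []
    current: BaseException | None = exc
    while current is not None:
        names.append(type(current).__name__)
        current = current.__cause__
    return names


async def main() -> list[str]:
    try:
        await run_with_retries(BuggyStream())
    except APIConnectionError as final:
        return cause_chain(final)
    return []


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The chain that reaches the caller is APIConnectionError → APIConnectionError → RuntimeError; the ConnectionError from the first attempt is nowhere in it.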
Observed log (real failure, 2026-05-21)
```text
18:56:11.720 INFO mp_main AGENT_STATE_CHANGED old=listening new=thinking
... 39 seconds elapse ...
18:56:50.836 ERROR livekit.agents Error in _llm_inference_task
Traceback (most recent call last):
  File ".../livekit/plugins/anthropic/llm.py", line 284, in _run
    self._anthropic_stream = await self._awaitable_anthropic_stream
RuntimeError: cannot reuse already awaited coroutine

The above exception was the direct cause of the following exception:
...
livekit.agents._exceptions.APIConnectionError: Connection error.

The above exception was the direct cause of the following exception:
...
livekit.agents._exceptions.APIConnectionError: failed to generate LLM completion after 4 attempts
18:56:50.909 INFO mp_main AGENT_STATE_CHANGED old=thinking new=listening
```
The agent never spoke. Seeing 39 seconds of silence at session start, the user hung up and reconnected; the second session (with a fresh LLMStream instance) worked fine. The bug is invisible unless the operator adds instrumentation around it; we only diagnosed it by reading the SDK source after the fact.

Suggested fix
Store a factory instead of a coroutine, so that each retry creates and awaits a fresh awaitable. One workable shape:

```python
from collections.abc import Awaitable, Callable

def __init__(self, ..., anthropic_stream_factory: Callable[[], Awaitable[...]], ...) -> None:
    super().__init__(...)
    self._anthropic_stream_factory = anthropic_stream_factory
    self._anthropic_stream: ... | None = None

async def _run(self) -> None:
    retryable = True
    try:
        if not self._anthropic_stream:
            self._anthropic_stream = await self._anthropic_stream_factory()  # fresh per call
        async with self._anthropic_stream as stream:
            ...
    except Exception as e:  # the other except branches stay as they are today
        raise APIConnectionError(retryable=retryable) from e
```
This would be plumbed back through LLM.chat(), so the caller passes a thunk that re-invokes client.messages.stream(...) each time it is called. Happy-path behavior is unchanged, and retries become safe after a failure.
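A runnable sketch of the factory approach, with hypothetical stand-ins (make_factory, FixedStream, run_with_retries and a local APIConnectionError) in place of the real plugin and SDK classes. Each call to the factory returns a new coroutine, so a transient failure on the first attempt does not poison later ones, and when the failure persists the final error still chains back to the original ConnectionError.

```python
import asyncio
from collections.abc import Awaitable, Callable


class APIConnectionError(Exception):
    """Hypothetical stand-in for livekit.agents' APIConnectionError."""


def make_factory(failures_before_success: int) -> Callable[[], Awaitable[str]]:
    """Hypothetical thunk in place of the SDK call; each call makes a NEW coroutine."""
    calls = 0

    async def open_stream() -> str:
        nonlocal calls
        calls += 1
        if calls <= failures_before_success:
            raise ConnectionError(f"simulated failure on call {calls}")
        return f"stream opened on call {calls}"

    return open_stream


class FixedStream:
    """Mimics the suggested shape: store a factory, not a coroutine."""

    def __init__(self, stream_factory: Callable[[], Awaitable[str]]) -> None:
        self._stream_factory = stream_factory
        self._stream: str | None = None

    async def _run(self) -> str:
        try:
            if not self._stream:
                self._stream = await self._stream_factory()  # fresh coroutine per attempt
            return self._stream
        except Exception as e:
            raise APIConnectionError("Connection error.") from e


async def run_with_retries(stream: FixedStream, attempts: int = 4) -> str:
    # Simplified retry loop (no backoff); chains the last attempt's error.
    last: Exception | None = None
    for _ in range(attempts):
        try:
            return await stream._run()
        except APIConnectionError as e:
            last = e
    raise APIConnectionError(
        f"failed to generate LLM completion after {attempts} attempts"
    ) from last


if __name__ == "__main__":
    # One transient failure: attempt 2 gets a fresh coroutine and succeeds.
    print(asyncio.run(run_with_retries(FixedStream(make_factory(1)))))
```

One design question for the PR: this sketch keeps the `if not self._stream` guard from the suggested shape, so a stream that opened successfully but then failed while being consumed would be reused on the next attempt. Whether that can happen depends on when the framework stops retrying (not checked here); calling the factory unconditionally at the top of _run would sidestep it.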

Alternative (smaller diff, but more fragile): catch the RuntimeError in _run, log a diagnostic, and raise it as a non-retryable error instead of wrapping it as retryable. The session would still fail, but it would fail fast instead of waiting out retries that cannot succeed.
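The fail-fast alternative could look roughly like this. FailFastStream, flaky_sdk_call, count_attempts and the retryable flag on the local APIConnectionError are hypothetical stand-ins for the plugin and framework. Matching on the exception message is an assumption that relies on CPython's exact wording; any other RuntimeError stays retryable.

```python
import asyncio
import logging

logger = logging.getLogger("anthropic_llm_sketch")  # hypothetical logger name

REUSE_MESSAGE = "cannot reuse already awaited coroutine"  # CPython's wording


class APIConnectionError(Exception):
    """Hypothetical stand-in for livekit.agents' APIConnectionError."""

    def __init__(self, message: str, *, retryable: bool = True) -> None:
        super().__init__(message)
        self.retryable = retryable


async def flaky_sdk_call() -> object:
    # Hypothetical SDK call that fails like a transient network error.
    raise ConnectionError("simulated TLS/connection failure")


class FailFastStream:
    """Still stores one coroutine (the bug), but stops retries once it is spent."""

    def __init__(self) -> None:
        self._awaitable_stream = flaky_sdk_call()
        self._stream: object | None = None

    async def _run(self) -> None:
        try:
            if not self._stream:
                self._stream = await self._awaitable_stream
        except RuntimeError as e:
            if REUSE_MESSAGE not in str(e):
                raise APIConnectionError("Connection error.", retryable=True) from e
            logger.warning("stream coroutine was already awaited; not retrying")
            raise APIConnectionError("Connection error.", retryable=False) from e
        except Exception as e:
            raise APIConnectionError("Connection error.", retryable=True) from e


async def count_attempts(stream: FailFastStream, max_attempts: int = 4) -> int:
    """Simplified retry loop; returns how many attempts ran before it stopped."""
    for attempt in range(1, max_attempts + 1):
        try:
            await stream._run()
            return attempt
        except APIConnectionError as e:
            if not e.retryable:
                return attempt
    return max_attempts


if __name__ == "__main__":
    print(asyncio.run(count_attempts(FailFastStream())))
```

With the stored coroutine still in place, the loop stops after the second attempt instead of the fourth; the underlying bug remains, only the wasted retries go away.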

Workaround we're using locally
A monkey-patch around LLMStream._run that walks the cause chain and logs the deepest root exception of every failed attempt before the SDK swallows it under APIConnectionError. This gives us the original cause so we can tell whether the trigger is local (firewall, AV, WSL networking) or remote (Anthropic edge transient). It does not fix the bug — the session is still unrecoverable in place. We restart the session manually.
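A sketch of that kind of root-cause logging, under assumptions: root_cause and log_root_cause are hypothetical helper names (the issue does not include the actual monkey-patch), and the walk follows __cause__ first, falls back to __context__, and stops on a cycle.

```python
import asyncio
import functools
import logging
from collections.abc import Awaitable, Callable
from typing import Any

logger = logging.getLogger("llm_root_cause")  # hypothetical logger name


def root_cause(exc: BaseException) -> BaseException:
    """Walk __cause__ (or, failing that, __context__) to the deepest exception."""
    seen: set[int] = set()
    current = exc
    while True:
        seen.add(id(current))
        nxt = current.__cause__ or current.__context__
        if nxt is None or id(nxt) in seen:  # end of chain, or a cycle
            return current
        current = nxt


def log_root_cause(
    run: Callable[..., Awaitable[Any]],
) -> Callable[..., Awaitable[Any]]:
    """Wrap an async method: log the root cause of each failure, then re-raise."""

    @functools.wraps(run)
    async def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return await run(*args, **kwargs)
        except Exception as exc:  # CancelledError is not an Exception subclass
            root = root_cause(exc)
            logger.error(
                "LLM attempt failed: %s; root cause: %s: %s",
                type(exc).__name__, type(root).__name__, root,
            )
            raise

    return wrapper


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    async def failing_attempt() -> None:
        # Simulates the plugin wrapping a low-level network error.
        try:
            raise ConnectionError("simulated TLS handshake failure")
        except ConnectionError as e:
            raise RuntimeError("Connection error.") from e

    try:
        asyncio.run(log_root_cause(failing_attempt)())
    except RuntimeError:
        pass  # the wrapper already logged the ConnectionError before re-raising
```

Applied to the plugin, the patch would presumably be something like `LLMStream._run = log_root_cause(LLMStream._run)` on the class from livekit.plugins.anthropic.llm; that import path follows the file path in this issue and is not verified here.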

Impact
Any LiveKit Agents deployment using the Anthropic plugin on a network with a non-trivial rate of transient TLS/connection failures (corporate VPN, AV with HTTPS inspection, mobile users, satellite links, etc.) can expect roughly P(first attempt fails) × sessions/day unrecoverable sessions per day. For our use case this is a high-visibility, user-facing failure mode (silence on the session greeting) that we can't engineer around without patching the SDK.
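Spelled out, that estimate is a simple product, assuming each session's first attempt fails independently with the same probability $p_{\text{fail}}$:

```math
\mathbb{E}[\text{unrecoverable sessions per day}] \approx p_{\text{fail}} \times N_{\text{sessions/day}}
```

For example, with a hypothetical $p_{\text{fail}} = 0.002$ (0.2%) and 5,000 sessions per day, that is $0.002 \times 5000 = 10$ broken sessions per day. These figures are illustrative only, not measurements.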

Happy to PR the factory-based fix if the maintainers prefer that shape.

Expected Behavior

When the first await self._awaitable_anthropic_stream raises a transient error (network blip, TLS handshake hiccup, Anthropic 5xx, etc.):

The framework's retry handler calls _run() again.
Each retry obtains a fresh awaitable by re-invoking the Anthropic stream factory and awaiting that — not by re-awaiting the same exhausted coroutine.
If the transient condition has cleared, the retry succeeds and the session greeting plays normally.
If the underlying condition persists, the final APIConnectionError carries the original cause (the SSLError, ConnectError, ReadTimeout, etc.) in its cause chain so operators can diagnose the root issue.
In short: one transient failure on attempt 1 should not poison every subsequent retry for the lifetime of the LLMStream instance.

Reproduction Steps

No response

Operating System

Windows 11 Enterprise on MSFT Surface 7 ARM-64 with 64GB RAM

Models Used

Deepgram/Sonnet-4.6/ElevenLabs

Package Versions

Same as the Environment section above.

Session/Room/Call IDs

No response

Proposed Solution

Additional Context

No response

Screenshots and Recordings

No response


Labels: bug (Something isn't working)
