Anthropic plugin: LLMStream._run reuses awaited coroutine on retry → all subsequent attempts fail with RuntimeError: cannot reuse already awaited coroutine #5805

### Bug Description

**TL;DR**
`livekit/plugins/anthropic/llm.py` stores the awaitable that yields the Anthropic stream on the `LLMStream` instance. If the first call to `_run()` raises while awaiting it, the result is never assigned to `self._anthropic_stream`, and every later attempt awaits the same coroutine object, which Python forbids. The session is doomed: the framework makes 4 attempts in total, but every attempt after the first hits `RuntimeError: cannot reuse already awaited coroutine`, the original cause is lost, and the user-visible result is `APIConnectionError: failed to generate LLM completion after 4 attempts`.

One transient network/TLS blip during the very first LLM call of a session permanently breaks that session.

**Environment**
- livekit-agents: 1.5.2
- livekit-plugins-anthropic: 1.4.3
- Python: 3.12.3
- Platform: Ubuntu 24.04 (WSL2)
- LLM: claude-sonnet-4-6 via Anthropic API

**Actual behavior**
1. The first `await self._awaitable_anthropic_stream` raises (any exception), so `self._anthropic_stream` stays `None` and the coroutine is exhausted.
2. The retry handler calls `_run()` again on the same instance.
3. `if not self._anthropic_stream:` is still true, so `_run()` awaits the already-awaited coroutine and Python raises `RuntimeError: cannot reuse already awaited coroutine`.
4. Attempts 3 and 4 hit the same `RuntimeError`.
5. The final visible error is `APIConnectionError: failed to generate LLM completion after 4 attempts`, with `RuntimeError: cannot reuse already awaited coroutine` as the immediate cause; the original transient failure that triggered the cascade no longer appears in the log.
6. The session is unrecoverable until the user reconnects and a fresh `LLMStream` instance is constructed.

**The buggy code**
[livekit/plugins/anthropic/llm.py:280-320](https://github.com/livekit/agents/blob/main/livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/llm.py) (paths approximate; pinned at v1.4.3):

```python
async def _run(self) -> None:
    retryable = True
    try:
        # The stored coroutine is awaited here on every attempt.
        if not self._anthropic_stream:
            self._anthropic_stream = await self._awaitable_anthropic_stream  # ← bug

        async with self._anthropic_stream as stream:
            ...
    except anthropic.APITimeoutError as e:
        raise APITimeoutError(retryable=retryable) from e
    except anthropic.APIStatusError as e:
        raise APIStatusError(...) from e
    except Exception as e:
        # The RuntimeError from re-awaiting lands here and is wrapped as retryable.
        raise APIConnectionError(retryable=retryable) from e
```

self._awaitable_anthropic_stream is the result of calling the Anthropic SDK's client.messages.stream(...) factory — it's a coroutine object, not a factory. The LLMStream.__init__ stashes it ([line 263](https://github.com/livekit/agents/blob/main/livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/llm.py)):

```python
def __init__(self, ...) -> None:
    super().__init__(...)
    self._awaitable_anthropic_stream = anthropic_stream  # ← stored once
    self._anthropic_stream: ... | None = None
```

A coroutine in Python can only be awaited once. After the first await either completes or raises, the coroutine is exhausted.
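
The single-await rule can be checked in isolation, without the SDK. The sketch below is only an illustration: `open_stream` is a hypothetical stand-in for the call that opens the stream, and the simulated `ConnectionError` is an assumption standing in for whatever the network layer raised.

```python
import asyncio


async def open_stream() -> str:
    # Hypothetical stand-in for the network call; fails like a transient blip.
    raise ConnectionError("simulated transient network failure")


async def main() -> list[str]:
    stored = open_stream()  # coroutine object, created exactly once
    errors: list[str] = []
    for _ in range(2):
        try:
            await stored  # the second iteration re-awaits the same object
        except Exception as exc:
            errors.append(f"{type(exc).__name__}: {exc}")
    return errors


errors = asyncio.run(main())
print(errors)
```

The first entry is the simulated failure; the second is the `RuntimeError` that CPython raises when a finished coroutine is awaited again.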

**Failure mode**
When the first await self._awaitable_anthropic_stream raises (e.g., httpx ConnectError, ReadTimeout, SSLError, anything from the network layer):

1. `self._anthropic_stream` stays `None` (the assignment never happened).
2. `self._awaitable_anthropic_stream` is now awaited and exhausted.
3. The exception bubbles up; `except Exception as e: raise APIConnectionError(retryable=True) from e` re-raises it as retryable.
4. The framework retry handler calls `_run()` again on the same `LLMStream` instance.
5. `if not self._anthropic_stream:` is true (still `None`).
6. `await self._awaitable_anthropic_stream` raises `RuntimeError: cannot reuse already awaited coroutine`.
7. The same `except Exception` clause wraps it, but the chain is now `RuntimeError` → `APIConnectionError`, which masks the real original cause.
8. Attempts 3 and 4: same `RuntimeError`, same wrapping.
9. After 4 attempts, the framework gives up and surfaces only the final `APIConnectionError: Connection error.` with `RuntimeError: cannot reuse already awaited coroutine` as the direct cause; the original network failure is gone.
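
The sequence above can be simulated without LiveKit or the Anthropic SDK. This is a rough sketch under stated assumptions: `APIConnectionError`, `StoredCoroutineStream`, and `run_with_retries` are hypothetical stand-ins written for this illustration, not the framework's classes or its retry handler.

```python
import asyncio
from typing import Optional


class APIConnectionError(Exception):
    """Hypothetical stand-in for the framework's retryable error."""


async def open_stream() -> str:
    # Simulated transient failure on the only real network attempt.
    raise ConnectionError("simulated TLS failure")


class StoredCoroutineStream:
    """Mimics the buggy shape: the coroutine is created once and stored."""

    def __init__(self) -> None:
        self._awaitable_stream = open_stream()
        self._stream: Optional[str] = None

    async def run(self) -> None:
        try:
            if not self._stream:
                self._stream = await self._awaitable_stream
        except Exception as e:
            raise APIConnectionError("Connection error.") from e


async def run_with_retries(stream: StoredCoroutineStream, attempts: int = 4) -> list[str]:
    # Record the direct cause of each failed attempt.
    causes: list[str] = []
    for _ in range(attempts):
        try:
            await stream.run()
            break
        except APIConnectionError as e:
            causes.append(type(e.__cause__).__name__)
    return causes


causes = asyncio.run(run_with_retries(StoredCoroutineStream()))
print(causes)
```

Only the first attempt carries the real cause; attempts 2 to 4 all report `RuntimeError`, which is what the final error exposes.
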

**Observed log (real failure, 2026-05-21)**

```
18:56:11.720 INFO  __mp_main__  AGENT_STATE_CHANGED old=listening new=thinking
... 39 seconds elapse ...
18:56:50.836 ERROR livekit.agents  Error in _llm_inference_task
  Traceback (most recent call last):
    File ".../livekit/plugins/anthropic/llm.py", line 284, in _run
      self._anthropic_stream = await self._awaitable_anthropic_stream
  RuntimeError: cannot reuse already awaited coroutine

  The above exception was the direct cause of the following exception:
  ...
  livekit.agents._exceptions.APIConnectionError: Connection error.

  The above exception was the direct cause of the following exception:
  ...
  livekit.agents._exceptions.APIConnectionError: failed to generate LLM completion after 4 attempts
18:56:50.909 INFO  __mp_main__  AGENT_STATE_CHANGED old=thinking new=listening
```

The agent never spoke. The user, seeing 39 seconds of silence on session start, hung up and reconnected — the second session (fresh LLMStream instance) worked fine. The bug is invisible unless the agent operator instruments around it; we only diagnosed it by reading the SDK source after the fact.

**Suggested fix**
Treat the awaitable as a factory, not a stored coroutine, so each retry gets a fresh awaitable. One workable shape:

```python
from collections.abc import Awaitable, Callable

def __init__(self, ..., anthropic_stream_factory: Callable[[], Awaitable[...]], ...) -> None:
    super().__init__(...)
    self._anthropic_stream_factory = anthropic_stream_factory
    self._anthropic_stream: ... | None = None

async def _run(self) -> None:
    retryable = True
    try:
        if not self._anthropic_stream:
            self._anthropic_stream = await self._anthropic_stream_factory()  # fresh per call
        async with self._anthropic_stream as stream:
            ...
    # ... existing except clauses unchanged ...
```

This would be plumbed back through `LLM.chat()` so that the caller passes a thunk that re-invokes `client.messages.stream(...)` on each attempt. Behavior on the happy path is unchanged; failures become safe to retry.
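
To illustrate why the factory shape survives a retry, here is a self-contained sketch. All names in it (`FlakyOpener`, `FactoryStream`, `run_with_retries`) are hypothetical and exist only for this example; the opener fails once and then succeeds, which is an assumed failure pattern rather than observed SDK behavior.

```python
import asyncio
from collections.abc import Awaitable, Callable


class FlakyOpener:
    """Hypothetical stream opener: fails on the first call, succeeds afterwards."""

    def __init__(self) -> None:
        self.calls = 0

    async def __call__(self) -> str:
        self.calls += 1
        if self.calls == 1:
            raise ConnectionError("simulated transient failure")
        return "stream"


class FactoryStream:
    """Stores a factory, so every attempt awaits a brand-new coroutine."""

    def __init__(self, stream_factory: Callable[[], Awaitable[str]]) -> None:
        self._stream_factory = stream_factory
        self._stream = None

    async def run(self) -> str:
        if self._stream is None:
            self._stream = await self._stream_factory()  # fresh awaitable per call
        return self._stream


async def run_with_retries(stream: FactoryStream, attempts: int = 4) -> tuple[int, str]:
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return attempt, await stream.run()
        except ConnectionError as e:
            last_error = e  # keep the real cause for the final error
    raise RuntimeError("all attempts failed") from last_error


opener = FlakyOpener()
attempt, result = asyncio.run(run_with_retries(FactoryStream(opener)))
print(attempt, result)
```

With the factory, the second attempt succeeds, and if every attempt failed the final error would still chain to a genuine network exception.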

Alternative (smaller diff, but more fragile): catch the RuntimeError in _run, log a diagnostic, and let the framework propagate it as a non-retryable error rather than wrapping it as retryable — at least the session would fail fast instead of burning 39 seconds on doomed retries.
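
A rough sketch of that fail-fast alternative follows. It is an assumption-laden illustration: `APIConnectionError` here is a local stand-in with a `retryable` flag, `FailFastStream` and `count_attempts` are hypothetical, and matching on the exception message is exactly the kind of fragility mentioned above.

```python
import asyncio


class APIConnectionError(Exception):
    """Local stand-in for the framework error, carrying a retryable flag."""

    def __init__(self, message: str, *, retryable: bool) -> None:
        super().__init__(message)
        self.retryable = retryable


async def open_stream() -> str:
    raise ConnectionError("simulated transient failure")


class FailFastStream:
    def __init__(self) -> None:
        self._awaitable_stream = open_stream()  # still stored once, as in the buggy shape

    async def run(self) -> str:
        try:
            return await self._awaitable_stream
        except RuntimeError as e:
            if "cannot reuse already awaited coroutine" in str(e):
                # Retrying cannot help, so mark the error as non-retryable.
                raise APIConnectionError("stream cannot be retried", retryable=False) from e
            raise APIConnectionError("Connection error.", retryable=True) from e
        except Exception as e:
            raise APIConnectionError("Connection error.", retryable=True) from e


async def count_attempts(stream: FailFastStream, max_attempts: int = 4) -> int:
    # Simplified retry loop: stop as soon as an error is not retryable.
    for attempt in range(1, max_attempts + 1):
        try:
            await stream.run()
            return attempt
        except APIConnectionError as e:
            if not e.retryable:
                return attempt
    return max_attempts


attempts_used = asyncio.run(count_attempts(FailFastStream()))
print(attempts_used)
```

The loop stops at the second attempt instead of exhausting all four, but the session still fails, so this only shortens the silence.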

**Workaround we're using locally**
A monkey-patch around `LLMStream._run` that walks the `__cause__` chain and logs the deepest root exception of every failed attempt before the SDK swallows it under `APIConnectionError`. This gives us the original cause, so we can tell whether the trigger is local (firewall, AV, WSL networking) or remote (Anthropic edge transient). It does not fix the bug: the session is still unrecoverable in place, and we restart it manually.
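
The chain-walking part of that workaround can be sketched as follows. This is not our actual patch; `root_cause` is a hypothetical helper, and the nested `raise ... from ...` merely builds an example chain to walk.

```python
def root_cause(exc: BaseException) -> BaseException:
    """Follow __cause__ links to the deepest exception, guarding against cycles."""
    seen = {id(exc)}
    while exc.__cause__ is not None and id(exc.__cause__) not in seen:
        exc = exc.__cause__
        seen.add(id(exc))
    return exc


# Build a two-level chain similar in shape to what a failed attempt produces.
try:
    try:
        raise ConnectionError("simulated TLS failure")
    except ConnectionError as inner:
        raise ValueError("wrapped once") from inner
except ValueError as outer:
    root = root_cause(outer)

print(type(root).__name__, root)
```

Logging `root_cause(e)` on every failed attempt, rather than only on the final one, is what preserves the first attempt's network error.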

**Impact**
Any LiveKit Agents deployment using the Anthropic plugin on a network with a non-trivial rate of transient TLS / connection failures (corporate VPN, AV with HTTPS inspection, mobile users, satellite, etc.) can expect roughly P(first attempt fails) × (sessions per day) unrecoverable sessions per day. For our use case this is a high-visibility, user-facing failure mode (silence on the session greeting) that we can't engineer around without patching the SDK.

Happy to PR the factory-based fix if the maintainers prefer that shape.

### Expected Behavior

When the first `await self._awaitable_anthropic_stream` raises a transient error (network blip, TLS handshake hiccup, Anthropic 5xx, etc.):

1. The framework's retry handler calls `_run()` again.
2. Each retry obtains a fresh awaitable by re-invoking the Anthropic stream factory and awaiting that, rather than re-awaiting the same exhausted coroutine.
3. If the transient condition has cleared, the retry succeeds and the session greeting plays normally.
4. If the underlying condition persists, the final `APIConnectionError` carries the original cause (the `SSLError`, `ConnectError`, `ReadTimeout`, etc.) in its `__cause__` chain so operators can diagnose the root issue.

In short: one transient failure on attempt 1 should not poison every subsequent retry for the lifetime of the `LLMStream` instance.

### Reproduction Steps

_No response_

### Operating System

Windows 11 Enterprise on MSFT Surface 7 ARM-64 with 64GB RAM

### Models Used

Deepgram/Sonnet-4.6/ElevenLabs

### Package Versions

```bash
livekit-agents: 1.5.2
livekit-plugins-anthropic: 1.4.3
Python: 3.12.3
Platform: Ubuntu 24.04 (WSL2)
LLM: claude-sonnet-4-6 via Anthropic API
```

### Session/Room/Call IDs

_No response_

### Proposed Solution

See **Suggested fix** in the Bug Description above.

### Additional Context

_No response_

### Screenshots and Recordings

_No response_
