
feat(sessions): two-phase streaming timeout for LLM calls #460

Status: Merged. Aaronontheweb merged 3 commits into dev from feature/two-phase-streaming-timeout on Mar 27, 2026.


@Aaronontheweb (Collaborator)

Summary

Closes #447.

Replaces the single hard wall-clock TurnLlmTimeout (3 min default) with a
two-phase timeout that distinguishes "waiting for prefill" from "detecting a
dead stream":

  • FirstTokenTimeout (default 10 min): generous ceiling for the prefill
    phase, where the model processes large input contexts before generating output
  • StreamIdleTimeout (default 2 min): tight timeout that resets on every
    streaming delta, catching dead connections quickly once tokens start flowing
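To make the two phases concrete, here is a minimal Python sketch (illustrative only; the project itself builds with dotnet) that replays a timeline of streaming deltas and reports when a two-phase watchdog would fire. The function name `watchdog_fire_time` and its parameters are hypothetical, not names from this PR; the defaults assume the 600 s / 120 s values above. Under the old single 180 s wall-clock timeout, the first example would have been killed at t=180 even though the provider was healthy and only slow to prefill.

```python
from typing import Optional, Sequence


def watchdog_fire_time(
    start: float,
    delta_times: Sequence[float],
    end: Optional[float],
    first_token_timeout: float = 600.0,
    stream_idle_timeout: float = 120.0,
) -> Optional[float]:
    """Hypothetical helper: when would a two-phase watchdog fire?

    start: time the LLM request was sent (seconds).
    delta_times: ascending timestamps of streaming deltas.
    end: time the stream completed normally, or None if it never did.
    Returns the firing time, or None if the stream finished in time.
    """
    # Phase 1: before any delta, only the generous prefill ceiling applies.
    deadline = start + first_token_timeout
    for t in delta_times:
        if t > deadline:
            # This delta arrived after the watchdog had already fired.
            return deadline
        # Phase 2: every delta re-arms the watchdog with the tight idle timeout.
        deadline = t + stream_idle_timeout
    if end is not None and end <= deadline:
        return None  # stream completed before the current deadline
    return deadline


# Slow prefill (9 min), then steady tokens until completion: no timeout.
assert watchdog_fire_time(0, [540, 545, 550], end=555) is None
# No deltas ever arrive: fires at the 10-minute first-token ceiling.
assert watchdog_fire_time(0, [], end=None) == 600
# Stream stalls after the delta at t=30: fires 2 minutes later.
assert watchdog_fire_time(0, [10, 20, 30], end=None) == 150
```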

Also reverts the retry-with-backoff mechanism (TimeoutRetryHandler) which was
solving the wrong problem — retrying against a slow-but-alive provider just
wastes inference time and API cost.

Industry precedent: Anthropic SDK #867, Claude Code #6781, #18028.

Changes

  • Watchdog starts with FirstTokenTimeout, switches to StreamIdleTimeout
    on first LlmResponseDeltaReceived
  • _firstDeltaReceived flag tracks which phase we're in
  • Error messages distinguish "provider did not respond" from "stream stopped
    unexpectedly"
  • ProcessingWatchdogExpired handler uses ExtractLlmErrorMessage for
    contextual error messages instead of a hardcoded string
  • Backward compat: existing TurnLlmTimeoutSeconds config is respected as a
    fallback when the new properties are not set
  • Removed: TimeoutRetryHandler, RetryLlmCallAfterBackoff,
    HandleTimeoutRetryOrFail, retry config properties
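The phase switch and the two distinct error messages can be sketched as a small stateful object. This is a hypothetical Python illustration, not the PR's code: `TwoPhaseWatchdog`, `on_delta`, `check`, and the message texts are invented for the example, and time is passed in explicitly instead of being driven by scheduled timers. `first_delta_received` plays the role of the `_firstDeltaReceived` flag, and `on_delta` stands in for handling `LlmResponseDeltaReceived`.

```python
from typing import Optional


class TwoPhaseWatchdog:
    """Hypothetical sketch of a watchdog that re-arms on each streaming delta.

    Time is passed in explicitly (seconds) so the logic is deterministic;
    a real implementation would schedule timer messages instead of polling.
    """

    def __init__(self, now: float, first_token_timeout: float = 600.0,
                 stream_idle_timeout: float = 120.0) -> None:
        self.first_token_timeout = first_token_timeout
        self.stream_idle_timeout = stream_idle_timeout
        self.first_delta_received = False  # analogous to _firstDeltaReceived
        # Phase 1: arm with the generous prefill ceiling.
        self.deadline = now + first_token_timeout

    def on_delta(self, now: float) -> None:
        """Handle one streaming delta (stand-in for LlmResponseDeltaReceived)."""
        # The first delta switches phases; every delta re-arms the idle timer.
        self.first_delta_received = True
        self.deadline = now + self.stream_idle_timeout

    def check(self, now: float) -> Optional[str]:
        """Return an error message if the watchdog has expired, else None."""
        if now < self.deadline:
            return None
        if not self.first_delta_received:
            # Expired in phase 1: the provider never produced a token.
            return (f"LLM provider did not respond within "
                    f"{self.first_token_timeout:.0f}s")
        # Expired in phase 2: tokens were flowing, then stopped.
        return (f"LLM stream stopped unexpectedly: no data for "
                f"{self.stream_idle_timeout:.0f}s")


wd = TwoPhaseWatchdog(now=0)
assert wd.check(300) is None   # 5 minutes of prefill is still fine
wd.on_delta(590)               # first token just before the 10-minute ceiling
assert wd.check(700) is None   # only 110 s since the last delta
assert wd.check(710).startswith("LLM stream stopped")
```

Re-arming on every delta, rather than only on the first one, is what keeps the idle timeout tight without capping the total generation time of a long but healthy response.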

Test plan

  • dotnet build — clean (0 warnings, 0 errors)
  • dotnet test — 1289 tests pass (0 failures)
  • dotnet slopwatch analyze — no new violations
  • LlmSessionTwoPhaseTimeoutTests — 3 new integration tests:
    • First_token_timeout_fires_when_no_deltas_arrive
    • Stream_idle_timeout_fires_when_stream_stalls_after_deltas
    • Successful_stream_completes_without_timeout
  • Existing watchdog and error correlation tests updated and passing
  • Self-review caught and fixed inverted backward-compat fallback logic

…eout (#447)

The single TurnLlmTimeout (3 min default) conflated two concerns: waiting
for prefill on large contexts and detecting dead streams. This caused
legitimate slow requests to be killed prematurely, and the retry mechanism
just resubmitted the same expensive call.

Replace with two-phase timeout:
- FirstTokenTimeout (default 10 min): generous ceiling for prefill phase
- StreamIdleTimeout (default 2 min): tight timeout that resets on every
  streaming delta, catching dead connections quickly

The watchdog starts with FirstTokenTimeout and switches to StreamIdleTimeout
on the first LlmResponseDeltaReceived. Error messages now distinguish
"provider did not respond" from "stream stopped unexpectedly".

Also reverts the retry-with-backoff mechanism (TimeoutRetryHandler) which
was solving the wrong problem — retrying against a slow-but-alive provider
just wastes inference time and API cost.

Backward compat: existing TurnLlmTimeoutSeconds config is respected as
fallback when new properties are not set.

The fallback logic was inverted: when an operator had TurnLlmTimeoutSeconds
set to a value higher than the new defaults (e.g., 900s for slow providers),
FirstTokenTimeout would silently fall back to 600s instead of respecting the
operator's configured value.

Now: if TurnLlmTimeoutSeconds is customized (i.e., set to anything other than
the default 180s), use it for both phases. Otherwise use the new generous
defaults (600s/120s).
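The corrected fallback rule can be expressed as a pure function. This is a Python sketch assuming hypothetical names (`resolve_timeouts` and its parameters) and integer-second settings; it also assumes, as the summary states, that explicitly set new properties take precedence over the legacy value. With the inverted logic described above, a legacy value of 900 would have produced a 600 s first-token timeout; the rule below yields 900 s for both phases.

```python
from typing import Optional, Tuple

LEGACY_DEFAULT_SECONDS = 180       # old TurnLlmTimeoutSeconds default (3 min)
DEFAULT_FIRST_TOKEN_SECONDS = 600  # new FirstTokenTimeout default (10 min)
DEFAULT_STREAM_IDLE_SECONDS = 120  # new StreamIdleTimeout default (2 min)


def resolve_timeouts(
    turn_llm_timeout_seconds: int = LEGACY_DEFAULT_SECONDS,
    first_token_seconds: Optional[int] = None,
    stream_idle_seconds: Optional[int] = None,
) -> Tuple[int, int]:
    """Hypothetical helper returning (first_token, stream_idle) in seconds."""
    # A legacy value counts as "customized" only if it differs from 180.
    legacy_customized = turn_llm_timeout_seconds != LEGACY_DEFAULT_SECONDS
    # Explicitly set new properties always win; otherwise fall back.
    if first_token_seconds is None:
        first_token_seconds = (turn_llm_timeout_seconds if legacy_customized
                               else DEFAULT_FIRST_TOKEN_SECONDS)
    if stream_idle_seconds is None:
        stream_idle_seconds = (turn_llm_timeout_seconds if legacy_customized
                               else DEFAULT_STREAM_IDLE_SECONDS)
    return first_token_seconds, stream_idle_seconds


assert resolve_timeouts() == (600, 120)     # untouched config: new defaults
assert resolve_timeouts(900) == (900, 900)  # customized legacy value: both phases
assert resolve_timeouts(900, first_token_seconds=1200) == (1200, 900)
```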
@Aaronontheweb merged commit 7fd7593 into dev on Mar 27, 2026
3 checks passed
@Aaronontheweb deleted the feature/two-phase-streaming-timeout branch on March 27, 2026 at 03:21


Linked issue: feat(sessions): retry LLM calls on provider timeout instead of failing the turn
