feat(sessions): two-phase streaming timeout for LLM calls#460
Merged
Conversation
…eout (#447) The single TurnLlmTimeout (3 min default) conflated two concerns: waiting for prefill on large contexts and detecting dead streams. This caused legitimate slow requests to be killed prematurely, and the retry mechanism just resubmitted the same expensive call. Replace with two-phase timeout: - FirstTokenTimeout (default 10 min): generous ceiling for prefill phase - StreamIdleTimeout (default 2 min): tight timeout that resets on every streaming delta, catching dead connections quickly The watchdog starts with FirstTokenTimeout and switches to StreamIdleTimeout on the first LlmResponseDeltaReceived. Error messages now distinguish "provider did not respond" from "stream stopped unexpectedly". Also reverts the retry-with-backoff mechanism (TimeoutRetryHandler) which was solving the wrong problem — retrying against a slow-but-alive provider just wastes inference time and API cost. Backward compat: existing TurnLlmTimeoutSeconds config is respected as fallback when new properties are not set.
The fallback logic was inverted — when an operator had TurnLlmTimeoutSeconds set to a value higher than the new defaults (e.g., 900s for slow providers), FirstTokenTimeout would silently fall back to 600s instead of respecting the operator's configured value. Now: if TurnLlmTimeoutSeconds is customized (non-default 180), use it for both phases. Otherwise use the new generous defaults (600s/120s).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #447.
Replaces the single hard wall-clock
TurnLlmTimeout(3 min default) with atwo-phase timeout that distinguishes "waiting for prefill" from "detecting a
dead stream":
phase where the model processes large input contexts before generating output
streaming delta, catching dead connections quickly once tokens start flowing
Also reverts the retry-with-backoff mechanism (
TimeoutRetryHandler) which wassolving the wrong problem — retrying against a slow-but-alive provider just
wastes inference time and API cost.
Industry precedent: Anthropic SDK
#867,
Claude Code #6781,
#18028.
Changes
FirstTokenTimeout, switches toStreamIdleTimeouton first
LlmResponseDeltaReceived_firstDeltaReceivedflag tracks which phase we're inunexpectedly"
ProcessingWatchdogExpiredhandler usesExtractLlmErrorMessageforcontextual error messages instead of hardcoded string
TurnLlmTimeoutSecondsconfig respected asfallback when new properties not set
TimeoutRetryHandler,RetryLlmCallAfterBackoff,HandleTimeoutRetryOrFail, retry config propertiesTest plan
dotnet build— clean (0 warnings, 0 errors)dotnet test— 1289 tests pass (0 failures)dotnet slopwatch analyze— no new violationsLlmSessionTwoPhaseTimeoutTests— 3 new integration tests:First_token_timeout_fires_when_no_deltas_arriveStream_idle_timeout_fires_when_stream_stalls_after_deltasSuccessful_stream_completes_without_timeout