feat(sessions): two-phase streaming timeout for LLM calls by Aaronontheweb · Pull Request #460 · netclaw-dev/netclaw

Aaronontheweb · 2026-03-27T03:14:13Z

Summary

Closes #447.

Replaces the single hard wall-clock TurnLlmTimeout (3 min default) with a
two-phase timeout that distinguishes "waiting for prefill" from "detecting a
dead stream":

- FirstTokenTimeout (default 10 min): generous ceiling for the prefill
  phase, where the model processes large input contexts before generating output
- StreamIdleTimeout (default 2 min): tight timeout that resets on every
  streaming delta, catching dead connections quickly once tokens start flowing
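The phase switch can be pictured as a deadline calculation. The following is a minimal Python sketch, not the NetClaw code (which is .NET). The class, its methods, and the caller-supplied float timestamps are assumptions, chosen so the logic can be exercised without real timers.

```python
from typing import Optional


class TwoPhaseWatchdog:
    """Hypothetical two-phase watchdog for a single LLM call.

    Times are caller-supplied seconds (e.g. from a monotonic clock), so the
    logic can be exercised without real sleeps.
    """

    def __init__(self, first_token_timeout: float = 600.0,
                 stream_idle_timeout: float = 120.0) -> None:
        self.first_token_timeout = first_token_timeout  # prefill ceiling (10 min)
        self.stream_idle_timeout = stream_idle_timeout  # max gap between deltas (2 min)
        self._started_at = 0.0
        self._last_delta_at: Optional[float] = None

    def start(self, now: float) -> None:
        # Phase 1: the call was sent; only the long prefill ceiling applies.
        self._started_at = now
        self._last_delta_at = None

    def on_delta(self, now: float) -> None:
        # Every delta resets the idle clock; the first one also moves us to phase 2.
        self._last_delta_at = now

    @property
    def first_delta_received(self) -> bool:
        return self._last_delta_at is not None

    def deadline(self) -> float:
        # Phase 1 measures from the start of the call, phase 2 from the last delta.
        if self._last_delta_at is None:
            return self._started_at + self.first_token_timeout
        return self._last_delta_at + self.stream_idle_timeout

    def expired(self, now: float) -> bool:
        return now >= self.deadline()


wd = TwoPhaseWatchdog()
wd.start(now=0.0)
print(wd.expired(now=599.0))  # False: still inside the prefill ceiling
wd.on_delta(now=599.0)        # first token arrives just in time
print(wd.expired(now=700.0))  # False: only 101 s since the last delta
print(wd.expired(now=719.0))  # True: 120 s of silence after the last delta
```

Computing a deadline instead of running two separate timers keeps the "which phase are we in" decision in one place: the presence of a last-delta timestamp.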

Also reverts the retry-with-backoff mechanism (TimeoutRetryHandler), which was
solving the wrong problem: retrying against a slow-but-alive provider just
wastes inference time and API cost.

Industry precedent: Anthropic SDK #867, Claude Code #6781, #18028.

Changes

- Watchdog starts with FirstTokenTimeout and switches to StreamIdleTimeout
  on the first LlmResponseDeltaReceived
- A _firstDeltaReceived flag tracks which phase we're in
- Error messages distinguish "provider did not respond" from "stream stopped
  unexpectedly"
- The ProcessingWatchdogExpired handler uses ExtractLlmErrorMessage for
  contextual error messages instead of a hardcoded string
- Backward compat: existing TurnLlmTimeoutSeconds config is respected as a
  fallback when the new properties are not set
- Removed: TimeoutRetryHandler, RetryLlmCallAfterBackoff,
  HandleTimeoutRetryOrFail, retry config properties
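The phase flag is what lets the expiry path choose the right message. Here is a hypothetical Python sketch of that branch. The function name and exact wording are invented around the two phrases quoted in the list, and it does not attempt to reproduce ExtractLlmErrorMessage.

```python
def watchdog_expired_message(first_delta_received: bool,
                             first_token_timeout_s: float,
                             stream_idle_timeout_s: float) -> str:
    """Pick an error message based on which timeout phase expired.

    `first_delta_received` plays the role of the _firstDeltaReceived flag;
    the exact wording here is illustrative only.
    """
    if not first_delta_received:
        # Phase 1 expired: nothing came back during prefill.
        return (f"LLM provider did not respond within "
                f"{first_token_timeout_s:g}s")
    # Phase 2 expired: tokens were flowing, then the stream went silent.
    return (f"LLM stream stopped unexpectedly "
            f"(no data for {stream_idle_timeout_s:g}s)")


print(watchdog_expired_message(False, 600, 120))
# LLM provider did not respond within 600s
print(watchdog_expired_message(True, 600, 120))
# LLM stream stopped unexpectedly (no data for 120s)
```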

Test plan

- dotnet build: clean (0 warnings, 0 errors)
- dotnet test: 1289 tests pass (0 failures)
- dotnet slopwatch analyze: no new violations
- LlmSessionTwoPhaseTimeoutTests: 3 new integration tests
  - First_token_timeout_fires_when_no_deltas_arrive
  - Stream_idle_timeout_fires_when_stream_stalls_after_deltas
  - Successful_stream_completes_without_timeout
- Existing watchdog and error correlation tests updated and passing
- Self-review caught and fixed inverted backward-compat fallback logic
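The three new integration tests exercise the same two-phase state machine. As an illustration only (this is not the test code), the scenarios can be replayed as a pure function over delta timestamps. The function name, the string outcomes, and the assumption that the call starts at t=0 are all hypothetical.

```python
from typing import Optional, Sequence


def turn_outcome(delta_times: Sequence[float],
                 completed_at: Optional[float],
                 first_token_timeout: float = 600.0,
                 stream_idle_timeout: float = 120.0) -> str:
    """Replay one LLM call that starts at t=0 and report how it ends.

    delta_times:  ascending timestamps (seconds) of streaming deltas.
    completed_at: when the stream finished, or None if it never finished.
    """
    deadline = first_token_timeout          # phase 1: waiting for prefill
    outcome_if_expired = "first_token_timeout"
    for t in delta_times:
        if t >= deadline:                   # the watchdog fired before this delta
            return outcome_if_expired
        deadline = t + stream_idle_timeout  # phase 2: every delta resets the timer
        outcome_if_expired = "stream_idle_timeout"
    if completed_at is not None and completed_at < deadline:
        return "completed"
    return outcome_if_expired


# One call per scenario named in the test plan.
print(turn_outcome([], None))           # first_token_timeout
print(turn_outcome([5, 10, 20], None))  # stream_idle_timeout
print(turn_outcome([5, 10, 20], 30))    # completed
```

Under the defaults, a 300-second prefill followed by a normal stream, `turn_outcome([300, 301], 305)`, returns `"completed"`. The old 3-minute wall-clock ceiling would have killed the same call.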

…eout (#447)

The single TurnLlmTimeout (3 min default) conflated two concerns: waiting for prefill on large contexts and detecting dead streams. This caused legitimate slow requests to be killed prematurely, and the retry mechanism just resubmitted the same expensive call.

Replace with two-phase timeout:
- FirstTokenTimeout (default 10 min): generous ceiling for prefill phase
- StreamIdleTimeout (default 2 min): tight timeout that resets on every streaming delta, catching dead connections quickly

The watchdog starts with FirstTokenTimeout and switches to StreamIdleTimeout on the first LlmResponseDeltaReceived. Error messages now distinguish "provider did not respond" from "stream stopped unexpectedly".

Also reverts the retry-with-backoff mechanism (TimeoutRetryHandler), which was solving the wrong problem: retrying against a slow-but-alive provider just wastes inference time and API cost.

Backward compat: existing TurnLlmTimeoutSeconds config is respected as fallback when new properties are not set.

The fallback logic was inverted: when an operator had TurnLlmTimeoutSeconds set higher than the new defaults (e.g., 900s for slow providers), FirstTokenTimeout would silently fall back to 600s instead of respecting the operator's configured value. Now, if TurnLlmTimeoutSeconds is customized (i.e., anything other than the 180s default), it is used for both phases; otherwise the new generous defaults (600s/120s) apply.
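The corrected fallback rule fits in a few lines. This Python sketch assumes the legacy value arrives as an optional integer number of seconds and that the new properties are optional overrides. The parameter, constant, and type names are hypothetical, not NetClaw's actual config keys.

```python
from dataclasses import dataclass
from typing import Optional

LEGACY_DEFAULT_S = 180       # old TurnLlmTimeout default (3 min)
FIRST_TOKEN_DEFAULT_S = 600  # new FirstTokenTimeout default (10 min)
STREAM_IDLE_DEFAULT_S = 120  # new StreamIdleTimeout default (2 min)


@dataclass(frozen=True)
class ResolvedTimeouts:
    first_token_s: int
    stream_idle_s: int


def resolve_timeouts(turn_llm_timeout_s: Optional[int],
                     first_token_s: Optional[int] = None,
                     stream_idle_s: Optional[int] = None) -> ResolvedTimeouts:
    """Hypothetical resolution of the two phase timeouts from config.

    Explicitly set new properties always win. Otherwise a customized legacy
    value (anything but 180) applies to both phases; an unset or default
    legacy value falls through to the new defaults.
    """
    legacy_customized = (turn_llm_timeout_s is not None
                         and turn_llm_timeout_s != LEGACY_DEFAULT_S)
    if legacy_customized:
        fallback_first = fallback_idle = turn_llm_timeout_s
    else:
        fallback_first, fallback_idle = FIRST_TOKEN_DEFAULT_S, STREAM_IDLE_DEFAULT_S
    return ResolvedTimeouts(
        first_token_s=first_token_s if first_token_s is not None else fallback_first,
        stream_idle_s=stream_idle_s if stream_idle_s is not None else fallback_idle,
    )


print(resolve_timeouts(None))  # ResolvedTimeouts(first_token_s=600, stream_idle_s=120)
print(resolve_timeouts(900))   # ResolvedTimeouts(first_token_s=900, stream_idle_s=900)
```

Applying a customized legacy value to both phases has a side effect under this reading: a value below the old default (say 60) also lowers the prefill ceiling to 60 seconds.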

Aaronontheweb added 3 commits March 27, 2026 02:57

Merge branch 'dev' into feature/two-phase-streaming-timeout (9c01646)

Aaronontheweb merged commit 7fd7593 into dev Mar 27, 2026
3 checks passed

Aaronontheweb deleted the feature/two-phase-streaming-timeout branch March 27, 2026 03:21

Aaronontheweb mentioned this pull request Mar 27, 2026

fix(sessions): passivation grace period (5s) too short for memory distillation on slow providers #459 (closed)
