cllama: harden upstream-failure handling — full-timeout (600s) hangs, no auto-failover to fallback MODEL on 5xx, and tail latency drifting toward the stream-stall cancel cliff #319

Description

@mostlydev

Summary

cllama's upstream-failure handling is the single biggest reliability risk for a live multi-agent pod. Three related gaps, observed downstream:

  1. Full-timeout hangs. Upstream providers (seen with both xAI/grok and an OpenRouter-hosted model) accept a request and then never complete the stream; cllama waits the entire upstream timeout (~600s) before giving up with a 502. Every such failure is latency_ms ≈ 600000.

  2. No auto-failover to the declared fallback MODEL slot on 5xx. Live behavior is retry-same-model-3×-then-die (API call failed after 3 retries: HTTP 503). When a provider circuit-breaks (e.g. "all declared providers cooling down"), every agent on that model blanks simultaneously with no fallback. (Downstream-tracked separately as tiverton-house#62.)

  3. Tail latency drifts toward the runtime's stream-stall cancel cliff. The consuming runner (Hermes) cancels a turn at ~660s. A live per-model sample:

    model             n    hangs ≥120s  p50   p90  max
    claude-haiku-4-5  51   0            7.7s  47s  109s
    grok-4.3          147  1            3.1s  7s   168s
    minimax-m3        257  20           16s   89s  491s (8.2 min)

    minimax-m3's max of 491s is within ~170s of the 660s cliff; any further degradation would cancel turns mid-stream during market hours. Each of these requests returns 200 right up until it crosses the cliff, so latency, not error rate, is the early-warning signal.
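
To make the headroom arithmetic concrete, here is a minimal sketch that recomputes the distance to the cancel cliff and the hang rate from the sample above. The numbers are copied from the table; `WARN_FRACTION` and the function names are hypothetical and not part of cllama or Hermes.

```python
# Per-model sample from the table above:
# (n, hangs >= 120s, p50, p90, max), all latencies in seconds.
SAMPLE = {
    "claude-haiku-4-5": (51, 0, 7.7, 47.0, 109.0),
    "grok-4.3": (147, 1, 3.1, 7.0, 168.0),
    "minimax-m3": (257, 20, 16.0, 89.0, 491.0),
}

CANCEL_CLIFF_S = 660.0  # approximate Hermes stream-stall cancel point (from the issue)
WARN_FRACTION = 0.5     # hypothetical: warn once max latency exceeds half the cliff


def headroom(max_latency_s, cliff_s=CANCEL_CLIFF_S):
    """Seconds left between the worst observed latency and the cancel cliff."""
    return cliff_s - max_latency_s


def at_risk(sample, cliff_s=CANCEL_CLIFF_S, warn_fraction=WARN_FRACTION):
    """Models whose max latency is at or past the warning threshold."""
    threshold = warn_fraction * cliff_s
    return sorted(model for model, row in sample.items() if row[4] >= threshold)


if __name__ == "__main__":
    for model, (n, hangs, p50, p90, max_s) in SAMPLE.items():
        print(f"{model}: headroom {headroom(max_s):.0f}s, hang rate {hangs / n:.1%}")
    print("at risk:", at_risk(SAMPLE))
```

With a half-cliff threshold (330s), only minimax-m3 is flagged, even though all three models are still returning 200.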

Suggested directions

  • Shorten the cllama→upstream timeout (e.g. 600s → ~120s) so a hung upstream fails fast instead of consuming a full turn.
  • Implement auto-failover to the declared fallback MODEL slot on 5xx / repeated stream errors, rather than retry-same-model-then-die.
  • Treat missing choices / mid-stream INTERNAL_ERROR as retryable.
  • Optionally expose per-model latency percentiles in claw audit so operators can see the cliff approaching.
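
The first three directions could fit together roughly as sketched below. This is an illustration only, not cllama's actual code: `UpstreamError`, `is_retryable`, `validate`, `complete_with_failover`, the error-code strings, and the `call(model, timeout_s)` callback are all hypothetical names, and the 120s default simply mirrors the suggested timeout.

```python
from typing import Callable, Optional, Sequence, Tuple


class UpstreamError(Exception):
    """Hypothetical error carrying an HTTP status and/or an upstream error code."""

    def __init__(self, message: str, status: Optional[int] = None,
                 code: Optional[str] = None):
        super().__init__(message)
        self.status = status
        self.code = code


# Assumed set of non-HTTP failures worth failing over on.
RETRYABLE_CODES = {"INTERNAL_ERROR", "MISSING_CHOICES", "TIMEOUT"}


def is_retryable(err: UpstreamError) -> bool:
    """5xx responses and known stream-level failures are retryable; 4xx is not."""
    if err.status is not None and 500 <= err.status <= 599:
        return True
    return err.code in RETRYABLE_CODES


def validate(response: dict) -> dict:
    """Turn a 200 response with no choices into a retryable error."""
    if not response.get("choices"):
        raise UpstreamError("response has no choices", code="MISSING_CHOICES")
    return response


def complete_with_failover(
    call: Callable[[str, float], dict],
    models: Sequence[str],
    timeout_s: float = 120.0,
    attempts_per_model: int = 1,
) -> Tuple[str, dict]:
    """Try each model in order; move to the next slot on a retryable failure."""
    last_err: Optional[UpstreamError] = None
    for model in models:
        for _ in range(attempts_per_model):
            try:
                return model, validate(call(model, timeout_s))
            except UpstreamError as err:
                if not is_retryable(err):
                    raise  # e.g. a 400: falling back would not help
                last_err = err
    raise UpstreamError(f"all models failed: {last_err}", status=502)


def fake_call(model: str, timeout_s: float) -> dict:
    """Stand-in upstream: the primary slot always returns 503."""
    if model == "primary-model":
        raise UpstreamError("service unavailable", status=503)
    return {"choices": [{"message": {"content": "ok"}}]}


if __name__ == "__main__":
    used, _ = complete_with_failover(fake_call, ["primary-model", "fallback-model"])
    print(used)
```

The design choice here is to spend the retry budget across slots instead of on the same model, so a provider that is circuit-broken costs one failed attempt before the fallback takes over.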

Downstream references

mostlydev/tiverton-house#59 (full incident history across both desk models) and mostlydev/tiverton-house#62 (the fallback-slot gap specifically).
