You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
name: split-continuation-failure-backoff
description: Continuation retries use a 1s fixed delay; failure retries use exponential backoff capped at max_retry_backoff_ms.
status: backlog
estimated_complexity: low
blast_radius: contained
PRD: split-continuation-failure-backoff
Executive Summary
Adopt Symphony's two-tier retry policy. Clean exits that re-check whether the issue is still active use a short fixed 1s delay. Failure-driven retries (timeout, crash, agent error) use exponential backoff (10s × 2^(attempt-1)) capped at `max_retry_backoff_ms` (default 5min). Aligns retry pacing with the actual reason for the retry.
Problem Statement
Today's retry pacing is uniform regardless of why the previous attempt ended. Continuation retries (issue still active after a clean attempt) wait the same as failure retries, slowing the autonomous loop unnecessarily. Failure retries without exponential backoff can hammer a struggling API.
Goals
Retry scheduler accepts a reason: `continuation` or `failure`.
Continuation: fixed 1000ms.
Failure: `min(10000 × 2^(attempt-1), max_retry_backoff_ms)` with default cap 300000ms.
Existing retry timer for the same thread is cancelled when scheduling a new one.
Non-Goals
Reason-specific reasons beyond continuation vs failure (e.g. distinguishing API error vs stall — could come later if useful).
Per-phase retry budgets.
Jittered backoff (deterministic for now; can add later if thundering-herd appears).
User Stories
As a user running shipcode autonomously, I want continuation re-checks to be near-instant and failure retries to back off, so my queue is responsive but does not hammer external APIs on outage. Acceptance:
Clean exit with issue still active → next attempt fires within ~1s.
Three consecutive failures → 10s, 20s, 40s delays observed.
Cap honored: at high attempt count the delay never exceeds `max_retry_backoff_ms`.
tests: `packages/pipeline/src/issue-group-scheduler.test.ts` — new cases for continuation delay, failure exponential, cap, cancel.
manual: with daemon (Add daemon mode for label dispatch #70) running, fail an attempt and observe the backoff in logs; confirm successful re-attempt afterward.
Risks & Open Questions
Attempt count semantics: does attempt 1 mean first retry or first attempt? Match Symphony spec (attempt 1 = first retry, exponent 0 → 10s).
name: split-continuation-failure-backoff
description: Continuation retries use a 1s fixed delay; failure retries use exponential backoff capped at max_retry_backoff_ms.
status: backlog
estimated_complexity: low
blast_radius: contained
PRD: split-continuation-failure-backoff
Executive Summary
Adopt Symphony's two-tier retry policy. Clean exits that re-check whether the issue is still active use a short fixed 1s delay. Failure-driven retries (timeout, crash, agent error) use exponential backoff (10s × 2^(attempt-1)) capped at `max_retry_backoff_ms` (default 5min). Aligns retry pacing with the actual reason for the retry.
Problem Statement
Today's retry pacing is uniform regardless of why the previous attempt ended. Continuation retries (issue still active after a clean attempt) wait the same as failure retries, slowing the autonomous loop unnecessarily. Failure retries without exponential backoff can hammer a struggling API.
Goals
Non-Goals
User Stories
Acceptance:
Functional Requirements
Non-Functional Requirements
Success Criteria
Out of Scope
Dependencies
Verification Plan
Risks & Open Questions