Context
Launch feedback asked how AgentRouter handles models that get stuck in long-running edit loops. This is a real production concern for agent workflows that edit repos, run commands, and retry tasks.
Problem
Agents can spend time repeating the same failed command, editing the same file without a meaningful diff, producing lots of output without state changes, or cycling through the same intermediate failure. Product teams need the runtime to surface these conditions clearly.
Proposed direction
Add no-progress detection signals to the runtime. The runtime does not need to fully solve every stuck loop automatically, but it should detect and expose suspicious patterns so product callers can cancel, retry, ask for approval, or resume from current sandbox state.
Acceptance criteria
- Track repeated command execution with similar output/failure.
- Track repeated file edits with no meaningful patch or equivalent patch churn.
- Track long output periods without run-record state transitions.
- Emit no-progress or suspected-loop events when thresholds are crossed.
- Surface loop/no-progress signals in the run record and SDK event stream.
- Add tests for repeated command and repeated edit detection.
Notes
The first version can be heuristic. The value is observability and control, not perfect automatic recovery.
Context
Launch feedback asked how AgentRouter handles models that get stuck in long-running edit loops. This is a real production concern for agent workflows that edit repos, run commands, and retry tasks.
Problem
Agents can spend time repeating the same failed command, editing the same file without a meaningful diff, producing lots of output without state changes, or cycling through the same intermediate failure. Product teams need the runtime to surface these conditions clearly.
Proposed direction
Add no-progress detection signals to the runtime. The runtime does not need to fully solve every stuck loop automatically, but it should detect and expose suspicious patterns so product callers can cancel, retry, ask for approval, or resume from current sandbox state.
Acceptance criteria
Notes
The first version can be heuristic. The value is observability and control, not perfect automatic recovery.