Goal
Enable the system to autonomously detect when a subagent LLM connection has stalled (no tokens for N minutes) and give parent agents real-time visibility into what a running subagent is doing. This eliminates the need for human monitoring of subagent LLM interactions and turns invisible indefinite hangs into visible, bounded failures the system can act on.
Background
When an LLM provider stalls mid-stream, the subagent appears "running" but is actually frozen. The for-await loop in processor.ts hangs indefinitely waiting for the next SSE chunk. Abort signals don't interrupt hung network reads. The only safety net is a 30-minute timeout, meaning zombie agents can waste half an hour before anyone notices. check_task returns only "running" with no detail about whether the agent is making progress or stuck.
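One way to bound the hang described above is to race each read of the stream against a timer, so a silent connection turns into a thrown error instead of an indefinite wait. The following is a minimal sketch, not the actual processor.ts code: `withStallTimeout`, `nextOrStall`, `StreamStalledError`, and `resolveStallTimeoutMs` are hypothetical names, and the parsing rules for the environment variable are an assumption. Only the `OPENCODE_STALL_TIMEOUT_MS` name, the 3-minute default, and the "LLM stream stalled" message prefix come from this issue.

```typescript
// Hypothetical sketch; names other than OPENCODE_STALL_TIMEOUT_MS are illustrative.
const DEFAULT_STALL_TIMEOUT_MS = 3 * 60 * 1000 // default of 3 minutes, per Scope

// Assumption: a missing, non-numeric, or non-positive value falls back to the default.
function resolveStallTimeoutMs(env: Record<string, string | undefined>): number {
  const parsed = Number(env["OPENCODE_STALL_TIMEOUT_MS"])
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_STALL_TIMEOUT_MS
}

class StreamStalledError extends Error {
  constructor(timeoutMs: number) {
    super(`LLM stream stalled: no tokens for ${timeoutMs} ms`)
    this.name = "StreamStalledError"
  }
}

// Settles with the next chunk, or rejects if the chunk does not arrive in time.
function nextOrStall<T>(next: Promise<IteratorResult<T>>, timeoutMs: number): Promise<IteratorResult<T>> {
  return new Promise<IteratorResult<T>>((resolve, reject) => {
    const timer = setTimeout(() => reject(new StreamStalledError(timeoutMs)), timeoutMs)
    next.then(
      (result) => {
        clearTimeout(timer)
        resolve(result)
      },
      (error) => {
        clearTimeout(timer)
        reject(error)
      },
    )
  })
}

// Wraps any async iterable so that a gap longer than timeoutMs between chunks throws.
async function* withStallTimeout<T>(source: AsyncIterable<T>, timeoutMs: number): AsyncGenerator<T> {
  const iterator = source[Symbol.asyncIterator]()
  try {
    while (true) {
      const result = await nextOrStall(iterator.next(), timeoutMs)
      if (result.done) return
      yield result.value
    }
  } finally {
    // Best-effort cleanup. Not awaited, because a hung source may never settle.
    iterator.return?.().catch(() => {})
  }
}
```

A consumer would iterate `withStallTimeout(stream, resolveStallTimeoutMs(process.env))` in place of the raw stream. The timer is restarted for every chunk, so it measures idle time between tokens rather than total request duration. Note that the underlying network read is abandoned, not cancelled, which matches the Out of Scope note that stream resumption is not attempted.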
Scope
- LLM stream stall detector in processor.ts (detect "no tokens for N minutes")
- Enhanced check_task response with last tool calls, stall status, and last activity timestamp
- Configuration for stall timeout (default 3 minutes)
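The enhanced check_task response could be assembled from a small activity tracker that the stream loop updates as events arrive. This is a sketch under stated assumptions: the `ActivityTracker` class, the epoch-millisecond timestamp format, the cap on remembered tool calls, and computing `stallDetected` from idle time at read time are all hypothetical. Only the field names `stallDetected`, `lastToolCalls`, and `lastActivity` and the "running" status come from this issue.

```typescript
// Hypothetical shape; only the three new field names are taken from this issue.
interface TaskResult {
  status: "running" // other statuses likely exist but are not described here
  stallDetected: boolean
  lastToolCalls: string[]
  lastActivity: number // assumption: epoch milliseconds
}

class ActivityTracker {
  private lastActivity: number
  private toolCalls: string[] = []
  private maxToolCalls: number

  constructor(maxToolCalls: number = 5, now: number = Date.now()) {
    this.maxToolCalls = maxToolCalls
    this.lastActivity = now
  }

  // Call on every stream event (token, chunk, and so on).
  recordEvent(now: number = Date.now()): void {
    this.lastActivity = now
  }

  // Call when the subagent invokes a tool; keeps only the most recent names.
  recordToolCall(name: string, now: number = Date.now()): void {
    this.toolCalls.push(name)
    if (this.toolCalls.length > this.maxToolCalls) this.toolCalls.shift()
    this.lastActivity = now
  }

  // Builds the response a parent agent would see from check_task.
  snapshot(stallTimeoutMs: number, now: number = Date.now()): TaskResult {
    return {
      status: "running",
      stallDetected: now - this.lastActivity >= stallTimeoutMs,
      lastToolCalls: this.toolCalls.slice(),
      lastActivity: this.lastActivity,
    }
  }
}
```

With this shape, a parent agent polling check_task can distinguish a subagent that is busy (recent `lastActivity`, changing `lastToolCalls`) from one that is frozen (`stallDetected` set), instead of seeing a bare "running" in both cases.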
Out of Scope
- Graceful stream resumption after stall (HTTP/SSE limitation)
- Parent-child abort signal chaining (separate concern)
- Structured progress reporting API (future enhancement)
Child Issues
Acceptance Criteria
Epic is done when both child issues are closed and the system can autonomously detect and surface stalled subagents.
Fork Manifest Requirement
This EPIC modifies the subagent monitoring system introduced by the async-tasks fork feature. Upon completion, the .fork-features/manifest.json entry for async-tasks MUST be updated:
- modifiedFiles: Add packages/opencode/src/session/processor.ts (stall detector lives here)
- criticalCode: Add the following markers so sync-time agents understand what this code does and can verify it survives upstream merges:
  - lastTokenTime: per-session timestamp tracking in processor stream loop
  - OPENCODE_STALL_TIMEOUT_MS: env var for configurable stall timeout
  - LLM stream stalled: error message thrown on stall detection
  - stallDetected: field on TaskResult indicating stall was detected
  - lastToolCalls: field on TaskResult showing recent tool call activity
  - lastActivity: field on TaskResult showing timestamp of last stream event
- absorptionSignals: Add stall.*detector, stream.*stall, lastTokenTime so upstream adoption is detected
Rationale: Even though the modified files are already partially covered in the manifest, the stall detection logic represents a distinct semantic divergence from upstream. Without explicit criticalCode markers and absorption signals, sync-time agents will not understand what this code does or why it exists, and may silently drop it during merge conflict resolution.
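The manifest update can be expressed as an idempotent merge, so re-running it after an upstream sync does not duplicate entries. This is a sketch only: the real schema of .fork-features/manifest.json is not shown in this issue, so the `ForkFeatureEntry` shape (three flat string arrays) and the helper names are assumptions. The values being added are the ones listed above.

```typescript
// Hypothetical entry shape; the actual manifest schema may differ.
interface ForkFeatureEntry {
  modifiedFiles: string[]
  criticalCode: string[]
  absorptionSignals: string[]
}

// Values taken from the Fork Manifest Requirement in this issue.
const STALL_DETECTOR_ADDITIONS: ForkFeatureEntry = {
  modifiedFiles: ["packages/opencode/src/session/processor.ts"],
  criticalCode: [
    "lastTokenTime",
    "OPENCODE_STALL_TIMEOUT_MS",
    "LLM stream stalled",
    "stallDetected",
    "lastToolCalls",
    "lastActivity",
  ],
  absorptionSignals: ["stall.*detector", "stream.*stall", "lastTokenTime"],
}

// Appends only the values that are not already present, preserving order.
function mergeUnique(existing: string[], additions: string[]): string[] {
  const merged = existing.slice()
  for (const value of additions) {
    if (merged.indexOf(value) === -1) merged.push(value)
  }
  return merged
}

// Returns a new entry for async-tasks with the stall detector markers added.
function withStallDetectorMarkers(entry: ForkFeatureEntry): ForkFeatureEntry {
  return {
    modifiedFiles: mergeUnique(entry.modifiedFiles, STALL_DETECTOR_ADDITIONS.modifiedFiles),
    criticalCode: mergeUnique(entry.criticalCode, STALL_DETECTOR_ADDITIONS.criticalCode),
    absorptionSignals: mergeUnique(entry.absorptionSignals, STALL_DETECTOR_ADDITIONS.absorptionSignals),
  }
}
```

Because the merge skips values that already exist, applying it to an entry that partially covers these files leaves the existing markers untouched and adds only the missing ones.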