Sessions hang indefinitely when Task tool spawns subagents via REST API (opencode serve) #6573
Description
I'm using OpenCode as a backend service via opencode serve to power a Telegram bot. The setup is:
- OpenCode runs as a systemd service using opencode serve --port 4096 --hostname 0.0.0.0
- A separate Node.js service (running on AWS ECS/EC2) uses the @opencode-ai/sdk to interact with OpenCode via REST API
- The service polls session status and streams responses back to users
This works great for simple queries, but sessions hang indefinitely when the LLM uses the Task tool to spawn subagents (e.g., the explore agent).
Expected Behavior
When a prompt triggers the Task tool (e.g., "explore the codebase to find X"), the subagent should complete and return results back to the parent session, which then completes normally.
Actual Behavior
- Parent session spawns a subagent via Task tool
- Subagent starts processing (visible in logs: step=0, step=1, step=2...)
- Multiple NotFoundError rejections occur in acp-command service
- Both parent and subagent sessions get stuck in "busy" state forever
- No response is ever returned via the SDK
Important: This ONLY happens via REST API
The exact same prompt works perfectly when using the TUI (opencode CLI). The subagent completes normally and the parent session receives the result.
Reproduction Steps
- Start OpenCode in serve mode:
opencode serve --port 4096 --hostname 0.0.0.0
- Use the SDK to send a prompt that triggers a subagent:
import { createOpencodeClient } from "@opencode-ai/sdk/v2";

const client = createOpencodeClient({
  baseUrl: "http://localhost:4096",
});

// Create or load a session
const session = await client.session.create({ directory: "/path/to/project" });

// Send a prompt that will trigger the explore subagent
await client.session.prompt({
  sessionID: session.data.id,
  directory: "/path/to/project",
  parts: [{ type: "text", text: "How is authentication implemented in this codebase?" }],
  agent: "build",
  model: { providerID: "anthropic", modelID: "claude-sonnet-4-20250514" },
});

// Poll for completion - this will hang forever
while (true) {
  const status = await client.session.status({});
  console.log(status.data.sessions[session.data.id]); // Always shows { type: "busy" }
  await new Promise(r => setTimeout(r, 1000));
}
- Observe the session is stuck in busy state and never completes.
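As a client-side mitigation while the underlying bug exists, the polling loop above could be given a deadline so the calling service does not hang along with the session. This is a minimal sketch, not part of @opencode-ai/sdk: `waitUntilIdle`, `StatusFetcher`, and the rule "anything other than busy counts as finished" are assumptions made for illustration.

```typescript
// Hypothetical helper; not part of @opencode-ai/sdk.
type SessionStatus = { type: string };
type StatusFetcher = () => Promise<Record<string, SessionStatus | undefined>>;

async function waitUntilIdle(
  fetchStatuses: StatusFetcher,
  sessionID: string,
  timeoutMs: number,
  intervalMs = 1000,
): Promise<"idle" | "timeout"> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const statuses = await fetchStatuses();
    // Assumption: a session that is not "busy" (or is missing from the map) has finished.
    if (statuses[sessionID]?.type !== "busy") return "idle";
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  // The deadline passed while the session was still busy.
  return "timeout";
}
```

With the client from the reproduction, the fetcher would presumably be something like `async () => (await client.session.status({})).data.sessions`, based on the response shape used in the loop above. A `"timeout"` result only tells the caller to stop waiting; it does not unstick the session on the server.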
Log Analysis
Serve instance logs (via /proc//fd/14):
Session processing starts normally:
INFO 2026-01-01T11:19:31 service=session.prompt step=0 sessionID=ses_parent123 loop
INFO 2026-01-01T11:19:31 service=llm providerID=anthropic modelID=claude-opus-4-5 sessionID=ses_parent123 stream
Subagent is spawned via Task tool:
INFO 2026-01-01T11:19:37 service=session.prompt step=0 sessionID=ses_subagent456 loop
INFO 2026-01-01T11:19:37 service=llm providerID=anthropic modelID=claude-opus-4-5 sessionID=ses_subagent456 agent=explore stream
INFO 2026-01-01T11:19:42 service=session.prompt step=1 sessionID=ses_subagent456 loop
INFO 2026-01-01T11:19:46 service=session.prompt step=2 sessionID=ses_subagent456 loop
Then errors start appearing - one per second:
ERROR 2026-01-01T11:19:46 service=acp-command promise={} reason=NotFoundError Unhandled rejection
ERROR 2026-01-01T11:19:46 service=default e=NotFoundError rejection
ERROR 2026-01-01T11:19:47 service=acp-command promise={} reason=NotFoundError Unhandled rejection
ERROR 2026-01-01T11:19:47 service=default e=NotFoundError rejection
... (continues indefinitely)
After the errors begin, there is no further session.prompt activity and the session is stuck:
INFO 2026-01-01T11:30:54 service=server method=GET path=/global/health request
INFO 2026-01-01T11:30:58 service=server method=GET path=/global/health request
... (only health checks, no session processing)
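To check whether other deployments show the same pattern, the log lines above can be summarized mechanically. The sketch below assumes the line layout seen in these excerpts (level, timestamp, `key=value` fields, then a free-text message); `parseLogLine` and `summarize` are hypothetical helper names, not OpenCode tooling.

```typescript
interface LogEntry {
  level: string;
  timestamp: string;
  fields: Record<string, string>;
  message: string;
}

// Parse one line of the form: LEVEL TIMESTAMP key=value ... free text
function parseLogLine(line: string): LogEntry | undefined {
  const match = /^(\w+)\s+(\S+)\s+(.*)$/.exec(line.trim());
  if (!match) return undefined;
  const [, level, timestamp, rest] = match;
  const fields: Record<string, string> = {};
  const words: string[] = [];
  for (const token of rest.split(/\s+/)) {
    const eq = token.indexOf("=");
    // Treat tokens as fields only until the free-text message starts.
    if (eq > 0 && words.length === 0) fields[token.slice(0, eq)] = token.slice(eq + 1);
    else words.push(token);
  }
  return { level, timestamp, fields, message: words.join(" ") };
}

// Count acp-command NotFoundError rejections and record the last loop step per session.
function summarize(lines: string[]) {
  let rejections = 0;
  const lastStep: Record<string, string> = {};
  for (const line of lines) {
    const entry = parseLogLine(line);
    if (!entry) continue;
    if (entry.fields.service === "acp-command" && entry.fields.reason === "NotFoundError") {
      rejections++;
    }
    if (entry.fields.service === "session.prompt" && entry.fields.sessionID) {
      lastStep[entry.fields.sessionID] = `step=${entry.fields.step} at ${entry.timestamp}`;
    }
  }
  return { rejections, lastStep };
}
```

A session whose last recorded step is followed only by rejections and health checks matches the hang described here.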
Session status shows multiple stuck sessions:
{
  sessions: {
    ses_parent123: { type: busy },
    ses_subagent456: { type: busy },
    // ... 14 more stuck sessions from previous attempts
  }
}
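Given a status map shaped like the one above, the stuck sessions can be listed with a small filter. This is only a sketch: `busySessions` is a hypothetical helper, and the map shape is assumed from the output shown here rather than from the SDK's type definitions.

```typescript
type SessionStatus = { type: string };

// Return the IDs of all sessions currently reported as busy.
function busySessions(sessions: Record<string, SessionStatus>): string[] {
  return Object.entries(sessions)
    .filter(([, status]) => status.type === "busy")
    .map(([id]) => id);
}
```

The resulting IDs could then be handed to whatever abort or cleanup call the server exposes; I have not verified which one applies in this state.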
Root Cause Analysis
After tracing through the code, I believe the issue is in packages/opencode/src/acp/agent.ts in the setupEventSubscriptions function.
The Problem Flow:
- When a session is created via ACP, setupEventSubscriptions() is called (line 62)
- This subscribes to events using the SDK: this.config.sdk.event.subscribe({ directory }) (line 71)
- When a message.part.updated event comes in, it tries to fetch the message (lines 132-145):
const message = await this.config.sdk.session
  .message(
    {
      sessionID: part.sessionID, // ← This could be a SUBAGENT session ID
      messageID: part.messageID,
      directory, // ← But this is the PARENT session's directory
    },
    { throwOnError: true },
  )
  .then((x) => x.data)
  .catch((err) => {
    log.error("unexpected error when fetching message", { error: err })
    return undefined
  })
- When the Task tool spawns a subagent, events for the subagent session also come through the same event subscription
- The message fetch fails with NotFoundError (possibly a timing issue where the message is not yet persisted, or a directory context mismatch)
- The .catch() block returns undefined, but this doesn't abort or clean up the session
- The session remains in "busy" state forever
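If the timing hypothesis is right (the message is simply not persisted yet when the event arrives), one possible mitigation is to retry the fetch a few times before giving up. The sketch below is not taken from the OpenCode codebase: `fetchWithRetry` and `isNotFound` are hypothetical helpers, and matching on `err.name === "NotFoundError"` is an assumption about the error's shape. It would not help if the real cause is the directory mismatch.

```typescript
// Hypothetical helpers; not taken from the OpenCode codebase.

// Assumption: the rejection is an Error whose name is "NotFoundError".
const isNotFound = (err: unknown): boolean =>
  err instanceof Error && err.name === "NotFoundError";

// Call fetchOnce up to `attempts` times, waiting longer after each retryable failure.
async function fetchWithRetry<T>(
  fetchOnce: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  attempts = 3,
  delayMs = 100,
): Promise<T | undefined> {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fetchOnce();
    } catch (err) {
      // Give up on non-retryable errors or once the attempts are exhausted.
      if (!isRetryable(err) || attempt === attempts) return undefined;
      await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
    }
  }
  return undefined;
}
```

In the handler quoted above, the `session.message(...)` call would be wrapped as `fetchOnce` with `isNotFound` as the predicate. Returning `undefined` on final failure mirrors the existing `.catch()` behavior, so this changes only how long the handler waits, not how the session is cleaned up.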
Why TUI Works:
In the TUI, everything runs in the same process:
- No network calls needed to fetch messages
- State is directly accessible via Instance.state()
- Subagent sessions are handled in the same async context
- The SessionPrompt.loop() function can properly track and complete subagent sessions
Environment
- OpenCode version: 1.0.220
- OS: Ubuntu 22.04 (AWS EC2)
- Model: anthropic/claude-opus-4-5 (but likely affects all models)