Bug: Concurrent sub-agent events corrupt session state — permanent "tool_use ids were found without tool_result blocks" error #2543

@KeithIsSleeping

Error

CAPIError: 400 messages.N: `tool_use` ids were found without `tool_result` blocks immediately after: toolu_XXXX.
Each `tool_use` block must have a corresponding `tool_result` block in the next message.

Once the session state is corrupted, this error occurs on every subsequent message. The session is permanently bricked, and there is no built-in recovery mechanism.
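The pairing rule behind this error can be checked locally on a messages array. The sketch below is illustrative only and is not the runtime's code. It assumes the Anthropic Messages format, where an assistant message carries `tool_use` blocks with an `id` and the following user message carries `tool_result` blocks with a matching `tool_use_id`. The function name `find_unmatched_tool_use` is hypothetical.

```python
def find_unmatched_tool_use(messages):
    """Return (message_index, tool_use_id) pairs that break the pairing rule.

    A tool_use block counts as matched only if the very next message is a
    user message containing a tool_result block with the same id.
    """
    unmatched = []
    for i, msg in enumerate(messages):
        content = msg.get("content")
        # Plain-string content cannot contain tool_use blocks.
        if msg.get("role") != "assistant" or not isinstance(content, list):
            continue
        used = [b["id"] for b in content if b.get("type") == "tool_use"]
        if not used:
            continue
        answered = set()
        if i + 1 < len(messages):
            nxt = messages[i + 1]
            nxt_content = nxt.get("content")
            if nxt.get("role") == "user" and isinstance(nxt_content, list):
                answered = {
                    b.get("tool_use_id")
                    for b in nxt_content
                    if b.get("type") == "tool_result"
                }
        # Anything requested but not answered in the next message is an orphan.
        unmatched.extend((i, tid) for tid in used if tid not in answered)
    return unmatched
```

An empty result means every `tool_use` has its `tool_result` in the following message. Any returned pair corresponds to an id the API would name in the 400 error.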

Environment

  • Copilot CLI version: 1.0.19
  • OS: Windows 11
  • Model: Claude Opus 4.6 (1M context)

Root Cause

When the main conversation launches a sub-agent (via the task tool -- e.g., explore, rubber-duck, or general-purpose agents) and continues making tool calls while the sub-agent runs, both conversations write events to the same events.jsonl file. The runtime's message reconstruction does not properly separate the two conversations' tool calls when building the API request. The Claude API then rejects the request because tool_use blocks appear without matching tool_result blocks in the next message.

Steps to Reproduce

  1. Use a long-lived session (one that has undergone at least one compaction)
  2. Trigger a task tool call to launch a background sub-agent (explore, rubber-duck, etc.)
  3. While the sub-agent runs, continue interacting with the main conversation -- for example, read_agent with wait: true returns a timeout, and then the assistant makes additional tool calls
  4. Sub-agent events (assistant.message, tool.execution_start, tool.execution_complete) interleave with main conversation events in events.jsonl
  5. Every subsequent message permanently fails with the tool_use/tool_result mismatch error

Key observation

In the corrupted session we analyzed:

  • One sub-agent ran with no main conversation events during its execution: no corruption
  • Another sub-agent ran with 110+ main conversation events interleaved: 19 orphaned tool calls, session permanently broken

The difference is whether the main conversation was actively producing events while the sub-agent was running.

Workaround / Repair Steps

There is no built-in recovery. We manually repaired the session by editing events.jsonl:

Diagnosis

  1. Opened ~/.copilot/session-state/<session-id>/events.jsonl
  2. Found the last session.compaction_complete event (this is where the runtime starts reconstructing the conversation)
  3. Indexed all assistant.message events with toolRequests after the compaction, along with their tool.execution_start and tool.execution_complete events
  4. For each tool call, checked if the gap between execution_start and execution_complete contained assistant.message or assistant.turn_start events from a different interactionId -- this indicates the sub-agent's events were interleaved
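The diagnosis steps above can be sketched in Python. This is a rough sketch, not the script we used. The event types and `interactionId` come from the log, but the exact JSON layout is an assumption: each event is taken to be a flat object with `type`, `interactionId`, and (on execution events) a `toolCallId` key. Adjust the key names to whatever your events.jsonl actually contains.

```python
import json


def load_events(path):
    """Read events.jsonl into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def find_interleaved_tool_calls(events):
    """Return tool call IDs whose start..complete window contains
    assistant events from a different interactionId."""
    # Step 2: only events after the last compaction are reconstructed.
    start = 0
    for i, ev in enumerate(events):
        if ev.get("type") == "session.compaction_complete":
            start = i + 1
    window = events[start:]

    # Step 3: index the start/complete position of every tool call.
    started, completed = {}, {}
    for i, ev in enumerate(window):
        call_id = ev.get("toolCallId")  # ASSUMED field name
        if not call_id:
            continue
        if ev.get("type") == "tool.execution_start":
            started[call_id] = i
        elif ev.get("type") == "tool.execution_complete":
            completed[call_id] = i

    # Step 4: look for foreign assistant events inside each window.
    foreign_types = ("assistant.message", "assistant.turn_start")
    suspect = []
    for call_id, s in started.items():
        e = completed.get(call_id, len(window))  # never completed: scan to end
        owner = window[s].get("interactionId")
        for ev in window[s + 1:e]:
            if ev.get("type") in foreign_types and ev.get("interactionId") != owner:
                suspect.append(call_id)
                break
    return suspect
```

Calling `find_interleaved_tool_calls(load_events(path))` returns the list of problematic tool call IDs that feed into the repair.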

Repair

  1. Backed up events.jsonl to events.jsonl.bak
  2. Removed all events (any type) that referenced any of the problematic tool call IDs -- this covers the main conversation's tool_use, the sub-agent's internal events, hooks, error events, etc.
  3. Cascaded removal via parentId chains -- removed orphaned hook.start/hook.end and tool.execution_start/tool.execution_complete events whose parent was removed
  4. Swept for collateral orphans -- after removal, some tool calls lost their tool.execution_complete as collateral damage, leaving a new tool_use without a tool_result. Iteratively detected and removed these until no orphans remained
  5. Fixed parent chain links -- updated parentId references on surviving assistant.turn_end events to point to the nearest surviving ancestor
  6. Removed all session.error events (just repeated failure logs from retries)
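The repair steps above can be approximated as follows. This is a sketch under stated assumptions, not the PowerShell script itself: events are assumed to be flat JSON objects with `id`, `parentId`, `type`, a `toolCallId` on execution events, and a `toolRequests` list of objects with `toolCallId` on assistant messages. The helper names (`repair_events`, `write_events`, `_strings`) are hypothetical.

```python
import json
import shutil


def _strings(value):
    """Yield every string value nested anywhere inside an event."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for v in value.values():
            yield from _strings(v)
    elif isinstance(value, list):
        for v in value:
            yield from _strings(v)


def repair_events(events, bad_call_ids):
    """Return a repaired copy of the event list (input is not mutated)."""
    parent_of = {ev["id"]: ev.get("parentId") for ev in events if "id" in ev}
    cascade_types = {"hook.start", "hook.end",
                     "tool.execution_start", "tool.execution_complete"}
    bad = set(bad_call_ids)
    while True:
        # Steps 2 and 6: drop events referencing a bad call ID, and error logs.
        kept = [ev for ev in events
                if ev.get("type") != "session.error"
                and bad.isdisjoint(_strings(ev))]
        # Step 3: cascade removal along parentId chains until stable.
        while True:
            alive = {ev["id"] for ev in kept if "id" in ev}
            survivors = [ev for ev in kept
                         if not (ev.get("type") in cascade_types
                                 and ev.get("parentId") in parent_of
                                 and ev.get("parentId") not in alive)]
            if len(survivors) == len(kept):
                break
            kept = survivors
        # Step 4: a requested call with no completion is a new orphan.
        requested = {req["toolCallId"] for ev in kept
                     if ev.get("type") == "assistant.message"
                     for req in ev.get("toolRequests", [])}
        completed = {ev.get("toolCallId") for ev in kept
                     if ev.get("type") == "tool.execution_complete"}
        orphans = requested - completed
        if not orphans:
            break
        bad |= orphans  # retry with the enlarged set of bad IDs
    # Step 5: relink turn_end events to the nearest surviving ancestor.
    alive = {ev["id"] for ev in kept if "id" in ev}
    repaired = []
    for ev in kept:
        if ev.get("type") == "assistant.turn_end" and "parentId" in ev:
            parent = ev["parentId"]
            while parent is not None and parent not in alive:
                parent = parent_of.get(parent)
            ev = {**ev, "parentId": parent}
        repaired.append(ev)
    return repaired


def write_events(path, events):
    """Step 1: back up the original file, then write the repaired log."""
    shutil.copy2(path, path + ".bak")
    with open(path, "w", encoding="utf-8") as fh:
        for ev in events:
            fh.write(json.dumps(ev) + "\n")
```

The outer loop ends because the set of bad IDs only grows and is bounded by the number of tool calls in the log. Matching on whole string values rather than on substrings of the serialized line avoids removing an event whose ID merely shares a prefix with a bad one.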

A reusable PowerShell script that automates these repair steps is available separately.

Suggested Fix

The runtime's message reconstruction should filter events by conversation/interaction context before building the Claude API messages array. Sub-agent internal events (assistant.message, tool.execution_start/complete, assistant.turn_start/end) should not be included in the main conversation's messages -- they belong to a separate conversation context.
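A minimal sketch of such a filter, assuming each event carries an `interactionId` and that the runtime knows which interaction IDs belong to the main conversation. Both assumptions, and the name `main_conversation_events`, are illustrative and do not describe the runtime's actual internals.

```python
# Event types that are internal to whichever conversation produced them.
CONVERSATION_SCOPED_TYPES = {
    "assistant.message",
    "assistant.turn_start",
    "assistant.turn_end",
    "tool.execution_start",
    "tool.execution_complete",
}


def main_conversation_events(events, main_interaction_ids):
    """Keep conversation-scoped events only when they belong to the main
    conversation; pass every other event type through unchanged."""
    main_ids = set(main_interaction_ids)
    return [
        ev for ev in events
        if ev.get("type") not in CONVERSATION_SCOPED_TYPES
        or ev.get("interactionId") in main_ids
    ]
```

Message reconstruction would then run on the filtered list, so a sub-agent's `tool_use` can never land between the main conversation's `tool_use` and its `tool_result`.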

Alternatively, sub-agent events could be written to a separate events file (e.g., events.<agent-id>.jsonl) to prevent interleaving entirely.
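The per-agent file idea could look roughly like this. The file naming follows the `events.<agent-id>.jsonl` pattern suggested above. The functions `events_path` and `append_event` and the `agent_id` parameter are hypothetical and only illustrate the routing.

```python
import json
from pathlib import Path


def events_path(session_dir, agent_id=None):
    """Main conversation -> events.jsonl; sub-agent -> events.<agent-id>.jsonl."""
    name = "events.jsonl" if agent_id is None else f"events.{agent_id}.jsonl"
    return Path(session_dir) / name


def append_event(session_dir, event, agent_id=None):
    """Append one event as a JSON line to the file owned by its writer."""
    path = events_path(session_dir, agent_id)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
    return path
```

With one writer per file, interleaving cannot occur at the storage level, and the main conversation's log only needs the `task` tool call and its result.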
