Bug: Concurrent sub-agent events corrupt session state — permanent "tool_use ids were found without tool_result blocks" error

## Error

```
CAPIError: 400 messages.N: `tool_use` ids were found without `tool_result` blocks immediately after: toolu_XXXX.
Each `tool_use` block must have a corresponding `tool_result` block in the next message.
```

This error occurs on **every subsequent message** after corruption -- the session is permanently bricked with no built-in recovery mechanism.

## Environment

- **Copilot CLI version:** 1.0.19
- **OS:** Windows 11
- **Model:** Claude Opus 4.6 (1M context)

## Root Cause

When the main conversation launches a sub-agent (via the `task` tool -- e.g., explore, rubber-duck, or general-purpose agents) and continues making tool calls while the sub-agent runs, both conversations write events to the same `events.jsonl` file. The runtime's message reconstruction does not properly separate the two conversations' tool calls when building the API request. The Claude API then rejects the request because `tool_use` blocks appear without matching `tool_result` blocks in the next message.
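The pairing rule the API enforces can be checked offline against a reconstructed messages array. The following is a minimal sketch, assuming the request uses the Anthropic Messages API shape (content blocks of type `tool_use` with an `id`, answered in the following message by `tool_result` blocks carrying `tool_use_id`). The function name `find_unanswered_tool_use` is hypothetical and is not part of the Copilot CLI.

```python
def find_unanswered_tool_use(messages):
    """Return ids of tool_use blocks with no tool_result in the next message."""
    unanswered = []
    for i, msg in enumerate(messages):
        content = msg.get("content")
        if not isinstance(content, list):
            continue  # plain-string content cannot hold tool_use blocks
        used = [b["id"] for b in content
                if isinstance(b, dict) and b.get("type") == "tool_use"]
        if not used:
            continue
        # Only the immediately following message counts as an answer.
        nxt = messages[i + 1].get("content") if i + 1 < len(messages) else None
        answered = set()
        if isinstance(nxt, list):
            answered = {b.get("tool_use_id") for b in nxt
                        if isinstance(b, dict) and b.get("type") == "tool_result"}
        unanswered.extend(t for t in used if t not in answered)
    return unanswered
```

If a sub-agent's `tool_use` is spliced in between a main-conversation `tool_use` and its result, the main call's id is reported as unanswered, which matches the shape of the error above.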

## Steps to Reproduce

1. Use a **long-lived session** (one that has undergone at least one compaction)
2. Trigger a `task` tool call to launch a **background sub-agent** (explore, rubber-duck, etc.)
3. While the sub-agent runs, **continue interacting** with the main conversation -- for example, a `read_agent` call with `wait: true` times out and the assistant then makes additional tool calls
4. Sub-agent events (`assistant.message`, `tool.execution_start`, `tool.execution_complete`) interleave with main conversation events in `events.jsonl`
5. **Every subsequent message permanently fails** with the `tool_use`/`tool_result` mismatch error

### Key observation

In the corrupted session we analyzed:
- A sub-agent that ran with **no** main conversation events during its execution -- **no corruption**
- A sub-agent that ran with **110+ main conversation events interleaved** -- **19 orphaned tool calls, session permanently broken**

The difference is whether the main conversation was actively producing events while the sub-agent was running.

## Workaround / Repair Steps

There is no built-in recovery. We manually repaired the session by editing `events.jsonl`:

### Diagnosis

1. Opened `~/.copilot/session-state/<session-id>/events.jsonl`
2. Found the last `session.compaction_complete` event (this is where the runtime starts reconstructing the conversation)
3. Indexed all `assistant.message` events with `toolRequests` after the compaction, along with their `tool.execution_start` and `tool.execution_complete` events
4. For each tool call, checked if the gap between `execution_start` and `execution_complete` contained `assistant.message` or `assistant.turn_start` events from a **different `interactionId`** -- this indicates the sub-agent's events were interleaved
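The diagnosis steps above can be sketched in Python. This is an illustration under assumptions, not the procedure we actually ran: it assumes each line of `events.jsonl` is a JSON object with top-level `type`, `interactionId`, and `toolCallId` fields. Only `interactionId` and the event type names come from the session we inspected; the `toolCallId` field name and the flat layout are guesses and may need adjusting to the real schema.

```python
import json

# Event types whose presence inside a tool call's start/complete gap,
# under a different interactionId, signals interleaving.
FOREIGN_TYPES = {"assistant.message", "assistant.turn_start"}

def load_events(path):
    """Read one JSON object per non-empty line of events.jsonl."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def find_interleaved_tool_calls(events):
    """Return tool call ids whose execution window contains foreign events."""
    # Only events after the last compaction are reconstructed.
    start = 0
    for i, ev in enumerate(events):
        if ev.get("type") == "session.compaction_complete":
            start = i + 1

    open_calls = {}  # assumed toolCallId -> (start index, owning interactionId)
    suspects = []
    for i in range(start, len(events)):
        ev = events[i]
        kind, call_id = ev.get("type"), ev.get("toolCallId")
        if kind == "tool.execution_start" and call_id:
            open_calls[call_id] = (i, ev.get("interactionId"))
        elif kind == "tool.execution_complete" and call_id in open_calls:
            begin, owner = open_calls.pop(call_id)
            gap = events[begin + 1:i]
            if any(g.get("type") in FOREIGN_TYPES
                   and g.get("interactionId") != owner for g in gap):
                suspects.append(call_id)
    return suspects
```

Calling `find_interleaved_tool_calls(load_events(path))` returns the candidate ids to feed into the repair steps below.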

### Repair

1. **Backed up** `events.jsonl` to `events.jsonl.bak`
2. **Removed all events** (any type) that referenced any of the problematic tool call IDs -- this covers the main conversation's `tool_use`, the sub-agent's internal events, hooks, error events, etc.
3. **Cascaded removal** via `parentId` chains -- removed orphaned `hook.start`/`hook.end` and `tool.execution_start`/`tool.execution_complete` events whose parent was removed
4. **Swept for collateral orphans** -- after removal, some tool calls lost their `tool.execution_complete` as collateral damage, leaving a new `tool_use` without a `tool_result`. Iteratively detected and removed these until no orphans remained
5. **Fixed parent chain links** -- updated `parentId` references on surviving `assistant.turn_end` events to point to the nearest surviving ancestor
6. **Removed all `session.error` events** (just repeated failure logs from retries)

A reusable PowerShell repair script is available in [this gist](https://gist.github.com/KeithIsSleeping/1a919a69a5707b764d880cf9e38e5e70).
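Separately from the linked script, repair steps 2, 3, 4, and 6 can be sketched in Python to show the shape of the iteration. This is an illustration under assumptions: events are taken to be flat JSON objects with `id`, `parentId`, `type`, and `toolCallId` fields, and each `toolRequests` entry is assumed to carry a `toolCallId`. Of these, only `parentId`, `toolRequests`, and the event type names come from the session we inspected. Step 5 (relinking `parentId` on surviving events) is not covered.

```python
import json

CASCADE_TYPES = {"hook.start", "hook.end",
                 "tool.execution_start", "tool.execution_complete"}

def requested_ids(events):
    """Tool call ids requested by assistant.message events."""
    return {req.get("toolCallId")
            for ev in events if ev.get("type") == "assistant.message"
            for req in ev.get("toolRequests") or []} - {None}

def completed_ids(events):
    """Tool call ids that have a tool.execution_complete event."""
    return {ev.get("toolCallId") for ev in events
            if ev.get("type") == "tool.execution_complete"} - {None}

def remove_tool_calls(events, bad_ids):
    """Drop every event mentioning a bad id, then cascade via parentId."""
    # Match the quoted JSON string so "toolu_1" does not match "toolu_10".
    needles = [json.dumps(b) for b in bad_ids]
    removed, kept = set(), []
    for ev in events:
        text = json.dumps(ev)
        if any(n in text for n in needles):
            removed.add(ev.get("id"))
        else:
            kept.append(ev)
    removed.discard(None)
    changed = True
    while changed:  # repeat until no hook/tool event has a removed parent
        changed = False
        survivors = []
        for ev in kept:
            if ev.get("type") in CASCADE_TYPES and ev.get("parentId") in removed:
                if ev.get("id") is not None:
                    removed.add(ev["id"])
                changed = True
            else:
                survivors.append(ev)
        kept = survivors
    return kept

def repair(events, bad_ids):
    """Remove bad tool calls, then sweep until no request lacks a result."""
    events = [ev for ev in events if ev.get("type") != "session.error"]
    events = remove_tool_calls(events, set(bad_ids))
    while True:
        orphans = requested_ids(events) - completed_ids(events)
        if not orphans:
            return events
        events = remove_tool_calls(events, orphans)
```

The sweep terminates because every pass removes at least the `assistant.message` that requested an orphaned call. As in step 1, run it only on a copy of `events.jsonl`.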

## Suggested Fix

The runtime's message reconstruction should filter events by conversation/interaction context before building the Claude API messages array. Sub-agent internal events (`assistant.message`, `tool.execution_start/complete`, `assistant.turn_start/end`) should not be included in the main conversation's messages -- they belong to a separate conversation context.
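As a rough sketch of this filtering idea, the events could be grouped by context before any messages are built. The example assumes a top-level `interactionId` field is enough to tell the contexts apart, which the diagnosis above suggests but which we have not confirmed against the runtime. `split_by_interaction` is a hypothetical helper, not an existing runtime function.

```python
from collections import defaultdict

def split_by_interaction(events):
    """Group events by interactionId, preserving file order within each group."""
    streams = defaultdict(list)
    for ev in events:
        streams[ev.get("interactionId")].append(ev)
    return dict(streams)

# Interleaved input: the sub-agent's message lands inside the main tool call.
interleaved = [
    {"type": "tool.execution_start", "interactionId": "main"},
    {"type": "assistant.message", "interactionId": "sub"},
    {"type": "tool.execution_complete", "interactionId": "main"},
]
streams = split_by_interaction(interleaved)
main_types = [ev["type"] for ev in streams["main"]]
# main_types now holds only the main conversation's start/complete pair.
```

Reconstruction would then run once per stream, so a sub-agent's tool calls can no longer land between a main-conversation `tool_use` and its `tool_result`.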

Alternatively, sub-agent events could be written to a separate events file (e.g., `events.<agent-id>.jsonl`) to prevent interleaving entirely.

