[Bug] "missing finish_reason for choice 0" when using DelegatingAIAgent with multi-turn sessions #890

## Description

When wrapping a Copilot `AIAgent` in a `DelegatingAIAgent` that **buffers all streaming updates** (e.g., to capture structured tool output), sessions with heavy built-in tool usage (file reads, shell commands, git operations) intermittently fail with:

```
Session error: Execution failed: Error: missing finish_reason for choice 0
```

The same sessions succeed with a **plain agent** (no `DelegatingAIAgent` wrapper), even with identical prompts, models, tools, and session configuration. The streaming buffering pattern appears to break the SDK's internal message flow during long multi-turn sessions.

## Potential Root Cause

The `DelegatingAIAgent.RunCoreStreamingAsync` override buffers all `AgentResponseUpdate` items before yielding them:

```csharp
// This pattern causes the bug:
protected override async IAsyncEnumerable<AgentResponseUpdate> RunCoreStreamingAsync(...)
{
    List<AgentResponseUpdate> updates = [];
    await foreach (var update in base.RunCoreStreamingAsync(...))
    {
        updates.Add(update);  // Buffer ALL updates
    }
    // Nothing reaches the caller until the inner stream has fully completed
    foreach (var update in updates)
        yield return update;
}
```

During long sessions (50+ built-in tool calls), this buffering appears to cause the Copilot CLI to mishandle the streaming response, resulting in a missing `finish_reason` on the final chat completion choice.
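
The ordering difference between buffering and forwarding can be seen without the SDK. The sketch below is a minimal illustration that uses plain strings in place of `AgentResponseUpdate`; `InnerStreamAsync`, `BufferedAsync`, and `PassThroughAsync` are hypothetical names, not SDK APIs. It shows only when the consumer sees each update. It does not reproduce the error, and whether a pass-through wrapper avoids the failure with the Copilot SDK has not been verified here.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for the inner agent's stream (strings replace AgentResponseUpdate).
// Each update is recorded in `trace` at the moment it is produced.
static async IAsyncEnumerable<string> InnerStreamAsync(List<string> trace)
{
    foreach (var name in new[] { "tool-call", "tool-result", "final-text" })
    {
        trace.Add($"produced {name}");
        await Task.Yield();
        yield return name;
    }
}

// Buffering wrapper: drains the whole inner stream before yielding anything.
static async IAsyncEnumerable<string> BufferedAsync(IAsyncEnumerable<string> inner)
{
    var updates = new List<string>();
    await foreach (var update in inner)
        updates.Add(update);
    foreach (var update in updates)
        yield return update;
}

// Pass-through wrapper: records each update and forwards it immediately.
static async IAsyncEnumerable<string> PassThroughAsync(
    IAsyncEnumerable<string> inner, List<string> captured)
{
    await foreach (var update in inner)
    {
        captured.Add(update);
        yield return update;
    }
}

var bufferedLog = new List<string>();
await foreach (var item in BufferedAsync(InnerStreamAsync(bufferedLog)))
    bufferedLog.Add($"consumed {item}");
// All three "produced" entries come first, followed by the three "consumed" entries.
Console.WriteLine(string.Join(", ", bufferedLog));

var streamedLog = new List<string>();
var captured = new List<string>();
await foreach (var item in PassThroughAsync(InnerStreamAsync(streamedLog), captured))
    streamedLog.Add($"consumed {item}");
// "produced" and "consumed" entries alternate; `captured` holds all three updates.
Console.WriteLine(string.Join(", ", streamedLog));
```

With the pass-through variant the captured list is still complete once the stream ends, so structured tool output could be inspected afterwards without delaying delivery to the caller.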

## Evidence

### Controlled comparison (same prompt, same model, same repo)

| Agent Type | DelegatingAIAgent? | Built-in tool calls | Result |
|-----------|-------------------|---------------------|--------|
| Worker (plain agent) | ❌ No | **24** permission requests | ✅ **Success** |
| Planner (DelegatingAIAgent) | ✅ Yes | **0** permission requests | ✅ **Success** |
| Reviewer (DelegatingAIAgent) | ✅ Yes | **15-183** permission requests | ❌ **Fails in ~60-100% of runs** |
| Reviewer (plain agent, no wrapper) | ❌ No | **183** permission requests | ✅ **Success** |

Failures appeared only when both factors were combined: **DelegatingAIAgent + heavy built-in tool usage = failure**. Either factor alone succeeded in every run listed above.

### Production sandbox validation

After removing the `DelegatingAIAgent` wrapper from the reviewer and switching to text-based structured output:
- **Before (with wrapper):** 3/3 failures in production, 3/5 failures locally
- **After (plain agent):** Success with 183 permission requests, 1,476 session events, and 605 lifecycle events; this is the heaviest session we've tested

### Local reproduction (5 runs each)

```
Reviewer with DelegatingAIAgent + file reading: 2/5 PASS (40%)
Reviewer as plain agent + file reading:          5/5 PASS (100%)
```
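
A pass-rate figure like the ones above can be collected with a small harness. The following is a sketch only: `CountPassesAsync` is a hypothetical helper, and `fakeRun` is a simulated stand-in that fails on a fixed schedule so the example runs without the SDK. In a real measurement the delegate would wrap the actual agent call.

```csharp
using System;
using System.Threading.Tasks;

// Runs `runOnce` the given number of times and counts the runs that complete without throwing.
static async Task<int> CountPassesAsync(Func<Task> runOnce, int runs)
{
    int passed = 0;
    for (int i = 0; i < runs; i++)
    {
        try
        {
            await runOnce();
            passed++;
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Run {i + 1} failed: {ex.Message}");
        }
    }
    return passed;
}

// Simulated run (NOT real data): fails on attempts 2, 4 and 5 purely for illustration.
int attempt = 0;
Func<Task> fakeRun = () =>
{
    attempt++;
    return attempt is 2 or 4 or 5
        ? Task.FromException(new InvalidOperationException("missing finish_reason for choice 0"))
        : Task.CompletedTask;
};

const int totalRuns = 5;
int passes = await CountPassesAsync(fakeRun, totalRuns);
Console.WriteLine($"{passes}/{totalRuns} PASS ({passes * 100 / totalRuns}%)");
```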

## Steps to Reproduce

```csharp
// 1. Create a DelegatingAIAgent that buffers streaming (mimics ToolCaptureAgent)
class BufferingAgent(AIAgent inner) : DelegatingAIAgent(inner)
{
    protected override async IAsyncEnumerable<AgentResponseUpdate> RunCoreStreamingAsync(
        IEnumerable<ChatMessage> messages, AgentSession? session = null,
        AgentRunOptions? options = null, CancellationToken ct = default)
    {
        List<AgentResponseUpdate> updates = [];
        await foreach (var update in base.RunCoreStreamingAsync(messages, session, options, ct))
            updates.Add(update);
        foreach (var update in updates)
            yield return update;
    }
}

// 2. Create session with any model
var client = new CopilotClient(new() { GithubToken = token });
var config = new SessionConfig { WorkingDirectory = "/path/to/repo", Model = "claude-opus-4.6" };
var inner = client.AsAIAgent(config, ownsClient: false, name: "test");
var agent = new BufferingAgent(inner);  // ← Wrapping causes the bug

// 3. Send prompt that triggers heavy built-in tool usage
var session = await agent.CreateSessionAsync();
var response = await agent.RunAsync(
    "Read all .cs files in src/ and summarize them.", session);
// ❌ Intermittently throws: Session error: Execution failed: Error: missing finish_reason for choice 0
```

**Without the wrapper** (using `inner` directly), the same prompt succeeds consistently.

## Expected Behavior

`DelegatingAIAgent` subclasses that buffer streaming updates should work reliably regardless of session length or built-in tool usage count.

## Actual Behavior

Sessions fail intermittently with `missing finish_reason for choice 0` when a `DelegatingAIAgent` buffers streaming updates during long multi-turn sessions with heavy built-in tool usage. Failure rate increases with session length.

## Environment

- **SDK**: `GitHub.Copilot.SDK` v0.1.23 (NuGet, .NET)
- **Also uses**: `Microsoft.Agents.AI.GitHub.Copilot` v1.0.0-preview.260225.1
- **Runtime**: .NET 10
- **OS**: Reproduced on both Windows (local) and Linux (ADC sandbox/Azure Linux 3.0)
- **Models tested**: `claude-opus-4.6`, `gpt-5.1-codex` — both exhibit the same behavior

## Workaround Used

Avoid `DelegatingAIAgent` / streaming buffering for agents that perform heavy built-in tool usage. Use text-based structured output (prompt the model to include a parseable JSON line in its response) instead of intercepting tool calls via a wrapper agent.
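
The text-based approach can be sketched as follows, assuming the model was prompted to emit one JSON object on a line of its own. `ExtractLastJsonLine` is a hypothetical helper and the `verdict` and `issues` fields are illustrative, not the schema used in the sessions described here.

```csharp
using System;
using System.Text.Json;

// Returns the last line of the response that parses as a JSON object, or null if none does.
static JsonElement? ExtractLastJsonLine(string responseText)
{
    var lines = responseText.Split('\n');
    for (int i = lines.Length - 1; i >= 0; i--)
    {
        var line = lines[i].Trim();
        if (!line.StartsWith('{') || !line.EndsWith('}'))
            continue;
        try
        {
            using var doc = JsonDocument.Parse(line);
            return doc.RootElement.Clone();  // Clone so the element outlives the document
        }
        catch (JsonException)
        {
            // Looked like JSON but did not parse; keep scanning earlier lines
        }
    }
    return null;
}

// Illustrative response text; field names are assumptions.
var sample = "Reviewed 12 files.\n{\"verdict\":\"approve\",\"issues\":0}\nDone.";
var result = ExtractLastJsonLine(sample);
Console.WriteLine(result?.GetProperty("verdict").GetString());  // prints "approve"
```

Scanning from the end favors the model's final answer if earlier lines also happen to contain JSON, for example quoted file contents.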


