Skip to content

Fix .NET client startup cleanup race#1206

Merged
stephentoub merged 2 commits into
mainfrom
stephentoub/debug-csharp-flake
May 5, 2026
Merged

Fix .NET client startup cleanup race#1206
stephentoub merged 2 commits into
mainfrom
stephentoub/debug-csharp-flake

Conversation

@stephentoub
Copy link
Copy Markdown
Collaborator

A four-way parallel run of the .NET SDK E2E suite exposed a load-sensitive cleanup race: if the TCP CLI connection drops after a connection is created but before startup finishes protocol negotiation, later force-stop/dispose cleanup can rethrow the failed startup task instead of just releasing resources.

This change makes startup cleanup tear down partially established connections before rethrowing the startup failure, and makes force-stop cleanup treat an already-failed startup task as cleanup-complete. It also adds a fake TCP CLI regression test that announces a port, accepts the initial JSON-RPC connection, and drops during startup verification to ensure ForceStopAsync remains safe afterward.

Validation:

  • dotnet test dotnet\test\GitHub.Copilot.SDK.Test.csproj --filter FullyQualifiedName~ForceStop_Does_Not_Rethrow_When_Tcp_Cli_Drops_During_Startup
  • 4 full C# SDK test suites run in parallel: 4/4 passed

Clean up partially established CLI connections if startup fails during protocol negotiation, and avoid rethrowing failed startup tasks during force-stop cleanup.

Add a TCP fake-CLI regression test that drops during startup verification and verifies ForceStopAsync remains cleanup-safe.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@stephentoub stephentoub requested a review from a team as a code owner May 5, 2026 18:36
Copilot AI review requested due to automatic review settings May 5, 2026 18:36
Comment thread dotnet/src/Client.cs Fixed
Comment thread dotnet/src/Client.cs Fixed
Comment thread dotnet/src/Client.cs Fixed
Comment thread dotnet/src/Client.cs Fixed
@github-actions

This comment has been minimized.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a cleanup race in the .NET SDK’s client startup path where a dropped TCP CLI connection during protocol negotiation could cause subsequent cleanup (ForceStopAsync / dispose) to rethrow the original startup failure instead of just releasing resources.

Changes:

  • Ensures partially-established connections are torn down if startup verification/configuration fails (protocol negotiation + session FS config).
  • Adjusts force-stop/stop cleanup to treat an already-failed startup task as “cleanup complete” (avoids rethrowing startup failures during teardown).
  • Adds an E2E regression test using a fake TCP CLI that drops the connection during startup verification to validate ForceStopAsync safety afterward.
Show a summary per file
File Description
dotnet/src/Client.cs Adds startup failure cleanup for partially established connections; makes teardown ignore faulted startup tasks and improves cleanup error reporting/logging.
dotnet/test/E2E/ClientOptionsE2ETests.cs Adds an E2E regression test with a fake TCP CLI that drops during startup to ensure ForceStopAsync does not rethrow.

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 1

Comment thread dotnet/src/Client.cs
Ensure child CLI processes are terminated when startup fails before a Connection is produced, including TCP connect failures after the CLI announces a port.

Add a regression test that verifies a fake TCP CLI process is killed when the announced port cannot be connected to.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread dotnet/src/Client.cs
Comment thread dotnet/src/Client.cs
Comment thread dotnet/src/Client.cs
Comment thread dotnet/src/Client.cs
Comment thread dotnet/src/Client.cs
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 5, 2026

Cross-SDK Consistency Review ✅

This PR is a .NET-only internal bug fix and maintains full cross-SDK consistency. No changes are needed in Node.js, Python, or Go.

What this PR fixes: A race in the .NET client's startup cleanup — if a TCP CLI connection drops after being established but before protocol verification completes, the stored _connectionTask (a Task in a faulted state) would be re-awaited during CleanupConnectionAsync, causing ForceStopAsync to throw instead of silently releasing resources.

Why no other SDK is affected:

SDK Startup pattern Cleanup pattern
.NET Stores _connectionTask (a Task<Connection>) that can be re-awaited by concurrent callers or cleanup Awaits _connectionTask in cleanup — bug: faulted task re-throws
Node.js Linear async start() — sets this.cliProcess/this.connection directly stop()/forceStop() read fields directly, no stored promise to re-await
Python Linear async start() — sets self._process/self._client directly stop()/force_stop() read fields directly, same safe pattern
Go Synchronous Start() protected by a mutex — no async task caching Stop()/ForceStop() read c.process/c.conn directly under the mutex

The _connectionTask caching pattern is unique to .NET's async/await model and is not replicated in the other SDKs, so the equivalent bug cannot occur there. The new regression tests (ForceStop_Does_Not_Rethrow_When_Tcp_Cli_Drops_During_Startup and StartAsync_Cleans_Up_Tcp_Cli_Process_When_Connect_Fails) appropriately cover the .NET-specific scenario.

Generated by SDK Consistency Review Agent for issue #1206 · ● 320.3K ·

@stephentoub stephentoub merged commit 3c622b9 into main May 5, 2026
26 checks passed
@stephentoub stephentoub deleted the stephentoub/debug-csharp-flake branch May 5, 2026 19:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants