Fix Continue-as-new Race Condition at InProcessTestHost #703
Conversation
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds coverage and runtime fixes to address a ContinueAsNew race condition in the in-process test host (durabletask-dotnet #689), focusing on proper async continuations and improved robustness around worker streaming/disconnect behavior.
Changes:
- Add a new stress-oriented test suite validating ContinueAsNew behavior under concurrency and scheduling patterns.
- Run `TaskCompletionSource` continuations asynchronously to reduce inline-continuation races/deadlocks.
- Adjust gRPC sidecar behavior around the history streaming threshold and worker stream disconnect handling.
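The `TaskCompletionSource` change summarized above is conventionally made with `TaskCreationOptions.RunContinuationsAsynchronously`. A minimal sketch (not the PR's actual code) of the pattern and why it matters:

```csharp
using System.Threading.Tasks;

// Sketch only: illustrates the async-continuation option, not this PR's code.
// Without RunContinuationsAsynchronously, SetResult can execute awaiting
// continuations inline on the completing thread. If that thread holds a lock
// the continuation also needs, the result is a race or deadlock.
var tcs = new TaskCompletionSource<int>(
    TaskCreationOptions.RunContinuationsAsynchronously);

// Continuations attached to tcs.Task are now queued to the thread pool
// rather than running synchronously inside the SetResult call.
tcs.SetResult(42);
```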
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| test/InProcessTestHost.Tests/ContinueAsNewTests.cs | New tests exercising ContinueAsNew across single, parallel, rapid-fire, and staggered scheduling scenarios. |
| src/InProcessTestHost/Sidecar/InMemoryOrchestrationService.cs | Makes waiter continuations run asynchronously to avoid continuation-related races. |
| src/InProcessTestHost/Sidecar/Grpc/TaskHubGrpcServer.cs | Introduces a streaming threshold constant, adjusts “no client” exception type, handles a specific stream-closed scenario, and uses async continuations for several TCS instances. |
```csharp
catch (InvalidOperationException ex) when (ex.Message.Contains("request is complete", StringComparison.OrdinalIgnoreCase))
{
    // The client disconnected or canceled the GetWorkItems stream.
    // Reset the connection state so the dispatcher pauses naturally
    // (via the traffic signal) until a new client connects.
    lock (this.isConnectedSignal)
    {
        this.workerToClientStream = null;
        this.isConnectedSignal.Reset();
    }

    this.log.LogWarning(ex, "Work-item stream closed by client. Dispatcher will pause until reconnection.");
}
```
Catching InvalidOperationException based on a substring of the exception message is brittle (message text can change across runtime/framework versions and may vary). Prefer matching on exception types that represent stream cancellation/disconnect (commonly RpcException with Cancelled/Unavailable, or IOException/ObjectDisposedException depending on the gRPC stack), or use a more reliable signal from the server call context if available. This will make the disconnect handling more stable long-term.
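The type-based matching suggested here could look something like the following sketch. The member names mirror the snippet above; which exception types the gRPC stack actually throws on worker disconnect is an assumption and would need to be verified against the gRPC implementation in use:

```csharp
using System;
using System.IO;
using Grpc.Core;

// Hypothetical helper (not in the PR): classify stream failures by type and
// gRPC status instead of by exception message text. The specific types listed
// here are assumptions about what the gRPC stack surfaces on disconnect.
static bool IsClientDisconnect(Exception ex) => ex switch
{
    RpcException rpc when rpc.StatusCode is StatusCode.Cancelled or StatusCode.Unavailable => true,
    ObjectDisposedException or IOException => true,
    InvalidOperationException => true, // stream write after completion
    _ => false,
};

// Usage inside the dispatch loop, treating all cases uniformly:
// catch (Exception ex) when (IsClientDisconnect(ex)) { /* reset connection state */ }
```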
```csharp
catch (InvalidOperationException ex) when (ex.Message.Contains("request is complete", StringComparison.OrdinalIgnoreCase))
{
    // The client disconnected or canceled the GetWorkItems stream.
    // Reset the connection state so the dispatcher pauses naturally
    // (via the traffic signal) until a new client connects.
    lock (this.isConnectedSignal)
    {
        this.workerToClientStream = null;
        this.isConnectedSignal.Reset();
    }

    // Must throw so callers (ExecuteOrchestrator/ExecuteActivity) can clean up
    // their pending TCS. The dispatcher catches this, abandons the work item,
    // and releases it back to the queue for retry.
    throw new OperationCanceledException("Work-item stream closed by client.", ex);
}
```
Catching InvalidOperationException based on a substring match of ex.Message is brittle (message text can change across framework versions/locales and is not a stable contract). Prefer handling stream-closure in a way that doesn’t depend on message text (e.g., catch InvalidOperationException without a message filter and/or handle additional common stream write failures like ObjectDisposedException/IOException, then treat them uniformly as a disconnected client).
I agree with this comment, but I will not block on this. If the message text changes, the new tests should catch this. We should look for other ways to detect this condition: perhaps the exception object has more clues in other properties, or we can infer this from other context. However, this can be addressed later -- I recommend filing a separate follow-up issue now, but no need to block releasing the fix.
```csharp
// Must throw so callers (ExecuteOrchestrator/ExecuteActivity) can clean up
// their pending TCS. The dispatcher catches this, abandons the work item,
// and releases it back to the queue for retry.
throw new OperationCanceledException("Work-item stream closed by client.", ex);
```
Throwing OperationCanceledException from gRPC server code can surface to callers as an unexpected/opaque gRPC status (often Unknown) unless it is consistently translated upstream. Since this is explicitly a client-disconnect/stream-cancel scenario, consider throwing an RpcException with an appropriate status code (e.g., Cancelled or Unavailable) so the failure mode is explicit and consistent with the earlier "No client is connected" path.
Suggested change:
```diff
- throw new OperationCanceledException("Work-item stream closed by client.", ex);
+ throw new RpcException(new Status(StatusCode.Cancelled, "Work-item stream closed by client."));
```
AnatoliB left a comment:

LGTM, just one non-blocking comment.
As titled. Fixes issue #689.