Harden gRPC worker and client against silent disconnects #708
Merged
berndverst merged 30 commits into main on Apr 23, 2026
Conversation
Adds Hello deadline, channel recreation after N consecutive failures, jittered exponential backoff, distinct Unauthenticated logging, and DTS health-ping observability to the worker. Mirrors the channel-recreate pattern in the client via a CallInvoker wrapper. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR hardens the Durable Task gRPC worker and client against silent disconnects and wedged HTTP/2 connections by adding bounded handshakes, jittered reconnect backoff, and channel recreation paths (including AzureManaged channel-cache CAS swaps with deferred disposal).
Changes:
- Worker: add Hello deadline, silent-disconnect window changes + HealthPing trace logging, jittered exponential reconnect backoff, and channel recreation signaling.
- Client: introduce a CallInvoker wrapper that triggers channel recreation after consecutive transport failures, with cooldown/single-flight safeguards and long-poll exclusions.
- AzureManaged extensions: wire recreator callbacks through the shared ConcurrentDictionary<string, Lazy<GrpcChannel>> cache with CAS swap + deferred disposal.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| test/Worker/Grpc.Tests/ReconnectBackoffTests.cs | Adds unit tests for full-jitter exponential reconnect backoff behavior. |
| test/Worker/Grpc.Tests/GrpcDurableTaskWorkerOptionsInternalTests.cs | Adds tests for internal worker option defaults and SetChannelRecreator validation. |
| src/Worker/Grpc/ReconnectBackoff.cs | New helper for capped exponential backoff with jitter. |
| src/Worker/Grpc/ProcessorExitReason.cs | New enum to signal processor exit reason to the outer worker loop. |
| src/Worker/Grpc/Logs.cs | Adds new structured log events for timeout/auth/channel recreation/backoff/HealthPing observability. |
| src/Worker/Grpc/Internal/InternalOptionsExtensions.cs | Adds internal extension method to set a channel recreator callback. |
| src/Worker/Grpc/GrpcDurableTaskWorkerOptions.cs | Adds internal options for deadlines, silent-disconnect timeout, backoff, and recreation thresholds. |
| src/Worker/Grpc/GrpcDurableTaskWorker.cs | Wraps processor execution in a loop and attempts channel recreation when requested. |
| src/Worker/Grpc/GrpcDurableTaskWorker.Processor.cs | Implements Hello deadline, failure tracking, backoff, silent disconnect signaling, and HealthPing trace logging. |
| src/Worker/AzureManaged/DurableTaskSchedulerWorkerExtensions.cs | Adds cached-channel recreation with CAS swap and deferred channel disposal. |
| src/Client/Grpc/Logs.cs | Adds client-side channel recreation logs. |
| src/Client/Grpc/Internal/InternalOptionsExtensions.cs | Adds internal extension method to set a client channel recreator callback. |
| src/Client/Grpc/GrpcDurableTaskClientOptions.cs | Adds internal client options for recreation threshold and cooldown interval. |
| src/Client/Grpc/GrpcDurableTaskClient.cs | Wraps the call invoker with a recreating invoker when enabled; owns lifecycle for Address-created channels. |
| src/Client/Grpc/ChannelRecreatingCallInvoker.cs | New CallInvoker wrapper that recreates channels after consecutive transport failures. |
| src/Client/AzureManaged/DurableTaskSchedulerClientExtensions.cs | Adds cached-channel recreation with CAS swap and deferred channel disposal for AzureManaged clients. |
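The full-jitter exponential backoff exercised by ReconnectBackoffTests.cs can be sketched in a language-agnostic way. The Python below is an illustrative sketch only, not the C# implementation; the function name and the base/cap defaults are assumptions:

```python
import random

def reconnect_delay(attempt: int, base: float = 0.25, cap: float = 30.0) -> float:
    """Capped exponential backoff with full jitter.

    attempt is 0-based. The uncapped ceiling doubles per attempt, and the
    actual delay is drawn uniformly from [0, min(cap, base * 2**attempt)],
    which decorrelates reconnect storms across workers. A non-positive cap
    clamps the result to zero, matching the behavior the PR documents for
    ReconnectBackoff.
    """
    if cap <= 0:
        return 0.0
    exponent = min(attempt, 60)  # avoid float overflow; 2**60 dwarfs any sane cap
    ceiling = min(cap, base * (2.0 ** exponent))
    return random.uniform(0.0, ceiling)
```

Full jitter (rather than equal or decorrelated jitter) keeps the expected delay at half the ceiling while spreading reconnects across the whole window.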
Comments suppressed due to low confidence (1)
src/Worker/Grpc/GrpcDurableTaskWorker.Processor.cs:427
- ProcessWorkItemsAsync can enter a tight loop if the server completes the work-item stream normally (ReadAllAsync finishes without throwing and without any cancellation). In that case, tokenSource is not cancelled, the cancellation check is false, and the outer while-loop restarts another ReadAllAsync on the same completed stream. Consider treating normal stream completion as a disconnect (e.g., return after the await-foreach completes, or break/throw so the caller reconnects) to avoid spinning and to re-establish the stream.
while (!cancellation.IsCancellationRequested)
{
await foreach (P.WorkItem workItem in stream.ResponseStream.ReadAllAsync(tokenSource.Token))
{
if (silentDisconnectEnabled)
{
timeoutSource.CancelAfter(silentDisconnectTimeout);
}
if (!firstMessageObserved)
{
firstMessageObserved = true;
onFirstMessage?.Invoke();
}
if (workItem.RequestCase == P.WorkItem.RequestOneofCase.OrchestratorRequest)
{
this.RunBackgroundTask(
workItem,
() => this.OnRunOrchestratorAsync(
workItem.OrchestratorRequest,
workItem.CompletionToken,
cancellation),
cancellation);
}
else if (workItem.RequestCase == P.WorkItem.RequestOneofCase.ActivityRequest)
{
this.RunBackgroundTask(
workItem,
() => this.OnRunActivityAsync(
workItem.ActivityRequest,
workItem.CompletionToken,
cancellation),
cancellation);
}
else if (workItem.RequestCase == P.WorkItem.RequestOneofCase.EntityRequest)
{
this.RunBackgroundTask(
workItem,
() => this.OnRunEntityBatchAsync(workItem.EntityRequest.ToEntityBatchRequest(), cancellation),
cancellation);
}
else if (workItem.RequestCase == P.WorkItem.RequestOneofCase.EntityRequestV2)
{
workItem.EntityRequestV2.ToEntityBatchRequest(
out EntityBatchRequest batchRequest,
out List<P.OperationInfo> operationInfos);
this.RunBackgroundTask(
workItem,
() => this.OnRunEntityBatchAsync(
batchRequest,
cancellation,
workItem.CompletionToken,
operationInfos),
cancellation);
}
else if (workItem.RequestCase == P.WorkItem.RequestOneofCase.HealthPing)
{
// Health pings are heartbeat-only signals from the backend; the silent-disconnect
// timer reset above is the actionable behavior. Logging at Trace allows operators
// to confirm liveness without flooding info-level telemetry.
this.Logger.ReceivedHealthPing();
}
else
{
this.Logger.UnexpectedWorkItemType(workItem.RequestCase.ToString());
}
}
if (tokenSource.IsCancellationRequested)
{
// The token has been cancelled, which means either:
// 1. The broader 'cancellation' was triggered; return here to start a graceful shutdown.
// 2. The timeoutSource was triggered; return here to trigger a reconnect to the backend.
if (!cancellation.IsCancellationRequested)
{
// Since the cancellation came from the timeout, log a warning.
this.Logger.ConnectionTimeout();
onSilentDisconnect?.Invoke();
}
return;
}
}
…ic catches
…out default
Worker: ProcessWorkItemsAsync no longer wraps the await foreach in an outer loop. IAsyncStreamReader is single-use, so on a graceful server close (HTTP/2 GOAWAY + OK trailers during a DTS rolling upgrade) the foreach exited cleanly and the outer while re-entered an already-exhausted reader, tight-spinning until the 120s silent-disconnect timer eventually fired and the resulting OCE surfaced as a generic UnexpectedError. New flow: after the foreach exits, explicitly distinguish shutdown, timeout, and peer-initiated close. Peer-initiated close is logged as StreamEndedByPeer (EventId 76, Information) and drives the channel-poisoned counter so repeated drains escalate to recreate.

Client: ChannelRecreatingCallInvoker now reads/writes the state field with Volatile.Read/Volatile.Write so call-site threads observe the post-recreate (channel, invoker) pair without torn state. When the invoker owns the channel (self-Address path), RecreateAsync now schedules deferred disposal of the old channel so repeated recreates do not leak HTTP handlers / sockets on NET6+.
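The shutdown / timeout / peer-close distinction described in this commit can be sketched as a small classifier. This Python is illustrative only (the enum and function names are hypothetical; the outcome names follow the commit message):

```python
from enum import Enum, auto

class StreamOutcome(Enum):
    """Why the work-item read loop exited."""
    SHUTDOWN = auto()           # outer cancellation: graceful worker stop
    SILENT_DISCONNECT = auto()  # heartbeat window elapsed: channel likely poisoned
    ENDED_BY_PEER = auto()      # server drained the stream, e.g. rolling upgrade

def classify_stream_end(shutdown_requested: bool, timeout_fired: bool) -> StreamOutcome:
    """A gRPC server-streaming reader is single-use, so once the read loop
    exits, every path must map to an explicit outcome instead of re-entering
    the exhausted reader (the tight-spin bug this commit fixes)."""
    if shutdown_requested:
        # Outer cancellation wins even if the timer also fired.
        return StreamOutcome.SHUTDOWN
    if timeout_fired:
        return StreamOutcome.SILENT_DISCONNECT
    # The read loop completed normally: the peer closed the stream (GOAWAY + OK).
    return StreamOutcome.ENDED_BY_PEER
```

Mapping normal completion to an explicit ENDED_BY_PEER outcome is what lets the caller reconnect instead of spinning on the dead reader.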
- C1: Wrap the worker stream-reader await foreach in a try/catch so that grpc-dotnet's RpcException(Cancelled), raised when the silent-disconnect timer fires, is routed to the silent-disconnect handler. Without this catch the exception unwound to the outer Cancelled handler that explicitly does not poison the channel, leaving the silent-disconnect -> recreate path effectively dead.
- C2: Switch all Lazy<GrpcChannel> instances in the AzureManaged worker and client extensions to LazyThreadSafetyMode.PublicationOnly so a transient CreateChannel failure does not permanently poison the cache slot for the lifetime of the process.
- I1: Track the most recently observed channel inside the worker's ExecuteAsync and pass it to TryRecreateChannelAsync instead of the never-updated this.grpcOptions.Channel field. The stale field caused the recreator's "peer already swapped" branch to be skipped on every cycle.
- I3+I4: Add a class-level disposalCts to ChannelRecreatingCallInvoker that is cancelled in DisposeAsync and linked into the recreator's CancellationToken. The recreate task now re-checks the disposed flag before publishing a freshly created channel, and shuts the new channel down if it loses the race; this prevents leaking a freshly created channel after disposal and lets DisposeAsync abort an in-flight recreate.
- I5: Only treat a graceful stream-end as a poison signal when no work-item messages were observed on the stream. A drained stream that successfully delivered work is healthy backend rolling-upgrade behavior; counting it would let long-lived processes accumulate spurious poison credits.
- I6: Reset the consecutive-failure and reconnect-attempt counters on Unauthenticated. A status reply is proof the underlying transport is healthy, so prior transport failures should not combine with a later transient blip to trip the channel recreate.
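The C2 fix relies on LazyThreadSafetyMode.PublicationOnly semantics: a factory exception is never cached, so the next reader retries. A minimal Python approximation of that retry-on-failure memoization (the class name and structure are hypothetical; .NET's Lazy&lt;T&gt; is the real mechanism):

```python
import threading

class RetryableLazy:
    """Approximates LazyThreadSafetyMode.PublicationOnly: the factory runs
    outside the lock, and an exception publishes nothing, so the next caller
    retries instead of the slot staying poisoned for the process lifetime."""

    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._created = False
        self._value = None

    @property
    def value(self):
        if self._created:
            return self._value
        candidate = self._factory()  # may raise; a failure is never cached
        with self._lock:
            if not self._created:  # first successful publication wins
                self._value = candidate
                self._created = True
        return self._value
```

Contrast with ExecutionAndPublication mode, where the first exception is cached and every later .Value access rethrows it, which is exactly the poisoned-slot behavior C2 removes.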
- M2: Add an explanatory comment in ChannelRecreatingCallInvoker.RecordFailure about why only Unavailable + non-long-poll DeadlineExceeded count toward the recreate threshold.
- M3: Drop the unnecessary 'await Task.Yield()' at the end of both AzureManaged RecreateChannelAsync paths. Convert the methods to non-async methods returning Task.FromResult to avoid CS1998.
- I2: Expose 'SetHelloDeadline' and 'SetSilentDisconnectTimeout' in 'Microsoft.DurableTask.Worker.Grpc.Internal.InternalOptionsExtensions' so the PR description's configurability claims are actually reachable from outside the assembly. The pattern matches the existing 'SetChannelRecreator'.
- M1 (deferred): A regression test for the silent-disconnect path requires faking 'AsyncServerStreamingCall<T>' and 'IAsyncStreamReader<T>'. Skipped to keep the change surgical; the fix is small and verified by inspection.
Extracts the work-item stream-consume + termination-classification logic from GrpcDurableTaskWorker.Processor.ProcessWorkItemsAsync into a new internal helper WorkItemStreamConsumer.ConsumeAsync. The helper owns the linked-token wiring, the await foreach, and the three-arm catch chain that distinguishes:
- outer cancellation (Shutdown)
- silent-disconnect timeout surfaced as OCE (SilentDisconnect)
- silent-disconnect timeout surfaced as RpcException(Cancelled) from a graceful drain (SilentDisconnect)

ProcessWorkItemsAsync becomes a thin coordinator that dispatches per-item work via a new DispatchWorkItem method and switches on the helper's outcome to decide post-loop logging and channel-poisoned signaling.

Adds 9 unit tests, including the C1 regression test HangingStream_SurfacingRpcCancelled_ReturnsSilentDisconnect, which fails on the pre-fix code: prior to the fix, the helper would have propagated the RpcException(Cancelled) past the silent-disconnect classification, leaving the caller without the channelLikelyPoisoned signal, and the channel-recreate counter would not advance.

Behavior is preserved exactly. Outer cancellation surfaced as RpcException(Cancelled) still propagates to ExecuteAsync's outer catch chain (documented in OuterCancellation_WithRpcCancelledFromStream_PropagatesException).
Previously the AzureManaged worker and client recreators wrote the new Lazy<GrpcChannel> into the cache via TryUpdate, scheduled deferred disposal of the old channel, and only then materialized newLazy.Value. If CreateChannel throws inside .Value, the cache is left pointing at a permanently failing Lazy and the still-healthy old channel has already been queued for shutdown: an unrecoverable state for that cache key.

Now we call source.CreateChannel() first and only TryUpdate after the new channel is proven created. If the swap loses the race, we dispose the freshly created channel so it does not leak.
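The create-first, swap-second ordering this commit introduces can be sketched as follows. This Python is a sketch only: the real code uses ConcurrentDictionary<string, Lazy<GrpcChannel>>.TryUpdate with deferred disposal, a lock stands in for the CAS here, and 'dispose' runs immediately where the C# defers it:

```python
import threading

class ChannelCache:
    """Illustrative channel-slot cache with safe recreate ordering."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._slots: dict = {}

    def recreate(self, key, expected, create, dispose):
        # Create BEFORE swapping: if this raises, the cache still points at
        # the old, healthy channel (the unrecoverable-state bug fixed above).
        new_channel = create()
        with self._lock:  # stands in for ConcurrentDictionary.TryUpdate
            if self._slots.get(key) is expected:
                self._slots[key] = new_channel
                dispose(expected)  # old channel retired only after the swap
                return new_channel
        # Lost the race with a concurrent recreate: don't leak the fresh channel.
        dispose(new_channel)
        return self._slots.get(key)
```

The losing-race branch mirrors the commit's "dispose the freshly-created channel" rule, so neither the winner's nor the loser's channel can leak.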
- Client/Grpc/GrpcDurableTaskClient.GetCallInvoker: when an externally supplied Channel is used with channel recreation enabled, return an AsyncDisposable that disposes the ChannelRecreatingCallInvoker wrapper. Without this, the wrapper's CancellationTokenSource and any in-flight recreate task outlive the client. The wrapper's DisposeAsync is a no-op for the channel itself when ownsChannel is false, so the externally owned channel is not affected.
- Client/Grpc/ChannelRecreatingCallInvoker.RecordFailure: a non-counted RpcException status (NotFound, InvalidArgument, FailedPrecondition, etc.) now resets consecutiveFailures. Any gRPC status reply is proof the transport is healthy enough to deliver round-trips, so an app-level error sequence should not allow a prior transport-failure count to accumulate across it and combine with a later blip to trip a false-positive recreate.
- Worker/Grpc/ReconnectBackoff: the doc comment now mentions that a non-positive cap also returns TimeSpan.Zero (matching the Math.Max clamp behavior).
- Worker/Grpc/WorkItemStreamConsumer: removed an unused 'using System.Runtime.CompilerServices;'.
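The RecordFailure policy described above (count only transport-shaped failures, stay neutral on expected long-poll deadlines, reset on any other status reply) can be sketched like this. The status constants and the threshold default are hypothetical stand-ins, not the real Grpc.Core.StatusCode values:

```python
# Hypothetical status codes standing in for gRPC status values.
UNAVAILABLE, DEADLINE_EXCEEDED, UNAUTHENTICATED, NOT_FOUND = range(4)

class FailureCounter:
    """Illustrative sketch of the failure-counting policy."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.consecutive = 0

    def record(self, status: int, is_long_poll: bool) -> bool:
        if status == DEADLINE_EXCEEDED and is_long_poll:
            return False  # expected long-poll expiry: neither counts nor resets
        if status in (UNAVAILABLE, DEADLINE_EXCEEDED):
            self.consecutive += 1  # transport-shaped failure
        else:
            # Any other status reply proves the transport delivered a
            # round-trip, so app-level errors must not accumulate credit.
            self.consecutive = 0
        return self.consecutive >= self.threshold  # True => trigger recreate
```

The reset-on-other-status branch is what prevents the false-positive recreate the bullet above describes: an app-level error run between two transport blips zeroes the streak.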
- AzureManaged worker/client: don't return a freshly created channel that has been scheduled for deferred disposal when the cache slot disappears (concurrent DisposeAsync); throw ObjectDisposedException instead.
- WorkItemStreamConsumer: clamp silentDisconnectTimeout to int.MaxValue milliseconds before passing it to CancellationTokenSource.CancelAfter (avoids an ArgumentOutOfRangeException for values over 24 days).
- Processor.ConnectAsync: clamp DateTime.UtcNow.Add(helloDeadline) at DateTime.MaxValue so a misconfigured HelloDeadline cannot crash the connect loop.
- Processor: clamp delay.TotalMilliseconds to int.MaxValue before logging ReconnectBackoff to avoid integer overflow.
- Processor: rename the 'onSilentDisconnect' callback to 'onChannelLikelyPoisoned' to reflect that it also fires on graceful empty drains, not only on silent-disconnect timeouts.
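The clamps above share one saturating pattern: map an arbitrarily large (or negative) duration into the range a downstream API accepts instead of letting conversion throw or overflow. A minimal Python sketch (the function name is hypothetical; the int.MaxValue bound matches CancellationTokenSource.CancelAfter's millisecond limit):

```python
INT32_MAX_MS = 2**31 - 1  # CancelAfter's upper bound, ~24.8 days in milliseconds

def clamp_timeout_ms(timeout_ms: float) -> int:
    """Saturate rather than overflow: timeouts beyond the representable
    range clamp to int.MaxValue instead of raising, and negative inputs
    clamp to zero in this sketch."""
    return int(min(max(timeout_ms, 0.0), float(INT32_MAX_MS)))
```

The same idea applies to the DateTime.MaxValue hello-deadline clamp: saturate at the type's ceiling so a misconfigured value degrades to "effectively infinite" instead of crashing the loop.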
sophiatev previously requested changes (Apr 22, 2026)
sophiatev approved these changes (Apr 23, 2026)
torosent approved these changes (Apr 23, 2026)
Summary
Hardens the gRPC worker and client against silent disconnects and stale transport state when the Durable Task Scheduler (DTS) backend is replaced, drained, or otherwise leaves callers latched onto a bad HTTP/2 path.
Before this change, a worker or long-lived client could stay pinned to a half-open connection, stale sub-channel state, stale DNS result, or stale routing affinity for the lifetime of the process. In practice that meant a backend replacement or graceful drain could leave callers idle until restart even though the service had recovered.
Motivation / root causes
The main failure modes were:
- HelloAsync on the worker connect path had no deadline, so a half-open handshake could hang forever and prevent the reconnect loop from making progress.
- When a channel latched onto a stale DNS result, stale SocketsHttpHandler state, or backend routing affinity, nothing forced a fresh channel.
- Some failures were surfaced under a misleading status (Unauthenticated) or not surfaced clearly at all (graceful server stream end).
- Timeout arithmetic relied on a double -> long conversion of TimeSpan, which was not well-defined for extreme values.

How this change addresses them

Worker (src/Worker/Grpc)
- A HelloAsync deadline (configurable via GrpcDurableTaskWorkerOptions.SetHelloDeadline()) so a half-open handshake cannot stall reconnect forever.
- Unavailable, hello DeadlineExceeded, silent stream disconnects, and empty graceful drains are treated as transport failures that increment a recreate counter.
- Unauthenticated is logged distinctly.
- HealthPing is logged at Trace.
- Graceful peer stream-end is logged as Logs.StreamEndedByPeer.
- Channel recreation is keyed by WorkerId: replace that worker's cached channel, guard against stale callbacks, and defer disposal of the previous channel. This keeps the worker logic focused on the realistic case instead of modeling peer-worker contention as a default scenario.

Client (src/Client/Grpc)
The client does not have a central reconnect loop, so recovery is implemented around the call path instead:
- ChannelRecreatingCallInvoker observes unary RPC outcomes and recreates the channel after repeated transport failures.
- Long-poll DeadlineExceeded calls are excluded from failure counting.
- The MinRecreateInterval conversion to stopwatch ticks now saturates safely for extreme values.

Azure Managed cache wiring (src/{Worker,Client}/AzureManaged)
- Recreator callbacks are wired through the shared channel cache, keyed by WorkerId.

Tests and validation
- DateTime.MaxValue hello-deadline clamp.
- WorkItemStreamConsumer heartbeat-reset timing test that was failing CI.

What this prevents

PR comment follow-ups
- Each cached channel is scoped to a single WorkerId, so peer-worker coordination is not a realistic default scenario.

Breaking change
No breaking change.