Skip to content

Conversation

@ejona86
Copy link
Member

@ejona86 ejona86 commented Nov 25, 2025

This fixes a race where RPCs could fail with "UNAVAILABLE: Channel shutdown invoked" even though they were created before channel.shutdown().

This basically adopts the internalStart() logic from DelayedStream, although the stream is a bit different because it has APIs that can be called before start() and doesn't need to handle cancel() without start().

The ManagedChannelImpltest had the number of due tasks increase because start() running earlier creates a DelayedStream. Previously the stream wasn't created until runDueTasks() so the mockPicker had already been installed and it could use a real stream from the beginning. But that's specific to the test; in practice it'd be a delayed stream before and after this change.

See #12536

This fixes a race where RPCs could fail with "UNAVAILABLE: Channel
shutdown invoked" even though they were created before
channel.shutdown().

This basically adopts the internalStart() logic from DelayedStream,
although the stream is a bit different because it has APIs that can be
called before start() and doesn't need to handle cancel() without
start().

The ManagedChannelImpltest had the number of due tasks increase because
start() running earlier creates a DelayedStream. Previously the stream
wasn't created until runDueTasks() so the mockPicker had already been
installed and it could use a real stream from the beginning. But that's
specific to the test; in practice it'd be a delayed stream before and
after this change.

See grpc#12536
Copy link
Member

@shivaspeaks shivaspeaks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is nice and great. But..

@shivaspeaks
Copy link
Member

Read your comment:

(But this PR is still appropriate, for the other reasons I mentioned earlier.)

If this PR solves the root cause for race, then why are we not reverting prev PR #12536, which was just fixing the test to make it green?
Waiting for the response to come using respFuture.get(5, SECONDS); is not ideal right?

@ejona86
Copy link
Member Author

ejona86 commented Nov 26, 2025

If this PR solves the root cause for race, then why are we not reverting prev PR #12536, which was just fixing the test to make it green?

Because the test was broken in other ways. I had written earlier in that PR:

but shutting down the executor isn't safe until the RPC completes (so if awaitTermination() returns false, then things are probably busted)

The flake that I had reproduced is solved by this PR. But the test still had issues that are fixed by the other PR.

@ejona86 ejona86 merged commit 02e98a8 into grpc:master Nov 26, 2025
17 checks passed
@ejona86 ejona86 deleted the core-delayedcall-shutdown-race branch November 26, 2025 21:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants