Skip to content

Conversation

ejona86
Copy link
Member

@ejona86 ejona86 commented Apr 13, 2023

The problem was one hedge was committed before another had drained start(). This was not testable because HedgingRunnable checks whether scheduledHedgingRef is cancelled, which is racy, but there's no way to deterministically trigger either race.

The same problem couldn't be triggered with retries because only one attempt will be draining at a time. Retries with cancellation also couldn't trigger it, for the surprising reason that the noop stream used in cancel() wasn't considered drained.

This commit marks the noop stream as drained with cancel(), which allows memory to be garbage collected sooner and exposes the race for tests. That then showed the stream as hanging, because inFlightSubStreams wasn't being decremented.

Fixes #9185

Backport of #10007

The problem was one hedge was committed before another had drained
start(). This was not testable because HedgingRunnable checks whether
scheduledHedgingRef is cancelled, which is racy, but there's no way to
deterministically trigger either race.

The same problem couldn't be triggered with retries because only one
attempt will be draining at a time. Retries with cancellation also
couldn't trigger it, for the surprising reason that the noop stream used
in cancel() wasn't considered drained.

This commit marks the noop stream as drained with cancel(), which allows
memory to be garbage collected sooner and exposes the race for tests.
That then showed the stream as hanging, because inFlightSubStreams
wasn't being decremented.

Fixes grpc#9185
@ejona86 ejona86 requested a review from YifeiZhuang April 13, 2023 16:20
@ejona86 ejona86 merged commit eaa4ea7 into grpc:v1.53.x Apr 13, 2023
@ejona86 ejona86 deleted the backport-hedge-npe-1.53 branch April 13, 2023 17:11
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 13, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants