core: fix retriablestream deadlock #10386
Merged
fix #10314
The deadlock can be reproduced with the UT in this change, and diff:
The issue was that two streams on two different transports each held their own transport lock while waiting for the other's, resulting in a deadlock. In this particular case, one subListener.close() creates a new substream on another transport, which requires that transport's thread. Meanwhile, around another subListener.close(), a headersRead() is received that tries to cancel all other streams, which requires the corresponding transport locks. It is believed that this cannot happen in netty, only in okhttp.
Solutions
We should break the deadlock in both places.
This fix is in headersRead(): the other streams are now cancelled from the call executor thread instead of from the transport thread.
The other part of the fix is to have createSubstream run from the call executor. This will be fixed in a follow-up PR.
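To illustrate the idea behind the headersRead() part of the fix, here is a minimal, self-contained sketch. It is not the RetriableStream code: FakeTransport, onHeaders, and runWithDeferredCancel are hypothetical names, and the locking model (a callback runs while its transport's lock is held, and cancelling a stream needs the owning transport's lock) is a simplifying assumption based on the description above. Each callback hands the cancel of the other stream to a separate call executor rather than running it inline, so neither transport thread tries to take the other's lock while holding its own.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class DeferredCancelSketch {
  /** Hypothetical stand-in for a transport: callbacks and cancels both need its lock. */
  static final class FakeTransport {
    private final Object lock = new Object();
    private boolean cancelled;

    /** Simulates the transport delivering headers; the callback runs under the lock. */
    void deliverHeaders(Runnable onHeaders) {
      synchronized (lock) {
        onHeaders.run();
      }
    }

    void cancelStream() {
      synchronized (lock) {
        cancelled = true;
      }
    }

    boolean isCancelled() {
      synchronized (lock) {
        return cancelled;
      }
    }
  }

  /** Returns true if both cross-transport cancels complete within the timeout. */
  static boolean runWithDeferredCancel() {
    FakeTransport a = new FakeTransport();
    FakeTransport b = new FakeTransport();
    ExecutorService transportThreads = Executors.newFixedThreadPool(2);
    ExecutorService callExecutor = Executors.newSingleThreadExecutor();
    CountDownLatch bothHoldLock = new CountDownLatch(2);
    CountDownLatch cancelsDone = new CountDownLatch(2);
    try {
      transportThreads.execute(() -> a.deliverHeaders(
          () -> onHeaders(b, callExecutor, bothHoldLock, cancelsDone)));
      transportThreads.execute(() -> b.deliverHeaders(
          () -> onHeaders(a, callExecutor, bothHoldLock, cancelsDone)));
      return cancelsDone.await(5, TimeUnit.SECONDS) && a.isCancelled() && b.isCancelled();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      transportThreads.shutdownNow();
      callExecutor.shutdownNow();
    }
  }

  /** Runs under the caller's transport lock; must not take the other transport's lock. */
  private static void onHeaders(FakeTransport other, ExecutorService callExecutor,
      CountDownLatch bothHoldLock, CountDownLatch cancelsDone) {
    // Wait until both callbacks hold their own lock, forcing the interleaving
    // that would deadlock if other.cancelStream() were called inline here.
    bothHoldLock.countDown();
    try {
      bothHoldLock.await(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    // Hand the cancel off; it runs after this callback returns and releases its lock.
    callExecutor.execute(() -> {
      other.cancelStream();
      cancelsDone.countDown();
    });
  }

  public static void main(String[] args) {
    System.out.println("completed without deadlock: " + runWithDeferredCancel());
  }
}
```

In this sketch, replacing the callExecutor.execute(...) hand-off with a direct other.cancelStream() call would leave each transport thread holding its own lock while blocked on the other's, which is the shape of the deadlock described above.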
Tests
Global TAP run results currently do not show bad signals. (The failures present are due to issues that were already failing.)