[Object manager] don't abort entire pull request on race condition from concurrent chunk receive - #2 #19216
Why are these changes needed?
This PR re-applies d12e35c, and fixes the issue discovered after the original reverted commit.
#18955 contains the background information of the original commit.
The original commit can cause threads to get stuck under the following condition:
Eventually an object transfer would not complete, likely because more threads got stuck in a limbo state like request 3. Hence the test stalled.
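To illustrate the behavior named in the PR title, here is a minimal Python sketch. It is not Ray's object manager code; every name in it (PullState, receive_chunk, BufferCreateError, make_flaky_create_buffer) is hypothetical. It assumes that a chunk whose buffer cannot be created is simply dropped and may be retried later, while the pull as a whole stays active.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


class BufferCreateError(Exception):
    """Hypothetical error: creating the receive buffer failed (e.g. lost a race)."""


@dataclass
class PullState:
    """Hypothetical bookkeeping for one in-flight object pull."""
    num_chunks: int
    received: Set[int] = field(default_factory=set)
    aborted: bool = False

    @property
    def complete(self) -> bool:
        return len(self.received) == self.num_chunks


def receive_chunk(state: PullState, chunk_index: int,
                  create_buffer: Callable[[int], None]) -> bool:
    """Handle one incoming chunk; return True if it was stored.

    A buffer-creation failure drops only this chunk. The pull is NOT
    marked aborted, so the same chunk can be delivered again later.
    """
    if state.aborted or chunk_index in state.received:
        return False
    try:
        create_buffer(chunk_index)
    except BufferCreateError:
        return False  # drop this chunk only; leave state.aborted untouched
    state.received.add(chunk_index)
    return True


def make_flaky_create_buffer(fail_first_n: int) -> Callable[[int], None]:
    """Hypothetical failure injector: the first N buffer creations fail."""
    remaining = {"failures": fail_first_n}

    def create_buffer(chunk_index: int) -> None:
        if remaining["failures"] > 0:
            remaining["failures"] -= 1
            raise BufferCreateError(f"injected failure for chunk {chunk_index}")

    return create_buffer


state = PullState(num_chunks=3)
create_buffer = make_flaky_create_buffer(fail_first_n=1)

# Chunk 0 hits the injected failure; chunks 1 and 2 are stored normally.
first_pass = [receive_chunk(state, i, create_buffer) for i in (0, 1, 2)]

# Chunk 0 is retried and succeeds, so the pull completes without an abort.
retry_ok = receive_chunk(state, 0, create_buffer)
```

The injected failure stands in for the race; a unit test built this way would only exercise the bookkeeping, not the real threading behavior.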
The original change and its fix in this PR passed 3 consecutive dask_on_ray_large_scale_test_no_spilling runs. For now we will rely on this nightly test to catch similar issues in the future. If we can inject failures into buffer creation, this issue might be reproducible in unit tests too.
Related issue number
#18062
Checks
I've run scripts/format.sh to lint the changes in this PR.