Remove cuda event deadlocking issues in device mr tests #1097

Merged


robertmaynard
Contributor

This PR fixes both deadlocking issues: it removes an assumption that std::mutex has fair scheduling, and it works around deadlocks seen when CUDA events are created in very short-lived threads (< 10 ms).
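For readers unfamiliar with the fairness problem: std::mutex makes no guarantee about which waiting thread acquires the lock next, so a test in which one thread repeatedly locks and polls shared state can starve the thread it is waiting on. A common way to avoid this is to block on a std::condition_variable with a descriptively named predicate. The sketch below illustrates that pattern only; it is not the code merged in this PR, and `run_handoff`, `items_available`, `pending`, and `producer_done` are hypothetical names chosen for the example.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical helper (not from the PR): a producer thread hands `count`
// items to a consumer thread. Returns the number of items consumed.
std::size_t run_handoff(std::size_t count)
{
  std::mutex mtx;
  std::condition_variable items_available;  // signalled on new work or on completion
  std::queue<std::size_t> pending;
  bool producer_done   = false;
  std::size_t consumed = 0;

  std::thread producer([&] {
    for (std::size_t i = 0; i < count; ++i) {
      {
        std::lock_guard<std::mutex> lock(mtx);
        pending.push(i);
      }
      items_available.notify_one();
    }
    {
      std::lock_guard<std::mutex> lock(mtx);
      producer_done = true;
    }
    items_available.notify_one();
  });

  std::thread consumer([&] {
    std::unique_lock<std::mutex> lock(mtx);
    while (true) {
      // wait() releases the mutex while blocked, so the producer can always
      // make progress regardless of how the mutex schedules its waiters.
      items_available.wait(lock, [&] { return !pending.empty() || producer_done; });
      while (!pending.empty()) {
        pending.pop();
        ++consumed;
      }
      if (producer_done) { break; }
    }
  });

  producer.join();
  consumer.join();
  return consumed;
}
```

Calling `run_handoff(1000)` should return 1000 on any conforming implementation, because the consumer sleeps instead of competing for the lock. The sketch does not cover the CUDA event workaround.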

@robertmaynard robertmaynard added bug Something isn't working 3 - Ready for review Ready for review by team non-breaking Non-breaking change labels Sep 22, 2022
@robertmaynard robertmaynard requested a review from a team as a code owner September 22, 2022 21:13
@robertmaynard robertmaynard added this to PR-WIP in v22.10 Release via automation Sep 22, 2022
@github-actions github-actions bot added the cpp Pertains to C++ code label Sep 22, 2022
@robertmaynard robertmaynard changed the title Workaround for cuda event deadlocking issues in device mr tests Remove cuda event deadlocking issues in device mr tests Sep 22, 2022
v22.10 Release automation moved this from PR-WIP to PR-Reviewer approved Sep 22, 2022
Member

@harrism harrism left a comment

Thanks for doing this! Just need a more descriptive name for the condition.

tests/mr/device/mr_multithreaded_tests.cpp Outdated (resolved)
tests/mr/device/mr_multithreaded_tests.cpp Outdated (resolved)
tests/mr/device/mr_multithreaded_tests.cpp Outdated (resolved)
@harrism
Member

harrism commented Sep 23, 2022

@ajschmidt8 please test on ARM before we merge.

Contributor

@wence- wence- left a comment

Thanks!

tests/mr/device/mr_multithreaded_tests.cpp Outdated (resolved)
tests/mr/device/mr_multithreaded_tests.cpp (resolved)
@ajschmidt8
Member

I never tested the problematic code outside of CI, so I have no way of verifying whether this fix works as intended. I'll defer to the devs for the approvals here. If this fix looks good to everyone else, let's get it merged and Ops will add these changes to our GitHub Actions POC PR to see if we still experience any issues.

@ajschmidt8
Member

@gpucibot merge

@rapids-bot rapids-bot bot merged commit e0fd8eb into rapidsai:branch-22.10 Sep 27, 2022
v22.10 Release automation moved this from PR-Reviewer approved to Done Sep 27, 2022
@robertmaynard robertmaynard deleted the correct_DEVICE_MR_TEST_deadlock branch December 27, 2022 15:41
harrism added a commit to miscco/rmm that referenced this pull request Nov 16, 2023
4 participants