Don't rely on synchronization behavior of default stream in CUDA and HIP #5391
It would be helpful to document or reference the particular behavior in the PR description. I believe this relates to kernel launches on CUDA/HIP's default stream implicitly synchronizing more broadly than launches on any explicitly created stream?
Also, this will pose a merge conflict with #5390 if/when that gets cherry-picked to |
This was still just a draft. 🙂 I updated the description now, though.
For the record, at the moment neither the CUDA nor the HIP builds pass.
Retest this please.
test_cuda_spaces_int_value<<<1, 1>>>(uvm_ptr);
Kokkos::Cuda().fence();
Ah ok this makes sense.
Yeah, I couldn't reproduce on |
34c3b57 to 123e4db
core/unit_test/TestAbort.hpp
Outdated
ExecutionSpace exec;
Kokkos::parallel_for(Kokkos::RangePolicy<ExecutionSpace>(exec, 0, 1),
                     *this);
exec.fence();
I'm still not quite sure why the HIP CI was timing out without this change, and I'm not sure how much we care. I couldn't reproduce on crusher or spock.
Also, @JBludau can only reproduce with ROCm < 5.
So you are saying this is not hanging with 5.2?
OK, let's revert that abort test change again and move testing to ROCm 5.2.
We cannot require 5.2 yet unless your machine has been updated.
123e4db to 63c964a
As expected this works after updating CI to |
Waiting for #5416.
63c964a to 5671245
Rebased after #5416 has been merged.
5671245 to 397a6bc
Rebased after #5410 has been merged.
Only the
Since only CUDA and HIP were changed, this is clearly unrelated.
https://docs.nvidia.com/cuda/cuda-runtime-api/stream-sync-behavior.html:
and HIP behaves similarly, see ROCm/HIP#129 (comment).
As discussed on Slack, the default execution space instance shouldn't have any special synchronization behavior with respect to other execution space instances. Hence, this pull request also creates a stream for the singleton that is used for the default execution space instance.
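To illustrate the idea in plain CUDA, here is a minimal sketch of launching a kernel on an explicitly created stream and synchronizing only that stream, instead of relying on the default stream. This is not the code from this pull request: the kernel `set_value` and the helper `run_on_explicit_stream` are hypothetical names chosen for the example, and error handling is reduced to returning -1.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: writes a value into managed memory.
__global__ void set_value(int* ptr, int value) { *ptr = value; }

// Launches the kernel on an explicitly created stream rather than the
// default stream, then waits on that stream only.
// Returns the value read back on the host, or -1 on any CUDA error.
int run_on_explicit_stream() {
  int* ptr = nullptr;
  if (cudaMallocManaged(&ptr, sizeof(int)) != cudaSuccess) return -1;
  *ptr = 0;  // host write before any kernel touches the allocation

  cudaStream_t stream;
  if (cudaStreamCreate(&stream) != cudaSuccess) {
    cudaFree(ptr);
    return -1;
  }

  // The fourth launch parameter selects the stream; omitting it would
  // launch on the default stream.
  set_value<<<1, 1, 0, stream>>>(ptr, 42);

  // Explicit synchronization on the stream we launched on, so nothing here
  // depends on the default stream's implicit synchronization.
  int result = -1;
  if (cudaGetLastError() == cudaSuccess &&
      cudaStreamSynchronize(stream) == cudaSuccess) {
    result = *ptr;
  }

  cudaStreamDestroy(stream);
  cudaFree(ptr);
  return result;
}
```

Calling `run_on_explicit_stream()` from `main` on a machine with a working CUDA device should return the value written by the kernel. Note that a stream created with `cudaStreamCreate` is a blocking stream and still synchronizes with the legacy default stream; if full independence is wanted, `cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking)` is the usual alternative. Which of the two fits the Kokkos singleton is not something this sketch tries to answer.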