-
Notifications
You must be signed in to change notification settings - Fork 407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix race condition in HIP backend #3467
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can't call __syncthreads inside of vector level operations. Maybe a threadfence will work instead? Or we could try a local __shared variable reduction instead of shfl?
core/src/HIP/Kokkos_HIP_Team.hpp
Outdated
@@ -297,6 +300,9 @@ class HIPTeamMember { | |||
for (int i = blockDim.x; (i >>= 1);) { | |||
::Kokkos::Experimental::Impl::in_place_shfl_down(tmp2, tmp, i, | |||
blockDim.x); | |||
// FIXME_HIP the synchronization point is necessary because of a race | |||
// condition in shuffle | |||
__syncthreads(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can't do this. Not in a vector_reduce.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i.e. this will not be called by all threads in the team .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you try using a memory fence instead?
core/src/HIP/Kokkos_HIP_Team.hpp
Outdated
@@ -308,6 +314,9 @@ class HIPTeamMember { | |||
// and thus different threads could have different results. | |||
|
|||
::Kokkos::Experimental::Impl::in_place_shfl(tmp2, tmp, 0, blockDim.x); | |||
// FIXME_HIP the synchronization point is necessary because of a race | |||
// condition in shuffle | |||
__syncthreads(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't do this in vector_reduce.
core/src/HIP/Kokkos_HIP_Team.hpp
Outdated
@@ -969,6 +981,9 @@ KOKKOS_INLINE_FUNCTION void parallel_scan( | |||
for (int j = 1; j < static_cast<int>(blockDim.x); j <<= 1) { | |||
value_type tmp = 0; | |||
::Kokkos::Experimental::Impl::in_place_shfl_up(tmp, sval, j, blockDim.x); | |||
// FIXME_HIP the synchronization point is necessary because of a race | |||
// condition in shuffle | |||
__syncthreads(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't do this in ThreadVectorRange
core/src/HIP/Kokkos_HIP_Team.hpp
Outdated
@@ -983,6 +998,9 @@ KOKKOS_INLINE_FUNCTION void parallel_scan( | |||
// Accumulate the last value in the inclusive scan: | |||
::Kokkos::Experimental::Impl::in_place_shfl(sval, sval, blockDim.x - 1, | |||
blockDim.x); | |||
// FIXME_HIP the synchronization point is necessary because of a race | |||
// condition in shuffle | |||
__syncthreads(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't do this in ThreadVectorRange
core/src/HIP/Kokkos_HIP_Team.hpp
Outdated
@@ -1025,6 +1043,9 @@ KOKKOS_INLINE_FUNCTION void single( | |||
#ifdef __HIP_DEVICE_COMPILE__ | |||
if (threadIdx.x == 0) lambda(val); | |||
::Kokkos::Experimental::Impl::in_place_shfl(val, val, 0, blockDim.x); | |||
// FIXME_HIP the synchronization point is necessary because of a race | |||
// condition in shuffle | |||
__syncthreads(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't do this in PerThread single struct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about
Experimental::Impl::in_place_shfl(result, value, 0, width); |
core/src/HIP/Kokkos_HIP_Team.hpp
Outdated
@@ -170,6 +170,9 @@ class HIPTeamMember { | |||
::Kokkos::Experimental::Impl::in_place_shfl( | |||
val, tmp, blockDim.x * thread_id, blockDim.x * blockDim.y); | |||
} | |||
// FIXME_HIP the synchronization point is necessary because of a race | |||
// condition in shuffle | |||
__syncthreads(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you not just put the synchronization point in the shuffle function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again this doesn't all doesn't work since these functions are not block synchronous called ...
I have reworked the PR. I have added |
This PR add new
__syncthreads()
after shuffle and move another synchronization just after a shuffle. This allows us to enable one more tests. This should not be necessary but there is a race condition in shuffle with HIP. This bug has already been reported to AMD.