Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for shuffle reduction for the HIP backend #3154

Merged
merged 1 commit into from
Jul 11, 2020

Conversation

Rombur
Copy link
Member

@Rombur Rombur commented Jul 6, 2020

I haven't done any performance comparison yet but I have added a FIXME_HIP_PERFORMANCE comment so we don't forget to tune the algorithm.

@masterleinad masterleinad self-requested a review July 6, 2020 14:57
core/src/HIP/Kokkos_HIP_Parallel_Range.hpp Outdated Show resolved Hide resolved
core/src/HIP/Kokkos_HIP_Parallel_Range.hpp Outdated Show resolved Hide resolved
core/src/HIP/Kokkos_HIP_Parallel_Range.hpp Outdated Show resolved Hide resolved
core/src/HIP/Kokkos_HIP_ReduceScan.hpp Outdated Show resolved Hide resolved
@masterleinad
Copy link
Contributor

This looks OK to me comparing with the corresponding CUDA implementation and the other HIP implementation but I am not quite clear what the motivation for the pull request is.
AFAICT, it doesn't add any new functionality but should give better performance in the end?

@Rombur
Copy link
Member Author

Rombur commented Jul 6, 2020

AFAICT, it doesn't add any new functionality but should give better performance in the end?

Correct

Copy link
Contributor

@masterleinad masterleinad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK.

@crtrott crtrott added the Blocks Promotion Overview issue for release-blocking bugs label Jul 8, 2020
if ((blockDim.x * blockDim.y) > i) {
value_type tmp = Kokkos::Experimental::shfl_down(value, i, warp_size);
if (id + i < gridDim.x) join(value, tmp);
}
active += __ballot(1);
__syncthreads();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait why do you do the syncthreads here? THis is within a warp?

@@ -231,6 +232,7 @@ __device__ inline void hip_intra_warp_reduction(
// blockDim.y)
if (threadIdx.y + shift < max_active_thread) reducer.join(result, tmp);
shift *= 2;
__syncthreads();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again why syncthreads?

if ((blockDim.x * blockDim.y) > i) {
value_type tmp = Kokkos::Experimental::shfl_down(value, i, warp_size);
if (id + i < gridDim.x) reducer.join(value, tmp);
}
active += __ballot(1);
__syncthreads();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why syncthreads?

Copy link
Member

@crtrott crtrott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are syncthreads in the warp level reductions. Why?

@Rombur
Copy link
Member Author

Rombur commented Jul 9, 2020

There are syncthreads in the warp level reductions. Why?

Because I have a race condition at the warp level :( I plan to bring the problem with AMD at our next meeting with them. I want to point out that there is a similar problem with CUDA Clang and there you fixed it by calling dummy __ballot(1) but that doesn't work for me. See #941

Copy link
Member

@crtrott crtrott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok this is fine, also checked that ThreadVector reduce doesn't hit the syncthreads which would be a deadlock.

@crtrott crtrott merged commit e4a2778 into kokkos:develop Jul 11, 2020
@Rombur Rombur deleted the hip_reduce_shfl branch July 15, 2020 14:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Blocks Promotion Overview issue for release-blocking bugs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants