Add support for shuffle reduction for the HIP backend #3154
Conversation
This looks OK to me, comparing with the corresponding
Correct.
Looks OK.
if ((blockDim.x * blockDim.y) > i) {
  value_type tmp = Kokkos::Experimental::shfl_down(value, i, warp_size);
  if (id + i < gridDim.x) join(value, tmp);
}
active += __ballot(1);
__syncthreads();
Wait, why do you do the syncthreads here? This is within a warp?
@@ -231,6 +232,7 @@ __device__ inline void hip_intra_warp_reduction(
// blockDim.y)
if (threadIdx.y + shift < max_active_thread) reducer.join(result, tmp);
shift *= 2;
__syncthreads();
Again, why syncthreads?
if ((blockDim.x * blockDim.y) > i) {
  value_type tmp = Kokkos::Experimental::shfl_down(value, i, warp_size);
  if (id + i < gridDim.x) reducer.join(value, tmp);
}
active += __ballot(1);
__syncthreads();
Why syncthreads?
There are syncthreads in the warp level reductions. Why?
Because I have a race condition at the warp level :( I plan to bring the problem up with AMD at our next meeting with them. I want to point out that there is a similar problem with CUDA Clang, and there you fixed it by calling a dummy
OK, this is fine. I also checked that the ThreadVector reduce doesn't hit the syncthreads, which would be a deadlock.
I haven't done any performance comparison yet, but I have added a FIXME_HIP_PERFORMANCE comment so we don't forget to tune the algorithm.