[SYCL][AMDGCN] Fix up and down shuffles and reductions #5359

npmiller · 2022-01-21T13:39:32Z

This patch fixes the group collective implementation for AMDGCN, which
had two main issues, in one place it was calling a regular shuffle
instead of a shuffleUp which ended up breaking the reduction
algorithm. In addition it was also not using the correct interface for
the SPIR-V shuffleUp function.

Which leads to the second part of this patch which fixes the shuffleUp
and shuffleDown functions, mostly for the AMDGCN built-ins but also in
the SYCL header, as the SYCL built-ins were not implemented properly on
top of the SPIR-V built-ins.

At the SYCL level, the shuffleUp and shuffleDown built-ins take a
value to participate in the shuffle and a delta. The delta is used to
compute which thread to take the value from during the shuffle
operation. For shuffleUp it will be substracted from the thread id,
and for shuffleDown it will be added. And so in SYCL this delta must
be defined such as subgroup_local_id - delta falls within [0, subgroup_local_size[ for shuffleUp, and subgroup_local_id + delta
falls within [0, subgroup_local_size[ for shuffleDown.

However in SPIR-V, these built-ins are a bit more complicated and take
two values to participate in the shuffle and support twice the delta
range as the SYCL built-ins. For example for shuffleUp the valid range
for subgroup_local_id - delta is [-subgroup_local_size, subgroup_local_size[ and in this instance if it falls within
[-subgroup_local_size, 0[ the first value will be used to participate
in the shuffle, and if it falls within [0, subgroup_local_size[ the
second value will be used to participate in the shuffle. And it works in
a similar way for shuffleDown.

And so when implementing the SYCL built-ins using the SPIR-V built-ins,
only half of the range can be used in a properly defined way, which
means only one of the value parameters of the SPIR-V built-ins actually
matters. Therefore the SYCL built-ins are implemented passing in the
same value to both value parameters of the SPIR-V built-ins.

The complete definition of the SPIR-V built-ins can be found here:

https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/INTEL/SPV_INTEL_subgroups.asciidoc#instructions

This patch fixes the group collective implementation for AMDGCN, which had two main issues, in one place it was calling a regular `shuffle` instead of a `shuffleUp` which ended up breaking the reduction algorithm. In addition it was also not using the correct interface for the SPIR-V `shuffleUp` function. Which leads to the second part of this patch which fixes the `shuffleUp` and `shuffleDown` functions, mostly for the AMDGCN built-ins but also in the SYCL header, as the SYCL built-ins were not implemented properly on top of the SPIR-V built-ins. At the SYCL level, the `shuffleUp` and `shuffleDown` built-ins take a value to participate in the shuffle and a delta. The delta is used to compute which thread to take the value from during the shuffle operation. For `shuffleUp` it will be substracted from the thread id, and for `shuffleDown` it will be added. And so in SYCL this delta must be defined such as `subgroup_local_id - delta` falls within `[0, subgroup_local_size[` for `shuffleUp`, and `subgroup_local_id + delta` falls within `[0, subgroup_local_size[` for `shuffleDown`. However in SPIR-V, these built-ins are a bit more complicated and take two values to participate in the shuffle and support twice the delta range as the SYCL built-ins. For example for `shuffleUp` the valid range for `subgroup_local_id - delta` is `[-subgroup_local_size, subgroup_local_size[` and in this instance if it falls within `[-subgroup_local_size, 0[` the first value will be used to participate in the shuffle, and if it falls within `[0, subgroup_local_size[` the second value will be used to participate in the shuffle. And it works in a similar way for `shuffleDown`. And so when implementing the SYCL built-ins using the SPIR-V built-ins, only half of the range can be used in a properly defined way, which means only one of the value parameters of the SPIR-V built-ins actually matters. Therefore the SYCL built-ins are implemented passing in the same value to both value parameters of the SPIR-V built-ins. The complete definition of the SPIR-V built-ins can be found here: * https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/INTEL/SPV_INTEL_subgroups.asciidoc#instructions

Pennycook

Good catch. I have some thoughts about long-term direction (see below) but don't want to block this.

Pennycook · 2022-01-21T15:53:04Z

libclc/amdgcn-amdhsa/libspirv/misc/sub_group_shuffle.cl

+  int val;
+  if (index >= 0 && index < size) {
+    val = current;
+  } else if (index < 0 && index > -size) {
+    val = previous;
+    index = index + size;
+  } else {
+    // index out of bounds so return arbitrary data
+    val = current;
+    index = self;
+  }


This looks like a correct implementation of ShuffleUpINTEL to me, but I'm concerned about the extra complexity it introduces. We know that previous and current are always the same variable (if this intrinsic is called from the SYCL headers.

The SPIR-V non-uniform shuffle instructions (e.g. OpGroupNonUniformShuffleUp) are much closer semantically to the functionality required by SYCL, and this PR has me wondering whether it would be a good idea to switch over to those. It might be worth opening an issue or something to track this -- somebody would have to run some experiments to see which backends actually support these instructions, etc.

Agreed, we should definitely investigate the performance of this at some point.

Also for what it's worth the CUDA backend implements this directly in the header using NVPTX built-ins rather than going through the SPIR-V interface. So that mostly would affect the HIP backend and maybe others as well.

I've filed a ticket to keep track of this:

Investigate performance when using OpSubgroupShuffleUpINTEL for SYCL shuffles #5364

npmiller · 2022-01-24T10:46:39Z

The failing CI is because of an "Unexpectedly Passed" test which makes sense because this if fixing an issue, I've updated the XFAIL here:

[SYCL][HIP] Fix shuffles and reductions llvm-test-suite#759

However I was expecting a lot more new passes as locally this fixed a lot more than one test. So there may be another issue that doesn't show up on gfx908, I'm not too sure what the CI is using currently.

Using defines to figure out the wavefront size there is incorrect because libclc is not built for a specific amdgcn version, so it will always default to `64`. Instead use the `__oclc_wavefront64` global variable provided by ROCm, which will be set to a different value depending on the architecture.

Co-authored-by: Alexey Bader <alexey.bader@intel.com>

npmiller · 2022-01-25T14:14:09Z

Added the commits from #5386 to see how testing on the CI goes with both of the PRs.

npmiller · 2022-01-25T15:20:45Z

Okay, so with the changes from #5386 this PR isn't getting any failures and exactly unexpected passes I was expecting from this patch.

Plus it also fixes the broadcast tests disabled in intel/llvm-test-suite#740

So I'm, going to combine the two PRs into this one as we need both for the testing to work and the other patch is fairly small, closing #5386 and adding enabling the broadcast tests to intel/llvm-test-suite#759

This was fixed in the CI by intel/llvm#5359

bader · 2022-01-25T16:49:27Z

Thanks!

steffenlarsen

Good stuff! And an additional 👍 for the thorough description.

These tests are fixed by intel/llvm#5359

These tests are fixed by #5359

These tests are fixed by intel#5359

npmiller requested review from bader and a team as code owners January 21, 2022 13:39

npmiller requested a review from steffenlarsen January 21, 2022 13:39

npmiller mentioned this pull request Jan 21, 2022

[SYCL][HIP] Fix shuffles and reductions intel/llvm-test-suite#759

Merged

bader requested a review from Pennycook January 21, 2022 13:52

Pennycook previously approved these changes Jan 21, 2022

View reviewed changes

npmiller mentioned this pull request Jan 21, 2022

Investigate performance when using OpSubgroupShuffleUpINTEL for SYCL shuffles #5364

Open

npmiller mentioned this pull request Jan 25, 2022

[LIBCLC][AMDGCN] Fix get_max_sub_group_size #5386

Closed

npmiller and others added 2 commits January 25, 2022 14:08

Update libclc/amdgcn-amdhsa/libspirv/workitem/get_max_sub_group_size.cl

ffc0867

Co-authored-by: Alexey Bader <alexey.bader@intel.com>

npmiller dismissed Pennycook’s stale review via ffc0867 January 25, 2022 14:12

npmiller added a commit to npmiller/llvm-test-suite that referenced this pull request Jan 25, 2022

[SYCL][HIP] Re-enable broadcast tests

ba423d2

This was fixed in the CI by intel/llvm#5359

bader requested a review from Pennycook January 25, 2022 16:48

bader approved these changes Jan 25, 2022

View reviewed changes

Pennycook approved these changes Jan 25, 2022

View reviewed changes

steffenlarsen approved these changes Jan 25, 2022

View reviewed changes

bader merged commit d99e957 into intel:sycl Jan 26, 2022

bader pushed a commit to intel/llvm-test-suite that referenced this pull request Jan 26, 2022

[SYCL][HIP] Fix shuffles and reductions (#759)

12712bc

These tests are fixed by intel/llvm#5359

aelovikov-intel pushed a commit that referenced this pull request Feb 17, 2023

[SYCL][HIP] Fix shuffles and reductions (#759)

b238ad2

These tests are fixed by #5359

aelovikov-intel pushed a commit to aelovikov-intel/llvm that referenced this pull request Mar 27, 2023

[SYCL][HIP] Fix shuffles and reductions (intel/llvm-test-suite#759)

32a6ba4

These tests are fixed by intel#5359

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SYCL][AMDGCN] Fix up and down shuffles and reductions #5359

[SYCL][AMDGCN] Fix up and down shuffles and reductions #5359

npmiller commented Jan 21, 2022

Pennycook left a comment

Pennycook Jan 21, 2022

npmiller Jan 21, 2022

npmiller commented Jan 24, 2022

npmiller commented Jan 25, 2022

npmiller commented Jan 25, 2022

bader commented Jan 25, 2022

steffenlarsen left a comment

[SYCL][AMDGCN] Fix up and down shuffles and reductions #5359

[SYCL][AMDGCN] Fix up and down shuffles and reductions #5359

Conversation

npmiller commented Jan 21, 2022

Pennycook left a comment

Choose a reason for hiding this comment

Pennycook Jan 21, 2022

Choose a reason for hiding this comment

npmiller Jan 21, 2022

Choose a reason for hiding this comment

npmiller commented Jan 24, 2022

npmiller commented Jan 25, 2022

npmiller commented Jan 25, 2022

bader commented Jan 25, 2022

steffenlarsen left a comment

Choose a reason for hiding this comment