[AMDGPU] Defaults for missing dimensions in SYCL required wg size #68872

jchlanda · 2023-10-12T10:36:56Z

SYCL allows the required work-group size to be partially specified (i.e. with fewer than 3 dimensions):
https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:kernel.attributes
This fails AMDGPU's attribute verification. This patch provides default values for the missing dimensions when dealing with SYCL kernels.

jchlanda · 2023-10-12T10:37:08Z

cc: @steffenlarsen

llvmbot · 2023-10-12T10:38:00Z

@llvm/pr-subscribers-backend-amdgpu

Author: Jakub Chlanda (jchlanda)

Full diff: https://github.com/llvm/llvm-project/pull/68872.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp (+28)
(added) llvm/test/CodeGen/MIR/AMDGPU/sycl-reqd-work-group-size.mir (+78)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp b/llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp
index 26074cf06071478..9d914a63fca2411 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp
@@ -317,10 +317,38 @@ static bool processUse(CallInst *CI, bool IsV5OrAbove) {
   return MadeChange;
 }
 
+// SYCL allows required work-group size attribute to be partially specified
+// (not all three dimensions), provide a default value (1) for the missing
+// dimensions.
+static void updateSYCLreqdWorkGroupMD(Function &F) {
+  auto *Node = F.getMetadata("reqd_work_group_size");
+  if (!Node || Node->getNumOperands() == 3)
+    return;
+
+  auto &Context = F.getContext();
+  SmallVector<uint64_t, 3> RWGS;
+  for (auto &Op : Node->operands())
+    RWGS.push_back(mdconst::extract<ConstantInt>(Op)->getZExtValue());
+  while (RWGS.size() != 3)
+    RWGS.push_back(1);
+
+  llvm::Metadata *RWGSArgs[] = {
+      llvm::ConstantAsMetadata::get(llvm::ConstantInt::get(
+          llvm::IntegerType::get(Context, 32), llvm::APInt(32, RWGS[0]))),
+      llvm::ConstantAsMetadata::get(llvm::ConstantInt::get(
+          llvm::IntegerType::get(Context, 32), llvm::APInt(32, RWGS[1]))),
+      llvm::ConstantAsMetadata::get(llvm::ConstantInt::get(
+          llvm::IntegerType::get(Context, 32), llvm::APInt(32, RWGS[2])))};
+  F.setMetadata("reqd_work_group_size", llvm::MDNode::get(Context, RWGSArgs));
+}
 
 // TODO: Move makeLIDRangeMetadata usage into here. Seem to not get
 // TargetPassConfig for subtarget.
 bool AMDGPULowerKernelAttributes::runOnModule(Module &M) {
+  for (auto &F : M)
+    if (F.hasFnAttribute("sycl-module-id"))
+      updateSYCLreqdWorkGroupMD(F);
+
   bool MadeChange = false;
   bool IsV5OrAbove = AMDGPU::getCodeObjectVersion(M) >= AMDGPU::AMDHSA_COV5;
   Function *BasePtr = getBasePtrIntrinsic(M, IsV5OrAbove);
diff --git a/llvm/test/CodeGen/MIR/AMDGPU/sycl-reqd-work-group-size.mir b/llvm/test/CodeGen/MIR/AMDGPU/sycl-reqd-work-group-size.mir
new file mode 100644
index 000000000000000..15f3e6f8c17ca27
--- /dev/null
+++ b/llvm/test/CodeGen/MIR/AMDGPU/sycl-reqd-work-group-size.mir
@@ -0,0 +1,78 @@
+# RUN: llc -march=amdgcn -mcpu=gfx90a -run-pass amdgpu-lower-kernel-attributes -verify-machineinstrs -o - %s | FileCheck %s
+
+# As SYCL allows for the required work group to be specified partially, we need
+# to patch it up to 3 dimensions. Make sure that it only happens when dealing
+# with SYCL kernels.
+
+--- |
+  target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8"
+
+  ; CHECK-LABEL: sycl_kernel_3dim
+  ; CHECK: !reqd_work_group_size [[SYCL_3DIM:![0-9]+]]
+  define amdgpu_kernel void @sycl_kernel_3dim() #0 !reqd_work_group_size !0 {
+  entry:
+    ret void
+  }
+
+  ; CHECK-LABEL: sycl_kernel_2dim
+  ; CHECK: !reqd_work_group_size [[SYCL_2DIM:![0-9]+]]
+  define amdgpu_kernel void @sycl_kernel_2dim() #0 !reqd_work_group_size !1 {
+  entry:
+    ret void
+  }
+
+  ; CHECK-LABEL: non_sycl_kernel_3dim
+  ; CHECK: !reqd_work_group_size [[NON_SYCL_3DIM:![0-9]+]]
+  define amdgpu_kernel void @non_sycl_kernel_3dim() #1 !reqd_work_group_size !2 {
+  entry:
+    ret void
+  }
+
+  ; CHECK-LABEL: non_sycl_kernel_2dim
+  ; CHECK: !reqd_work_group_size [[NON_SYCL_2DIM:![0-9]+]]
+  define amdgpu_kernel void @non_sycl_kernel_2dim() #1 !reqd_work_group_size !3 {
+  entry:
+    ret void
+  }
+
+  attributes #0 = { "sycl-module-id"="sycl-reqd-work-group-size.cpp" "target-cpu"="gfx90a" }
+  attributes #1 = { "target-cpu"="gfx90a" }
+
+  ; CHECK: [[SYCL_3DIM]] = !{i32 8, i32 16, i32 2}
+  !0 = !{i32 8, i32 16, i32 2}
+  ; CHECK: [[SYCL_2DIM]] = !{i32 8, i32 16, i32 1}
+  !1 = !{i32 8, i32 16}
+  ; CHECK: [[NON_SYCL_3DIM]] = !{i32 4, i32 8, i32 4}
+  !2 = !{i32 4, i32 8, i32 4}
+  ; CHECK: [[NON_SYCL_2DIM]] = !{i32 4, i32 8}
+  !3 = !{i32 4, i32 8}
+
+...
+---
+name:            sycl_kernel_3dim
+body:             |
+  bb.0.entry:
+    S_ENDPGM 0
+
+...
+---
+name:            sycl_kernel_2dim
+body:             |
+  bb.0.entry:
+    S_ENDPGM 0
+
+...
+---
+name:            non_sycl_kernel_3dim
+body:             |
+  bb.0.entry:
+    S_ENDPGM 0
+
+...
+---
+name:            non_sycl_kernel_2dim
+body:             |
+  bb.0.entry:
+    S_ENDPGM 0
+
+...

jchlanda · 2023-10-18T11:18:02Z

Friendly ping: @arsenm, @changpeng

yxsamliu

LGTM

arsenm · 2023-10-24T06:08:29Z

llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp

@@ -317,10 +317,38 @@ static bool processUse(CallInst *CI, bool IsV5OrAbove) {
  return MadeChange;
 }

+// SYCL allows required work-group size attribute to be partially specified


First of all, why would this happen? I would expect all producers of this to agree on one consistent format (which we should also add to the LangRef).

Second, a random backend pass shouldn't be going out of its way to canonicalize metadata.

jchlanda · 2023-10-24

> First of all, why would this happen? I would expect all producers of this to agree on one consistent format

This happens because of a discrepancy in how reqd_work_group_size is handled in OpenCL and SYCL. OpenCL mandates that all 3 dimensions are specified, padded with 1 if applicable (section 6.7.2, Optional Attribute Qualifiers, of the OpenCL 1.2 spec):

> The optional __attribute__((reqd_work_group_size(X, Y, Z))) is the work-group size that must be used as the local_work_size argument to clEnqueueNDRangeKernel. This allows the compiler to optimize the generated code appropriately for this kernel. The optional __attribute__((reqd_work_group_size(X, Y, Z))), if specified, must be (1, 1, 1) if the kernel is executed via clEnqueueTask

However, SYCL allows one, two, or three dimensions to be specified (Table 180 of the SYCL 2020 spec):

reqd_work_group_size(dim0)
reqd_work_group_size(dim0, dim1)
reqd_work_group_size(dim0, dim1, dim2)

Furthermore, the SYCL runtime is supposed to throw an exception when a kernel is launched with a number of dimensions that does not match the dimensionality of the required work-group size, which makes padding the dimensions up to 3 very awkward.

> Second, a random backend pass shouldn't be going out of its way to canonicalize metadata.

I think this is a bit too harsh. It is a lower kernel attributes pass and it handles just that, an attribute. While I agree that this is not "a fault" of the AMD backend, I do think that sanitising the values here is correct; there are many places in the codebase where backends make special provisions for languages that are not in tree (Rust, for instance). And it solves the problem (admittedly, SYCL's problem) of losing the dimensionality at the point where it is safe to do so. Alternatively, maybe the verifier could be used to canonicalize SYCL's required work-group size?
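
To illustrate the dimensionality concern above, here is a minimal standalone sketch in plain C++ (no LLVM or SYCL headers). The names `ReqdSize`, `padToThree` and `launchMatches` are hypothetical, and the check is only an assumption about the kind of runtime validation being described, not the SYCL runtime's actual code. It shows that once a 2-D requirement is padded to three entries, it can no longer be told apart from a genuine 3-D requirement whose last extent is 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the operands of !reqd_work_group_size as written in the source.
using ReqdSize = std::vector<uint32_t>;

// OpenCL-style canonical form: always three entries, missing ones set to 1.
ReqdSize padToThree(ReqdSize Size) {
  assert(!Size.empty() && Size.size() <= 3 && "expected 1 to 3 dimensions");
  Size.resize(3, 1);
  return Size;
}

// Hypothetical launch check: the launch must use the same number of
// dimensions as the requirement, with identical extents.
bool launchMatches(const ReqdSize &Required, const ReqdSize &Launch) {
  return Required == Launch;
}
```

With the unpadded form, a 2-D requirement {8, 16} and a 3-D launch {8, 16, 1} are distinguishable. After `padToThree` they compare equal, so the mismatch can no longer be detected from the size list alone.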

arsenm · 2023-10-26

That sounds like a source syntax difference; clang could still emit consistent metadata with 1 padding. The IR doesn't need to correspond 1:1 to what the source looks like.

If we really needed to support 2 forms of the metadata, I would expect to just have a utility function to read the effective value. You don't need to rewrite it.
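
A minimal sketch of the kind of read-only helper suggested here, written as standalone C++ rather than against the LLVM metadata API. `effectiveReqdWorkGroupSize` is a hypothetical name, and the `std::vector` stands in for the operands of `!reqd_work_group_size`; the point is only that consumers can obtain a 3-element view without the stored metadata being rewritten.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the effective 3-D size for a list of 1 to 3 specified dimensions.
// Missing dimensions default to 1; the input is left untouched.
std::array<uint32_t, 3>
effectiveReqdWorkGroupSize(const std::vector<uint32_t> &Operands) {
  assert(!Operands.empty() && Operands.size() <= 3 &&
         "expected 1 to 3 dimensions");
  std::array<uint32_t, 3> Effective;
  Effective.fill(1);
  for (std::size_t I = 0; I < Operands.size(); ++I)
    Effective[I] = Operands[I];
  return Effective;
}
```

A caller holding the operands {8, 16} would read back 8, 16 and 1, while the original list still has two entries and so keeps its dimensionality.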

jchlanda · 2023-10-26 (edited)

I guess in principle you are right, but we do not want to alter the metadata in clang, as that would lose the dimensionality information, which is important.

> If we really needed to support 2 forms of the metadata, I would expect to just have a utility function to read the effective value. You don't need to rewrite it

Since this feature is borrowed from OpenCL, and OpenCL assumptions are held in the ROCm toolchain (see the examples in the comment below), I feel it has to be rewritten to converge to the OpenCL version; and this spot seems very convenient for it.

npmiller · 2023-11-08

So just to clarify, based on the examples below, the main reason we need to do this is that the ROCm toolchain expects 3 entries in the final metadata?

Would it make more sense, or be more acceptable, to update this pass to handle any number of dimensions and instead add the padding when generating the AMD-specific binary metadata for reqd_work_group_size, rather than doing it at IR level?

jchlanda · 2023-11-08

Agreed, we could use MetadataStreamerMsgPackV3::getWorkGroupDimensions to pad the missing dimensions and keep the metadata intact.

arsenm · 2023-10-24T06:08:41Z

llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp


 // TODO: Move makeLIDRangeMetadata usage into here. Seem to not get
 // TargetPassConfig for subtarget.
 bool AMDGPULowerKernelAttributes::runOnModule(Module &M) {
+  for (auto &F : M)
+    if (F.hasFnAttribute("sycl-module-id"))


This is unnecessary

jchlanda · 2023-10-24

This is an error condition in OpenCL, which is checked by the verifier:

llvm-project/llvm/lib/BinaryFormat/AMDGPUMetadataVerifier.cpp

Line 232 in 856bd99

if (!verifyEntry(KernelMap, ".reqd_workgroup_size", false,

However, it is acceptable in SYCL.

arsenm · 2023-10-26

What's an error condition? That verifier is just checking that the final output parses? The backend shouldn't need language-specific checks.

arsenm · 2023-10-26

Is it just tripping over the assert on the 3-element condition? But you fixed that by adjusting the metadata to match?

jchlanda · 2023-10-26

Sorry, I should have been clearer.

> What's an error condition? That verifier is just checking the final output parses?

No, OpenCL mandates that all 3 dimensions are set; the verifier, rightly so, expects 3 elements and errors out on fewer than 3. It's just that the same metadata can now be created through either SYCL or OpenCL, and SYCL happens to relax the restriction.

> But you fixed that by adjusting the metadata to match?

Yes, with this patch the condition also holds for SYCL's required work-group size (padded with 1 when fewer than 3 elements are given).

A quick grep through ROCm-Developer-Tools shows that this assumption (all 3 elements specified) is relied upon in a bunch of places:
https://github.com/ROCm-Developer-Tools/clr/blob/38d2c56784fe2a2b9aff35822d3c9f4616189ead/rocclr/device/devkernel.cpp#L216
https://github.com/ROCm-Developer-Tools/clr/blob/38d2c56784fe2a2b9aff35822d3c9f4616189ead/rocclr/device/devkernel.cpp#L250

jchlanda · 2023-11-03

@arsenm did I make a convincing argument for this patch? Could we please proceed?

npmiller · 2023-11-08

Wouldn't updateSYCLreqdWorkGroupMD just exit early for OpenCL anyway, since it always has 3 elements:

  auto *Node = F.getMetadata("reqd_work_group_size");
  if (!Node || Node->getNumOperands() == 3)
    return;

So we should be able to skip this sycl-module-id check and just always call the attribute-updating function.
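
The observation can be checked with a small standalone model of the function's control flow. This is plain C++, not the LLVM API: `padReqdWorkGroupSize` is a hypothetical stand-in for `updateSYCLreqdWorkGroupMD`, a null pointer models a function with no `reqd_work_group_size` metadata, and the `bool` return value is added only so the example can report whether anything changed.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for the operand list of !reqd_work_group_size.
using WorkGroupSizeMD = std::vector<uint32_t>;

// Pads the list to three entries with 1. Returns true if it was modified.
bool padReqdWorkGroupSize(WorkGroupSizeMD *Node) {
  // Same early exit as the patch: nothing to do without metadata, or when
  // all three dimensions are already present (always the case for OpenCL).
  if (!Node || Node->size() == 3)
    return false;
  assert(Node->size() < 3 && "at most three dimensions are meaningful");
  while (Node->size() != 3)
    Node->push_back(1);
  return true;
}
```

Called on a 3-element list such as {4, 8, 4}, the function returns without touching it, so dropping the language guard does not change behaviour for well-formed OpenCL input. One consequence worth noting: without the guard, a 2-element list on a non-SYCL kernel would also be padded, which differs from what the `non_sycl_kernel_2dim` case in the patch's test expects.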

jchlanda · 2023-11-08

Yep, it would, but I wanted to be explicit about the fact that this is a SYCL-specific update. Happy to get rid of it though.

llvm/test/CodeGen/MIR/AMDGPU/sycl-reqd-work-group-size.mir

jchlanda · 2023-11-13T09:01:30Z

@arsenm friendly ping.

Would changing the verifier (to allow either 1 or 3 elements) and then padding the value in getWorkGroupDimensions be something that we could work with?

jchlanda · 2023-11-21T13:18:16Z

@arsenm we're quite keen on this being resolved. I've had another go at it, modifying only the internal handling of the verifier, here: #72652 (following your comment about changing metadata). #72652 achieves the same thing and, if accepted, could supersede this PR.

jchlanda · 2024-05-21T07:27:35Z

Closing this PR; we've decided to handle it internally in SYCL (intel/llvm#13600). That aligns with OpenCL's behaviour and always pads missing dimensions with 1.
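
As a rough illustration of how padding and dimensionality can coexist, the sketch below keeps the padded size and the original dimension count side by side. It is standalone C++ with hypothetical names (`KernelWorkGroupInfo`, `makeWorkGroupInfo`). Pairing the padded size with a separate dimension count is an assumption suggested by the title of intel/llvm#13600 ("Add work_group_num_dim metadata"), not a description of how that change is actually implemented.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record: OpenCL-style padded size plus the source dimensionality.
struct KernelWorkGroupInfo {
  std::array<uint32_t, 3> ReqdSize; // always three entries, padded with 1
  unsigned NumDims;                 // dimensions given in the source (1 to 3)
};

KernelWorkGroupInfo makeWorkGroupInfo(const std::vector<uint32_t> &Specified) {
  assert(!Specified.empty() && Specified.size() <= 3 &&
         "expected 1 to 3 dimensions");
  KernelWorkGroupInfo Info;
  Info.ReqdSize.fill(1);
  Info.NumDims = static_cast<unsigned>(Specified.size());
  for (std::size_t I = 0; I < Specified.size(); ++I)
    Info.ReqdSize[I] = Specified[I];
  return Info;
}
```

Under this assumption, a backend that expects three entries sees a fully padded size, while a runtime that needs to validate launch dimensionality can consult the separately stored count.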

jchlanda requested review from arsenm and changpeng October 12, 2023 10:36

llvmbot added the backend:AMDGPU label Oct 12, 2023

jchlanda requested a review from rampitec October 19, 2023 17:55

rampitec requested a review from yxsamliu October 19, 2023 18:17

yxsamliu approved these changes Oct 19, 2023

jchlanda force-pushed the jakub/amd_rqwgs_sycl branch from cb5fc49 to f85201d Compare October 19, 2023 18:59

jchlanda force-pushed the jakub/amd_rqwgs_sycl branch from f85201d to 856bd99 Compare October 24, 2023 05:40

arsenm requested changes Oct 24, 2023

Remove verify MI from a MIR test

b8bd043

This was referenced Nov 15, 2023

[HIP] Handle required wg size attribute in HIP oneapi-src/unified-runtime#954

Merged

[SYCL][HIP] Emit program metadata for HIP intel/llvm#11891

Draft

jchlanda mentioned this pull request May 6, 2024

[SYCL] Add work_group_num_dim metadata intel/llvm#13600

Merged

jchlanda closed this May 21, 2024
