Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] Defaults for missing dimensions in SYCL required wg size #68872

Closed
wants to merge 2 commits into from

Conversation

jchlanda
Copy link
Contributor

SYCL allows for required work group to be partially specified (i.e. not all 3 dimensions): https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:kernel.attributes This fails AMDGPU's attribute verification. The patch aims to provide the default values for missing dimensions when dealing with SYCL kernels.

@jchlanda
Copy link
Contributor Author

cc: @steffenlarsen

@llvmbot
Copy link
Collaborator

llvmbot commented Oct 12, 2023

@llvm/pr-subscribers-backend-amdgpu

Author: Jakub Chlanda (jchlanda)

Changes

SYCL allows for required work group to be partially specified (i.e. not all 3 dimensions): https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:kernel.attributes This fails AMDGPU's attribute verification. The patch aims to provide the default values for missing dimensions when dealing with SYCL kernels.


Full diff: https://github.com/llvm/llvm-project/pull/68872.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp (+28)
  • (added) llvm/test/CodeGen/MIR/AMDGPU/sycl-reqd-work-group-size.mir (+78)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp b/llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp
index 26074cf06071478..9d914a63fca2411 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp
@@ -317,10 +317,38 @@ static bool processUse(CallInst *CI, bool IsV5OrAbove) {
   return MadeChange;
 }
 
+// SYCL allows required work-group size attribute to be partially specified
+// (not all three dimensions), provide a default value (1) for the missing
+// dimensions.
+static void updateSYCLreqdWorkGroupMD(Function &F) {
+  auto *Node = F.getMetadata("reqd_work_group_size");
+  if (!Node || Node->getNumOperands() == 3)
+    return;
+
+  auto &Context = F.getContext();
+  SmallVector<uint64_t, 3> RWGS;
+  for (auto &Op : Node->operands())
+    RWGS.push_back(mdconst::extract<ConstantInt>(Op)->getZExtValue());
+  while (RWGS.size() != 3)
+    RWGS.push_back(1);
+
+  llvm::Metadata *RWGSArgs[] = {
+      llvm::ConstantAsMetadata::get(llvm::ConstantInt::get(
+          llvm::IntegerType::get(Context, 32), llvm::APInt(32, RWGS[0]))),
+      llvm::ConstantAsMetadata::get(llvm::ConstantInt::get(
+          llvm::IntegerType::get(Context, 32), llvm::APInt(32, RWGS[1]))),
+      llvm::ConstantAsMetadata::get(llvm::ConstantInt::get(
+          llvm::IntegerType::get(Context, 32), llvm::APInt(32, RWGS[2])))};
+  F.setMetadata("reqd_work_group_size", llvm::MDNode::get(Context, RWGSArgs));
+}
 
 // TODO: Move makeLIDRangeMetadata usage into here. Seem to not get
 // TargetPassConfig for subtarget.
 bool AMDGPULowerKernelAttributes::runOnModule(Module &M) {
+  for (auto &F : M)
+    if (F.hasFnAttribute("sycl-module-id"))
+      updateSYCLreqdWorkGroupMD(F);
+
   bool MadeChange = false;
   bool IsV5OrAbove = AMDGPU::getCodeObjectVersion(M) >= AMDGPU::AMDHSA_COV5;
   Function *BasePtr = getBasePtrIntrinsic(M, IsV5OrAbove);
diff --git a/llvm/test/CodeGen/MIR/AMDGPU/sycl-reqd-work-group-size.mir b/llvm/test/CodeGen/MIR/AMDGPU/sycl-reqd-work-group-size.mir
new file mode 100644
index 000000000000000..15f3e6f8c17ca27
--- /dev/null
+++ b/llvm/test/CodeGen/MIR/AMDGPU/sycl-reqd-work-group-size.mir
@@ -0,0 +1,78 @@
+# RUN: llc -march=amdgcn -mcpu=gfx90a -run-pass amdgpu-lower-kernel-attributes -verify-machineinstrs -o - %s | FileCheck %s
+
+# As SYCL allows for the required work group to be specified partially, we need
+# to patch it up to 3 dimensions. Make sure that it only happens when dealing
+# with SYCL kernels.
+
+--- |
+  target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8"
+
+  ; CHECK-LABEL: sycl_kernel_3dim
+  ; CHECK: !reqd_work_group_size [[SYCL_3DIM:![0-9]+]]
+  define amdgpu_kernel void @sycl_kernel_3dim() #0 !reqd_work_group_size !0 {
+  entry:
+    ret void
+  }
+
+  ; CHECK-LABEL: sycl_kernel_2dim
+  ; CHECK: !reqd_work_group_size [[SYCL_2DIM:![0-9]+]]
+  define amdgpu_kernel void @sycl_kernel_2dim() #0 !reqd_work_group_size !1 {
+  entry:
+    ret void
+  }
+
+  ; CHECK-LABEL: non_sycl_kernel_3dim
+  ; CHECK: !reqd_work_group_size [[NON_SYCL_3DIM:![0-9]+]]
+  define amdgpu_kernel void @non_sycl_kernel_3dim() #1 !reqd_work_group_size !2 {
+  entry:
+    ret void
+  }
+
+  ; CHECK-LABEL: non_sycl_kernel_2dim
+  ; CHECK: !reqd_work_group_size [[NON_SYCL_2DIM:![0-9]+]]
+  define amdgpu_kernel void @non_sycl_kernel_2dim() #1 !reqd_work_group_size !3 {
+  entry:
+    ret void
+  }
+
+  attributes #0 = { "sycl-module-id"="sycl-reqd-work-group-size.cpp" "target-cpu"="gfx90a" }
+  attributes #1 = { "target-cpu"="gfx90a" }
+
+  ; CHECK: [[SYCL_3DIM]] = !{i32 8, i32 16, i32 2}
+  !0 = !{i32 8, i32 16, i32 2}
+  ; CHECK: [[SYCL_2DIM]] = !{i32 8, i32 16, i32 1}
+  !1 = !{i32 8, i32 16}
+  ; CHECK: [[NON_SYCL_3DIM]] = !{i32 4, i32 8, i32 4}
+  !2 = !{i32 4, i32 8, i32 4}
+  ; CHECK: [[NON_SYCL_2DIM]] = !{i32 4, i32 8}
+  !3 = !{i32 4, i32 8}
+
+...
+---
+name:            sycl_kernel_3dim
+body:             |
+  bb.0.entry:
+    S_ENDPGM 0
+
+...
+---
+name:            sycl_kernel_2dim
+body:             |
+  bb.0.entry:
+    S_ENDPGM 0
+
+...
+---
+name:            non_sycl_kernel_3dim
+body:             |
+  bb.0.entry:
+    S_ENDPGM 0
+
+...
+---
+name:            non_sycl_kernel_2dim
+body:             |
+  bb.0.entry:
+    S_ENDPGM 0
+
+...

@jchlanda
Copy link
Contributor Author

Friendly ping: @arsenm, @changpeng

Copy link
Collaborator

@yxsamliu yxsamliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

SYCL allows for required work group to be partially specified (i.e. not
all 3 dimensions): https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:kernel.attributes
This fails AMDGPU's attribute verification. The patch aims to provide
the default values for missing dimensions when dealing with SYCL
kernels.
@@ -317,10 +317,38 @@ static bool processUse(CallInst *CI, bool IsV5OrAbove) {
return MadeChange;
}

// SYCL allows required work-group size attribute to be partially specified
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, why would this happen? I would expect all producers of this to agree on one consistent format (which we should also add to the LangRef).

Second, a random backend pass shouldn't be going out of its way to canonicalize metadata.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, why would this happen? I would expect all producers of this to agree on one consistent format

The reason why this happens is because a discrepancy in how reqd_work_group_size is handled in OpenCL and SYCL. OpenCL mandates that all 3 dimensions are specified, padded by 1 if applicable (6.7.2 Optional Attribute Qualifiers of OpenCL 1.2 spec):

The optional attribute((reqd_work_group_size(X, Y, Z))) is the work-
group size that must be used as the local_work_size argument to clEnqueueNDRangeKernel.
This allows the compiler to optimize the generated code appropriately for this kernel. The
optional attribute((reqd_work_group_size(X, Y, Z))), if specified, must
be (1, 1, 1) if the kernel is executed via clEnqueueTask

However, SYCL allows for any of the dimensions to be specified (Table 180 of SYCL 2020 spec):

reqd_work_group_size(dim0)
reqd_work_group_size(dim0, dim1)
reqd_work_group_size(dim0, dim1, dim2)

Furthermore, the SYCL runtime is supposed to throw an exception when a kernel is launched with a number of dimensions that does not match the required work group size dimensionality, which makes padding the dimensions up to 3 very awkward.

Second, a random backend pass shouldn't be going out of its way to canonicalize metadata.

I think this is a bit too harsh. It is a lower kernel attributes pass and it handles just that, an attribute. While I agree, that this is not "a fault" of AMD backend, I do think that sanitising the values here is correct, there are many places in the codebase where backends make special provisions for languages that are not in tree (rust for instance). And it solves the problem (admittedly, SYCL's problem) of loosing the dimensionality at the point where it is save to do so. Alternatively, maybe the verifier could be used to canonicalize SYCL required work group size?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a source syntax difference, clang could still emit consistent metadata with 1 padding. The IR doesn't need to 1:1 correspond to what the source looks like

If we really needed to support 2 forms of the metadata, I would expect to just have a utility function to read the effective value. You don't need to rewrite it

Copy link
Contributor Author

@jchlanda jchlanda Oct 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess in principle you are right, but, we do not want to alter the metadata in clang, as it would lose the dimensionality information that is important.

If we really needed to support 2 forms of the metadata, I would expect to just have a utility function to read the effective value. You don't need to rewrite it

Since this feature is borrowed from OpenCL, and OpenCL assumptions are held in the ROCm toolchain (see examples in the comment below) I feel it has to be rewritten to converge to OpenCL version; and this spot seems very convenient for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just to clarify based on the examples below, the main reason we need to do this is because the ROCm toolchain expects 3 entries in the final metadata?

Would it make more sense/be more acceptable to update this pass to handle any number of dimensions, and to instead add the padding when generating the AMD specific binary metadata for reqd_work_group_size, rather than doing it at IR level?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, we could use MetadataStreamerMsgPackV3::getWorkGroupDimensions to pad the missing dimensions and keep the metadata intact.


// TODO: Move makeLIDRangeMetadata usage into here. Seem to not get
// TargetPassConfig for subtarget.
bool AMDGPULowerKernelAttributes::runOnModule(Module &M) {
for (auto &F : M)
if (F.hasFnAttribute("sycl-module-id"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unnecessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an error condition in OpenCL, which is check by the verifier:

if (!verifyEntry(KernelMap, ".reqd_workgroup_size", false,

However, it is acceptable in SYCL.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's an error condition? That verifier is just checking the final output parses? The backend shouldn't need language specific checks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it just tripping over the assert 3 element condition? But you fixed that by adjusting the metadata to match?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I should have been more clear.

What's an error condition? That verifier is just checking the final output parses?

No, OpenCL mandates all 3 dimensions are set, the verifier, rightly so, expects 3 elements and errors out on sub 3 elements. It's just that the same metadata now can be created through either SYCL, or OpenCL, and SYCL happens to relax the restriction.

But you fixed that by adjusting the metadata to match?

Yes, with this patch the condition is also true for SYCL's required work group size (padded with 1 for sub 3 elements).

A quick grep through ROCm-Developer-Tools, shows that this assumption (all 3 elements specified) is relied upon in a bunch of places:
https://github.com/ROCm-Developer-Tools/clr/blob/38d2c56784fe2a2b9aff35822d3c9f4616189ead/rocclr/device/devkernel.cpp#L216
https://github.com/ROCm-Developer-Tools/clr/blob/38d2c56784fe2a2b9aff35822d3c9f4616189ead/rocclr/device/devkernel.cpp#L250

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arsenm did I make a convincing argument for this patch, could we please procede?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't updateSYCLreqdWorkGroupMD just early exit for OpenCL anyway since it always has 3 elements:

  auto *Node = F.getMetadata("reqd_work_group_size");
  if (!Node || Node->getNumOperands() == 3)
    return;

So we should be able to skip this sycl-module-id check and just always call the attribute updating function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeap, it would, but I wanted to be explicit about the fact that this is a sycl specific update. Happy to get rid of it though.

llvm/test/CodeGen/MIR/AMDGPU/sycl-reqd-work-group-size.mir Outdated Show resolved Hide resolved
@jchlanda
Copy link
Contributor Author

@arsenm friendly ping.

Is the change to verifier (to allow either 1 or 3 elements) and then padding the value in the getWorkGroupDimensions something that we could work with?

@jchlanda
Copy link
Contributor Author

@arsenm we're quite keen on this being resolved, I've had another go at it only modifying the internal handling of the verifier here: #72652 (following you comment re changing metadata). #72652 achieves the same thing and if accepted could supersed this PR.

@jchlanda
Copy link
Contributor Author

Closing this PR, we've decide to handle it internally in SYCL (intel/llvm#13600). It aligns with OpenCL's and always pads missing dimensions with 1.

@jchlanda jchlanda closed this May 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants