-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AMDGPU] Defaults for missing dimensions in SYCL required wg size #68872
Conversation
cc: @steffenlarsen |
@llvm/pr-subscribers-backend-amdgpu Author: Jakub Chlanda (jchlanda) ChangesSYCL allows for required work group to be partially specified (i.e. not all 3 dimensions): https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:kernel.attributes This fails AMDGPU's attribute verification. The patch aims to provide the default values for missing dimensions when dealing with SYCL kernels. Full diff: https://github.com/llvm/llvm-project/pull/68872.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp b/llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp
index 26074cf06071478..9d914a63fca2411 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp
@@ -317,10 +317,38 @@ static bool processUse(CallInst *CI, bool IsV5OrAbove) {
return MadeChange;
}
+// SYCL allows required work-group size attribute to be partially specified
+// (not all three dimensions), provide a default value (1) for the missing
+// dimensions.
+static void updateSYCLreqdWorkGroupMD(Function &F) {
+ auto *Node = F.getMetadata("reqd_work_group_size");
+ if (!Node || Node->getNumOperands() == 3)
+ return;
+
+ auto &Context = F.getContext();
+ SmallVector<uint64_t, 3> RWGS;
+ for (auto &Op : Node->operands())
+ RWGS.push_back(mdconst::extract<ConstantInt>(Op)->getZExtValue());
+ while (RWGS.size() != 3)
+ RWGS.push_back(1);
+
+ llvm::Metadata *RWGSArgs[] = {
+ llvm::ConstantAsMetadata::get(llvm::ConstantInt::get(
+ llvm::IntegerType::get(Context, 32), llvm::APInt(32, RWGS[0]))),
+ llvm::ConstantAsMetadata::get(llvm::ConstantInt::get(
+ llvm::IntegerType::get(Context, 32), llvm::APInt(32, RWGS[1]))),
+ llvm::ConstantAsMetadata::get(llvm::ConstantInt::get(
+ llvm::IntegerType::get(Context, 32), llvm::APInt(32, RWGS[2])))};
+ F.setMetadata("reqd_work_group_size", llvm::MDNode::get(Context, RWGSArgs));
+}
// TODO: Move makeLIDRangeMetadata usage into here. Seem to not get
// TargetPassConfig for subtarget.
bool AMDGPULowerKernelAttributes::runOnModule(Module &M) {
+ for (auto &F : M)
+ if (F.hasFnAttribute("sycl-module-id"))
+ updateSYCLreqdWorkGroupMD(F);
+
bool MadeChange = false;
bool IsV5OrAbove = AMDGPU::getCodeObjectVersion(M) >= AMDGPU::AMDHSA_COV5;
Function *BasePtr = getBasePtrIntrinsic(M, IsV5OrAbove);
diff --git a/llvm/test/CodeGen/MIR/AMDGPU/sycl-reqd-work-group-size.mir b/llvm/test/CodeGen/MIR/AMDGPU/sycl-reqd-work-group-size.mir
new file mode 100644
index 000000000000000..15f3e6f8c17ca27
--- /dev/null
+++ b/llvm/test/CodeGen/MIR/AMDGPU/sycl-reqd-work-group-size.mir
@@ -0,0 +1,78 @@
+# RUN: llc -march=amdgcn -mcpu=gfx90a -run-pass amdgpu-lower-kernel-attributes -verify-machineinstrs -o - %s | FileCheck %s
+
+# As SYCL allows for the required work group to be specified partially, we need
+# to patch it up to 3 dimensions. Make sure that it only happens when dealing
+# with SYCL kernels.
+
+--- |
+ target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8"
+
+ ; CHECK-LABEL: sycl_kernel_3dim
+ ; CHECK: !reqd_work_group_size [[SYCL_3DIM:![0-9]+]]
+ define amdgpu_kernel void @sycl_kernel_3dim() #0 !reqd_work_group_size !0 {
+ entry:
+ ret void
+ }
+
+ ; CHECK-LABEL: sycl_kernel_2dim
+ ; CHECK: !reqd_work_group_size [[SYCL_2DIM:![0-9]+]]
+ define amdgpu_kernel void @sycl_kernel_2dim() #0 !reqd_work_group_size !1 {
+ entry:
+ ret void
+ }
+
+ ; CHECK-LABEL: non_sycl_kernel_3dim
+ ; CHECK: !reqd_work_group_size [[NON_SYCL_3DIM:![0-9]+]]
+ define amdgpu_kernel void @non_sycl_kernel_3dim() #1 !reqd_work_group_size !2 {
+ entry:
+ ret void
+ }
+
+ ; CHECK-LABEL: non_sycl_kernel_2dim
+ ; CHECK: !reqd_work_group_size [[NON_SYCL_2DIM:![0-9]+]]
+ define amdgpu_kernel void @non_sycl_kernel_2dim() #1 !reqd_work_group_size !3 {
+ entry:
+ ret void
+ }
+
+ attributes #0 = { "sycl-module-id"="sycl-reqd-work-group-size.cpp" "target-cpu"="gfx90a" }
+ attributes #1 = { "target-cpu"="gfx90a" }
+
+ ; CHECK: [[SYCL_3DIM]] = !{i32 8, i32 16, i32 2}
+ !0 = !{i32 8, i32 16, i32 2}
+ ; CHECK: [[SYCL_2DIM]] = !{i32 8, i32 16, i32 1}
+ !1 = !{i32 8, i32 16}
+ ; CHECK: [[NON_SYCL_3DIM]] = !{i32 4, i32 8, i32 4}
+ !2 = !{i32 4, i32 8, i32 4}
+ ; CHECK: [[NON_SYCL_2DIM]] = !{i32 4, i32 8}
+ !3 = !{i32 4, i32 8}
+
+...
+---
+name: sycl_kernel_3dim
+body: |
+ bb.0.entry:
+ S_ENDPGM 0
+
+...
+---
+name: sycl_kernel_2dim
+body: |
+ bb.0.entry:
+ S_ENDPGM 0
+
+...
+---
+name: non_sycl_kernel_3dim
+body: |
+ bb.0.entry:
+ S_ENDPGM 0
+
+...
+---
+name: non_sycl_kernel_2dim
+body: |
+ bb.0.entry:
+ S_ENDPGM 0
+
+...
|
Friendly ping: @arsenm, @changpeng |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
cb5fc49
to
f85201d
Compare
SYCL allows for required work group to be partially specified (i.e. not all 3 dimensions): https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:kernel.attributes This fails AMDGPU's attribute verification. The patch aims to provide the default values for missing dimensions when dealing with SYCL kernels.
f85201d
to
856bd99
Compare
@@ -317,10 +317,38 @@ static bool processUse(CallInst *CI, bool IsV5OrAbove) { | |||
return MadeChange; | |||
} | |||
|
|||
// SYCL allows required work-group size attribute to be partially specified |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, why would this happen? I would expect all producers of this to agree on one consistent format (which we should also add to the LangRef).
Second, a random backend pass shouldn't be going out of its way to canonicalize metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, why would this happen? I would expect all producers of this to agree on one consistent format
The reason why this happens is because a discrepancy in how reqd_work_group_size
is handled in OpenCL and SYCL. OpenCL mandates that all 3 dimensions are specified, padded by 1
if applicable (6.7.2 Optional Attribute Qualifiers of OpenCL 1.2 spec):
The optional attribute((reqd_work_group_size(X, Y, Z))) is the work-
group size that must be used as the local_work_size argument to clEnqueueNDRangeKernel.
This allows the compiler to optimize the generated code appropriately for this kernel. The
optional attribute((reqd_work_group_size(X, Y, Z))), if specified, must
be (1, 1, 1) if the kernel is executed via clEnqueueTask
However, SYCL allows for any of the dimensions to be specified (Table 180 of SYCL 2020 spec):
reqd_work_group_size(dim0)
reqd_work_group_size(dim0, dim1)
reqd_work_group_size(dim0, dim1, dim2)
Furthermore, the SYCL runtime is supposed to throw an exception when a kernel is launched with a number of dimensions that does not match the required work group size dimensionality, which makes padding the dimensions up to 3 very awkward.
Second, a random backend pass shouldn't be going out of its way to canonicalize metadata.
I think this is a bit too harsh. It is a lower kernel attributes pass and it handles just that, an attribute. While I agree, that this is not "a fault" of AMD backend, I do think that sanitising the values here is correct, there are many places in the codebase where backends make special provisions for languages that are not in tree (rust for instance). And it solves the problem (admittedly, SYCL's problem) of loosing the dimensionality at the point where it is save to do so. Alternatively, maybe the verifier could be used to canonicalize SYCL required work group size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like a source syntax difference, clang could still emit consistent metadata with 1 padding. The IR doesn't need to 1:1 correspond to what the source looks like
If we really needed to support 2 forms of the metadata, I would expect to just have a utility function to read the effective value. You don't need to rewrite it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess in principle you are right, but, we do not want to alter the metadata in clang, as it would lose the dimensionality information that is important.
If we really needed to support 2 forms of the metadata, I would expect to just have a utility function to read the effective value. You don't need to rewrite it
Since this feature is borrowed from OpenCL, and OpenCL assumptions are held in the ROCm toolchain (see examples in the comment below) I feel it has to be rewritten to converge to OpenCL version; and this spot seems very convenient for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So just to clarify based on the examples below, the main reason we need to do this is because the ROCm toolchain expects 3 entries in the final metadata?
Would it make more sense/be more acceptable to update this pass to handle any number of dimensions, and to instead add the padding when generating the AMD specific binary metadata for reqd_work_group_size
, rather than doing it at IR level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, we could use MetadataStreamerMsgPackV3::getWorkGroupDimensions
to pad the missing dimensions and keep the metadata intact.
|
||
// TODO: Move makeLIDRangeMetadata usage into here. Seem to not get | ||
// TargetPassConfig for subtarget. | ||
bool AMDGPULowerKernelAttributes::runOnModule(Module &M) { | ||
for (auto &F : M) | ||
if (F.hasFnAttribute("sycl-module-id")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unnecessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an error condition in OpenCL, which is check by the verifier:
if (!verifyEntry(KernelMap, ".reqd_workgroup_size", false, |
However, it is acceptable in SYCL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's an error condition? That verifier is just checking the final output parses? The backend shouldn't need language specific checks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it just tripping over the assert 3 element condition? But you fixed that by adjusting the metadata to match?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I should have been more clear.
What's an error condition? That verifier is just checking the final output parses?
No, OpenCL mandates all 3 dimensions are set, the verifier, rightly so, expects 3 elements and errors out on sub 3 elements. It's just that the same metadata now can be created through either SYCL, or OpenCL, and SYCL happens to relax the restriction.
But you fixed that by adjusting the metadata to match?
Yes, with this patch the condition is also true for SYCL's required work group size (padded with 1 for sub 3 elements).
A quick grep through ROCm-Developer-Tools, shows that this assumption (all 3 elements specified) is relied upon in a bunch of places:
https://github.com/ROCm-Developer-Tools/clr/blob/38d2c56784fe2a2b9aff35822d3c9f4616189ead/rocclr/device/devkernel.cpp#L216
https://github.com/ROCm-Developer-Tools/clr/blob/38d2c56784fe2a2b9aff35822d3c9f4616189ead/rocclr/device/devkernel.cpp#L250
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arsenm did I make a convincing argument for this patch, could we please procede?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't updateSYCLreqdWorkGroupMD
just early exit for OpenCL anyway since it always has 3 elements:
auto *Node = F.getMetadata("reqd_work_group_size");
if (!Node || Node->getNumOperands() == 3)
return;
So we should be able to skip this sycl-module-id
check and just always call the attribute updating function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeap, it would, but I wanted to be explicit about the fact that this is a sycl specific update. Happy to get rid of it though.
@arsenm friendly ping. Is the change to verifier (to allow either 1 or 3 elements) and then padding the value in the |
Closing this PR, we've decide to handle it internally in SYCL (intel/llvm#13600). It aligns with OpenCL's and always pads missing dimensions with |
SYCL allows for required work group to be partially specified (i.e. not all 3 dimensions): https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:kernel.attributes This fails AMDGPU's attribute verification. The patch aims to provide the default values for missing dimensions when dealing with SYCL kernels.