Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP][SYCL-PTX] Generate reqntid PTX directive from reqd_work_group_size #3755

Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 19 additions & 0 deletions clang/lib/CodeGen/TargetInfo.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -7132,6 +7132,25 @@ void NVPTXTargetCodeGenInfo::setTargetAttributes(
// And kernel functions are not subject to inlining
F->addFnAttr(llvm::Attribute::NoInline);
}

// Set reqntid if reqd_work_group_size attribute is set
if (const ReqdWorkGroupSizeAttr *A = FD->getAttr<ReqdWorkGroupSizeAttr>()) {
ASTContext &ClangCtx = FD->getASTContext();
Optional<llvm::APSInt> XDimVal = A->getXDimVal(ClangCtx);
Optional<llvm::APSInt> YDimVal = A->getYDimVal(ClangCtx);
Optional<llvm::APSInt> ZDimVal = A->getZDimVal(ClangCtx);

// For a SYCLDevice ReqdWorkGroupSizeAttr arguments are reversed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am understanding this correctly, in user code the order of arguments will be reversed for SYCL vs OpenCL but in final IR generated, the order will be openCL convention for all? @AaronBallman @smanna12 can you confirm this is correct? Does this match what we do for other targets?

I'm also unsure of how default values come into play here. I see 1 being generated in IR below. IIRC we have an existing bug with default values right? How should they be handled especially in the context of this swap.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was heavily inspired by https://github.com/intel/llvm/blob/sycl/clang/lib/CodeGen/CodeGenFunction.cpp#L632. I believe the tests represent this as well.

I'm also unsure of how default values come into play here. I see 1 being generated in IR below. IIRC we have an existing bug with default values right? How should they be handled especially in the context of this swap.

Good point. I think you are right that the swapping becomes a problem if there are any defaults in there, i.e. the first two concrete IR tests should be

// CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_1d, !"reqntidx", i32 32}
// CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_1d, !"reqntidy", i32 1}
// CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_1d, !"reqntidz", i32 1}

// CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_2d, !"reqntidx", i32 32}
// CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_2d, !"reqntidy", i32 64}
// CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_2d, !"reqntidz", i32 1}

Since this code is not generating the defaults, it will be hard to distinguish default-generated 1's from user-specified 1 requirements. Maybe this will have to wait until there is a resolution to the defaults. Do you know if there is an issue open somewhere?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if there is an issue open somewhere?

yes, there is one: #3743

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this match what we do for other targets?

This was heavily inspired by https://github.com/intel/llvm/blob/sycl/clang/lib/CodeGen/CodeGenFunction.cpp#L632.

Thanks for confirmation.

if (M.getLangOpts().SYCLIsDevice)
std::swap(XDimVal, ZDimVal);

// Create !{<func-ref>, metadata !"reqntidx", i32 <ReqdWorkGroupSize x>}
addNVVMMetadata(F, "reqntidx", XDimVal->getZExtValue());
// Create !{<func-ref>, metadata !"reqntidy", i32 <ReqdWorkGroupSize y>}
addNVVMMetadata(F, "reqntidy", YDimVal->getZExtValue());
// Create !{<func-ref>, metadata !"reqntidz", i32 <ReqdWorkGroupSize z>}
addNVVMMetadata(F, "reqntidz", ZDimVal->getZExtValue());
}
}

// Perform special handling in CUDA mode.
Expand Down
44 changes: 44 additions & 0 deletions clang/test/CodeGenSYCL/ptx-reqd-work-group-size.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
// RUN: %clang_cc1 -fsycl-is-device %s -emit-llvm -triple nvptx64-nvidia-cuda-sycldevice -o - | FileCheck %s

template <typename name, typename Func>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please #include "sycl.hpp". You can use the headers included in test folder by adding -internal-isystem %S/Inputs to RUN line.

Also, please add a comment describing the test

__attribute__((sycl_kernel)) void kernel(const Func &kernelFunc) {
kernelFunc();
}

int main() {
kernel<class kernel_no_reqd_work_size>([]() {});
// CHECK: define dso_local void @{{.*}}kernel_no_reqd_work_size()
// CHECK-NOT: define dso_local void @{{.*}}kernel_no_reqd_work_size() {{.*}} !reqd_work_group_size ![[WGSIZE1D:[0-9]+]]

kernel<class kernel_reqd_work_size_1d>(
[]() [[intel::reqd_work_group_size(32)]]{});
// CHECK: define dso_local void @{{.*}}kernel_reqd_work_size_1d() {{.*}} !reqd_work_group_size ![[WGSIZE1D:[0-9]+]]

kernel<class kernel_reqd_work_size_2d>(
[]() [[intel::reqd_work_group_size(64, 32)]]{});
// CHECK: define dso_local void @{{.*}}kernel_reqd_work_size_2d() {{.*}} !reqd_work_group_size ![[WGSIZE2D:[0-9]+]]

kernel<class kernel_reqd_work_size_3d>(
[]() [[intel::reqd_work_group_size(128, 64, 32)]]{});
// CHECK: define dso_local void @{{.*}}kernel_reqd_work_size_3d() {{.*}} !reqd_work_group_size ![[WGSIZE3D:[0-9]+]]
}

// CHECK-NOT: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_no_reqd_work_size, !"reqntidx", i32 !{{[0-9]+}}}
// CHECK-NOT: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_no_reqd_work_size, !"reqntidy", i32 !{{[0-9]+}}}
// CHECK-NOT: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_no_reqd_work_size, !"reqntidz", i32 !{{[0-9]+}}}

// CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_1d, !"reqntidx", i32 1}
// CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_1d, !"reqntidy", i32 1}
// CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_1d, !"reqntidz", i32 32}

// CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_2d, !"reqntidx", i32 1}
// CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_2d, !"reqntidy", i32 32}
// CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_2d, !"reqntidz", i32 64}

// CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_3d, !"reqntidx", i32 32}
// CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_3d, !"reqntidy", i32 64}
// CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_3d, !"reqntidz", i32 128}

// CHECK: ![[WGSIZE1D]] = !{i32 1, i32 1, i32 32}
// CHECK: ![[WGSIZE2D]] = !{i32 1, i32 32, i32 64}
// CHECK: ![[WGSIZE3D]] = !{i32 32, i32 64, i32 128}