Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP][SYCL-PTX] Generate reqntid PTX directive from reqd_work_group_size #3755

Conversation

steffenlarsen
Copy link
Contributor

@steffenlarsen steffenlarsen commented May 13, 2021

These changes makes clang generate NVVM annotations for reqntidx, reqntidy, and reqntidz from the reqd_work_group_size attribute on kernels in a SYCL program when targeting. These are in turn converted to the reqntid PTX directive.

This is related to, but not dependent on, #3735.

These changes makes clang generate NVPTX annotations for reqntidx,
reqntidy, and reqntidz from the reqd_work_group_size attribute on
kernels in a SYCL program when targetting. These are in turn converted
to the reqntid PTX directive.

Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>
Copy link
Contributor

@elizabethandrews elizabethandrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologize for the delay in review. I missed the email notification.

@@ -0,0 +1,44 @@
// RUN: %clang_cc1 -fsycl-is-device %s -emit-llvm -triple nvptx64-nvidia-cuda-sycldevice -o - | FileCheck %s

template <typename name, typename Func>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please #include "sycl.hpp". You can use the headers included in test folder by adding -internal-isystem %S/Inputs to RUN line.

Also, please add a comment describing the test

Optional<llvm::APSInt> YDimVal = A->getYDimVal(ClangCtx);
Optional<llvm::APSInt> ZDimVal = A->getZDimVal(ClangCtx);

// For a SYCLDevice ReqdWorkGroupSizeAttr arguments are reversed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am understanding this correctly, in user code the order of arguments will be reversed for SYCL vs OpenCL but in final IR generated, the order will be openCL convention for all? @AaronBallman @smanna12 can you confirm this is correct? Does this match what we do for other targets?

I'm also unsure of how default values come into play here. I see 1 being generated in IR below. IIRC we have an existing bug with default values right? How should they be handled especially in the context of this swap.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was heavily inspired by https://github.com/intel/llvm/blob/sycl/clang/lib/CodeGen/CodeGenFunction.cpp#L632. I believe the tests represent this as well.

I'm also unsure of how default values come into play here. I see 1 being generated in IR below. IIRC we have an existing bug with default values right? How should they be handled especially in the context of this swap.

Good point. I think you are right that the swapping becomes a problem if there are any defaults in there, i.e. the first two concrete IR tests should be

// CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_1d, !"reqntidx", i32 32}
// CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_1d, !"reqntidy", i32 1}
// CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_1d, !"reqntidz", i32 1}

// CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_2d, !"reqntidx", i32 32}
// CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_2d, !"reqntidy", i32 64}
// CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_2d, !"reqntidz", i32 1}

Since this code is not generating the defaults, it will be hard to distinguish default-generated 1's from user-specified 1 requirements. Maybe this will have to wait until there is a resolution to the defaults. Do you know if there is an issue open somewhere?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if there is an issue open somewhere?

yes, there is one: #3743

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this match what we do for other targets?

This was heavily inspired by https://github.com/intel/llvm/blob/sycl/clang/lib/CodeGen/CodeGenFunction.cpp#L632.

Thanks for confirmation.

@steffenlarsen steffenlarsen changed the title [SYCL-PTX] Generate reqntid PTX directive from reqd_work_group_size [WIP][SYCL-PTX] Generate reqntid PTX directive from reqd_work_group_size Jun 21, 2021
@steffenlarsen
Copy link
Contributor Author

I have marked this WIP until #3755 (comment) can be resolved.

@AerialMantis AerialMantis added this to Needs triage in oneAPI DPC++ Aug 17, 2021
@github-actions github-actions bot added the Stale label Feb 14, 2022
@github-actions github-actions bot closed this Mar 17, 2022
oneAPI DPC++ automation moved this from Needs triage to Closed Mar 17, 2022
@steffenlarsen steffenlarsen deleted the steffen/cuda_reqntid_from_reqd_work_group_size branch December 6, 2023 11:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuda CUDA back-end Stale
Projects
No open projects
SYCL on CUDA
  
To do
oneAPI DPC++
  
Closed
Development

Successfully merging this pull request may close these issues.

None yet

4 participants