[NFC][HIP] Disable device-side kernel launches for HIP #171043
Conversation
@llvm/pr-subscribers-clang-codegen @llvm/pr-subscribers-clang

Author: Alex Voicu (AlexVlx)

Changes

#165519 added support for launching kernels from the device side. This is only available in CUDA at the moment. We have to explicitly check whether we are compiling for HIP to guard against this path being exercised, since the CUDA and HIP languages rely on the same CUDAIsDevice bit to check for device-side compilation, and it is not possible to disambiguate otherwise.

Full diff: https://github.com/llvm/llvm-project/pull/171043.diff

1 File Affected:
diff --git a/clang/lib/CodeGen/CGExprCXX.cpp b/clang/lib/CodeGen/CGExprCXX.cpp
index ce2ed9026fa1f..3f4f61db8d3a4 100644
--- a/clang/lib/CodeGen/CGExprCXX.cpp
+++ b/clang/lib/CodeGen/CGExprCXX.cpp
@@ -504,7 +504,8 @@ RValue CodeGenFunction::EmitCUDAKernelCallExpr(const CUDAKernelCallExpr *E,
ReturnValueSlot ReturnValue,
llvm::CallBase **CallOrInvoke) {
// Emit as a device kernel call if CUDA device code is to be generated.
- if (getLangOpts().CUDAIsDevice)
+ // TODO: implement for HIP
+ if (!getLangOpts().HIP && getLangOpts().CUDAIsDevice)
return CGM.getCUDARuntime().EmitCUDADeviceKernelCallExpr(
*this, E, ReturnValue, CallOrInvoke);
return CGM.getCUDARuntime().EmitCUDAKernelCallExpr(*this, E, ReturnValue,
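The dispatch the patch changes can be modeled as a small standalone sketch. All names here (`LangOpts`, `emitKernelCall`) are hypothetical stand-ins; the real logic lives in `CodeGenFunction::EmitCUDAKernelCallExpr` and clang's `LangOptions`:

```cpp
#include <string>

// Hypothetical stand-in for the two clang LangOptions bits this patch
// cares about. In clang, CUDAIsDevice is set for the device-side half of
// a CUDA *or* HIP compilation, which is why it alone cannot distinguish
// the two languages.
struct LangOpts {
  bool CUDAIsDevice; // device-side compilation (CUDA or HIP)
  bool HIP;          // input language is HIP rather than CUDA
};

// Models the post-patch dispatch: only a CUDA (non-HIP) device
// compilation takes the device-side launch path.
std::string emitKernelCall(const LangOpts &opts) {
  if (!opts.HIP && opts.CUDAIsDevice)
    return "EmitCUDADeviceKernelCallExpr"; // device-side launch (CUDA only)
  return "EmitCUDAKernelCallExpr";         // host-side launch sequence
}
```

Before the patch, the first branch was taken whenever `CUDAIsDevice` was set, so HIP device compilations also fell into the CUDA-only device-launch path.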
  // Emit as a device kernel call if CUDA device code is to be generated.
- if (getLangOpts().CUDAIsDevice)
+ // TODO: implement for HIP
+ if (!getLangOpts().HIP && getLangOpts().CUDAIsDevice)
We added a sema check @ llvm-project/clang/lib/Sema/SemaCUDA.cpp, Line 83 in 8378a6f:

if (IsDeviceKernelCall && getLangOpts().HIP)

to generate an error message on HIP, based on Sam's request, as HIP currently doesn't support device-side kernel calls. I don't follow how we could have a CUDAKernelCallExpr in the device compilation. Could you elaborate in detail?
AlexVlx left a comment
The sema check doesn't work as is for hipstdpar, because it's gated on the current target being either a __global__ function or a __device__ function. What happens is that we do the parsing on a normal function, the <<<>>> expression is semantically valid, and then we try to EmitCUDAKernelCallExpr, because at CodeGen that is gated on whether the entire compilation is host or device, not on whether or not the caller is __global__ or __device__. So either the latter check should actually establish the caller's context, or we should bypass this altogether when compiling for hipstdpar. This is the simplest NFC workaround to unbreak things.
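The gating described above can be sketched as a minimal model. The `CallerContext` enum and `semaRejectsLaunch` are hypothetical names; the point is only that the SemaCUDA.cpp check keys off the caller's target, so a plain host function (the hipstdpar case) is never rejected:

```cpp
// Hypothetical model of the Sema-side gate: the HIP diagnostic fires only
// when the *caller* is already known to be a __device__ or __global__
// function, so a <<<>>> launch written in a plain host function (as under
// hipstdpar) passes Sema and a CUDAKernelCallExpr reaches CodeGen anyway.
enum class CallerContext { HostFunction, DeviceFunction, GlobalFunction };

bool semaRejectsLaunch(CallerContext caller, bool isHIP) {
  bool isDeviceKernelCall = caller == CallerContext::DeviceFunction ||
                            caller == CallerContext::GlobalFunction;
  return isDeviceKernelCall && isHIP; // mirrors the SemaCUDA.cpp check
}
```

Under this model, a host-function caller in HIP is accepted by Sema, which is exactly the path that later trips the CodeGen assumption.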
Why not add a getLangOpts().HIPStdPar check in sema to skip generating the device-side kernel call? So that we have a central place to make that decision?
AlexVlx left a comment
Because, as far as I can ascertain, the Sema check is insufficient, and the separate assert in EmitCUDAKernelCallExpr is disjoint from it. Here's what would happen:

- In Sema, what we see is that IsDeviceKernelCall is false. This is fine, but we would still emit a CUDAKernelCallExpr for the <<<>>> callsite, which was the case anyway before this change;
- Later on, when we get to CodeGen, we see the CUDAKernelCallExpr and try to handle it, except now the assumption is that if we're compiling for device and we see one, it must be a device-side launch, so we go look up a non-existent symbol and run into the bug.
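The two-stage failure, and how the patch resolves it, can be put end to end in one sketch. All names (`Opts`, `semaAccepts`, `codegenPathBefore`/`After`) are hypothetical; the scenario modeled is a <<<>>> call in a plain host function during a HIP device compilation, as under hipstdpar:

```cpp
#include <string>

// HIP device-side compilation: both bits set (hypothetical struct).
struct Opts {
  bool HIP = true;
  bool CUDAIsDevice = true;
};

// Stage 1: Sema. For a host-function caller, IsDeviceKernelCall is false,
// so the HIP diagnostic never fires and the launch expression survives.
bool semaAccepts(bool callerIsDeviceOrGlobal, const Opts &o) {
  return !(callerIsDeviceOrGlobal && o.HIP);
}

// Stage 2: CodeGen dispatch, before vs. after this patch.
std::string codegenPathBefore(const Opts &o) {
  // Pre-patch: any device compilation took the device-side launch path,
  // which for HIP meant looking up a non-existent symbol.
  return o.CUDAIsDevice ? "device-side launch (bug)" : "host launch";
}
std::string codegenPathAfter(const Opts &o) {
  // Post-patch: HIP is explicitly excluded from the device-side path.
  return (!o.HIP && o.CUDAIsDevice) ? "device-side launch" : "host launch";
}
```

Run against the hipstdpar scenario (host caller, HIP device compile), Sema accepts the call in both versions; only the CodeGen guard changes which path is taken.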