Skip to content

Conversation

@AlexVlx
Copy link
Contributor

@AlexVlx AlexVlx commented Dec 7, 2025

#165519 added support for launching kernels from the device side. This is only available in CUDA at the moment. We have to explicitly check whether we are compiling for HIP to guard against this path being exercised, since the CUDA and HIP languages rely on the same CUDAIsDevice bit to check for device side compilation, and it is not possible to disambiguate otherwise.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. labels Dec 7, 2025
@AlexVlx AlexVlx requested review from darkbuck and yxsamliu December 7, 2025 15:05
@llvmbot
Copy link
Member

llvmbot commented Dec 7, 2025

@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Alex Voicu (AlexVlx)

Changes

#165519 added support for launching kernels from the device side. This is only available in CUDA at the moment. We have to explicitly check whether we are compiling for HIP to guard against this path being exercised, since the CUDA and HIP languages rely on the same CUDAIsDevice bit to check for device side compilation, and it is not possible to disambiguate otherwise.


Full diff: https://github.com/llvm/llvm-project/pull/171043.diff

1 Files Affected:

  • (modified) clang/lib/CodeGen/CGExprCXX.cpp (+2-1)
diff --git a/clang/lib/CodeGen/CGExprCXX.cpp b/clang/lib/CodeGen/CGExprCXX.cpp
index ce2ed9026fa1f..3f4f61db8d3a4 100644
--- a/clang/lib/CodeGen/CGExprCXX.cpp
+++ b/clang/lib/CodeGen/CGExprCXX.cpp
@@ -504,7 +504,8 @@ RValue CodeGenFunction::EmitCUDAKernelCallExpr(const CUDAKernelCallExpr *E,
                                                ReturnValueSlot ReturnValue,
                                                llvm::CallBase **CallOrInvoke) {
   // Emit as a device kernel call if CUDA device code is to be generated.
-  if (getLangOpts().CUDAIsDevice)
+  // TODO: implement for HIP
+  if (!getLangOpts().HIP && getLangOpts().CUDAIsDevice)
     return CGM.getCUDARuntime().EmitCUDADeviceKernelCallExpr(
         *this, E, ReturnValue, CallOrInvoke);
   return CGM.getCUDARuntime().EmitCUDAKernelCallExpr(*this, E, ReturnValue,

// Emit as a device kernel call if CUDA device code is to be generated.
if (getLangOpts().CUDAIsDevice)
// TODO: implement for HIP
if (!getLangOpts().HIP && getLangOpts().CUDAIsDevice)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We added sema check @

if (IsDeviceKernelCall && getLangOpts().HIP)
to generate error message on HIP based on Sam's request as HIP currently doesnt' support device-side kernel calls. I don't follow how we could have CUDAKernelCallExpr in the device compilation. Could you elaborate in details?

Copy link
Contributor Author

@AlexVlx AlexVlx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We added sema check @

if (IsDeviceKernelCall && getLangOpts().HIP)
to generate error message on HIP based on Sam's request as HIP currently doesnt' support device-side kernel calls. I don't follow how we could have CUDAKernelCallExpr in the device compilation. Could you elaborate in details?

The sema check doesn't work as is for hipstdpar, because it's gated on the current target being either a __global__ function or a __device__ function. What happens is that we do the parsing on a normal function, the <<<>>> expression is semantically valid, and then we try to EmitCUDAKernelCallExpr, because at CodeGen that is gated on whether the entire compilation is host or device, not on whether or not the caller is __global__ or __device__. So either the latter check should actually establish the caller's context, or we should bypass this altogether when compiling for hipstdpar. This is the simplest NFC workaround to unbreak things.

@darkbuck
Copy link
Contributor

darkbuck commented Dec 9, 2025

We added sema check @

if (IsDeviceKernelCall && getLangOpts().HIP)

to generate error message on HIP based on Sam's request as HIP currently doesnt' support device-side kernel calls. I don't follow how we could have CUDAKernelCallExpr in the device compilation. Could you elaborate in details?

The sema check doesn't work as is for hipstdpar, because it's gated on the current target being either a __global__ function or a __device__ function. What happens is that we do the parsing on a normal function, the <<<>>> expression is semantically valid, and then we try to EmitCUDAKernelCallExpr, because at CodeGen that is gated on whether the entire compilation is host or device, not on whether or not the caller is __global__ or __device__. So either the latter check should actually establish the caller's context, or we should bypass this altogether when compiling for hipstdpar. This is the simplest NFC workaround to unbreak things.

Why not add getLangOpts().HIPStdPar check in sema to skip generating device-side kernel call? So that we have a central place to make that decision?

Copy link
Contributor Author

@AlexVlx AlexVlx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We added sema check @

if (IsDeviceKernelCall && getLangOpts().HIP)

to generate error message on HIP based on Sam's request as HIP currently doesnt' support device-side kernel calls. I don't follow how we could have CUDAKernelCallExpr in the device compilation. Could you elaborate in details?

The sema check doesn't work as is for hipstdpar, because it's gated on the current target being either a __global__ function or a __device__ function. What happens is that we do the parsing on a normal function, the <<<>>> expression is semantically valid, and then we try to EmitCUDAKernelCallExpr, because at CodeGen that is gated on whether the entire compilation is host or device, not on whether or not the caller is __global__ or __device__. So either the latter check should actually establish the caller's context, or we should bypass this altogether when compiling for hipstdpar. This is the simplest NFC workaround to unbreak things.

Why not add getLangOpts().HIPStdPar check in sema to skip generating device-side kernel call? So that we have a central place to make that decision?

Because, as far as I can ascertain, the Sema check is insufficient / the separate assert in EmitCUDAKernelCallExpr is disjoint. Here's what would happen:

  1. In Sema what we see is that IsDeviceKernelCall is false - this is fine, but we still would emit a CudaKernelCallExpr for the <<<>>> callsite, which was the case anyways before this change;
  2. Later on, when we get to CodeGen, we see the CudaKernelCallExpr, and try to handle it, except now the assumption is that if we're compiling for device and we see that, it must be a device side launch, and go look up a non-existent symbol, and run into the bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants