Skip to content

Conversation

@XChy
Copy link
Member

@XChy XChy commented Dec 1, 2025

Fixes #154772
We previously set ptx_kernel for all kernels. But it's incorrect to add ptx_kernel to the stub version of kernel introduced in #115821. This patch copies the workaround of AMDGPU.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. labels Dec 1, 2025
@llvmbot
Copy link
Member

llvmbot commented Dec 1, 2025

@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Hongyu Chen (XChy)

Changes

Fixes #154772
We previously set ptx_kernel for all kernels. But it's incorrect to add ptx_kernel to the stub version of kernel introduced in #115821. This patch copies the workaround of AMDGPU.


Full diff: https://github.com/llvm/llvm-project/pull/170170.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/Targets/NVPTX.cpp (+3-1)
  • (modified) clang/test/CodeGenOpenCL/ptx-calls.cl (+15-4)
diff --git a/clang/lib/CodeGen/Targets/NVPTX.cpp b/clang/lib/CodeGen/Targets/NVPTX.cpp
index f6715861d91bc..5afef658c840b 100644
--- a/clang/lib/CodeGen/Targets/NVPTX.cpp
+++ b/clang/lib/CodeGen/Targets/NVPTX.cpp
@@ -277,7 +277,9 @@ void NVPTXTargetCodeGenInfo::setTargetAttributes(
     }
   }
   // Attach kernel metadata directly if compiling for NVPTX.
-  if (FD->hasAttr<DeviceKernelAttr>())
+  // NOTE: Don't set kernel calling convention for handled OpenCL kernel,
+  // otherwise the stub version of kernel would be incorrect.
+  if (FD->hasAttr<DeviceKernelAttr>() && !M.getLangOpts().OpenCL)
     F->setCallingConv(getDeviceKernelCallingConv());
 }
 
diff --git a/clang/test/CodeGenOpenCL/ptx-calls.cl b/clang/test/CodeGenOpenCL/ptx-calls.cl
index ae187173b1730..d5e27fce426a7 100644
--- a/clang/test/CodeGenOpenCL/ptx-calls.cl
+++ b/clang/test/CodeGenOpenCL/ptx-calls.cl
@@ -1,11 +1,22 @@
-// RUN: %clang_cc1 %s -triple nvptx-unknown-unknown -emit-llvm -O0 -o - | FileCheck %s
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 6
+// RUN: %clang_cc1 %s -triple nvptx-unknown-unknown -emit-llvm -O1 -o - | FileCheck %s
 
+// CHECK-LABEL: define dso_local void @device_function(
+// CHECK-SAME: ) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    ret void
+//
 void device_function() {
 }
-// CHECK-LABEL: define{{.*}} void @device_function()
 
+// CHECK-LABEL: define dso_local ptx_kernel void @kernel_function(
+// CHECK-SAME: ) local_unnamed_addr #[[ATTR1:[0-9]+]] !kernel_arg_addr_space [[META7:![0-9]+]] !kernel_arg_access_qual [[META7]] !kernel_arg_type [[META7]] !kernel_arg_base_type [[META7]] !kernel_arg_type_qual [[META7]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    ret void
+//
 __kernel void kernel_function() {
   device_function();
 }
-// CHECK-LABEL: define{{.*}} ptx_kernel void @kernel_function()
-// CHECK: call void @device_function()
+//.
+// CHECK: [[META7]] = !{}
+//.

@github-actions
Copy link

github-actions bot commented Dec 1, 2025

🐧 Linux x64 Test Results

  • 111718 tests passed
  • 4465 tests skipped

✅ The build succeeded and all tests passed.

Copy link
Contributor

@shiltian shiltian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem like a right fix. To me it seems like getDeviceKernelCallingConv should not return the wrong CC in the first place.

@XChy
Copy link
Member Author

XChy commented Dec 1, 2025

This doesn't seem like a right fix. To me it seems like getDeviceKernelCallingConv should not return the wrong CC in the first place.

I am unable to determine whether a function is a stub in the context of getDeviceKernelCallingConv or setTargetAttributes, unless we check the name prefix __clang_ocl_kern_imp. It looks like setOCLKernelStubCallingConvention should be the one that sets the correct CC of the OpenCL kernel.

if (FD->hasAttr<DeviceKernelAttr>())
// NOTE: Don't set kernel calling convention for handled OpenCL kernel,
// otherwise the stub version of kernel would be incorrect.
if (FD->hasAttr<DeviceKernelAttr>() && !M.getLangOpts().OpenCL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Language check is certainly the wrong thing to be looking for. Isn't this code invoked twice for each of them? This should only apply to one

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the context here, I didn't find a way to check whether it's a stub unless by the mangled name. So I apply this code to the ones without __clang_ocl_kern_imp_ prefix in the mangled name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend:AMDGPU clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Clang 21: unreachable produced when compiling OpenCL C for NVPTX target

4 participants