[OpenMP][clang][HIP][CUDA] fix weak alias emit on device compilation #164326

Jason-VanBeusekom · 2025-10-20T21:55:55Z

This PR adds checks for when emitting weak aliases in: void CodeGenModule::EmitGlobal(GlobalDecl GD), before for device compilation for OpenMP, HIP and Cuda, clang would look for the aliasee even if it was never marked for device compilation.

For OpenMP the following case now works:

Failed before when compiling with device, ie: clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa
int __Two(void) { return 2; }
int Two(void) __attribute__ ((weak, alias("__Two")));

For HIP / Cuda:

int __HostFunc(void) { return 42; }
int HostFunc(void) __attribute__ ((weak, alias("__HostFunc")));

For HIP:

Failed before on HIP, Cuda fails due to: NVPTX aliasee must not be '.weak' error
__device__ int __One(void) { return 2; }
__device__ int One(void) __attribute__ ((weak, alias("__One")));

Included are Codegen LIT tests for the above cases, and also cases for weak alias cases that currently work in clang.

Fixes #117369

…when aliasee is no declared on device Add checks to skip the emitting of an alias on the device when the aliasee is not declared on the device. This change effects OpenMP, Hip and Cuda.

github-actions · 2025-10-20T21:56:15Z

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

llvmbot · 2025-10-20T21:56:58Z

@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: None (Jason-VanBeusekom)

Changes

This PR adds checks for when emitting weak aliases in: void CodeGenModule::EmitGlobal(GlobalDecl GD), before for device compilation for OpenMP, HIP and Cuda, clang would look for the aliasee even if it was never marked for device compilation.

For OpenMP the following case now works:

> Failed before when compiling with device, ie: clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa
> > int __Two(void) { return 2; } > int Two(void) __attribute__ ((weak, alias("__Two"))); >

For HIP / Cuda:

>
> > int __HostFunc(void) { return 42; } > int HostFunc(void) __attribute__ ((weak, alias("__HostFunc"))); >

For HIP:

>Failed before on HIP, Cuda fails due to: NVPTX aliasee must not be '.weak' error
> > __device__ int __One(void) { return 2; } > __device__ int One(void) __attribute__ ((weak, alias("__One"))); >

Included are Codegen LIT tests for the above cases, and also cases for weak alias cases that currently work in clang.

Fixes #117369

Full diff: https://github.com/llvm/llvm-project/pull/164326.diff

5 Files Affected:

(modified) clang/lib/CodeGen/CodeGenModule.cpp (+33-1)
(added) clang/test/CodeGenCUDA/cuda_weak_alias.cu (+36)
(added) clang/test/CodeGenHIP/hip_weak_alias.cpp (+63)
(added) clang/test/OpenMP/amdgcn_weak_alias.c (+90)
(added) clang/test/OpenMP/nvptx_weak_alias.c (+34)

diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index c5eb14e329315..ac0de5a221ec7 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -4065,8 +4065,40 @@ void CodeGenModule::EmitGlobal(GlobalDecl GD) {
 
   // If this is an alias definition (which otherwise looks like a declaration)
   // emit it now.
-  if (Global->hasAttr<AliasAttr>())
+  if (Global->hasAttr<AliasAttr>()) {
+    if (LangOpts.OpenMPIsTargetDevice || LangOpts.CUDA) {
+      const auto *AA = Global->getAttr<AliasAttr>();
+      assert(AA && "Not an alias?");
+      GlobalDecl AliaseeGD;
+      if (!lookupRepresentativeDecl(AA->getAliasee(), AliaseeGD)) {
+        if (LangOpts.CUDA)
+          // Failed to find aliasee on device side, skip emitting
+          return;
+      } else {
+        const auto *AliaseeDecl = dyn_cast<ValueDecl>(AliaseeGD.getDecl());
+        if (LangOpts.OpenMPIsTargetDevice) {
+          if (!AliaseeDecl ||
+              !OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(
+                  AliaseeDecl))
+            // Not a target declaration, skip emitting
+            return;
+        } else {
+          // HIP/CUDA
+          const bool HasDeviceAttr = Global->hasAttr<CUDADeviceAttr>();
+          const bool AliaseeHasDeviceAttr =
+              AliaseeDecl && AliaseeDecl->hasAttr<CUDADeviceAttr>();
+          if (LangOpts.CUDAIsDevice) {
+            if (!HasDeviceAttr || !AliaseeHasDeviceAttr)
+              return;
+          } else if (HasDeviceAttr && AliaseeHasDeviceAttr) {
+            // Alias is only on device side, skip emitting on host side
+            return;
+          }
+        }
+      }
+    }
     return EmitAliasDefinition(GD);
+  }
 
   // IFunc like an alias whose value is resolved at runtime by calling resolver.
   if (Global->hasAttr<IFuncAttr>())
diff --git a/clang/test/CodeGenCUDA/cuda_weak_alias.cu b/clang/test/CodeGenCUDA/cuda_weak_alias.cu
new file mode 100644
index 0000000000000..fda0ed7e5d74b
--- /dev/null
+++ b/clang/test/CodeGenCUDA/cuda_weak_alias.cu
@@ -0,0 +1,36 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals all --version 6
+// REQUIRES: nvptx-registered-target
+// RUN: %clang_cc1 -x cuda -triple x86_64-unknown-linux-gnu -aux-triple nvptx64-nvidia-cuda -emit-llvm %s -o - | FileCheck %s --check-prefix=HOST
+
+extern "C" {
+
+//.
+// HOST: @HostFunc = weak alias i32 (), ptr @__HostFunc
+//.
+// HOST-LABEL: define dso_local i32 @__HostFunc(
+// HOST-SAME: ) #[[ATTR0:[0-9]+]] {
+// HOST-NEXT:  [[ENTRY:.*:]]
+// HOST-NEXT:    ret i32 42
+//
+int __HostFunc(void) { return 42; }
+int HostFunc(void) __attribute__ ((weak, alias("__HostFunc")));
+
+}
+
+// HOST-LABEL: define dso_local noundef i32 @main(
+// HOST-SAME: ) #[[ATTR1:[0-9]+]] {
+// HOST-NEXT:  [[ENTRY:.*:]]
+// HOST-NEXT:    [[RETVAL:%.*]] = alloca i32, align 4
+// HOST-NEXT:    store i32 0, ptr [[RETVAL]], align 4
+// HOST-NEXT:    ret i32 0
+//
+int main() {
+    return 0;
+}
+//.
+// HOST: attributes #[[ATTR0]] = { mustprogress noinline nounwind optnone "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }
+// HOST: attributes #[[ATTR1]] = { mustprogress noinline norecurse nounwind optnone "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }
+//.
+// HOST: [[META0:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
+// HOST: [[META1:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
+//.
diff --git a/clang/test/CodeGenHIP/hip_weak_alias.cpp b/clang/test/CodeGenHIP/hip_weak_alias.cpp
new file mode 100644
index 0000000000000..6a57ce1ab74c7
--- /dev/null
+++ b/clang/test/CodeGenHIP/hip_weak_alias.cpp
@@ -0,0 +1,63 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals all --version 6
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -x hip -triple x86_64-unknown-linux-gnu -aux-triple amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-host.bc
+// RUN: %clang_cc1 -x hip -triple x86_64-unknown-linux-gnu -aux-triple amdgcn-amd-amdhsa -emit-llvm %s -o - | FileCheck %s --check-prefix=HOST
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu -emit-llvm %s -fcuda-is-device -o - | FileCheck %s --check-prefix=DEVICE
+
+#define __device__ __attribute__((device))
+
+extern "C" {
+
+//.
+// HOST: @__hip_cuid_ = global i8 0
+// HOST: @llvm.compiler.used = appending global [1 x ptr] [ptr @__hip_cuid_], section "llvm.metadata"
+// HOST: @HostFunc = weak alias i32 (), ptr @__HostFunc
+//.
+// DEVICE: @__hip_cuid_ = addrspace(1) global i8 0
+// DEVICE: @llvm.compiler.used = appending addrspace(1) global [1 x ptr] [ptr addrspacecast (ptr addrspace(1) @__hip_cuid_ to ptr)], section "llvm.metadata"
+// DEVICE: @One = weak alias i32 (), ptr @__One
+//.
+// HOST-LABEL: define dso_local i32 @__HostFunc(
+// HOST-SAME: ) #[[ATTR0:[0-9]+]] {
+// HOST-NEXT:  [[ENTRY:.*:]]
+// HOST-NEXT:    ret i32 42
+//
+int __HostFunc(void) { return 42; }
+int HostFunc(void) __attribute__ ((weak, alias("__HostFunc")));
+
+// DEVICE-LABEL: define dso_local i32 @__One(
+// DEVICE-SAME: ) #[[ATTR0:[0-9]+]] {
+// DEVICE-NEXT:  [[ENTRY:.*:]]
+// DEVICE-NEXT:    [[RETVAL:%.*]] = alloca i32, align 4, addrspace(5)
+// DEVICE-NEXT:    [[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[RETVAL]] to ptr
+// DEVICE-NEXT:    ret i32 2
+//
+__device__ int __One(void) { return 2; }
+__device__ int One(void) __attribute__ ((weak, alias("__One")));
+
+}
+
+// HOST-LABEL: define dso_local noundef i32 @main(
+// HOST-SAME: ) #[[ATTR1:[0-9]+]] {
+// HOST-NEXT:  [[ENTRY:.*:]]
+// HOST-NEXT:    [[RETVAL:%.*]] = alloca i32, align 4
+// HOST-NEXT:    store i32 0, ptr [[RETVAL]], align 4
+// HOST-NEXT:    ret i32 0
+//
+int main() {
+    return 0;
+}
+//.
+// HOST: attributes #[[ATTR0]] = { mustprogress noinline nounwind optnone "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }
+// HOST: attributes #[[ATTR1]] = { mustprogress noinline norecurse nounwind optnone "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }
+//.
+// DEVICE: attributes #[[ATTR0]] = { convergent mustprogress noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+//.
+// HOST: [[META0:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
+// HOST: [[META1:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
+//.
+// DEVICE: [[META0:![0-9]+]] = !{i32 1, !"amdhsa_code_object_version", i32 600}
+// DEVICE: [[META1:![0-9]+]] = !{i32 1, !"amdgpu_printf_kind", !"hostcall"}
+// DEVICE: [[META2:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
+// DEVICE: [[META3:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
+//.
diff --git a/clang/test/OpenMP/amdgcn_weak_alias.c b/clang/test/OpenMP/amdgcn_weak_alias.c
new file mode 100644
index 0000000000000..bf8645bef6d78
--- /dev/null
+++ b/clang/test/OpenMP/amdgcn_weak_alias.c
@@ -0,0 +1,90 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals all --version 6
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -o - | FileCheck %s --check-prefix=HOST
+// RUN: %clang_cc1 -fopenmp -x c -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s --check-prefix=DEVICE
+
+//.
+// HOST: @One = weak alias i32 (), ptr @__One
+// HOST: @Two = weak alias i32 (), ptr @__Two
+// HOST: @Three = weak alias i32 (), ptr @__Three
+//.
+// DEVICE: @__omp_rtl_debug_kind = weak_odr hidden addrspace(1) constant i32 0
+// DEVICE: @__omp_rtl_assume_teams_oversubscription = weak_odr hidden addrspace(1) constant i32 0
+// DEVICE: @__omp_rtl_assume_threads_oversubscription = weak_odr hidden addrspace(1) constant i32 0
+// DEVICE: @__omp_rtl_assume_no_thread_state = weak_odr hidden addrspace(1) constant i32 0
+// DEVICE: @__omp_rtl_assume_no_nested_parallelism = weak_odr hidden addrspace(1) constant i32 0
+// DEVICE: @Two = weak hidden alias i32 (), ptr @__Two
+// DEVICE: @Three = weak hidden alias i32 (), ptr @__Three
+// DEVICE: @Three.1 = weak hidden alias i32 (), ptr @__Three
+//.
+// HOST-LABEL: define dso_local i32 @__One(
+// HOST-SAME: ) #[[ATTR0:[0-9]+]] {
+// HOST-NEXT:  [[ENTRY:.*:]]
+// HOST-NEXT:    ret i32 1
+//
+int __One(void) { return 1; }
+int One(void) __attribute__ ((weak, alias("__One")));
+
+#pragma omp declare target
+// HOST-LABEL: define dso_local i32 @__Two(
+// HOST-SAME: ) #[[ATTR0]] {
+// HOST-NEXT:  [[ENTRY:.*:]]
+// HOST-NEXT:    ret i32 2
+//
+// DEVICE-LABEL: define hidden i32 @__Two(
+// DEVICE-SAME: ) #[[ATTR0:[0-9]+]] {
+// DEVICE-NEXT:  [[ENTRY:.*:]]
+// DEVICE-NEXT:    [[RETVAL:%.*]] = alloca i32, align 4, addrspace(5)
+// DEVICE-NEXT:    [[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[RETVAL]] to ptr
+// DEVICE-NEXT:    ret i32 2
+//
+int __Two(void) { return 2; }
+int Two(void) __attribute__ ((weak, alias("__Two")));
+#pragma omp end declare target
+
+#pragma omp declare target
+// HOST-LABEL: define dso_local i32 @__Three(
+// HOST-SAME: ) #[[ATTR0]] {
+// HOST-NEXT:  [[ENTRY:.*:]]
+// HOST-NEXT:    ret i32 3
+//
+// DEVICE-LABEL: define hidden i32 @__Three(
+// DEVICE-SAME: ) #[[ATTR0]] {
+// DEVICE-NEXT:  [[ENTRY:.*:]]
+// DEVICE-NEXT:    [[RETVAL:%.*]] = alloca i32, align 4, addrspace(5)
+// DEVICE-NEXT:    [[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[RETVAL]] to ptr
+// DEVICE-NEXT:    ret i32 3
+//
+int __Three(void) { return 3; }
+#pragma omp end declare target
+int Three(void) __attribute__ ((weak, alias("__Three")));
+
+
+// HOST-LABEL: define dso_local i32 @main(
+// HOST-SAME: ) #[[ATTR0]] {
+// HOST-NEXT:  [[ENTRY:.*:]]
+// HOST-NEXT:    [[RETVAL:%.*]] = alloca i32, align 4
+// HOST-NEXT:    store i32 0, ptr [[RETVAL]], align 4
+// HOST-NEXT:    ret i32 0
+//
+int main(){
+    return 0;
+}
+
+//.
+// HOST: attributes #[[ATTR0]] = { noinline nounwind optnone "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }
+//.
+// DEVICE: attributes #[[ATTR0]] = { convergent noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+//.
+// HOST: [[META0:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
+// HOST: [[META1:![0-9]+]] = !{i32 7, !"openmp", i32 51}
+// HOST: [[META2:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
+//.
+// DEVICE: [[META0:![0-9]+]] = !{i32 1, !"amdhsa_code_object_version", i32 600}
+// DEVICE: [[META1:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
+// DEVICE: [[META2:![0-9]+]] = !{i32 7, !"openmp", i32 51}
+// DEVICE: [[META3:![0-9]+]] = !{i32 7, !"openmp-device", i32 51}
+// DEVICE: [[META4:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
+//.
diff --git a/clang/test/OpenMP/nvptx_weak_alias.c b/clang/test/OpenMP/nvptx_weak_alias.c
new file mode 100644
index 0000000000000..695bd7d0b8af9
--- /dev/null
+++ b/clang/test/OpenMP/nvptx_weak_alias.c
@@ -0,0 +1,34 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals all --version 6
+// REQUIRES: nvptx-registered-target
+
+// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -o - | FileCheck %s
+
+//.
+// CHECK: @One = weak alias i32 (), ptr @__One
+//.
+// CHECK-LABEL: define dso_local i32 @__One(
+// CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    ret i32 1
+//
+int __One(void) { return 1; }
+int One(void) __attribute__ ((weak, alias("__One")));
+
+
+// CHECK-LABEL: define dso_local i32 @main(
+// CHECK-SAME: ) #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[RETVAL:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    store i32 0, ptr [[RETVAL]], align 4
+// CHECK-NEXT:    ret i32 0
+//
+int main(){
+    return 0;
+}
+//.
+// CHECK: attributes #[[ATTR0]] = { noinline nounwind optnone "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }
+//.
+// CHECK: [[META0:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
+// CHECK: [[META1:![0-9]+]] = !{i32 7, !"openmp", i32 51}
+// CHECK: [[META2:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
+//.

Jason-VanBeusekom · 2025-10-20T22:00:45Z

@jhuber6 @alexey-bataev, anyone else I should ping?

jhuber6 · 2025-10-20T22:29:09Z

clang/lib/CodeGen/CodeGenModule.cpp

+          if (LangOpts.CUDAIsDevice) {
+            if (!HasDeviceAttr || !AliaseeHasDeviceAttr)
+              return;
+          } else if (HasDeviceAttr && AliaseeHasDeviceAttr) {
+            // Alias is only on device side, skip emitting on host side
+            return;


What is up with the logic nesting here, we have an if, nesed if, and an else if, that all do the same thing?

It is needed as LangOpts.CUDA is true for both the host and device compilation, for the host compilation case (the else if) we need to skip if the alias / Aliasee is only on the device, and for the device side we need to make sure both the alias and aliasee are both on the device.

in 16c1a68 I refactored this and added comments to hopefully clear this up. I also added a comment below going into depth about the various states in the refactor and when they are triggered.

jhuber6 · 2025-10-20T22:29:26Z

clang/lib/CodeGen/CodeGenModule.cpp

+        if (LangOpts.CUDA)
+          // Failed to find aliasee on device side, skip emitting


This is a hard error on CUDA we should honestly reject it at Sema.

LangOpts.CUDA is true for both HIP and CUDA cases for both host and device compilation, this is not a hard error as this is triggered during host compilation for a device only alias and during device compilation for a host only alais

in 16c1a68 I refactored this and added comments to hopefully clear this up. I also added a comment below going into depth about the various states in the refactor and when they are triggered.

jhuber6 · 2025-10-20T22:30:08Z

clang/lib/CodeGen/CodeGenModule.cpp

+  if (Global->hasAttr<AliasAttr>()) {
+    if (LangOpts.OpenMPIsTargetDevice || LangOpts.CUDA) {
+      const auto *AA = Global->getAttr<AliasAttr>();
+      assert(AA && "Not an alias?");


Why are we asserting something we know for a fact is true?

Removed in:16c1a68

I copied this from EmitAliasDefinition which does not check with Global->hasAttr<AliasAttr>()

jhuber6 · 2025-10-20T22:32:22Z

clang/lib/CodeGen/CodeGenModule.cpp

  // emit it now.
-  if (Global->hasAttr<AliasAttr>())
+  if (Global->hasAttr<AliasAttr>()) {
+    if (LangOpts.OpenMPIsTargetDevice || LangOpts.CUDA) {


All of this should be a helper function that returns a bool most likely.

Refactor done in: 16c1a68

jhuber6 · 2025-10-20T22:32:41Z

clang/test/CodeGenCUDA/cuda_weak_alias.cu

+// HOST-LABEL: define dso_local noundef i32 @main(
+// HOST-SAME: ) #[[ATTR1:[0-9]+]] {
+// HOST-NEXT:  [[ENTRY:.*:]]
+// HOST-NEXT:    [[RETVAL:%.*]] = alloca i32, align 4
+// HOST-NEXT:    store i32 0, ptr [[RETVAL]], align 4
+// HOST-NEXT:    ret i32 0
+//
+int main() {
+    return 0;
+}


removed in: 16c1a68

I'm still new to LIT tests, did not know main() was not needed.

yxsamliu · 2025-10-21T14:08:57Z

clang/test/CodeGenHIP/hip_weak_alias.cpp

+//
+__device__ int __One(void) { return 2; }
+__device__ int One(void) __attribute__ ((weak, alias("__One")));
+


can we add lit tests for:

aliasee being a host device function

aliasee being a constexpr function

aliasee being one of an overloaded device functions aliased by mangled name

I added the cases for HIP and amdgcn for OpenMP in 16c1a68,

The cases work without modification of the patch.

yxsamliu · 2025-10-21T14:11:41Z

can we add a section "Alias Attribute Support" to https://clang.llvm.org/docs/HIPSupport.html, briefly describe the usecase of alias attribute in HIP with some examples?

…e li tests

Jason-VanBeusekom · 2025-10-24T20:43:18Z

16c1a68 Addresses the above feedback (minus documentation).

I refactored the code to be more readable about the states, while, also, reducing the nested conditionals and added comments to describe the states.

Below I wrote a modified version of shouldSkipAliasEmission with debug prints better helps show the logic and show the results for various cases.

Of note:
-For some of the incorrect hip cases the code does compiler under certain cases
-The code / case printf("OpenMP aliasee not found error\n"); is never executed as lookupRepresentativeDecl does not return false for The OpenMP case, I still put an error there, but I am not sure how it would be executed.

static bool shouldSkipAliasEmission(const CodeGenModule &CGM,
                                    const ValueDecl *Global) {
  const LangOptions &LangOpts = CGM.getLangOpts();
  if (!(LangOpts.OpenMPIsTargetDevice || LangOpts.CUDA)) {
    printf("Not cuda / openmp target returning false\n");
    return false;
  }
  const auto *AA = Global->getAttr<AliasAttr>();
  GlobalDecl AliaseeGD;

  // Check if the aliasee exists.
  if (!CGM.lookupRepresentativeDecl(AA->getAliasee(), AliaseeGD)) {
    if (LangOpts.CUDA) {
      printf("HIP / CUDA aliasse not found returning true\n");
      // In CUDA/HIP, if the aliasee is not found, skip the alias emission.
      // This is not a hard error as this branch is executed for both the host
      // and device, with no respect to where the aliasee is defined.
      return true;
    }
    printf("OpenMP aliasee not found error\n");
    // For OpenMP, lookupRepresentativeDecl should always find the aliasee, this
    // is an error
    CGM.getDiags().Report(AA->getLocation(), diag::err_alias_to_undefined)
        << false << true;
    return false;
  }

  const auto *AliaseeDecl = dyn_cast<ValueDecl>(AliaseeGD.getDecl());
  if (LangOpts.OpenMPIsTargetDevice) {
    if (!AliaseeDecl ||
        !OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(AliaseeDecl)) {
      printf("OpenMP aliasse not target decl returning true\n");
      return true;
    }
    printf("OpenMP aliasse is target decl returning false\n");
    return false;
  }

  // CUDA / HIP
  const bool HasDeviceAttr = Global->hasAttr<CUDADeviceAttr>();
  const bool AliaseeHasDeviceAttr =
      AliaseeDecl && AliaseeDecl->hasAttr<CUDADeviceAttr>();

  if (LangOpts.CUDAIsDevice) {
    if (!HasDeviceAttr || !AliaseeHasDeviceAttr) {
      // On device, skip alias emission if either the alias or the aliasee
      // is not marked with __device__.
      printf(
          "Skipping HIP / CUDA alias emission on device for host only alias\n");
      return true;
    }
    printf("Emitting HIP / CUDA alias on device\n");
    return false;
  }

  printf("Emitting HIP / CUDA alias on host\n");
  // CUDA / HIP Host
  // we know that the aliasee exists from above, so we know to emit
  return false;
}

And prints out the following for each case:

OpenMP:

#pragma omp declare target
int __One(void) { return 1;  }
#pragma omp end declare target
int One(void) __attribute__ ((weak, alias("__One")));

Outputs:

    Not cuda / openmp target returning false
    OpenMP aliasse is target decl returning false
    OpenMP aliasse is target decl returning false

#pragma omp declare target
int __One(void) { return 1;  }
int One(void) __attribute__ ((weak, alias("__One")));
#pragma omp end declare target

Outputs:

Not cuda / openmp target returning false
OpenMP aliasse is target decl returning false

int __One(void) { return 1;  }
int One(void) __attribute__ ((weak, alias("__One")));

Outputs:

Not cuda / openmp target returning false
OpenMP aliasse not target decl returning true

int __One(void) { return 1;  }
#pragma omp declare target
int One(void) __attribute__ ((weak, alias("__One")));
#pragma omp end declare target

Outputs:

Not cuda / openmp target returning false
OpenMP aliasse not target decl returning true

Results in runtime error (expected):

omptarget error: Failed to load image "a provided binary image is malformed" failed to load binary 0x55b530291d60
omptarget fatal error 0: "the plugin backend is in an invalid or unsupported state" failed to load images on device '0'

HIP:

__device__ int __DevTwo(void) { return 2; }
__device__ int DevTwo(void) __attribute__ ((weak, alias("__DevTwo")));

Outputs:

Emitting HIP / CUDA alias on device
HIP / CUDA aliasse not found returning true

int __DevTwo(void) { return 2; }
int DevTwo(void) __attribute__ ((weak, alias("__DevTwo")));

Outputs:

HIP / CUDA aliasse not found returning true
    Emitting HIP / CUDA alias on host

int __DevTwo(void) { return 2; }
__device__ int DevTwo(void) __attribute__ ((weak, alias("__DevTwo")));

Outputs:

HIP / CUDA aliasse not found returning true
Emitting HIP / CUDA alias on host

(compiles with no error, calling aliasee on host works, calling alias on device results in compiler error calling alaisee on host results in compiler error, calling on device results in runtime error [memory access fault])

we could add check to have error here, but not sure if we want to

__device__ int __DevTwo(void) { return 2; }
int DevTwo(void) __attribute__ ((weak, alias("__DevTwo")));

Outputs:

Skipping HIP / CUDA alias emission on device for host only alias
HIP / CUDA aliasse not found returning true

-get compiler error when trying to call alias on device
-able to call aliasee on device
-call alias on host results in segfault at runtime
-call aliasee on host results in compiler error

could add check to have error here, but not sure if we want to

jhuber6

Style nits

@Artem-B CUDA does not support weak on anything but non-kernel functions. Should we handle that in Sema or just let the backend die later.

jhuber6 · 2025-11-05T15:42:27Z

clang/lib/CodeGen/CodeGenModule.cpp

+  // Check if the aliasee exists.
+  if (!CGM.lookupRepresentativeDecl(AA->getAliasee(), AliaseeGD)) {
+    // If the aliasee is not found, skip the alias emission.
+    // This is not a hard error as this branch is executed for both the host
+    // and device, with no respect to where the aliasee is defined.
+    // For some OpenMP cases (functions) this will return true even if the
+    // aliasee is not on the device, which is handled by the case below
+    return true;
+  }


No braces around single block, make the comment less verbose and put it above the condition.

Done in 2949485

jhuber6 · 2025-11-05T15:43:15Z

clang/lib/CodeGen/CodeGenModule.cpp

+  if (!(LangOpts.OpenMPIsTargetDevice || LangOpts.CUDA))
+    return false;


Suggested change

if (!(LangOpts.OpenMPIsTargetDevice || LangOpts.CUDA))

return false;

if (!LangOpts.OpenMPIsTargetDevice && !LangOpts.CUDA))

return false;

Clearer.

Done in 2949485

jhuber6 · 2025-11-05T15:43:39Z

clang/lib/CodeGen/CodeGenModule.cpp

+    if (!AliaseeDecl ||
+        !OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(AliaseeDecl))
+      // On OpenMP device, skip alias emission if the aliasee is not marked
+      // with declare target.
+      return true;
+    return false;


Suggested change

if (!AliaseeDecl ||

!OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(AliaseeDecl))

// On OpenMP device, skip alias emission if the aliasee is not marked

// with declare target.

return true;

return false;

return !AliaseeDecl || !OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(AliaseeDecl);

Done in 2949485

jhuber6 · 2025-11-05T15:44:09Z

clang/lib/CodeGen/CodeGenModule.cpp

+  if (LangOpts.CUDAIsDevice) {
+    if (!HasDeviceAttr || !AliaseeHasDeviceAttr)
+      // On device, skip alias emission if either the alias or the aliasee
+      // is not marked with __device__.
+      return true;
+    return false;
+  }


Suggested change

if (LangOpts.CUDAIsDevice) {

if (!HasDeviceAttr || !AliaseeHasDeviceAttr)

// On device, skip alias emission if either the alias or the aliasee

// is not marked with __device__.

return true;

return false;

}

if (LangOpts.CUDAIsDevice)

return !HasDeviceAttr || !AliaseeHasDeviceAttr;

Done in 2949485

jhuber6 · 2025-11-05T15:46:30Z

clang/test/CodeGenCUDA/cuda_weak_alias.cu

@@ -0,0 +1,24 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals all --version 6
+// REQUIRES: nvptx-registered-target


Suggested change

// REQUIRES: nvptx-registered-target

This shouldn't be necessary since we're not hitting PTX codegen, here and elsewhere. Also I know these tests are autogenerated, but we don't need to check the attributes. Totally find to trim things from the autogenerated output.

Trimmed up the checks, in 2949485, I kept the function definitions as they technically relate to the issue, but I also see an argument to also removing those.

[OpenMP][clang][HIP][CUDA] fix weak alias emit on device compilation …

26b90e0

…when aliasee is no declared on device Add checks to skip the emitting of an alias on the device when the aliasee is not declared on the device. This change effects OpenMP, Hip and Cuda.

llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. clang:openmp OpenMP related changes to Clang labels Oct 20, 2025

jhuber6 reviewed Oct 20, 2025

View reviewed changes

jhuber6 requested review from Artem-B and yxsamliu October 20, 2025 22:33

yxsamliu reviewed Oct 21, 2025

View reviewed changes

Move weak alias code to seperate function, refactor weak alias, updat…

16c1a68

…e li tests

Jason-VanBeusekom requested a review from jhuber6 October 24, 2025 21:47

Jason-VanBeusekom added 2 commits October 24, 2025 19:18

Update lit tests for variable and non weak aliasing

73f219f

update Hip documentation

7fd51cc

Jason-VanBeusekom requested a review from yxsamliu October 30, 2025 18:41

jhuber6 reviewed Nov 5, 2025

View reviewed changes

Trim lit tests and format fixes

2949485

Jason-VanBeusekom requested a review from jhuber6 November 7, 2025 20:06

		if (LangOpts.CUDA)
		// Failed to find aliasee on device side, skip emitting

		if (!(LangOpts.OpenMPIsTargetDevice \|\| LangOpts.CUDA))
		return false;

		@@ -0,0 +1,24 @@
		// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals all --version 6
		// REQUIRES: nvptx-registered-target

[OpenMP][clang][HIP][CUDA] fix weak alias emit on device compilation #164326

Are you sure you want to change the base?

[OpenMP][clang][HIP][CUDA] fix weak alias emit on device compilation #164326

Uh oh!

Conversation

Jason-VanBeusekom commented Oct 20, 2025

Uh oh!

github-actions bot commented Oct 20, 2025

Uh oh!

llvmbot commented Oct 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Jason-VanBeusekom commented Oct 20, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yxsamliu commented Oct 21, 2025

Uh oh!

Jason-VanBeusekom commented Oct 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

OpenMP:

Outputs:

Outputs:

Outputs:

Outputs:

Results in runtime error (expected):

HIP:

Outputs:

Outputs:

Outputs:

we could add check to have error here, but not sure if we want to

Outputs:

Uh oh!

jhuber6 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

llvmbot commented Oct 20, 2025 •

edited

Loading

Jason-VanBeusekom commented Oct 24, 2025 •

edited

Loading