Skip to content

Conversation

@Jason-VanBeusekom
Copy link

This PR adds checks for when emitting weak aliases in: void CodeGenModule::EmitGlobal(GlobalDecl GD), before for device compilation for OpenMP, HIP and Cuda, clang would look for the aliasee even if it was never marked for device compilation.

For OpenMP the following case now works:

Failed before when compiling with device, ie: clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa

int __Two(void) { return 2; }
int Two(void) __attribute__ ((weak, alias("__Two")));

For HIP / Cuda:

int __HostFunc(void) { return 42; }
int HostFunc(void) __attribute__ ((weak, alias("__HostFunc")));

For HIP:

Failed before on HIP, Cuda fails due to: NVPTX aliasee must not be '.weak' error

__device__ int __One(void) { return 2; }
__device__ int One(void) __attribute__ ((weak, alias("__One")));

Included are Codegen LIT tests for the above cases, and also cases for weak alias cases that currently work in clang.

Fixes #117369

…when aliasee is no declared on device

Add checks to skip the emitting of an alias on the device
when the aliasee is not declared on the device. This change effects
OpenMP, Hip and Cuda.
@github-actions
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. clang:openmp OpenMP related changes to Clang labels Oct 20, 2025
@llvmbot
Copy link
Member

llvmbot commented Oct 20, 2025

@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: None (Jason-VanBeusekom)

Changes

This PR adds checks for when emitting weak aliases in: void CodeGenModule::EmitGlobal(GlobalDecl GD), before for device compilation for OpenMP, HIP and Cuda, clang would look for the aliasee even if it was never marked for device compilation.

For OpenMP the following case now works:

> Failed before when compiling with device, ie: clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa
> > int __Two(void) { return 2; } > int Two(void) __attribute__ ((weak, alias("__Two"))); >

For HIP / Cuda:

>
> > int __HostFunc(void) { return 42; } > int HostFunc(void) __attribute__ ((weak, alias("__HostFunc"))); >

For HIP:

>Failed before on HIP, Cuda fails due to: NVPTX aliasee must not be '.weak' error
> > __device__ int __One(void) { return 2; } > __device__ int One(void) __attribute__ ((weak, alias("__One"))); >

Included are Codegen LIT tests for the above cases, and also cases for weak alias cases that currently work in clang.

Fixes #117369


Full diff: https://github.com/llvm/llvm-project/pull/164326.diff

5 Files Affected:

  • (modified) clang/lib/CodeGen/CodeGenModule.cpp (+33-1)
  • (added) clang/test/CodeGenCUDA/cuda_weak_alias.cu (+36)
  • (added) clang/test/CodeGenHIP/hip_weak_alias.cpp (+63)
  • (added) clang/test/OpenMP/amdgcn_weak_alias.c (+90)
  • (added) clang/test/OpenMP/nvptx_weak_alias.c (+34)
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index c5eb14e329315..ac0de5a221ec7 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -4065,8 +4065,40 @@ void CodeGenModule::EmitGlobal(GlobalDecl GD) {
 
   // If this is an alias definition (which otherwise looks like a declaration)
   // emit it now.
-  if (Global->hasAttr<AliasAttr>())
+  if (Global->hasAttr<AliasAttr>()) {
+    if (LangOpts.OpenMPIsTargetDevice || LangOpts.CUDA) {
+      const auto *AA = Global->getAttr<AliasAttr>();
+      assert(AA && "Not an alias?");
+      GlobalDecl AliaseeGD;
+      if (!lookupRepresentativeDecl(AA->getAliasee(), AliaseeGD)) {
+        if (LangOpts.CUDA)
+          // Failed to find aliasee on device side, skip emitting
+          return;
+      } else {
+        const auto *AliaseeDecl = dyn_cast<ValueDecl>(AliaseeGD.getDecl());
+        if (LangOpts.OpenMPIsTargetDevice) {
+          if (!AliaseeDecl ||
+              !OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(
+                  AliaseeDecl))
+            // Not a target declaration, skip emitting
+            return;
+        } else {
+          // HIP/CUDA
+          const bool HasDeviceAttr = Global->hasAttr<CUDADeviceAttr>();
+          const bool AliaseeHasDeviceAttr =
+              AliaseeDecl && AliaseeDecl->hasAttr<CUDADeviceAttr>();
+          if (LangOpts.CUDAIsDevice) {
+            if (!HasDeviceAttr || !AliaseeHasDeviceAttr)
+              return;
+          } else if (HasDeviceAttr && AliaseeHasDeviceAttr) {
+            // Alias is only on device side, skip emitting on host side
+            return;
+          }
+        }
+      }
+    }
     return EmitAliasDefinition(GD);
+  }
 
   // IFunc like an alias whose value is resolved at runtime by calling resolver.
   if (Global->hasAttr<IFuncAttr>())
diff --git a/clang/test/CodeGenCUDA/cuda_weak_alias.cu b/clang/test/CodeGenCUDA/cuda_weak_alias.cu
new file mode 100644
index 0000000000000..fda0ed7e5d74b
--- /dev/null
+++ b/clang/test/CodeGenCUDA/cuda_weak_alias.cu
@@ -0,0 +1,36 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals all --version 6
+// REQUIRES: nvptx-registered-target
+// RUN: %clang_cc1 -x cuda -triple x86_64-unknown-linux-gnu -aux-triple nvptx64-nvidia-cuda -emit-llvm %s -o - | FileCheck %s --check-prefix=HOST
+
+extern "C" {
+
+//.
+// HOST: @HostFunc = weak alias i32 (), ptr @__HostFunc
+//.
+// HOST-LABEL: define dso_local i32 @__HostFunc(
+// HOST-SAME: ) #[[ATTR0:[0-9]+]] {
+// HOST-NEXT:  [[ENTRY:.*:]]
+// HOST-NEXT:    ret i32 42
+//
+int __HostFunc(void) { return 42; }
+int HostFunc(void) __attribute__ ((weak, alias("__HostFunc")));
+
+}
+
+// HOST-LABEL: define dso_local noundef i32 @main(
+// HOST-SAME: ) #[[ATTR1:[0-9]+]] {
+// HOST-NEXT:  [[ENTRY:.*:]]
+// HOST-NEXT:    [[RETVAL:%.*]] = alloca i32, align 4
+// HOST-NEXT:    store i32 0, ptr [[RETVAL]], align 4
+// HOST-NEXT:    ret i32 0
+//
+int main() {
+    return 0;
+}
+//.
+// HOST: attributes #[[ATTR0]] = { mustprogress noinline nounwind optnone "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }
+// HOST: attributes #[[ATTR1]] = { mustprogress noinline norecurse nounwind optnone "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }
+//.
+// HOST: [[META0:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
+// HOST: [[META1:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
+//.
diff --git a/clang/test/CodeGenHIP/hip_weak_alias.cpp b/clang/test/CodeGenHIP/hip_weak_alias.cpp
new file mode 100644
index 0000000000000..6a57ce1ab74c7
--- /dev/null
+++ b/clang/test/CodeGenHIP/hip_weak_alias.cpp
@@ -0,0 +1,63 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals all --version 6
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -x hip -triple x86_64-unknown-linux-gnu -aux-triple amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-host.bc
+// RUN: %clang_cc1 -x hip -triple x86_64-unknown-linux-gnu -aux-triple amdgcn-amd-amdhsa -emit-llvm %s -o - | FileCheck %s --check-prefix=HOST
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu -emit-llvm %s -fcuda-is-device -o - | FileCheck %s --check-prefix=DEVICE
+
+#define __device__ __attribute__((device))
+
+extern "C" {
+
+//.
+// HOST: @__hip_cuid_ = global i8 0
+// HOST: @llvm.compiler.used = appending global [1 x ptr] [ptr @__hip_cuid_], section "llvm.metadata"
+// HOST: @HostFunc = weak alias i32 (), ptr @__HostFunc
+//.
+// DEVICE: @__hip_cuid_ = addrspace(1) global i8 0
+// DEVICE: @llvm.compiler.used = appending addrspace(1) global [1 x ptr] [ptr addrspacecast (ptr addrspace(1) @__hip_cuid_ to ptr)], section "llvm.metadata"
+// DEVICE: @One = weak alias i32 (), ptr @__One
+//.
+// HOST-LABEL: define dso_local i32 @__HostFunc(
+// HOST-SAME: ) #[[ATTR0:[0-9]+]] {
+// HOST-NEXT:  [[ENTRY:.*:]]
+// HOST-NEXT:    ret i32 42
+//
+int __HostFunc(void) { return 42; }
+int HostFunc(void) __attribute__ ((weak, alias("__HostFunc")));
+
+// DEVICE-LABEL: define dso_local i32 @__One(
+// DEVICE-SAME: ) #[[ATTR0:[0-9]+]] {
+// DEVICE-NEXT:  [[ENTRY:.*:]]
+// DEVICE-NEXT:    [[RETVAL:%.*]] = alloca i32, align 4, addrspace(5)
+// DEVICE-NEXT:    [[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[RETVAL]] to ptr
+// DEVICE-NEXT:    ret i32 2
+//
+__device__ int __One(void) { return 2; }
+__device__ int One(void) __attribute__ ((weak, alias("__One")));
+
+}
+
+// HOST-LABEL: define dso_local noundef i32 @main(
+// HOST-SAME: ) #[[ATTR1:[0-9]+]] {
+// HOST-NEXT:  [[ENTRY:.*:]]
+// HOST-NEXT:    [[RETVAL:%.*]] = alloca i32, align 4
+// HOST-NEXT:    store i32 0, ptr [[RETVAL]], align 4
+// HOST-NEXT:    ret i32 0
+//
+int main() {
+    return 0;
+}
+//.
+// HOST: attributes #[[ATTR0]] = { mustprogress noinline nounwind optnone "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }
+// HOST: attributes #[[ATTR1]] = { mustprogress noinline norecurse nounwind optnone "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }
+//.
+// DEVICE: attributes #[[ATTR0]] = { convergent mustprogress noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+//.
+// HOST: [[META0:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
+// HOST: [[META1:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
+//.
+// DEVICE: [[META0:![0-9]+]] = !{i32 1, !"amdhsa_code_object_version", i32 600}
+// DEVICE: [[META1:![0-9]+]] = !{i32 1, !"amdgpu_printf_kind", !"hostcall"}
+// DEVICE: [[META2:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
+// DEVICE: [[META3:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
+//.
diff --git a/clang/test/OpenMP/amdgcn_weak_alias.c b/clang/test/OpenMP/amdgcn_weak_alias.c
new file mode 100644
index 0000000000000..bf8645bef6d78
--- /dev/null
+++ b/clang/test/OpenMP/amdgcn_weak_alias.c
@@ -0,0 +1,90 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals all --version 6
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -o - | FileCheck %s --check-prefix=HOST
+// RUN: %clang_cc1 -fopenmp -x c -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s --check-prefix=DEVICE
+
+//.
+// HOST: @One = weak alias i32 (), ptr @__One
+// HOST: @Two = weak alias i32 (), ptr @__Two
+// HOST: @Three = weak alias i32 (), ptr @__Three
+//.
+// DEVICE: @__omp_rtl_debug_kind = weak_odr hidden addrspace(1) constant i32 0
+// DEVICE: @__omp_rtl_assume_teams_oversubscription = weak_odr hidden addrspace(1) constant i32 0
+// DEVICE: @__omp_rtl_assume_threads_oversubscription = weak_odr hidden addrspace(1) constant i32 0
+// DEVICE: @__omp_rtl_assume_no_thread_state = weak_odr hidden addrspace(1) constant i32 0
+// DEVICE: @__omp_rtl_assume_no_nested_parallelism = weak_odr hidden addrspace(1) constant i32 0
+// DEVICE: @Two = weak hidden alias i32 (), ptr @__Two
+// DEVICE: @Three = weak hidden alias i32 (), ptr @__Three
+// DEVICE: @Three.1 = weak hidden alias i32 (), ptr @__Three
+//.
+// HOST-LABEL: define dso_local i32 @__One(
+// HOST-SAME: ) #[[ATTR0:[0-9]+]] {
+// HOST-NEXT:  [[ENTRY:.*:]]
+// HOST-NEXT:    ret i32 1
+//
+int __One(void) { return 1; }
+int One(void) __attribute__ ((weak, alias("__One")));
+
+#pragma omp declare target
+// HOST-LABEL: define dso_local i32 @__Two(
+// HOST-SAME: ) #[[ATTR0]] {
+// HOST-NEXT:  [[ENTRY:.*:]]
+// HOST-NEXT:    ret i32 2
+//
+// DEVICE-LABEL: define hidden i32 @__Two(
+// DEVICE-SAME: ) #[[ATTR0:[0-9]+]] {
+// DEVICE-NEXT:  [[ENTRY:.*:]]
+// DEVICE-NEXT:    [[RETVAL:%.*]] = alloca i32, align 4, addrspace(5)
+// DEVICE-NEXT:    [[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[RETVAL]] to ptr
+// DEVICE-NEXT:    ret i32 2
+//
+int __Two(void) { return 2; }
+int Two(void) __attribute__ ((weak, alias("__Two")));
+#pragma omp end declare target
+
+#pragma omp declare target
+// HOST-LABEL: define dso_local i32 @__Three(
+// HOST-SAME: ) #[[ATTR0]] {
+// HOST-NEXT:  [[ENTRY:.*:]]
+// HOST-NEXT:    ret i32 3
+//
+// DEVICE-LABEL: define hidden i32 @__Three(
+// DEVICE-SAME: ) #[[ATTR0]] {
+// DEVICE-NEXT:  [[ENTRY:.*:]]
+// DEVICE-NEXT:    [[RETVAL:%.*]] = alloca i32, align 4, addrspace(5)
+// DEVICE-NEXT:    [[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[RETVAL]] to ptr
+// DEVICE-NEXT:    ret i32 3
+//
+int __Three(void) { return 3; }
+#pragma omp end declare target
+int Three(void) __attribute__ ((weak, alias("__Three")));
+
+
+// HOST-LABEL: define dso_local i32 @main(
+// HOST-SAME: ) #[[ATTR0]] {
+// HOST-NEXT:  [[ENTRY:.*:]]
+// HOST-NEXT:    [[RETVAL:%.*]] = alloca i32, align 4
+// HOST-NEXT:    store i32 0, ptr [[RETVAL]], align 4
+// HOST-NEXT:    ret i32 0
+//
+int main(){
+    return 0;
+}
+
+//.
+// HOST: attributes #[[ATTR0]] = { noinline nounwind optnone "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }
+//.
+// DEVICE: attributes #[[ATTR0]] = { convergent noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+//.
+// HOST: [[META0:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
+// HOST: [[META1:![0-9]+]] = !{i32 7, !"openmp", i32 51}
+// HOST: [[META2:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
+//.
+// DEVICE: [[META0:![0-9]+]] = !{i32 1, !"amdhsa_code_object_version", i32 600}
+// DEVICE: [[META1:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
+// DEVICE: [[META2:![0-9]+]] = !{i32 7, !"openmp", i32 51}
+// DEVICE: [[META3:![0-9]+]] = !{i32 7, !"openmp-device", i32 51}
+// DEVICE: [[META4:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
+//.
diff --git a/clang/test/OpenMP/nvptx_weak_alias.c b/clang/test/OpenMP/nvptx_weak_alias.c
new file mode 100644
index 0000000000000..695bd7d0b8af9
--- /dev/null
+++ b/clang/test/OpenMP/nvptx_weak_alias.c
@@ -0,0 +1,34 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals all --version 6
+// REQUIRES: nvptx-registered-target
+
+// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -o - | FileCheck %s
+
+//.
+// CHECK: @One = weak alias i32 (), ptr @__One
+//.
+// CHECK-LABEL: define dso_local i32 @__One(
+// CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    ret i32 1
+//
+int __One(void) { return 1; }
+int One(void) __attribute__ ((weak, alias("__One")));
+
+
+// CHECK-LABEL: define dso_local i32 @main(
+// CHECK-SAME: ) #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[RETVAL:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    store i32 0, ptr [[RETVAL]], align 4
+// CHECK-NEXT:    ret i32 0
+//
+int main(){
+    return 0;
+}
+//.
+// CHECK: attributes #[[ATTR0]] = { noinline nounwind optnone "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }
+//.
+// CHECK: [[META0:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
+// CHECK: [[META1:![0-9]+]] = !{i32 7, !"openmp", i32 51}
+// CHECK: [[META2:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
+//.

@Jason-VanBeusekom
Copy link
Author

@jhuber6 @alexey-bataev, anyone else I should ping?

Comment on lines 4090 to 4095
if (LangOpts.CUDAIsDevice) {
if (!HasDeviceAttr || !AliaseeHasDeviceAttr)
return;
} else if (HasDeviceAttr && AliaseeHasDeviceAttr) {
// Alias is only on device side, skip emitting on host side
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is up with the logic nesting here, we have an if, nesed if, and an else if, that all do the same thing?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is needed as LangOpts.CUDA is true for both the host and device compilation, for the host compilation case (the else if) we need to skip if the alias / Aliasee is only on the device, and for the device side we need to make sure both the alias and aliasee are both on the device.

in 16c1a68 I refactored this and added comments to hopefully clear this up. I also added a comment below going into depth about the various states in the refactor and when they are triggered.

Comment on lines 4074 to 4075
if (LangOpts.CUDA)
// Failed to find aliasee on device side, skip emitting
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a hard error on CUDA we should honestly reject it at Sema.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LangOpts.CUDA is true for both HIP and CUDA cases for both host and device compilation, this is not a hard error as this is triggered during host compilation for a device only alias and during device compilation for a host only alais

in 16c1a68 I refactored this and added comments to hopefully clear this up. I also added a comment below going into depth about the various states in the refactor and when they are triggered.

if (Global->hasAttr<AliasAttr>()) {
if (LangOpts.OpenMPIsTargetDevice || LangOpts.CUDA) {
const auto *AA = Global->getAttr<AliasAttr>();
assert(AA && "Not an alias?");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we asserting something we know for a fact is true?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in:16c1a68

I copied this from EmitAliasDefinition which does not check with Global->hasAttr<AliasAttr>()

// emit it now.
if (Global->hasAttr<AliasAttr>())
if (Global->hasAttr<AliasAttr>()) {
if (LangOpts.OpenMPIsTargetDevice || LangOpts.CUDA) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of this should be a helper function that returns a bool most likely.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactor done in: 16c1a68

Comment on lines 20 to 29
// HOST-LABEL: define dso_local noundef i32 @main(
// HOST-SAME: ) #[[ATTR1:[0-9]+]] {
// HOST-NEXT: [[ENTRY:.*:]]
// HOST-NEXT: [[RETVAL:%.*]] = alloca i32, align 4
// HOST-NEXT: store i32 0, ptr [[RETVAL]], align 4
// HOST-NEXT: ret i32 0
//
int main() {
return 0;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why main?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed in: 16c1a68

I'm still new to LIT tests, did not know main() was not needed.

@jhuber6 jhuber6 requested review from Artem-B and yxsamliu October 20, 2025 22:33
//
__device__ int __One(void) { return 2; }
__device__ int One(void) __attribute__ ((weak, alias("__One")));

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add lit tests for:

  1. aliasee being a host device function
  2. aliasee being a constexpr function
  3. aliasee being one of an overloaded device functions aliased by mangled name

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the cases for HIP and amdgcn for OpenMP in 16c1a68,

The cases work without modification of the patch.

@yxsamliu
Copy link
Collaborator

can we add a section "Alias Attribute Support" to https://clang.llvm.org/docs/HIPSupport.html, briefly describe the usecase of alias attribute in HIP with some examples?

@Jason-VanBeusekom
Copy link
Author

Jason-VanBeusekom commented Oct 24, 2025

16c1a68 Addresses the above feedback (minus documentation).

I refactored the code to be more readable about the states, while, also, reducing the nested conditionals and added comments to describe the states.

Below I wrote a modified version of shouldSkipAliasEmission with debug prints better helps show the logic and show the results for various cases.

Of note:
-For some of the incorrect hip cases the code does compiler under certain cases
-The code / case printf("OpenMP aliasee not found error\n"); is never executed as lookupRepresentativeDecl does not return false for The OpenMP case, I still put an error there, but I am not sure how it would be executed.

static bool shouldSkipAliasEmission(const CodeGenModule &CGM,
                                    const ValueDecl *Global) {
  const LangOptions &LangOpts = CGM.getLangOpts();
  if (!(LangOpts.OpenMPIsTargetDevice || LangOpts.CUDA)) {
    printf("Not cuda / openmp target returning false\n");
    return false;
  }
  const auto *AA = Global->getAttr<AliasAttr>();
  GlobalDecl AliaseeGD;

  // Check if the aliasee exists.
  if (!CGM.lookupRepresentativeDecl(AA->getAliasee(), AliaseeGD)) {
    if (LangOpts.CUDA) {
      printf("HIP / CUDA aliasse not found returning true\n");
      // In CUDA/HIP, if the aliasee is not found, skip the alias emission.
      // This is not a hard error as this branch is executed for both the host
      // and device, with no respect to where the aliasee is defined.
      return true;
    }
    printf("OpenMP aliasee not found error\n");
    // For OpenMP, lookupRepresentativeDecl should always find the aliasee, this
    // is an error
    CGM.getDiags().Report(AA->getLocation(), diag::err_alias_to_undefined)
        << false << true;
    return false;
  }

  const auto *AliaseeDecl = dyn_cast<ValueDecl>(AliaseeGD.getDecl());
  if (LangOpts.OpenMPIsTargetDevice) {
    if (!AliaseeDecl ||
        !OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(AliaseeDecl)) {
      printf("OpenMP aliasse not target decl returning true\n");
      return true;
    }
    printf("OpenMP aliasse is target decl returning false\n");
    return false;
  }

  // CUDA / HIP
  const bool HasDeviceAttr = Global->hasAttr<CUDADeviceAttr>();
  const bool AliaseeHasDeviceAttr =
      AliaseeDecl && AliaseeDecl->hasAttr<CUDADeviceAttr>();

  if (LangOpts.CUDAIsDevice) {
    if (!HasDeviceAttr || !AliaseeHasDeviceAttr) {
      // On device, skip alias emission if either the alias or the aliasee
      // is not marked with __device__.
      printf(
          "Skipping HIP / CUDA alias emission on device for host only alias\n");
      return true;
    }
    printf("Emitting HIP / CUDA alias on device\n");
    return false;
  }

  printf("Emitting HIP / CUDA alias on host\n");
  // CUDA / HIP Host
  // we know that the aliasee exists from above, so we know to emit
  return false;
}

And prints out the following for each case:

OpenMP:

#pragma omp declare target
int __One(void) { return 1;  }
#pragma omp end declare target
int One(void) __attribute__ ((weak, alias("__One")));

Outputs:

    Not cuda / openmp target returning false
    OpenMP aliasse is target decl returning false
    OpenMP aliasse is target decl returning false

#pragma omp declare target
int __One(void) { return 1;  }
int One(void) __attribute__ ((weak, alias("__One")));
#pragma omp end declare target

Outputs:

Not cuda / openmp target returning false
OpenMP aliasse is target decl returning false

int __One(void) { return 1;  }
int One(void) __attribute__ ((weak, alias("__One")));

Outputs:

Not cuda / openmp target returning false
OpenMP aliasse not target decl returning true

int __One(void) { return 1;  }
#pragma omp declare target
int One(void) __attribute__ ((weak, alias("__One")));
#pragma omp end declare target

Outputs:

Not cuda / openmp target returning false
OpenMP aliasse not target decl returning true
Results in runtime error (expected):

omptarget error: Failed to load image "a provided binary image is malformed" failed to load binary 0x55b530291d60
omptarget fatal error 0: "the plugin backend is in an invalid or unsupported state" failed to load images on device '0'


HIP:

__device__ int __DevTwo(void) { return 2; }
__device__ int DevTwo(void) __attribute__ ((weak, alias("__DevTwo")));

Outputs:

Emitting HIP / CUDA alias on device
HIP / CUDA aliasse not found returning true

int __DevTwo(void) { return 2; }
int DevTwo(void) __attribute__ ((weak, alias("__DevTwo")));

Outputs:

HIP / CUDA aliasse not found returning true
    Emitting HIP / CUDA alias on host

int __DevTwo(void) { return 2; }
__device__ int DevTwo(void) __attribute__ ((weak, alias("__DevTwo")));

Outputs:

HIP / CUDA aliasse not found returning true
Emitting HIP / CUDA alias on host

(compiles with no error, calling aliasee on host works, calling alias on device results in compiler error calling alaisee on host results in compiler error, calling on device results in runtime error [memory access fault])

we could add check to have error here, but not sure if we want to

__device__ int __DevTwo(void) { return 2; }
int DevTwo(void) __attribute__ ((weak, alias("__DevTwo")));

Outputs:

Skipping HIP / CUDA alias emission on device for host only alias
HIP / CUDA aliasse not found returning true
-get compiler error when trying to call alias on device
-able to call aliasee on device
-call alias on host results in segfault at runtime
-call aliasee on host results in compiler error

could add check to have error here, but not sure if we want to

Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nits

@Artem-B CUDA does not support weak on anything but non-kernel functions. Should we handle that in Sema or just let the backend die later.

Comment on lines 4055 to 4063
// Check if the aliasee exists.
if (!CGM.lookupRepresentativeDecl(AA->getAliasee(), AliaseeGD)) {
// If the aliasee is not found, skip the alias emission.
// This is not a hard error as this branch is executed for both the host
// and device, with no respect to where the aliasee is defined.
// For some OpenMP cases (functions) this will return true even if the
// aliasee is not on the device, which is handled by the case below
return true;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No braces around single block, make the comment less verbose and put it above the condition.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 2949485

Comment on lines 4049 to 4050
if (!(LangOpts.OpenMPIsTargetDevice || LangOpts.CUDA))
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!(LangOpts.OpenMPIsTargetDevice || LangOpts.CUDA))
return false;
if (!LangOpts.OpenMPIsTargetDevice && !LangOpts.CUDA))
return false;

Clearer.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 2949485

Comment on lines 4067 to 4072
if (!AliaseeDecl ||
!OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(AliaseeDecl))
// On OpenMP device, skip alias emission if the aliasee is not marked
// with declare target.
return true;
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!AliaseeDecl ||
!OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(AliaseeDecl))
// On OpenMP device, skip alias emission if the aliasee is not marked
// with declare target.
return true;
return false;
return !AliaseeDecl || !OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(AliaseeDecl);

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 2949485

Comment on lines 4080 to 4086
if (LangOpts.CUDAIsDevice) {
if (!HasDeviceAttr || !AliaseeHasDeviceAttr)
// On device, skip alias emission if either the alias or the aliasee
// is not marked with __device__.
return true;
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (LangOpts.CUDAIsDevice) {
if (!HasDeviceAttr || !AliaseeHasDeviceAttr)
// On device, skip alias emission if either the alias or the aliasee
// is not marked with __device__.
return true;
return false;
}
if (LangOpts.CUDAIsDevice)
return !HasDeviceAttr || !AliaseeHasDeviceAttr;

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 2949485

@@ -0,0 +1,24 @@
// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals all --version 6
// REQUIRES: nvptx-registered-target
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// REQUIRES: nvptx-registered-target

This shouldn't be necessary since we're not hitting PTX codegen, here and elsewhere. Also I know these tests are autogenerated, but we don't need to check the attributes. Totally find to trim things from the autogenerated output.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trimmed up the checks, in 2949485, I kept the function definitions as they technically relate to the issue, but I also see an argument to also removing those.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:codegen IR generation bugs: mangling, exceptions, etc. clang:openmp OpenMP related changes to Clang clang Clang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Clang][OpenMP] Weak alias fails on OpenMP targets

4 participants