Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 7 additions & 1 deletion clang/include/clang/Sema/Sema.h
Original file line number Diff line number Diff line change
Expand Up @@ -9178,7 +9178,13 @@ class Sema final : public SemaBase {
};

/// Compute the mangling number context for a lambda expression or
/// block literal. Also return the extra mangling decl if any.
/// block literal that appears in the specified declaration context in
/// consideration of the current expression evaluation and template
/// instantiation contexts. If the mangling context requires external linkage,
/// then a mangling number context is returned in the first tuple
/// element. If the mangling context is non-normal (specialized for
/// lambda and block types relative to other entities), the overriding
/// declaration is returned in the second tuple element.
///
/// \param DC - The DeclContext containing the lambda expression or
/// block literal.
Expand Down
1 change: 1 addition & 0 deletions clang/lib/AST/ASTContext.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -13395,6 +13395,7 @@ MangleNumberingContext &
ASTContext::getManglingNumberContext(const DeclContext *DC) {
assert(LangOpts.CPlusPlus); // We don't need mangling numbers for plain C.
std::unique_ptr<MangleNumberingContext> &MCtx = MangleNumberingContexts[DC];
DC = DC->getPrimaryContext();
if (!MCtx)
MCtx = createMangleNumberingContext();
return *MCtx;
Expand Down
9 changes: 8 additions & 1 deletion clang/lib/Sema/SemaLambda.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -356,7 +356,14 @@ Sema::getCurrentMangleNumberContext(const DeclContext *DC) {
return std::make_tuple(&Context.getManglingNumberContext(DC), nullptr);
}

return std::make_tuple(nullptr, nullptr);
if (ManglingContextDecl) {
// Lambdas defined in the initializer of a local variable are mangled
// in the enclosing function context.
if (isa<VarDecl>(ManglingContextDecl) &&
!cast<VarDecl>(ManglingContextDecl)->hasGlobalStorage())
ManglingContextDecl = nullptr;
}
return std::make_tuple(nullptr, ManglingContextDecl);
}

case NonInlineInModulePurview:
Expand Down
61 changes: 61 additions & 0 deletions clang/test/CodeGen/unnamed-types.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
// RUN: %clang_cc1 -O0 -triple x86_64-unknown-unknown \
// RUN: -emit-llvm %s -o - | FileCheck %s

// RUN: %clang_cc1 -O0 -triple x86_64-pc-windows-msvc \
// RUN: -emit-llvm %s -o - | FileCheck %s --check-prefix=MSVC

namespace QL {
auto dg1 = [] { return 1; };
inline auto dg_inline1 = [] { return 1; };
}

namespace QL {
auto dg2 = [] { return 2; };
template<int N>
auto dg_template = [] { return N; };
}

using namespace QL;
template<typename T>
void f(T t) {
t();
}

void g() {
f(dg1);
f(dg2);
f(dg_inline1);
f(dg_template<3>);
}

// CHECK: @_ZN2QL3dg1E = internal global %class.anon undef, align 1
// CHECK: @_ZN2QL3dg2E = internal global %class.anon.0 undef, align 1
// CHECK: @_ZN2QL10dg_inline1E = linkonce_odr global %class.anon.2 undef, comdat, align 1
// CHECK: @_ZN2QL11dg_templateILi3EEE = linkonce_odr global %class.anon.4 undef, comdat, align 1

// MSVC: @"?dg1@QL@@3V<lambda_0>@1@A" = internal global %class.anon undef, align 1
// MSVC: @"?dg2@QL@@3V<lambda_1>@1@A" = internal global %class.anon.0 undef, align 1
// MSVC: @"?dg_inline1@QL@@3V<lambda_1>@01@A" = linkonce_odr dso_local global %class.anon.2 undef, comdat, align 1
// MSVC: @"??$dg_template@$02@QL@@3V<lambda_1>@01@A" = linkonce_odr dso_local global %class.anon.4 undef, comdat, align 1


// CHECK: define internal void @"_Z1fIN2QL3$_0EEvT_"
// CHECK: call noundef i32 @"_ZNK2QL3$_0clEv"
// CHECK: define internal void @"_Z1fIN2QL3$_1EEvT_"
// CHECK: define linkonce_odr void @_Z1fIN2QL10dg_inline1MUlvE_EEvT_
// CHECK: call noundef i32 @_ZNK2QL10dg_inline1MUlvE_clEv
// CHECK: define linkonce_odr void @_Z1fIN2QL11dg_templateILi3EEMUlvE_EEvT_
// CHECK: call noundef i32 @_ZNK2QL11dg_templateILi3EEMUlvE_clEv
// CHECK: define internal noundef i32 @"_ZNK2QL3$_0clEv"
// CHECK: define internal noundef i32 @"_ZNK2QL3$_1clEv"
// CHECK: define linkonce_odr noundef i32 @_ZNK2QL10dg_inline1MUlvE_clEv
// CHECK: define linkonce_odr noundef i32 @_ZNK2QL11dg_templateILi3EEMUlvE_clEv

// MSVC: define linkonce_odr dso_local void @"??$f@V<lambda_1>@dg_inline1@QL@@@@YAXV<lambda_1>@dg_inline1@QL@@@Z"
// MSVC: call noundef i32 @"??R<lambda_1>@dg_inline1@QL@@QEBA?A?<auto>@@XZ"
// MSVC: define linkonce_odr dso_local void @"??$f@V<lambda_1>@?$dg_template@$02@QL@@@@YAXV<lambda_1>@?$dg_template@$02@QL@@@Z"
// MSVC: call noundef i32 @"??R<lambda_1>@?$dg_template@$02@QL@@QEBA?A?<auto>@@XZ"
// MSVC: define internal noundef i32 @"??R<lambda_0>@QL@@QEBA?A?<auto>@@XZ"
// MSVC: define internal noundef i32 @"??R<lambda_1>@QL@@QEBA?A?<auto>@@XZ"
// MSVC: define linkonce_odr dso_local noundef i32 @"??R<lambda_1>@dg_inline1@QL@@QEBA?A?<auto>@@XZ"
// MSVC: define linkonce_odr dso_local noundef i32 @"??R<lambda_1>@?$dg_template@$02@QL@@QEBA?A?<auto>@@XZ"
4 changes: 2 additions & 2 deletions clang/test/CodeGenCUDA/anon-ns.cu
Original file line number Diff line number Diff line change
Expand Up @@ -26,14 +26,14 @@

// HIP-DAG: define weak_odr {{.*}}void @[[KERN:_ZN12_GLOBAL__N_16kernelEv\.intern\.b04fd23c98500190]](
// HIP-DAG: define weak_odr {{.*}}void @[[KTX:_Z2ktIN12_GLOBAL__N_11XEEvT_\.intern\.b04fd23c98500190]](
// HIP-DAG: define weak_odr {{.*}}void @[[KTL:_Z2ktIN12_GLOBAL__N_1UlvE_EEvT_\.intern\.b04fd23c98500190]](
// HIP-DAG: define weak_odr {{.*}}void @[[KTL:_Z2ktIN12_GLOBAL__N_16lambdaMUlvE_EEvT_.intern.b04fd23c98500190]](
// HIP-DAG: @[[VM:_ZN12_GLOBAL__N_12vmE\.static\.b04fd23c98500190]] = addrspace(1) externally_initialized global
// HIP-DAG: @[[VC:_ZN12_GLOBAL__N_12vcE\.static\.b04fd23c98500190]] = addrspace(4) externally_initialized constant
// HIP-DAG: @[[VT:_Z2vtIN12_GLOBAL__N_11XEE\.static\.b04fd23c98500190]] = addrspace(1) externally_initialized global

// CUDA-DAG: define weak_odr {{.*}}void @[[KERN:_ZN12_GLOBAL__N_16kernelEv__intern__b04fd23c98500190]](
// CUDA-DAG: define weak_odr {{.*}}void @[[KTX:_Z2ktIN12_GLOBAL__N_11XEEvT___intern__b04fd23c98500190]](
// CUDA-DAG: define weak_odr {{.*}}void @[[KTL:_Z2ktIN12_GLOBAL__N_1UlvE_EEvT___intern__b04fd23c98500190]](
// CUDA-DAG: define weak_odr {{.*}}void @[[KTL:_Z2ktIN12_GLOBAL__N_16lambdaMUlvE_EEvT___intern__b04fd23c98500190]](
// CUDA-DAG: @[[VC:_ZN12_GLOBAL__N_12vcE__static__b04fd23c98500190]] = addrspace(4) externally_initialized constant
// CUDA-DAG: @[[VT:_Z2vtIN12_GLOBAL__N_11XEE__static__b04fd23c98500190]] = addrspace(1) externally_initialized global

Expand Down
70 changes: 64 additions & 6 deletions clang/test/CodeGenCUDA/unnamed-types.cu
Original file line number Diff line number Diff line change
@@ -1,14 +1,22 @@
// RUN: %clang_cc1 -std=c++11 -x hip -triple x86_64-linux-gnu -aux-triple amdgcn-amd-amdhsa -emit-llvm %s -o - | FileCheck %s --check-prefix=HOST
// RUN: %clang_cc1 -std=c++11 -x hip -triple x86_64-pc-windows-msvc -aux-triple amdgcn-amd-amdhsa -emit-llvm %s -o - | FileCheck %s --check-prefix=MSVC
// RUN: %clang_cc1 -std=c++11 -x hip -triple amdgcn-amd-amdhsa -fcuda-is-device -emit-llvm %s -o - | FileCheck %s --check-prefix=DEVICE
// RUN: %clang_cc1 -std=c++17 -x hip -triple x86_64-linux-gnu -aux-triple amdgcn-amd-amdhsa -emit-llvm %s -o - | FileCheck %s --check-prefix=HOST
// RUN: %clang_cc1 -std=c++17 -x hip -triple x86_64-pc-windows-msvc -aux-triple amdgcn-amd-amdhsa -emit-llvm %s -o - | FileCheck %s --check-prefix=MSVC
// RUN: %clang_cc1 -std=c++17 -x hip -triple amdgcn-amd-amdhsa -fcuda-is-device -emit-llvm %s -o - | FileCheck %s --check-prefix=DEVICE

#include "Inputs/cuda.h"

// HOST: @0 = private unnamed_addr constant [43 x i8] c"_Z2k0IZZ2f1PfENKUlS0_E_clES0_EUlfE_EvS0_T_\00", align 1
// HOST: @1 = private unnamed_addr constant [60 x i8] c"_Z2k1IZ2f1PfEUlfE_Z2f1S0_EUlffE_Z2f1S0_EUlfE0_EvS0_T_T0_T1_\00", align 1
// HOST: @2 = private unnamed_addr constant [25 x i8] c"_Z1fIN2QL3dg1MUlvE_EEvT_\00", align 1
// HOST: @3 = private unnamed_addr constant [25 x i8] c"_Z1fIN2QL3dg2MUlvE_EEvT_\00", align 1
// HOST: @4 = private unnamed_addr constant [33 x i8] c"_Z1fIN2QL10dg_inline1MUlvE_EEvT_\00", align 1
// HOST: @5 = private unnamed_addr constant [40 x i8] c"_Z1fIN2QL11dg_templateILi3EEMUlvE_EEvT_\00", align 1
// Check that, on MSVC, the same device kernel mangling name is generated.
// MSVC: @0 = private unnamed_addr constant [43 x i8] c"_Z2k0IZZ2f1PfENKUlS0_E_clES0_EUlfE_EvS0_T_\00", align 1
// MSVC: @1 = private unnamed_addr constant [60 x i8] c"_Z2k1IZ2f1PfEUlfE_Z2f1S0_EUlffE_Z2f1S0_EUlfE0_EvS0_T_T0_T1_\00", align 1
// MSVC: @2 = private unnamed_addr constant [25 x i8] c"_Z1fIN2QL3dg1MUlvE_EEvT_\00", align 1
// MSVC: @3 = private unnamed_addr constant [25 x i8] c"_Z1fIN2QL3dg2MUlvE_EEvT_\00", align 1
// MSVC: @4 = private unnamed_addr constant [33 x i8] c"_Z1fIN2QL10dg_inline1MUlvE_EEvT_\00", align 1
// MSVC: @5 = private unnamed_addr constant [40 x i8] c"_Z1fIN2QL11dg_templateILi3EEMUlvE_EEvT_\00", align 1

__device__ float d0(float x) {
return [](float x) { return x + 1.f; }(x);
Expand All @@ -20,13 +28,13 @@ __device__ float d1(float x) {

// DEVICE: amdgpu_kernel void @_Z2k0IZZ2f1PfENKUlS0_E_clES0_EUlfE_EvS0_T_(
// DEVICE: define internal noundef float @_ZZZ2f1PfENKUlS_E_clES_ENKUlfE_clEf(
// DEVICE: define amdgpu_kernel void @_Z2k1IZ2f1PfEUlfE_Z2f1S0_EUlffE_Z2f1S0_EUlfE0_EvS0_T_T0_T1_(
// DEVICE: define internal noundef float @_ZZ2f1PfENKUlfE_clEf(
template <typename F>
__global__ void k0(float *p, F f) {
p[0] = f(p[0]) + d0(p[1]) + d1(p[2]);
}

// DEVICE: amdgpu_kernel void @_Z2k1IZ2f1PfEUlfE_Z2f1S0_EUlffE_Z2f1S0_EUlfE0_EvS0_T_T0_T1_(
// DEVICE: define internal noundef float @_ZZ2f1PfENKUlfE_clEf(
// DEVICE: define internal noundef float @_ZZ2f1PfENKUlffE_clEff(
// DEVICE: define internal noundef float @_ZZ2f1PfENKUlfE0_clEf(
template <typename F0, typename F1, typename F2>
Expand All @@ -53,8 +61,58 @@ void f1(float *p) {
[] __device__ (float x, float y) { return x * y; },
[] __device__ (float x) { return x + 5.f; });
}
// HOST: define internal void @_Z16__device_stub__fIN2QL3dg1MUlvE_EEvT_(
// HOST: define internal void @_Z16__device_stub__fIN2QL3dg2MUlvE_EEvT_(
// HOST: define linkonce_odr void @_Z16__device_stub__fIN2QL10dg_inline1MUlvE_EEvT_
// HOST: define internal void @_Z17__device_stub__k0IZZ2f1PfENKUlS0_E_clES0_EUlfE_EvS0_T_
// HOST: define linkonce_odr void @_ZN4dim3C2Ejjj

// HOST: @__hip_register_globals
// HOST: __hipRegisterFunction{{.*}}@_Z2k0IZZ2f1PfENKUlS0_E_clES0_EUlfE_EvS0_T_{{.*}}@0
// HOST: __hipRegisterFunction{{.*}}@_Z2k1IZ2f1PfEUlfE_Z2f1S0_EUlffE_Z2f1S0_EUlfE0_EvS0_T_T0_T1_{{.*}}@1
// MSVC: __hipRegisterFunction{{.*}}@"??$k0@V<lambda_1>@?0???R1?0??f1@@YAXPEAM@Z@QEBA@0@Z@@@YAXPEAMV<lambda_1>@?0???R0?0??f1@@YAX0@Z@QEBA@0@Z@@Z{{.*}}@0
// HOST: __hipRegisterFunction{{.*}}@_Z1fIN2QL3dg1MUlvE_EEvT_{{.*}}@2
// HOST: __hipRegisterFunction{{.*}}_Z1fIN2QL3dg2MUlvE_EEvT_{{.*}}@3
// HOST: __hipRegisterFunction{{.*}}_Z1fIN2QL10dg_inline1MUlvE_EEvT_{{.*}}@4
// HOST: __hipRegisterFunction{{.*}}_Z1fIN2QL11dg_templateILi3EEMUlvE_EEvT_{{.*}}@5

// MSVC: __hipRegisterFunction{{.*}}@"??$k0@V<lambda_1>@?0???R1?0??f1@@YAXPEAM@Z@QEBA?A?<auto>@@0@Z@@@YAXPEAMV<lambda_1>@?0???R0?0??f1@@YAX0@Z@QEBA?A?<auto>@@0@Z@@Z{{.*}}@0
// MSVC: __hipRegisterFunction{{.*}}@"??$k1@V<lambda_2>@?0??f1@@YAXPEAM@Z@V<lambda_3>@?0??2@YAX0@Z@V<lambda_4>@?0??2@YAX0@Z@@@YAXPEAMV<lambda_2>@?0??f1@@YAX0@Z@V<lambda_3>@?0??1@YAX0@Z@V<lambda_4>@?0??1@YAX0@Z@@Z{{.*}}@1
// MSVC: __hipRegisterFunction{{.*}}@"??$f@V<lambda_1>@dg1@QL@@@@YAXV<lambda_1>@dg1@QL@@@Z"{{.*}}@2
// MSVC: __hipRegisterFunction{{.*}}@"??$f@V<lambda_1>@dg2@QL@@@@YAXV<lambda_1>@dg2@QL@@@Z"{{.*}}@3
// MSVC: __hipRegisterFunction{{.*}}@"??$f@V<lambda_1>@dg_inline1@QL@@@@YAXV<lambda_1>@dg_inline1@QL@@@Z"{{.*}}@4
// MSVC: __hipRegisterFunction{{.*}}@"??$f@V<lambda_1>@?$dg_template@$02@QL@@@@YAXV<lambda_1>@?$dg_template@$02@QL@@@Z"{{.*}}@5


// DEVICE: define amdgpu_kernel void @_Z1fIN2QL3dg1MUlvE_EEvT_(
// DEVICE: call noundef i32 @_ZNK2QL3dg1MUlvE_clEv(
// DEVICE: define internal noundef i32 @_ZNK2QL3dg1MUlvE_clEv(
// DEVICE define amdgpu_kernel void @_Z1fIN2QL3dg2MUlvE_EEvT_(
// DEVICE: call noundef i32 @_ZNK2QL3dg2MUlvE_clEv(
// DEVICE: define internal noundef i32 @_ZNK2QL3dg2MUlvE_clEv
// DEVICE: define amdgpu_kernel void @_Z1fIN2QL10dg_inline1MUlvE_EEvT_
// DEVICE: call noundef i32 @_ZNK2QL10dg_inline1MUlvE_clEv
// DEVICE: define linkonce_odr noundef i32 @_ZNK2QL10dg_inline1MUlvE_clEv
// DEVICE: define amdgpu_kernel void @_Z1fIN2QL11dg_templateILi3EEMUlvE_EEvT_
// DEVICE: call noundef i32 @_ZNK2QL11dg_templateILi3EEMUlvE_clEv
// DEVICE: define linkonce_odr noundef i32 @_ZNK2QL11dg_templateILi3EEMUlvE_clEv

namespace QL {
auto dg1 = [] { return 1; };
inline auto dg_inline1 = [] { return 1; };
}
namespace QL {
auto dg2 = [] { return 2; };
template<int N>
auto dg_template = [] { return N; };
}
using namespace QL;
template<typename T>
__global__ void f(T t) {
t();
}
void g() {
f<<<1,1>>>(dg1);
f<<<1,1>>>(dg2);
f<<<1,1>>>(dg_inline1);
f<<<1,1>>>(dg_template<3>);
}
27 changes: 22 additions & 5 deletions clang/test/CodeGenSYCL/unique_stable_name_windows_diff.cpp
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
// RUN: %clang_cc1 -triple spir64-unknown-unknown -aux-triple x86_64-pc-windows-msvc -fsycl-is-device -disable-llvm-passes -emit-llvm %s -o - | FileCheck %s '-D$ADDRSPACE=addrspace(1) '
// RUN: %clang_cc1 -triple x86_64-pc-windows-msvc -fsycl-is-host -disable-llvm-passes -emit-llvm %s -o - | FileCheck %s '-D$ADDRSPACE='
// RUN: %clang_cc1 -triple spir64-unknown-unknown -aux-triple x86_64-pc-windows-msvc -fsycl-is-device -disable-llvm-passes -emit-llvm %s -o - | FileCheck %s --check-prefix=DEVICE
// RUN: %clang_cc1 -triple x86_64-pc-windows-msvc -fsycl-is-host -disable-llvm-passes -emit-llvm %s -o - | FileCheck %s --check-prefix=HOST


template<typename KN, typename Func>
Expand Down Expand Up @@ -47,7 +47,24 @@ int main() {
// Make sure the following 3 are the same between the host and device compile.
// Note that these are NOT the same value as each other, they differ by the
// signature.
// CHECK: private unnamed_addr [[$ADDRSPACE]]constant [17 x i8] c"_ZTSZ4mainEUlvE_\00"
// CHECK: private unnamed_addr [[$ADDRSPACE]]constant [17 x i8] c"_ZTSZ4mainEUliE_\00"
// CHECK: private unnamed_addr [[$ADDRSPACE]]constant [17 x i8] c"_ZTSZ4mainEUldE_\00"
// HOST: private unnamed_addr constant [17 x i8] c"_ZTSZ4mainEUlvE_\00"
// HOST: private unnamed_addr constant [17 x i8] c"_ZTSZ4mainEUliE_\00"
// HOST: private unnamed_addr constant [17 x i8] c"_ZTSZ4mainEUldE_\00"

// DEVICE: private unnamed_addr addrspace(1) constant [17 x i8] c"_ZTSZ4mainEUlvE_\00"
// DEVICE: private unnamed_addr addrspace(1) constant [17 x i8] c"_ZTSZ4mainEUliE_\00"
// DEVICE: private unnamed_addr addrspace(1) constant [17 x i8] c"_ZTSZ4mainEUldE_\00"

// DEVICE: define dso_local spir_kernel void @_ZTSZ4mainE2K1
// DEVICE: call spir_func void @_ZZ4mainENKUlvE_clEv
// DEVICE: define internal spir_func void @_ZZ4mainENKUlvE_clEv
// DEVICE: define dso_local spir_kernel void @_ZTSZ4mainE2K2
// DEVICE: call spir_func void @_ZZ4mainENKUliE_clEi
// DEVICE: define internal spir_func void @_ZZ4mainENKUliE_clEi
// DEVICE: define dso_local spir_kernel void @_ZTSZ4mainE2K3
// DEVICE: call spir_func void @_ZZ4mainENKUldE_clEd
// DEVICE: define internal spir_func void @_ZZ4mainENKUldE_clEd
// DEVICE: define dso_local spir_kernel void @_ZTSZ4mainE2K4
// DEVICE: call spir_func void @_ZZ4mainENKUlvE0_clEv
// DEVICE: define internal spir_func void @_ZZ4mainENKUlvE0_clEv
}
74 changes: 74 additions & 0 deletions clang/test/CodeGenSYCL/unnamed-types.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
// RUN: %clang_cc1 -fsycl-is-device -O0 -triple spirv64-unknown-unknown \
// RUN: -emit-llvm %s -o - | FileCheck %s --check-prefix=DEVICE

// RUN: %clang_cc1 -fsycl-is-host -O0 -triple spirv64-unknown-unknown \
// RUN: -emit-llvm %s -o - | FileCheck %s --check-prefix=HOST

// RUN: %clang_cc1 -fsycl-is-device -emit-llvm \
// RUN: -aux-triple x86_64-pc-windows-msvc -triple spir-unknown--unknown \
// RUN: %s -o - | FileCheck %s --check-prefix=MSVC

namespace QL {
auto dg1 = [] { return 1; };
inline auto dg_inline1 = [] { return 1; };
}

namespace QL {
auto dg2 = [] { return 2; };
template<int N>
auto dg_template = [] { return N; };
}

using namespace QL;
template<typename T>
[[clang::sycl_kernel_entry_point(T)]] void f(T t) {
t();
}

void g() {
f(dg1);
f(dg2);
f(dg_inline1);
f(dg_template<3>);
}

// HOST: @_ZN2QL3dg1E = internal global %class.anon undef, align 1
// HOST: @_ZN2QL3dg2E = internal global %class.anon.0 undef, align 1
// HOST: @_ZN2QL10dg_inline1E = linkonce_odr global %class.anon.2 undef, comdat, align 1
// HOST: @_ZN2QL11dg_templateILi3EEE = linkonce_odr global %class.anon.4 undef, comdat, align 1

// DEVICE: define spir_kernel void @_ZTSN2QL3dg1MUlvE_E
// DEVICE: call spir_func noundef i32 @_ZNK2QL3dg1MUlvE_clEv
// DEVICE: define internal spir_func noundef i32 @_ZNK2QL3dg1MUlvE_clEv
// DEVICE: define spir_kernel void @_ZTSN2QL3dg2MUlvE_E
// DEVICE: call spir_func noundef i32 @_ZNK2QL3dg2MUlvE_clEv
// DEVICE: define internal spir_func noundef i32 @_ZNK2QL3dg2MUlvE_clEv
// DEVICE: define spir_kernel void @_ZTSN2QL10dg_inline1MUlvE_E
// DEVICE: call spir_func noundef i32 @_ZNK2QL10dg_inline1MUlvE_clEv
// DEVICE: define linkonce_odr spir_func noundef i32 @_ZNK2QL10dg_inline1MUlvE_clEv
// DEVICE: define spir_kernel void @_ZTSN2QL11dg_templateILi3EEMUlvE_E
// DEVICE: call spir_func noundef i32 @_ZNK2QL11dg_templateILi3EEMUlvE_clEv
// DEVICE: define linkonce_odr spir_func noundef i32 @_ZNK2QL11dg_templateILi3EEMUlvE_clEv

// HOST: define spir_func void @_Z1gv
// HOST: call spir_func void @_Z1fIN2QL3dg1MUlvE_EEvT_
// HOST: call spir_func void @_Z1fIN2QL3dg2MUlvE_EEvT_
// HOST: call spir_func void @_Z1fIN2QL10dg_inline1MUlvE_EEvT_
// HOST: call spir_func void @_Z1fIN2QL11dg_templateILi3EEMUlvE_EEvT_
// HOST: define internal spir_func void @_Z1fIN2QL3dg1MUlvE_EEvT
// HOST: define internal spir_func void @_Z1fIN2QL3dg2MUlvE_EEvT_
// HOST: define linkonce_odr spir_func void @_Z1fIN2QL10dg_inline1MUlvE_EEvT_
// HOST: define linkonce_odr spir_func void @_Z1fIN2QL11dg_templateILi3EEMUlvE_EEvT_

// MSVC: define dso_local spir_kernel void @_ZTSN2QL3dg1MUlvE_E
// MSVC: call spir_func noundef i32 @_ZNK2QL3dg1MUlvE_clEv
// MSVC: define internal spir_func noundef i32 @_ZNK2QL3dg1MUlvE_clEv
// MSVC: define dso_local spir_kernel void @_ZTSN2QL3dg2MUlvE_E
// MSVC: call spir_func noundef i32 @_ZNK2QL3dg2MUlvE_clEv
// MSVC: define internal spir_func noundef i32 @_ZNK2QL3dg2MUlvE_clEv
// MSVC: define dso_local spir_kernel void @_ZTSN2QL10dg_inline1MUlvE_E
// MSVC: call spir_func noundef i32 @_ZNK2QL10dg_inline1MUlvE_clEv
// MSVC: define linkonce_odr spir_func noundef i32 @_ZNK2QL10dg_inline1MUlvE_clEv
// MSVC: define dso_local spir_kernel void @_ZTSN2QL11dg_templateILi3EEMUlvE_E
// MSVC: call spir_func noundef i32 @_ZNK2QL11dg_templateILi3EEMUlvE_clEv
// MSVC: define linkonce_odr spir_func noundef i32 @_ZNK2QL11dg_templateILi3EEMUlvE_clEv
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been looking at mangled names generated by GCC and Clang in order to convince myself that these changes won't cause other problems. The good news is that, I think, these changes are a strict improvement over the status quo. The bad news is, there appear to be a lot of problems with mangling of lambdas in Clang (and a few in GCC). I don't have a comprehensive set of cases to review, but I think the following case provides a good starting point for discussion.

Tagging @zygoloid and @rjmccall for any thoughts they have.

In the following, the "SYCL" annotation means Clang with CUDA, HIP, or SYCL enabled, it doesn't matter which. I chose SYCL because that is what I'm using in the examples. The annotated mangled names correspond to the generated call operator for the lambda expression that appears in each example.

https://godbolt.org/z/j7roYMsEv

extern int x;

template<int (*FP)()>
auto vt = FP;

namespace ns {

// GCC:   _ZNK2nsUlvE0_clEv
// Clang: _ZNK2nsUlvE_clEv
// SYCL:  _ZNK2nsUlvE0_clEv
template<int (*FP)() = [] { return x+1; }>
inline auto v1 = FP;
int usev1() { return v1<>(); }

// GCC:   _ZNK2ns2v2MUlvE_clEv
// Clang: _ZNK2ns3$_0clEv
// SYCL:  _ZNK2nsUlvE1_clEv
inline auto& v2 = vt<[] { return x+2; }>;
int usev2() { return v2(); }

// GCC:   _ZNK2nsUlvE2_clEv
// Clang: _ZNK2nsUlvE0_clEv
// SYCL:  _ZNK2nsUlvE3_clEv
template<int (*FP)() = [] { return x+3; }>
inline auto v3 = FP;
int usev3() { return v3<>(); }

} // ns namespace

A few observations:

  • GCC appears to allocate a discriminator for lambda expressions in dependent contexts (even when they are not dependent). Clang presumably does as well, but only when an external mangling is being used. This explains the mangling differences for v1 above.
  • Neither GCC nor Clang incorporate the variable declaration context (see <closure-prefix>) in the mangled names for lambdas that appear in default arguments of template parameters. I think they should. The Itanium C++ ABI doesn't appear to address this possibility, at least not in 5.1.8, "Closure Types (Lambdas)".
  • v2 is the case that is particularly interesting for this PR. GCC includes a <closure-prefix>, Clang generates a private name, and Clang+SYCL generates an external name without a <closure-prefix>. The latter consumes a discriminator from the enclosing namespace and therefore affects the names produced for v3.
  • GCC and Clang+SYCL agree on the mangling for the v1 case. Observable effects from the v2 case lead them to disagree for the v3 case (if the v2 case is removed, they do agree on v3).

The current changes in this PR have no effect on this example. Ideally, additional changes would be made to include a <closure-prefix> in the mangling for the v2 case.

The question for this PR is, how far should we go in addressing this and other cases? I have, or will have, additional test cases, but it might be best to file new issues for those and follow up on them separately. I suspect some of the other examples I have might warrant new issues for the Itanium C++ ABI to address.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • GCC appears to allocate a discriminator for lambda expressions in dependent contexts (even when they are not dependent). Clang presumably does as well, but only when an external mangling is being used. This explains the mangling differences for v1 above.

That lambda should have internal linkage (the relevant part of the ODR does not apply to lambdas in default template arguments), so that seems fine. Except that Clang fails to actually give the lambda internal linkage, which looks like a bug to me.

  • Neither GCC nor Clang incorporate the variable declaration context (see <closure-prefix>) in the mangled names for lambdas that appear in default arguments of template parameters.

That's correct. Lambdas in default template arguments are not "owned" by the template and so shouldn't allocate mangling numbers within it. Note that their existence can differ across redeclarations in different TUs, so numbering them would mean the same entity could get different manglings in different TUs.

  • v2 is the case that is particularly interesting for this PR. GCC includes a <closure-prefix>, Clang generates a private name, and Clang+SYCL generates an external name without a <closure-prefix>. The latter consumes a discriminator from the enclosing namespace and therefore affects the names produced for v3.

We'd previously agreed (on a different github issue, I could probably dig it up if needed) that we want to stop using the $n mangling for lambdas in Clang entirely, and always produce manglings that actually match the ABI, can be demangled, etc. We should do that -- which hopefully would mean that there's no divergence between the SYCL and non-SYCL manglings after this PR, at least for ABI-mandated manglings.

But separately from that, in the v2 case, the lambda should be mangled with a <closure-prefix> naming the inline variable v2. Looks like GCC gets this right, and Clang gets it wrong in both modes.

The question for this PR is, how far should we go in addressing this and other cases? I have, or will have, additional test cases, but it might be best to file new issues for those and follow up on them separately. I suspect some of the other examples I have might warrant new issues for the Itanium C++ ABI to address.

What new requirements does SYCL have compared to C++ here? It looks like at a minimum we effectively want to treat all entities as visible-across-TUs for lambda mangling purposes, but as I noted above, we want to do that anyway. And those manglings should presumably exactly match the Itanium ones.

But it sounds like there's more going on here: for lambdas that don't have a defined mangling at all under the ABI (where there's no context declaration, such as lambdas appearing in default arguments of functions or templates, or other exotic places that otherwise aren't within the scope of some lambda mangling context), SYCL needs a mangling scheme. How stable does that need to be? Would it be OK to number them within the translation unit, or within the namespace, as Clang happens to do at the moment? Or do you need a scheme that attempts to associate the lambdas with the declaration they end up being default arguments for, in some way?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @zygoloid!

That lambda should have internal linkage (the relevant part of the ODR does not apply to lambdas in default template arguments), so that seems fine. Except that Clang fails to actually give the lambda internal linkage, which looks like a bug to me.

I see, ok, we can follow up on that.

That's correct. Lambdas in default template arguments are not "owned" by the template and so shouldn't allocate mangling numbers within it. Note that their existence can differ across redeclarations in different TUs, so numbering them would mean the same entity could get different manglings in different TUs.

Thank you! This is a very helpful insight that I was missing and failed to pick up from the standard wording.

Assuming I'm reading things correctly, [basic.def.odr]p(16.11) also makes it clear that uses of such default template arguments result in distinct closure types in the contexts where those uses occur.

The example in [basic.def.odr]p19 makes it clear that correspondence across TUs is required for default arguments of function parameters in member functions. The preceding text appears to treat default template arguments and function default arguments the same, so I presume the example is applicable to default template arguments as well. I think the example is more subtle than the description suggests though. Where it states, "The definition of X can appear in multiple translation units of a valid program; the lambda-expressions defined within the default argument of X​::​h within the definition of X denote the same closure type in each translation unit.", I think this is referring only to the use of that lambda within the body of X::h as opposed to other hypothetical uses elsewhere outside the class definition. Am I reading that correctly?

Back to the ABI, my take away from this is that discriminators allocated from namespace and class contexts need not correspond across TUs; when correspondence is required, discriminators are allocated from more local declaration contexts (except in cases like those reported in itanium-cxx-abi/cxx-abi#165). If so, then it seems I need not worry (too much) about discriminators being allocated and unused or about the order in which they are allocated in those contexts.

Another example to check what I'm learning; https://godbolt.org/z/M4fY4e3Px:

extern int x;

struct S {
  template<auto F = [] { return x; }>
  static inline auto vt = F;

  // GCC:   _ZNK1SUlvE0_clEv
  // Clang: _ZNK1SUlvE_clEv
  // SYCL:  _ZNK1SUlvE0_clEv
  static inline auto sdm = vt<>;
};

// GCC:   _ZZ6usevt0vENKUlvE1_clEv
// Clang: _ZNK1SUlvE0_clEv
// SYCL:  _ZNK1SUlvE1_clEv
inline int usevt0() { return S::vt<>(); }

// GCC:   _ZZ6usevt1vENKUlvE2_clEv
// Clang: _ZNK1SUlvE1_clEv
// SYCL:  _ZNK1SUlvE2_clEv
inline int usevt1() { return S::vt<>(); }

int f() {
  return usevt0() + usevt1() + S::sdm();
}

Per your reference to [basic.def.odr]p18, if usevt0() and/or usevt1() are defined in multiple TUs, the program is IFNDR because the uses of the lambda in the default template argument do not correspond. However, the same use in the initializer of S::sdm does correspond because S::sdm is defined within the definition of S. From an ABI perspective, the sdm case is the only one that matters, but a stable name isn't produced by Clang or GCC.

GCC includes a <closure-prefix> corresponding to the point of use of the lambda for usevt0() and usevt1(), which is good. However, I would have expected _ZZ6usevt0vENKUlvE_clEv and _ZZ6usevt1vENKUlvE_clEv for those cases; it looks like discriminators might be being allocated from the class context, but the function context is used for naming. I think the use in the initializer of S::sdm should result in a name with a <closure-prefix> that incorporates sdm too.

For Clang, the situation looks much like the last case. Hmm, I think I'm rediscovering #143218.

We'd previously agreed (on a different github issue, I could probably dig it up if needed) that we want to stop using the $n mangling for lambdas in Clang entirely, and always produce manglings that actually match the ABI, can be demangled, etc. We should do that -- which hopefully would mean that there's no divergence between the SYCL and non-SYCL manglings after this PR, at least for ABI-mandated manglings.

That would be great.

But separately from that, in the v2 case, the lambda should be mangled with a <closure-prefix> naming the inline variable v2. Looks like GCC gets this right, and Clang gets it wrong in both modes.

Ok, we can follow up on that.

What new requirements does SYCL have compared to C++ here? It looks like at a minimum we effectively want to treat all entities as visible-across-TUs for lambda mangling purposes, but as I noted above, we want to do that anyway. And those manglings should presumably exactly match the Itanium ones.

That is correct. The extra requirements for CUDA/HIP/SYCL stem from the need for types used in kernel invocations to correspond across the host target TU and each of the device target TUs, even for types that don't have external linkage. SYCL issue 454 seeks to clarify what kinds of heroics are required; at least in the face of preprocessor shenanigans. For SYCL, the correspondence concerns are limited to the type used to name the kernel, the lambda/class type used to define the kernel, and the types of its captures/data members. For CUDA/HIP, I think the correspondence concerns are limited to the types used as kernel arguments. I can provide examples if it would be helpful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Clang, the situation looks much like the last case. Hmm, I think I'm rediscovering #143218.

Oops, I wrote that while playing with a different test case and then switched to a different test case for the comment.

I think the desired behavior is to use the declaration context for the point of use in each of these cases with a <closure-prefix>. This should yield the names _ZNK1S3sdmMUlvE_clEv, _ZZ6usevt0vENKUlvE_clEv, and _ZZ6usevt1vENKUlvE_clEv respectively.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming I'm reading things correctly, [basic.def.odr]p(16.11) also makes it clear that uses of such default template arguments result in distinct closure types in the contexts where those uses occur.

Yes. I suspect we probably don't get that right just yet, and there's some ABI subtleties here that I think the Itanium ABI doesn't precisely describe yet (but fortunately there's not a lot of choice unless an implementation gets "clever"). For example:

template<auto a = [] { static int n; return &n; }, auto b = [] { static int n; return &n; }>
int *f(bool x) { return x ? a() : b(); }
inline int *g(bool x) { return f(x); }

Here, the two lambdas should be mangled within the context of function g, and the order in which they're instantiated (and when that happens relative to other lambdas that get numbered in g) is ABI-relevant. I think the obvious thing here is that they get numbered immediately after the function call is processed (at the point where overload resolution is performed) -- so we'd number lambdas within the argument to x first for example -- and that the default template arguments are instantiated left-to-right (which is mostly forced by the possibility of a later default template argument using an earlier template parameter). It's also ABI-relevant whether we instantiate default template arguments before or after default function arguments, but the language rules strongly imply we have to do default template arguments first, so there's not much of a choice there.

The example in [basic.def.odr]p19 makes it clear that correspondence across TUs is required for default arguments of function parameters in member functions. The preceding text appears to treat default template arguments and function default arguments the same, so I presume the example is applicable to default template arguments as well. I think the example is more subtle than the description suggests though. Where it states, "The definition of X can appear in multiple translation units of a valid program; the lambda-expressions defined within the default argument of X​::​h within the definition of X denote the same closure type in each translation unit.", I think this is referring only to the use of that lambda within the body of X::h as opposed to other hypothetical uses elsewhere outside the class definition. Am I reading that correctly?

Broadly, the rule is that the whole program must behave as if the class definition is only defined in one place, and somehow the textual repeats of it in other places act as if they were importing / referencing that one definition rather than adding another definition. So the behavior must be as if there is only one lambda-expression in the whole program -- regardless of how you reach the closure type, it's the same type. In practice this means we need to give it external (linkonce_odr) linkage and a mangling :)

So, for example, if we had:

struct X {
  static auto f(int* (*p)() = [] { static int n; return &n; }) { return p; }
};

... then a call to X::f()() in different translation units must return a pointer to the same int. (A call to X::f() in different translation units doesn't actually have to return the same pointer, not because it's allowed to be a different lambda, but because the conversion from lambda to function pointer isn't required to return the same pointer each time it's called, just like a string literal isn't required to always evaluate to the same pointer.)

Back to the ABI, my take away from this is that discriminators allocated from namespace and class contexts need not correspond across TUs; when correspondence is required, discriminators are allocated from more local declaration contexts (except in cases like those reported in itanium-cxx-abi/cxx-abi#165).

For namespace contexts: yes. The Itanium ABI never allocates discriminators for namespace context, and there's no way they could align across translation units anyway. Lambdas at namespace context should never be emitted with external linkage -- except that maybe SYCL wants some special rule here for things that are notionally internal but still need to link between host and device.

For class contexts, we need cross-TU correspondence. (@rjmccall and I had a discussion a while back about trying to use a context more specific than the class where possible, to avoid the ABI relying on declaration order within a class definition, but as I recall we didn't come up with anything that we liked substantially more than just numbering lexically within the class for the long tail of cases where the lambda isn't within, say, a function declaration.)

Another example to check what I'm learning; https://godbolt.org/z/M4fY4e3Px:

extern int x;

struct S {
  template<auto F = [] { return x; }>
  static inline auto vt = F;

  // GCC:   _ZNK1SUlvE0_clEv
  // Clang: _ZNK1SUlvE_clEv
  // SYCL:  _ZNK1SUlvE0_clEv
  static inline auto sdm = vt<>;
};

Looks like everyone gets this wrong.

Each template-id is supposed to perform a fresh substitution into the default template arguments, meaning that sdm has its own lambda distinct from any other use of vt. The lambda instantiated for the default argument should have sdm as its context declaration, to satisfy [basic.def.odr]/16.11 and the ABI rule: "If the context of a closure type is an initializer for a class member (static or nonstatic), inline variable, or variable template, it is encoded in a qualified name"

// GCC:   _ZZ6usevt0vENKUlvE1_clEv
// Clang: _ZNK1SUlvE0_clEv
// SYCL:  _ZNK1SUlvE1_clEv
inline int usevt0() { return S::vt<>(); }

I think GCC is getting this one approximately right. The context declaration here should be usevt0, because the instantiated lambda is distinct from the one produced for any other use of S::vt<>.

// GCC:   _ZZ6usevt1vENKUlvE2_clEv
// Clang: _ZNK1SUlvE1_clEv
// SYCL:  _ZNK1SUlvE2_clEv
inline int usevt1() { return S::vt<>(); }

The discriminator numbering used by GCC is obviously wrong, though. (The ...E1_... vs ...E2_....) Reordering the definitions of usevt1 and usevt2 causes them to swap which gets the number 1 and which gets 2! I share your suspicion that GCC might be numbering within the class still, but using the function when mangling. The lambdas here should have no discriminator because they're each the first lambda within their context declaration with that lambda-sig.

Per your reference to [basic.def.odr]p18, if usevt0() and/or usevt1() are defined in multiple TUs, the program is IFNDR because the uses of the lambda in the default template argument do not correspond.

I don't think that's right, but I'm not entirely certain, and I think there might be a wording issue here. The ODR says the behavior is as if the token sequences of the lambdas appeared within the inline function, and the lambdas would be equivalent under those terms.

However, it also says in /18 that we should be "excluding entities defined within default arguments or default template arguments of either D or an entity not defined within D". But what does that mean for entities formed by instantiating a default template argument, especially one where the templated lambda appears within a class (or similar entity where we'd "normally" deduplicate the lambda across translation units? When the lambda is formed by instantiating a default [template] argument, we in general need for it to have the place where instantiation was triggered as its context declaration, and it seems fine for lambdas to be given an "identity" on that basis. I think the most reasonable and certainly the most useful interpretation is that the ODR does apply in those cases still, and therefore we need to give these instantiated lambdas a mangling in the context in which the instantiation happens. That'd mean:

// ODR violation if this appears across TUs, each g would have a different lambda
void f(void (*arg)() = []{});
inline void g() { f(); }

// OK! Lambda is instantiated in the call to `f()`, so `g()` has its own lambda, and
// it's the same one each time, so no ODR violation.
template<typename = void>
void f(void (*arg)() = []{});
inline void g() { f(); }

We probably need some CWG feedback on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @zygoloid! It is likely going to be at least a few weeks before I'm able to get back to this. When I do, I'll add more tests so that we have explicit examples in source files to document and iterate on.