-
Notifications
You must be signed in to change notification settings - Fork 15k
[Clang] Use TargetInfo when deciding if an address space is compatible #115777
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-clang-driver Author: Joseph Huber (jhuber6) ChangesSummary: The C/C++ standard says nothing about address spaces, so we cannot make This works in most cases, however it does cause some problems for cases template<typename T> void foo(T *, T); Full diff: https://github.com/llvm/llvm-project/pull/115777.diff 7 Files Affected:
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 1304ef3c5a228b..0d6f2c3410e9a0 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -3710,6 +3710,9 @@ def fopenmp_assume_no_nested_parallelism : Flag<["-"], "fopenmp-assume-no-nested
} // let Visibility = [ClangOption, CC1Option, FC1Option]
} // let Flags = [NoArgumentUnused, HelpHidden]
+def fdefault_generic_addrspace : Flag<["-"], "fdefault-generic-addrspace">, Group<f_Group>,
+ Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CC1Option]>,
+ HelpText<"Allow pointers to be implicitly casted to other address spaces.">;
def fopenmp_offload_mandatory : Flag<["-"], "fopenmp-offload-mandatory">, Group<f_Group>,
Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CC1Option]>,
HelpText<"Do not create a host fallback if offloading to the device fails.">,
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 0952262c360185..d997a1d232e83d 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -7067,6 +7067,9 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
if (Args.hasArg(options::OPT_nogpulib))
CmdArgs.push_back("-nogpulib");
+ if (Args.hasArg(options::OPT_fdefault_generic_addrspace))
+ CmdArgs.push_back("-fdefault-generic-addrspace");
+
if (Arg *A = Args.getLastArg(options::OPT_fcf_protection_EQ)) {
CmdArgs.push_back(
Args.MakeArgString(Twine("-fcf-protection=") + A->getValue()));
diff --git a/clang/lib/Frontend/CompilerInvocation.cpp b/clang/lib/Frontend/CompilerInvocation.cpp
index b5fd35aaa1e841..b44f04d0f275e0 100644
--- a/clang/lib/Frontend/CompilerInvocation.cpp
+++ b/clang/lib/Frontend/CompilerInvocation.cpp
@@ -3662,6 +3662,9 @@ void CompilerInvocationBase::GenerateLangArgs(const LangOptions &Opts,
if (Opts.Blocks && !(Opts.OpenCL && Opts.OpenCLVersion == 200))
GenerateArg(Consumer, OPT_fblocks);
+ if (Opts.OpenCLGenericAddressSpace)
+ GenerateArg(Consumer, OPT_fdefault_generic_addrspace);
+
if (Opts.ConvergentFunctions)
GenerateArg(Consumer, OPT_fconvergent_functions);
else
@@ -3939,6 +3942,7 @@ bool CompilerInvocation::ParseLangArgs(LangOptions &Opts, ArgList &Args,
// These need to be parsed now. They are used to set OpenCL defaults.
Opts.IncludeDefaultHeader = Args.hasArg(OPT_finclude_default_header);
Opts.DeclareOpenCLBuiltins = Args.hasArg(OPT_fdeclare_opencl_builtins);
+ Opts.OpenCLGenericAddressSpace = Args.hasArg(OPT_fdefault_generic_addrspace);
LangOptions::setLangDefaults(Opts, IK.getLanguage(), T, Includes, LangStd);
diff --git a/clang/lib/Sema/Sema.cpp b/clang/lib/Sema/Sema.cpp
index 2b51765e80864a..2920220948d145 100644
--- a/clang/lib/Sema/Sema.cpp
+++ b/clang/lib/Sema/Sema.cpp
@@ -1579,7 +1579,7 @@ NamedDecl *Sema::getCurFunctionOrMethodDecl() const {
}
LangAS Sema::getDefaultCXXMethodAddrSpace() const {
- if (getLangOpts().OpenCL)
+ if (getLangOpts().OpenCL || getLangOpts().OpenCLGenericAddressSpace)
return getASTContext().getDefaultOpenCLPointeeAddrSpace();
return LangAS::Default;
}
diff --git a/clang/lib/Sema/SemaDeclCXX.cpp b/clang/lib/Sema/SemaDeclCXX.cpp
index 8d76a35b2d2557..d855d9e3f49bc9 100644
--- a/clang/lib/Sema/SemaDeclCXX.cpp
+++ b/clang/lib/Sema/SemaDeclCXX.cpp
@@ -16178,7 +16178,8 @@ CheckOperatorNewDeleteTypes(Sema &SemaRef, const FunctionDecl *FnDecl,
<< FnDecl->getDeclName();
QualType FirstParamType = FnDecl->getParamDecl(0)->getType();
- if (SemaRef.getLangOpts().OpenCLCPlusPlus) {
+ if (SemaRef.getLangOpts().OpenCLCPlusPlus ||
+ SemaRef.getLangOpts().OpenCLGenericAddressSpace) {
// The operator is valid on any address space for OpenCL.
// Drop address space from actual and expected first parameter types.
if (const auto *PtrTy =
diff --git a/clang/lib/Sema/SemaType.cpp b/clang/lib/Sema/SemaType.cpp
index eb7516b3ef1ece..7ea663eacd0452 100644
--- a/clang/lib/Sema/SemaType.cpp
+++ b/clang/lib/Sema/SemaType.cpp
@@ -1836,7 +1836,7 @@ QualType Sema::BuildPointerType(QualType T,
if (getLangOpts().ObjCAutoRefCount)
T = inferARCLifetimeForPointee(*this, T, Loc, /*reference*/ false);
- if (getLangOpts().OpenCL)
+ if (getLangOpts().OpenCL || getLangOpts().OpenCLGenericAddressSpace)
T = deduceOpenCLPointeeAddrSpace(*this, T);
// In WebAssembly, pointers to reference types and pointers to tables are
@@ -1913,7 +1913,7 @@ QualType Sema::BuildReferenceType(QualType T, bool SpelledAsLValue,
if (getLangOpts().ObjCAutoRefCount)
T = inferARCLifetimeForPointee(*this, T, Loc, /*reference*/ true);
- if (getLangOpts().OpenCL)
+ if (getLangOpts().OpenCL || getLangOpts().OpenCLGenericAddressSpace)
T = deduceOpenCLPointeeAddrSpace(*this, T);
// In WebAssembly, references to reference types and tables are illegal.
@@ -2741,7 +2741,7 @@ QualType Sema::BuildBlockPointerType(QualType T,
if (checkQualifiedFunction(*this, T, Loc, QFK_BlockPointer))
return QualType();
- if (getLangOpts().OpenCL)
+ if (getLangOpts().OpenCL || getLangOpts().OpenCLGenericAddressSpace)
T = deduceOpenCLPointeeAddrSpace(*this, T);
return Context.getBlockPointerType(T);
@@ -5289,7 +5289,10 @@ static TypeSourceInfo *GetFullTypeForDeclarator(TypeProcessingState &state,
DeclaratorContext::LambdaExpr;
};
- if (state.getSema().getLangOpts().OpenCLCPlusPlus && IsClassMember()) {
+ if ((state.getSema().getLangOpts().OpenCLCPlusPlus ||
+ (!state.getSema().getLangOpts().OpenCL &&
+ state.getSema().getLangOpts().OpenCLGenericAddressSpace)) &&
+ IsClassMember()) {
LangAS ASIdx = LangAS::Default;
// Take address space attr if any and mark as invalid to avoid adding
// them later while creating QualType.
diff --git a/clang/test/CodeGen/generic-addrspace.cpp b/clang/test/CodeGen/generic-addrspace.cpp
new file mode 100644
index 00000000000000..e400172e1c6bd1
--- /dev/null
+++ b/clang/test/CodeGen/generic-addrspace.cpp
@@ -0,0 +1,219 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -fdefault-generic-addrspace -emit-llvm -o - %s \
+// RUN: | FileCheck %s --check-prefix=NVPTX
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fdefault-generic-addrspace -emit-llvm -o - %s \
+// RUN: | FileCheck %s --check-prefix=AMDGPU
+
+// NVPTX-LABEL: define dso_local void @_Z1fPv(
+// NVPTX-SAME: ptr noundef [[P:%.*]]) #[[ATTR0:[0-9]+]] {
+// NVPTX-NEXT: [[ENTRY:.*:]]
+// NVPTX-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8
+// NVPTX-NEXT: store ptr [[P]], ptr [[P_ADDR]], align 8
+// NVPTX-NEXT: ret void
+//
+// AMDGPU-LABEL: define dso_local void @_Z1fPv(
+// AMDGPU-SAME: ptr noundef [[P:%.*]]) #[[ATTR0:[0-9]+]] {
+// AMDGPU-NEXT: [[ENTRY:.*:]]
+// AMDGPU-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// AMDGPU-NEXT: [[P_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[P_ADDR]] to ptr
+// AMDGPU-NEXT: store ptr [[P]], ptr [[P_ADDR_ASCAST]], align 8
+// AMDGPU-NEXT: ret void
+//
+void f(void *p) {}
+
+// NVPTX-LABEL: define dso_local void @_Z2p1Pv(
+// NVPTX-SAME: ptr noundef [[P:%.*]]) #[[ATTR0]] {
+// NVPTX-NEXT: [[ENTRY:.*:]]
+// NVPTX-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8
+// NVPTX-NEXT: store ptr [[P]], ptr [[P_ADDR]], align 8
+// NVPTX-NEXT: [[TMP0:%.*]] = load ptr, ptr [[P_ADDR]], align 8
+// NVPTX-NEXT: call void @_Z1fPv(ptr noundef [[TMP0]]) #[[ATTR1:[0-9]+]]
+// NVPTX-NEXT: ret void
+//
+// AMDGPU-LABEL: define dso_local void @_Z2p1Pv(
+// AMDGPU-SAME: ptr noundef [[P:%.*]]) #[[ATTR0]] {
+// AMDGPU-NEXT: [[ENTRY:.*:]]
+// AMDGPU-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// AMDGPU-NEXT: [[P_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[P_ADDR]] to ptr
+// AMDGPU-NEXT: store ptr [[P]], ptr [[P_ADDR_ASCAST]], align 8
+// AMDGPU-NEXT: [[TMP0:%.*]] = load ptr, ptr [[P_ADDR_ASCAST]], align 8
+// AMDGPU-NEXT: call void @_Z1fPv(ptr noundef [[TMP0]]) #[[ATTR1:[0-9]+]]
+// AMDGPU-NEXT: ret void
+//
+void p1(void [[clang::opencl_generic]] * p) { f(p); }
+// NVPTX-LABEL: define dso_local noundef ptr @_Z2p2PU3AS3v(
+// NVPTX-SAME: ptr addrspace(3) noundef [[P:%.*]]) #[[ATTR0]] {
+// NVPTX-NEXT: [[ENTRY:.*:]]
+// NVPTX-NEXT: [[P_ADDR:%.*]] = alloca ptr addrspace(3), align 8
+// NVPTX-NEXT: store ptr addrspace(3) [[P]], ptr [[P_ADDR]], align 8
+// NVPTX-NEXT: [[TMP0:%.*]] = load ptr addrspace(3), ptr [[P_ADDR]], align 8
+// NVPTX-NEXT: [[TMP1:%.*]] = addrspacecast ptr addrspace(3) [[TMP0]] to ptr
+// NVPTX-NEXT: ret ptr [[TMP1]]
+//
+// AMDGPU-LABEL: define dso_local noundef ptr @_Z2p2PU3AS3v(
+// AMDGPU-SAME: ptr addrspace(3) noundef [[P:%.*]]) #[[ATTR0]] {
+// AMDGPU-NEXT: [[ENTRY:.*:]]
+// AMDGPU-NEXT: [[RETVAL:%.*]] = alloca ptr, align 8, addrspace(5)
+// AMDGPU-NEXT: [[P_ADDR:%.*]] = alloca ptr addrspace(3), align 4, addrspace(5)
+// AMDGPU-NEXT: [[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[RETVAL]] to ptr
+// AMDGPU-NEXT: [[P_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[P_ADDR]] to ptr
+// AMDGPU-NEXT: store ptr addrspace(3) [[P]], ptr [[P_ADDR_ASCAST]], align 4
+// AMDGPU-NEXT: [[TMP0:%.*]] = load ptr addrspace(3), ptr [[P_ADDR_ASCAST]], align 4
+// AMDGPU-NEXT: [[TMP1:%.*]] = addrspacecast ptr addrspace(3) [[TMP0]] to ptr
+// AMDGPU-NEXT: ret ptr [[TMP1]]
+//
+void *p2(void [[clang::opencl_local]] * p) { return p; }
+// NVPTX-LABEL: define dso_local noundef ptr @_Z2p3PU3AS3v(
+// NVPTX-SAME: ptr addrspace(3) noundef [[P:%.*]]) #[[ATTR0]] {
+// NVPTX-NEXT: [[ENTRY:.*:]]
+// NVPTX-NEXT: [[P_ADDR:%.*]] = alloca ptr addrspace(3), align 8
+// NVPTX-NEXT: store ptr addrspace(3) [[P]], ptr [[P_ADDR]], align 8
+// NVPTX-NEXT: [[TMP0:%.*]] = load ptr addrspace(3), ptr [[P_ADDR]], align 8
+// NVPTX-NEXT: [[TMP1:%.*]] = addrspacecast ptr addrspace(3) [[TMP0]] to ptr
+// NVPTX-NEXT: ret ptr [[TMP1]]
+//
+// AMDGPU-LABEL: define dso_local noundef ptr @_Z2p3PU3AS3v(
+// AMDGPU-SAME: ptr addrspace(3) noundef [[P:%.*]]) #[[ATTR0]] {
+// AMDGPU-NEXT: [[ENTRY:.*:]]
+// AMDGPU-NEXT: [[RETVAL:%.*]] = alloca ptr, align 8, addrspace(5)
+// AMDGPU-NEXT: [[P_ADDR:%.*]] = alloca ptr addrspace(3), align 4, addrspace(5)
+// AMDGPU-NEXT: [[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[RETVAL]] to ptr
+// AMDGPU-NEXT: [[P_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[P_ADDR]] to ptr
+// AMDGPU-NEXT: store ptr addrspace(3) [[P]], ptr [[P_ADDR_ASCAST]], align 4
+// AMDGPU-NEXT: [[TMP0:%.*]] = load ptr addrspace(3), ptr [[P_ADDR_ASCAST]], align 4
+// AMDGPU-NEXT: [[TMP1:%.*]] = addrspacecast ptr addrspace(3) [[TMP0]] to ptr
+// AMDGPU-NEXT: ret ptr [[TMP1]]
+//
+void *p3(void [[clang::address_space(3)]] * p) { return p; }
+
+struct S {
+ S() = default;
+ ~S() = default;
+// NVPTX-LABEL: define linkonce_odr void @_ZN1S3fooEv(
+// NVPTX-SAME: ptr noundef nonnull align 1 dereferenceable(1) [[THIS:%.*]]) #[[ATTR0]] comdat align 2 {
+// NVPTX-NEXT: [[ENTRY:.*:]]
+// NVPTX-NEXT: [[THIS_ADDR:%.*]] = alloca ptr, align 8
+// NVPTX-NEXT: store ptr [[THIS]], ptr [[THIS_ADDR]], align 8
+// NVPTX-NEXT: [[THIS1:%.*]] = load ptr, ptr [[THIS_ADDR]], align 8
+// NVPTX-NEXT: ret void
+//
+// AMDGPU-LABEL: define linkonce_odr void @_ZN1S3fooEv(
+// AMDGPU-SAME: ptr noundef nonnull align 1 dereferenceable(1) [[THIS:%.*]]) #[[ATTR0]] comdat align 2 {
+// AMDGPU-NEXT: [[ENTRY:.*:]]
+// AMDGPU-NEXT: [[THIS_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// AMDGPU-NEXT: [[THIS_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[THIS_ADDR]] to ptr
+// AMDGPU-NEXT: store ptr [[THIS]], ptr [[THIS_ADDR_ASCAST]], align 8
+// AMDGPU-NEXT: [[THIS1:%.*]] = load ptr, ptr [[THIS_ADDR_ASCAST]], align 8
+// AMDGPU-NEXT: ret void
+//
+ void foo() {}
+};
+
+S s1;
+S [[clang::opencl_global]] s2;
+S [[clang::opencl_local]] s3;
+
+// NVPTX-LABEL: define dso_local void @_Z1gv(
+// NVPTX-SAME: ) #[[ATTR0]] {
+// NVPTX-NEXT: [[ENTRY:.*:]]
+// NVPTX-NEXT: call void @_ZN1S3fooEv(ptr noundef nonnull align 1 dereferenceable(1) @s1) #[[ATTR1]]
+// NVPTX-NEXT: call void @_ZN1S3fooEv(ptr noundef nonnull align 1 dereferenceable(1) addrspacecast (ptr addrspace(1) @s2 to ptr)) #[[ATTR1]]
+// NVPTX-NEXT: call void @_ZN1S3fooEv(ptr noundef nonnull align 1 dereferenceable(1) addrspacecast (ptr addrspace(3) @s3 to ptr)) #[[ATTR1]]
+// NVPTX-NEXT: ret void
+//
+// AMDGPU-LABEL: define dso_local void @_Z1gv(
+// AMDGPU-SAME: ) #[[ATTR0]] {
+// AMDGPU-NEXT: [[ENTRY:.*:]]
+// AMDGPU-NEXT: call void @_ZN1S3fooEv(ptr noundef nonnull align 1 dereferenceable(1) addrspacecast (ptr addrspace(1) @s1 to ptr)) #[[ATTR1]]
+// AMDGPU-NEXT: call void @_ZN1S3fooEv(ptr noundef nonnull align 1 dereferenceable(1) addrspacecast (ptr addrspace(1) @s2 to ptr)) #[[ATTR1]]
+// AMDGPU-NEXT: call void @_ZN1S3fooEv(ptr noundef nonnull align 1 dereferenceable(1) addrspacecast (ptr addrspace(3) @s3 to ptr)) #[[ATTR1]]
+// AMDGPU-NEXT: ret void
+//
+void g() {
+ s1.foo();
+ s2.foo();
+ s3.foo();
+}
+
+template <typename Ty> void foo(Ty *) {}
+
+// NVPTX-LABEL: define dso_local void @_Z2t1Pv(
+// NVPTX-SAME: ptr noundef [[P:%.*]]) #[[ATTR0]] {
+// NVPTX-NEXT: [[ENTRY:.*:]]
+// NVPTX-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8
+// NVPTX-NEXT: store ptr [[P]], ptr [[P_ADDR]], align 8
+// NVPTX-NEXT: [[TMP0:%.*]] = load ptr, ptr [[P_ADDR]], align 8
+// NVPTX-NEXT: call void @_Z3fooIvEvPT_(ptr noundef [[TMP0]]) #[[ATTR1]]
+// NVPTX-NEXT: ret void
+//
+// AMDGPU-LABEL: define dso_local void @_Z2t1Pv(
+// AMDGPU-SAME: ptr noundef [[P:%.*]]) #[[ATTR0]] {
+// AMDGPU-NEXT: [[ENTRY:.*:]]
+// AMDGPU-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// AMDGPU-NEXT: [[P_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[P_ADDR]] to ptr
+// AMDGPU-NEXT: store ptr [[P]], ptr [[P_ADDR_ASCAST]], align 8
+// AMDGPU-NEXT: [[TMP0:%.*]] = load ptr, ptr [[P_ADDR_ASCAST]], align 8
+// AMDGPU-NEXT: call void @_Z3fooIvEvPT_(ptr noundef [[TMP0]]) #[[ATTR1]]
+// AMDGPU-NEXT: ret void
+//
+void t1(void *p) { foo(p); }
+// NVPTX-LABEL: define dso_local void @_Z2t2Pv(
+// NVPTX-SAME: ptr noundef [[P:%.*]]) #[[ATTR0]] {
+// NVPTX-NEXT: [[ENTRY:.*:]]
+// NVPTX-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8
+// NVPTX-NEXT: store ptr [[P]], ptr [[P_ADDR]], align 8
+// NVPTX-NEXT: [[TMP0:%.*]] = load ptr, ptr [[P_ADDR]], align 8
+// NVPTX-NEXT: call void @_Z3fooIvEvPT_(ptr noundef [[TMP0]]) #[[ATTR1]]
+// NVPTX-NEXT: ret void
+//
+// AMDGPU-LABEL: define dso_local void @_Z2t2Pv(
+// AMDGPU-SAME: ptr noundef [[P:%.*]]) #[[ATTR0]] {
+// AMDGPU-NEXT: [[ENTRY:.*:]]
+// AMDGPU-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// AMDGPU-NEXT: [[P_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[P_ADDR]] to ptr
+// AMDGPU-NEXT: store ptr [[P]], ptr [[P_ADDR_ASCAST]], align 8
+// AMDGPU-NEXT: [[TMP0:%.*]] = load ptr, ptr [[P_ADDR_ASCAST]], align 8
+// AMDGPU-NEXT: call void @_Z3fooIvEvPT_(ptr noundef [[TMP0]]) #[[ATTR1]]
+// AMDGPU-NEXT: ret void
+//
+void t2(void [[clang::opencl_generic]] *p) { foo(p); }
+// NVPTX-LABEL: define dso_local void @_Z2t3PU3AS3v(
+// NVPTX-SAME: ptr addrspace(3) noundef [[P:%.*]]) #[[ATTR0]] {
+// NVPTX-NEXT: [[ENTRY:.*:]]
+// NVPTX-NEXT: [[P_ADDR:%.*]] = alloca ptr addrspace(3), align 8
+// NVPTX-NEXT: store ptr addrspace(3) [[P]], ptr [[P_ADDR]], align 8
+// NVPTX-NEXT: [[TMP0:%.*]] = load ptr addrspace(3), ptr [[P_ADDR]], align 8
+// NVPTX-NEXT: call void @_Z3fooIU3AS3vEvPT_(ptr addrspace(3) noundef [[TMP0]]) #[[ATTR1]]
+// NVPTX-NEXT: ret void
+//
+// AMDGPU-LABEL: define dso_local void @_Z2t3PU3AS3v(
+// AMDGPU-SAME: ptr addrspace(3) noundef [[P:%.*]]) #[[ATTR0]] {
+// AMDGPU-NEXT: [[ENTRY:.*:]]
+// AMDGPU-NEXT: [[P_ADDR:%.*]] = alloca ptr addrspace(3), align 4, addrspace(5)
+// AMDGPU-NEXT: [[P_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[P_ADDR]] to ptr
+// AMDGPU-NEXT: store ptr addrspace(3) [[P]], ptr [[P_ADDR_ASCAST]], align 4
+// AMDGPU-NEXT: [[TMP0:%.*]] = load ptr addrspace(3), ptr [[P_ADDR_ASCAST]], align 4
+// AMDGPU-NEXT: call void @_Z3fooIU3AS3vEvPT_(ptr addrspace(3) noundef [[TMP0]]) #[[ATTR1]]
+// AMDGPU-NEXT: ret void
+//
+void t3(void [[clang::opencl_local]] *p) { foo(p); }
+// NVPTX-LABEL: define dso_local void @_Z2t4PU5AS999v(
+// NVPTX-SAME: ptr addrspace(999) noundef [[P:%.*]]) #[[ATTR0]] {
+// NVPTX-NEXT: [[ENTRY:.*:]]
+// NVPTX-NEXT: [[P_ADDR:%.*]] = alloca ptr addrspace(999), align 8
+// NVPTX-NEXT: store ptr addrspace(999) [[P]], ptr [[P_ADDR]], align 8
+// NVPTX-NEXT: [[TMP0:%.*]] = load ptr addrspace(999), ptr [[P_ADDR]], align 8
+// NVPTX-NEXT: call void @_Z3fooIU5AS999vEvPT_(ptr addrspace(999) noundef [[TMP0]]) #[[ATTR1]]
+// NVPTX-NEXT: ret void
+//
+// AMDGPU-LABEL: define dso_local void @_Z2t4PU5AS999v(
+// AMDGPU-SAME: ptr addrspace(999) noundef [[P:%.*]]) #[[ATTR0]] {
+// AMDGPU-NEXT: [[ENTRY:.*:]]
+// AMDGPU-NEXT: [[P_ADDR:%.*]] = alloca ptr addrspace(999), align 8, addrspace(5)
+// AMDGPU-NEXT: [[P_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[P_ADDR]] to ptr
+// AMDGPU-NEXT: store ptr addrspace(999) [[P]], ptr [[P_ADDR_ASCAST]], align 8
+// AMDGPU-NEXT: [[TMP0:%.*]] = load ptr addrspace(999), ptr [[P_ADDR_ASCAST]], align 8
+// AMDGPU-NEXT: call void @_Z3fooIU5AS999vEvPT_(ptr addrspace(999) noundef [[TMP0]]) #[[ATTR1]]
+// AMDGPU-NEXT: ret void
+//
+void t4(void [[clang::address_space(999)]] *p) { foo(p); }
|
edae36c
to
34eb41f
Compare
clang/lib/Sema/Sema.cpp
Outdated
@@ -1579,7 +1579,7 @@ NamedDecl *Sema::getCurFunctionOrMethodDecl() const { | |||
} | |||
|
|||
LangAS Sema::getDefaultCXXMethodAddrSpace() const { | |||
if (getLangOpts().OpenCL) | |||
if (getLangOpts().OpenCL || getLangOpts().OpenCLGenericAddressSpace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this whole thing is just working around a defect in how OpenCL was implemented. There should be no difference between generic and default address space
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's because the DefaultAS applies to every single language and it wasn't deemed proper to modify C/C++ behavior? I'm not an expert, maybe if @AnastasiaStulova chimes in. Regardless, I'm mostly just working with the smallest number of changes with what we've got that gives me the desired result.
https://godbolt.org/z/qWGaejTx9 for this case, I'm wondering if there's a way to resolve this by considering the AS as part of the pointer, since I don't know if AS means anything on a non-pointer type. |
Can you give some examples? It sounds that what you really want is for address space qualifiers to not be part of a type signature. OpenCL sort of happens to avoid that by sticking Clang and C++ indeed still have issues with AS-qualified pointers. Yet C++ does not consider the types equivalent, so you can't pass plain pointer as AS(0)-qualified argument and vice versa: https://godbolt.org/z/K4PEv9e7j It's a bug that needs fixing, IMO. We either treat unqualified ans AS(0) pointer types as different, and give them different mangling. Or we should treat AS(0) and plain pointer types the same, diagnose AS0/plain overloads as redefinitions (no need to wait til we run into mangling conflict) and allow using them interchangeably. Would straightening this out help with the issue you're trying to solve? |
Stuff like this is mostly what I'm talking about https://godbolt.org/z/1K8KGdqe9. I previously wanted to relax the handling of the default address space for all targets, but that made @AlexVlx unhappy in #112248. It's not it not being the type signature, it's more just having default conversion rules. The case I provided with the template is just an edge case that this changes away from C++, which is a little weird. |
Honestly I would prefer to just have an option that lets |
I think I generally agree with @AlexVlx argument. While the patch may solve you immediate issue, I think it's not going to give you a usable compilation model for AS-qualified pointers. If you are defining your own C++ extension along the lines of CUDA/HIP/OpenCL, you would have some wiggle room on how you want things to behave, but if you want it to be a regular C++ compilation, you've got to play by C++ rules. The diagnostics in your example looks quite sensible to me -- you're asking compiler to do with things of different types, but you do not provide implementation for it. and because C++ has no idea about what can and can't be done with non-default AS, it's all on you to do it. I don't think you can have this cake (explicit AS-qualified pointers) and eat it (make them all usable out of the box in a consistent manner, magically compatible with everything C++). Sticking For example, it looks like the pointer promotion will be ABI-breaking. How will you mangle function names with pointer arguments? Normally AS is encoded in the mangled name. And both the caller and callee must be in sync in order to call the right function -> therefore everything must be compiled with the same setting of this flag, including libc, etc. |
I agree in general that C/C++ has no semantic meaning ascribed to address spaces, so we cannot rely on intelligent semantic checking since C++ is strongly typed. Realizing that this literally changes all the address spaces makes it a bit difficult to argue that it's just C++, however it's pretty much impossible to use classes as-is since we cannot convert I think this approach is fine as an opt-in to have really lenient rules, since that's what AMDGPU and CUDA support already... But doing it with This was the 'easy' solution, alternatives would be to allow this for target specific address spaces (i.e. AMDGPU supports any addrspace to default). Another solution would be to do the same treatment here but with |
Err, C does. Please see TR 18037 (https://standards.iso.org/ittf/PubliclyAvailableStandards/c051126_ISO_IEC_TR_18037_2008.zip) AIUI, we've extended the C extension into C++, so we should be following the C semantics while mapping them on to C++'s stronger type system as closely as we can. |
Interesting, though reading through that I didn't see any mentions of implicit casts. It's simply stating that they can do casting if the target lists them as a subset. However, that would mean that the program passing sema would be dependent on the target machine. I'm actually not opposed to that, we could tear out the Also it's worth noting that in the documentation it has stuff like Maybe someone who reads better standardese than me can help with this. My ultimate goal is to make this legal when targeting NVPTX or AMDGPU, which is easily done by stating that void *foo(void [[clang::addrspace(3) *p) { return p; } |
-fdefault-generic-addrspace
flag for targeting GPUs
I updated the patch to use target info instead, this doesn't affect OpenCL because they don't use the |
Which is the gist of what I am saying: linguistic constructs carry semantics (meaning), we should not work back from their disembodied lowering into a target specific quantity to generalise the latter into a linguistic construct beyond the initial one. CUDA made language design choices, it is an AS agnostic language, cuda_shared doesn't work like an address space in the OCL / Clang sense, even though it ends up lowered through the same mechanism / leads to a numbered AS being CodeGen-ed. TL;DR: this doesn't work by design, not by accident = those ARE distinct from a language perspective. |
12b7653
to
24d6d01
Compare
I've updated this to make the language AS's and target AS's distinct so now this won't conflict at all. This patch just uses target information to ascribe some semantic meaning to the target AS's, which I believe is legitimate. The text that @AaronBallman provided says the following.
My interpretation of this is that it's fine to have target specific behavior for address spaces, and that it seems okay with |
Address spaces from language dialects generally have required relationships and behaviors in the language, and that really shouldn't be overridden by targets. However, targets do need to be able to decide how target-specific address spaces work, including how they interact with language address spaces. We can square this circle in two ways that I can see. The first is that we could only defer to the It seems to me that the AST-level comparison routines ought to take an |
This is pretty much what I've got now, we only check the target specific address spaces in the target info, so the languages should be unaffected.
|
Okay the problem with using |
Can you just outline the slow path where you have to actually call the target info hook? |
It would require pushing some unrelated functions into |
24d6d01
to
1e400ac
Compare
I'm not sure what unrelated code you're saying would need to pulled into a .cpp file. It looks like there's only one actual call to |
The checks for |
Oh, you can just forward-declare |
f114193
to
bc76c32
Compare
bc76c32
to
23a8d5a
Compare
Done, and a sema test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Broadly LGTM. A few structural requests.
Typo? "Use TargetInfo when deciding if an address space is compatible" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM. I would still prefer this PR to just contain the NFC refactor, with a follow up that adds the semantic changes to the AMDGPU and NVPTX targets.
This was introduced in #115777, but isn't actually used.
@@ -31,6 +31,7 @@ | |||
#include "clang/Basic/PointerAuthOptions.h" | |||
#include "clang/Basic/SourceLocation.h" | |||
#include "clang/Basic/Specifiers.h" | |||
#include "clang/Basic/TargetInfo.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dropped this include again in bc3b0fa because it was causing a significant build time regression. I assume this was a leftover from some earlier iteration of the patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't be necessary, I can make a PR to remove it again. Ah you meant you did it already, nevermind. Thanks for cleaning that up. Doing the change that was requested was really annoying so I did it off of the diff and forgot to delete that.
@@ -111,6 +111,18 @@ class LLVM_LIBRARY_VISIBILITY AMDGPUTargetInfo final : public TargetInfo { | |||
return getPointerWidthV(AddrSpace); | |||
} | |||
|
|||
virtual bool isAddressSpaceSupersetOf(LangAS A, LangAS B) const override { | |||
// The flat address space AS(0) is a superset of all the other address | |||
// spaces used by the backend target. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should exclude several address spaces. Should exclude region. Arguably also the fat pointers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic should only permit 0-5, I was torn on rejecting AS(2), can edit that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should reject it, it's not flat addressable. We don't have a magic aperture constant for it and the addrspacecast codegen will fail
Summary:
Address spaces are used in several embedded and GPU targets to describe
accesses to different types of memory. Currently we use the address
space enumerations to control which address spaces are considered
supersets of eachother, however this is also a target level property as
described by the C standard's passing mentions. This patch allows the
address space checks to use the target information to decide if a
pointer conversion is legal. For AMDGPU and NVPTX, all supported address
spaces can be converted to the default address space.
More semantic checks can be added on top of this, for now I'm mainly
looking to get more standard semantics working for C/C++. Right now the
address space conversions must all be done explicitly in C/C++ unlike
the offloading languages which define their own custom address spaces
that just map to the same target specific ones anyway. The main question
is if this behavior is a function of the target or the language.