-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AMDGPU] Enable OpenCL hostcall printf (WIP) #72556
base: main
Are you sure you want to change the base?
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter. |
@@ -170,20 +173,46 @@ static Value *appendString(IRBuilder<> &Builder, Value *Desc, Value *Arg, | |||
return callAppendStringN(Builder, Desc, Arg, Length, IsLast); | |||
} | |||
|
|||
static Value *appendVectorArg(IRBuilder<> &Builder, Value *Desc, Value *Arg, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the codegen changes here should be a separate commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are all still in this PR?
auto VecArg = dyn_cast<FixedVectorType>(Args[i]->getType()); | ||
assert(VecArg && "invalid vector specifier"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cast<> instead of dyn_cast + assert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
AllocSize = VecArg->getNumElements() * 8; | ||
} else | ||
AllocSize = M->getDataLayout().getTypeAllocSize(Args[i]->getType()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't understand this split vector handling. Just always use getTypeAllocSize?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I extract individual vector elements, expand them to 8 bytes and store them onto the buffer. The "getTypeAllocSize" would not give me the actual occupied size in the buffer in this case.
// have printed a warning. We just rely on undefined behaviour and send the | ||
// argument anyway. | ||
return appendArg(Builder, Desc, Arg, IsLast); | ||
} else if (IsVector) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No else after return
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Zero, Zero, Zero, Zero, Zero, false); | ||
} | ||
|
||
auto Val = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Value *
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
f932959
to
b443e11
Compare
@@ -406,5 +410,9 @@ TARGET_BUILTIN(__builtin_amdgcn_cvt_pk_fp8_f32, "iffiIb", "nc", "fp8-insts") | |||
TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_bf8_f32, "ifiiIi", "nc", "fp8-insts") | |||
TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_fp8_f32, "ifiiIi", "nc", "fp8-insts") | |||
|
|||
// OpenCL | |||
LANGBUILTIN(printf, "icC*4.", "fp:0:", ALL_OCL_LANGUAGES) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why this is necessary for the current task. What does it fix in the parsing OpenCL builtins?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although we talked about this offline, the explanation needs to be added here. In fact, the motivation for having this builtin should be added as a comment to the source itself for future reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not belong here. This has nothing to do with AMDGPU
if (Ty->getTypeID() == Type::DoubleTyID) { | ||
if (Ty->isFloatingPointTy()) { | ||
if (DL.getTypeAllocSize(Ty) < 8) | ||
Arg = Builder.CreateFPExt(Arg, Builder.getDoubleTy()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This typecast should not be necessary. Default argument promotions in C++ for variadic functions ensure that all floating point arguments are promoted to double. If that is not happening, can you demonstrate with a test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type cast is necessary for types such as _Float16, which is not handled at argument promotion. I have added a test case to show the same
return Builder.CreateBitCast(Arg, Int64Ty); | ||
} | ||
|
||
if (isa<PointerType>(Ty)) { | ||
if (!IsBuffered && isa<PointerType>(Ty)) { | ||
return Builder.CreatePtrToInt(Arg, Int64Ty); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are pointers handled in the buffered case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pointer is just pushed onto the buffer. The cast is necessary for the hostcall case to be compatible with device lib functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Information like this should be written in comments in the source code itself.
assert(Arg->getType()->isVectorTy() && "incorrent append* function"); | ||
auto VectorTy = dyn_cast<FixedVectorType>(Arg->getType()); | ||
auto Zero = Builder.getInt64(0); | ||
if (VectorTy) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the argument is ignored if it is not a FixedVectorType?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed this code a little now so that only FixedVectorTypes are handled. This should be okay since the OCL specs specifically say only vectors of length 2,3,4,8 and 16 are supported for printf.
// If the format specifies a string but the argument is not, the frontend | ||
// will have printed a warning. We just rely on undefined behaviour and send | ||
// the argument anyway. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a pure whitespace change. Keeping the original formatting of the comment helps simplify the diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bump. Please restore the original comment if there is no change in the actual words.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bump!
@@ -194,6 +226,8 @@ static void locateCStrings(SparseBitVector<8> &BV, StringRef Str) { | |||
SpecPos += 2; | |||
continue; | |||
} | |||
if (Str.find_first_of("v", SpecPos) != StringRef::npos) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this will work as expected. It can clearly match a "v" that occurs after the data type. For example, it is supposed to match "%v2d", but it will also match "%d v". The match should be performed inside the "Spec" substring created below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
"-mprintf-kind=" + | ||
Args.getLastArgValue(options::OPT_mprintf_kind_EQ))); | ||
// Force compiler error on invalid conversion specifiers | ||
CmdArgs.push_back(Args.MakeArgString("-Werror=format-invalid-specifier")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this necessary here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bump. I am not convinced that we should force errors on invalid specifiers. What is the rationale for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was just so that we could error out instead of undefined behaviors due to wrong specifiers. I have no preference here and would be okay to change it if you feel so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this is important. Converting a warning into an error can cause surprises when users build existing applications. It is upto the user to decide if they want this to be an error, which is why it is a command-line option. Even if the spec says undefined behaviour, we should just do something reasonable and accept whatever the user wrote without forcing an error. There is a lot of freedom in deciding what is "something reasonable" ... we could choose to print nothing, or a default value, or a placeholder instead of the actual printf format string, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bump! Remove this command-line flag.
} | ||
|
||
// Scan the format string to locate all specifiers, and mark the ones that | ||
// specify a string, i.e, the "%s" specifier with optional '*' characters. | ||
static void locateCStrings(SparseBitVector<8> &BV, StringRef Str) { | ||
static void locateCStringsAndVectors(SparseBitVector<8> &BV, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update the comment above the function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -1,12 +1,68 @@ | |||
// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py | |||
// RUN: %clang_cc1 -cl-std=CL1.2 -triple amdgcn-amd-amdhsa -disable-llvm-passes -emit-llvm -o - %s | FileCheck %s | |||
// RUN: %clang_cc1 -cl-std=CL1.2 -triple amdgcn-amd-amdhsa -mprintf-kind=buffered -disable-llvm-passes -emit-llvm -o - %s | FileCheck --check-prefix=CHECK_BUFFERED %s | |||
// RUN: %clang_cc1 -cl-std=CL1.2 -triple amdgcn-amd-amdhsa -mprintf-kind=hostcall -disable-llvm-passes -emit-llvm -o - %s | FileCheck --check-prefix=CHECK_HOSTCALL %s | |||
|
|||
int printf(__constant const char* st, ...) __attribute__((format(printf, 1, 2))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be tests that use the "v" modifier inside a format specifier such as "%v2d" and also tests that use outside it, such as "%dv".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only see tests with the "v" correctly used as a vector specifier. What about tests like "%dv", and other cases where either the "v" is wrong, or it's just part of the text being printed? Given that the first attempt at detecting "v" had errors in it, I think it will be good to cover all corner cases where a "v" is actually a vector specifier and and where it is not.
@llvm/pr-subscribers-clang-driver @llvm/pr-subscribers-clang Author: Vikram Hegde (vikramRH) ChangesKindly review top commit here, The builtin specific changes are up for review in a seperate patch (#72554) Few implementation details,
Patch is 89.24 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/72556.diff 9 Files Affected:
diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index a19c8bd5f219ec6..1799c72806bfdd4 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -21,6 +21,10 @@
#if defined(BUILTIN) && !defined(TARGET_BUILTIN)
# define TARGET_BUILTIN(ID, TYPE, ATTRS, FEATURE) BUILTIN(ID, TYPE, ATTRS)
#endif
+
+#if defined(BUILTIN) && !defined(LANGBUILTIN)
+#define LANGBUILTIN(ID, TYPE, ATTRS, BUILTIN_LANG) BUILTIN(ID, TYPE, ATTRS)
+#endif
//===----------------------------------------------------------------------===//
// SI+ only builtins.
//===----------------------------------------------------------------------===//
@@ -406,5 +410,9 @@ TARGET_BUILTIN(__builtin_amdgcn_cvt_pk_fp8_f32, "iffiIb", "nc", "fp8-insts")
TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_bf8_f32, "ifiiIi", "nc", "fp8-insts")
TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_fp8_f32, "ifiiIi", "nc", "fp8-insts")
+// OpenCL
+LANGBUILTIN(printf, "icC*4.", "fp:0:", ALL_OCL_LANGUAGES)
+
#undef BUILTIN
#undef TARGET_BUILTIN
+#undef LANGBUILTIN
diff --git a/clang/lib/AST/Decl.cpp b/clang/lib/AST/Decl.cpp
index c5c2edf1bfe3aba..2597422bdd521a0 100644
--- a/clang/lib/AST/Decl.cpp
+++ b/clang/lib/AST/Decl.cpp
@@ -49,6 +49,7 @@
#include "clang/Basic/SourceLocation.h"
#include "clang/Basic/SourceManager.h"
#include "clang/Basic/Specifiers.h"
+#include "clang/Basic/TargetBuiltins.h"
#include "clang/Basic/TargetCXXABI.h"
#include "clang/Basic/TargetInfo.h"
#include "clang/Basic/Visibility.h"
@@ -3598,6 +3599,12 @@ unsigned FunctionDecl::getBuiltinID(bool ConsiderWrapperFunctions) const {
if (!ConsiderWrapperFunctions && getStorageClass() == SC_Static)
return 0;
+ // AMDGCN implementation supports printf as a builtin
+ // for OpenCL
+ if (Context.getTargetInfo().getTriple().isAMDGCN() &&
+ Context.getLangOpts().OpenCL && BuiltinID == AMDGPU::BIprintf)
+ return BuiltinID;
+
// OpenCL v1.2 s6.9.f - The library functions defined in
// the C99 standard headers are not available.
if (Context.getLangOpts().OpenCL &&
diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp b/clang/lib/Basic/Targets/AMDGPU.cpp
index 409ae32ab424215..307cfa49f54e926 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -91,6 +91,8 @@ static constexpr Builtin::Info BuiltinInfo[] = {
{#ID, TYPE, ATTRS, nullptr, HeaderDesc::NO_HEADER, ALL_LANGUAGES},
#define TARGET_BUILTIN(ID, TYPE, ATTRS, FEATURE) \
{#ID, TYPE, ATTRS, FEATURE, HeaderDesc::NO_HEADER, ALL_LANGUAGES},
+#define LANGBUILTIN(ID, TYPE, ATTRS, LANG) \
+ {#ID, TYPE, ATTRS, nullptr, HeaderDesc::NO_HEADER, LANG},
#include "clang/Basic/BuiltinsAMDGPU.def"
};
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 09309a3937fb613..8d51df24c7872b7 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -2458,6 +2458,11 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
&getTarget().getLongDoubleFormat() == &llvm::APFloat::IEEEquad())
BuiltinID = mutateLongDoubleBuiltin(BuiltinID);
+ // Mutate the printf builtin ID so that we use the same CodeGen path for
+ // HIP and OpenCL with AMDGPU targets.
+ if (getTarget().getTriple().isAMDGCN() && BuiltinID == AMDGPU::BIprintf)
+ BuiltinID = Builtin::BIprintf;
+
// If the builtin has been declared explicitly with an assembler label,
// disable the specialized emitting below. Ideally we should communicate the
// rename in IR, or at least avoid generating the intrinsic calls that are
@@ -5617,7 +5622,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
return EmitOpenMPDevicePrintfCallExpr(E);
if (getTarget().getTriple().isNVPTX())
return EmitNVPTXDevicePrintfCallExpr(E);
- if (getTarget().getTriple().isAMDGCN() && getLangOpts().HIP)
+ if (getTarget().getTriple().isAMDGCN() &&
+ (getLangOpts().HIP || getLangOpts().OpenCL))
return EmitAMDGPUDevicePrintfCallExpr(E);
}
diff --git a/clang/lib/CodeGen/CGGPUBuiltin.cpp b/clang/lib/CodeGen/CGGPUBuiltin.cpp
index de4ee68c0da1e79..4eb2cf826e700fb 100644
--- a/clang/lib/CodeGen/CGGPUBuiltin.cpp
+++ b/clang/lib/CodeGen/CGGPUBuiltin.cpp
@@ -13,7 +13,10 @@
#include "CodeGenFunction.h"
#include "clang/Basic/Builtins.h"
+#include "clang/Basic/TargetBuiltins.h"
+#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/DataLayout.h"
+#include "llvm/IR/DiagnosticInfo.h"
#include "llvm/IR/Instruction.h"
#include "llvm/Support/MathExtras.h"
#include "llvm/Transforms/Utils/AMDGPUEmitPrintf.h"
@@ -177,10 +180,20 @@ RValue CodeGenFunction::EmitNVPTXDevicePrintfCallExpr(const CallExpr *E) {
E, this, GetVprintfDeclaration(CGM.getModule()), false);
}
+// Deterimines if an argument is a string
+static bool isString(const clang::Type *argXTy) {
+ if ((argXTy->isPointerType() || argXTy->isConstantArrayType()) &&
+ argXTy->getPointeeOrArrayElementType()->isCharType())
+ return true;
+ else
+ return false;
+}
+
RValue CodeGenFunction::EmitAMDGPUDevicePrintfCallExpr(const CallExpr *E) {
assert(getTarget().getTriple().getArch() == llvm::Triple::amdgcn);
assert(E->getBuiltinCallee() == Builtin::BIprintf ||
- E->getBuiltinCallee() == Builtin::BI__builtin_printf);
+ E->getBuiltinCallee() == Builtin::BI__builtin_printf ||
+ E->getBuiltinCallee() == AMDGPU::BIprintf);
assert(E->getNumArgs() >= 1); // printf always has at least one arg.
CallArgList CallArgs;
@@ -188,6 +201,8 @@ RValue CodeGenFunction::EmitAMDGPUDevicePrintfCallExpr(const CallExpr *E) {
E->getDirectCallee()->getType()->getAs<FunctionProtoType>(),
E->arguments(), E->getDirectCallee(),
/* ParamsToSkip = */ 0);
+ llvm::IRBuilder<> IRB(Builder.GetInsertBlock(), Builder.GetInsertPoint());
+ IRB.SetCurrentDebugLocation(Builder.getCurrentDebugLocation());
SmallVector<llvm::Value *, 8> Args;
for (const auto &A : CallArgs) {
@@ -198,15 +213,31 @@ RValue CodeGenFunction::EmitAMDGPUDevicePrintfCallExpr(const CallExpr *E) {
}
llvm::Value *Arg = A.getRValue(*this).getScalarVal();
+ if (isString(A.getType().getTypePtr()) && CGM.getLangOpts().OpenCL)
+ Arg = Builder.CreateAddrSpaceCast(Arg, CGM.Int8PtrTy);
Args.push_back(Arg);
}
- llvm::IRBuilder<> IRB(Builder.GetInsertBlock(), Builder.GetInsertPoint());
- IRB.SetCurrentDebugLocation(Builder.getCurrentDebugLocation());
+ auto PFK = CGM.getTarget().getTargetOpts().AMDGPUPrintfKindVal;
+ bool isBuffered =
+ (PFK == clang::TargetOptions::AMDGPUPrintfKind::Buffered);
+
+ StringRef FmtStr;
+ if (llvm::getConstantStringInfo(Args[0], FmtStr)) {
+ if (FmtStr.empty())
+ FmtStr = StringRef("", 1);
+ } else {
+ if (CGM.getLangOpts().OpenCL) {
+ llvm::DiagnosticInfoUnsupported UnsupportedFormatStr(
+ *IRB.GetInsertBlock()->getParent(),
+ "printf format string must be a trivially resolved constant string "
+ "global variable",
+ IRB.getCurrentDebugLocation());
+ IRB.getContext().diagnose(UnsupportedFormatStr);
+ }
+ }
- bool isBuffered = (CGM.getTarget().getTargetOpts().AMDGPUPrintfKindVal ==
- clang::TargetOptions::AMDGPUPrintfKind::Buffered);
- auto Printf = llvm::emitAMDGPUPrintfCall(IRB, Args, isBuffered);
+ auto Printf = llvm::emitAMDGPUPrintfCall(IRB, Args, FmtStr, isBuffered);
Builder.SetInsertPoint(IRB.GetInsertBlock(), IRB.GetInsertPoint());
return RValue::get(Printf);
}
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index b462f5a44057d94..b63c777fd1f158c 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -4742,6 +4742,16 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
Args.ClaimAllArgs(options::OPT_gen_cdb_fragment_path);
}
+ if (TC.getTriple().isAMDGPU() && types::isOpenCL(Input.getType())) {
+ if (Args.getLastArg(options::OPT_mprintf_kind_EQ)) {
+ CmdArgs.push_back(Args.MakeArgString(
+ "-mprintf-kind=" +
+ Args.getLastArgValue(options::OPT_mprintf_kind_EQ)));
+ // Force compiler error on invalid conversion specifiers
+ CmdArgs.push_back(Args.MakeArgString("-Werror=format-invalid-specifier"));
+ }
+ }
+
if (IsCuda || IsHIP) {
// We have to pass the triple of the host if compiling for a CUDA/HIP device
// and vice-versa.
diff --git a/clang/test/CodeGenOpenCL/amdgpu-printf.cl b/clang/test/CodeGenOpenCL/amdgpu-printf.cl
index edf6dbf8657cbe5..9b411f23ceba56a 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-printf.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-printf.cl
@@ -1,5 +1,8 @@
// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
-// RUN: %clang_cc1 -cl-std=CL1.2 -triple amdgcn-amd-amdhsa -disable-llvm-passes -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -cl-std=CL1.2 -triple amdgcn-amd-amdhsa -mprintf-kind=buffered -disable-llvm-passes -emit-llvm -o - %s | FileCheck --check-prefix=CHECK_BUFFERED %s
+// RUN: %clang_cc1 -cl-std=CL1.2 -triple amdgcn-amd-amdhsa -mprintf-kind=hostcall -disable-llvm-passes -emit-llvm -o - %s | FileCheck --check-prefix=CHECK_HOSTCALL %s
+
+#pragma OPENCL EXTENSION cl_khr_fp16 : enable
int printf(__constant const char* st, ...) __attribute__((format(printf, 1, 2)));
@@ -7,6 +10,42 @@ int printf(__constant const char* st, ...) __attribute__((format(printf, 1, 2)))
// CHECK-NEXT: entry:
// CHECK-NEXT: [[CALL:%.*]] = call i32 (ptr addrspace(4), ...) @printf(ptr addrspace(4) noundef @.str) #[[ATTR4:[0-9]+]]
// CHECK-NEXT: ret void
+// CHECK_BUFFERED-LABEL: @test_printf_noargs(
+// CHECK_BUFFERED-NEXT: entry:
+// CHECK_BUFFERED-NEXT: [[PRINTF_ALLOC_FN:%.*]] = call ptr addrspace(1) @__printf_alloc(i32 12)
+// CHECK_BUFFERED-NEXT: [[TMP0:%.*]] = icmp ne ptr addrspace(1) [[PRINTF_ALLOC_FN]], null
+// CHECK_BUFFERED-NEXT: br i1 [[TMP0]], label [[ARGPUSH_BLOCK:%.*]], label [[END_BLOCK:%.*]]
+// CHECK_BUFFERED: end.block:
+// CHECK_BUFFERED-NEXT: [[TMP1:%.*]] = xor i1 [[TMP0]], true
+// CHECK_BUFFERED-NEXT: [[PRINTF_RESULT:%.*]] = sext i1 [[TMP1]] to i32
+// CHECK_BUFFERED-NEXT: ret void
+// CHECK_BUFFERED: argpush.block:
+// CHECK_BUFFERED-NEXT: store i32 50, ptr addrspace(1) [[PRINTF_ALLOC_FN]], align 4
+// CHECK_BUFFERED-NEXT: [[TMP2:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[PRINTF_ALLOC_FN]], i32 4
+// CHECK_BUFFERED-NEXT: store i64 -8529802306755643245, ptr addrspace(1) [[TMP2]], align 8
+// CHECK_BUFFERED-NEXT: [[TMP3:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[TMP2]], i32 8
+// CHECK_BUFFERED-NEXT: br label [[END_BLOCK]]
+//
+// CHECK_HOSTCALL-LABEL: @test_printf_noargs(
+// CHECK_HOSTCALL-NEXT: entry:
+// CHECK_HOSTCALL-NEXT: [[TMP0:%.*]] = call i64 @__ockl_printf_begin(i64 0)
+// CHECK_HOSTCALL-NEXT: br i1 icmp eq (ptr addrspacecast (ptr addrspace(4) @.str to ptr), ptr null), label [[STRLEN_JOIN:%.*]], label [[STRLEN_WHILE:%.*]]
+// CHECK_HOSTCALL: strlen.while:
+// CHECK_HOSTCALL-NEXT: [[TMP1:%.*]] = phi ptr [ addrspacecast (ptr addrspace(4) @.str to ptr), [[ENTRY:%.*]] ], [ [[TMP2:%.*]], [[STRLEN_WHILE]] ]
+// CHECK_HOSTCALL-NEXT: [[TMP2]] = getelementptr i8, ptr [[TMP1]], i64 1
+// CHECK_HOSTCALL-NEXT: [[TMP3:%.*]] = load i8, ptr [[TMP1]], align 1
+// CHECK_HOSTCALL-NEXT: [[TMP4:%.*]] = icmp eq i8 [[TMP3]], 0
+// CHECK_HOSTCALL-NEXT: br i1 [[TMP4]], label [[STRLEN_WHILE_DONE:%.*]], label [[STRLEN_WHILE]]
+// CHECK_HOSTCALL: strlen.while.done:
+// CHECK_HOSTCALL-NEXT: [[TMP5:%.*]] = ptrtoint ptr [[TMP1]] to i64
+// CHECK_HOSTCALL-NEXT: [[TMP6:%.*]] = sub i64 [[TMP5]], ptrtoint (ptr addrspacecast (ptr addrspace(4) @.str to ptr) to i64)
+// CHECK_HOSTCALL-NEXT: [[TMP7:%.*]] = add i64 [[TMP6]], 1
+// CHECK_HOSTCALL-NEXT: br label [[STRLEN_JOIN]]
+// CHECK_HOSTCALL: strlen.join:
+// CHECK_HOSTCALL-NEXT: [[TMP8:%.*]] = phi i64 [ [[TMP7]], [[STRLEN_WHILE_DONE]] ], [ 0, [[ENTRY]] ]
+// CHECK_HOSTCALL-NEXT: [[TMP9:%.*]] = call i64 @__ockl_printf_append_string_n(i64 [[TMP0]], ptr addrspacecast (ptr addrspace(4) @.str to ptr), i64 [[TMP8]], i32 1)
+// CHECK_HOSTCALL-NEXT: [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
+// CHECK_HOSTCALL-NEXT: ret void
//
__kernel void test_printf_noargs() {
printf("");
@@ -19,6 +58,53 @@ __kernel void test_printf_noargs() {
// CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr addrspace(5) [[I_ADDR]], align 4, !tbaa [[TBAA8]]
// CHECK-NEXT: [[CALL:%.*]] = call i32 (ptr addrspace(4), ...) @printf(ptr addrspace(4) noundef @.str.1, i32 noundef [[TMP0]]) #[[ATTR4]]
// CHECK-NEXT: ret void
+// CHECK_BUFFERED-LABEL: @test_printf_int(
+// CHECK_BUFFERED-NEXT: entry:
+// CHECK_BUFFERED-NEXT: [[I_ADDR:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK_BUFFERED-NEXT: store i32 [[I:%.*]], ptr addrspace(5) [[I_ADDR]], align 4, !tbaa [[TBAA16:![0-9]+]]
+// CHECK_BUFFERED-NEXT: [[TMP0:%.*]] = load i32, ptr addrspace(5) [[I_ADDR]], align 4, !tbaa [[TBAA16]]
+// CHECK_BUFFERED-NEXT: [[PRINTF_ALLOC_FN:%.*]] = call ptr addrspace(1) @__printf_alloc(i32 20)
+// CHECK_BUFFERED-NEXT: [[TMP1:%.*]] = icmp ne ptr addrspace(1) [[PRINTF_ALLOC_FN]], null
+// CHECK_BUFFERED-NEXT: br i1 [[TMP1]], label [[ARGPUSH_BLOCK:%.*]], label [[END_BLOCK:%.*]]
+// CHECK_BUFFERED: end.block:
+// CHECK_BUFFERED-NEXT: [[TMP2:%.*]] = xor i1 [[TMP1]], true
+// CHECK_BUFFERED-NEXT: [[PRINTF_RESULT:%.*]] = sext i1 [[TMP2]] to i32
+// CHECK_BUFFERED-NEXT: ret void
+// CHECK_BUFFERED: argpush.block:
+// CHECK_BUFFERED-NEXT: store i32 82, ptr addrspace(1) [[PRINTF_ALLOC_FN]], align 4
+// CHECK_BUFFERED-NEXT: [[TMP3:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[PRINTF_ALLOC_FN]], i32 4
+// CHECK_BUFFERED-NEXT: store i64 -2582314622382785113, ptr addrspace(1) [[TMP3]], align 8
+// CHECK_BUFFERED-NEXT: [[TMP4:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[TMP3]], i32 8
+// CHECK_BUFFERED-NEXT: [[TMP5:%.*]] = zext i32 [[TMP0]] to i64
+// CHECK_BUFFERED-NEXT: store i64 [[TMP5]], ptr addrspace(1) [[TMP4]], align 8
+// CHECK_BUFFERED-NEXT: [[PRINTBUFFNEXTPTR:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[TMP4]], i32 8
+// CHECK_BUFFERED-NEXT: br label [[END_BLOCK]]
+//
+// CHECK_HOSTCALL-LABEL: @test_printf_int(
+// CHECK_HOSTCALL-NEXT: entry:
+// CHECK_HOSTCALL-NEXT: [[I_ADDR:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK_HOSTCALL-NEXT: store i32 [[I:%.*]], ptr addrspace(5) [[I_ADDR]], align 4, !tbaa [[TBAA9:![0-9]+]]
+// CHECK_HOSTCALL-NEXT: [[TMP0:%.*]] = load i32, ptr addrspace(5) [[I_ADDR]], align 4, !tbaa [[TBAA9]]
+// CHECK_HOSTCALL-NEXT: [[TMP1:%.*]] = call i64 @__ockl_printf_begin(i64 0)
+// CHECK_HOSTCALL-NEXT: br i1 icmp eq (ptr addrspacecast (ptr addrspace(4) @.str.1 to ptr), ptr null), label [[STRLEN_JOIN:%.*]], label [[STRLEN_WHILE:%.*]]
+// CHECK_HOSTCALL: strlen.while:
+// CHECK_HOSTCALL-NEXT: [[TMP2:%.*]] = phi ptr [ addrspacecast (ptr addrspace(4) @.str.1 to ptr), [[ENTRY:%.*]] ], [ [[TMP3:%.*]], [[STRLEN_WHILE]] ]
+// CHECK_HOSTCALL-NEXT: [[TMP3]] = getelementptr i8, ptr [[TMP2]], i64 1
+// CHECK_HOSTCALL-NEXT: [[TMP4:%.*]] = load i8, ptr [[TMP2]], align 1
+// CHECK_HOSTCALL-NEXT: [[TMP5:%.*]] = icmp eq i8 [[TMP4]], 0
+// CHECK_HOSTCALL-NEXT: br i1 [[TMP5]], label [[STRLEN_WHILE_DONE:%.*]], label [[STRLEN_WHILE]]
+// CHECK_HOSTCALL: strlen.while.done:
+// CHECK_HOSTCALL-NEXT: [[TMP6:%.*]] = ptrtoint ptr [[TMP2]] to i64
+// CHECK_HOSTCALL-NEXT: [[TMP7:%.*]] = sub i64 [[TMP6]], ptrtoint (ptr addrspacecast (ptr addrspace(4) @.str.1 to ptr) to i64)
+// CHECK_HOSTCALL-NEXT: [[TMP8:%.*]] = add i64 [[TMP7]], 1
+// CHECK_HOSTCALL-NEXT: br label [[STRLEN_JOIN]]
+// CHECK_HOSTCALL: strlen.join:
+// CHECK_HOSTCALL-NEXT: [[TMP9:%.*]] = phi i64 [ [[TMP8]], [[STRLEN_WHILE_DONE]] ], [ 0, [[ENTRY]] ]
+// CHECK_HOSTCALL-NEXT: [[TMP10:%.*]] = call i64 @__ockl_printf_append_string_n(i64 [[TMP1]], ptr addrspacecast (ptr addrspace(4) @.str.1 to ptr), i64 [[TMP9]], i32 0)
+// CHECK_HOSTCALL-NEXT: [[TMP11:%.*]] = zext i32 [[TMP0]] to i64
+// CHECK_HOSTCALL-NEXT: [[TMP12:%.*]] = call i64 @__ockl_printf_append_args(i64 [[TMP10]], i32 1, i64 [[TMP11]], i64 0, i64 0, i64 0, i64 0, i64 0, i64 0, i32 1)
+// CHECK_HOSTCALL-NEXT: [[TMP13:%.*]] = trunc i64 [[TMP12]] to i32
+// CHECK_HOSTCALL-NEXT: ret void
//
__kernel void test_printf_int(int i) {
printf("%d", i);
@@ -36,8 +122,677 @@ __kernel void test_printf_int(int i) {
// CHECK-NEXT: [[CALL:%.*]] = call i32 (ptr addrspace(4), ...) @printf(ptr addrspace(4) noundef @.str.2, ptr addrspace(5) noundef [[ARRAYDECAY]], i32 noundef [[TMP2]]) #[[ATTR4]]
// CHECK-NEXT: call void @llvm.lifetime.end.p5(i64 4, ptr addrspace(5) [[S]]) #[[ATTR5]]
// CHECK-NEXT: ret void
+// CHECK_BUFFERED-LABEL: @test_printf_str_int(
+// CHECK_BUFFERED-NEXT: entry:
+// CHECK_BUFFERED-NEXT: [[I_ADDR:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK_BUFFERED-NEXT: [[S:%.*]] = alloca [4 x i8], align 1, addrspace(5)
+// CHECK_BUFFERED-NEXT: store i32 [[I:%.*]], ptr addrspace(5) [[I_ADDR]], align 4, !tbaa [[TBAA16]]
+// CHECK_BUFFERED-NEXT: call void @llvm.lifetime.start.p5(i64 4, ptr addrspace(5) [[S]]) #[[ATTR1:[0-9]+]]
+// CHECK_BUFFERED-NEXT: call void @llvm.memcpy.p5.p4.i64(ptr addrspace(5) align 1 [[S]], ptr addrspace(4) align 1 @__const.test_printf_str_int.s, i64 4, i1 false)
+// CHECK_BUFFERED-NEXT: [[ARRAYDECAY:%.*]] = getelementptr inbounds [4 x i8], ptr addrspace(5) [[S]], i64 0, i64 0
+// CHECK_BUFFERED-NEXT: [[TMP0:%.*]] = load i32, ptr addrspace(5) [[I_ADDR]], align 4, !tbaa [[TBAA16]]
+// CHECK_BUFFERED-NEXT: [[TMP1:%.*]] = addrspacecast ptr addrspace(5) [[ARRAYDECAY]] to ptr
+// CHECK_BUFFERED-NEXT: [[TMP2:%.*]] = icmp eq ptr [[TMP1]], null
+// CHECK_BUFFERED-NEXT: br i1 [[TMP2]], label [[STRLEN_JOIN:%.*]], label [[STRLEN_WHILE:%.*]]
+// CHECK_BUFFERED: strlen.while:
+// CHECK_BUFFERED-NEXT: [[TMP3:%.*]] = phi ptr [ [[TMP1]], [[ENTRY:%.*]] ], [ [[TMP4:%.*]], [[STRLEN_WHILE]] ]
+// CHECK_BUFFERED-NEXT: [[TMP4]] = getelementptr i8, ptr [[TMP3]], i64 1
+// CHECK_BUFFERED-NEXT: [[TMP5:%.*]] = load i8, ptr [[TMP3]], align 1
+// CHECK_BUFFERED-NEXT: [[TMP6:%.*]] = icmp eq i8 [[TMP5]], 0
+// CHECK_BUFFERED-NEXT: br i1 [[TMP6]], label [[STRLEN_WHILE_DONE:%.*]], label [[STRLEN_WHILE]]
+// CHECK_BUFFERED: strlen.while.done:
+// CHECK_BUFFERED-NEXT: [[TMP7:%.*]] = ptrtoint ptr [[TMP1]] to i64
+// CHECK_BUFFERED-NEXT: [[TMP8:%.*]] = ptrtoint ptr [[TMP3]] to i64
+// CHECK_BUFFERED-NEXT: [[TMP9:%.*]] = sub i64 [[TMP8]], [[TMP7]]
+// CHECK_BUFFERED-NEXT: [[TMP10:%.*]] = add i64 [[TMP9]], 1
+// CHECK_BUFFERED-NEXT: br label [[STRLEN_JOIN]]
+// CHECK_BUFFERED: strlen.join:
+// CHECK_BUFFERED-NEXT: [[TMP11:%.*]] = phi i64 [ [[TMP10]], [[STRLEN_WHILE_DONE]] ], [ 0, [[ENTRY]] ]
+// CHECK_BUFFERED-NEXT: [[TMP12:%.*]] = add i64 [[TMP11]], 7
+// CHECK_BUFFERED-NEXT: [[TMP13:%.*]] = and i64 [[TMP12]], 4294967288
+// CHECK_BUFFERED-NEXT: [[TMP14:%.*]] = add i64 [[TMP13]], 20
+// CHECK_BUFFERED-NEXT: [[TMP15:%.*]] = trunc i64 [[TMP14]] to i32
+// CHECK_BUFFERED-NEXT: [[PRINTF_ALLOC_FN:%.*]] = call ptr addrspace(1) @__printf_alloc(i32 [[TMP15]])
+// CHECK_BUFFERED-NEXT: [[TMP16:%.*]] = icmp ne ptr addrspace(1) [[PRINTF_ALLOC_FN]], null
+// CHECK_BUFFERED-NEXT: br i1 [[TMP16]], label [[ARGPUSH_BLOCK:%.*]], label [[END_BLOCK:%.*]]
+// CHECK_BUFFERED: end.block:
+// CHECK_BUFFERED-NEXT: [[TMP17:%.*]] = xor i1 [[TMP16]], true
+// CHECK_BUFFERED-NEXT: [[PRINTF_RESULT:%.*]] = sext i1 [[TMP17]] to i32
+// CHECK_BUFFERED-NEXT: call void @llvm.lifetime.end.p5(i64 4, ptr addrspace(5) [[S]]) #[[ATTR1]]
+// CHECK_BUFFERED-NEXT: ret void
+// CHECK_BUFFERED: argpush.block:
+// CHECK_BUFFERED-NEXT: [[TMP18:%.*]] = shl i32 [[TMP15]], 2
+// CHECK_BUFFERED-NEXT: [[TMP19:%.*]] = or i32 [[TMP18]], 2
+// CHECK_BUFFERED-NEXT: store i32 [[TMP19]], ptr addrspace(1) [[PRINTF_ALLOC_FN]], align 4
+// CHECK_BUFFERED-NEXT: ...
[truncated]
|
ping |
The split up parts are still part of this one PR. Currently you're supposed to create a separate PR for each separate change. The set behavior is to squash all of these together on submit |
@arsenm , apologies for the trouble here. I should have based this out of my earlier commit. currently I do not see a way to base this patch off of my earlier commit and it might get too confusing for other reviewers if I close this and raise another review. would the individual commit details suffice ? |
Some comments still need to be addressed. |
return Builder.CreateBitCast(Arg, Int64Ty); | ||
} | ||
|
||
if (isa<PointerType>(Ty)) { | ||
if (!IsBuffered && isa<PointerType>(Ty)) { | ||
return Builder.CreatePtrToInt(Arg, Int64Ty); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Information like this should be written in comments in the source code itself.
@@ -1,12 +1,68 @@ | |||
// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py | |||
// RUN: %clang_cc1 -cl-std=CL1.2 -triple amdgcn-amd-amdhsa -disable-llvm-passes -emit-llvm -o - %s | FileCheck %s | |||
// RUN: %clang_cc1 -cl-std=CL1.2 -triple amdgcn-amd-amdhsa -mprintf-kind=buffered -disable-llvm-passes -emit-llvm -o - %s | FileCheck --check-prefix=CHECK_BUFFERED %s | |||
// RUN: %clang_cc1 -cl-std=CL1.2 -triple amdgcn-amd-amdhsa -mprintf-kind=hostcall -disable-llvm-passes -emit-llvm -o - %s | FileCheck --check-prefix=CHECK_HOSTCALL %s | |||
|
|||
int printf(__constant const char* st, ...) __attribute__((format(printf, 1, 2))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only see tests with the "v" correctly used as a vector specifier. What about tests like "%dv", and other cases where either the "v" is wrong, or it's just part of the text being printed? Given that the first attempt at detecting "v" had errors in it, I think it will be good to cover all corner cases where a "v" is actually a vector specifier and and where it is not.
@@ -406,5 +410,9 @@ TARGET_BUILTIN(__builtin_amdgcn_cvt_pk_fp8_f32, "iffiIb", "nc", "fp8-insts") | |||
TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_bf8_f32, "ifiiIi", "nc", "fp8-insts") | |||
TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_fp8_f32, "ifiiIi", "nc", "fp8-insts") | |||
|
|||
// OpenCL | |||
LANGBUILTIN(printf, "icC*4.", "fp:0:", ALL_OCL_LANGUAGES) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although we talked about this offline, the explanation needs to be added here. In fact, the motivation for having this builtin should be added as a comment to the source itself for future reference.
@@ -198,6 +229,10 @@ static void locateCStrings(SparseBitVector<8> &BV, StringRef Str) { | |||
if (SpecEnd == StringRef::npos) | |||
return; | |||
auto Spec = Str.slice(SpecPos, SpecEnd + 1); | |||
|
|||
if ((Spec.find_first_of("v")) != StringRef::npos) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just ".find()" should be sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a separate PR open for "Add vector processing support to AMDGPU printf"? I think it's easiest to move this part forward first
cb564c2
to
aaff6dd
Compare
@arsenm , you are right. I just want to make sure we are good on runtime changes too now since there seems to be a blocker. The changes here are not necessary unless we are okay with runtime changes. |
// Scan the format string to locate all specifiers, and mark the ones that | ||
// specify a string, i.e, the "%s" specifier with optional '*' characters. | ||
static void locateCStrings(SparseBitVector<8> &BV, StringRef Str) { | ||
// Scan the format string to locate all specifiers and OCL vectors, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"all specifiers" is enough ... there is no need to say "OCL vectors". The rest of the sentence is the one which correctly says "string or vector".
// If the format specifies a string but the argument is not, the frontend | ||
// will have printed a warning. We just rely on undefined behaviour and send | ||
// the argument anyway. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bump!
return Builder.CreatePtrToInt(Arg, Int64Ty); | ||
} | ||
|
||
llvm_unreachable("unexpected type"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This llvm_unreachable is preferred. It's clear documentation that all supported types have been handled by this point. Each if-block for integer, floating and pointer types should have its own default "return Arg".
clang/lib/CodeGen/CGGPUBuiltin.cpp
Outdated
if (FmtStr.empty()) | ||
FmtStr = StringRef("", 1); | ||
} else { | ||
if (CGM.getLangOpts().OpenCL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like the wrong place for a diagnostic. For an OpenCL program, shouldn't Sema have already verified that the arguments match the required types, such as "constant address space" for the format string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The diagnostic should be replaced with an assert() or an llvm_unreachable(). The OpenCL spec says that the format string should be resolvable at compile time, but this is not the right place to check that. By now, the frontend or sema should have rejected the program as ill-formed.
"-mprintf-kind=" + | ||
Args.getLastArgValue(options::OPT_mprintf_kind_EQ))); | ||
// Force compiler error on invalid conversion specifiers | ||
CmdArgs.push_back(Args.MakeArgString("-Werror=format-invalid-specifier")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bump! Remove this command-line flag.
aaff6dd
to
de80946
Compare
The new set of changes adds following changes,
|
clang/lib/CodeGen/CGGPUBuiltin.cpp
Outdated
@@ -178,17 +181,29 @@ RValue CodeGenFunction::EmitNVPTXDevicePrintfCallExpr(const CallExpr *E) { | |||
E, this, GetVprintfDeclaration(CGM.getModule()), false); | |||
} | |||
|
|||
// Deterimines if an argument is a string | |||
static bool isString(const clang::Type *argXTy) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be called "MayBeString()" at best. It's not necessary that a char* argument type is a C-style string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed this, addrspace cast is done during arg processing.
clang/lib/CodeGen/CGGPUBuiltin.cpp
Outdated
if (FmtStr.empty()) | ||
FmtStr = StringRef("", 1); | ||
} else { | ||
if (CGM.getLangOpts().OpenCL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The diagnostic should be replaced with an assert() or an llvm_unreachable(). The OpenCL spec says that the format string should be resolvable at compile time, but this is not the right place to check that. By now, the frontend or sema should have rejected the program as ill-formed.
@@ -406,5 +410,9 @@ TARGET_BUILTIN(__builtin_amdgcn_cvt_pk_fp8_f32, "iffiIb", "nc", "fp8-insts") | |||
TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_bf8_f32, "ifiiIi", "nc", "fp8-insts") | |||
TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_fp8_f32, "ifiiIi", "nc", "fp8-insts") | |||
|
|||
// OpenCL | |||
LANGBUILTIN(printf, "icC*4.", "fp:0:", ALL_OCL_LANGUAGES) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not belong here. This has nothing to do with AMDGPU
// AMDGCN implementation supports printf as a builtin | ||
// for OpenCL | ||
if (Context.getTargetInfo().getTriple().isAMDGCN() && | ||
Context.getLangOpts().OpenCL && BuiltinID == AMDGPU::BIprintf) | ||
return BuiltinID; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not belong here and has nothing to do with AMDGPU
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The signatures of C-printf and OCL printf differ and I dont think generic builtin handling provides a way to register overloaded builtins with "shared" builtin ID's. This needs a new ID. do you have any alternate suggestions here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this had been clarified earlier too. It's quite imprecise to just say that "signatures differ". Perhaps the following detailed explanation might move the conversatino forward. The problem is that the OpenCL printf expects a format string in the constant address space, which has no representation in Clang builtin. What we do have is the ability to specify an address-space number in the builtin declaration. But this number is target-specific, which makes the whole builtin target-specific. Is there a way around that magic number 4?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only other alternative I see currently is to modify Sema (probably ActOnFunctionDeclarator) so that we map the ocl printf declaration to C printf builtin ID. This would be a really hacky solution and I would prefer this implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping @arsenm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The builtin specifications are also in terms of the lang address space, not the target address space (this was an ugly compromise to make builtins work at all in OpenCL)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arsenm, thanks for the info. CustomTypeChecking is a valid option. I'm not sure why OpenCL community did not consider this change despite OpenCL specs specifying the details. I could create a separate patch for this (probably folks from OCL community would provide further background). Meanwhile, can this go ahead as an AMDGPU specific workaround for now so that we have the intended feature in place ? (The frontend changes here can be reverted with that follow up patch )
PS :Ah, I see another issue . OpenCL v1.2 s6.9.f states none of the functions defined in C99 headers are available. This would mean std printf is supposed to be treated differently than OpenCL builtins and consequently the builtin IDs assigned to them "need" to be different. If this understanding is correct, moving ahead with using same builtin ID as std printf is not the right way. (probably this is why such an implementation was never considered)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
. Meanwhile, can this go ahead as an AMDGPU specific workaround for now so that we have the intended feature in place
No. That cleanup will never happen.
PS :Ah, I see another issue . OpenCL v1.2 s6.9.f states none of the functions defined in C99 headers are available. This would mean std printf is supposed to be treated differently than OpenCL builtins and consequently the builtin IDs assigned to them "need" to be different
That's not what that means
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was referring to
llvm-project/clang/lib/AST/Decl.cpp
Line 3633 in 3e6db60
return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that check is buggy. The printf declaration doesn't come from a header, but the printf function does exist in 1.2+. It probably needs to special case printf
// Mutate the printf builtin ID so that we use the same CodeGen path for | ||
// HIP and OpenCL with AMDGPU targets. | ||
if (getTarget().getTriple().isAMDGCN() && BuiltinID == AMDGPU::BIprintf) | ||
BuiltinID = Builtin::BIprintf; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should not need to remap builtins
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be removed if you feel so, probably we would need a new case in Expr CodeGen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all you need is to add the CustomTypeChecking attribute to the printf definition, and then add language specific type checking on the string argument
StringRef FmtStr; | ||
if (llvm::getConstantStringInfo(Args[0], FmtStr)) { | ||
if (FmtStr.empty()) | ||
FmtStr = StringRef("", 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is producing an invalid StringRef?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not really. This is just to say the format string is not really empty (i.e size = 0) when the user input is an empty format string (a weird corner case)
Kindly review top commit here, The builtin specific changes are up for review in a seperate patch (#72554)
Few implementation details,