[AMDGPU] Enable OpenCL hostcall printf (WIP) #72556

vikramRH · 2023-11-16T18:55:59Z

Kindly review top commit here, The builtin specific changes are up for review in a seperate patch (#72554)

Few implementation details,

Hostcall printf is now default for both HIP and OpenCL.
The implementation adds vector processing support both for hostcall and buffered cases. The vector elements are extracted and pushed onto the buffer individually (each alingned to 8 byte boundary)
for OpenCL hostcall case, The format string pointer is addrspace casted to generic address space to be compatible with hostcall device lib functions.

github-actions · 2023-11-16T18:58:26Z

✅ With the latest revision this PR passed the C/C++ code formatter.

arsenm · 2023-11-17T02:29:34Z

llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

@@ -170,20 +173,46 @@ static Value *appendString(IRBuilder<> &Builder, Value *Desc, Value *Arg,
  return callAppendStringN(Builder, Desc, Arg, Length, IsLast);
 }

+static Value *appendVectorArg(IRBuilder<> &Builder, Value *Desc, Value *Arg,


All of the codegen changes here should be a separate commit

These are all still in this PR?

arsenm · 2023-11-17T02:34:52Z

llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

+        auto VecArg = dyn_cast<FixedVectorType>(Args[i]->getType());
+        assert(VecArg && "invalid vector specifier");


cast<> instead of dyn_cast + assert

arsenm · 2023-11-17T02:35:33Z

llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

+        AllocSize = VecArg->getNumElements() * 8;
+      } else
+        AllocSize = M->getDataLayout().getTypeAllocSize(Args[i]->getType());


Don't understand this split vector handling. Just always use getTypeAllocSize?

I extract individual vector elements, expand them to 8 bytes and store them onto the buffer. The "getTypeAllocSize" would not give me the actual occupied size in the buffer in this case.

arsenm · 2023-11-17T02:35:46Z

llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

-  // have printed a warning. We just rely on undefined behaviour and send the
-  // argument anyway.
-  return appendArg(Builder, Desc, Arg, IsLast);
+  } else if (IsVector) {


No else after return

arsenm · 2023-11-17T02:35:51Z

llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

+                            Zero, Zero, Zero, Zero, Zero, false);
+    }
+
+    auto Val =


ssahasra · 2023-11-20T06:27:18Z

clang/include/clang/Basic/BuiltinsAMDGPU.def

@@ -406,5 +410,9 @@ TARGET_BUILTIN(__builtin_amdgcn_cvt_pk_fp8_f32, "iffiIb", "nc", "fp8-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_bf8_f32, "ifiiIi", "nc", "fp8-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_fp8_f32, "ifiiIi", "nc", "fp8-insts")

+// OpenCL
+LANGBUILTIN(printf, "icC*4.", "fp:0:", ALL_OCL_LANGUAGES)


I don't understand why this is necessary for the current task. What does it fix in the parsing OpenCL builtins?

Although we talked about this offline, the explanation needs to be added here. In fact, the motivation for having this builtin should be added as a comment to the source itself for future reference.

This does not belong here. This has nothing to do with AMDGPU

ssahasra · 2023-11-20T06:30:13Z

llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

-  if (Ty->getTypeID() == Type::DoubleTyID) {
+  if (Ty->isFloatingPointTy()) {
+    if (DL.getTypeAllocSize(Ty) < 8)
+      Arg = Builder.CreateFPExt(Arg, Builder.getDoubleTy());


This typecast should not be necessary. Default argument promotions in C++ for variadic functions ensure that all floating point arguments are promoted to double. If that is not happening, can you demonstrate with a test?

The type cast is necessary for types such as _Float16, which is not handled at argument promotion. I have added a test case to show the same

ssahasra · 2023-11-20T06:32:34Z

llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

    return Builder.CreateBitCast(Arg, Int64Ty);
  }

-  if (isa<PointerType>(Ty)) {
+  if (!IsBuffered && isa<PointerType>(Ty)) {
    return Builder.CreatePtrToInt(Arg, Int64Ty);


How are pointers handled in the buffered case?

The pointer is just pushed onto the buffer. The cast is necessary for the hostcall case to be compatible with device lib functions

Information like this should be written in comments in the source code itself.

ssahasra · 2023-11-20T06:35:21Z

llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

+  assert(Arg->getType()->isVectorTy() && "incorrent append* function");
+  auto VectorTy = dyn_cast<FixedVectorType>(Arg->getType());
+  auto Zero = Builder.getInt64(0);
+  if (VectorTy) {


So the argument is ignored if it is not a FixedVectorType?

I have changed this code a little now so that only FixedVectorTypes are handled. This should be okay since the OCL specs specifically say only vectors of length 2,3,4,8 and 16 are supported for printf.

ssahasra · 2023-11-20T06:36:55Z

llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

+  // If the format specifies a string but the argument is not, the frontend
+  // will have printed a warning. We just rely on undefined behaviour and send
+  // the argument anyway.


This is a pure whitespace change. Keeping the original formatting of the comment helps simplify the diff.

Bump. Please restore the original comment if there is no change in the actual words.

ssahasra · 2023-11-20T06:43:35Z

llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

@@ -194,6 +226,8 @@ static void locateCStrings(SparseBitVector<8> &BV, StringRef Str) {
      SpecPos += 2;
      continue;
    }
+    if (Str.find_first_of("v", SpecPos) != StringRef::npos)


I don't think this will work as expected. It can clearly match a "v" that occurs after the data type. For example, it is supposed to match "%v2d", but it will also match "%d v". The match should be performed inside the "Spec" substring created below.

ssahasra · 2023-11-20T06:47:01Z

clang/lib/Driver/ToolChains/Clang.cpp

+          "-mprintf-kind=" +
+          Args.getLastArgValue(options::OPT_mprintf_kind_EQ)));
+      // Force compiler error on invalid conversion specifiers
+      CmdArgs.push_back(Args.MakeArgString("-Werror=format-invalid-specifier"));


Why is this necessary here?

Bump. I am not convinced that we should force errors on invalid specifiers. What is the rationale for that?

This was just so that we could error out instead of undefined behaviors due to wrong specifiers. I have no preference here and would be okay to change it if you feel so.

Actually this is important. Converting a warning into an error can cause surprises when users build existing applications. It is upto the user to decide if they want this to be an error, which is why it is a command-line option. Even if the spec says undefined behaviour, we should just do something reasonable and accept whatever the user wrote without forcing an error. There is a lot of freedom in deciding what is "something reasonable" ... we could choose to print nothing, or a default value, or a placeholder instead of the actual printf format string, etc.

Bump! Remove this command-line flag.

ssahasra · 2023-11-20T06:48:17Z

llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

 }

 // Scan the format string to locate all specifiers, and mark the ones that
 // specify a string, i.e, the "%s" specifier with optional '*' characters.
-static void locateCStrings(SparseBitVector<8> &BV, StringRef Str) {
+static void locateCStringsAndVectors(SparseBitVector<8> &BV,


update the comment above the function

ssahasra · 2023-11-20T06:49:14Z

clang/test/CodeGenOpenCL/amdgpu-printf.cl

@@ -1,12 +1,68 @@
 // NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
-// RUN: %clang_cc1 -cl-std=CL1.2 -triple amdgcn-amd-amdhsa -disable-llvm-passes -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -cl-std=CL1.2 -triple amdgcn-amd-amdhsa -mprintf-kind=buffered -disable-llvm-passes -emit-llvm -o - %s | FileCheck --check-prefix=CHECK_BUFFERED %s
+// RUN: %clang_cc1 -cl-std=CL1.2 -triple amdgcn-amd-amdhsa -mprintf-kind=hostcall -disable-llvm-passes -emit-llvm -o - %s | FileCheck --check-prefix=CHECK_HOSTCALL %s

 int printf(__constant const char* st, ...) __attribute__((format(printf, 1, 2)));


There should be tests that use the "v" modifier inside a format specifier such as "%v2d" and also tests that use outside it, such as "%dv".

I only see tests with the "v" correctly used as a vector specifier. What about tests like "%dv", and other cases where either the "v" is wrong, or it's just part of the text being printed? Given that the first attempt at detecting "v" had errors in it, I think it will be good to cover all corner cases where a "v" is actually a vector specifier and and where it is not.

llvmbot · 2023-11-28T06:11:54Z

@llvm/pr-subscribers-clang-driver
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Vikram Hegde (vikramRH)

Changes

Kindly review top commit here, The builtin specific changes are up for review in a seperate patch (#72554)

Few implementation details,

Hostcall printf is now default for both HIP and OpenCL.
The implementation adds vector processing support both for hostcall and buffered cases. The vector elements are extracted and pushed onto the buffer individually (each alingned to 8 byte boundary)
for OpenCL hostcall case, The format string pointer is addrspace casted to generic address space to be compatible with hostcall device lib functions.

Patch is 89.24 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/72556.diff

9 Files Affected:

(modified) clang/include/clang/Basic/BuiltinsAMDGPU.def (+8)
(modified) clang/lib/AST/Decl.cpp (+7)
(modified) clang/lib/Basic/Targets/AMDGPU.cpp (+2)
(modified) clang/lib/CodeGen/CGBuiltin.cpp (+7-1)
(modified) clang/lib/CodeGen/CGGPUBuiltin.cpp (+37-6)
(modified) clang/lib/Driver/ToolChains/Clang.cpp (+10)
(modified) clang/test/CodeGenOpenCL/amdgpu-printf.cl (+756-1)
(modified) llvm/include/llvm/Transforms/Utils/AMDGPUEmitPrintf.h (+1-1)
(modified) llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp (+90-57)

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index a19c8bd5f219ec6..1799c72806bfdd4 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -21,6 +21,10 @@
 #if defined(BUILTIN) && !defined(TARGET_BUILTIN)
 #   define TARGET_BUILTIN(ID, TYPE, ATTRS, FEATURE) BUILTIN(ID, TYPE, ATTRS)
 #endif
+
+#if defined(BUILTIN) && !defined(LANGBUILTIN)
+#define LANGBUILTIN(ID, TYPE, ATTRS, BUILTIN_LANG) BUILTIN(ID, TYPE, ATTRS)
+#endif
 //===----------------------------------------------------------------------===//
 // SI+ only builtins.
 //===----------------------------------------------------------------------===//
@@ -406,5 +410,9 @@ TARGET_BUILTIN(__builtin_amdgcn_cvt_pk_fp8_f32, "iffiIb", "nc", "fp8-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_bf8_f32, "ifiiIi", "nc", "fp8-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_fp8_f32, "ifiiIi", "nc", "fp8-insts")
 
+// OpenCL
+LANGBUILTIN(printf, "icC*4.", "fp:0:", ALL_OCL_LANGUAGES)
+
 #undef BUILTIN
 #undef TARGET_BUILTIN
+#undef LANGBUILTIN
diff --git a/clang/lib/AST/Decl.cpp b/clang/lib/AST/Decl.cpp
index c5c2edf1bfe3aba..2597422bdd521a0 100644
--- a/clang/lib/AST/Decl.cpp
+++ b/clang/lib/AST/Decl.cpp
@@ -49,6 +49,7 @@
 #include "clang/Basic/SourceLocation.h"
 #include "clang/Basic/SourceManager.h"
 #include "clang/Basic/Specifiers.h"
+#include "clang/Basic/TargetBuiltins.h"
 #include "clang/Basic/TargetCXXABI.h"
 #include "clang/Basic/TargetInfo.h"
 #include "clang/Basic/Visibility.h"
@@ -3598,6 +3599,12 @@ unsigned FunctionDecl::getBuiltinID(bool ConsiderWrapperFunctions) const {
   if (!ConsiderWrapperFunctions && getStorageClass() == SC_Static)
     return 0;
 
+  // AMDGCN implementation supports printf as a builtin
+  // for OpenCL
+  if (Context.getTargetInfo().getTriple().isAMDGCN() &&
+      Context.getLangOpts().OpenCL && BuiltinID == AMDGPU::BIprintf)
+    return BuiltinID;
+
   // OpenCL v1.2 s6.9.f - The library functions defined in
   // the C99 standard headers are not available.
   if (Context.getLangOpts().OpenCL &&
diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp b/clang/lib/Basic/Targets/AMDGPU.cpp
index 409ae32ab424215..307cfa49f54e926 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -91,6 +91,8 @@ static constexpr Builtin::Info BuiltinInfo[] = {
   {#ID, TYPE, ATTRS, nullptr, HeaderDesc::NO_HEADER, ALL_LANGUAGES},
 #define TARGET_BUILTIN(ID, TYPE, ATTRS, FEATURE)                               \
   {#ID, TYPE, ATTRS, FEATURE, HeaderDesc::NO_HEADER, ALL_LANGUAGES},
+#define LANGBUILTIN(ID, TYPE, ATTRS, LANG)                                     \
+  {#ID, TYPE, ATTRS, nullptr, HeaderDesc::NO_HEADER, LANG},
 #include "clang/Basic/BuiltinsAMDGPU.def"
 };
 
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 09309a3937fb613..8d51df24c7872b7 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -2458,6 +2458,11 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
       &getTarget().getLongDoubleFormat() == &llvm::APFloat::IEEEquad())
     BuiltinID = mutateLongDoubleBuiltin(BuiltinID);
 
+   // Mutate the printf builtin ID so that we use the same CodeGen path for
+   // HIP and OpenCL with AMDGPU targets.
+   if (getTarget().getTriple().isAMDGCN() && BuiltinID == AMDGPU::BIprintf)
+     BuiltinID = Builtin::BIprintf;
+
   // If the builtin has been declared explicitly with an assembler label,
   // disable the specialized emitting below. Ideally we should communicate the
   // rename in IR, or at least avoid generating the intrinsic calls that are
@@ -5617,7 +5622,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
         return EmitOpenMPDevicePrintfCallExpr(E);
       if (getTarget().getTriple().isNVPTX())
         return EmitNVPTXDevicePrintfCallExpr(E);
-      if (getTarget().getTriple().isAMDGCN() && getLangOpts().HIP)
+      if (getTarget().getTriple().isAMDGCN() &&
+         (getLangOpts().HIP || getLangOpts().OpenCL))
         return EmitAMDGPUDevicePrintfCallExpr(E);
     }
 
diff --git a/clang/lib/CodeGen/CGGPUBuiltin.cpp b/clang/lib/CodeGen/CGGPUBuiltin.cpp
index de4ee68c0da1e79..4eb2cf826e700fb 100644
--- a/clang/lib/CodeGen/CGGPUBuiltin.cpp
+++ b/clang/lib/CodeGen/CGGPUBuiltin.cpp
@@ -13,7 +13,10 @@
 
 #include "CodeGenFunction.h"
 #include "clang/Basic/Builtins.h"
+#include "clang/Basic/TargetBuiltins.h"
+#include "llvm/Analysis/ValueTracking.h"
 #include "llvm/IR/DataLayout.h"
+#include "llvm/IR/DiagnosticInfo.h"
 #include "llvm/IR/Instruction.h"
 #include "llvm/Support/MathExtras.h"
 #include "llvm/Transforms/Utils/AMDGPUEmitPrintf.h"
@@ -177,10 +180,20 @@ RValue CodeGenFunction::EmitNVPTXDevicePrintfCallExpr(const CallExpr *E) {
       E, this, GetVprintfDeclaration(CGM.getModule()), false);
 }
 
+// Deterimines if an argument is a string
+static bool isString(const clang::Type *argXTy) {
+  if ((argXTy->isPointerType() || argXTy->isConstantArrayType()) &&
+      argXTy->getPointeeOrArrayElementType()->isCharType())
+    return true;
+  else
+    return false;
+}
+
 RValue CodeGenFunction::EmitAMDGPUDevicePrintfCallExpr(const CallExpr *E) {
   assert(getTarget().getTriple().getArch() == llvm::Triple::amdgcn);
   assert(E->getBuiltinCallee() == Builtin::BIprintf ||
-         E->getBuiltinCallee() == Builtin::BI__builtin_printf);
+         E->getBuiltinCallee() == Builtin::BI__builtin_printf ||
+         E->getBuiltinCallee() == AMDGPU::BIprintf);
   assert(E->getNumArgs() >= 1); // printf always has at least one arg.
 
   CallArgList CallArgs;
@@ -188,6 +201,8 @@ RValue CodeGenFunction::EmitAMDGPUDevicePrintfCallExpr(const CallExpr *E) {
                E->getDirectCallee()->getType()->getAs<FunctionProtoType>(),
                E->arguments(), E->getDirectCallee(),
                /* ParamsToSkip = */ 0);
+  llvm::IRBuilder<> IRB(Builder.GetInsertBlock(), Builder.GetInsertPoint());
+  IRB.SetCurrentDebugLocation(Builder.getCurrentDebugLocation());
 
   SmallVector<llvm::Value *, 8> Args;
   for (const auto &A : CallArgs) {
@@ -198,15 +213,31 @@ RValue CodeGenFunction::EmitAMDGPUDevicePrintfCallExpr(const CallExpr *E) {
     }
 
     llvm::Value *Arg = A.getRValue(*this).getScalarVal();
+    if (isString(A.getType().getTypePtr()) && CGM.getLangOpts().OpenCL)
+      Arg = Builder.CreateAddrSpaceCast(Arg, CGM.Int8PtrTy);
     Args.push_back(Arg);
   }
 
-  llvm::IRBuilder<> IRB(Builder.GetInsertBlock(), Builder.GetInsertPoint());
-  IRB.SetCurrentDebugLocation(Builder.getCurrentDebugLocation());
+  auto PFK = CGM.getTarget().getTargetOpts().AMDGPUPrintfKindVal;
+  bool isBuffered =
+       (PFK == clang::TargetOptions::AMDGPUPrintfKind::Buffered);
+
+  StringRef FmtStr;
+  if (llvm::getConstantStringInfo(Args[0], FmtStr)) {
+    if (FmtStr.empty())
+      FmtStr = StringRef("", 1);
+  } else {
+    if (CGM.getLangOpts().OpenCL) {
+      llvm::DiagnosticInfoUnsupported UnsupportedFormatStr(
+          *IRB.GetInsertBlock()->getParent(),
+          "printf format string must be a trivially resolved constant string "
+          "global variable",
+          IRB.getCurrentDebugLocation());
+      IRB.getContext().diagnose(UnsupportedFormatStr);
+    }
+  }
 
-  bool isBuffered = (CGM.getTarget().getTargetOpts().AMDGPUPrintfKindVal ==
-                     clang::TargetOptions::AMDGPUPrintfKind::Buffered);
-  auto Printf = llvm::emitAMDGPUPrintfCall(IRB, Args, isBuffered);
+  auto Printf = llvm::emitAMDGPUPrintfCall(IRB, Args, FmtStr, isBuffered);
   Builder.SetInsertPoint(IRB.GetInsertBlock(), IRB.GetInsertPoint());
   return RValue::get(Printf);
 }
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index b462f5a44057d94..b63c777fd1f158c 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -4742,6 +4742,16 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
     Args.ClaimAllArgs(options::OPT_gen_cdb_fragment_path);
   }
 
+  if (TC.getTriple().isAMDGPU() && types::isOpenCL(Input.getType())) {
+    if (Args.getLastArg(options::OPT_mprintf_kind_EQ)) {
+      CmdArgs.push_back(Args.MakeArgString(
+          "-mprintf-kind=" +
+          Args.getLastArgValue(options::OPT_mprintf_kind_EQ)));
+      // Force compiler error on invalid conversion specifiers
+      CmdArgs.push_back(Args.MakeArgString("-Werror=format-invalid-specifier"));
+    }
+  }
+
   if (IsCuda || IsHIP) {
     // We have to pass the triple of the host if compiling for a CUDA/HIP device
     // and vice-versa.
diff --git a/clang/test/CodeGenOpenCL/amdgpu-printf.cl b/clang/test/CodeGenOpenCL/amdgpu-printf.cl
index edf6dbf8657cbe5..9b411f23ceba56a 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-printf.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-printf.cl
@@ -1,5 +1,8 @@
 // NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
-// RUN: %clang_cc1 -cl-std=CL1.2 -triple amdgcn-amd-amdhsa -disable-llvm-passes -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -cl-std=CL1.2 -triple amdgcn-amd-amdhsa -mprintf-kind=buffered -disable-llvm-passes -emit-llvm -o - %s | FileCheck --check-prefix=CHECK_BUFFERED %s
+// RUN: %clang_cc1 -cl-std=CL1.2 -triple amdgcn-amd-amdhsa -mprintf-kind=hostcall -disable-llvm-passes -emit-llvm -o - %s | FileCheck --check-prefix=CHECK_HOSTCALL %s
+
+#pragma OPENCL EXTENSION cl_khr_fp16 : enable
 
 int printf(__constant const char* st, ...) __attribute__((format(printf, 1, 2)));
 
@@ -7,6 +10,42 @@ int printf(__constant const char* st, ...) __attribute__((format(printf, 1, 2)))
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:    [[CALL:%.*]] = call i32 (ptr addrspace(4), ...) @printf(ptr addrspace(4) noundef @.str) #[[ATTR4:[0-9]+]]
 // CHECK-NEXT:    ret void
+// CHECK_BUFFERED-LABEL: @test_printf_noargs(
+// CHECK_BUFFERED-NEXT:  entry:
+// CHECK_BUFFERED-NEXT:    [[PRINTF_ALLOC_FN:%.*]] = call ptr addrspace(1) @__printf_alloc(i32 12)
+// CHECK_BUFFERED-NEXT:    [[TMP0:%.*]] = icmp ne ptr addrspace(1) [[PRINTF_ALLOC_FN]], null
+// CHECK_BUFFERED-NEXT:    br i1 [[TMP0]], label [[ARGPUSH_BLOCK:%.*]], label [[END_BLOCK:%.*]]
+// CHECK_BUFFERED:       end.block:
+// CHECK_BUFFERED-NEXT:    [[TMP1:%.*]] = xor i1 [[TMP0]], true
+// CHECK_BUFFERED-NEXT:    [[PRINTF_RESULT:%.*]] = sext i1 [[TMP1]] to i32
+// CHECK_BUFFERED-NEXT:    ret void
+// CHECK_BUFFERED:       argpush.block:
+// CHECK_BUFFERED-NEXT:    store i32 50, ptr addrspace(1) [[PRINTF_ALLOC_FN]], align 4
+// CHECK_BUFFERED-NEXT:    [[TMP2:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[PRINTF_ALLOC_FN]], i32 4
+// CHECK_BUFFERED-NEXT:    store i64 -8529802306755643245, ptr addrspace(1) [[TMP2]], align 8
+// CHECK_BUFFERED-NEXT:    [[TMP3:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[TMP2]], i32 8
+// CHECK_BUFFERED-NEXT:    br label [[END_BLOCK]]
+//
+// CHECK_HOSTCALL-LABEL: @test_printf_noargs(
+// CHECK_HOSTCALL-NEXT:  entry:
+// CHECK_HOSTCALL-NEXT:    [[TMP0:%.*]] = call i64 @__ockl_printf_begin(i64 0)
+// CHECK_HOSTCALL-NEXT:    br i1 icmp eq (ptr addrspacecast (ptr addrspace(4) @.str to ptr), ptr null), label [[STRLEN_JOIN:%.*]], label [[STRLEN_WHILE:%.*]]
+// CHECK_HOSTCALL:       strlen.while:
+// CHECK_HOSTCALL-NEXT:    [[TMP1:%.*]] = phi ptr [ addrspacecast (ptr addrspace(4) @.str to ptr), [[ENTRY:%.*]] ], [ [[TMP2:%.*]], [[STRLEN_WHILE]] ]
+// CHECK_HOSTCALL-NEXT:    [[TMP2]] = getelementptr i8, ptr [[TMP1]], i64 1
+// CHECK_HOSTCALL-NEXT:    [[TMP3:%.*]] = load i8, ptr [[TMP1]], align 1
+// CHECK_HOSTCALL-NEXT:    [[TMP4:%.*]] = icmp eq i8 [[TMP3]], 0
+// CHECK_HOSTCALL-NEXT:    br i1 [[TMP4]], label [[STRLEN_WHILE_DONE:%.*]], label [[STRLEN_WHILE]]
+// CHECK_HOSTCALL:       strlen.while.done:
+// CHECK_HOSTCALL-NEXT:    [[TMP5:%.*]] = ptrtoint ptr [[TMP1]] to i64
+// CHECK_HOSTCALL-NEXT:    [[TMP6:%.*]] = sub i64 [[TMP5]], ptrtoint (ptr addrspacecast (ptr addrspace(4) @.str to ptr) to i64)
+// CHECK_HOSTCALL-NEXT:    [[TMP7:%.*]] = add i64 [[TMP6]], 1
+// CHECK_HOSTCALL-NEXT:    br label [[STRLEN_JOIN]]
+// CHECK_HOSTCALL:       strlen.join:
+// CHECK_HOSTCALL-NEXT:    [[TMP8:%.*]] = phi i64 [ [[TMP7]], [[STRLEN_WHILE_DONE]] ], [ 0, [[ENTRY]] ]
+// CHECK_HOSTCALL-NEXT:    [[TMP9:%.*]] = call i64 @__ockl_printf_append_string_n(i64 [[TMP0]], ptr addrspacecast (ptr addrspace(4) @.str to ptr), i64 [[TMP8]], i32 1)
+// CHECK_HOSTCALL-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
+// CHECK_HOSTCALL-NEXT:    ret void
 //
 __kernel void test_printf_noargs() {
     printf("");
@@ -19,6 +58,53 @@ __kernel void test_printf_noargs() {
 // CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr addrspace(5) [[I_ADDR]], align 4, !tbaa [[TBAA8]]
 // CHECK-NEXT:    [[CALL:%.*]] = call i32 (ptr addrspace(4), ...) @printf(ptr addrspace(4) noundef @.str.1, i32 noundef [[TMP0]]) #[[ATTR4]]
 // CHECK-NEXT:    ret void
+// CHECK_BUFFERED-LABEL: @test_printf_int(
+// CHECK_BUFFERED-NEXT:  entry:
+// CHECK_BUFFERED-NEXT:    [[I_ADDR:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK_BUFFERED-NEXT:    store i32 [[I:%.*]], ptr addrspace(5) [[I_ADDR]], align 4, !tbaa [[TBAA16:![0-9]+]]
+// CHECK_BUFFERED-NEXT:    [[TMP0:%.*]] = load i32, ptr addrspace(5) [[I_ADDR]], align 4, !tbaa [[TBAA16]]
+// CHECK_BUFFERED-NEXT:    [[PRINTF_ALLOC_FN:%.*]] = call ptr addrspace(1) @__printf_alloc(i32 20)
+// CHECK_BUFFERED-NEXT:    [[TMP1:%.*]] = icmp ne ptr addrspace(1) [[PRINTF_ALLOC_FN]], null
+// CHECK_BUFFERED-NEXT:    br i1 [[TMP1]], label [[ARGPUSH_BLOCK:%.*]], label [[END_BLOCK:%.*]]
+// CHECK_BUFFERED:       end.block:
+// CHECK_BUFFERED-NEXT:    [[TMP2:%.*]] = xor i1 [[TMP1]], true
+// CHECK_BUFFERED-NEXT:    [[PRINTF_RESULT:%.*]] = sext i1 [[TMP2]] to i32
+// CHECK_BUFFERED-NEXT:    ret void
+// CHECK_BUFFERED:       argpush.block:
+// CHECK_BUFFERED-NEXT:    store i32 82, ptr addrspace(1) [[PRINTF_ALLOC_FN]], align 4
+// CHECK_BUFFERED-NEXT:    [[TMP3:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[PRINTF_ALLOC_FN]], i32 4
+// CHECK_BUFFERED-NEXT:    store i64 -2582314622382785113, ptr addrspace(1) [[TMP3]], align 8
+// CHECK_BUFFERED-NEXT:    [[TMP4:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[TMP3]], i32 8
+// CHECK_BUFFERED-NEXT:    [[TMP5:%.*]] = zext i32 [[TMP0]] to i64
+// CHECK_BUFFERED-NEXT:    store i64 [[TMP5]], ptr addrspace(1) [[TMP4]], align 8
+// CHECK_BUFFERED-NEXT:    [[PRINTBUFFNEXTPTR:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[TMP4]], i32 8
+// CHECK_BUFFERED-NEXT:    br label [[END_BLOCK]]
+//
+// CHECK_HOSTCALL-LABEL: @test_printf_int(
+// CHECK_HOSTCALL-NEXT:  entry:
+// CHECK_HOSTCALL-NEXT:    [[I_ADDR:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK_HOSTCALL-NEXT:    store i32 [[I:%.*]], ptr addrspace(5) [[I_ADDR]], align 4, !tbaa [[TBAA9:![0-9]+]]
+// CHECK_HOSTCALL-NEXT:    [[TMP0:%.*]] = load i32, ptr addrspace(5) [[I_ADDR]], align 4, !tbaa [[TBAA9]]
+// CHECK_HOSTCALL-NEXT:    [[TMP1:%.*]] = call i64 @__ockl_printf_begin(i64 0)
+// CHECK_HOSTCALL-NEXT:    br i1 icmp eq (ptr addrspacecast (ptr addrspace(4) @.str.1 to ptr), ptr null), label [[STRLEN_JOIN:%.*]], label [[STRLEN_WHILE:%.*]]
+// CHECK_HOSTCALL:       strlen.while:
+// CHECK_HOSTCALL-NEXT:    [[TMP2:%.*]] = phi ptr [ addrspacecast (ptr addrspace(4) @.str.1 to ptr), [[ENTRY:%.*]] ], [ [[TMP3:%.*]], [[STRLEN_WHILE]] ]
+// CHECK_HOSTCALL-NEXT:    [[TMP3]] = getelementptr i8, ptr [[TMP2]], i64 1
+// CHECK_HOSTCALL-NEXT:    [[TMP4:%.*]] = load i8, ptr [[TMP2]], align 1
+// CHECK_HOSTCALL-NEXT:    [[TMP5:%.*]] = icmp eq i8 [[TMP4]], 0
+// CHECK_HOSTCALL-NEXT:    br i1 [[TMP5]], label [[STRLEN_WHILE_DONE:%.*]], label [[STRLEN_WHILE]]
+// CHECK_HOSTCALL:       strlen.while.done:
+// CHECK_HOSTCALL-NEXT:    [[TMP6:%.*]] = ptrtoint ptr [[TMP2]] to i64
+// CHECK_HOSTCALL-NEXT:    [[TMP7:%.*]] = sub i64 [[TMP6]], ptrtoint (ptr addrspacecast (ptr addrspace(4) @.str.1 to ptr) to i64)
+// CHECK_HOSTCALL-NEXT:    [[TMP8:%.*]] = add i64 [[TMP7]], 1
+// CHECK_HOSTCALL-NEXT:    br label [[STRLEN_JOIN]]
+// CHECK_HOSTCALL:       strlen.join:
+// CHECK_HOSTCALL-NEXT:    [[TMP9:%.*]] = phi i64 [ [[TMP8]], [[STRLEN_WHILE_DONE]] ], [ 0, [[ENTRY]] ]
+// CHECK_HOSTCALL-NEXT:    [[TMP10:%.*]] = call i64 @__ockl_printf_append_string_n(i64 [[TMP1]], ptr addrspacecast (ptr addrspace(4) @.str.1 to ptr), i64 [[TMP9]], i32 0)
+// CHECK_HOSTCALL-NEXT:    [[TMP11:%.*]] = zext i32 [[TMP0]] to i64
+// CHECK_HOSTCALL-NEXT:    [[TMP12:%.*]] = call i64 @__ockl_printf_append_args(i64 [[TMP10]], i32 1, i64 [[TMP11]], i64 0, i64 0, i64 0, i64 0, i64 0, i64 0, i32 1)
+// CHECK_HOSTCALL-NEXT:    [[TMP13:%.*]] = trunc i64 [[TMP12]] to i32
+// CHECK_HOSTCALL-NEXT:    ret void
 //
 __kernel void test_printf_int(int i) {
     printf("%d", i);
@@ -36,8 +122,677 @@ __kernel void test_printf_int(int i) {
 // CHECK-NEXT:    [[CALL:%.*]] = call i32 (ptr addrspace(4), ...) @printf(ptr addrspace(4) noundef @.str.2, ptr addrspace(5) noundef [[ARRAYDECAY]], i32 noundef [[TMP2]]) #[[ATTR4]]
 // CHECK-NEXT:    call void @llvm.lifetime.end.p5(i64 4, ptr addrspace(5) [[S]]) #[[ATTR5]]
 // CHECK-NEXT:    ret void
+// CHECK_BUFFERED-LABEL: @test_printf_str_int(
+// CHECK_BUFFERED-NEXT:  entry:
+// CHECK_BUFFERED-NEXT:    [[I_ADDR:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK_BUFFERED-NEXT:    [[S:%.*]] = alloca [4 x i8], align 1, addrspace(5)
+// CHECK_BUFFERED-NEXT:    store i32 [[I:%.*]], ptr addrspace(5) [[I_ADDR]], align 4, !tbaa [[TBAA16]]
+// CHECK_BUFFERED-NEXT:    call void @llvm.lifetime.start.p5(i64 4, ptr addrspace(5) [[S]]) #[[ATTR1:[0-9]+]]
+// CHECK_BUFFERED-NEXT:    call void @llvm.memcpy.p5.p4.i64(ptr addrspace(5) align 1 [[S]], ptr addrspace(4) align 1 @__const.test_printf_str_int.s, i64 4, i1 false)
+// CHECK_BUFFERED-NEXT:    [[ARRAYDECAY:%.*]] = getelementptr inbounds [4 x i8], ptr addrspace(5) [[S]], i64 0, i64 0
+// CHECK_BUFFERED-NEXT:    [[TMP0:%.*]] = load i32, ptr addrspace(5) [[I_ADDR]], align 4, !tbaa [[TBAA16]]
+// CHECK_BUFFERED-NEXT:    [[TMP1:%.*]] = addrspacecast ptr addrspace(5) [[ARRAYDECAY]] to ptr
+// CHECK_BUFFERED-NEXT:    [[TMP2:%.*]] = icmp eq ptr [[TMP1]], null
+// CHECK_BUFFERED-NEXT:    br i1 [[TMP2]], label [[STRLEN_JOIN:%.*]], label [[STRLEN_WHILE:%.*]]
+// CHECK_BUFFERED:       strlen.while:
+// CHECK_BUFFERED-NEXT:    [[TMP3:%.*]] = phi ptr [ [[TMP1]], [[ENTRY:%.*]] ], [ [[TMP4:%.*]], [[STRLEN_WHILE]] ]
+// CHECK_BUFFERED-NEXT:    [[TMP4]] = getelementptr i8, ptr [[TMP3]], i64 1
+// CHECK_BUFFERED-NEXT:    [[TMP5:%.*]] = load i8, ptr [[TMP3]], align 1
+// CHECK_BUFFERED-NEXT:    [[TMP6:%.*]] = icmp eq i8 [[TMP5]], 0
+// CHECK_BUFFERED-NEXT:    br i1 [[TMP6]], label [[STRLEN_WHILE_DONE:%.*]], label [[STRLEN_WHILE]]
+// CHECK_BUFFERED:       strlen.while.done:
+// CHECK_BUFFERED-NEXT:    [[TMP7:%.*]] = ptrtoint ptr [[TMP1]] to i64
+// CHECK_BUFFERED-NEXT:    [[TMP8:%.*]] = ptrtoint ptr [[TMP3]] to i64
+// CHECK_BUFFERED-NEXT:    [[TMP9:%.*]] = sub i64 [[TMP8]], [[TMP7]]
+// CHECK_BUFFERED-NEXT:    [[TMP10:%.*]] = add i64 [[TMP9]], 1
+// CHECK_BUFFERED-NEXT:    br label [[STRLEN_JOIN]]
+// CHECK_BUFFERED:       strlen.join:
+// CHECK_BUFFERED-NEXT:    [[TMP11:%.*]] = phi i64 [ [[TMP10]], [[STRLEN_WHILE_DONE]] ], [ 0, [[ENTRY]] ]
+// CHECK_BUFFERED-NEXT:    [[TMP12:%.*]] = add i64 [[TMP11]], 7
+// CHECK_BUFFERED-NEXT:    [[TMP13:%.*]] = and i64 [[TMP12]], 4294967288
+// CHECK_BUFFERED-NEXT:    [[TMP14:%.*]] = add i64 [[TMP13]], 20
+// CHECK_BUFFERED-NEXT:    [[TMP15:%.*]] = trunc i64 [[TMP14]] to i32
+// CHECK_BUFFERED-NEXT:    [[PRINTF_ALLOC_FN:%.*]] = call ptr addrspace(1) @__printf_alloc(i32 [[TMP15]])
+// CHECK_BUFFERED-NEXT:    [[TMP16:%.*]] = icmp ne ptr addrspace(1) [[PRINTF_ALLOC_FN]], null
+// CHECK_BUFFERED-NEXT:    br i1 [[TMP16]], label [[ARGPUSH_BLOCK:%.*]], label [[END_BLOCK:%.*]]
+// CHECK_BUFFERED:       end.block:
+// CHECK_BUFFERED-NEXT:    [[TMP17:%.*]] = xor i1 [[TMP16]], true
+// CHECK_BUFFERED-NEXT:    [[PRINTF_RESULT:%.*]] = sext i1 [[TMP17]] to i32
+// CHECK_BUFFERED-NEXT:    call void @llvm.lifetime.end.p5(i64 4, ptr addrspace(5) [[S]]) #[[ATTR1]]
+// CHECK_BUFFERED-NEXT:    ret void
+// CHECK_BUFFERED:       argpush.block:
+// CHECK_BUFFERED-NEXT:    [[TMP18:%.*]] = shl i32 [[TMP15]], 2
+// CHECK_BUFFERED-NEXT:    [[TMP19:%.*]] = or i32 [[TMP18]], 2
+// CHECK_BUFFERED-NEXT:    store i32 [[TMP19]], ptr addrspace(1) [[PRINTF_ALLOC_FN]], align 4
+// CHECK_BUFFERED-NEXT:   ...
[truncated]

vikramRH · 2023-11-28T11:05:10Z

ping

arsenm · 2023-11-28T11:17:03Z

ping

The split up parts are still part of this one PR. Currently you're supposed to create a separate PR for each separate change. The set behavior is to squash all of these together on submit

vikramRH · 2023-11-28T15:31:42Z

@arsenm , apologies for the trouble here. I should have based this out of my earlier commit. currently I do not see a way to base this patch off of my earlier commit and it might get too confusing for other reviewers if I close this and raise another review. would the individual commit details suffice ?

ssahasra · 2023-11-29T10:04:35Z

ping

Some comments still need to be addressed.

ssahasra · 2023-11-29T09:40:30Z

llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

    return Builder.CreateBitCast(Arg, Int64Ty);
  }

-  if (isa<PointerType>(Ty)) {
+  if (!IsBuffered && isa<PointerType>(Ty)) {
    return Builder.CreatePtrToInt(Arg, Int64Ty);


Information like this should be written in comments in the source code itself.

ssahasra · 2023-11-29T09:47:31Z

clang/test/CodeGenOpenCL/amdgpu-printf.cl

@@ -1,12 +1,68 @@
 // NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
-// RUN: %clang_cc1 -cl-std=CL1.2 -triple amdgcn-amd-amdhsa -disable-llvm-passes -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -cl-std=CL1.2 -triple amdgcn-amd-amdhsa -mprintf-kind=buffered -disable-llvm-passes -emit-llvm -o - %s | FileCheck --check-prefix=CHECK_BUFFERED %s
+// RUN: %clang_cc1 -cl-std=CL1.2 -triple amdgcn-amd-amdhsa -mprintf-kind=hostcall -disable-llvm-passes -emit-llvm -o - %s | FileCheck --check-prefix=CHECK_HOSTCALL %s

 int printf(__constant const char* st, ...) __attribute__((format(printf, 1, 2)));


I only see tests with the "v" correctly used as a vector specifier. What about tests like "%dv", and other cases where either the "v" is wrong, or it's just part of the text being printed? Given that the first attempt at detecting "v" had errors in it, I think it will be good to cover all corner cases where a "v" is actually a vector specifier and and where it is not.

ssahasra · 2023-11-29T09:49:47Z

clang/include/clang/Basic/BuiltinsAMDGPU.def

@@ -406,5 +410,9 @@ TARGET_BUILTIN(__builtin_amdgcn_cvt_pk_fp8_f32, "iffiIb", "nc", "fp8-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_bf8_f32, "ifiiIi", "nc", "fp8-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_fp8_f32, "ifiiIi", "nc", "fp8-insts")

+// OpenCL
+LANGBUILTIN(printf, "icC*4.", "fp:0:", ALL_OCL_LANGUAGES)


Although we talked about this offline, the explanation needs to be added here. In fact, the motivation for having this builtin should be added as a comment to the source itself for future reference.

ssahasra · 2023-11-29T10:02:53Z

llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

@@ -198,6 +229,10 @@ static void locateCStrings(SparseBitVector<8> &BV, StringRef Str) {
    if (SpecEnd == StringRef::npos)
      return;
    auto Spec = Str.slice(SpecPos, SpecEnd + 1);
+
+    if ((Spec.find_first_of("v")) != StringRef::npos)


Just ".find()" should be sufficient?

arsenm

Is there a separate PR open for "Add vector processing support to AMDGPU printf"? I think it's easiest to move this part forward first

vikramRH · 2023-12-04T06:42:32Z

Is there a separate PR open for "Add vector processing support to AMDGPU printf"? I think it's easiest to move this part forward first

@arsenm , you are right. I just want to make sure we are good on runtime changes too now since there seems to be a blocker. The changes here are not necessary unless we are okay with runtime changes.

ssahasra · 2024-01-22T07:45:11Z

llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

-// Scan the format string to locate all specifiers, and mark the ones that
-// specify a string, i.e, the "%s" specifier with optional '*' characters.
-static void locateCStrings(SparseBitVector<8> &BV, StringRef Str) {
+// Scan the format string to locate all specifiers and OCL vectors,


"all specifiers" is enough ... there is no need to say "OCL vectors". The rest of the sentence is the one which correctly says "string or vector".

ssahasra · 2024-01-22T07:50:26Z

llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

+  // If the format specifies a string but the argument is not, the frontend
+  // will have printed a warning. We just rely on undefined behaviour and send
+  // the argument anyway.


ssahasra · 2024-01-22T07:55:27Z

llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

    return Builder.CreatePtrToInt(Arg, Int64Ty);
  }

-  llvm_unreachable("unexpected type");


This llvm_unreachable is preferred. It's clear documentation that all supported types have been handled by this point. Each if-block for integer, floating and pointer types should have its own default "return Arg".

ssahasra · 2024-01-22T07:57:12Z

clang/lib/CodeGen/CGGPUBuiltin.cpp

+    if (FmtStr.empty())
+      FmtStr = StringRef("", 1);
+  } else {
+    if (CGM.getLangOpts().OpenCL) {


This looks like the wrong place for a diagnostic. For an OpenCL program, shouldn't Sema have already verified that the arguments match the required types, such as "constant address space" for the format string?

The diagnostic should be replaced with an assert() or an llvm_unreachable(). The OpenCL spec says that the format string should be resolvable at compile time, but this is not the right place to check that. By now, the frontend or sema should have rejected the program as ill-formed.

ssahasra · 2024-01-22T07:58:46Z

clang/lib/Driver/ToolChains/Clang.cpp

+          "-mprintf-kind=" +
+          Args.getLastArgValue(options::OPT_mprintf_kind_EQ)));
+      // Force compiler error on invalid conversion specifiers
+      CmdArgs.push_back(Args.MakeArgString("-Werror=format-invalid-specifier"));


Bump! Remove this command-line flag.

…#77615)" This reverts commit 2759cfa.

vikramRH · 2024-02-21T05:35:45Z

The new set of changes adds following changes,

The iteration over vector elements now happens using vector size from the format specifier as reference, this is inline with runtime implementation and helps handling undefined behavior when we have a mismatch.
The error flag "-Werror=format-invalid-specifier" has been removed.

ssahasra · 2024-02-21T06:14:54Z

clang/lib/CodeGen/CGGPUBuiltin.cpp

@@ -178,17 +181,29 @@ RValue CodeGenFunction::EmitNVPTXDevicePrintfCallExpr(const CallExpr *E) {
      E, this, GetVprintfDeclaration(CGM.getModule()), false);
 }

+// Deterimines if an argument is a string
+static bool isString(const clang::Type *argXTy) {


This could be called "MayBeString()" at best. It's not necessary that a char* argument type is a C-style string.

I have removed this, addrspace cast is done during arg processing.

clang/lib/CodeGen/CGGPUBuiltin.cpp

ssahasra · 2024-02-21T06:17:46Z

clang/lib/CodeGen/CGGPUBuiltin.cpp

+    if (FmtStr.empty())
+      FmtStr = StringRef("", 1);
+  } else {
+    if (CGM.getLangOpts().OpenCL) {


The diagnostic should be replaced with an assert() or an llvm_unreachable(). The OpenCL spec says that the format string should be resolvable at compile time, but this is not the right place to check that. By now, the frontend or sema should have rejected the program as ill-formed.

arsenm · 2024-02-29T12:03:57Z

clang/include/clang/Basic/BuiltinsAMDGPU.def

@@ -406,5 +410,9 @@ TARGET_BUILTIN(__builtin_amdgcn_cvt_pk_fp8_f32, "iffiIb", "nc", "fp8-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_bf8_f32, "ifiiIi", "nc", "fp8-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_fp8_f32, "ifiiIi", "nc", "fp8-insts")

+// OpenCL
+LANGBUILTIN(printf, "icC*4.", "fp:0:", ALL_OCL_LANGUAGES)


This does not belong here. This has nothing to do with AMDGPU

arsenm · 2024-02-29T12:05:14Z

clang/lib/AST/Decl.cpp

+  // AMDGCN implementation supports printf as a builtin
+  // for OpenCL
+  if (Context.getTargetInfo().getTriple().isAMDGCN() &&
+      Context.getLangOpts().OpenCL && BuiltinID == AMDGPU::BIprintf)
+    return BuiltinID;


This does not belong here and has nothing to do with AMDGPU

The signatures of C-printf and OCL printf differ and I dont think generic builtin handling provides a way to register overloaded builtins with "shared" builtin ID's. This needs a new ID. do you have any alternate suggestions here ?

I thought this had been clarified earlier too. It's quite imprecise to just say that "signatures differ". Perhaps the following detailed explanation might move the conversatino forward. The problem is that the OpenCL printf expects a format string in the constant address space, which has no representation in Clang builtin. What we do have is the ability to specify an address-space number in the builtin declaration. But this number is target-specific, which makes the whole builtin target-specific. Is there a way around that magic number 4?

Only other alternative I see currently is to modify Sema (probably ActOnFunctionDeclarator) so that we map the ocl printf declaration to C printf builtin ID. This would be a really hacky solution and I would prefer this implementation.

ping @arsenm

The builtin specifications are also in terms of the lang address space, not the target address space (this was an ugly compromise to make builtins work at all in OpenCL)

@arsenm, thanks for the info. CustomTypeChecking is a valid option. I'm not sure why OpenCL community did not consider this change despite OpenCL specs specifying the details. I could create a separate patch for this (probably folks from OCL community would provide further background). Meanwhile, can this go ahead as an AMDGPU specific workaround for now so that we have the intended feature in place ? (The frontend changes here can be reverted with that follow up patch )

PS :Ah, I see another issue . OpenCL v1.2 s6.9.f states none of the functions defined in C99 headers are available. This would mean std printf is supposed to be treated differently than OpenCL builtins and consequently the builtin IDs assigned to them "need" to be different. If this understanding is correct, moving ahead with using same builtin ID as std printf is not the right way. (probably this is why such an implementation was never considered)

. Meanwhile, can this go ahead as an AMDGPU specific workaround for now so that we have the intended feature in place

No. That cleanup will never happen.

PS :Ah, I see another issue . OpenCL v1.2 s6.9.f states none of the functions defined in C99 headers are available. This would mean std printf is supposed to be treated differently than OpenCL builtins and consequently the builtin IDs assigned to them "need" to be different

That's not what that means

I was referring to

llvm-project/clang/lib/AST/Decl.cpp

Line 3633 in 3e6db60

return 0;

. This essentially means that even if frontend attaches the printf builtin ID to the decl (even after custom type checks), this would revert.

I think that check is buggy. The printf declaration doesn't come from a header, but the printf function does exist in 1.2+. It probably needs to special case printf

arsenm · 2024-02-29T12:05:24Z

clang/lib/CodeGen/CGBuiltin.cpp

+  // Mutate the printf builtin ID so that we use the same CodeGen path for
+  // HIP and OpenCL with AMDGPU targets.
+  if (getTarget().getTriple().isAMDGCN() && BuiltinID == AMDGPU::BIprintf)
+    BuiltinID = Builtin::BIprintf;


You should not need to remap builtins

This can be removed if you feel so, probably we would need a new case in Expr CodeGen

I think all you need is to add the CustomTypeChecking attribute to the printf definition, and then add language specific type checking on the string argument

arsenm · 2024-02-29T12:06:43Z

clang/lib/CodeGen/CGGPUBuiltin.cpp

+  StringRef FmtStr;
+  if (llvm::getConstantStringInfo(Args[0], FmtStr)) {
+    if (FmtStr.empty())
+      FmtStr = StringRef("", 1);


This is producing an invalid StringRef?

not really. This is just to say the format string is not really empty (i.e size = 0) when the user input is an empty format string (a weird corner case)

clang/lib/CodeGen/CGGPUBuiltin.cpp

vikramRH requested review from arsenm, yxsamliu, b-sumner, jhuber6 and ssahasra November 16, 2023 18:55

vikramRH mentioned this pull request Nov 16, 2023

[WIP][AMDGPU] Enable hostcall printf for OpenCL #70932

Closed

arsenm reviewed Nov 17, 2023

View reviewed changes

vikramRH mentioned this pull request Nov 17, 2023

[AMDGPU] Treat printf as builtin for OpenCL #72554

Closed

vikramRH force-pushed the OpenCL_hostcall_printf branch from f932959 to b443e11 Compare November 20, 2023 05:30

ssahasra reviewed Nov 20, 2023

View reviewed changes

vikramRH marked this pull request as ready for review November 28, 2023 06:11

llvmbot added clang Clang issues not falling into any other category backend:AMDGPU clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen llvm:transforms labels Nov 28, 2023

vikramRH changed the title ~~Enable OpenCL hostcall printf (WIP)~~ [AMDGPU] Enable OpenCL hostcall printf (WIP) Nov 28, 2023

ssahasra reviewed Nov 29, 2023

View reviewed changes

arsenm reviewed Dec 1, 2023

View reviewed changes

vikramRH force-pushed the OpenCL_hostcall_printf branch from cb564c2 to aaff6dd Compare December 4, 2023 06:37

ssahasra requested changes Jan 22, 2024

View reviewed changes

vikramRH added 7 commits February 19, 2024 03:05

Revert "[AMDGPU] Remove unnecessary add instructions in ctlz.i8 (llvm…

9b0b337

…#77615)" This reverts commit 2759cfa.

[AMDGPU] Treat printf as builtin for OpenCL

62b5f5a

[AMDGPU] Enable OpenCL printf expansion at clang CodeGen

1b9e2ba

[AMDGPU] Add vector processing support to AMDGPU printf

f937af4

Review commnents handled, code refactoring

e05ffbd

Handled further review comments

47b927a

Improvements to vector handling and review comments

de80946

vikramRH force-pushed the OpenCL_hostcall_printf branch from aaff6dd to de80946 Compare February 19, 2024 08:08

vikramRH added 2 commits February 19, 2024 03:17

fix a redundant wrong revert

f428579

Fix lit test crash, review comments

c69c80a

ssahasra reviewed Feb 21, 2024

View reviewed changes

vikramRH added 6 commits February 21, 2024 08:47

Run clang-format

0f8f4f9

review comments and clang-format

99b1559

fix windows CI failures

04b29e0

address missed failures

cb9dedf

code clean-up/clang format

cdacec7

fix comment

5b73f02

ssahasra approved these changes Feb 26, 2024

View reviewed changes

arsenm requested changes Feb 29, 2024

View reviewed changes

arsenm reviewed Feb 29, 2024

View reviewed changes

clang/lib/CodeGen/CGGPUBuiltin.cpp Outdated Show resolved Hide resolved

remove non literal format strig assert

37e0fe4

vikramRH mentioned this pull request Mar 27, 2024

[RFC][Clang] Enable custom type checking for printf #86801

Open

		auto VecArg = dyn_cast<FixedVectorType>(Args[i]->getType());
		assert(VecArg && "invalid vector specifier");

[AMDGPU] Enable OpenCL hostcall printf (WIP) #72556

Are you sure you want to change the base?

[AMDGPU] Enable OpenCL hostcall printf (WIP) #72556

Conversation

vikramRH commented Nov 16, 2023 • edited

github-actions bot commented Nov 16, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

llvmbot commented Nov 28, 2023 • edited

vikramRH commented Nov 28, 2023

arsenm commented Nov 28, 2023

vikramRH commented Nov 28, 2023

ssahasra commented Nov 29, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

arsenm left a comment

Choose a reason for hiding this comment

vikramRH commented Dec 4, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vikramRH commented Feb 21, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vikramRH Mar 1, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vikramRH Mar 18, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vikramRH commented Nov 16, 2023 •

edited

github-actions bot commented Nov 16, 2023 •

edited

llvmbot commented Nov 28, 2023 •

edited

vikramRH Mar 1, 2024 •

edited

vikramRH Mar 18, 2024 •

edited