Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC][Clang] Enable custom type checking for printf #86801

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

vikramRH
Copy link
Contributor

The motivation for this change comes from an ongoing PR (#72556 ) , which enables hostcall based printf lowering for AMDGPU target and OpenCL inputs. The OpenCL printf has a different signature than the C printf. the difference being the explicit address space specifier for format string arg as follows
int printf(__constant const char* st, ...) attribute((format(printf, 1, 2)));
This is not considered a builtin because of the type mismatch.

The patch #72556 tried to address this scenario by declaring OCL printf essentially as a target specific printf overload. However, the discussions there resulted in the decision that this should not be target-specific.

The idea in this patch is that the changes are NFC for current framework (i.e the semantic checks for printf are preserved) however the printf declarations are considered builtins now regardless of LangOpt. This would allow me to hanlde the printf CodeGen without any target specific hacks.

PS: feel free to add additional reviewers, I'm not aware of others who could comment here.

@vikramRH vikramRH marked this pull request as ready for review March 28, 2024 01:53
@vikramRH vikramRH requested a review from Endilll as a code owner March 28, 2024 01:53
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen labels Mar 28, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Mar 28, 2024

@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Vikram Hegde (vikramRH)

Changes

The motivation for this change comes from an ongoing PR (#72556 ) , which enables hostcall based printf lowering for AMDGPU target and OpenCL inputs. The OpenCL printf has a different signature than the C printf. the difference being the explicit address space specifier for format string arg as follows
int printf(__constant const char* st, ...) attribute((format(printf, 1, 2)));
This is not considered a builtin because of the type mismatch.

The patch #72556 tried to address this scenario by declaring OCL printf essentially as a target specific printf overload. However, the discussions there resulted in the decision that this should not be target-specific.

The idea in this patch is that the changes are NFC for current framework (i.e the semantic checks for printf are preserved) however the printf declarations are considered builtins now regardless of LangOpt. This would allow me to hanlde the printf CodeGen without any target specific hacks.

PS: feel free to add additional reviewers, I'm not aware of others who could comment here.


Full diff: https://github.com/llvm/llvm-project/pull/86801.diff

6 Files Affected:

  • (modified) clang/include/clang/Basic/Builtins.td (+2-2)
  • (modified) clang/include/clang/Sema/Sema.h (+1)
  • (modified) clang/lib/AST/Decl.cpp (+5-1)
  • (modified) clang/lib/Basic/Builtins.cpp (+2-1)
  • (modified) clang/lib/CodeGen/CGBuiltin.cpp (+1-1)
  • (modified) clang/lib/Sema/SemaChecking.cpp (+51)
diff --git a/clang/include/clang/Basic/Builtins.td b/clang/include/clang/Basic/Builtins.td
index 52c0dd52c28b11..f795c7c42c7b25 100644
--- a/clang/include/clang/Basic/Builtins.td
+++ b/clang/include/clang/Basic/Builtins.td
@@ -2816,14 +2816,14 @@ def StrLen : LibBuiltin<"string.h"> {
 // FIXME: This list is incomplete.
 def Printf : LibBuiltin<"stdio.h"> {
   let Spellings = ["printf"];
-  let Attributes = [PrintfFormat<0>];
+  let Attributes = [PrintfFormat<0>, CustomTypeChecking];
   let Prototype = "int(char const*, ...)";
 }
 
 // FIXME: The builtin and library function should have the same signature.
 def BuiltinPrintf : Builtin {
   let Spellings = ["__builtin_printf"];
-  let Attributes = [NoThrow, PrintfFormat<0>, FunctionWithBuiltinPrefix];
+  let Attributes = [NoThrow, PrintfFormat<0>, FunctionWithBuiltinPrefix, CustomTypeChecking];
   let Prototype = "int(char const* restrict, ...)";
 }
 
diff --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h
index 5ecd2f9eb2881f..b18b208a75bdf4 100644
--- a/clang/include/clang/Sema/Sema.h
+++ b/clang/include/clang/Sema/Sema.h
@@ -2245,6 +2245,7 @@ class Sema final {
 
   bool SemaBuiltinVAStart(unsigned BuiltinID, CallExpr *TheCall);
   bool SemaBuiltinVAStartARMMicrosoft(CallExpr *Call);
+  bool SemaBuiltinPrintf(FunctionDecl *FDecl, CallExpr *TheCall);
   bool SemaBuiltinUnorderedCompare(CallExpr *TheCall, unsigned BuiltinID);
   bool SemaBuiltinFPClassification(CallExpr *TheCall, unsigned NumArgs,
                                    unsigned BuiltinID);
diff --git a/clang/lib/AST/Decl.cpp b/clang/lib/AST/Decl.cpp
index 131f82985e903b..298223f874cda3 100644
--- a/clang/lib/AST/Decl.cpp
+++ b/clang/lib/AST/Decl.cpp
@@ -3629,8 +3629,12 @@ unsigned FunctionDecl::getBuiltinID(bool ConsiderWrapperFunctions) const {
   // OpenCL v1.2 s6.9.f - The library functions defined in
   // the C99 standard headers are not available.
   if (Context.getLangOpts().OpenCL &&
-      Context.BuiltinInfo.isPredefinedLibFunction(BuiltinID))
+      Context.BuiltinInfo.isPredefinedLibFunction(BuiltinID)) {
+    if (Context.getLangOpts().getOpenCLCompatibleVersion() >= 120 &&
+        (BuiltinID == Builtin::BIprintf))
+      return BuiltinID;
     return 0;
+  }
 
   // CUDA does not have device-side standard library. printf and malloc are the
   // only special cases that are supported by device-side runtime.
diff --git a/clang/lib/Basic/Builtins.cpp b/clang/lib/Basic/Builtins.cpp
index 3467847ac1672e..25590ed9299e8b 100644
--- a/clang/lib/Basic/Builtins.cpp
+++ b/clang/lib/Basic/Builtins.cpp
@@ -235,7 +235,8 @@ bool Builtin::Context::performsCallback(unsigned ID,
 
 bool Builtin::Context::canBeRedeclared(unsigned ID) const {
   return ID == Builtin::NotBuiltin || ID == Builtin::BI__va_start ||
-         ID == Builtin::BI__builtin_assume_aligned ||
+         ID == Builtin::BI__builtin_assume_aligned || ID == Builtin::BIprintf ||
+         ID == Builtin::BI__builtin_printf ||
          (!hasReferenceArgsOrResult(ID) && !hasCustomTypechecking(ID)) ||
          isInStdNamespace(ID);
 }
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 3cfdb261a0eac0..baed36acb12437 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -5825,7 +5825,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
         getTarget().getTriple().isAMDGCN()) {
       if (getLangOpts().OpenMPIsTargetDevice)
         return EmitOpenMPDevicePrintfCallExpr(E);
-      if (getTarget().getTriple().isNVPTX())
+      if (getTarget().getTriple().isNVPTX() && !getLangOpts().OpenCL)
         return EmitNVPTXDevicePrintfCallExpr(E);
       if (getTarget().getTriple().isAMDGCN() && getLangOpts().HIP)
         return EmitAMDGPUDevicePrintfCallExpr(E);
diff --git a/clang/lib/Sema/SemaChecking.cpp b/clang/lib/Sema/SemaChecking.cpp
index 08449581330934..bac871a5f9c9be 100644
--- a/clang/lib/Sema/SemaChecking.cpp
+++ b/clang/lib/Sema/SemaChecking.cpp
@@ -2515,6 +2515,11 @@ Sema::CheckBuiltinFunctionCall(FunctionDecl *FDecl, unsigned BuiltinID,
     }
     break;
   }
+  case Builtin::BIprintf:
+  case Builtin::BI__builtin_printf:
+    if (SemaBuiltinPrintf(FDecl, TheCall))
+      return ExprError();
+    break;
 
   // The acquire, release, and no fence variants are ARM and AArch64 only.
   case Builtin::BI_interlockedbittestandset_acq:
@@ -9544,6 +9549,52 @@ bool Sema::SemaBuiltinVAStart(unsigned BuiltinID, CallExpr *TheCall) {
   return false;
 }
 
+bool Sema::SemaBuiltinPrintf(FunctionDecl *FDecl, CallExpr *TheCall) {
+  auto Proto = FDecl->getType()->getAs<FunctionProtoType>();
+  auto Args = llvm::ArrayRef(TheCall->getArgs(), TheCall->getNumArgs());
+
+  unsigned NumParams = Proto->getNumParams();
+  unsigned MinArgs = FDecl->getMinRequiredArguments();
+  if (Args.size() < NumParams) {
+    if (Args.size() < MinArgs) {
+      Diag(TheCall->getEndLoc(), diag::err_typecheck_call_too_few_args_at_least)
+          << /*function*/ 0 << MinArgs << static_cast<unsigned>(Args.size())
+          << /*HasExplicitObjectParameter*/ false << TheCall->getSourceRange();
+
+      // Emit the location of the prototype if valid.
+      if (!FDecl->isImplicit())
+        Diag(FDecl->getLocation(), diag::note_callee_decl)
+            << FDecl << FDecl->getParametersSourceRange();
+
+      return true;
+    }
+    // We reserve space for the default arguments when we create
+    // the call expression.
+    assert((TheCall->getNumArgs() == NumParams) &&
+           "We should have reserved space for the default arguments before!");
+  }
+
+  SmallVector<Expr *, 8> AllArgs;
+  VariadicCallType CallType = VariadicCallType::VariadicFunction;
+
+  // Convert arguments to paramters types of printf decl/builtin.
+  bool Invalid = GatherArgumentsForCall(TheCall->getBeginLoc(), FDecl, Proto, 0,
+                                        Args, AllArgs, CallType);
+
+  if (Invalid)
+    return true;
+  unsigned TotalNumArgs = AllArgs.size();
+  for (unsigned i = 0; i < TotalNumArgs; ++i)
+    TheCall->setArg(i, AllArgs[i]);
+
+  TheCall->setType(Proto->getReturnType());
+
+  if (CheckFunctionCall(FDecl, TheCall, Proto))
+    return true;
+
+  return false;
+}
+
 bool Sema::SemaBuiltinVAStartARMMicrosoft(CallExpr *Call) {
   auto IsSuitablyTypedFormatArgument = [this](const Expr *Arg) -> bool {
     const LangOptions &LO = getLangOpts();

Copy link
Contributor

@Endilll Endilll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sema.h changes look good to me.

Copy link
Collaborator

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not keen on reimplementing the checking logic manually; I worry this is going to be a slippery slope where we start with printf today, but then need to do the same thing for the rest of the family at some point in the future.

Would it make sense for us to define a LibBuiltin specific to OpenCL languages to expose a different signature for OpenCL? And change BuiltinPrintf from Builtin to LangBuiltin to do similar for the builtin form?

@vikramRH
Copy link
Contributor Author

vikramRH commented Mar 29, 2024

Thanks for the comments @AaronBallman. The core issue here is that the current builtin handling design does not allow multiple overloads for the same identifier to coexist (ref.

#define BUILTIN(ID, TYPE, ATTRS) BI##ID,
), unless the builtins are defined in target specific namespaces which is what I tried in my original patch . If we want change this approach, I currently think of couple of ways at a top level

  1. As you said, we could have OCL specific LibBuiltin and LangBuiltin TableGen classes (and corresponding macros in Buitlins.inc). To make this work they would need new builtin ID's of different form (say "BI_OCL##identifier"). This is very Language specific.
  2. Probably change the current Builtin Info structure to allow vector of possible signatures for an identifier. The builtin type decoder could choose the appropriate signature based on LangOpt. (This wording is vague and could be a separate discussion in itself )

either way, changes in current design are required. printf is the only current use case I know that can benefit out of this (since OpenCL v1.2 s6.9.f says other library functions defined in C standard header are not available ,so 🤷‍♂️ ). But I guess we could have more use cases in future. can this be a separate discussion ? This patch would unblock my current work for now.

@AaronBallman
Copy link
Collaborator

Thanks for the comments @AaronBallman. The core issue here is that the current builtin handling design does not allow multiple overloads for the same identifier to coexist (ref.

#define BUILTIN(ID, TYPE, ATTRS) BI##ID,

), unless the builtins are defined in target specific namespaces which is what I tried in my original patch . If we want change this approach, I currently think of couple of ways at a top level

1. As you said, we could have OCL specific LibBuiltin and LangBuiltin TableGen classes (and corresponding macros in Buitlins.inc). To make this work they would need new builtin ID's  of different form (say "BI_OCL##identifier"). This is very Language specific.

2. Probably change the current Builtin Info structure to allow vector of possible signatures for an identifier. The builtin type decoder could choose the appropriate signature based on LangOpt. (This wording is vague and could be a separate discussion in itself )

either way, changes in current design are required. printf is the only current use case I know that can benefit out of this (since OpenCL v1.2 s6.9.f says other library functions defined in C standard header are not available ,so 🤷‍♂️ ). But I guess we could have more use cases in future. can this be a separate discussion ? This patch would unblock my current work for now.

I looked at the OpenCL spec for C standard library support and was surprised that 1) it's only talking about C99 so it's unclear what happens for C11 (clause 6 says "This document describes the modifications and restrictions to C99 and C11 in OpenCL C" but 6.11 only talks about C99 headers and leaves iso646.h, math.h, stdbool.h, stddef.h, (all in C99) as well as stdalign.h, stdatomic.h, stdnoreturn.h, threads.h, and uchar.h available?), and 2) OpenCL's printf is not really the same function as C's printf (https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#differences-between-opencl-c-and-c99-printf).

#1 is probably more of an oversight than anything, at least with the C11 headers. So maybe this isn't a super slippery slope, but maybe C23 will change that (I can imagine stdbit.h being of use in OpenCL for bit-bashing operations). However, the fact that the builtin isn't really printf but is printf-like makes me think we should make it a separate builtin to avoid surprises (we do diagnostics based on builtin IDs and we have special checking logic that we perhaps should be exempting in some cases).

@vikramRH
Copy link
Contributor Author

I looked at the OpenCL spec for C standard library support and was surprised that 1) it's only talking about C99 so it's unclear what happens for C11 (clause 6 says "This document describes the modifications and restrictions to C99 and C11 in OpenCL C" but 6.11 only talks about C99 headers and leaves iso646.h, math.h, stdbool.h, stddef.h, (all in C99) as well as stdalign.h, stdatomic.h, stdnoreturn.h, threads.h, and uchar.h available?), and 2) OpenCL's printf is not really the same function as C's printf (https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#differences-between-opencl-c-and-c99-printf).

#1 is probably more of an oversight than anything, at least with the C11 headers. So maybe this isn't a super slippery slope, but maybe C23 will change that (I can imagine stdbit.h being of use in OpenCL for bit-bashing operations). However, the fact that the builtin isn't really printf but is printf-like makes me think we should make it a separate builtin to avoid surprises (we do diagnostics based on builtin IDs and we have special checking logic that we perhaps should be exempting in some cases).

Understood. Then I propose the following.

  1. Currently Builtin TableGen does not seem to support specifying lang address spaces in function prototypes. this needs to be implemented first if not already in development.
  2. We could have two new macro variants probably named "OCL_BUILTIN" and "OCL_LIB_BUILTIN" which will take the ID's of the form "BI_OCL##<builtin_name>". we would also need corresponding TableGen classes (probably named similar to the macros) which can expose such overloaded prototypes when required.

How does this sound ?

@AaronBallman
Copy link
Collaborator

I looked at the OpenCL spec for C standard library support and was surprised that 1) it's only talking about C99 so it's unclear what happens for C11 (clause 6 says "This document describes the modifications and restrictions to C99 and C11 in OpenCL C" but 6.11 only talks about C99 headers and leaves iso646.h, math.h, stdbool.h, stddef.h, (all in C99) as well as stdalign.h, stdatomic.h, stdnoreturn.h, threads.h, and uchar.h available?), and 2) OpenCL's printf is not really the same function as C's printf (https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#differences-between-opencl-c-and-c99-printf).
#1 is probably more of an oversight than anything, at least with the C11 headers. So maybe this isn't a super slippery slope, but maybe C23 will change that (I can imagine stdbit.h being of use in OpenCL for bit-bashing operations). However, the fact that the builtin isn't really printf but is printf-like makes me think we should make it a separate builtin to avoid surprises (we do diagnostics based on builtin IDs and we have special checking logic that we perhaps should be exempting in some cases).

Understood. Then I propose the following.

1. Currently Builtin TableGen does not seem to support specifying lang address spaces in function prototypes. this needs to be implemented first if not already in development.

2. We could have two new macro variants probably named "OCL_BUILTIN" and "OCL_LIB_BUILTIN" which will take the ID's of the form "BI_OCL##<builtin_name>". we would also need corresponding TableGen classes (probably named similar to the macros) which can expose such overloaded prototypes when required.

How does this sound ?

I think that sounds like a reasonable way forward, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants