Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ [Sema, Lex, Parse] Preprocessor embed in C and C++ (and Obj-C and Obj-C++ by-proxy) #68620

Open
wants to merge 126 commits into
base: main
Choose a base branch
from

Conversation

ThePhD
Copy link
Contributor

@ThePhD ThePhD commented Oct 9, 2023

This pull request implements the entirety of the now-accepted N3017 - Preprocessor Embed and its sister C++ paper p1967. It implements everything in the specification, and includes an implementation that drastically improves the time it takes to embed data in specific scenarios (the initialization of character type arrays). The mechanisms used to do this are used under the "as-if" rule, and in general when the system cannot detect it is initializing an array object in a variable declaration, will simply expand the list out to a sequence of numbers.

There are likely places where the __builtin_pp_embed intrinsic fails to compile properly and triggers ICEs. However, we have covered what we feel are the vast majority of cases where users would want or need the speedup.

Images are a dry run of a recently built clang and LLVM on Release mode (not RelWithDebInfo) on Windows, under the following properties:

OS Name: Microsoft Windows 10 Pro
Version: 10.0.19045 Build 19045
System Type: x64-based PC
Processor: AMD Ryzen 9 5950X 16-Core Processor, 3401 Mhz, 16 Core(s), 32 Logical Processor(s)
Installed Physical Memory (RAM): 32.0 GB
Total Physical Memory: 31.9 GB
Total Virtual Memory: 36.7 GB

All of the added tests pass under the above machine properties.

I have no intention of following up on this PR. I am too tired to carry it to fruition. Pick this apart, take from it what you want, or reimplement it entirely. (I will be unsubscribing from this immediately after posting.)

Good luck ✌!

1
2
3
4

@llvmbot llvmbot added cmake Build system in general and CMake in particular clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang-format clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules clang:static analyzer llvm:support labels Oct 9, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Oct 9, 2023

@llvm/pr-subscribers-clang-static-analyzer-1
@llvm/pr-subscribers-clang-modules
@llvm/pr-subscribers-clang-format
@llvm/pr-subscribers-clang-driver
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-llvm-support

Changes

This pull request implements the entirety of the now-accepted N3017 - Preprocessor Embed and its sister C++ paper p1967. It implements everything in the specification, and includes an implementation that drastically improves the time it takes to embed data in specific scenarios (the initialization of character type arrays). The mechanisms used to do this are used under the "as-if" rule, and in general when the system cannot detect it is initializing an array object in a variable declaration, will simply expand the list out to a sequence of numbers.

There are likely places where the __builtin_pp_embed intrinsic fails to compile properly and triggers ICEs. However, we have covered what we feel are the vast majority of cases where users would want or need the speedup.

Images are a dry run of a recently built clang and LLVM on Release mode (not RelWithDebInfo) on Windows, under the following properties:

OS Name: Microsoft Windows 10 Pro
Version: 10.0.19045 Build 19045
System Type: x64-based PC
Processor: AMD Ryzen 9 5950X 16-Core Processor, 3401 Mhz, 16 Core(s), 32 Logical Processor(s)
Installed Physical Memory (RAM): 32.0 GB
Total Physical Memory: 31.9 GB
Total Virtual Memory: 36.7 GB

All of the added tests pass under the above machine properties.

I have no intention of following up on this PR. I am too tired to carry it to fruition. Pick this apart, take from it what you want, or reimplement it entirely. (I will be unsubscribing from this immediately after posting.)

Good luck ✌!

1
2
3
4


Patch is 1.33 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/68620.diff

82 Files Affected:

  • (modified) clang/CMakeLists.txt (+2-1)
  • (modified) clang/include/clang/AST/Expr.h (+52)
  • (modified) clang/include/clang/AST/RecursiveASTVisitor.h (+1)
  • (modified) clang/include/clang/Basic/Builtins.def (+3)
  • (modified) clang/include/clang/Basic/DiagnosticCommonKinds.td (+6)
  • (modified) clang/include/clang/Basic/DiagnosticGroups.td (+7)
  • (modified) clang/include/clang/Basic/DiagnosticLexKinds.td (+23-1)
  • (modified) clang/include/clang/Basic/FileManager.h (+5-3)
  • (modified) clang/include/clang/Basic/StmtNodes.td (+1)
  • (modified) clang/include/clang/Basic/TokenKinds.def (+5)
  • (modified) clang/include/clang/Driver/Options.td (+16)
  • (modified) clang/include/clang/Frontend/PreprocessorOutputOptions.h (+2)
  • (modified) clang/include/clang/Lex/PPCallbacks.h (+118-82)
  • (added) clang/include/clang/Lex/PPDirectiveParameter.h (+32)
  • (added) clang/include/clang/Lex/PPEmbedParameters.h (+78)
  • (modified) clang/include/clang/Lex/Preprocessor.h (+191-128)
  • (modified) clang/include/clang/Lex/PreprocessorOptions.h (+7)
  • (modified) clang/include/clang/Lex/Token.h (+1-1)
  • (modified) clang/include/clang/Sema/Sema.h (+1124-1441)
  • (modified) clang/include/clang/Serialization/ASTBitCodes.h (+3)
  • (modified) clang/lib/AST/Expr.cpp (+17)
  • (modified) clang/lib/AST/ExprClassification.cpp (+4)
  • (modified) clang/lib/AST/ExprConstant.cpp (+8)
  • (modified) clang/lib/AST/ItaniumMangle.cpp (+1)
  • (modified) clang/lib/AST/StmtPrinter.cpp (+7)
  • (modified) clang/lib/AST/StmtProfile.cpp (+4)
  • (modified) clang/lib/Basic/FileManager.cpp (+41-35)
  • (modified) clang/lib/Basic/IdentifierTable.cpp (+2-1)
  • (modified) clang/lib/Driver/ToolChains/Clang.cpp (+4-1)
  • (modified) clang/lib/Format/FormatToken.h (+2)
  • (modified) clang/lib/Format/TokenAnnotator.cpp (+28)
  • (modified) clang/lib/Frontend/CompilerInvocation.cpp (+21)
  • (modified) clang/lib/Frontend/DependencyFile.cpp (+29)
  • (modified) clang/lib/Frontend/DependencyGraph.cpp (+51-15)
  • (modified) clang/lib/Frontend/InitPreprocessor.cpp (+7)
  • (modified) clang/lib/Frontend/PrintPreprocessedOutput.cpp (+50-32)
  • (modified) clang/lib/Frontend/Rewrite/InclusionRewriter.cpp (+98-85)
  • (modified) clang/lib/Interpreter/Interpreter.cpp (+1)
  • (modified) clang/lib/Lex/Lexer.cpp (+8)
  • (modified) clang/lib/Lex/PPCallbacks.cpp (-11)
  • (modified) clang/lib/Lex/PPDirectives.cpp (+942-174)
  • (modified) clang/lib/Lex/PPExpressions.cpp (+137-88)
  • (modified) clang/lib/Lex/PPMacroExpansion.cpp (+124)
  • (modified) clang/lib/Lex/Preprocessor.cpp (+3-2)
  • (modified) clang/lib/Parse/ParseDeclCXX.cpp (-1)
  • (modified) clang/lib/Parse/ParseExpr.cpp (+361-308)
  • (modified) clang/lib/Parse/ParseInit.cpp (+3-3)
  • (modified) clang/lib/Parse/ParseTemplate.cpp (+2)
  • (modified) clang/lib/Sema/SemaDecl.cpp (+1389-1383)
  • (modified) clang/lib/Sema/SemaDeclCXX.cpp (+2-1)
  • (modified) clang/lib/Sema/SemaExceptionSpec.cpp (+1)
  • (modified) clang/lib/Sema/SemaExpr.cpp (+1799-1483)
  • (modified) clang/lib/Sema/SemaTemplate.cpp (+947-1002)
  • (modified) clang/lib/Sema/SemaTemplateVariadic.cpp (+2-2)
  • (modified) clang/lib/Sema/TreeTransform.h (+7)
  • (modified) clang/lib/Serialization/ASTReaderStmt.cpp (+13)
  • (modified) clang/lib/Serialization/ASTWriterStmt.cpp (+10)
  • (modified) clang/lib/StaticAnalyzer/Core/ExprEngine.cpp (+4)
  • (added) clang/test/Preprocessor/Inputs/jk.txt (+1)
  • (added) clang/test/Preprocessor/Inputs/media/art.txt (+9)
  • (added) clang/test/Preprocessor/Inputs/media/empty ()
  • (added) clang/test/Preprocessor/Inputs/single_byte.txt (+1)
  • (added) clang/test/Preprocessor/embed___has_embed.c (+34)
  • (added) clang/test/Preprocessor/embed___has_embed_supported.c (+24)
  • (added) clang/test/Preprocessor/embed_art.c (+106)
  • (added) clang/test/Preprocessor/embed_feature_test.cpp (+13)
  • (added) clang/test/Preprocessor/embed_file_not_found.c (+4)
  • (added) clang/test/Preprocessor/embed_init.c (+28)
  • (added) clang/test/Preprocessor/embed_parameter_if_empty.c (+16)
  • (added) clang/test/Preprocessor/embed_parameter_limit.c (+15)
  • (added) clang/test/Preprocessor/embed_parameter_offset.c (+15)
  • (added) clang/test/Preprocessor/embed_parameter_prefix.c (+15)
  • (added) clang/test/Preprocessor/embed_parameter_suffix.c (+15)
  • (added) clang/test/Preprocessor/embed_parameter_unrecognized.c (+8)
  • (added) clang/test/Preprocessor/embed_path_chevron.c (+8)
  • (added) clang/test/Preprocessor/embed_path_quote.c (+8)
  • (added) clang/test/Preprocessor/embed_single_entity.c (+7)
  • (added) clang/test/Preprocessor/embed_weird.cpp (+68)
  • (added) clang/test/Preprocessor/single_byte.txt (+1)
  • (modified) llvm/CMakeLists.txt (+8)
  • (modified) llvm/cmake/modules/GetHostTriple.cmake (+3-3)
  • (modified) llvm/include/llvm/Support/Base64.h (+21-15)
diff --git a/clang/CMakeLists.txt b/clang/CMakeLists.txt
index 9b52c58be41e7f7..1b88905da3b8597 100644
--- a/clang/CMakeLists.txt
+++ b/clang/CMakeLists.txt
@@ -300,6 +300,7 @@ configure_file(
   ${CMAKE_CURRENT_BINARY_DIR}/include/clang/Basic/Version.inc)
 
 # Add appropriate flags for GCC
+option(CLANG_ENABLE_PEDANTIC "Compile with pedantic enabled." ON)
 if (LLVM_COMPILER_IS_GCC_COMPATIBLE)
   set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fno-common -Woverloaded-virtual")
   if (NOT "${CMAKE_CXX_COMPILER_ID}" MATCHES "Clang")
@@ -307,7 +308,7 @@ if (LLVM_COMPILER_IS_GCC_COMPATIBLE)
   endif ()
 
   # Enable -pedantic for Clang even if it's not enabled for LLVM.
-  if (NOT LLVM_ENABLE_PEDANTIC)
+  if (NOT LLVM_ENABLE_PEDANTIC AND CLANG_ENABLE_PEDANTIC)
     set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -pedantic -Wno-long-long")
   endif ()
 
diff --git a/clang/include/clang/AST/Expr.h b/clang/include/clang/AST/Expr.h
index b69c616b0090365..9303307fd6a8af5 100644
--- a/clang/include/clang/AST/Expr.h
+++ b/clang/include/clang/AST/Expr.h
@@ -4805,6 +4805,58 @@ class SourceLocExpr final : public Expr {
   friend class ASTStmtReader;
 };
 
+/// Represents a function call to __builtin_pp_embed().
+class PPEmbedExpr final : public Expr {
+  SourceLocation BuiltinLoc, RParenLoc;
+  DeclContext *ParentContext;
+  StringLiteral *Filename;
+  StringLiteral *BinaryData;
+
+public:
+  enum Action {
+    NotFound,
+    FoundOne,
+    Expanded,
+  };
+ 
+  PPEmbedExpr(const ASTContext &Ctx, QualType ResultTy, StringLiteral* Filename, StringLiteral* BinaryData,
+                SourceLocation BLoc, SourceLocation RParenLoc,
+                DeclContext *Context);
+
+  /// Build an empty call expression.
+  explicit PPEmbedExpr(EmptyShell Empty)
+      : Expr(SourceLocExprClass, Empty) {}
+
+  /// If the PPEmbedExpr has been resolved return the subexpression
+  /// representing the resolved value. Otherwise return null.
+  const DeclContext *getParentContext() const { return ParentContext; }
+  DeclContext *getParentContext() { return ParentContext; }
+
+  SourceLocation getLocation() const { return BuiltinLoc; }
+  SourceLocation getBeginLoc() const { return BuiltinLoc; }
+  SourceLocation getEndLoc() const { return RParenLoc; }
+
+  StringLiteral *getFilenameStringLiteral() const { return Filename; }
+  StringLiteral *getDataStringLiteral() const { return BinaryData; }
+
+  size_t getDataElementCount(ASTContext &Context) const;
+
+  child_range children() {
+    return child_range(child_iterator(), child_iterator());
+  }
+
+  const_child_range children() const {
+    return const_child_range(child_iterator(), child_iterator());
+  }
+
+  static bool classof(const Stmt *T) {
+    return T->getStmtClass() == PPEmbedExprClass;
+  }
+
+private:
+  friend class ASTStmtReader;
+};
+
 /// Describes an C or C++ initializer list.
 ///
 /// InitListExpr describes an initializer list, which can be used to
diff --git a/clang/include/clang/AST/RecursiveASTVisitor.h b/clang/include/clang/AST/RecursiveASTVisitor.h
index 3dd23eb38eeabfc..6b7211bb0a0d3f1 100644
--- a/clang/include/clang/AST/RecursiveASTVisitor.h
+++ b/clang/include/clang/AST/RecursiveASTVisitor.h
@@ -2809,6 +2809,7 @@ DEF_TRAVERSE_STMT(ShuffleVectorExpr, {})
 DEF_TRAVERSE_STMT(ConvertVectorExpr, {})
 DEF_TRAVERSE_STMT(StmtExpr, {})
 DEF_TRAVERSE_STMT(SourceLocExpr, {})
+DEF_TRAVERSE_STMT(PPEmbedExpr, {})
 
 DEF_TRAVERSE_STMT(UnresolvedLookupExpr, {
   TRY_TO(TraverseNestedNameSpecifierLoc(S->getQualifierLoc()));
diff --git a/clang/include/clang/Basic/Builtins.def b/clang/include/clang/Basic/Builtins.def
index 6ea8484606cfd5d..0dfc6456daf059a 100644
--- a/clang/include/clang/Basic/Builtins.def
+++ b/clang/include/clang/Basic/Builtins.def
@@ -1766,6 +1766,9 @@ BUILTIN(__builtin_ms_va_copy, "vc*&c*&", "n")
 // Arithmetic Fence: to prevent FP reordering and reassociation optimizations
 LANGBUILTIN(__arithmetic_fence, "v.", "tE", ALL_LANGUAGES)
 
+// preprocessor embed builtin
+LANGBUILTIN(__builtin_pp_embed, "v.", "tE", ALL_LANGUAGES)
+
 #undef BUILTIN
 #undef LIBBUILTIN
 #undef LANGBUILTIN
diff --git a/clang/include/clang/Basic/DiagnosticCommonKinds.td b/clang/include/clang/Basic/DiagnosticCommonKinds.td
index f2df283c74829f6..4df86e35eebde38 100644
--- a/clang/include/clang/Basic/DiagnosticCommonKinds.td
+++ b/clang/include/clang/Basic/DiagnosticCommonKinds.td
@@ -59,6 +59,9 @@ def err_expected_string_literal : Error<"expected string literal "
           "'external_source_symbol' attribute|"
           "as argument of '%1' attribute}0">;
 
+def err_builtin_pp_embed_invalid_argument : Error<
+  "invalid argument to '__builtin_pp_embed': %0">;
+
 def err_invalid_string_udl : Error<
   "string literal with user-defined suffix cannot be used here">;
 def err_invalid_character_udl : Error<
@@ -80,6 +83,9 @@ def err_expected : Error<"expected %0">;
 def err_expected_either : Error<"expected %0 or %1">;
 def err_expected_after : Error<"expected %1 after %0">;
 
+def err_builtin_pp_embed_invalid_location : Error<
+  "'__builtin_pp_embed' in invalid location: %0%select{|%2}1">;
+
 def err_param_redefinition : Error<"redefinition of parameter %0">;
 def warn_method_param_redefinition : Warning<"redefinition of method parameter %0">;
 def warn_method_param_declaration : Warning<"redeclaration of method parameter %0">,
diff --git a/clang/include/clang/Basic/DiagnosticGroups.td b/clang/include/clang/Basic/DiagnosticGroups.td
index 0b09c002191848a..7ebea56891d4654 100644
--- a/clang/include/clang/Basic/DiagnosticGroups.td
+++ b/clang/include/clang/Basic/DiagnosticGroups.td
@@ -708,6 +708,12 @@ def ReservedIdAsMacro : DiagGroup<"reserved-macro-identifier">;
 def ReservedIdAsMacroAlias : DiagGroup<"reserved-id-macro", [ReservedIdAsMacro]>;
 def RestrictExpansionMacro : DiagGroup<"restrict-expansion">;
 def FinalMacro : DiagGroup<"final-macro">;
+// Warnings about unknown preprocessor parameters (e.g. `#embed` and extensions)
+def UnsupportedDirective : DiagGroup<"unsupported-directive">;
+def UnknownDirectiveParameters : DiagGroup<"unknown-directive-parameters">;
+def IgnoredDirectiveParameters : DiagGroup<"ignored-directive-parameters">;
+def DirectiveParameters : DiagGroup<"directive-parameters",
+    [UnknownDirectiveParameters, IgnoredDirectiveParameters]>;
 
 // Just silence warnings about -Wstrict-aliasing for now.
 def : DiagGroup<"strict-aliasing=0">;
@@ -715,6 +721,7 @@ def : DiagGroup<"strict-aliasing=1">;
 def : DiagGroup<"strict-aliasing=2">;
 def : DiagGroup<"strict-aliasing">;
 
+
 // Just silence warnings about -Wstrict-overflow for now.
 def : DiagGroup<"strict-overflow=0">;
 def : DiagGroup<"strict-overflow=1">;
diff --git a/clang/include/clang/Basic/DiagnosticLexKinds.td b/clang/include/clang/Basic/DiagnosticLexKinds.td
index 940cca67368492f..4490f40806b0345 100644
--- a/clang/include/clang/Basic/DiagnosticLexKinds.td
+++ b/clang/include/clang/Basic/DiagnosticLexKinds.td
@@ -422,6 +422,22 @@ def warn_cxx23_compat_warning_directive : Warning<
 def warn_c23_compat_warning_directive : Warning<
   "#warning is incompatible with C standards before C23">,
   InGroup<CPre23Compat>, DefaultIgnore;
+def warn_c23_pp_embed : Warning<
+  "'__has_embed' is a C23 extension">,
+  InGroup<CPre23Compat>,
+  DefaultIgnore;
+def warn_c23_pp_has_embed : Warning<
+  "'__has_embed' is a C23 extension">,
+  InGroup<CPre23Compat>,
+  DefaultIgnore;
+def warn_cxx26_pp_embed : Warning<
+  "'__has_embed' is a C++26 extension">,
+  InGroup<CXXPre26Compat>,
+  DefaultIgnore;
+def warn_cxx26_pp_has_embed : Warning<
+  "'__has_embed' is a C++26 extension">,
+  InGroup<CXXPre26Compat>,
+  DefaultIgnore;
 
 def ext_pp_extra_tokens_at_eol : ExtWarn<
   "extra tokens at end of #%0 directive">, InGroup<ExtraTokens>;
@@ -483,7 +499,13 @@ def ext_pp_gnu_line_directive : Extension<
 def err_pp_invalid_directive : Error<
   "invalid preprocessing directive%select{|, did you mean '#%1'?}0">;
 def warn_pp_invalid_directive : Warning<
-  err_pp_invalid_directive.Summary>, InGroup<DiagGroup<"unknown-directives">>;
+  err_pp_invalid_directive.Summary>,
+  InGroup<UnsupportedDirective>;
+def warn_pp_unknown_parameter_ignored : Warning<
+  "unknown%select{ | embed}0 preprocessor parameter '%1' ignored">,
+  InGroup<UnknownDirectiveParameters>;
+def err_pp_unsupported_directive : Error<
+  "unsupported%select{ | embed}0 directive: %1">;
 def err_pp_directive_required : Error<
   "%0 must be used within a preprocessing directive">;
 def err_pp_file_not_found : Error<"'%0' file not found">, DefaultFatal;
diff --git a/clang/include/clang/Basic/FileManager.h b/clang/include/clang/Basic/FileManager.h
index 56cb093dd8c376f..c757f8775b425e9 100644
--- a/clang/include/clang/Basic/FileManager.h
+++ b/clang/include/clang/Basic/FileManager.h
@@ -276,11 +276,13 @@ class FileManager : public RefCountedBase<FileManager> {
   /// MemoryBuffer if successful, otherwise returning null.
   llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>>
   getBufferForFile(FileEntryRef Entry, bool isVolatile = false,
-                   bool RequiresNullTerminator = true);
+                   bool RequiresNullTerminator = true,
+                   std::optional<int64_t> MaybeLimit = std::nullopt);
   llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>>
   getBufferForFile(StringRef Filename, bool isVolatile = false,
-                   bool RequiresNullTerminator = true) {
-    return getBufferForFileImpl(Filename, /*FileSize=*/-1, isVolatile,
+                   bool RequiresNullTerminator = true,
+                   std::optional<int64_t> MaybeLimit = std::nullopt) {
+    return getBufferForFileImpl(Filename, /*FileSize=*/(MaybeLimit ? *MaybeLimit : -1), isVolatile,
                                 RequiresNullTerminator);
   }
 
diff --git a/clang/include/clang/Basic/StmtNodes.td b/clang/include/clang/Basic/StmtNodes.td
index cec301dfca2817b..e3be997dd1c86e0 100644
--- a/clang/include/clang/Basic/StmtNodes.td
+++ b/clang/include/clang/Basic/StmtNodes.td
@@ -203,6 +203,7 @@ def OpaqueValueExpr : StmtNode<Expr>;
 def TypoExpr : StmtNode<Expr>;
 def RecoveryExpr : StmtNode<Expr>;
 def BuiltinBitCastExpr : StmtNode<ExplicitCastExpr>;
+def PPEmbedExpr : StmtNode<Expr>;
 
 // Microsoft Extensions.
 def MSPropertyRefExpr : StmtNode<Expr>;
diff --git a/clang/include/clang/Basic/TokenKinds.def b/clang/include/clang/Basic/TokenKinds.def
index 94db56a9fd5d78c..167bd614efe7bd9 100644
--- a/clang/include/clang/Basic/TokenKinds.def
+++ b/clang/include/clang/Basic/TokenKinds.def
@@ -126,6 +126,9 @@ PPKEYWORD(error)
 // C99 6.10.6 - Pragma Directive.
 PPKEYWORD(pragma)
 
+// C23 & C++26 #embed
+PPKEYWORD(embed)
+
 // GNU Extensions.
 PPKEYWORD(import)
 PPKEYWORD(include_next)
@@ -751,6 +754,7 @@ ALIAS("__char32_t"   , char32_t          , KEYCXX)
 KEYWORD(__builtin_bit_cast               , KEYALL)
 KEYWORD(__builtin_available              , KEYALL)
 KEYWORD(__builtin_sycl_unique_stable_name, KEYSYCL)
+KEYWORD(__builtin_pp_embed               , KEYALL)
 
 // Keywords defined by Attr.td.
 #ifndef KEYWORD_ATTRIBUTE
@@ -986,6 +990,7 @@ ANNOTATION(repl_input_end)
 #undef CXX11_KEYWORD
 #undef KEYWORD
 #undef PUNCTUATOR
+#undef BUILTINOK
 #undef TOK
 #undef C99_KEYWORD
 #undef C23_KEYWORD
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 5415b18d3f406df..bfc4b15d5411cde 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -114,6 +114,11 @@ def IncludePath_Group : OptionGroup<"<I/i group>">, Group<Preprocessor_Group>,
                         DocBrief<[{
 Flags controlling how ``#include``\s are resolved to files.}]>;
 
+def EmbedPath_Group : OptionGroup<"<Embed group>">, Group<Preprocessor_Group>,
+                        DocName<"Embed path management">,
+                        DocBrief<[{
+Flags controlling how ``#embed``\s and similar are resolved to files.}]>;
+
 def I_Group : OptionGroup<"<I group>">, Group<IncludePath_Group>, DocFlatten;
 def i_Group : OptionGroup<"<i group>">, Group<IncludePath_Group>, DocFlatten;
 def clang_i_Group : OptionGroup<"<clang i group>">, Group<i_Group>, DocFlatten;
@@ -816,6 +821,14 @@ will be ignored}]>;
 def L : JoinedOrSeparate<["-"], "L">, Flags<[RenderJoined]>, Group<Link_Group>,
     Visibility<[ClangOption, FlangOption]>,
     MetaVarName<"<dir>">, HelpText<"Add directory to library search path">;
+def embed_dir : JoinedOrSeparate<["-"], "embed-dir">,
+    Flags<[RenderJoined]>, Group<EmbedPath_Group>,
+    Visibility<[ClangOption, CC1Option, CC1AsOption, FlangOption, FC1Option]>,
+    MetaVarName<"<dir>">, HelpText<"Add directory to embed search path">;
+def embed_dir_EQ : JoinedOrSeparate<["-"], "embed-dir=">,
+    Flags<[RenderJoined]>, Group<EmbedPath_Group>,
+    Visibility<[ClangOption, CC1Option, CC1AsOption, FlangOption, FC1Option]>,
+    MetaVarName<"<dir>">, HelpText<"Add directory to embed search path">;
 def MD : Flag<["-"], "MD">, Group<M_Group>,
     HelpText<"Write a depfile containing user and system headers">;
 def MMD : Flag<["-"], "MMD">, Group<M_Group>,
@@ -1353,6 +1366,9 @@ def dD : Flag<["-"], "dD">, Group<d_Group>, Visibility<[ClangOption, CC1Option]>
 def dI : Flag<["-"], "dI">, Group<d_Group>, Visibility<[ClangOption, CC1Option]>,
   HelpText<"Print include directives in -E mode in addition to normal output">,
   MarshallingInfoFlag<PreprocessorOutputOpts<"ShowIncludeDirectives">>;
+def dE : Flag<["-"], "dE">, Group<d_Group>, Visibility<[ClangOption, CC1Option]>,
+  HelpText<"Print embed directives in -E mode in addition to normal output">,
+  MarshallingInfoFlag<PreprocessorOutputOpts<"ShowEmbedDirectives">>;
 def dM : Flag<["-"], "dM">, Group<d_Group>, Visibility<[ClangOption, CC1Option]>,
   HelpText<"Print macro definitions in -E mode instead of normal output">;
 def dead__strip : Flag<["-"], "dead_strip">;
diff --git a/clang/include/clang/Frontend/PreprocessorOutputOptions.h b/clang/include/clang/Frontend/PreprocessorOutputOptions.h
index db2ec9f2ae20698..3e36db3f8ce46ea 100644
--- a/clang/include/clang/Frontend/PreprocessorOutputOptions.h
+++ b/clang/include/clang/Frontend/PreprocessorOutputOptions.h
@@ -22,6 +22,7 @@ class PreprocessorOutputOptions {
   unsigned ShowMacroComments : 1;  ///< Show comments, even in macros.
   unsigned ShowMacros : 1;         ///< Print macro definitions.
   unsigned ShowIncludeDirectives : 1;  ///< Print includes, imports etc. within preprocessed output.
+  unsigned ShowEmbedDirectives : 1;  ///< Print embeds, etc. within preprocessed output.
   unsigned RewriteIncludes : 1;    ///< Preprocess include directives only.
   unsigned RewriteImports  : 1;    ///< Include contents of transitively-imported modules.
   unsigned MinimizeWhitespace : 1; ///< Ignore whitespace from input.
@@ -37,6 +38,7 @@ class PreprocessorOutputOptions {
     ShowMacroComments = 0;
     ShowMacros = 0;
     ShowIncludeDirectives = 0;
+    ShowEmbedDirectives = 0;
     RewriteIncludes = 0;
     RewriteImports = 0;
     MinimizeWhitespace = 0;
diff --git a/clang/include/clang/Lex/PPCallbacks.h b/clang/include/clang/Lex/PPCallbacks.h
index 94f96cf9c512541..7028c81dbc84aac 100644
--- a/clang/include/clang/Lex/PPCallbacks.h
+++ b/clang/include/clang/Lex/PPCallbacks.h
@@ -22,11 +22,11 @@
 #include "llvm/ADT/StringRef.h"
 
 namespace clang {
-  class Token;
-  class IdentifierInfo;
-  class MacroDefinition;
-  class MacroDirective;
-  class MacroArgs;
+class Token;
+class IdentifierInfo;
+class MacroDefinition;
+class MacroDirective;
+class MacroArgs;
 
 /// This interface provides a way to observe the actions of the
 /// preprocessor as it does its thing.
@@ -36,9 +36,7 @@ class PPCallbacks {
 public:
   virtual ~PPCallbacks();
 
-  enum FileChangeReason {
-    EnterFile, ExitFile, SystemHeaderPragma, RenameFile
-  };
+  enum FileChangeReason { EnterFile, ExitFile, SystemHeaderPragma, RenameFile };
 
   /// Callback invoked whenever a source file is entered or exited.
   ///
@@ -47,8 +45,7 @@ class PPCallbacks {
   /// the file before the new one entered for \p Reason EnterFile.
   virtual void FileChanged(SourceLocation Loc, FileChangeReason Reason,
                            SrcMgr::CharacteristicKind FileType,
-                           FileID PrevFID = FileID()) {
-  }
+                           FileID PrevFID = FileID()) {}
 
   enum class LexedFileChangeReason { EnterFile, ExitFile };
 
@@ -83,6 +80,47 @@ class PPCallbacks {
                            const Token &FilenameTok,
                            SrcMgr::CharacteristicKind FileType) {}
 
+  /// Callback invoked whenever the preprocessor cannot find a file for an
+  /// embed directive.
+  ///
+  /// \param FileName The name of the file being included, as written in the
+  /// source code.
+  ///
+  /// \returns true to indicate that the preprocessor should skip this file
+  /// and not issue any diagnostic.
+  virtual bool EmbedFileNotFound(StringRef FileName) { return false; }
+
+  /// Callback invoked whenever an embed directive has been processed,
+  /// regardless of whether the embed will actually find a file.
+  ///
+  /// \param HashLoc The location of the '#' that starts the embed directive.
+  ///
+  /// \param FileName The name of the file being included, as written in the
+  /// source code.
+  ///
+  /// \param IsAngled Whether the file name was enclosed in angle brackets;
+  /// otherwise, it was enclosed in quotes.
+  ///
+  /// \param FilenameRange The character range of the quotes or angle brackets
+  /// for the written file name.
+  ///
+  /// \param ParametersRange The character range of the embed parameters. An
+  /// empty range if there were no parameters.
+  ///
+  /// \param File The actual file that may be included by this embed directive.
+  ///
+  /// \param SearchPath Contains the search path which was used to find the file
+  /// in the file system. If the file was found via an absolute path,
+  /// SearchPath will be empty.
+  ///
+  /// \param RelativePath The path relative to SearchPath, at which the resource
+  /// file was found. This is equal to FileName.
+  virtual void EmbedDirective(SourceLocation HashLoc, StringRef FileName,
+                              bool IsAngled, CharSourceRange FilenameRange,
+                              CharSourceRange ParametersRange,
+                              OptionalFileEntryRef File, StringRef SearchPath,
+                              StringRef RelativePath) {}
+
   /// Callback invoked whenever the preprocessor cannot find a file for an
   /// inclusion directive.
   ///
@@ -151,7 +189,7 @@ class PPCallbacks {
   /// \param ForPragma If entering from pragma directive.
   ///
   virtual void EnteredSubmodule(Module *M, SourceLocation ImportLoc,
-                                bool ForPragma) { }
+                                bool ForPragma) {}
 
   /// Callback invoked whenever a submodule was left.
   ///
@@ -162,7 +200,7 @@ class PPCallbacks {
   /// \param ForPragma If entering from pragma directive.
   ///
   virtual void LeftSubmodule(Module *M, SourceLocation ImportLoc,
-                             bool ForPragma) { }
+                             bool ForPragma) {}
 
   /// Callback invoked whenever there was an explicit module-import
   /// syntax.
@@ -174,49 +212,40 @@ class PPCallbacks {
   ///
   /// \param Imported The imported module; can be null if importing failed.
   ///
-  virtual void moduleImport(SourceLocation ImportLoc,
-                            ModuleIdPath Path,
-                            const Module *Imported) {
-  }
+  virtual void moduleImport(SourceLocation ImportLoc, ModuleIdPath Path,
+                            const Module *Imported) {}
 
   /// Callback invoked when the end of the main file is reached.
   ///
   /// No subsequent callbacks will be made.
-  virtual void EndOfMainFile() {
-  }
+  virtual void EndOfMainFile() {}
 
   /// Callback invoked when a \#ident or \#sccs directive is read.
   /// \param Loc The location of the directive.
   /// \param str The text of the directive.
   ///
-  virtual void Ident(SourceLocation Loc, StringRef str) {
-  }
+  virtual void Ident(SourceLocation Loc, StringRef str) {}
 
   /// Callback invoked when start reading any pragma directive.
   virtual void PragmaDirective(SourceLocation Loc,
-                               PragmaIntroducerKind Introducer) {
-  }
+                               PragmaIntroducerKind Introducer) {}
 
   /// Callback invoked when a \#p...
[truncated]

@github-actions
Copy link

github-actions bot commented Oct 9, 2023

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 96b2e35a58819eb2fbe1821650e35a1f0e085bd7 8629e19e7b227d95e676ccee80f9afb2d8922bdf -- clang/include/clang/Lex/PPDirectiveParameter.h clang/include/clang/Lex/PPEmbedParameters.h clang/test/C/C2x/Inputs/boop.h clang/test/C/C2x/n3017.c clang/test/Preprocessor/embed___has_embed.c clang/test/Preprocessor/embed___has_embed_parsing_errors.c clang/test/Preprocessor/embed___has_embed_supported.c clang/test/Preprocessor/embed_art.c clang/test/Preprocessor/embed_codegen.cpp clang/test/Preprocessor/embed_constexpr.cpp clang/test/Preprocessor/embed_dependencies.c clang/test/Preprocessor/embed_ext_compat_diags.c clang/test/Preprocessor/embed_feature_test.cpp clang/test/Preprocessor/embed_file_not_found_chevron.c clang/test/Preprocessor/embed_file_not_found_quote.c clang/test/Preprocessor/embed_init.c clang/test/Preprocessor/embed_parameter_if_empty.c clang/test/Preprocessor/embed_parameter_limit.c clang/test/Preprocessor/embed_parameter_offset.c clang/test/Preprocessor/embed_parameter_prefix.c clang/test/Preprocessor/embed_parameter_suffix.c clang/test/Preprocessor/embed_parameter_unrecognized.c clang/test/Preprocessor/embed_parsing_errors.c clang/test/Preprocessor/embed_path_chevron.c clang/test/Preprocessor/embed_path_quote.c clang/test/Preprocessor/embed_preprocess_to_file.c clang/test/Preprocessor/embed_single_entity.c clang/test/Preprocessor/embed_weird.cpp clang-tools-extra/test/clang-tidy/checkers/modernize/loop-convert-extra.cpp clang-tools-extra/test/pp-trace/pp-trace-macro.cpp clang/include/clang/AST/Expr.h clang/include/clang/AST/RecursiveASTVisitor.h clang/include/clang/AST/TextNodeDumper.h clang/include/clang/Basic/FileManager.h clang/include/clang/Frontend/PreprocessorOutputOptions.h clang/include/clang/Lex/PPCallbacks.h clang/include/clang/Lex/Preprocessor.h clang/include/clang/Lex/PreprocessorOptions.h clang/include/clang/Parse/Parser.h clang/include/clang/Sema/Sema.h clang/include/clang/Serialization/ASTBitCodes.h clang/lib/AST/Expr.cpp clang/lib/AST/ExprClassification.cpp clang/lib/AST/ExprConstant.cpp clang/lib/AST/Interp/ByteCodeExprGen.cpp clang/lib/AST/Interp/ByteCodeExprGen.h clang/lib/AST/ItaniumMangle.cpp clang/lib/AST/StmtPrinter.cpp clang/lib/AST/StmtProfile.cpp clang/lib/AST/TextNodeDumper.cpp clang/lib/Basic/FileManager.cpp clang/lib/Basic/IdentifierTable.cpp clang/lib/CodeGen/CGExprAgg.cpp clang/lib/CodeGen/CGExprConstant.cpp clang/lib/CodeGen/CGExprScalar.cpp clang/lib/Driver/ToolChains/Clang.cpp clang/lib/Frontend/CompilerInvocation.cpp clang/lib/Frontend/DependencyFile.cpp clang/lib/Frontend/DependencyGraph.cpp clang/lib/Frontend/InitPreprocessor.cpp clang/lib/Frontend/PrintPreprocessedOutput.cpp clang/lib/Lex/PPDirectives.cpp clang/lib/Lex/PPExpressions.cpp clang/lib/Lex/PPMacroExpansion.cpp clang/lib/Parse/ParseExpr.cpp clang/lib/Parse/ParseInit.cpp clang/lib/Parse/ParseTemplate.cpp clang/lib/Sema/SemaExceptionSpec.cpp clang/lib/Sema/SemaExpr.cpp clang/lib/Sema/SemaInit.cpp clang/lib/Sema/TreeTransform.h clang/lib/Serialization/ASTReaderStmt.cpp clang/lib/Serialization/ASTWriterStmt.cpp clang/lib/StaticAnalyzer/Core/ExprEngine.cpp clang/test/Preprocessor/init-aarch64.c clang/test/Preprocessor/init.c clang/tools/libclang/CXCursor.cpp
View the diff from clang-format here.
diff --git a/clang/include/clang/AST/Expr.h b/clang/include/clang/AST/Expr.h
index 0bd2201978..45fb5bcc0d 100644
--- a/clang/include/clang/AST/Expr.h
+++ b/clang/include/clang/AST/Expr.h
@@ -4857,9 +4857,7 @@ public:
   EmbedDataStorage *getData() const { return Data; }
 
   unsigned getStartingElementPos() const { return Begin; }
-  size_t getDataElementCount() const {
-    return NumOfElements;
-  }
+  size_t getDataElementCount() const { return NumOfElements; }
 
   template <bool Const>
   class ChildElementIter
@@ -4937,9 +4935,7 @@ public:
     return T->getStmtClass() == EmbedExprClass;
   }
 
-  ChildElementIter<false> begin() {
-    return ChildElementIter<false>(this);
-  }
+  ChildElementIter<false> begin() { return ChildElementIter<false>(this); }
 
   ChildElementIter<true> begin() const {
     return ChildElementIter<true>(const_cast<EmbedExpr *>(this));
diff --git a/clang/lib/AST/ExprConstant.cpp b/clang/lib/AST/ExprConstant.cpp
index 882ed68fd9..3720345d53 100644
--- a/clang/lib/AST/ExprConstant.cpp
+++ b/clang/lib/AST/ExprConstant.cpp
@@ -11140,8 +11140,7 @@ bool ArrayExprEvaluator::VisitCXXParenListOrInitListExpr(
     NumEltsToInit = NumElts;
   } else {
     for (auto *Init : Args) {
-      if (auto *EmbedS =
-              dyn_cast<EmbedExpr>(Init->IgnoreParenImpCasts())) {
+      if (auto *EmbedS = dyn_cast<EmbedExpr>(Init->IgnoreParenImpCasts())) {
         NumEltsToInit += EmbedS->getDataElementCount() - 1;
       }
     }
@@ -11189,8 +11188,7 @@ bool ArrayExprEvaluator::VisitCXXParenListOrInitListExpr(
     const Expr *Init = Index < Args.size() ? Args[Index] : ArrayFiller;
     if (ArrayIndex >= NumEltsToInit)
       break;
-    if (auto *EmbedS =
-            dyn_cast<EmbedExpr>(Init->IgnoreParenImpCasts())) {
+    if (auto *EmbedS = dyn_cast<EmbedExpr>(Init->IgnoreParenImpCasts())) {
       if (!EmbedS->doForEachDataElement(Eval, ArrayIndex))
         return false;
     } else {
diff --git a/clang/lib/AST/TextNodeDumper.cpp b/clang/lib/AST/TextNodeDumper.cpp
index 2ddfd8d7c1..751c9a397c 100644
--- a/clang/lib/AST/TextNodeDumper.cpp
+++ b/clang/lib/AST/TextNodeDumper.cpp
@@ -2721,5 +2721,5 @@ void TextNodeDumper::VisitOpenACCConstructStmt(const OpenACCConstructStmt *S) {
 void TextNodeDumper::VisitEmbedExpr(const EmbedExpr *S) {
   AddChild("begin", [=] { OS << S->getStartingElementPos(); });
   AddChild("number of elements", [=] { OS << S->getDataElementCount(); });
-  //AddChild("embed", [=] { Visit(S->getEmbed()); });
+  // AddChild("embed", [=] { Visit(S->getEmbed()); });
 }
diff --git a/clang/lib/CodeGen/CGExprAgg.cpp b/clang/lib/CodeGen/CGExprAgg.cpp
index 31ee4aec65..ecfda370a7 100644
--- a/clang/lib/CodeGen/CGExprAgg.cpp
+++ b/clang/lib/CodeGen/CGExprAgg.cpp
@@ -511,8 +511,7 @@ void AggExprEmitter::EmitArrayInit(Address DestPtr, llvm::ArrayType *AType,
 
   uint64_t NumArrayElements = AType->getNumElements();
   for (const auto *Init : Args) {
-    if (const auto *Embed =
-            dyn_cast<EmbedExpr>(Init->IgnoreParenImpCasts())) {
+    if (const auto *Embed = dyn_cast<EmbedExpr>(Init->IgnoreParenImpCasts())) {
       NumInitElements += Embed->getDataElementCount() - 1;
       if (NumInitElements > NumArrayElements) {
         NumInitElements = NumArrayElements;
@@ -622,8 +621,7 @@ void AggExprEmitter::EmitArrayInit(Address DestPtr, llvm::ArrayType *AType,
   for (uint64_t i = 0; i != NumInitElements; ++i) {
     if (ArrayIndex >= NumInitElements)
       break;
-    if (auto *EmbedS =
-            dyn_cast<EmbedExpr>(Args[i]->IgnoreParenImpCasts())) {
+    if (auto *EmbedS = dyn_cast<EmbedExpr>(Args[i]->IgnoreParenImpCasts())) {
       EmbedS->doForEachDataElement(Emit, ArrayIndex);
     } else {
       Emit(Args[i], ArrayIndex);
diff --git a/clang/lib/CodeGen/CGExprConstant.cpp b/clang/lib/CodeGen/CGExprConstant.cpp
index de00dbdabc..96d0101b97 100644
--- a/clang/lib/CodeGen/CGExprConstant.cpp
+++ b/clang/lib/CodeGen/CGExprConstant.cpp
@@ -1309,8 +1309,7 @@ public:
     unsigned ArrayIndex = 0;
     for (unsigned i = 0; i < ILE->getNumInits(); ++i) {
       const Expr *Init = ILE->getInit(i);
-      if (auto *EmbedS =
-              dyn_cast<EmbedExpr>(Init->IgnoreParenImpCasts())) {
+      if (auto *EmbedS = dyn_cast<EmbedExpr>(Init->IgnoreParenImpCasts())) {
         if (!EmbedS->doForEachDataElement(Emit, ArrayIndex,
                                           /*EmbedInit=*/true))
           return nullptr;
diff --git a/clang/lib/Sema/SemaInit.cpp b/clang/lib/Sema/SemaInit.cpp
index 3511c1867b..06a25849fa 100644
--- a/clang/lib/Sema/SemaInit.cpp
+++ b/clang/lib/Sema/SemaInit.cpp
@@ -2139,8 +2139,7 @@ void InitListChecker::CheckArrayType(const InitializedEntity &Entity,
             elementIndex + CurEmbedIndex - EmbedElementIndexBeforeInit - 1;
       } else {
         auto Embed = cast<EmbedExpr>(Init);
-        elementIndex = elementIndex +
-                       Embed->getDataElementCount() -
+        elementIndex = elementIndex + Embed->getDataElementCount() -
                        EmbedElementIndexBeforeInit - 1;
       }
     }

@cor3ntin cor3ntin added the c23 label Oct 9, 2023
clang/lib/Format/TokenAnnotator.cpp Outdated Show resolved Hide resolved
clang/lib/Format/TokenAnnotator.cpp Outdated Show resolved Hide resolved
clang/lib/Format/TokenAnnotator.cpp Outdated Show resolved Hide resolved
@AaronBallman
Copy link
Collaborator

FWIW, I spoke offline with the original author of the PR and he said that he's fine with me picking up the changes and carrying the review forward.

Because I don't know of any better way to commandeer a patch in GitHub, I'll probably grab the changes, get them into my own fork, and then recreate the PR to get the review started. Whenever we get ready to land the PR, @ThePhD will be credited as a co-author.

@h-vetinari
Copy link
Contributor

Because I don't know of any better way to commandeer a patch in GitHub

As a maintainer, you can push into this branch (unless @ThePhD unchecked the default setting when creating the PR), including forcefully.

For example (you could also use the github CLI, but I'm using vanilla git to demo):

git remote add ThePhD https://github.com/ThePhD/llvm-project
git fetch ThePhD
git checkout thephd/embed-speed  # git will tell you you've got this branch, tracking the ThePhD remote
# do changes
git push ThePhD  # add `-f` if you've modified the existing commits, e.g. via rebase

@jyknight
Copy link
Member

jyknight commented Oct 9, 2023

This pull request implements the entirety of the now-accepted N3017 - Preprocessor Embed.

Amazing! I had started to think about looking into getting this implemented recently, so it's really nice to see an implementation uploaded now!

I have no intention of following up on this PR. I am too tired to carry it to fruition. Pick this apart, take from it what you want, or reimplement it entirely.

I'm very sorry to hear that, but thank you for the contribution regardless!

@shafik
Copy link
Collaborator

shafik commented Oct 10, 2023

FWIW, I spoke offline with the original author of the PR and he said that he's fine with me picking up the changes and carrying the review forward.

Because I don't know of any better way to commandeer a patch in GitHub, I'll probably grab the changes, get them into my own fork, and then recreate the PR to get the review started. Whenever we get ready to land the PR, @ThePhD will be credited as a co-author.

So just to clarify, we should wait till you post a follow-up review before commenting, to avoid fragmenting the discussion.

@h-vetinari
Copy link
Contributor

Because I don't know of any better way to commandeer a patch in GitHub

As a maintainer, you can push into this branch

I had reached out to @ThePhD with an offer to help on this before they opened this PR, and I now have push access to their fork, which means I could change the state of this PR.

To be clear, I cannot offer to defend the content (semantically) in the face of review, but - if desired - I'd be happy to help with some more routine operations like rebasing, formatting, or chopping off pieces into separate PRs (all while keeping @ThePhD's authorship, of course).

@AaronBallman
Copy link
Collaborator

Because I don't know of any better way to commandeer a patch in GitHub

As a maintainer, you can push into this branch

I had reached out to @ThePhD with an offer to help on this before they opened this PR, and I now have push access to their fork, which means I could change the state of this PR.

To be clear, I cannot offer to defend the content (semantically) in the face of review, but - if desired - I'd be happy to help with some more routine operations like rebasing, formatting, or chopping off pieces into separate PRs (all while keeping @ThePhD's authorship, of course).

Oh, that's excellent! @ThePhD also gave me push access to his repo. If we're going to tag-team efforts on this and we both have access, it probably makes sense to just leave it in his repo and work from there. If he finds the notifications too distracting, we can always move the work elsewhere at that point.

@h-vetinari -- would you mind doing some initial cleanup on the patch for things like rebasing, removing spurious formatting changes, naming, and the likes? I have WG14 meetings coming up next week (they should hopefully go quickly though), so it is likely going to be ~a week before I have the chance to work on this more heavily.

So just to clarify, we should wait till you post a follow-up review before commenting, to avoid fragmenting the discussion.

Given the above, I'd say we can keep the review going here.

@h-vetinari
Copy link
Contributor

@h-vetinari -- would you mind doing some initial cleanup on the patch for things like rebasing, removing spurious formatting changes, naming, and the likes?

I'll try to do that!

…+26.

🛠 [Frontend] Ensure commas inserted by #embed are properly serialized to output
@h-vetinari
Copy link
Contributor

h-vetinari commented Oct 14, 2023

OK, I've got a first rebase going. However, because iterating on a diff with 1000s of lines is extremely hard to review and bound to lead to mistakes and iterations (making it harder still to follow what's going on), I've tried to follow a more structured approach.

Basically, I've pushed tags for intermediate milestones during the rebase to my fork, and am documenting below what I've done. That should make it easier to:

  • understand how the diff evolved
  • make comparisons between the various different stages (e.g. one can add /compare/<tag>..<tag> on Github)
  • roll back to intermediate iterations if something goes wrong, without having to start from scratch
  • validate what I did (for those so inclined) or branch off in a different direction

I've used a simple tag scheme embed_vX[.Y], where X increases if exisiting commits were changed, and Y increases if changes happened without rebasing. I did some basic compilation tests to ensure I didn't accidentally drop/introduce changes that break the build, but no more than that.

The table is probably busted visually on mobile (sorry...), but I hope it reads alright on a desktop. Where used, equality (==) refers to the content of the tree at the commits being compared, not the diffs in those commits.

tag description steps to reproduce compile-tested1 diff to
prev. version
embed_v0 the state of the PR
as opened originally
(base 08b20d8)
check out the tag 32c8097 ✅ (presumably)
357bda5 ❌ (needs adc9737)
dfd6383
adc9737
embed_v1 separated superfluous
formatting changes
from first 2 commits
(base 08b20d8)
Interactive rebase galore
(the most painful part)
0b4b6a6
a05117d ✅ (==v0:32c8097)
8231b72 ❌ (needs 36c1a5f)
0a6f48b ❌ (==v0:357bda5)
b356860 ❌ (==v0:dfd6383)
36c1a5f ✅ (==v0:adc9737)
empty
embed_v1.1 added suggestions from
git-clang-format
(base 08b20d8)
C&P + git apply v1 + 26c6fcc 1 commit;
no rebase
embed_v2 undo superfluous
formatting changes
(base 08b20d8)
git rebase -i ...; drop
a05117d & 0a6f48b, fix
minor conflicts in 26c6fcc
0b4b6a6 ✅ (same as v1)
07795f2 ❌ (needs 748c3b5)
1179116
748c3b5 ✅ (likely; untested)
b6d053e
uninteresting
embed_v3 squash small commits
(base 08b20d8)
git rebase -i ...; squash 0b4b6a6 ✅ (same as v1)
a6f134d
empty
embed_v4 rebase (cleanly) on main
(base 7060422)
git rebase upstream/main 7050c93 ❔ (untested)
5956fc2 ❔ (untested)
uninteresting
embed_v4.1 formatting fix-ups
(base 7060422)
apply suggestions;
inspect diff in UI
v4 + 01983b4 + 5f5d3a6 2 commits;
no rebase
embed_v5 squash fix-ups
(base 7060422)
git rebase -i ...; squash 7050c93 ❔ (same as v4)
6a7a4c9 ❔ (untested)
minor vs. v4
empty vs. v4.1

Since the rebase from v3 to v4 went cleanly, I haven't run the compilation tests. I will push v4 v5 now, and hope that this is a good enough jumping off point for further review and iteration.

Footnotes

  1. cmake -G Ninja -DLLVM_ENABLE_PROJECTS="clang" -DLLVM_BUILD_TOOLS=OFF -DLLVM_CCACHE_BUILD=ON ..\llvm

⚡ [Lex] Better reservations for improved performance/memory usage.

🛠 [Lex, Frontend] Remove comma hardcoding since we are servicing a full file

apply suggestions from git-clang-format
@ThePhD ThePhD changed the title [Sema, Lex, Parse] Preprocessor embed in C and C++ (and Obj-C and Obj-C++ by-proxy) ✨ [Sema, Lex, Parse] Preprocessor embed in C and C++ (and Obj-C and Obj-C++ by-proxy) Oct 19, 2023
@h-vetinari
Copy link
Contributor

JeanHeyd wrote up a blog post about this implementation, which is probably helpful as background material for any prospective reviewers. :)

@h-vetinari
Copy link
Contributor

@AaronBallman, hope the WG14 meeting went well? I've essentially done what I planned to do here, so I was wondering if you're ready to take it from here? I can try to incorporate the review feedback myself, but it's quite likely I'd make a mess of it (not familiar with the clang code base at all). 😅

@Fznamznon
Copy link
Contributor

So, with the latest improvement on my machine

const char x[] = {
#embed "file" suffix(,)
  0
};

Takes ~13 seconds to compile with 200 mb file. It used to take 4m20s without my changes. The following

const char x[] = {
#embed "file"
};

Takes 1.1s to compile in my environment.
I do realize it is not the same performance and there is still a room for improvement. However it is a great improvement comparing to the original state. If that is not enough I'll try to optimize more.

With code move from AddInitializerToDecl to InitializationSequence the case like

struct S {
  char buffer[2] = {
#embed <jk.txt>
  };
};

also works now.
Dear reviewers, I'm looking forward to your feedback regarding the approach and whether the performance improvement is enough.

@Fznamznon Fznamznon marked this pull request as ready for review April 17, 2024 10:11
clang/lib/AST/Interp/ByteCodeExprGen.cpp Outdated Show resolved Hide resolved
clang/lib/AST/Interp/ByteCodeExprGen.cpp Outdated Show resolved Hide resolved
clang/lib/AST/Interp/ByteCodeExprGen.cpp Outdated Show resolved Hide resolved
clang/lib/AST/Interp/ByteCodeExprGen.cpp Outdated Show resolved Hide resolved
@cor3ntin
Copy link
Contributor

@Fznamznon Thanks for looking into that.

I am fairly concerned by the growing complexity of this PR and the long term maintenance burden.
When the embed is smaller than a few elements (and especially when it is of size 1), or when it appears in a context when it does not make sense, we should try to produce tokens instead (ie, convert back the embed into tokens when it appear in a template argument, cast expression, etc)

ExpandSinglePPEmbedExpr, ModifyTemplateArguments in particular should not be needed

Can we merge EmbedSubscriptExpr and EmbedExpr in a single class or find a way to make only one of these things an expression that needs to be visited by all consumers?

@Fznamznon
Copy link
Contributor

Thanks for the feedback @cor3ntin .

Can we merge EmbedSubscriptExpr and EmbedExpr in a single class or find a way to make only one of these things an expression that needs to be visited by all consumers?

I think we still want to be able to reference a part of #embed data, maybe reference the same data several times for cases with nested initializer lists without braces, so we understand where each data element goes.
So, I would prefer having two classes.
I think that all the consumers should just handle EmbedSubscriptExpr, as it allows to represent any part of embed data. Right now whatever handling for PPEmbedExpr remains there doesn't make much sense.
I'll see what I can do.

@cor3ntin
Copy link
Contributor

So, I would prefer having two classes.
I think that all the consumers should just handle EmbedSubscriptExpr, as it allows to represent any part of embed data. Right now whatever handling for PPEmbedExpr remains there doesn't make much sense.
I'll see what I can do.

How aout:

PPEmbedExpr does not need to be an Expr - just a class that holds the data
EmbedSubscriptExpr (which you can rename to EmbedExpr or EmbedSliceExpr) always refer to a PPEmbed and by default it can take the full range (0, number elements)

@Fznamznon
Copy link
Contributor

PPEmbedExpr does not need to be an Expr - just a class that holds the data
EmbedSubscriptExpr (which you can rename to EmbedExpr or EmbedSliceExpr) always refer to a PPEmbed and by default it can take the full range (0, number elements)

SGTM!

@Fznamznon Fznamznon marked this pull request as draft April 22, 2024 17:11
@Fznamznon Fznamznon marked this pull request as ready for review April 25, 2024 16:28
@Fznamznon
Copy link
Contributor

Okay, now there is only one new Expr class - EmbedExpr. I've tried to reduce appearance of it after Parser as much as possible.
No more ExpandSinglePPEmbedExpr, ModifyTemplateArguments functions too.
Embed is mostly expanded in Parser now, except for the case when it appears inside of an initializer list. I was actually trying to follow this idea - #68620 (comment) , however it is unclear to me how to expand everything from ParseCastExpr since it seems to be intended to return a single value.
@cor3ntin , I would appreciate your feedback.

@@ -1062,7 +1062,7 @@ void f() {
E Ee{ { { g( { Array[I] } ) } } };
}
// CHECK-MESSAGES: :[[@LINE-7]]:3: warning: use range-based for loop instead
// CHECK-FIXES: for (int I : Array)
// CHECK-FIXES: for (int & I : Array)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My changes in SemaInit.cpp likely caused this, however the outcome seems correct to me, so I just changed the test.

@Fznamznon
Copy link
Contributor

Gentle ping :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c23 clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules clang:static analyzer clang Clang issues not falling into any other category cmake Build system in general and CMake in particular
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet