[C++20] [Modules] Introduce reduced BMI #75894

Merged
merged 1 commit into from
Mar 8, 2024

ChuanqiXu9
Member

Close #71034

See
https://discourse.llvm.org/t/rfc-c-20-modules-introduce-thin-bmi-and-decls-hash/74755

This patch introduces the reduced BMI, which omits the definitions of functions and variables when those definitions won't contribute to the ABI.

Testing is a big part of the patch. We want to make sure the reduced BMI behaves the same as the existing, relatively stable fatBMI. This will be very helpful for further reductions.

The user-interface part is left to follow-up patches to ease review.
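To make the reduction rule concrete, here is a minimal sketch of the decision the patch's `MayDefAffectABI` helper appears to make, using a toy declaration type. `ToyDecl`, its fields, and `keptDefinitions` are hypothetical names made up for illustration; they are not Clang API, and the real predicate inspects `FunctionDecl`/`VarDecl` and covers more cases.

```cpp
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for a declaration. The real patch
// inspects clang::FunctionDecl / clang::VarDecl; none of these fields are
// Clang API.
struct ToyDecl {
  std::string name;
  bool isFunction;        // false means "variable"
  bool isInline;
  bool isConstexpr;
  bool fromTemplate;      // function: dependent context; variable: implicit instantiation
  bool atNamespaceScope;  // only consulted for variables
};

// Mirrors the shape of MayDefAffectABI in the diff: true means the definition
// may affect the ABI, so a reduced BMI has to keep it.
bool mayDefAffectABI(const ToyDecl &d) {
  if (d.isInline || d.isConstexpr || d.fromTemplate)
    return true;
  if (!d.isFunction && !d.atNamespaceScope)
    return true;
  return false;
}

// Names of the declarations whose definitions would survive in a reduced BMI.
std::vector<std::string> keptDefinitions(const std::vector<ToyDecl> &decls) {
  std::vector<std::string> kept;
  for (const ToyDecl &d : decls)
    if (mayDefAffectABI(d))
      kept.push_back(d.name);
  return kept;
}
```

In this toy model, a plain non-inline function or a plain namespace-scope variable loses its definition, while inline, constexpr, and template-related entities keep theirs.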

llvmbot added the clang (Clang issues not falling into any other category) and clang:modules (C++20 modules and Clang Header Modules) labels on Dec 19, 2023
@llvmbot
Collaborator

llvmbot commented Dec 19, 2023

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-modules

Author: Chuanqi Xu (ChuanqiXu9)


Patch is 108.92 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/75894.diff

108 Files Affected:

  • (modified) clang/include/clang/Driver/Options.td (+3-1)
  • (modified) clang/include/clang/Frontend/FrontendActions.h (+13-1)
  • (modified) clang/include/clang/Frontend/FrontendOptions.h (+5-1)
  • (modified) clang/include/clang/Serialization/ASTWriter.h (+28-2)
  • (modified) clang/lib/Frontend/CompilerInvocation.cpp (+2)
  • (modified) clang/lib/Frontend/FrontendActions.cpp (+30-6)
  • (modified) clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp (+2)
  • (modified) clang/lib/Serialization/ASTWriter.cpp (+18-14)
  • (modified) clang/lib/Serialization/ASTWriterDecl.cpp (+38-7)
  • (modified) clang/lib/Serialization/GeneratePCH.cpp (+35-2)
  • (modified) clang/test/CXX/basic/basic.link/p10-ex2.cpp (+2)
  • (modified) clang/test/CXX/basic/basic.lookup/basic.lookup.argdep/p4-friend-in-reachable-class.cpp (+4-1)
  • (modified) clang/test/Modules/InheritDefaultArguments.cppm (+3)
  • (modified) clang/test/Modules/Reachability-Private.cpp (+10)
  • (modified) clang/test/Modules/Reachability-func-default-arg.cpp (+3)
  • (modified) clang/test/Modules/Reachability-func-ret.cpp (+3)
  • (modified) clang/test/Modules/Reachability-template-default-arg.cpp (+3)
  • (modified) clang/test/Modules/Reachability-template-instantiation.cpp (+4)
  • (modified) clang/test/Modules/Reachability-using-templates.cpp (+3)
  • (modified) clang/test/Modules/Reachability-using.cpp (+3)
  • (modified) clang/test/Modules/concept.cppm (+4)
  • (modified) clang/test/Modules/concept_differ.cppm (+5)
  • (modified) clang/test/Modules/ctor.arg.dep.cppm (+4)
  • (modified) clang/test/Modules/cxx20-10-1-ex1.cpp (+13)
  • (modified) clang/test/Modules/cxx20-10-1-ex2.cpp (+30-6)
  • (modified) clang/test/Modules/cxx20-10-2-ex2.cpp (+12)
  • (modified) clang/test/Modules/cxx20-10-2-ex5.cpp (+12)
  • (modified) clang/test/Modules/cxx20-10-3-ex1.cpp (+14)
  • (modified) clang/test/Modules/cxx20-10-3-ex2.cpp (+10)
  • (modified) clang/test/Modules/cxx20-10-5-ex1.cpp (+12)
  • (modified) clang/test/Modules/cxx20-import-diagnostics-a.cpp (+39)
  • (modified) clang/test/Modules/cxx20-import-diagnostics-b.cpp (+25)
  • (modified) clang/test/Modules/cxx20-module-file-info-macros.cpp (+3)
  • (modified) clang/test/Modules/deduction-guide.cppm (+3)
  • (modified) clang/test/Modules/deduction-guide2.cppm (+3)
  • (modified) clang/test/Modules/deduction-guide3.cppm (+3)
  • (modified) clang/test/Modules/derived_class.cpp (+3)
  • (modified) clang/test/Modules/duplicated-module-file-eq-module-name.cppm (+4)
  • (modified) clang/test/Modules/enum-class.cppm (+3)
  • (modified) clang/test/Modules/explicitly-specialized-template.cpp (+3)
  • (modified) clang/test/Modules/export-language-linkage.cppm (+7-1)
  • (modified) clang/test/Modules/ftime-trace.cppm (+9)
  • (modified) clang/test/Modules/inconsistent-deduction-guide-linkage.cppm (+6)
  • (modified) clang/test/Modules/inconsistent-export.cppm (+13)
  • (modified) clang/test/Modules/inherited_arg.cppm (+11)
  • (modified) clang/test/Modules/instantiation-argdep-lookup.cppm (+3)
  • (modified) clang/test/Modules/lambdas.cppm (+15)
  • (modified) clang/test/Modules/merge-concepts-cxx-modules.cpp (+12)
  • (modified) clang/test/Modules/merge-constrained-friends.cpp (+3)
  • (modified) clang/test/Modules/merge-lambdas.cppm (+4)
  • (modified) clang/test/Modules/merge-requires-with-lambdas.cppm (+19)
  • (modified) clang/test/Modules/merge-var-template-spec-cxx-modules.cppm (+5)
  • (modified) clang/test/Modules/mismatch-diagnostics.cpp (+11)
  • (modified) clang/test/Modules/module-init-duplicated-import.cppm (+11)
  • (modified) clang/test/Modules/named-modules-adl-2.cppm (+4)
  • (modified) clang/test/Modules/named-modules-adl-3.cppm (+17)
  • (modified) clang/test/Modules/named-modules-adl.cppm (+3)
  • (modified) clang/test/Modules/no-duplicate-codegen-in-GMF.cppm (+8)
  • (modified) clang/test/Modules/pair-unambiguous-ctor.cppm (+9)
  • (modified) clang/test/Modules/partial_specialization.cppm (+3)
  • (modified) clang/test/Modules/placement-new-reachable.cpp (+3)
  • (modified) clang/test/Modules/polluted-operator.cppm (+3)
  • (modified) clang/test/Modules/pr54457.cppm (+3)
  • (modified) clang/test/Modules/pr56916.cppm (+12)
  • (modified) clang/test/Modules/pr58532.cppm (+6)
  • (modified) clang/test/Modules/pr58716.cppm (+1-1)
  • (modified) clang/test/Modules/pr59719.cppm (+3)
  • (modified) clang/test/Modules/pr59780.cppm (+10)
  • (modified) clang/test/Modules/pr59999.cppm (+13)
  • (modified) clang/test/Modules/pr60036.cppm (+14)
  • (modified) clang/test/Modules/pr60085.cppm (+17)
  • (modified) clang/test/Modules/pr60275.cppm (+6-1)
  • (modified) clang/test/Modules/pr60486.cppm (+3)
  • (modified) clang/test/Modules/pr60693.cppm (+4)
  • (modified) clang/test/Modules/pr60775.cppm (+13)
  • (modified) clang/test/Modules/pr60890.cppm (+6)
  • (modified) clang/test/Modules/pr61065.cppm (+13)
  • (modified) clang/test/Modules/pr61065_2.cppm (+15)
  • (modified) clang/test/Modules/pr61067.cppm (+14)
  • (modified) clang/test/Modules/pr61317.cppm (+9)
  • (modified) clang/test/Modules/pr61783.cppm (+8)
  • (modified) clang/test/Modules/pr61892.cppm (+20-20)
  • (modified) clang/test/Modules/pr62158.cppm (+9)
  • (modified) clang/test/Modules/pr62359.cppm (+16)
  • (modified) clang/test/Modules/pr62589.cppm (+3)
  • (modified) clang/test/Modules/pr62705.cppm (+8)
  • (modified) clang/test/Modules/pr62796.cppm (+4)
  • (modified) clang/test/Modules/pr62943.cppm (+12)
  • (modified) clang/test/Modules/pr63544.cppm (+12)
  • (modified) clang/test/Modules/pr63595.cppm (+10)
  • (modified) clang/test/Modules/pr67627.cppm (+4)
  • (modified) clang/test/Modules/pr67893.cppm (+12)
  • (modified) clang/test/Modules/predefined.cpp (+3)
  • (modified) clang/test/Modules/preferred_name.cppm (+10)
  • (modified) clang/test/Modules/redefinition-merges.cppm (+6)
  • (modified) clang/test/Modules/redundant-template-default-arg.cpp (+3)
  • (modified) clang/test/Modules/redundant-template-default-arg2.cpp (+3)
  • (modified) clang/test/Modules/redundant-template-default-arg3.cpp (+3)
  • (modified) clang/test/Modules/search-partitions.cpp (+16)
  • (modified) clang/test/Modules/seperated-member-function-definition-for-template-class.cppm (+12)
  • (modified) clang/test/Modules/template-function-specialization.cpp (+4-1)
  • (modified) clang/test/Modules/template-lambdas.cppm (+15)
  • (modified) clang/test/Modules/template-pack.cppm (+3)
  • (modified) clang/test/Modules/template_default_argument.cpp (+3)
  • (modified) clang/unittests/Sema/SemaNoloadLookupTest.cpp (+4-5)
  • (modified) clang/unittests/Serialization/ForceCheckFileInputTest.cpp (+6-4)
  • (modified) clang/unittests/Serialization/NoCommentsTest.cpp (+5-4)
  • (modified) clang/unittests/Serialization/VarDeclConstantInitTest.cpp (+8-5)
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 1b02087425b751..c8f7675c6170ad 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -7302,7 +7302,9 @@ def ast_view : Flag<["-"], "ast-view">,
 def emit_module : Flag<["-"], "emit-module">,
   HelpText<"Generate pre-compiled module file from a module map">;
 def emit_module_interface : Flag<["-"], "emit-module-interface">,
-  HelpText<"Generate pre-compiled module file from a C++ module interface">;
+  HelpText<"Generate pre-compiled module file from a standard C++ module interface unit">;
+def emit_reduced_module_interface : Flag<["-"], "emit-reduced-module-interface">,
+  HelpText<"Generate reduced prebuilt module interface from a standard C++ module interface unit">;
 def emit_header_unit : Flag<["-"], "emit-header-unit">,
   HelpText<"Generate C++20 header units from header files">;
 def emit_pch : Flag<["-"], "emit-pch">,
diff --git a/clang/include/clang/Frontend/FrontendActions.h b/clang/include/clang/Frontend/FrontendActions.h
index fcce31ac0590ff..00a118d51f58a4 100644
--- a/clang/include/clang/Frontend/FrontendActions.h
+++ b/clang/include/clang/Frontend/FrontendActions.h
@@ -118,6 +118,9 @@ class GenerateModuleAction : public ASTFrontendAction {
   CreateOutputFile(CompilerInstance &CI, StringRef InFile) = 0;
 
 protected:
+  std::vector<std::unique_ptr<ASTConsumer>>
+  CreateMultiplexConsumer(CompilerInstance &CI, StringRef InFile);
+
   std::unique_ptr<ASTConsumer> CreateASTConsumer(CompilerInstance &CI,
                                                  StringRef InFile) override;
 
@@ -147,8 +150,10 @@ class GenerateModuleFromModuleMapAction : public GenerateModuleAction {
   CreateOutputFile(CompilerInstance &CI, StringRef InFile) override;
 };
 
+/// Generates fatBMI (which contains full information to generate the object
+/// files) for C++20 Named Modules.
 class GenerateModuleInterfaceAction : public GenerateModuleAction {
-private:
+protected:
   bool BeginSourceFileAction(CompilerInstance &CI) override;
 
   std::unique_ptr<ASTConsumer> CreateASTConsumer(CompilerInstance &CI,
@@ -158,6 +163,13 @@ class GenerateModuleInterfaceAction : public GenerateModuleAction {
   CreateOutputFile(CompilerInstance &CI, StringRef InFile) override;
 };
 
+/// Only generates the reduced BMI. This action is mainly used by tests.
+class GenerateThinModuleInterfaceAction : public GenerateModuleInterfaceAction {
+private:
+  std::unique_ptr<ASTConsumer> CreateASTConsumer(CompilerInstance &CI,
+                                                 StringRef InFile) override;
+};
+
 class GenerateHeaderUnitAction : public GenerateModuleAction {
 
 private:
diff --git a/clang/include/clang/Frontend/FrontendOptions.h b/clang/include/clang/Frontend/FrontendOptions.h
index 53a8681cfdbba0..2ee342154f8cbf 100644
--- a/clang/include/clang/Frontend/FrontendOptions.h
+++ b/clang/include/clang/Frontend/FrontendOptions.h
@@ -85,9 +85,13 @@ enum ActionKind {
   /// Generate pre-compiled module from a module map.
   GenerateModule,
 
-  /// Generate pre-compiled module from a C++ module interface file.
+  /// Generate pre-compiled module from a standard C++ module interface unit.
   GenerateModuleInterface,
 
+  /// Generate reduced module interface for a standard C++ module interface
+  /// unit.
+  GenerateThinModuleInterface,
+
   /// Generate a C++20 header unit module from a header file.
   GenerateHeaderUnit,
 
diff --git a/clang/include/clang/Serialization/ASTWriter.h b/clang/include/clang/Serialization/ASTWriter.h
index a56929ef0245ee..87cc31a72d95f2 100644
--- a/clang/include/clang/Serialization/ASTWriter.h
+++ b/clang/include/clang/Serialization/ASTWriter.h
@@ -166,6 +166,10 @@ class ASTWriter : public ASTDeserializationListener,
   /// Indicates that the AST contained compiler errors.
   bool ASTHasCompilerErrors = false;
 
+  /// Indicates that we're going to generate the reduced BMI for C++20
+  /// named modules.
+  bool GeneratingReducedBMI = false;
+
   /// Mapping from input file entries to the index into the
   /// offset table where information about that input file is stored.
   llvm::DenseMap<const FileEntry *, uint32_t> InputFileIDs;
@@ -582,7 +586,8 @@ class ASTWriter : public ASTDeserializationListener,
   ASTWriter(llvm::BitstreamWriter &Stream, SmallVectorImpl<char> &Buffer,
             InMemoryModuleCache &ModuleCache,
             ArrayRef<std::shared_ptr<ModuleFileExtension>> Extensions,
-            bool IncludeTimestamps = true, bool BuildingImplicitModule = false);
+            bool IncludeTimestamps = true, bool BuildingImplicitModule = false,
+            bool GeneratingReducedBMI = false);
   ~ASTWriter() override;
 
   ASTContext &getASTContext() const {
@@ -813,6 +818,13 @@ class PCHGenerator : public SemaConsumer {
   const ASTWriter &getWriter() const { return Writer; }
   SmallVectorImpl<char> &getPCH() const { return Buffer->Data; }
 
+  bool isComplete() const { return Buffer->IsComplete; }
+  PCHBuffer *getBufferPtr() { return Buffer.get(); }
+  StringRef getOutputFile() const { return OutputFile; }
+  DiagnosticsEngine &getDiagnostics() const {
+    return SemaPtr->getDiagnostics();
+  }
+
 public:
   PCHGenerator(const Preprocessor &PP, InMemoryModuleCache &ModuleCache,
                StringRef OutputFile, StringRef isysroot,
@@ -820,7 +832,8 @@ class PCHGenerator : public SemaConsumer {
                ArrayRef<std::shared_ptr<ModuleFileExtension>> Extensions,
                bool AllowASTWithErrors = false, bool IncludeTimestamps = true,
                bool BuildingImplicitModule = false,
-               bool ShouldCacheASTInMemory = false);
+               bool ShouldCacheASTInMemory = false,
+               bool GeneratingReducedBMI = false);
   ~PCHGenerator() override;
 
   void InitializeSema(Sema &S) override { SemaPtr = &S; }
@@ -830,6 +843,19 @@ class PCHGenerator : public SemaConsumer {
   bool hasEmittedPCH() const { return Buffer->IsComplete; }
 };
 
+class ReducedBMIGenerator : public PCHGenerator {
+public:
+  ReducedBMIGenerator(const Preprocessor &PP, InMemoryModuleCache &ModuleCache,
+                   StringRef OutputFile, std::shared_ptr<PCHBuffer> Buffer,
+                   bool IncludeTimestamps);
+
+  void HandleTranslationUnit(ASTContext &Ctx) override;
+};
+
+/// Returns true if the definition of D may affect the ABI. If it may not,
+/// we're allowed to eliminate the definition of D in the reduced BMI.
+bool MayDefAffectABI(const Decl *D);
+
 /// A simple helper class to pack several bits in order into (a) 32 bit
 /// integer(s).
 class BitsPacker {
diff --git a/clang/lib/Frontend/CompilerInvocation.cpp b/clang/lib/Frontend/CompilerInvocation.cpp
index 11f3f2c2d6425c..ffaf3db5fdc076 100644
--- a/clang/lib/Frontend/CompilerInvocation.cpp
+++ b/clang/lib/Frontend/CompilerInvocation.cpp
@@ -2567,6 +2567,7 @@ static const auto &getFrontendActionTable() {
 
       {frontend::GenerateModule, OPT_emit_module},
       {frontend::GenerateModuleInterface, OPT_emit_module_interface},
+      {frontend::GenerateThinModuleInterface, OPT_emit_reduced_module_interface},
       {frontend::GenerateHeaderUnit, OPT_emit_header_unit},
       {frontend::GeneratePCH, OPT_emit_pch},
       {frontend::GenerateInterfaceStubs, OPT_emit_interface_stubs},
@@ -4274,6 +4275,7 @@ static bool isStrictlyPreprocessorAction(frontend::ActionKind Action) {
   case frontend::FixIt:
   case frontend::GenerateModule:
   case frontend::GenerateModuleInterface:
+  case frontend::GenerateThinModuleInterface:
   case frontend::GenerateHeaderUnit:
   case frontend::GeneratePCH:
   case frontend::GenerateInterfaceStubs:
diff --git a/clang/lib/Frontend/FrontendActions.cpp b/clang/lib/Frontend/FrontendActions.cpp
index c1d6e71455365c..c9a0b6858ae8ad 100644
--- a/clang/lib/Frontend/FrontendActions.cpp
+++ b/clang/lib/Frontend/FrontendActions.cpp
@@ -184,12 +184,12 @@ bool GeneratePCHAction::BeginSourceFileAction(CompilerInstance &CI) {
   return true;
 }
 
-std::unique_ptr<ASTConsumer>
-GenerateModuleAction::CreateASTConsumer(CompilerInstance &CI,
-                                        StringRef InFile) {
+std::vector<std::unique_ptr<ASTConsumer>>
+GenerateModuleAction::CreateMultiplexConsumer(CompilerInstance &CI,
+                                              StringRef InFile) {
   std::unique_ptr<raw_pwrite_stream> OS = CreateOutputFile(CI, InFile);
   if (!OS)
-    return nullptr;
+    return {};
 
   std::string OutputFile = CI.getFrontendOpts().OutputFile;
   std::string Sysroot;
@@ -210,6 +210,17 @@ GenerateModuleAction::CreateASTConsumer(CompilerInstance &CI,
       +CI.getFrontendOpts().BuildingImplicitModule));
   Consumers.push_back(CI.getPCHContainerWriter().CreatePCHContainerGenerator(
       CI, std::string(InFile), OutputFile, std::move(OS), Buffer));
+  return std::move(Consumers);
+}
+
+std::unique_ptr<ASTConsumer>
+GenerateModuleAction::CreateASTConsumer(CompilerInstance &CI,
+                                        StringRef InFile) {
+  std::vector<std::unique_ptr<ASTConsumer>> Consumers =
+      CreateMultiplexConsumer(CI, InFile);
+  if (Consumers.empty())
+    return nullptr;
+
   return std::make_unique<MultiplexConsumer>(std::move(Consumers));
 }
 
@@ -265,7 +276,12 @@ GenerateModuleInterfaceAction::CreateASTConsumer(CompilerInstance &CI,
   CI.getHeaderSearchOpts().ModulesSkipHeaderSearchPaths = true;
   CI.getHeaderSearchOpts().ModulesSkipPragmaDiagnosticMappings = true;
 
-  return GenerateModuleAction::CreateASTConsumer(CI, InFile);
+  std::vector<std::unique_ptr<ASTConsumer>> Consumers =
+      CreateMultiplexConsumer(CI, InFile);
+  if (Consumers.empty())
+    return nullptr;
+
+  return std::make_unique<MultiplexConsumer>(std::move(Consumers));
 }
 
 std::unique_ptr<raw_pwrite_stream>
@@ -274,6 +290,15 @@ GenerateModuleInterfaceAction::CreateOutputFile(CompilerInstance &CI,
   return CI.createDefaultOutputFile(/*Binary=*/true, InFile, "pcm");
 }
 
+std::unique_ptr<ASTConsumer>
+GenerateThinModuleInterfaceAction::CreateASTConsumer(CompilerInstance &CI,
+                                                     StringRef InFile) {
+  auto Buffer = std::make_shared<PCHBuffer>();
+  return std::make_unique<ReducedBMIGenerator>(
+      CI.getPreprocessor(), CI.getModuleCache(), CI.getFrontendOpts().OutputFile, Buffer,
+      /*IncludeTimestamps=*/+CI.getFrontendOpts().IncludeTimestamps);
+}
+
 bool GenerateHeaderUnitAction::BeginSourceFileAction(CompilerInstance &CI) {
   if (!CI.getLangOpts().CPlusPlusModules) {
     CI.getDiagnostics().Report(diag::err_module_interface_requires_cpp_modules);
@@ -840,7 +865,6 @@ void DumpModuleInfoAction::ExecuteAction() {
 
   const LangOptions &LO = getCurrentASTUnit().getLangOpts();
   if (LO.CPlusPlusModules && !LO.CurrentModule.empty()) {
-
     ASTReader *R = getCurrentASTUnit().getASTReader().get();
     unsigned SubModuleCount = R->getTotalNumSubmodules();
     serialization::ModuleFile &MF = R->getModuleManager().getPrimaryModule();
diff --git a/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp b/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp
index b280a1359d2f27..59f7f955db5097 100644
--- a/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp
+++ b/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp
@@ -65,6 +65,8 @@ CreateFrontendBaseAction(CompilerInstance &CI) {
     return std::make_unique<GenerateModuleFromModuleMapAction>();
   case GenerateModuleInterface:
     return std::make_unique<GenerateModuleInterfaceAction>();
+  case GenerateThinModuleInterface:
+    return std::make_unique<GenerateThinModuleInterfaceAction>();
   case GenerateHeaderUnit:
     return std::make_unique<GenerateHeaderUnitAction>();
   case GeneratePCH:            return std::make_unique<GeneratePCHAction>();
diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp
index 91eb2af8f8ad6a..71d1a577eefcc4 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -4595,10 +4595,12 @@ ASTWriter::ASTWriter(llvm::BitstreamWriter &Stream,
                      SmallVectorImpl<char> &Buffer,
                      InMemoryModuleCache &ModuleCache,
                      ArrayRef<std::shared_ptr<ModuleFileExtension>> Extensions,
-                     bool IncludeTimestamps, bool BuildingImplicitModule)
+                     bool IncludeTimestamps, bool BuildingImplicitModule,
+                     bool GeneratingReducedBMI)
     : Stream(Stream), Buffer(Buffer), ModuleCache(ModuleCache),
       IncludeTimestamps(IncludeTimestamps),
-      BuildingImplicitModule(BuildingImplicitModule) {
+      BuildingImplicitModule(BuildingImplicitModule),
+      GeneratingReducedBMI(GeneratingReducedBMI) {
   for (const auto &Ext : Extensions) {
     if (auto Writer = Ext->createExtensionWriter(*this))
       ModuleFileExtensionWriters.push_back(std::move(Writer));
@@ -5405,18 +5407,20 @@ void ASTWriter::WriteDeclUpdatesBlocks(RecordDataImpl &OffsetsRecord) {
 
     // Add a trailing update record, if any. These must go last because we
     // lazily load their attached statement.
-    if (HasUpdatedBody) {
-      const auto *Def = cast<FunctionDecl>(D);
-      Record.push_back(UPD_CXX_ADDED_FUNCTION_DEFINITION);
-      Record.push_back(Def->isInlined());
-      Record.AddSourceLocation(Def->getInnerLocStart());
-      Record.AddFunctionDefinition(Def);
-    } else if (HasAddedVarDefinition) {
-      const auto *VD = cast<VarDecl>(D);
-      Record.push_back(UPD_CXX_ADDED_VAR_DEFINITION);
-      Record.push_back(VD->isInline());
-      Record.push_back(VD->isInlineSpecified());
-      Record.AddVarDeclInit(VD);
+    if (!GeneratingReducedBMI || MayDefAffectABI(D)) {
+      if (HasUpdatedBody) {
+        const auto *Def = cast<FunctionDecl>(D);
+        Record.push_back(UPD_CXX_ADDED_FUNCTION_DEFINITION);
+        Record.push_back(Def->isInlined());
+        Record.AddSourceLocation(Def->getInnerLocStart());
+        Record.AddFunctionDefinition(Def);
+      } else if (HasAddedVarDefinition) {
+        const auto *VD = cast<VarDecl>(D);
+        Record.push_back(UPD_CXX_ADDED_VAR_DEFINITION);
+        Record.push_back(VD->isInline());
+        Record.push_back(VD->isInlineSpecified());
+        Record.AddVarDeclInit(VD);
+      }
     }
 
     OffsetsRecord.push_back(GetDeclRef(D));
diff --git a/clang/lib/Serialization/ASTWriterDecl.cpp b/clang/lib/Serialization/ASTWriterDecl.cpp
index 43169b2befc687..b7a1562d0422ac 100644
--- a/clang/lib/Serialization/ASTWriterDecl.cpp
+++ b/clang/lib/Serialization/ASTWriterDecl.cpp
@@ -16,6 +16,7 @@
 #include "clang/AST/DeclTemplate.h"
 #include "clang/AST/DeclVisitor.h"
 #include "clang/AST/Expr.h"
+#include "clang/AST/ODRHash.h"
 #include "clang/AST/OpenMPClause.h"
 #include "clang/AST/PrettyDeclStackTrace.h"
 #include "clang/Basic/SourceManager.h"
@@ -40,11 +41,14 @@ namespace clang {
     serialization::DeclCode Code;
     unsigned AbbrevToUse;
 
+    bool GeneratingReducedBMI = false;
+
   public:
     ASTDeclWriter(ASTWriter &Writer, ASTContext &Context,
-                  ASTWriter::RecordDataImpl &Record)
+                  ASTWriter::RecordDataImpl &Record, bool GeneratingReducedBMI)
         : Writer(Writer), Context(Context), Record(Writer, Record),
-          Code((serialization::DeclCode)0), AbbrevToUse(0) {}
+          Code((serialization::DeclCode)0), AbbrevToUse(0),
+          GeneratingReducedBMI(GeneratingReducedBMI) {}
 
     uint64_t Emit(Decl *D) {
       if (!Code)
@@ -270,6 +274,27 @@ namespace clang {
   };
 }
 
+bool clang::MayDefAffectABI(const Decl *D) {
+  if (auto *FD = dyn_cast<FunctionDecl>(D)) {
+    if (FD->isInlined() || FD->isConstexpr())
+      return true;
+
+    if (FD->isDependentContext())
+      return true;
+  }
+
+  if (auto *VD = dyn_cast<VarDecl>(D)) {
+    if (!VD->getDeclContext()->getRedeclContext()->isFileContext() ||
+        VD->isInline() || VD->isConstexpr() || isa<ParmVarDecl>(VD))
+      return true;
+
+    if (VD->getTemplateSpecializationKind() == TSK_ImplicitInstantiation)
+      return true;
+  }
+
+  return false;
+}
+
 void ASTDeclWriter::Visit(Decl *D) {
   DeclVisitor<ASTDeclWriter>::Visit(D);
 
@@ -285,9 +310,12 @@ void ASTDeclWriter::Visit(Decl *D) {
   // have been written. We want it last because we will not read it back when
   // retrieving it from the AST, we'll just lazily set the offset.
   if (auto *FD = dyn_cast<FunctionDecl>(D)) {
-    Record.push_back(FD->doesThisDeclarationHaveABody());
-    if (FD->doesThisDeclarationHaveABody())
-      Record.AddFunctionDefinition(FD);
+    if (!GeneratingReducedBMI || MayDefAffectABI(FD)) {
+      Record.push_back(FD->doesThisDeclarationHaveABody());
+      if (FD->doesThisDeclarationHaveABody())
+        Record.AddFunctionDefinition(FD);
+    } else
+      Record.push_back(0);
   }
 
   // Similar to FunctionDecls, handle VarDecl's initializer here and write it
@@ -295,7 +323,10 @@ void ASTDeclWriter::Visit(Decl *D) {
   // we have finished recursive deserialization, because it can recursively
   // refer back to the variable.
   if (auto *VD = dyn_cast<VarDecl>(D)) {
-    Record.AddVarDeclInit(VD);
+    if (!GeneratingReducedBMI || MayDefAffectABI(VD))
+      Record.AddVarDeclInit(VD);
+    else
+      Record.push_back(0);
   }
 
   // And similarly for FieldDecls. We already serialized whether there is a
@@ -2474,7 +2505,7 @@ void ASTWriter::WriteDecl(ASTContext &Context, Decl *D) {
   assert(ID >= FirstDeclID && "invalid decl ID");
 
   RecordData Record;
-  ASTDeclWriter W(*this, Context, Record);
+  ASTDeclWriter W(*this, Context, Record, GeneratingReducedBMI);
 
   // Build a record for this declaration
   W.Visit(D);
diff --git a/clang/lib/Serialization/GeneratePCH.cpp b/clang/lib/Serialization/GeneratePCH.cpp
index cf8084333811f1..c4c2ac98b3e9d5 100644
--- a/clang/lib/Serialization/GeneratePCH.cpp
+++ b/clang/lib/Serialization/GeneratePCH.cpp
@@ -12,9 +12,11 @@
 //===----------------------------------------------------------------------===//
 
 #include "clang/AST/ASTContext.h"
+#include "clang/Frontend/FrontendDiagnostic.h"
 #include "clang/Lex/HeaderSearch.h"
 #include "clang/Lex/Preprocessor.h"
 #include "clang/Sema/SemaConsumer.h"
+#include "clang/Serialization/ASTReader.h"
 #include "clang/Serialization/ASTWriter.h"
 #include "llvm/Bitstream/BitstreamWriter.h"
 
@@ -25,11 +27,12 @@ PCHGenerator::PCHGenerator(
     StringRef OutputFile, StringRef isysroot, std::shared_ptr<PCHBuffer> Buffer,
     ArrayRef<std::shared_ptr<ModuleFileExtension>> Extensions,
     bool AllowASTWithErrors, bool IncludeTimestamps,
-    bool BuildingImplicitModule, bool ShouldCacheASTInMemory)
+    bool BuildingImplicitModule, bool ShouldCacheASTInMemory,
+    bool GeneratingReducedBMI)
     : PP(PP), OutputFile(OutputFile), isysroot(isysroot.str()),
       SemaPtr(nullptr), Buffer(std::move(Buffer)), Stream(this->Buffer->Data),
       Writer(Stream, this->Buffer->Data, ModuleCache, Extensions,
-             IncludeTimestamps, BuildingImplicitModule),
+             IncludeTimestamps, BuildingImplicitModule, GeneratingReducedBMI),
       AllowASTWithErrors(AllowASTWithErrors),
       ShouldCacheASTInMemory(ShouldCacheASTInMemory) {
   this->Buffer->IsComplete = false;
@@ -78,3 +81,33 @@ ASTMutationListener *PCHGenerator::GetASTMutationListener() {
 ASTDeserializationListener *PCHGenerator::GetASTDeserializationListener() {
   return &Writer;
 }
+
+ReducedBMIGenerator::ReducedBMIGenerator(const Preprocessor &PP,
+                                   InMemoryModuleCache &ModuleCache,
+                                   StringRef OutputFile,
+                                   std::shared_ptr<PCHBuffer> Buffer,
+                                   bool IncludeTimestamps)
+    : PCHGenerator(
+          PP, ModuleCache, OutputFile, llvm::StringRef(), Buffer,
+          /*Extensions=*/ArrayRef<std::shared_ptr<ModuleFileExtension>>(),
+          /*AllowASTWithErrors*/ false, /*IncludeTimestamps=*/IncludeTimestamps,
+          /*BuildingImplicitModule=*/false, /*ShouldCacheASTInMemory=*/false,
+          /*GeneratingReducedBMI=*/true) {}
+
+void ReducedBMIGenerator::HandleTranslationUnit(ASTContext &Ctx) {
+  PCHGenerator::HandleTranslationUnit(Ctx);
+
+  if (!isComplete())
+    return;
+
+  std::error_code EC;
+  auto OS = std::make_unique<llvm::raw_fd_ostream>(getOutputFile(), EC);
+  if (EC) {
+    getDiagnostics().Report(diag::err_fe_unable_...
[truncated]

github-actions bot commented Dec 19, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@ChuanqiXu9
Member Author

ChuanqiXu9 commented Dec 19, 2023

My impression of the feedback is that everyone likes the direction, but we may need more agreement on the user interfaces.

To make it easier to review, I split all the user-interface-related parts into follow-up patches, so the current patch won't affect users. This is almost an NFC : )

This patch introduces a CC1 option, -emit-thin-module-interface, and tests its behavior in almost every place we can to make sure it is stable. While the diff is big, most of it is boilerplate, so don't panic.

I think the patch is in good shape and hope we can land it quickly so that we can start on the work that actually helps users.

@ChuanqiXu9
Member Author

@iains @dwblaikie ping~

@iains
Contributor

iains commented Jan 3, 2024

@ChuanqiXu9 very sorry for the slow review. It would help me if the design was described in the commit message instead of trying to deduce it from the patch (maybe it's in a thread somewhere - so a cross-reference would help).

two immediate questions and one observation:

  • I see you are using a multiplex consumer (actually, for some reason, I thought you were objecting to that part of my design); does this mean that your proposed solution can emit both the object and the reduced BMI from a single compilation job?

  • I was concerned from earlier conversations that this design might require a codegen back end to be instantiated to allow the reduced BMI (which would be bad for --precompile/-fmodule-only type jobs). Any comments?

  • It would be better to avoid introducing more layering violations but, as we discussed in face-to-face meetings, I have less concern on the output side. It still seems to me that the best model is one where we have AST transforms (that very likely need Sema to be correct) and then the serializer is as simple as possible.

so something like

raw AST +======== > codegen (when required)
        |
        + =====> AST transforms ====> BMI output.

As I understand the patch you are combining the transform with the output?
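The two shapes being contrasted here can be sketched with a toy AST. This is only an illustration under made-up names (`Node`, `stripBodies`, `serialize`, `serializeReduced` are all hypothetical and unrelated to Clang's serialization code): model A runs a separate transform and then a policy-free serializer, while model B folds the same decision into the writer, which is roughly how the patch reads.

```cpp
#include <string>
#include <vector>

// Hypothetical toy AST node: a name, a body, and whether the body must be kept.
struct Node {
  std::string name;
  std::string body;
  bool bodyNeededByImporters;
};
using Ast = std::vector<Node>;

// Model A, step 1: a standalone transform that drops unneeded bodies.
Ast stripBodies(Ast ast) {
  for (Node &n : ast)
    if (!n.bodyNeededByImporters)
      n.body.clear();
  return ast;
}

// Model A, step 2: a serializer that applies no policy of its own.
std::string serialize(const Ast &ast) {
  std::string out;
  for (const Node &n : ast)
    out += n.name + "{" + n.body + "};";
  return out;
}

// Model B: the writer itself decides which bodies to emit.
std::string serializeReduced(const Ast &ast) {
  std::string out;
  for (const Node &n : ast)
    out += n.name + "{" + (n.bodyNeededByImporters ? n.body : std::string()) + "};";
  return out;
}
```

Both models produce the same bytes in this toy; the difference is where the policy lives, and therefore whether the serializer stays simple.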

@ChuanqiXu9
Member Author

> @ChuanqiXu9 very sorry for the slow review. It would help me if the design was described in the commit message instead of trying to deduce it from the patch (maybe it's in a thread somewhere - so a cross-reference would help).

Hi @iains, sorry for the confusion. It can be hard to judge how detailed the message should be, so it is no problem at all to ask as many questions as needed. That is the point of the review process, too.

For this patch itself, the design is to introduce a CC1 command-line option, -emit-reduced-module-interface, and the corresponding action, GenerateReducedModuleInterfaceAction. Both are meant to help test the reduced BMI rather than to be used by end users. The design should then be fairly straightforward to follow from the implementation of GenerateReducedModuleInterfaceAction.

As I said in the commit message, this patch itself doesn't involve anything related to user interfaces. I left that to later patches.

two immediate questions and one observation:

  • I see you are using a multiplex consumer (actually, for some reason, I thought you were objecting to that part of my design);

In fact, I did not object to your design because of the multiplex consumer. I objected to implementing it in FrontendAction::CreateWrappedASTConsumer(). I think that is not the right place to introduce language-specific or module-specific things.

does this mean that your proposed solution can emit both the object and the reduced BMI from a single compilation job?

Yes, this is the goal. I already have a patch downstream that does that, but I am still wondering whether I can implement it better.
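To illustrate what "multiplex consumer" means in this exchange, here is a minimal toy sketch in which one pass over the top-level declarations feeds both an object emitter and a reduced-BMI emitter. All names here (`Consumer`, `RecordingConsumer`, `Multiplexer`, `runSingleJob`) are hypothetical stand-ins and are not Clang's real interfaces; the sketch only shows the forwarding pattern, not how the patch wires it up.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an AST consumer (not Clang's real class).
struct Consumer {
  virtual ~Consumer() = default;
  virtual void handleTopLevelDecl(const std::string &decl) = 0;
};

// Records "<tag>:<decl>" into a shared log instead of emitting real output.
struct RecordingConsumer : Consumer {
  std::string tag;
  std::vector<std::string> *log;
  RecordingConsumer(std::string t, std::vector<std::string> *l)
      : tag(std::move(t)), log(l) {}
  void handleTopLevelDecl(const std::string &decl) override {
    log->push_back(tag + ":" + decl);
  }
};

// Forwards every callback to each child consumer, in registration order.
class Multiplexer : public Consumer {
  std::vector<std::unique_ptr<Consumer>> children_;

public:
  explicit Multiplexer(std::vector<std::unique_ptr<Consumer>> children)
      : children_(std::move(children)) {}
  void handleTopLevelDecl(const std::string &decl) override {
    for (auto &child : children_)
      child->handleTopLevelDecl(decl);
  }
};

// One "compilation job": a single walk over the decls drives both outputs.
std::vector<std::string> runSingleJob(const std::vector<std::string> &decls) {
  std::vector<std::string> log;
  std::vector<std::unique_ptr<Consumer>> children;
  children.push_back(std::make_unique<RecordingConsumer>("object", &log));
  children.push_back(std::make_unique<RecordingConsumer>("reduced-bmi", &log));
  Multiplexer mux(std::move(children));
  for (const auto &d : decls)
    mux.handleTopLevelDecl(d);
  return log;
}
```

Calling `runSingleJob({"f", "g"})` in this toy model records each declaration once per child consumer, which is the sense in which a single job can produce both artifacts.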

  • I was concerned from earlier conversations that this design might require a codegen back end to be instantiated to allow the reduced BMI (which would be bad for --precompile/-fmodule-only type jobs). Any comments?

I am not sure I understand this. What does "require a codegen back end to be instantiated to allow the reduced BMI" mean? Do you mean not touching the CodeGen part, or not touching the CodeGen action? My local patch touches the CodeGen action without touching anything really related to CodeGen.

the reduced BMI (which would be bad for --precompile/-fmodule-only type jobs)

For --precompile/-fmodule-only type jobs, I'll create another action for it (similar to the existing GenerateModuleInterfaceAction). Both actions will then reuse the same consumer, ReducedBMIGenerator, to avoid duplicated work.

  • It would be better to avoid introducing more layering violations but, as we discussed in face-to-face meetings, I have less concern on the output side. It still seems to me that the best model is one where we have AST transforms (that very likely need Sema to be correct) and then the serializer is as simple as possible.

Yeah, that should be less of a concern. BTW, the simple serializer/deserializer today is ASTRecordWriter/ASTRecordReader, while the current ASTReader/ASTWriter also do some semantic work.

so something like

raw AST +======== > codegen (when required)
        |
        + =====> AST transforms ====> BMI output.

As I understand the patch you are combining the transform with the output?

On the one hand, the current patch doesn't do that; it is almost an NFC patch, and that work belongs to the following patch. On the other hand, the answer may be yes: I probably do the AST transforms you describe in the AST writer. I don't feel that is so awful.


Given that all of us want the reduced BMI, I suggest we focus on the current patch and discuss the user-interface-related things in the next patch after this one lands.

@iains
Contributor

iains commented Jan 4, 2024

Like I said in the commit message, this patch itself doesn't involve anything relevant to user interfaces. I left it to the latter patches.

Are you in a position to post the next patch (at least as a draft)? That would help me see the direction.

  • I was concerned from earlier conversations that this design might require a codegen back end to be instantiated to allow the reduced BMI (which would be bad for --precompile/-fmodule-only type jobs). Any comments?

I am not sure if I understand this. What does it mean "require a codegen back end to be instantiated to allow the reduced BMI "? Do you mean to not touch CodeGen part or to not touch the CodeGen action? My local patch touched the code gen action without touching any real CodeGen related things.

the reduced BMI (which would be bad for --precompile/-fmodule-only type jobs)

For --precompile/-fmodule-only type jobs, I'll create another action to make it (Similar to existing GenerateModuleInterfaceAction). Then both of the actions will try to reuse the same consumer ReducedBMIGenerator to avoid repeated works.

OK, that answers my concern (which was that we might have to add the code-gen backend to a --precompile if that was the mechanism used to do the BMI reduction).

  • It would be better to avoid introducing more layering violations but, as we discussed in face-to-face meetings, I have less concern on the output side. It still seems to me that the best model is one where we have AST transforms (that very likely need Sema to be correct) and then the serializer is as simple as possible.

Yeah, it should be less concerned. BTW, currently the simpler serializer/deserializer should be ASTRecordWriter/ASTRecordReader. And the current ASTReader/ASTWriter takes some semantical job.

... and, on the reader side, that already gives us some big problems (as I say, I am less concerned on the writer side, but who can see the whole future?).

so something like

raw AST +======== > codegen (when required)
        |
        + =====> AST transforms ====> BMI output.

As I understand the patch you are combining the transform with the output?

On the one hand, the current patch doesn't do that. The current patch is almost a NFC patch. It belongs to the following patch. On the other hand, the answer may be yes. Probably I did the AST transforms you said in the AST Writer. I don't feel it is so awful.

Maybe not for the short-term relatively simple tasks - but we should also take a view on the medium and longer term (for example, GMF decl elision is likely to be helpful to users in reducing both the size of the BMI and the number of decls that need merging on input).

We need the AST in this path to be mutable (including removal of decls); that way each transform can be maintained as a separate entity - I think that if we end up doing "many transforms" as part of the output, it will become very confusing.

(although, to be clear, in the short-term we might agree to do the work in the output - I really do think it would be bad to make that the long term mechanism).

Given all of us loves reduced BMI, I suggest we can focus on current patch then discuss user interfaces related things in the next patch after this got landed.

We do all want to produce the reduced BMI, I agree; but we also always have limited resources to do the work, so that it would be good to try and pick an implementation that will be smooth for the future work too.

I understand the purpose of the current patch better now - and will try to take a more detailed look over the next few days - as noted above, it would help very much to have a preview of the next patch in the series.

ChuanqiXu9 added a commit to ChuanqiXu9/llvm-project that referenced this pull request Jan 4, 2024
This is a draft of the user interfaces for
llvm#75894, requested by @iains
to get a feeling for the proposed future direction.

Note that this is still in an early stage and nothing is decided.
@ChuanqiXu9
Member Author

Like I said in the commit message, this patch itself doesn't involve anything relevant to user interfaces. I left it to the latter patches.

Are you in a position to post the next patch (at least as a draft)? That would help me see the direction.

I posted it here: ChuanqiXu9@efcc7e8, since I haven't figured out how to send stacked PRs on GitHub yet. I guess that is sufficient, since it is still at a really early stage.

  • I was concerned from earlier conversations that this design might require a codegen back end to be instantiated to allow the reduced BMI (which would be bad for --precompile/-fmodule-only type jobs). Any comments?

I am not sure if I understand this. What does it mean "require a codegen back end to be instantiated to allow the reduced BMI "? Do you mean to not touch CodeGen part or to not touch the CodeGen action? My local patch touched the code gen action without touching any real CodeGen related things.

the reduced BMI (which would be bad for --precompile/-fmodule-only type jobs)

For --precompile/-fmodule-only type jobs, I'll create another action to make it (Similar to existing GenerateModuleInterfaceAction). Then both of the actions will try to reuse the same consumer ReducedBMIGenerator to avoid repeated works.

OK, that answers my concern (which was that we might have to add the code-gen backend to a --precompile if that was the mechanism used to do the BMI reduction).

  • It would be better to avoid introducing more layering violations but, as we discussed in face-to-face meetings, I have less concern on the output side. It still seems to me that the best model is one where we have AST transforms (that very likely need Sema to be correct) and then the serializer is as simple as possible.

Yeah, it should be less concerned. BTW, currently the simpler serializer/deserializer should be ASTRecordWriter/ASTRecordReader. And the current ASTReader/ASTWriter takes some semantical job.

... and, on the reader side, that already gives us some big problems (as I say, I am less concerned on the writer side, but who can see the whole future?).

I guess you're referring to the process by which we decide whether a declaration is visible?

so something like

raw AST +======== > codegen (when required)
        |
        + =====> AST transforms ====> BMI output.

As I understand the patch you are combining the transform with the output?

On the one hand, the current patch doesn't do that. The current patch is almost a NFC patch. It belongs to the following patch. On the other hand, the answer may be yes. Probably I did the AST transforms you said in the AST Writer. I don't feel it is so awful.

Maybe not for the short-term relatively simple tasks - but we should also take a view on the medium and longer term (for example, GMF decl elision is likely to be helpful to users in reducing both the size of the BMI and the number of decls that need merging on input).

For GMF decl elision, I posted a patch that implements it in ASTWriter, and I reposted it in #76930. The big problem is that the approach is not formal. (I'm just sharing it; I am not proposing it now.)

We need the AST in this path to be mutable (including removal of decls); that way each transform can be maintained as a separate entity - I think that if we end up doing "many transforms" as part of the output, it will become very confusing.

While the model sounds good, I am pessimistic about implementing it correctly, completely, and efficiently.

(although, to be clear, in the short-term we might agree to do the work in the output - I really do think it would be bad to make that the long term mechanism).

I'm not against that; I just think it is not so bad. There are already many optimizations in the current serialization.

Given all of us loves reduced BMI, I suggest we can focus on current patch then discuss user interfaces related things in the next patch after this got landed.

We do all want to produce the reduced BMI, I agree; but we also always have limited resources to do the work, so that it would be good to try and pick an implementation that will be smooth for the future work too.

Understood. I just think it won't be too bad, and that it is not easy for us to get a much better solution with the resources we have. I prefer not to make the perfect the enemy of the better.

I understand the purpose of the current patch better now - and will try to take a more detailed look over the next few days

I don't intend to land this in clang18, so there is no need to hurry.

  • as noted above, it would help very much to have a preview of the next patch in the series.

Sent. But I feel it is not that helpful for reviewing this patch : )

@ChuanqiXu9
Member Author

@iains would you like to take a look at this? I am OK with not landing this in 18, but I think it is really important to land it in 19. Although that is still some way off, there is other work to do (e.g., the next patch and more testing), so it may be better to start early.

@ChuanqiXu9
Member Author

@mizvekov would you like to take a look at this? This is related (to some degree) to what you said in #79959. In short, in this direction, the pcm won't be compiled into an object file in one-phase compilation (but it still will be in two-phase compilation).

@ChuanqiXu9
Member Author

@iains @mizvekov ping~

@iains
Contributor

iains commented Mar 5, 2024

  • I do not want to block progress, so let's move forward with this patch for now.

  • It seems to me (as we found with GMF decl elision) that the process is quite a bit more complex than simply omitting a decl. We need to elide other decls that are then unused (e.g. decls local to an elided function body) and also avoid emitting types that are now no longer existent.

The process seems to me to be either one of:

  • rewriting the AST (which is why my patch set picked the use of the plugin API since that is the purpose there).
  • walking the AST and marking entities as used / not used / elided.

It still feels to me to be better to have clear separation of this work from the work of streaming - but if we can make clear layers within the streaming, then maybe the maintenance will not be too hard.

@ChuanqiXu9
Member Author

  • I do not want to block progress, so let's move forward with this patch for now.

Yeah. Great to see some progress at last. I think this is really important, since I see more and more people complaining about the performance of modules. I feel this patch series is key to changing users' impression that named modules are purely wrappers for PCH.

  • It seems to me (as we found with GMF decl elision) that the process is quite a bit more complex than simply omitting a decl. We need to elide other decls that are then unused (e.g. decls local to an elided function body) and also avoid emitting types that are now no longer existent.

After looking into the code more, I changed my mind on this. I feel that omitting declarations in ASTWriter is the most efficient and the most natural way.

The idea is this: previously, we started writing the declarations in the current TU from the first declaration. In the C++20 named modules world, that means we start writing from the first declaration in the global module fragment. We can implement GMF decl elision naturally by instead starting from the first declaration in the module purview and writing only the referenced declarations from the GMF during the writing process. This is very natural and efficient.
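The writing order described above can be sketched as a small reachability walk. This is a toy model under stated assumptions, not the actual ASTWriter: `Decl`, its `refs` indices, and `declsToWrite` are hypothetical names, and every index in `refs` is assumed to be valid. Roots are the declarations in the module purview; a GMF declaration is written only if something already being written refers to it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical declaration record for the toy model.
struct Decl {
  std::string name;
  bool inGlobalModuleFragment;    // true: declared in the GMF
  std::vector<std::size_t> refs;  // indices of decls this one refers to
};

// Returns the names of the decls that would be written, in visiting order.
// Purview decls are the roots; GMF decls are only reached through references.
std::vector<std::string> declsToWrite(const std::vector<Decl> &tu) {
  std::vector<std::string> out;
  std::vector<bool> written(tu.size(), false);
  std::vector<std::size_t> stack;

  for (std::size_t root = 0; root < tu.size(); ++root) {
    if (tu[root].inGlobalModuleFragment)
      continue;  // never start from a GMF decl
    stack.push_back(root);
    while (!stack.empty()) {
      std::size_t i = stack.back();
      stack.pop_back();
      if (written[i])
        continue;
      written[i] = true;
      out.push_back(tu[i].name);
      // Push in reverse so references are visited in their listed order.
      for (auto it = tu[i].refs.rbegin(); it != tu[i].refs.rend(); ++it)
        stack.push_back(*it);
    }
  }
  return out;
}
```

In this model, a GMF declaration that nothing in the purview refers to (directly or transitively) never appears in the output, which is the elision being discussed.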

The process seems to me to be either one of:

  • rewriting the AST (which is why my patch set picked the use of the plugin API since that is the purpose there).
  • walking the AST and marking entities as used / not used / elided.

Now I suspect that neither method is efficient, and both add extra burden for developers and users.

It still feels to me to be better to have clear separation of this work from the work of streaming - but if we can make clear layers within the streaming, then maybe the maintenance will not be too hard.

I think the maintenance may not be too hard that way. ASTWriter is not such a devil :) We probably need some refactoring in the serialization to make the code clearer. But the point is that we must get ASTWriter/ASTReader involved. It may not be a good idea to leave ASTWriter/ASTReader as is and add another layer...

For example, it is better not to deserialize a declaration at all than to deserialize it and then decide that it is not wanted or visible. To achieve this, we must touch the serialization layer deeply.

@iains
Contributor

iains commented Mar 5, 2024

Do you expect to make any changes to type streaming?

@ChuanqiXu9
Member Author

Do you expect to make any changes to type streaming?

I don't expect to do that explicitly. The number of types deserialized should decrease naturally once we avoid emitting declarations during writing.

@ChuanqiXu9
Member Author

I am going to land this in a week if no objections come in. I think it is necessary to land this series of patches (to reduce the contents of the BMI) for clang19. And of course, the functionality will be opt-in and experimental for one or two releases.

@iains
Contributor

iains commented Mar 6, 2024

I have no further comments so LGTM if there are no objections from the other reviewers this week.

@cor3ntin
Contributor

cor3ntin commented Mar 6, 2024

Can we add a release note and documentation for this?
Thanks!

@ChuanqiXu9
Member Author

Can we add a release note and documentation for this? Thanks!

The current patch is transparent to users and is only one part of the patch series. I'd like to document it once the whole series is done.

Close llvm#71034

See
https://discourse.llvm.org/t/rfc-c-20-modules-introduce-thin-bmi-and-decls-hash/74755

This patch introduces the reduced BMI, which doesn't contain the definitions of
functions and variables if their definitions won't contribute to the ABI.

Testing is a big part of the patch. We want to make sure the reduced BMI
has the same behavior as the existing and relatively stable
fat BMI. This is pretty helpful for further reduction.

The user interfaces part is left to following patches to ease the
reviewing.
@ChuanqiXu9 ChuanqiXu9 merged commit da00c60 into llvm:main Mar 8, 2024
3 of 4 checks passed
ChuanqiXu9 added a commit that referenced this pull request Apr 15, 2024
This is the driver part of
#75894.

This patch introduces '-fexperimental-modules-reduced-bmi' to enable
generating the reduced BMI.

This patch does the following:
- When `-fexperimental-modules-reduced-bmi` is specified but
`--precompile` is not specified for a module unit, we skip the
precompile phase to avoid an unnecessary two-phase compilation. Then,
if `-c` is specified, we generate the reduced BMI in CodeGenAction
as a by-product.
- When both `-fexperimental-modules-reduced-bmi` and
`--precompile` are specified, we generate the reduced BMI in
GenerateModuleInterfaceAction as a by-product.
- When `-fexperimental-modules-reduced-bmi` is specified for a
non-module unit, we do nothing and do not emit a warning. This is
more user-friendly, since end users can test and experiment
with the feature without asking for help from the build systems.
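
The three cases above can be summarized as a small decision function. This is only an illustrative sketch, not the driver's actual code: `Invocation`, `ReducedBMIProducer`, and `whoEmitsReducedBMI` are hypothetical names, and the result for a module unit compiled with neither `--precompile` nor `-c` is an assumption, since the message above does not cover that case.

```cpp
#include <cassert>

// Which action, if any, produces the reduced BMI as a by-product.
enum class ReducedBMIProducer {
  None,
  CodeGenAction,
  GenerateModuleInterfaceAction
};

// Hypothetical summary of the relevant command-line state.
struct Invocation {
  bool reducedBMIFlag;  // -fexperimental-modules-reduced-bmi
  bool isModuleUnit;    // the input is a module unit
  bool precompile;      // --precompile
  bool compileOnly;     // -c
};

ReducedBMIProducer whoEmitsReducedBMI(const Invocation &inv) {
  // Flag absent, or flag on a non-module unit: silently do nothing.
  if (!inv.reducedBMIFlag || !inv.isModuleUnit)
    return ReducedBMIProducer::None;
  // With --precompile, the module interface action emits it.
  if (inv.precompile)
    return ReducedBMIProducer::GenerateModuleInterfaceAction;
  // Without --precompile the precompile phase is skipped; with -c the
  // reduced BMI falls out of code generation.
  if (inv.compileOnly)
    return ReducedBMIProducer::CodeGenAction;
  // Assumption: not described in the commit message.
  return ReducedBMIProducer::None;
}
```

For instance, `whoEmitsReducedBMI({true, true, false, true})` models a plain `-c` compile of a module unit with the flag enabled and yields `CodeGenAction` in this sketch.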

The core design idea is that users should be able to enable this easily
with the existing cmake mechanisms.

The future plan for the flag is:
- Add this to clang19 and keep it opt-in for 1~2 releases. How long it
stays opt-in depends on the testing feedback.
- Then we can announce that the existing BMI generation may be deprecated
and suggest that people (end users or build systems) enable this for 1~2
releases.
- Finally we will enable this by default. When that time comes, the term
`BMI` will refer to today's reduced BMI, and the existing BMI will only
be meaningful to build systems that want to support two-phase
compilation.

I'll send release notes and documentation in separate commits after this
lands.
bazuzi pushed a commit to bazuzi/llvm-project that referenced this pull request Apr 15, 2024
…85050)

aniplcc pushed a commit to aniplcc/llvm-project that referenced this pull request Apr 15, 2024
…85050)

Labels
clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category
Development

Successfully merging this pull request may close these issues.

[C++20] [Modules] Try to introduce thin BMI to exclude the things not necessary in an interface
4 participants