-
Notifications
You must be signed in to change notification settings - Fork 11.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Modules] Add -cc1 -flate-module-map-file to load module maps after PCMs #88893
base: main
Are you sure you want to change the base?
Conversation
If the same module map is passed to multiple compilation actions that build PCMs and later load them, we currently create a new FileID for it every time a PCM gets built. This is not very problematic in terms of performance, but it causes us to waste source location space when those PCMs get loaded together: the module map will take source location space in each of the PCMs, effectively multiplying the space it takes by the number of PCMs that we load. PR llvm#83660 has inadvertenly caused more PCMs to be loaded in our internal builds and some actions tipped over the source location limit. To provide a workaround, introduce an option to load module maps after PCMs and ensure we reuse existing FileID if it's available. This allows to reduce the source location space taken by module maps because we will use an existing FileID from some loaded PCM instead of creating a new one each time. There are probably some holes in the approach for more complicated uses. In particular: - It is unclear how the module map shadowing rules are affected. - Code replaying AST files is not aware of the new option and will keep using `fmodule-map-file`, it is unclear whether this acutally causes any issues - ModuleDepCollector does not distinguish betwee late and non-late module maps. The new flag is `-cc1` for now to allow providing a quick workaround for our builds in the wake of llvm#83660 without figuring out all the details. I plan to start a follow up discussion on Discourse to see if this flag should be promoted to Clang flag or if there are alternative solutions that avoid wasting the source location space in duplicate module maps.
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-modules Author: Ilya Biryukov (ilya-biryukov) ChangesIf the same module map is passed to multiple compilation actions that build PCMs and later load them, we currently create a new FileID for it every time a PCM gets built. This is not very problematic in terms of performance, but it causes us to waste source location space when those PCMs get loaded together: the module map will take source location space in each of the PCMs, effectively multiplying the space it takes by the number of PCMs that we load. PR #83660 has inadvertenly caused more PCMs to be loaded in our internal builds and some actions tipped over the source location limit. To provide a workaround, introduce an option to load module maps after PCMs and ensure we reuse existing FileID if it's available. This allows to reduce the source location space taken by module maps because we will use an existing FileID from some loaded PCM instead of creating a new one each time. There are probably some holes in the approach for more complicated uses. In particular:
The new flag is Full diff: https://github.com/llvm/llvm-project/pull/88893.diff 7 Files Affected:
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index e24626913add76..d3fdc054076751 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -3179,6 +3179,12 @@ def fmodule_map_file : Joined<["-"], "fmodule-map-file=">,
MetaVarName<"<file>">,
HelpText<"Load this module map file">,
MarshallingInfoStringVector<FrontendOpts<"ModuleMapFiles">>;
+def flate_module_map_file : Joined<["-"], "flate-module-map-file=">,
+ Group<f_Group>,
+ Visibility<[CC1Option]>,
+ MetaVarName<"<file>">,
+ HelpText<"Load this module map file after all modules">,
+ MarshallingInfoStringVector<FrontendOpts<"LateModuleMapFiles">>;
def fmodule_file : Joined<["-"], "fmodule-file=">,
Group<i_Group>,
Visibility<[ClangOption, CC1Option, CLOption]>,
diff --git a/clang/include/clang/Frontend/FrontendOptions.h b/clang/include/clang/Frontend/FrontendOptions.h
index a738c1f3757682..cc45e68899c7e8 100644
--- a/clang/include/clang/Frontend/FrontendOptions.h
+++ b/clang/include/clang/Frontend/FrontendOptions.h
@@ -536,6 +536,9 @@ class FrontendOptions {
/// The list of module map files to load before processing the input.
std::vector<std::string> ModuleMapFiles;
+ /// \brief The list of module map files to load after ModuleFile.
+ std::vector<std::string> LateModuleMapFiles;
+
/// The list of additional prebuilt module files to load before
/// processing the input.
std::vector<std::string> ModuleFiles;
diff --git a/clang/lib/Frontend/FrontendAction.cpp b/clang/lib/Frontend/FrontendAction.cpp
index b7c9967316f0b8..14745584c6ccaa 100644
--- a/clang/lib/Frontend/FrontendAction.cpp
+++ b/clang/lib/Frontend/FrontendAction.cpp
@@ -919,14 +919,17 @@ bool FrontendAction::BeginSourceFile(CompilerInstance &CI,
if (!BeginSourceFileAction(CI))
return false;
- // If we were asked to load any module map files, do so now.
- for (const auto &Filename : CI.getFrontendOpts().ModuleMapFiles) {
+ auto LoadModuleMap = [&](StringRef Filename) {
if (auto File = CI.getFileManager().getOptionalFileRef(Filename))
CI.getPreprocessor().getHeaderSearchInfo().loadModuleMapFile(
*File, /*IsSystem*/false);
else
CI.getDiagnostics().Report(diag::err_module_map_not_found) << Filename;
- }
+ };
+ // If we were asked to load any module map files, do so now.
+ for (StringRef Filename : CI.getFrontendOpts().ModuleMapFiles)
+ LoadModuleMap(Filename);
+ // Note that late module maps will be loaded after modules.
// If compiling implementation of a module, load its module map file now.
(void)CI.getPreprocessor().getCurrentModuleImplementation();
@@ -1038,6 +1041,11 @@ bool FrontendAction::BeginSourceFile(CompilerInstance &CI,
diag::warn_eagerly_load_for_standard_cplusplus_modules);
}
+ // Some module maps are loaded after modules to allow avoiding redundant
+ // processing if they were processed by modules.
+ for (StringRef Filename : CI.getFrontendOpts().LateModuleMapFiles)
+ LoadModuleMap(Filename);
+
// If there is a layout overrides file, attach an external AST source that
// provides the layouts from that file.
if (!CI.getFrontendOpts().OverrideRecordLayoutsFile.empty() &&
diff --git a/clang/lib/Frontend/Rewrite/FrontendActions.cpp b/clang/lib/Frontend/Rewrite/FrontendActions.cpp
index cf5a9437e89e6c..f6e7aea06535de 100644
--- a/clang/lib/Frontend/Rewrite/FrontendActions.cpp
+++ b/clang/lib/Frontend/Rewrite/FrontendActions.cpp
@@ -255,6 +255,7 @@ class RewriteIncludesAction::RewriteImportsListener : public ASTReaderListener {
Filename, InputKind(Language::Unknown, InputKind::Precompiled));
Instance.getFrontendOpts().ModuleFiles.clear();
Instance.getFrontendOpts().ModuleMapFiles.clear();
+ Instance.getFrontendOpts().LateModuleMapFiles.clear();
// Don't recursively rewrite imports. We handle them all at the top level.
Instance.getPreprocessorOutputOpts().RewriteImports = false;
diff --git a/clang/lib/Lex/ModuleMap.cpp b/clang/lib/Lex/ModuleMap.cpp
index eed7eca2e73562..232420b58f6004 100644
--- a/clang/lib/Lex/ModuleMap.cpp
+++ b/clang/lib/Lex/ModuleMap.cpp
@@ -3130,7 +3130,10 @@ bool ModuleMap::parseModuleMapFile(FileEntryRef File, bool IsSystem,
if (ID.isInvalid()) {
auto FileCharacter =
IsSystem ? SrcMgr::C_System_ModuleMap : SrcMgr::C_User_ModuleMap;
- ID = SourceMgr.createFileID(File, ExternModuleLoc, FileCharacter);
+ if (ExternModuleLoc.isValid())
+ ID = SourceMgr.createFileID(File, ExternModuleLoc, FileCharacter);
+ else // avoid creating redundant FileIDs, they take sloc space in PCMs.
+ ID = SourceMgr.getOrCreateFileID(File, FileCharacter);
}
assert(Target && "Missing target information");
diff --git a/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp b/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
index e19f19b2528c15..a45a8e5267389c 100644
--- a/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
+++ b/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
@@ -242,6 +242,8 @@ ModuleDepCollector::getInvocationAdjustedForModuleBuildWithoutOutputs(
// Remove directly passed modulemap files. They will get added back if they
// were actually used.
CI.getMutFrontendOpts().ModuleMapFiles.clear();
+ // Late module maps can only be readded into ModuleMapFiles.
+ CI.getMutFrontendOpts().LateModuleMapFiles.clear();
auto DepModuleMapFiles = collectModuleMapFiles(Deps.ClangModuleDeps);
for (StringRef ModuleMapFile : Deps.ModuleMapFileDeps) {
diff --git a/clang/test/Modules/late-module-maps.cpp b/clang/test/Modules/late-module-maps.cpp
new file mode 100644
index 00000000000000..2ca4af3ae203a0
--- /dev/null
+++ b/clang/test/Modules/late-module-maps.cpp
@@ -0,0 +1,93 @@
+// RUN: rm -rf %t
+// RUN: split-file %s %t
+
+// We need to check we don't waste source location space for the same file
+// (i.e. base.modulemap) when it's passed to multiple PCM file.
+//
+// *** First, try to use normal module map files.
+// RUN: %clang_cc1 -fmodules -fno-implicit-modules \
+// RUN: -fmodule-map-file=%t/base.modulemap -fmodule-map-file=%t/a.modulemap \
+// RUN: -fmodule-name=a -xc++ -emit-module -o %t/a.pcm %t/a.modulemap
+
+// RUN: %clang_cc1 -fmodules -fno-implicit-modules \
+// RUN: -fmodule-map-file=%t/base.modulemap -fmodule-map-file=%t/a.modulemap -fmodule-map-file=%t/b.modulemap \
+// RUN: -fmodule-file=%t/a.pcm \
+// RUN: -fmodule-name=b -xc++ -emit-module -o %t/b.pcm %t/b.modulemap
+
+// RUN: %clang_cc1 -fmodules -fno-implicit-modules \
+// RUN: -fmodule-map-file=%t/base.modulemap -fmodule-map-file=%t/a.modulemap -fmodule-map-file=%t/b.modulemap \
+// RUN: -fmodule-file=%t/a.pcm -fmodule-file=%t/b.pcm \
+// RUN: -fsyntax-only -print-stats %t/use.cpp 2>&1 \
+// RUN: | FileCheck %t/use.cpp
+
+// *** Switch to -flate-module-map-file and check it produces less loaded SLO entries.
+// RUN: rm %t/*.pcm
+
+// RUN: %clang_cc1 -fmodules -fno-implicit-modules \
+// RUN: -flate-module-map-file=%t/base.modulemap -flate-module-map-file=%t/a.modulemap \
+// RUN: -fmodule-name=a -xc++ -emit-module -o %t/a.pcm %t/a.modulemap
+
+// RUN: %clang_cc1 -fmodules -fno-implicit-modules \
+// RUN: -flate-module-map-file=%t/base.modulemap -flate-module-map-file=%t/a.modulemap -flate-module-map-file=%t/b.modulemap \
+// RUN: -fmodule-file=%t/a.pcm \
+// RUN: -fmodule-name=b -xc++ -emit-module -o %t/b.pcm %t/b.modulemap
+
+// RUN: %clang_cc1 -fmodules -fno-implicit-modules \
+// RUN: -flate-module-map-file=%t/base.modulemap -flate-module-map-file=%t/a.modulemap -flate-module-map-file=%t/b.modulemap \
+// RUN: -fmodule-file=%t/a.pcm -fmodule-file=%t/b.pcm \
+// RUN: -fsyntax-only -print-stats %t/use.cpp 2>&1 \
+// RUN: | FileCheck --check-prefix=CHECK-LATE %t/use.cpp
+
+//--- use.cpp
+// This is a very shaky check, it would be nice if it was more directly looking at which files were loaded.
+// We load 2 SLocEntries less with flate-module-map-file, they correspond to savings from base.modulemap and a.modulemap
+// reusing the FileID in a.pcm and b.pcm.
+// We also have 3 less local SLocEntries, because we get to reuse FileIDs all module maps from PCMs.
+//
+// CHECK: Source Manager Stats:
+// CHECK: 7 local SLocEntries
+// CHECK: 13 loaded SLocEntries
+
+// CHECK-LATE: Source Manager Stats:
+// CHECK-LATE: 4 local SLocEntries
+// CHECK-LATE: 11 loaded SLocEntries
+#include "a.h"
+#include "b.h"
+#include "assert.h"
+
+int main() {
+ return a() + b();
+}
+
+//--- base.modulemap
+module "base" {
+ textual header "assert.h"
+}
+
+//--- a.modulemap
+module "a" {
+ header "a.h"
+ use "base"
+}
+
+//--- b.modulemap
+module "b" {
+ header "b.h"
+ use "a"
+ use "base"
+}
+
+
+//--- assert.h
+#define ASSERT
+
+//--- a.h
+#include "assert.h"
+inline int a() { return 1; }
+
+//--- b.h
+#include "a.h"
+#include "assert.h"
+
+inline int b() { return a() + 1; }
+
|
Have you tried changing the behavior of the existing |
|
I did not do it because I was sure this will break the "module shadowing" rules after reading the code. |
There's quite a few failures: The logic that sets up module structure from module maps before starting to load PCMs definitely needs to be reworked (and I'm not sure if it can be in the first place). I believe we get away with it internally because we always pass either a top-level PCM or a top-level module map for every top-level dependency of our compilation. Therefore, our loading of PCMs ends up being self-contained, which is not always the case. This probably can't be fixed. Btw, #89005 might make this change unnecessary for us as it should also unbreak our existing builds. |
I wanted to bring this up again. We were hitting more source location exhaustion bugs internally recently and landing this would really help us cope with them. @AaronBallman I know we were planning to do the move to 64bit source locations in not-very-distant future, so I wanted to CC you on this patch in case you have opinions. I would appreciate more feedback from the reviewers already in the list, but also adding my colleague @hokein to get some more eyes on it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I just missed it initially.)
I took a quick look and the change looks at least not bad to me.
But I didn't have experience with clang modules and I don't have an idea for the concerns you list.
I don't mind accepting this if you want to land this. I feel it won't affect users. But let's give some time for discussing.
@ChuanqiXu9 thanks for taking a look! Makes sense to give others a chance to chime in, I will wait for more comments. |
To better understand the motivation, I have two questions:
This patch looks good to me, but I think it'd be worth pursuing making this the default behavior, i.e. investigating the test failures and at least understanding what exactly is going on. |
It's possible, but much more complicated to do at build system level and it would be much easier to solve this in the compiler. I want to be very clear that we also commit to support this flag while we are using it, and to remove it when we stop doing so. So if it causes any issues, we're there to help and take on the maintenance burden. The build system is Bazel, the code is available upstream too, btw. Although the specific code that runs modules is not enabled by default, and, AFAIK, Google internally is the only user of that. Definitely the only user that cares about the scale that causes these issues.
The biggest concern that we have for 64bit source location is performance costs. We expect some of our compile actions to time out or run out of memory after the change (because the added overhead is significant), so we would like to budget for it and make sure we get a chance to test it in advance of the upstream defaults actually changing. All of that takes time, so this change is intended to give us a way out the corner that we're in until 64 bit source locations are in.
I can take another look, but my vague memory from the investigation I made last time I checked, is that there were some module map shadowing semantics that we could not support unless we read the module map before loading the PCM. We do not use that internally and rely on the fact that each module name is uniquely identified by its module map, hence we can use the flag without giving it too much thought. I will need to refresh my memory, but I believe the conclusion I had at the time was that this shadowing is fundamentally incompatible with loading the module maps later than PCMs, because it affects which PCMs need to loaded. (The conclusions are a little rushed, I may misremember, will take another look next week). That was another reason why I felt that having this as a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation! This LGTM, but I agree with the sentiment that I'd be nice to have this as the default behavior. Any effort towards that would be appreciated!
If the same module map is passed to multiple compilation actions that build PCMs and later load them, we currently create a new FileID for it every time a PCM gets built.
This is not very problematic in terms of performance, but it causes us to waste source location space when those PCMs get loaded together: the module map will take source location space in each of the PCMs, effectively multiplying the space it takes by the number of PCMs that we load.
E.g. PR #83660 has inadvertenly caused more PCMs to be loaded in our internal builds and some actions tipped over the source location limit, which led us to discovering this issue.
To provide a workaround, this commit introduces a new command-line flag to load module maps after PCMs and ensure we reuse existing FileID if it's available. This allows to reduce the source location space taken by module maps because we will use an existing FileID from some loaded PCM instead of creating a new one each time.
There are probably some holes in the approach for more complicated uses. In particular:
fmodule-map-file
, it is unclear whether this actually causes any issuesThe new flag is
-cc1
for now to allow providing a quick workaround to relieve pressure from our builds. In the long term, we should ideally look for ways to remove the flag and change the default behavior to avoid duplicated allocations of the source location space for the same module map file.