Commit
Module::DefinitionLoc
This is a prep patch for avoiding the quadratic number of calls to `HeaderSearch::lookupModule()` in `ASTReader` for each (transitively) loaded PCM file (specifically in the context of `clang-scan-deps`).

This patch explicitly serializes `Module::DefinitionLoc` so that we can stop relying on it being filled in by the module map parser. This also required a change to the module map parser, where we used the absence of `DefinitionLoc` to determine whether a file came from a PCM file. We also need to make sure we consider the "containing" module map as affecting when writing a PCM, so that it's not stripped during serialization, which ensures `DefinitionLoc` still ends up pointing at the correct offset.

This is intended to be an NFC change.

Reviewed By: benlangmuir

Differential Revision: https://reviews.llvm.org/D150292
1 comment
on commit abcf7ce
It looks like this caused a large compile time regression for our module builds (seeing +100% wall, +20% CPU).
The CPU profiles show a new hot path: `collectNonAffectingInputFiles` → `GetAffectingModuleMaps` → `ProcessModuleOnce` → `ForIncludeChain` → `SourceManager::translateFile`, which accounts for 20% CPU and was previously tiny (<0.5%).
At the bottom of that there are some calls to `SourceManager::getLoadedSlocEntry`, so I guess the extra wall time is I/O loading SLocEntries from imported modules.
I'm trying to understand exactly why the regression is so large. `translateFile` is an expensive function (linear search!), so calling it in a loop could explain it. However, we should "only" be calling it twice as often as before this patch, unless the vast majority of our modules are inferred, which I don't think is the case.
Anyway, I'm going to do some more investigation over the next few days, just wanted to let you know in case you had ideas.
(Sorry to bring this up late - we've got compiler-release benchmarks but I think none for builds of large modules...)
We're potentially doing a lot of repeated work here, if we visit the same module map multiple times in `ForIncludeChain`. We should change this lambda to return `ModuleMaps.insert(F).second`, and make `ForIncludeChain` stop if it returns `false`. Though I'm not sure that's related to the high performance cost of this change.