[clang][CodeGen] Omit pre-opt link when post-opt link is requested #85672
Conversation
Currently, when the -relink-builtin-bitcode-postop option is used we link builtin bitcodes twice: once before optimization, and again after optimization. With this change, we omit the pre-opt linking when the option is set, and we rename the option to -link-builtin-bitcode-postopt.
@llvm/pr-subscribers-clang-driver @llvm/pr-subscribers-clang-codegen
Author: Jacob Lambert (lamb-j)
Changes: The goal of this change is to reduce compile time. We do lose the benefits of pre-opt linking, but we may be able to address this in a future patch by adjusting the position of the builtin-bitcode linking pass. Compilations that do not set the option are unaffected.
Full diff: https://github.com/llvm/llvm-project/pull/85672.diff (3 Files Affected)
diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp
index 82b30b8d815629..c5571eb15ce3a9 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -112,9 +112,9 @@ static cl::opt<bool> ClSanitizeOnOptimizerEarlyEP(
extern cl::opt<InstrProfCorrelator::ProfCorrelatorKind> ProfileCorrelate;
// Re-link builtin bitcodes after optimization
-cl::opt<bool> ClRelinkBuiltinBitcodePostop(
- "relink-builtin-bitcode-postop", cl::Optional,
- cl::desc("Re-link builtin bitcodes after optimization."), cl::init(false));
+cl::opt<bool> ClLinkBuiltinBitcodePostopt(
+ "link-builtin-bitcode-postopt", cl::Optional,
+ cl::desc("Link builtin bitcodes after optimization."), cl::init(false));
} // namespace llvm
namespace {
@@ -1051,11 +1051,8 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
}
}
- // Re-link against any bitcodes supplied via the -mlink-builtin-bitcode option
- // Some optimizations may generate new function calls that would not have
- // been linked pre-optimization (i.e. fused sincos calls generated by
- // AMDGPULibCalls::fold_sincos.)
- if (ClRelinkBuiltinBitcodePostop)
+ // Link against bitcodes supplied via the -mlink-builtin-bitcode option
+ if (ClLinkBuiltinBitcodePostopt)
MPM.addPass(LinkInModulesPass(BC, false));
// Add a verifier pass if requested. We don't have to do this if the action
diff --git a/clang/lib/CodeGen/CodeGenAction.cpp b/clang/lib/CodeGen/CodeGenAction.cpp
index bb9aaba025fa59..51f54d178672d1 100644
--- a/clang/lib/CodeGen/CodeGenAction.cpp
+++ b/clang/lib/CodeGen/CodeGenAction.cpp
@@ -58,7 +58,7 @@ using namespace llvm;
#define DEBUG_TYPE "codegenaction"
namespace llvm {
-extern cl::opt<bool> ClRelinkBuiltinBitcodePostop;
+extern cl::opt<bool> ClLinkBuiltinBitcodePostopt;
}
namespace clang {
@@ -359,7 +359,7 @@ void BackendConsumer::HandleTranslationUnit(ASTContext &C) {
}
// Link each LinkModule into our module.
- if (LinkInModules(getModule()))
+ if (!ClLinkBuiltinBitcodePostopt && LinkInModules(getModule()))
return;
for (auto &F : getModule()->functions()) {
@@ -1213,7 +1213,7 @@ void CodeGenAction::ExecuteAction() {
std::move(LinkModules), *VMContext, nullptr);
// Link in each pending link module.
- if (Result.LinkInModules(&*TheModule))
+ if (!ClLinkBuiltinBitcodePostopt && Result.LinkInModules(&*TheModule))
return;
// PR44896: Force DiscardValueNames as false. DiscardValueNames cannot be
diff --git a/clang/lib/CodeGen/LinkInModulesPass.cpp b/clang/lib/CodeGen/LinkInModulesPass.cpp
index 929539cc8f3346..ce1c0ebe046a61 100644
--- a/clang/lib/CodeGen/LinkInModulesPass.cpp
+++ b/clang/lib/CodeGen/LinkInModulesPass.cpp
@@ -28,10 +28,6 @@ PreservedAnalyses LinkInModulesPass::run(Module &M, ModuleAnalysisManager &AM) {
if (!BC)
return PreservedAnalyses::all();
- // Re-load bitcode modules from files
- if (BC->ReloadModules(&M))
- report_fatal_error("Bitcode module re-loading failed, aborted!");
-
if (BC->LinkInModules(&M, ShouldLinkFiles))
report_fatal_error("Bitcode module re-linking failed, aborted!");
This seems like it could have a significant hit on performance. Have the runtime performance effects been measured?
This means that there won't be any optimizations on these definitions, correct? Likely not ideal to have no inlining even if it saves compilation time.

This "post-opt" linking is only required because we emit calls to functions that don't exist in the module. The way we solved this in OpenMP is by always providing the library at the link step and making the optimization passes not emit new calls if we are in "post-link LTO" as determined by module flags. We could theoretically just force all AMDGPU compilation to go through the LTO pass, something like
@b-sumner I just got some initial results back from CQE related to this change. There are some small performance hits (1-4%) for some applications, but as a whole this improves performance across the OpenCL apps tested. (See comment below for possible explanation)
@jhuber6 In the context in which this option will be used (CLR compilation), we do perform a distinct codegen call after generating the output bitcodes where this option is used and the device libraries are linked in. So in a sense, we are deferring some optimization of the device libraries to that second call. From our offline discussions, the consensus seems to be that a better long-term strategy is needed for linking in device libraries, preferably something that doesn't rely on -mlink-builtin-bitcode, and that there is progress being made to update the libraries to support an alternative linking strategy. I am totally on board with this effort, and would be happy to help get us there. However there are still several barriers in the libraries (example ockl) that need to be ironed out before we can fully switch to a new approach: switching from globals to libcall recognition, needing separate entry points for optimized versions, issues with wavefront, etc. In the meantime, I'd like to push this change forward, as I think it's an incremental improvement over what we currently have:
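To make the two-stage flow described above concrete, here is a rough command-line sketch (the file names and target triple are illustrative, not taken from the patch; at this point in the patch the switch is still a cl::opt, so it is spelled via -mllvm):

```shell
# Stage 1: emit bitcode. -mlink-builtin-bitcode names the device
# libraries, and -link-builtin-bitcode-postopt defers their linking
# to a pass that runs after the optimization pipeline, skipping the
# pre-opt link.
clang -x cl -c -emit-llvm --target=amdgcn-amd-amdhsa \
  -Xclang -mlink-builtin-bitcode -Xclang device-lib.bc \
  -mllvm -link-builtin-bitcode-postopt \
  kernel.cl -o kernel.bc

# Stage 2: a separate codegen invocation, in which the linked-in
# library code finally gets optimized.
clang --target=amdgcn-amd-amdhsa -c kernel.bc -o kernel.o
```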
I doubt it's a good idea to give any weight to very short running kernels where the compilation time is mistakenly being factored into the runtime. I want to be sure that a kernel which is calling math functions or whatever billions of times is as optimized as we can make it.
The motivation for this change is a 20% drop in performance on an OpenCL kernel due to increased build times, right? Is this some core software that has been deemed unacceptable to regress, even though it's so small that compilation time is the majority of the time spent? However, we do need to further improve our infrastructure to make linking this stuff directly more viable.
I've been working on testing this patch with an array of OpenCL benchmarks over the past month. We did some high-level regression testing with the following benchmarks:
With apps, we didn't see any significant regressions. I also did some in-depth testing with FAHbench and Chimex: FAHBench Current:
With this PR:
Between the two builds, the total runtime difference is ~1.3 seconds, and the difference in compilation time is also ~1.3 seconds. So this does seem to confirm that we're only removing overhead with this PR, not introducing regressions. I also looked at the intermediate files. If we dump the two final .so files, they're nearly identical, with only a few lines differing. Chimex Current:
With this PR:
@lamb-j I appreciate your carrying out these performance tests! Would you be open to providing an option to enable post-link optimization that could be off by default for now? |
Currently, if a user doesn't supply the new -link-builtin-bitcode-postopt option, linking builtin bitcodes happens first, and then the optimization pipeline follows. Does that cover the case you're talking about?
I'm thinking of an option that developers can use. If -link-builtin-bitcode-postopt becomes the default, how can developers disable it?
Presumably you'd add -mno-link-builtin-bitcode-postopt.
I assume I'd need an -mno-link-builtin-bitcode-postopt option as well.
Switched the option from a cl::opt to a Clang/CodeGen option. Also added a LIT test which tests both -mlink-builtin-bitcode-postopt and -mno-link-builtin-bitcode-postopt.
Looks good to me.
Hopefully in the future we can handle this in the linker better.
After 11a6799 "[clang][CodeGen] Omit pre-opt link when post-opt link is requested (llvm#85672)" I'm seeing a new warning:

> BackendConsumer.h:37:22: error: private field 'FileMgr' is not used [-Werror,-Wunused-private-field]

Remove the field since it's no longer used.
Follow-up PR to fix a warning about the now-unused private variable (FileMgr) in the BackendConsumer class.
Currently, when the -relink-builtin-bitcodes-postop option is used we link builtin bitcodes twice: once before optimization, and again after optimization.
With this change, we omit the pre-opt linking when the option is set, and we rename the option to the following:
-Xclang -mlink-builtin-bitcode-postopt
(-Xclang -mno-link-builtin-bitcode-postopt)
The goal of this change is to reduce compile time. We do lose the theoretical benefits of pre-opt linking, but in practice these are smaller than the overhead of linking twice. However, we may be able to address this in a future patch by adjusting the position of the builtin-bitcode linking pass.
Compilations not setting the option are unaffected.
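A minimal usage sketch of the renamed driver-level flags (the input file name is illustrative):

```shell
# Defer builtin-bitcode linking until after the optimization pipeline:
clang -c kernel.cl -Xclang -mlink-builtin-bitcode-postopt -o kernel.o

# Explicitly request the default behavior (link before optimization):
clang -c kernel.cl -Xclang -mno-link-builtin-bitcode-postopt -o kernel.o
```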