-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[DFAJumpThreading] Constraint the number of cloned instructions #161632
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DFAJumpThreading] Constraint the number of cloned instructions #161632
Conversation
@llvm/pr-subscribers-llvm-transforms Author: Hongyu Chen (XChy) ChangesDuplicating blocks of threaded paths may cause a significant regression in IR size and slow down compile-time in later optimizations. This patch adds a coarse constraint on the number of duplicated instructions. New compile-time number: https://llvm-compile-time-tracker.com/compare.php?from=6935a3362628e823451baf86d15694d57f9664ed&to=455bd99ac151dd0445737f6e02210c2a4ec2dac5&stat=instructions%3Au Full diff: https://github.com/llvm/llvm-project/pull/161632.diff 1 Files Affected:
diff --git a/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp b/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
index e9a3e983bc1e2..3bd7f8e2fcf93 100644
--- a/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
+++ b/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
@@ -120,6 +120,12 @@ static cl::opt<unsigned>
cl::desc("Maximum cost accepted for the transformation"),
cl::Hidden, cl::init(50));
+static cl::opt<double> MaxClonedRate(
+ "dfa-max-cloned-rate",
+ cl::desc(
+ "Maximum cloned instructions rate accepted for the transformation"),
+ cl::Hidden, cl::init(7.5));
+
namespace {
class SelectInstToUnfold {
@@ -828,6 +834,7 @@ struct TransformDFA {
/// also returns false if it is illegal to clone some required block.
bool isLegalAndProfitableToTransform() {
CodeMetrics Metrics;
+ uint64_t NumClonedInst = 0;
SwitchInst *Switch = SwitchPaths->getSwitchInst();
// Don't thread switch without multiple successors.
@@ -837,7 +844,6 @@ struct TransformDFA {
// Note that DuplicateBlockMap is not being used as intended here. It is
// just being used to ensure (BB, State) pairs are only counted once.
DuplicateBlockMap DuplicateMap;
-
for (ThreadingPath &TPath : SwitchPaths->getThreadingPaths()) {
PathType PathBBs = TPath.getPath();
APInt NextState = TPath.getExitValue();
@@ -848,6 +854,7 @@ struct TransformDFA {
BasicBlock *VisitedBB = getClonedBB(BB, NextState, DuplicateMap);
if (!VisitedBB) {
Metrics.analyzeBasicBlock(BB, *TTI, EphValues);
+ NumClonedInst += range_size(*BB);
DuplicateMap[BB].push_back({BB, NextState});
}
@@ -865,6 +872,7 @@ struct TransformDFA {
if (VisitedBB)
continue;
Metrics.analyzeBasicBlock(BB, *TTI, EphValues);
+ NumClonedInst += range_size(*BB);
DuplicateMap[BB].push_back({BB, NextState});
}
@@ -901,6 +909,22 @@ struct TransformDFA {
}
}
+ // Too much cloned instructions slow down later optimizations, especially
+ // SLPVectorizer.
+ // TODO: Thread the switch partially before reaching the threshold.
+ uint64_t NumOrigInst = 0;
+ for (auto &[BB, _] : DuplicateMap)
+ NumOrigInst += range_size(*BB);
+ if (double(NumClonedInst) / double(NumOrigInst) > MaxClonedRate) {
+ LLVM_DEBUG(dbgs() << "DFA Jump Threading: Not jump threading, too much "
+ "instructions wll be cloned\n");
+ ORE->emit([&]() {
+ return OptimizationRemarkMissed(DEBUG_TYPE, "NotProfitable", Switch)
+ << "Too much instructions will be cloned.";
+ });
+ return false;
+ }
+
InstructionCost DuplicationCost = 0;
unsigned JumpTableSize = 0;
|
How many transformations are restricted by this (would it be possible to check stats on the test suite)? I fear that restricting possibly beneficial transformations to avoid second-order effects might not be the best way forward. |
I have checked the statistics on the test suite. Only reduce one transformation in the outlier
Passes like inlining and loop unrolling would prevent massive expansion; otherwise, the compile time will blow up in later optimizations, although there are actually benefits after expansion. That's an acceptable trade-off for me. The outlier libclamav_nsis_LZMADecode.c grows from about 1K lines to over 4K lines after DFAJumpThreading, that's unacceptable to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Duplicating blocks of threaded paths may cause a significant regression in IR size and slow down compile-time in later optimizations. This patch adds a coarse constraint on the number of duplicated instructions.
New compile-time number: https://llvm-compile-time-tracker.com/compare.php?from=6935a3362628e823451baf86d15694d57f9664ed&to=455bd99ac151dd0445737f6e02210c2a4ec2dac5&stat=instructions%3Au