-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[JumpThreading] Limit number of free instructions #75671
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-llvm-transforms Author: Nick Terrell (terrelln) Changes
This PR bails out if there are far too many free instructions, but otherwise doesn't include them in the cost threshold is met, it doesn't include them in the cost. This is desirable because we just want to avoid edge case behavior in compilation speed, and the code size cost is still actually zero. I picked the default threshold somewhat arbitrarily. I selected the highest multiple of 100 that kept the compile time for the repro in Issue #75666 under 1 minute. That example compiles in 6 seconds without the Fixes Issue #75666. Full diff: https://github.com/llvm/llvm-project/pull/75671.diff 1 Files Affected:
diff --git a/llvm/lib/Transforms/Scalar/JumpThreading.cpp b/llvm/lib/Transforms/Scalar/JumpThreading.cpp
index 8603c5cf9c022c..621299efb6b6a8 100644
--- a/llvm/lib/Transforms/Scalar/JumpThreading.cpp
+++ b/llvm/lib/Transforms/Scalar/JumpThreading.cpp
@@ -102,6 +102,11 @@ static cl::opt<unsigned> PhiDuplicateThreshold(
cl::desc("Max PHIs in BB to duplicate for jump threading"), cl::init(76),
cl::Hidden);
+static cl::opt<unsigned> FreeInstDuplicateThreshold(
+ "jump-threading-free-inst-threshold",
+ cl::desc("Max free instructions in BB to duplicate for jump threading"), cl::init(500),
+ cl::Hidden);
+
static cl::opt<bool> ThreadAcrossLoopHeaders(
"jump-threading-across-loop-headers",
cl::desc("Allow JumpThreading to thread across loop headers, for testing"),
@@ -467,6 +472,7 @@ static unsigned getJumpThreadDuplicationCost(const TargetTransformInfo *TTI,
// terminator-based Size adjustment at the end.
Threshold += Bonus;
+ unsigned Free = 0;
// Sum up the cost of each instruction until we get to the terminator. Don't
// include the terminator because the copy won't include it.
unsigned Size = 0;
@@ -488,8 +494,14 @@ static unsigned getJumpThreadDuplicationCost(const TargetTransformInfo *TTI,
return ~0U;
if (TTI->getInstructionCost(&*I, TargetTransformInfo::TCK_SizeAndLatency) ==
- TargetTransformInfo::TCC_Free)
+ TargetTransformInfo::TCC_Free) {
+ // Do not duplicate the BB if it has a lot of free instructions.
+ // In edge cases they can add up and significantly increase compile time of
+ // later passes by bloating the IR.
+ if (Free++ > FreeInstDuplicateThreshold)
+ return ~0U;
continue;
+ }
// All other instructions count for at least one unit.
++Size;
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
0e4c5e2
to
427cf1d
Compare
`JumpThreading` bails out when the duplication cost is too high, however it doesn't consider free instructions (like lifetime annotations) in this cost. Normally this is fine, but in edge case behavior it can end up quadratically increasing the size of the IR, like in Issue llvm#75666. This PR bails out if there are far too many free instructions, but otherwise doesn't include them in the cos. This is desirable because we just want to avoid edge case behavior in compilation speed, and the code size cost is still actually zero. I picked the default threshold somewhat arbitrarily. I selected the highest multiple of 100 that kept the compile time for the repro in Issue llvm#75666 under 1 minute. That example compiles in 6 seconds without the `JumpThreading` pass, 37 seconds with a threshold of 500, and 2:42 hours without the threshold. This is likely conservative, because I didn't want to impact normal compilations that aren't hitting this edge case, so please let me know if I should lower it. Fixes Issue llvm#75666.
427cf1d
to
cbbc1dc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not against this change but I believe we should probably run compile time/perf numbers to make sure this doesn't pessimize things.
@nikic tracks compile time improvements so he might want to comment on this |
are debug info intrinsics considered free? if they are then this patch will cause the presence of debug info to affect codegen, which is something we don't want |
I think so as they're expected to not exist in lowering -- we should probably skip them altogether. |
@aeubanks I believe that these intrinsics are considered free in this context:
|
Is this a hard requirement or a guideline? Currently, with the default settings, you'd need 500 "free" intrinsics for less than 14 non-free instructions to change the decision to duplicate. To me this seems like an extreme number that is unlikely to show up often in practice (but maybe I'm wrong). But if it is a hard requirement, then I get that. If this isn't acceptable, do you have a suggestion for a path forward? |
I think it's an hard requirement. We never want |
Makes sense to me in general to control compile-time. But we need to make sure debug intrinsics are excluded, presence of debug intrinsics must not impact codegen as mentioned earlier. This should probably have at least a test to make sure the option works as expected + a variant with debug intrinsics to make sure those aren't ignored. |
JumpThreading
bails out when the duplication cost is too high, however it doesn't consider free instructions (like lifetime annotations) in this cost. Normally this is fine, but in edge case behavior it can end up quadratically increasing the size of the IR, like in Issue #75666.This PR bails out if there are far too many free instructions, but otherwise doesn't include them in the cost threshold is met, it doesn't include them in the cost. This is desirable because we just want to avoid edge case behavior in compilation speed, and the code size cost is still actually zero.
I picked the default threshold somewhat arbitrarily. I selected the highest multiple of 100 that kept the compile time for the repro in Issue #75666 under 1 minute. That example compiles in 6 seconds without the
JumpThreading
pass, 37 seconds with a threshold of 500, and 2:42 hours without the threshold. This is likely conservative, because I didn't want to impact normal compilations that aren't hitting this edge case, so please let me know if I should lower it.Fixes Issue #75666.