Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JumpThreading] Limit number of free instructions #75671

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

terrelln
Copy link
Contributor

@terrelln terrelln commented Dec 16, 2023

JumpThreading bails out when the duplication cost is too high, however it doesn't consider free instructions (like lifetime annotations) in this cost. Normally this is fine, but in edge case behavior it can end up quadratically increasing the size of the IR, like in Issue #75666.

This PR bails out if there are far too many free instructions, but otherwise doesn't include them in the cost threshold is met, it doesn't include them in the cost. This is desirable because we just want to avoid edge case behavior in compilation speed, and the code size cost is still actually zero.

I picked the default threshold somewhat arbitrarily. I selected the highest multiple of 100 that kept the compile time for the repro in Issue #75666 under 1 minute. That example compiles in 6 seconds without the JumpThreading pass, 37 seconds with a threshold of 500, and 2:42 hours without the threshold. This is likely conservative, because I didn't want to impact normal compilations that aren't hitting this edge case, so please let me know if I should lower it.

Fixes Issue #75666.

@llvmbot
Copy link
Collaborator

llvmbot commented Dec 16, 2023

@llvm/pr-subscribers-llvm-transforms

Author: Nick Terrell (terrelln)

Changes

JumpThreading bails out when the duplication cost is too high, however it doesn't consider free instructions (like lifetime annotations) in this cost. Normally this is fine, but in edge case behavior it can end up quadratically increasing the size of the IR, like in Issue #75666.

This PR bails out if there are far too many free instructions, but otherwise doesn't include them in the cost threshold is met, it doesn't include them in the cost. This is desirable because we just want to avoid edge case behavior in compilation speed, and the code size cost is still actually zero.

I picked the default threshold somewhat arbitrarily. I selected the highest multiple of 100 that kept the compile time for the repro in Issue #75666 under 1 minute. That example compiles in 6 seconds without the JumpThreading pass, 37 seconds with a threshold of 500, and hours without the threshold. This is likely conservative, because I didn't want to impact normal compilations that aren't hitting this edge case, so please let me know if I should lower it.

Fixes Issue #75666.


Full diff: https://github.com/llvm/llvm-project/pull/75671.diff

1 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/JumpThreading.cpp (+13-1)
diff --git a/llvm/lib/Transforms/Scalar/JumpThreading.cpp b/llvm/lib/Transforms/Scalar/JumpThreading.cpp
index 8603c5cf9c022c..621299efb6b6a8 100644
--- a/llvm/lib/Transforms/Scalar/JumpThreading.cpp
+++ b/llvm/lib/Transforms/Scalar/JumpThreading.cpp
@@ -102,6 +102,11 @@ static cl::opt<unsigned> PhiDuplicateThreshold(
     cl::desc("Max PHIs in BB to duplicate for jump threading"), cl::init(76),
     cl::Hidden);
 
+static cl::opt<unsigned> FreeInstDuplicateThreshold(
+    "jump-threading-free-inst-threshold",
+    cl::desc("Max free instructions in BB to duplicate for jump threading"), cl::init(500),
+    cl::Hidden);
+
 static cl::opt<bool> ThreadAcrossLoopHeaders(
     "jump-threading-across-loop-headers",
     cl::desc("Allow JumpThreading to thread across loop headers, for testing"),
@@ -467,6 +472,7 @@ static unsigned getJumpThreadDuplicationCost(const TargetTransformInfo *TTI,
   // terminator-based Size adjustment at the end.
   Threshold += Bonus;
 
+  unsigned Free = 0;
   // Sum up the cost of each instruction until we get to the terminator.  Don't
   // include the terminator because the copy won't include it.
   unsigned Size = 0;
@@ -488,8 +494,14 @@ static unsigned getJumpThreadDuplicationCost(const TargetTransformInfo *TTI,
         return ~0U;
 
     if (TTI->getInstructionCost(&*I, TargetTransformInfo::TCK_SizeAndLatency) ==
-        TargetTransformInfo::TCC_Free)
+        TargetTransformInfo::TCC_Free) {
+      // Do not duplicate the BB if it has a lot of free instructions.
+      // In edge cases they can add up and significantly increase compile time of
+      // later passes by bloating the IR.
+      if (Free++ > FreeInstDuplicateThreshold)
+        return ~0U;
       continue;
+    }
 
     // All other instructions count for at least one unit.
     ++Size;

Copy link

github-actions bot commented Dec 16, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@terrelln terrelln force-pushed the 2023-12-15-jump-threading branch 3 times, most recently from 0e4c5e2 to 427cf1d Compare December 18, 2023 17:39
`JumpThreading` bails out when the duplication cost is too high, however it
doesn't consider free instructions (like lifetime annotations) in this cost.
Normally this is fine, but in edge case behavior it can end up quadratically
increasing the size of the IR, like in Issue llvm#75666.

This PR bails out if there are far too many free instructions, but otherwise
doesn't include them in the cos. This is desirable because we just want to avoid
edge case behavior in compilation speed, and the code size cost is still actually
zero.

I picked the default threshold somewhat arbitrarily. I selected the highest
multiple of 100 that kept the compile time for the repro in Issue llvm#75666 under
1 minute. That example compiles in 6 seconds without the `JumpThreading` pass,
37 seconds with a threshold of 500, and 2:42 hours without the threshold. This is
likely conservative, because I didn't want to impact normal compilations that
aren't hitting this edge case, so please let me know if I should lower it.

Fixes Issue llvm#75666.
Copy link
Member

@dcci dcci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not against this change but I believe we should probably run compile time/perf numbers to make sure this doesn't pessimize things.

@dcci
Copy link
Member

dcci commented Dec 18, 2023

@nikic tracks compile time improvements so he might want to comment on this

@aeubanks
Copy link
Contributor

are debug info intrinsics considered free? if they are then this patch will cause the presence of debug info to affect codegen, which is something we don't want

@dcci
Copy link
Member

dcci commented Dec 19, 2023

I think so as they're expected to not exist in lowering -- we should probably skip them altogether.

@terrelln
Copy link
Contributor Author

@aeubanks I believe that these intrinsics are considered free in this context:

@terrelln
Copy link
Contributor Author

this patch will cause the presence of debug info to affect codegen, which is something we don't want

Is this a hard requirement or a guideline? Currently, with the default settings, you'd need 500 "free" intrinsics for less than 14 non-free instructions to change the decision to duplicate. To me this seems like an extreme number that is unlikely to show up often in practice (but maybe I'm wrong). But if it is a hard requirement, then I get that.

If this isn't acceptable, do you have a suggestion for a path forward?

@dcci
Copy link
Member

dcci commented Dec 20, 2023

Is this a hard requirement or a guideline?

I think it's an hard requirement. We never want -g to change the generated output (if it does, that's a bug).

@dcci
Copy link
Member

dcci commented Apr 3, 2024

@nikic / @fhahn -- any thought on this?

@fhahn
Copy link
Contributor

fhahn commented Apr 3, 2024

Makes sense to me in general to control compile-time. But we need to make sure debug intrinsics are excluded, presence of debug intrinsics must not impact codegen as mentioned earlier.

This should probably have at least a test to make sure the option works as expected + a variant with debug intrinsics to make sure those aren't ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants