Skip to content

Conversation

XChy
Copy link
Member

@XChy XChy commented Oct 2, 2025

Duplicating blocks of threaded paths may cause a significant regression in IR size and slow down compile-time in later optimizations. This patch adds a coarse constraint on the number of duplicated instructions.

New compile-time number: https://llvm-compile-time-tracker.com/compare.php?from=6935a3362628e823451baf86d15694d57f9664ed&to=455bd99ac151dd0445737f6e02210c2a4ec2dac5&stat=instructions%3Au

@llvmbot
Copy link
Member

llvmbot commented Oct 2, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Hongyu Chen (XChy)

Changes

Duplicating blocks of threaded paths may cause a significant regression in IR size and slow down compile-time in later optimizations. This patch adds a coarse constraint on the number of duplicated instructions.

New compile-time number: https://llvm-compile-time-tracker.com/compare.php?from=6935a3362628e823451baf86d15694d57f9664ed&to=455bd99ac151dd0445737f6e02210c2a4ec2dac5&stat=instructions%3Au


Full diff: https://github.com/llvm/llvm-project/pull/161632.diff

1 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp (+25-1)
diff --git a/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp b/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
index e9a3e983bc1e2..3bd7f8e2fcf93 100644
--- a/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
+++ b/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
@@ -120,6 +120,12 @@ static cl::opt<unsigned>
                   cl::desc("Maximum cost accepted for the transformation"),
                   cl::Hidden, cl::init(50));
 
+static cl::opt<double> MaxClonedRate(
+    "dfa-max-cloned-rate",
+    cl::desc(
+        "Maximum cloned instructions rate accepted for the transformation"),
+    cl::Hidden, cl::init(7.5));
+
 namespace {
 
 class SelectInstToUnfold {
@@ -828,6 +834,7 @@ struct TransformDFA {
   /// also returns false if it is illegal to clone some required block.
   bool isLegalAndProfitableToTransform() {
     CodeMetrics Metrics;
+    uint64_t NumClonedInst = 0;
     SwitchInst *Switch = SwitchPaths->getSwitchInst();
 
     // Don't thread switch without multiple successors.
@@ -837,7 +844,6 @@ struct TransformDFA {
     // Note that DuplicateBlockMap is not being used as intended here. It is
     // just being used to ensure (BB, State) pairs are only counted once.
     DuplicateBlockMap DuplicateMap;
-
     for (ThreadingPath &TPath : SwitchPaths->getThreadingPaths()) {
       PathType PathBBs = TPath.getPath();
       APInt NextState = TPath.getExitValue();
@@ -848,6 +854,7 @@ struct TransformDFA {
       BasicBlock *VisitedBB = getClonedBB(BB, NextState, DuplicateMap);
       if (!VisitedBB) {
         Metrics.analyzeBasicBlock(BB, *TTI, EphValues);
+        NumClonedInst += range_size(*BB);
         DuplicateMap[BB].push_back({BB, NextState});
       }
 
@@ -865,6 +872,7 @@ struct TransformDFA {
         if (VisitedBB)
           continue;
         Metrics.analyzeBasicBlock(BB, *TTI, EphValues);
+        NumClonedInst += range_size(*BB);
         DuplicateMap[BB].push_back({BB, NextState});
       }
 
@@ -901,6 +909,22 @@ struct TransformDFA {
       }
     }
 
+    // Too much cloned instructions slow down later optimizations, especially
+    // SLPVectorizer.
+    // TODO: Thread the switch partially before reaching the threshold.
+    uint64_t NumOrigInst = 0;
+    for (auto &[BB, _] : DuplicateMap)
+      NumOrigInst += range_size(*BB);
+    if (double(NumClonedInst) / double(NumOrigInst) > MaxClonedRate) {
+      LLVM_DEBUG(dbgs() << "DFA Jump Threading: Not jump threading, too much "
+                           "instructions wll be cloned\n");
+      ORE->emit([&]() {
+        return OptimizationRemarkMissed(DEBUG_TYPE, "NotProfitable", Switch)
+               << "Too much instructions will be cloned.";
+      });
+      return false;
+    }
+
     InstructionCost DuplicationCost = 0;
 
     unsigned JumpTableSize = 0;

@UsmanNadeem
Copy link
Contributor

How many transformations are restricted by this (would it be possible to check stats on the test suite)? I fear that restricting possibly beneficial transformations to avoid second-order effects might not be the best way forward.

@XChy
Copy link
Member Author

XChy commented Oct 3, 2025

How many transformations are restricted by this (would it be possible to check stats on the test suite)?

I have checked the statistics on the test suite. Only reduce one transformation in the outlier libclamav_nsis_LZMADecode.c.

I fear that restricting possibly beneficial transformations to avoid second-order effects might not be the best way forward.

Passes like inlining and loop unrolling would prevent massive expansion; otherwise, the compile time will blow up in later optimizations, although there are actually benefits after expansion. That's an acceptable trade-off for me.

The outlier libclamav_nsis_LZMADecode.c grows from about 1K lines to over 4K lines after DFAJumpThreading, that's unacceptable to me.

Copy link
Contributor

@UsmanNadeem UsmanNadeem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@XChy XChy enabled auto-merge (squash) October 7, 2025 04:05
@XChy XChy disabled auto-merge October 7, 2025 04:30
@XChy XChy merged commit cf85ec5 into llvm:main Oct 7, 2025
8 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants