Skip to content

Conversation

@XChy
Copy link
Member

@XChy XChy commented Oct 14, 2025

For every threadable path B1 -> B2 -> ... -> Bn, we need to insert phi nodes into every unduplicated successor of Bi if there are outer uses of duplicated definitions in B_i. To prevent the booming of phi nodes, this patch adds a threshold for the maximum number of unduplicated successors that may contain outer uses. This threshold makes sense especially when multi-target branches like switch/indirectbr/callbr are duplicated.

Note that the O3 statistics in llvm-test-suite are not influenced.

@llvmbot
Copy link
Member

llvmbot commented Oct 14, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Hongyu Chen (XChy)

Changes

For every threadable path B1 -> B2 -> ... -> Bn, we need to insert phi nodes into every unduplicated successor of Bi if there are outer uses of duplicated definitions in B_i. To prevent the booming of phi nodes, this patch adds a threshold for the maximum number of unduplicated successors that may contain outer uses. This threshold makes sense especially when multi-target branches like switch/indirectbr/callbr are duplicated.

Note that the O3 statistics in llvm-test-suite are not influenced.


Full diff: https://github.com/llvm/llvm-project/pull/163428.diff

1 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp (+33-5)
diff --git a/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp b/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
index ff5f390d6fe18..11d65d5a59175 100644
--- a/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
+++ b/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
@@ -122,16 +122,22 @@ static cl::opt<unsigned>
                   cl::desc("Maximum cost accepted for the transformation"),
                   cl::Hidden, cl::init(50));
 
-extern cl::opt<bool> ProfcheckDisableMetadataFixes;
-
-} // namespace llvm
-
 static cl::opt<double> MaxClonedRate(
     "dfa-max-cloned-rate",
     cl::desc(
         "Maximum cloned instructions rate accepted for the transformation"),
     cl::Hidden, cl::init(7.5));
 
+static cl::opt<unsigned>
+    MaxOuterUseBlocks("dfa-max-out-use-blocks",
+                      cl::desc("Maximum unduplicated blocks with outer uses "
+                               "accepted for the transformation"),
+                      cl::Hidden, cl::init(40));
+
+extern cl::opt<bool> ProfcheckDisableMetadataFixes;
+
+} // namespace llvm
+
 namespace {
 class SelectInstToUnfold {
   SelectInst *SI;
@@ -966,8 +972,16 @@ struct TransformDFA {
     // SLPVectorizer.
     // TODO: Thread the switch partially before reaching the threshold.
     uint64_t NumOrigInst = 0;
-    for (auto *BB : DuplicateMap.keys())
+    uint64_t NumOuterUseBlock = 0;
+    for (auto *BB : DuplicateMap.keys()) {
       NumOrigInst += BB->sizeWithoutDebug();
+      // Only unduplicated blocks with single predecessor require new phi
+      // nodes.
+      for (auto *Succ : successors(BB))
+        if (!DuplicateMap.count(Succ) && Succ->getSinglePredecessor())
+          NumOuterUseBlock++;
+    }
+
     if (double(NumClonedInst) / double(NumOrigInst) > MaxClonedRate) {
       LLVM_DEBUG(dbgs() << "DFA Jump Threading: Not jump threading, too much "
                            "instructions wll be cloned\n");
@@ -978,6 +992,20 @@ struct TransformDFA {
       return false;
     }
 
+    // Too much unduplicated blocks with outer uses may cause too much
+    // insertions of phi nodes for duplicated definitions. TODO: Drop this
+    // threshold if we come up with another way to reduce the number of inserted
+    // phi nodes.
+    if (NumOuterUseBlock > MaxOuterUseBlocks) {
+      LLVM_DEBUG(dbgs() << "DFA Jump Threading: Not jump threading, too much "
+                           "blocks with outer uses\n");
+      ORE->emit([&]() {
+        return OptimizationRemarkMissed(DEBUG_TYPE, "NotProfitable", Switch)
+               << "Too much blocks with outer uses.";
+      });
+      return false;
+    }
+
     InstructionCost DuplicationCost = 0;
 
     unsigned JumpTableSize = 0;

@XChy
Copy link
Member Author

XChy commented Oct 20, 2025

ping.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally looks good to me, but should probably have test coverage (with a reduced limit)?

@XChy XChy force-pushed the dfa-jump-thread-max-outer-uses branch from 8704e9a to c23bc4b Compare October 28, 2025 16:22
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
; RUN: opt -S -passes=dfa-jump-threading -dfa-max-out-use-blocks=5 %s | FileCheck %s

define void @max_outer_uses_by_switch(i32 %cond) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, it seems like this doesn't really demonstrate the phi explosion issue. Without the limit you get:

diff --git a/llvm/test/Transforms/DFAJumpThreading/max-outer-uses.ll b/llvm/test/Transforms/DFAJumpThreading/max-outer-uses.ll
index f059754b8221..24f06c77614c 100644
--- a/llvm/test/Transforms/DFAJumpThreading/max-outer-uses.ll
+++ b/llvm/test/Transforms/DFAJumpThreading/max-outer-uses.ll
@@ -1,5 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
-; RUN: opt -S -passes=dfa-jump-threading -dfa-max-out-use-blocks=5 %s | FileCheck %s
+; RUN: opt -S -passes=dfa-jump-threading -dfa-max-out-use-blocks=10 %s | FileCheck %s
 
 define void @max_outer_uses_by_switch(i32 %cond) {
 ; CHECK-LABEL: define void @max_outer_uses_by_switch(
@@ -7,23 +7,36 @@ define void @max_outer_uses_by_switch(i32 %cond) {
 ; CHECK-NEXT:  [[ENTRY:.*]]:
 ; CHECK-NEXT:    br label %[[SWITCH_BB:.*]]
 ; CHECK:       [[SWITCH_BB]]:
-; CHECK-NEXT:    [[PHI:%.*]] = phi i32 [ 0, %[[ENTRY]] ], [ [[ONE:%.*]], %[[CASE1_SUCC:.*]] ], [ 2, %[[CASE2:.*]] ]
+; CHECK-NEXT:    [[PHI:%.*]] = phi i32 [ 0, %[[ENTRY]] ], [ poison, %[[CASE1_SUCC:.*]] ]
 ; CHECK-NEXT:    switch i32 [[PHI]], label %[[DEFAULT_DEST:.*]] [
 ; CHECK-NEXT:      i32 0, label %[[CASE1:.*]]
-; CHECK-NEXT:      i32 1, label %[[CASE2]]
+; CHECK-NEXT:      i32 1, label %[[CASE2:.*]]
 ; CHECK-NEXT:    ]
+; CHECK:       [[SWITCH_BB_JT2:.*]]:
+; CHECK-NEXT:    [[PHI_JT2:%.*]] = phi i32 [ 2, %[[CASE2]] ]
+; CHECK-NEXT:    br label %[[DEFAULT_DEST]]
+; CHECK:       [[SWITCH_BB_JT1:.*]]:
+; CHECK-NEXT:    [[PHI_JT1:%.*]] = phi i32 [ [[ONE_JT1:%.*]], %[[CASE1_SUCC_JT1:.*]] ]
+; CHECK-NEXT:    br label %[[CASE2]]
 ; CHECK:       [[CASE1]]:
-; CHECK-NEXT:    br label %[[CASE1_SUCC]]
+; CHECK-NEXT:    br label %[[CASE1_SUCC_JT1]]
 ; CHECK:       [[CASE1_SUCC]]:
-; CHECK-NEXT:    [[ONE]] = phi i32 [ 1, %[[CASE1]] ]
 ; CHECK-NEXT:    switch i32 [[COND]], label %[[SWITCH_BB]] [
 ; CHECK-NEXT:      i32 0, label %[[OUTER1:.*]]
 ; CHECK-NEXT:      i32 1, label %[[OUTER2:.*]]
 ; CHECK-NEXT:      i32 2, label %[[OUTER3:.*]]
 ; CHECK-NEXT:      i32 3, label %[[OUTER4:.*]]
 ; CHECK-NEXT:    ]
+; CHECK:       [[CASE1_SUCC_JT1]]:
+; CHECK-NEXT:    [[ONE_JT1]] = phi i32 [ 1, %[[CASE1]] ]
+; CHECK-NEXT:    switch i32 [[COND]], label %[[SWITCH_BB_JT1]] [
+; CHECK-NEXT:      i32 0, label %[[OUTER1]]
+; CHECK-NEXT:      i32 1, label %[[OUTER2]]
+; CHECK-NEXT:      i32 2, label %[[OUTER3]]
+; CHECK-NEXT:      i32 3, label %[[OUTER4]]
+; CHECK-NEXT:    ]
 ; CHECK:       [[CASE2]]:
-; CHECK-NEXT:    br label %[[SWITCH_BB]]
+; CHECK-NEXT:    br label %[[SWITCH_BB_JT2]]
 ; CHECK:       [[OUTER1]]:
 ; CHECK-NEXT:    ret void
 ; CHECK:       [[OUTER2]]:

which does not introduce any new (multi-operand) phis.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I updated the tests now. They should show new phis now.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@XChy XChy merged commit affed57 into llvm:main Oct 29, 2025
10 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Oct 29, 2025

LLVM Buildbot has detected a new failure on builder openmp-s390x-linux running on systemz-1 while building llvm at step 6 "test-openmp".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/88/builds/17633

Here is the relevant piece of the build log for the reference
Step 6 (test-openmp) failure: test (failure)
******************** TEST 'libomp :: tasking/issue-94260-2.c' FAILED ********************
Exit Code: -11

Command Output (stdout):
--
# RUN: at line 1
/home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/./bin/clang -fopenmp   -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test -L /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -fno-omit-frame-pointer -mbackchain -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/ompt /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/tasking/issue-94260-2.c -o /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp -lm -latomic && /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp
# executed command: /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/./bin/clang -fopenmp -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test -L /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -fno-omit-frame-pointer -mbackchain -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/ompt /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/tasking/issue-94260-2.c -o /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp -lm -latomic
# executed command: /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp
# note: command had no output on stdout or stderr
# error: command failed with exit status: -11

--

********************


aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 30, 2025
For every threadable path `B1 -> B2 -> ... -> Bn`, we need to insert phi
nodes into every unduplicated successor of `Bi` if there are outer uses
of duplicated definitions in `B_i`. To prevent the booming of phi nodes,
this patch adds a threshold for the maximum number of unduplicated
successors that may contain outer uses. This threshold makes sense
especially when multi-target branches like switch/indirectbr/callbr are
duplicated.

Note that the O3 statistics in llvm-test-suite are not influenced.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants