-
Notifications
You must be signed in to change notification settings - Fork 15k
[DFAJumpThreading] Add MaxOuterUseBlocks threshold #163428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@llvm/pr-subscribers-llvm-transforms Author: Hongyu Chen (XChy) ChangesFor every threadable path Note that the O3 statistics in llvm-test-suite are not influenced. Full diff: https://github.com/llvm/llvm-project/pull/163428.diff 1 Files Affected:
diff --git a/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp b/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
index ff5f390d6fe18..11d65d5a59175 100644
--- a/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
+++ b/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
@@ -122,16 +122,22 @@ static cl::opt<unsigned>
cl::desc("Maximum cost accepted for the transformation"),
cl::Hidden, cl::init(50));
-extern cl::opt<bool> ProfcheckDisableMetadataFixes;
-
-} // namespace llvm
-
static cl::opt<double> MaxClonedRate(
"dfa-max-cloned-rate",
cl::desc(
"Maximum cloned instructions rate accepted for the transformation"),
cl::Hidden, cl::init(7.5));
+static cl::opt<unsigned>
+ MaxOuterUseBlocks("dfa-max-out-use-blocks",
+ cl::desc("Maximum unduplicated blocks with outer uses "
+ "accepted for the transformation"),
+ cl::Hidden, cl::init(40));
+
+extern cl::opt<bool> ProfcheckDisableMetadataFixes;
+
+} // namespace llvm
+
namespace {
class SelectInstToUnfold {
SelectInst *SI;
@@ -966,8 +972,16 @@ struct TransformDFA {
// SLPVectorizer.
// TODO: Thread the switch partially before reaching the threshold.
uint64_t NumOrigInst = 0;
- for (auto *BB : DuplicateMap.keys())
+ uint64_t NumOuterUseBlock = 0;
+ for (auto *BB : DuplicateMap.keys()) {
NumOrigInst += BB->sizeWithoutDebug();
+ // Only unduplicated blocks with single predecessor require new phi
+ // nodes.
+ for (auto *Succ : successors(BB))
+ if (!DuplicateMap.count(Succ) && Succ->getSinglePredecessor())
+ NumOuterUseBlock++;
+ }
+
if (double(NumClonedInst) / double(NumOrigInst) > MaxClonedRate) {
LLVM_DEBUG(dbgs() << "DFA Jump Threading: Not jump threading, too much "
"instructions wll be cloned\n");
@@ -978,6 +992,20 @@ struct TransformDFA {
return false;
}
+ // Too much unduplicated blocks with outer uses may cause too much
+ // insertions of phi nodes for duplicated definitions. TODO: Drop this
+ // threshold if we come up with another way to reduce the number of inserted
+ // phi nodes.
+ if (NumOuterUseBlock > MaxOuterUseBlocks) {
+ LLVM_DEBUG(dbgs() << "DFA Jump Threading: Not jump threading, too much "
+ "blocks with outer uses\n");
+ ORE->emit([&]() {
+ return OptimizationRemarkMissed(DEBUG_TYPE, "NotProfitable", Switch)
+ << "Too much blocks with outer uses.";
+ });
+ return false;
+ }
+
InstructionCost DuplicationCost = 0;
unsigned JumpTableSize = 0;
|
|
ping. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This generally looks good to me, but should probably have test coverage (with a reduced limit)?
8704e9a to
c23bc4b
Compare
| ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6 | ||
| ; RUN: opt -S -passes=dfa-jump-threading -dfa-max-out-use-blocks=5 %s | FileCheck %s | ||
|
|
||
| define void @max_outer_uses_by_switch(i32 %cond) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, it seems like this doesn't really demonstrate the phi explosion issue. Without the limit you get:
diff --git a/llvm/test/Transforms/DFAJumpThreading/max-outer-uses.ll b/llvm/test/Transforms/DFAJumpThreading/max-outer-uses.ll
index f059754b8221..24f06c77614c 100644
--- a/llvm/test/Transforms/DFAJumpThreading/max-outer-uses.ll
+++ b/llvm/test/Transforms/DFAJumpThreading/max-outer-uses.ll
@@ -1,5 +1,5 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
-; RUN: opt -S -passes=dfa-jump-threading -dfa-max-out-use-blocks=5 %s | FileCheck %s
+; RUN: opt -S -passes=dfa-jump-threading -dfa-max-out-use-blocks=10 %s | FileCheck %s
define void @max_outer_uses_by_switch(i32 %cond) {
; CHECK-LABEL: define void @max_outer_uses_by_switch(
@@ -7,23 +7,36 @@ define void @max_outer_uses_by_switch(i32 %cond) {
; CHECK-NEXT: [[ENTRY:.*]]:
; CHECK-NEXT: br label %[[SWITCH_BB:.*]]
; CHECK: [[SWITCH_BB]]:
-; CHECK-NEXT: [[PHI:%.*]] = phi i32 [ 0, %[[ENTRY]] ], [ [[ONE:%.*]], %[[CASE1_SUCC:.*]] ], [ 2, %[[CASE2:.*]] ]
+; CHECK-NEXT: [[PHI:%.*]] = phi i32 [ 0, %[[ENTRY]] ], [ poison, %[[CASE1_SUCC:.*]] ]
; CHECK-NEXT: switch i32 [[PHI]], label %[[DEFAULT_DEST:.*]] [
; CHECK-NEXT: i32 0, label %[[CASE1:.*]]
-; CHECK-NEXT: i32 1, label %[[CASE2]]
+; CHECK-NEXT: i32 1, label %[[CASE2:.*]]
; CHECK-NEXT: ]
+; CHECK: [[SWITCH_BB_JT2:.*]]:
+; CHECK-NEXT: [[PHI_JT2:%.*]] = phi i32 [ 2, %[[CASE2]] ]
+; CHECK-NEXT: br label %[[DEFAULT_DEST]]
+; CHECK: [[SWITCH_BB_JT1:.*]]:
+; CHECK-NEXT: [[PHI_JT1:%.*]] = phi i32 [ [[ONE_JT1:%.*]], %[[CASE1_SUCC_JT1:.*]] ]
+; CHECK-NEXT: br label %[[CASE2]]
; CHECK: [[CASE1]]:
-; CHECK-NEXT: br label %[[CASE1_SUCC]]
+; CHECK-NEXT: br label %[[CASE1_SUCC_JT1]]
; CHECK: [[CASE1_SUCC]]:
-; CHECK-NEXT: [[ONE]] = phi i32 [ 1, %[[CASE1]] ]
; CHECK-NEXT: switch i32 [[COND]], label %[[SWITCH_BB]] [
; CHECK-NEXT: i32 0, label %[[OUTER1:.*]]
; CHECK-NEXT: i32 1, label %[[OUTER2:.*]]
; CHECK-NEXT: i32 2, label %[[OUTER3:.*]]
; CHECK-NEXT: i32 3, label %[[OUTER4:.*]]
; CHECK-NEXT: ]
+; CHECK: [[CASE1_SUCC_JT1]]:
+; CHECK-NEXT: [[ONE_JT1]] = phi i32 [ 1, %[[CASE1]] ]
+; CHECK-NEXT: switch i32 [[COND]], label %[[SWITCH_BB_JT1]] [
+; CHECK-NEXT: i32 0, label %[[OUTER1]]
+; CHECK-NEXT: i32 1, label %[[OUTER2]]
+; CHECK-NEXT: i32 2, label %[[OUTER3]]
+; CHECK-NEXT: i32 3, label %[[OUTER4]]
+; CHECK-NEXT: ]
; CHECK: [[CASE2]]:
-; CHECK-NEXT: br label %[[SWITCH_BB]]
+; CHECK-NEXT: br label %[[SWITCH_BB_JT2]]
; CHECK: [[OUTER1]]:
; CHECK-NEXT: ret void
; CHECK: [[OUTER2]]:which does not introduce any new (multi-operand) phis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I updated the tests now. They should show new phis now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/88/builds/17633 Here is the relevant piece of the build log for the reference |
For every threadable path `B1 -> B2 -> ... -> Bn`, we need to insert phi nodes into every unduplicated successor of `Bi` if there are outer uses of duplicated definitions in `B_i`. To prevent the booming of phi nodes, this patch adds a threshold for the maximum number of unduplicated successors that may contain outer uses. This threshold makes sense especially when multi-target branches like switch/indirectbr/callbr are duplicated. Note that the O3 statistics in llvm-test-suite are not influenced.
For every threadable path
B1 -> B2 -> ... -> Bn, we need to insert phi nodes into every unduplicated successor ofBiif there are outer uses of duplicated definitions inB_i. To prevent the booming of phi nodes, this patch adds a threshold for the maximum number of unduplicated successors that may contain outer uses. This threshold makes sense especially when multi-target branches like switch/indirectbr/callbr are duplicated.Note that the O3 statistics in llvm-test-suite are not influenced.