Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JumpThreading] Thread over BB with only an unconditional branch #86312

Merged
merged 2 commits into from
Apr 16, 2024

Conversation

XChy
Copy link
Member

@XChy XChy commented Mar 22, 2024

Fixes #76609
This patch does:

  • relax the phis constraint in CanRedirectPredsOfEmptyBBToSucc
  • guarantee the BB has multiple different predecessors to redirect, so that we can handle the case without phis in BB. Without this change and phi constraint, we may redirect the CommonPred.

The motivation is consistent with JumpThreading. We always want the branch to jump more direct to the destination, without passing the middle block. In this way, we can expose more other optimization opportunities.

An obivous example proposed by @dtcxzyw is like:

define i32 @test(...) {
entry:
   br i1 %c, label %do.end, label %if.then

if.then:                                          ; preds = %entry
   %call2 = call i32 @dummy()
   %tobool3.not = icmp eq i32 %call2, 0
   br i1 %tobool3.not, label %do.end, label %return

do.end:                                           ; preds = %entry, %if.then
   br label %return

return:                                           ; preds = %if.then, %do.end
   %retval.0 = phi i32 [ 0, %do.end ], [ %call2, %if.then ]
   ret i32 %retval.0
}

entry can directly jump to return, without passing do.end, and then the if-else pattern can be simplified further:

define i32 @test(...) {
entry:
   br i1 %c, label %return, label %if.then

if.then:                                          ; preds = %entry
   %call2 = call i32 @dummy()
   br label %return

return:                                           ; preds = %if.then
   %retval.0 = phi i32 [ 0, %entry ], [ %call2, %if.then ]
   ret i32 %retval.0
}

@llvmbot
Copy link
Collaborator

llvmbot commented Mar 22, 2024

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-llvm-transforms

Author: XChy (XChy)

Changes

Fixes #76609
This patch does:

  • relax the phis constraint in CanRedirectPredsOfEmptyBBToSucc
  • guarantee the BB whose preds are redirected has multiple different predecessors, so that it can handle the case without phis in BB. Without this change and phi constraint, we may redirect the CommonPred.

Full diff: https://github.com/llvm/llvm-project/pull/86312.diff

5 Files Affected:

  • (modified) llvm/lib/Transforms/Utils/Local.cpp (+5-4)
  • (modified) llvm/test/Transforms/JumpThreading/pr79175.ll (+4-4)
  • (modified) llvm/test/Transforms/JumpThreading/select.ll (+28-26)
  • (modified) llvm/test/Transforms/JumpThreading/thread-prob-7.ll (+4-4)
  • (added) llvm/test/Transforms/JumpThreading/uncond-no-phi.ll (+123)
diff --git a/llvm/lib/Transforms/Utils/Local.cpp b/llvm/lib/Transforms/Utils/Local.cpp
index 422319eb185134..9760988e5c09d7 100644
--- a/llvm/lib/Transforms/Utils/Local.cpp
+++ b/llvm/lib/Transforms/Utils/Local.cpp
@@ -1013,12 +1013,13 @@ CanRedirectPredsOfEmptyBBToSucc(BasicBlock *BB, BasicBlock *Succ,
                                 const SmallPtrSetImpl<BasicBlock *> &SuccPreds,
                                 BasicBlock *&CommonPred) {
 
-  // There must be phis in BB, otherwise BB will be merged into Succ directly
-  if (BB->phis().empty() || Succ->phis().empty())
+  // Only handle the nontrivial case. Assume we need to update the phi of succ
+  // here.
+  if (Succ->phis().empty())
     return false;
 
-  // BB must have predecessors not shared that can be redirected to Succ
-  if (!BB->hasNPredecessorsOrMore(2))
+  // BB must have multiple different uncommon predecessors that can be redirected to Succ
+  if (BB->getUniquePredecessor() || pred_empty(BB))
     return false;
 
   // Get single common predecessors of both BB and Succ
diff --git a/llvm/test/Transforms/JumpThreading/pr79175.ll b/llvm/test/Transforms/JumpThreading/pr79175.ll
index 2c7ee0770cdc73..cce30ce079999c 100644
--- a/llvm/test/Transforms/JumpThreading/pr79175.ll
+++ b/llvm/test/Transforms/JumpThreading/pr79175.ll
@@ -17,11 +17,11 @@ define i32 @test(i64 %idx, i32 %val) {
 ; CHECK:       cond.end:
 ; CHECK-NEXT:    [[CMP_I:%.*]] = icmp sgt i32 [[VAL]], 0
 ; CHECK-NEXT:    [[COND_FR:%.*]] = freeze i1 [[CMP_I]]
-; CHECK-NEXT:    br i1 [[COND_FR]], label [[COND_END_THREAD]], label [[TMP0:%.*]]
-; CHECK:       cond.end.thread:
-; CHECK-NEXT:    br label [[TMP0]]
+; CHECK-NEXT:    br i1 [[COND_FR]], label [[TMP0:%.*]], label [[COND_END_THREAD]]
 ; CHECK:       0:
-; CHECK-NEXT:    [[TMP1:%.*]] = phi i32 [ 0, [[COND_END_THREAD]] ], [ [[VAL]], [[COND_END]] ]
+; CHECK-NEXT:    br label [[COND_END_THREAD]]
+; CHECK:       cond.end.thread:
+; CHECK-NEXT:    [[TMP1:%.*]] = phi i32 [ [[VAL]], [[COND_END]] ], [ 0, [[TMP0]] ], [ 0, [[FOR_BODY]] ]
 ; CHECK-NEXT:    [[F_IDX:%.*]] = getelementptr inbounds i32, ptr @f, i64 [[IDX]]
 ; CHECK-NEXT:    store i32 [[TMP1]], ptr [[F_IDX]], align 4
 ; CHECK-NEXT:    [[F_RELOAD:%.*]] = load i32, ptr @f, align 4
diff --git a/llvm/test/Transforms/JumpThreading/select.ll b/llvm/test/Transforms/JumpThreading/select.ll
index 4ec55a66bb8ac1..53aebeb7b05c0d 100644
--- a/llvm/test/Transforms/JumpThreading/select.ll
+++ b/llvm/test/Transforms/JumpThreading/select.ll
@@ -21,7 +21,7 @@ declare void @quux()
 ; booleans where at least one operand is true/false/undef.
 
 ;.
-; CHECK: @[[ANCHOR:[a-zA-Z0-9_$"\\.-]+]] = constant [3 x ptr] [ptr blockaddress(@test_indirectbr, [[L1:%.*]]), ptr inttoptr (i32 1 to ptr), ptr blockaddress(@test_indirectbr, [[L3:%.*]])]
+; CHECK-BPI: @anchor = constant [3 x ptr] [ptr blockaddress(@test_indirectbr, %L1), ptr inttoptr (i32 1 to ptr), ptr blockaddress(@test_indirectbr, %L3)]
 ;.
 define void @test_br(i1 %cond, i1 %value) nounwind {
 ; CHECK-LABEL: @test_br(
@@ -66,8 +66,8 @@ define void @test_switch(i1 %cond, i8 %value) nounwind {
 ; CHECK-NEXT:    call void @quux()
 ; CHECK-NEXT:    [[EXPR:%.*]] = select i1 [[COND]], i8 1, i8 [[VALUE:%.*]]
 ; CHECK-NEXT:    switch i8 [[EXPR]], label [[L3:%.*]] [
-; CHECK-NEXT:    i8 1, label [[L1]]
-; CHECK-NEXT:    i8 2, label [[L2:%.*]]
+; CHECK-NEXT:      i8 1, label [[L1]]
+; CHECK-NEXT:      i8 2, label [[L2:%.*]]
 ; CHECK-NEXT:    ]
 ; CHECK:       L1:
 ; CHECK-NEXT:    call void @foo()
@@ -192,8 +192,8 @@ define void @test_switch_cmp(i1 %cond, i32 %val, i8 %value) nounwind {
 ; CHECK:       0:
 ; CHECK-NEXT:    [[TMP1:%.*]] = phi i8 [ [[VALUE:%.*]], [[L0]] ]
 ; CHECK-NEXT:    switch i8 [[TMP1]], label [[L3:%.*]] [
-; CHECK-NEXT:    i8 1, label [[L1]]
-; CHECK-NEXT:    i8 2, label [[L2:%.*]]
+; CHECK-NEXT:      i8 1, label [[L1]]
+; CHECK-NEXT:      i8 2, label [[L2:%.*]]
 ; CHECK-NEXT:    ]
 ; CHECK:       L1:
 ; CHECK-NEXT:    call void @foo()
@@ -237,8 +237,8 @@ define void @test_switch_default(ptr nocapture %status) nounwind {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[STATUS:%.*]], align 4
 ; CHECK-NEXT:    switch i32 [[TMP0]], label [[L2:%.*]] [
-; CHECK-NEXT:    i32 5061, label [[L2_THREAD:%.*]]
-; CHECK-NEXT:    i32 0, label [[L2]]
+; CHECK-NEXT:      i32 5061, label [[L2_THREAD:%.*]]
+; CHECK-NEXT:      i32 0, label [[L2]]
 ; CHECK-NEXT:    ]
 ; CHECK:       L2.thread:
 ; CHECK-NEXT:    store i32 10025, ptr [[STATUS]], align 4
@@ -377,21 +377,21 @@ define i32 @unfold3(i32 %u, i32 %v, i32 %w, i32 %x, i32 %y, i32 %z, i32 %j) noun
 ; CHECK-NEXT:    br i1 [[CMP_I]], label [[DOTEXIT_THREAD4:%.*]], label [[COND_FALSE_I:%.*]]
 ; CHECK:       cond.false.i:
 ; CHECK-NEXT:    [[CMP4_I:%.*]] = icmp sgt i32 [[U]], [[V]]
-; CHECK-NEXT:    br i1 [[CMP4_I]], label [[DOTEXIT_THREAD:%.*]], label [[COND_FALSE_6_I:%.*]]
+; CHECK-NEXT:    br i1 [[CMP4_I]], label [[DOTEXIT_THREAD4]], label [[COND_FALSE_6_I:%.*]]
 ; CHECK:       cond.false.6.i:
 ; CHECK-NEXT:    [[CMP8_I:%.*]] = icmp slt i32 [[W:%.*]], [[X:%.*]]
 ; CHECK-NEXT:    br i1 [[CMP8_I]], label [[DOTEXIT_THREAD4]], label [[COND_FALSE_10_I:%.*]]
 ; CHECK:       cond.false.10.i:
 ; CHECK-NEXT:    [[CMP13_I:%.*]] = icmp sgt i32 [[W]], [[X]]
-; CHECK-NEXT:    br i1 [[CMP13_I]], label [[DOTEXIT_THREAD]], label [[DOTEXIT:%.*]]
+; CHECK-NEXT:    br i1 [[CMP13_I]], label [[DOTEXIT_THREAD4]], label [[DOTEXIT:%.*]]
 ; CHECK:       .exit:
 ; CHECK-NEXT:    [[PHITMP:%.*]] = icmp sge i32 [[Y:%.*]], [[Z:%.*]]
 ; CHECK-NEXT:    [[COND_FR:%.*]] = freeze i1 [[PHITMP]]
-; CHECK-NEXT:    br i1 [[COND_FR]], label [[DOTEXIT_THREAD]], label [[DOTEXIT_THREAD4]]
-; CHECK:       .exit.thread:
+; CHECK-NEXT:    br i1 [[COND_FR]], label [[DOTEXIT_THREAD:%.*]], label [[DOTEXIT_THREAD4]]
+; CHECK:       0:
 ; CHECK-NEXT:    br label [[DOTEXIT_THREAD4]]
-; CHECK:       .exit.thread4:
-; CHECK-NEXT:    [[TMP0:%.*]] = phi i32 [ [[J]], [[DOTEXIT_THREAD]] ], [ [[ADD3]], [[DOTEXIT]] ], [ [[ADD3]], [[ENTRY:%.*]] ], [ [[ADD3]], [[COND_FALSE_6_I]] ]
+; CHECK:       .exit.thread:
+; CHECK-NEXT:    [[TMP0:%.*]] = phi i32 [ [[ADD3]], [[DOTEXIT]] ], [ [[J]], [[DOTEXIT_THREAD]] ], [ [[J]], [[COND_FALSE_I]] ], [ [[J]], [[COND_FALSE_10_I]] ], [ [[ADD3]], [[ENTRY:%.*]] ], [ [[ADD3]], [[COND_FALSE_6_I]] ]
 ; CHECK-NEXT:    ret i32 [[TMP0]]
 ;
 entry:
@@ -430,23 +430,23 @@ define i32 @unfold4(i32 %u, i32 %v, i32 %w, i32 %x, i32 %y, i32 %z, i32 %j) noun
 ; CHECK-NEXT:    br i1 [[CMP_I]], label [[DOTEXIT_THREAD:%.*]], label [[COND_FALSE_I:%.*]]
 ; CHECK:       cond.false.i:
 ; CHECK-NEXT:    [[CMP4_I:%.*]] = icmp sgt i32 [[U]], [[V]]
-; CHECK-NEXT:    br i1 [[CMP4_I]], label [[DOTEXIT_THREAD5:%.*]], label [[COND_FALSE_6_I:%.*]]
+; CHECK-NEXT:    br i1 [[CMP4_I]], label [[DOTEXIT_THREAD]], label [[COND_FALSE_6_I:%.*]]
 ; CHECK:       cond.false.6.i:
 ; CHECK-NEXT:    [[CMP8_I:%.*]] = icmp slt i32 [[W:%.*]], [[X:%.*]]
 ; CHECK-NEXT:    br i1 [[CMP8_I]], label [[DOTEXIT_THREAD]], label [[COND_FALSE_10_I:%.*]]
 ; CHECK:       cond.false.10.i:
 ; CHECK-NEXT:    [[CMP13_I:%.*]] = icmp sgt i32 [[W]], [[X]]
-; CHECK-NEXT:    br i1 [[CMP13_I]], label [[DOTEXIT_THREAD5]], label [[DOTEXIT:%.*]]
+; CHECK-NEXT:    br i1 [[CMP13_I]], label [[DOTEXIT_THREAD]], label [[DOTEXIT:%.*]]
 ; CHECK:       .exit:
 ; CHECK-NEXT:    [[CMP19_I:%.*]] = icmp sge i32 [[Y:%.*]], [[Z:%.*]]
 ; CHECK-NEXT:    [[CONV:%.*]] = zext i1 [[CMP19_I]] to i32
 ; CHECK-NEXT:    [[LNOT_I18:%.*]] = icmp eq i32 [[CONV]], 1
 ; CHECK-NEXT:    [[COND_FR:%.*]] = freeze i1 [[LNOT_I18]]
-; CHECK-NEXT:    br i1 [[COND_FR]], label [[DOTEXIT_THREAD]], label [[DOTEXIT_THREAD5]]
+; CHECK-NEXT:    br i1 [[COND_FR]], label [[TMP1:%.*]], label [[DOTEXIT_THREAD]]
+; CHECK:       0:
+; CHECK-NEXT:    br label [[DOTEXIT_THREAD]]
 ; CHECK:       .exit.thread:
-; CHECK-NEXT:    br label [[DOTEXIT_THREAD5]]
-; CHECK:       .exit.thread5:
-; CHECK-NEXT:    [[TMP0:%.*]] = phi i32 [ [[J]], [[DOTEXIT_THREAD]] ], [ [[ADD3]], [[DOTEXIT]] ], [ [[ADD3]], [[COND_FALSE_I]] ], [ [[ADD3]], [[COND_FALSE_10_I]] ]
+; CHECK-NEXT:    [[TMP0:%.*]] = phi i32 [ [[ADD3]], [[DOTEXIT]] ], [ [[J]], [[TMP1]] ], [ [[J]], [[ENTRY:%.*]] ], [ [[J]], [[COND_FALSE_6_I]] ], [ [[ADD3]], [[COND_FALSE_I]] ], [ [[ADD3]], [[COND_FALSE_10_I]] ]
 ; CHECK-NEXT:    ret i32 [[TMP0]]
 ;
 entry:
@@ -560,10 +560,10 @@ define void @test_func(ptr nocapture readonly %a, ptr nocapture readonly %b, ptr
 ; CHECK:       if.end:
 ; CHECK-NEXT:    [[LOCAL_VAR_0:%.*]] = phi i32 [ [[TMP1]], [[FOR_BODY]] ]
 ; CHECK-NEXT:    switch i32 [[LOCAL_VAR_0]], label [[SW_DEFAULT]] [
-; CHECK-NEXT:    i32 2, label [[SW_BB]]
-; CHECK-NEXT:    i32 4, label [[SW_BB7]]
-; CHECK-NEXT:    i32 5, label [[SW_BB8:%.*]]
-; CHECK-NEXT:    i32 7, label [[SW_BB9:%.*]]
+; CHECK-NEXT:      i32 2, label [[SW_BB]]
+; CHECK-NEXT:      i32 4, label [[SW_BB7]]
+; CHECK-NEXT:      i32 5, label [[SW_BB8:%.*]]
+; CHECK-NEXT:      i32 7, label [[SW_BB9:%.*]]
 ; CHECK-NEXT:    ]
 ; CHECK:       sw.bb:
 ; CHECK-NEXT:    call void @foo()
@@ -669,8 +669,10 @@ if.end:
 !0 = !{!"branch_weights", i64 1073741824, i64 3221225472}
 !1 = !{!"function_entry_count", i64 1984}
 ;.
-; CHECK: attributes #[[ATTR0:[0-9]+]] = { nounwind }
+; CHECK-BPI: attributes #[[ATTR0:[0-9]+]] = { nounwind }
 ;.
-; CHECK: [[META0:![0-9]+]] = !{!"function_entry_count", i64 1984}
-; CHECK: [[PROF1]] = !{!"branch_weights", i64 1073741824, i64 3221225472}
+; CHECK-BPI: [[META0:![0-9]+]] = !{!"function_entry_count", i64 1984}
+; CHECK-BPI: [[PROF1]] = !{!"branch_weights", i64 1073741824, i64 3221225472}
 ;.
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; CHECK-BPI: {{.*}}
diff --git a/llvm/test/Transforms/JumpThreading/thread-prob-7.ll b/llvm/test/Transforms/JumpThreading/thread-prob-7.ll
index 8c9d89871d00b3..4623a579be48f6 100644
--- a/llvm/test/Transforms/JumpThreading/thread-prob-7.ll
+++ b/llvm/test/Transforms/JumpThreading/thread-prob-7.ll
@@ -14,15 +14,15 @@ define i32 @func0(i32 %a0, i32 %a1) !prof !0 {
 ; CHECK-NEXT:    br i1 [[CMP1]], label [[BB_JOIN_THREAD:%.*]], label [[TEST2_FALSE:%.*]], !prof [[PROF2:![0-9]+]]
 ; CHECK:       test2_false:
 ; CHECK-NEXT:    call void @foobar()
-; CHECK-NEXT:    br label [[TMP0:%.*]]
+; CHECK-NEXT:    br label [[BB_JOIN_THREAD]]
 ; CHECK:       bb_join:
 ; CHECK-NEXT:    [[C:%.*]] = phi i1 [ [[CX]], [[ENTRY:%.*]] ]
 ; CHECK-NEXT:    [[COND_FR:%.*]] = freeze i1 [[C]]
-; CHECK-NEXT:    br i1 [[COND_FR]], label [[BB_JOIN_THREAD]], label [[TMP0]], !prof [[PROF3:![0-9]+]]
+; CHECK-NEXT:    br i1 [[COND_FR]], label [[BB_JOIN_THREAD1:%.*]], label [[BB_JOIN_THREAD]], !prof [[PROF3:![0-9]+]]
 ; CHECK:       bb_join.thread:
-; CHECK-NEXT:    br label [[TMP0]]
+; CHECK-NEXT:    br label [[BB_JOIN_THREAD]]
 ; CHECK:       0:
-; CHECK-NEXT:    [[TMP1:%.*]] = phi i32 [ 42, [[BB_JOIN_THREAD]] ], [ 7, [[BB_JOIN]] ], [ 7, [[TEST2_FALSE]] ]
+; CHECK-NEXT:    [[TMP1:%.*]] = phi i32 [ 7, [[BB_JOIN]] ], [ 7, [[TEST2_FALSE]] ], [ 42, [[TEST2]] ], [ 42, [[BB_JOIN_THREAD1]] ]
 ; CHECK-NEXT:    ret i32 [[TMP1]]
 ;
 entry:
diff --git a/llvm/test/Transforms/JumpThreading/uncond-no-phi.ll b/llvm/test/Transforms/JumpThreading/uncond-no-phi.ll
new file mode 100644
index 00000000000000..6104e8f8778bc0
--- /dev/null
+++ b/llvm/test/Transforms/JumpThreading/uncond-no-phi.ll
@@ -0,0 +1,123 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -passes=jump-threading -S < %s | FileCheck %s
+
+define i1 @if_else(i1 %c, i1 %c1) {
+; CHECK-LABEL: define i1 @if_else(
+; CHECK-SAME: i1 [[C:%.*]], i1 [[C1:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br i1 [[C]], label [[THEN:%.*]], label [[RETURN:%.*]]
+; CHECK:       then:
+; CHECK-NEXT:    call void @dummy()
+; CHECK-NEXT:    br i1 [[C1]], label [[ELSE:%.*]], label [[RETURN]]
+; CHECK:       else:
+; CHECK-NEXT:    br label [[RETURN]]
+; CHECK:       return:
+; CHECK-NEXT:    [[RETVAL_0:%.*]] = phi i1 [ false, [[THEN]] ], [ true, [[ENTRY:%.*]] ], [ true, [[ELSE]] ]
+; CHECK-NEXT:    ret i1 [[RETVAL_0]]
+;
+entry:
+  br i1 %c, label %then, label %else
+
+then:
+  call void @dummy()
+  br i1 %c1, label %else, label %return
+
+else:
+  br label %return
+
+return:
+  %retval.0 = phi i1 [ true, %else ], [ false, %then ]
+  ret i1 %retval.0
+}
+
+define i8 @switch_uncond(i8 %arg) {
+; CHECK-LABEL: define i8 @switch_uncond(
+; CHECK-SAME: i8 [[ARG:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    switch i8 [[ARG]], label [[DEFAULT:%.*]] [
+; CHECK-NEXT:      i8 0, label [[BB1:%.*]]
+; CHECK-NEXT:      i8 1, label [[BB3:%.*]]
+; CHECK-NEXT:      i8 2, label [[BB2:%.*]]
+; CHECK-NEXT:      i8 3, label [[END:%.*]]
+; CHECK-NEXT:    ]
+; CHECK:       default:
+; CHECK-NEXT:    unreachable
+; CHECK:       bb:
+; CHECK-NEXT:    call void @dummy()
+; CHECK-NEXT:    br label [[END]]
+; CHECK:       bb1:
+; CHECK-NEXT:    call void @dummy()
+; CHECK-NEXT:    br label [[END]]
+; CHECK:       bb2:
+; CHECK-NEXT:    br label [[END]]
+; CHECK:       end:
+; CHECK-NEXT:    [[PHI:%.*]] = phi i8 [ 1, [[ENTRY:%.*]] ], [ 0, [[BB3]] ], [ 0, [[BB1]] ], [ 0, [[BB2]] ]
+; CHECK-NEXT:    ret i8 [[PHI]]
+;
+entry:
+  switch i8 %arg, label %default [
+  i8 0, label %bb
+  i8 1, label %bb1
+  i8 2, label %bb2
+  i8 3, label %end
+  ]
+
+default:
+  unreachable
+
+bb:
+  call void @dummy()
+  br label %bb2
+
+bb1:
+  call void @dummy()
+  br label %bb2
+
+; Predecessors of %bb2 are %bb and %bb1, they are not identical.
+; So we can thread %bb2.
+bb2:
+  br label %end
+
+end:
+  %phi = phi i8 [ 0, %bb2 ], [ 1, %entry ]
+  ret i8 %phi
+}
+
+define i8 @switch_uncond_fail(i8 %arg) {
+; CHECK-LABEL: define i8 @switch_uncond_fail(
+; CHECK-SAME: i8 [[ARG:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    switch i8 [[ARG]], label [[DEFAULT:%.*]] [
+; CHECK-NEXT:      i8 0, label [[BB:%.*]]
+; CHECK-NEXT:      i8 1, label [[BB]]
+; CHECK-NEXT:      i8 2, label [[END:%.*]]
+; CHECK-NEXT:    ]
+; CHECK:       default:
+; CHECK-NEXT:    br label [[END]]
+; CHECK:       bb:
+; CHECK-NEXT:    br label [[END]]
+; CHECK:       end:
+; CHECK-NEXT:    [[PHI:%.*]] = phi i8 [ 0, [[BB]] ], [ 1, [[ENTRY:%.*]] ], [ 2, [[DEFAULT]] ]
+; CHECK-NEXT:    ret i8 [[PHI]]
+;
+entry:
+  switch i8 %arg, label %default [
+  i8 0, label %bb
+  i8 1, label %bb
+  i8 2, label %end
+  ]
+
+default:
+  br label %end
+
+; Predecessor of %bb is only %entry (though there are two in predecessor list),
+; thus it's unthreadable.
+bb:
+  br label %end
+
+end:
+  %phi = phi i8 [ 0, %bb ], [ 1, %entry ], [ 2, %default ]
+  ret i8 %phi
+}
+
+declare void @dummy()

Copy link

github-actions bot commented Mar 22, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Mar 22, 2024
Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please wait for additional approval from other reviewers.

llvm/lib/Transforms/Utils/Local.cpp Outdated Show resolved Hide resolved
Copy link

✅ With the latest revision this PR passed the Python code formatter.

llvm/test/Transforms/SimplifyCFG/branch-fold.ll Outdated Show resolved Hide resolved
llvm/test/CodeGen/AArch64/and-sink.ll Outdated Show resolved Hide resolved
llvm/test/Transforms/SimplifyCFG/thread-uncond-bb.ll Outdated Show resolved Hide resolved
@DianQK
Copy link
Member

DianQK commented Mar 24, 2024

Hmm, please consider the following case:

define i32 @thread_uncond_bb_cmp_zext(i1 %c, i32 %v, i32 %v2, i32 %v3) {
entry:
  br i1 %c, label %do.end, label %if.then

if.then:                                          ; preds = %entry
  call void @dummy()
  %tobool = icmp eq i32 %v, 0
  br i1 %tobool, label %do.end, label %return

do.end:                                           ; preds = %entry, %if.then
  br label %return

return:                                           ; preds = %if.then, %do.end
  %retval = phi i32 [ %v2, %do.end ], [ %v3, %if.then ]
  ret i32 %retval
}

declare void @dummy()

https://llvm.godbolt.org/z/74f7qz54E.

The generated code will become more complex if other passes cannot eliminate a branch instruction or convert it into a select instruction.

@XChy
Copy link
Member Author

XChy commented Mar 26, 2024

@DianQK, in most cases, we could produce a select instruction for such branch. And whether fold it to select in generated code depends on the backend. The armv7 and aarch64 backends emit the select as expected for your example: https://llvm.godbolt.org/z/v4ParEM8E
As the patch shortens the distance between srcBB and dstBB, we enable more optimization. This may split one shared data move for rare unoptimized cases, increasing codesize a bit, but I don't think it would have obivous impact on performance.

Copy link
Member

@DianQK DianQK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with "Comment" :3. (It's usually just because I haven't prepared myself to approve.)

My main concern is the impact of introducing more phi nodes in larger programs. Perhaps I'm overthinking it.
If I remember correctly, #85510 is also affected because CanRedirectPredsOfEmptyBBToSucc can only see less facts.

@XChy
Copy link
Member Author

XChy commented Apr 12, 2024

I would like to merge this, @nikic , any comment?

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

llvm/test/Transforms/JumpThreading/select.ll Outdated Show resolved Hide resolved
@XChy XChy merged commit 36b3c26 into llvm:main Apr 16, 2024
3 of 4 checks passed
@nikic
Copy link
Contributor

nikic commented Apr 16, 2024

This seems to cause a 0.15% compile-time regression using stage2 clang only (https://llvm-compile-time-tracker.com/compare.php?from=1120d8e6f799121b611aa23bdc128e40cf9c6c58&to=36b3c26451bf9a42f0b6b415993d3942bb73abdd&stat=instructions:u), which indicates that causes an optimization regression, at least for clang :(

@XChy
Copy link
Member Author

XChy commented Apr 16, 2024

Emm, unexpected result :(. Since I can't see obvious regression in llvm-opt-benchmark and have no simple regression testcase for it, I can't fix it immediately. If emergent, feel free to revert this patch.

@dyung
Copy link
Collaborator

dyung commented Apr 16, 2024

@XChy your change seems to be causing a test failure of llvm/test/CodeGen/Generic/live-debug-label.ll on several buildbots:

Can you take a look and revert if you need time to investigate?

@XChy
Copy link
Member Author

XChy commented Apr 16, 2024

For compile-time regression and a failure on debug info test(I can't reproduce it on my machine), revert this patch now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[JumpThreading] Missing optimization: thread over a nearly empty basicblock
6 participants