[MachineSink] Fix missing sinks along critical edges #97618

mshockwave · 2024-07-03T18:25:22Z

4e0bd3f improved early MachineLICM's capabilities to hoist COPY from physical registers out of a loop. However, it accidentally broke one of MachineSink's preconditions on sinking cheap instructions (in this case, COPY) which considered those instructions being profitable to sink only when there are at least two of them in the same def-use chain in the same basic block. So if early MachineLICM hoisted one of them out, MachineSink no longer sink rest of the cheap instructions. This results in redundant load immediate instructions from the motivating example we've seen on RISC-V.

This patch fixes this by teaching MachineSink that if there is more than one demand to sink a register into the same block from different critical edges, it should be considered profitable as it increases the CSE opportunities.
This change also improves two of the AArch64's cases.

I'd put a pre-commit test in this PR for easier comparison.

llvmbot · 2024-07-03T18:25:58Z

@llvm/pr-subscribers-backend-aarch64

Author: Min-Yih Hsu (mshockwave)

Changes

4e0bd3f improved early MachineLICM's capabilities to hoist COPY from physical registers out of a loop. However, it accidentally broke one of MachineSink's preconditions on sinking cheap instructions (in this case, COPY) which considered those instructions being profitable to sink only when there are at least two of them in the same def-use chain in the same basic block. So if early MachineLICM hoisted one of them out, MachineSink no longer sink rest of the cheap instructions. This results in redundant load immediate instructions from the motivating example we've seen on RISC-V.

This patch fixes this by teaching MachineSink that if there is more than one demand to sink a register into the same block from different critical edges, it should be considered profitable as it increases the CSE opportunities.
This change also improves two of the AArch64's cases.

I'd put a pre-commit test in this PR for easier comparison.

Full diff: https://github.com/llvm/llvm-project/pull/97618.diff

4 Files Affected:

(modified) llvm/lib/CodeGen/MachineSink.cpp (+66-15)
(modified) llvm/test/CodeGen/AArch64/and-sink.ll (+5-8)
(modified) llvm/test/CodeGen/AArch64/fast-isel-branch-cond-split.ll (+8-16)
(added) llvm/test/CodeGen/RISCV/machine-sink-load-immediate.ll (+185)

diff --git a/llvm/lib/CodeGen/MachineSink.cpp b/llvm/lib/CodeGen/MachineSink.cpp
index 4dabaabe3659f..6c8b4a6b7338b 100644
--- a/llvm/lib/CodeGen/MachineSink.cpp
+++ b/llvm/lib/CodeGen/MachineSink.cpp
@@ -130,6 +130,14 @@ namespace {
     // Remember which edges have been considered for breaking.
     SmallSet<std::pair<MachineBasicBlock*, MachineBasicBlock*>, 8>
     CEBCandidates;
+    // Memorize the register that also wanted to sink into the same block along
+    // a different critical edge.
+    // {register to sink, sink-to block} -> the first sink-from block.
+    // We're recording the first sink-from block because that (critical) edge
+    // was deferred until we see another register that's going to sink into the
+    // same block.
+    DenseMap<std::pair<Register, MachineBasicBlock *>, MachineBasicBlock *>
+        CEMergeCandidates;
     // Remember which edges we are about to split.
     // This is different from CEBCandidates since those edges
     // will be split.
@@ -197,14 +205,17 @@ namespace {
 
     void releaseMemory() override {
       CEBCandidates.clear();
+      CEMergeCandidates.clear();
     }
 
   private:
     bool ProcessBlock(MachineBasicBlock &MBB);
     void ProcessDbgInst(MachineInstr &MI);
-    bool isWorthBreakingCriticalEdge(MachineInstr &MI,
-                                     MachineBasicBlock *From,
-                                     MachineBasicBlock *To);
+    bool isLegalBreakingCriticalEdge(MachineInstr &MI, MachineBasicBlock *From,
+                                     MachineBasicBlock *To, bool BreakPHIEdge);
+    bool isWorthBreakingCriticalEdge(MachineInstr &MI, MachineBasicBlock *From,
+                                     MachineBasicBlock *To,
+                                     MachineBasicBlock *&DeferredFromBlock);
 
     bool hasStoreBetween(MachineBasicBlock *From, MachineBasicBlock *To,
                          MachineInstr &MI);
@@ -725,6 +736,7 @@ bool MachineSinking::runOnMachineFunction(MachineFunction &MF) {
 
     // Process all basic blocks.
     CEBCandidates.clear();
+    CEMergeCandidates.clear();
     ToSplit.clear();
     for (auto &MBB: MF)
       MadeChange |= ProcessBlock(MBB);
@@ -873,9 +885,9 @@ void MachineSinking::ProcessDbgInst(MachineInstr &MI) {
   SeenDbgVars.insert(Var);
 }
 
-bool MachineSinking::isWorthBreakingCriticalEdge(MachineInstr &MI,
-                                                 MachineBasicBlock *From,
-                                                 MachineBasicBlock *To) {
+bool MachineSinking::isWorthBreakingCriticalEdge(
+    MachineInstr &MI, MachineBasicBlock *From, MachineBasicBlock *To,
+    MachineBasicBlock *&DeferredFromBlock) {
   // FIXME: Need much better heuristics.
 
   // If the pass has already considered breaking this edge (during this pass
@@ -887,6 +899,27 @@ bool MachineSinking::isWorthBreakingCriticalEdge(MachineInstr &MI,
   if (!MI.isCopy() && !TII->isAsCheapAsAMove(MI))
     return true;
 
+  // Check and record the register and the destination block we want to sink
+  // into. Note that we want to do the following before the next check on branch
+  // probability. Because we want to record the initial candidate even if it's
+  // on hot edge, so that other candidates that might not on hot edges can be
+  // sinked as well.
+  for (const auto &MO : MI.all_defs()) {
+    Register Reg = MO.getReg();
+    if (!Reg)
+      continue;
+    Register SrcReg = Reg.isVirtual() ? TRI->lookThruCopyLike(Reg, MRI) : Reg;
+    auto Key = std::make_pair(SrcReg, To);
+    auto Res = CEMergeCandidates.insert(std::make_pair(Key, From));
+    // We wanted to sink the same register into the same block, consider it to
+    // be profitable.
+    if (!Res.second) {
+      // Return the source block that was previously holded off.
+      DeferredFromBlock = Res.first->second;
+      return true;
+    }
+  }
+
   if (From->isSuccessor(To) && MBPI->getEdgeProbability(From, To) <=
       BranchProbability(SplitEdgeProbabilityThreshold, 100))
     return true;
@@ -921,13 +954,10 @@ bool MachineSinking::isWorthBreakingCriticalEdge(MachineInstr &MI,
   return false;
 }
 
-bool MachineSinking::PostponeSplitCriticalEdge(MachineInstr &MI,
-                                               MachineBasicBlock *FromBB,
-                                               MachineBasicBlock *ToBB,
-                                               bool BreakPHIEdge) {
-  if (!isWorthBreakingCriticalEdge(MI, FromBB, ToBB))
-    return false;
-
+bool MachineSinking::isLegalBreakingCriticalEdge(MachineInstr &MI,
+                                                 MachineBasicBlock *FromBB,
+                                                 MachineBasicBlock *ToBB,
+                                                 bool BreakPHIEdge) {
   // Avoid breaking back edge. From == To means backedge for single BB cycle.
   if (!SplitEdges || FromBB == ToBB)
     return false;
@@ -985,11 +1015,32 @@ bool MachineSinking::PostponeSplitCriticalEdge(MachineInstr &MI,
         return false;
   }
 
-  ToSplit.insert(std::make_pair(FromBB, ToBB));
-
   return true;
 }
 
+bool MachineSinking::PostponeSplitCriticalEdge(MachineInstr &MI,
+                                               MachineBasicBlock *FromBB,
+                                               MachineBasicBlock *ToBB,
+                                               bool BreakPHIEdge) {
+  bool Status = false;
+  MachineBasicBlock *DeferredFromBB = nullptr;
+  if (isWorthBreakingCriticalEdge(MI, FromBB, ToBB, DeferredFromBB)) {
+    // If there is a DeferredFromBB, we consider FromBB only if _both_
+    // of them are legal to split.
+    if ((!DeferredFromBB ||
+         ToSplit.count(std::make_pair(DeferredFromBB, ToBB)) ||
+         isLegalBreakingCriticalEdge(MI, DeferredFromBB, ToBB, BreakPHIEdge)) &&
+        isLegalBreakingCriticalEdge(MI, FromBB, ToBB, BreakPHIEdge)) {
+      ToSplit.insert(std::make_pair(FromBB, ToBB));
+      if (DeferredFromBB)
+        ToSplit.insert(std::make_pair(DeferredFromBB, ToBB));
+      Status = true;
+    }
+  }
+
+  return Status;
+}
+
 std::vector<unsigned> &
 MachineSinking::getBBRegisterPressure(const MachineBasicBlock &MBB) {
   // Currently to save compiling time, MBB's register pressure will not change
diff --git a/llvm/test/CodeGen/AArch64/and-sink.ll b/llvm/test/CodeGen/AArch64/and-sink.ll
index f298a55dab721..c84310629e5fd 100644
--- a/llvm/test/CodeGen/AArch64/and-sink.ll
+++ b/llvm/test/CodeGen/AArch64/and-sink.ll
@@ -46,9 +46,8 @@ bb2:
 define dso_local i32 @and_sink2(i32 %a, i1 %c, i1 %c2) {
 ; CHECK-LABEL: and_sink2:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, wzr
-; CHECK-NEXT:    adrp x9, A
-; CHECK-NEXT:    str wzr, [x9, :lo12:A]
+; CHECK-NEXT:    adrp x8, A
+; CHECK-NEXT:    str wzr, [x8, :lo12:A]
 ; CHECK-NEXT:    tbz w1, #0, .LBB1_5
 ; CHECK-NEXT:  // %bb.1: // %bb0.preheader
 ; CHECK-NEXT:    adrp x8, B
@@ -56,17 +55,15 @@ define dso_local i32 @and_sink2(i32 %a, i1 %c, i1 %c2) {
 ; CHECK-NEXT:  .LBB1_2: // %bb0
 ; CHECK-NEXT:    // =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT:    str wzr, [x8, :lo12:B]
-; CHECK-NEXT:    tbz w2, #0, .LBB1_6
+; CHECK-NEXT:    tbz w2, #0, .LBB1_5
 ; CHECK-NEXT:  // %bb.3: // %bb1
 ; CHECK-NEXT:    // in Loop: Header=BB1_2 Depth=1
 ; CHECK-NEXT:    str wzr, [x9, :lo12:C]
 ; CHECK-NEXT:    tbnz w0, #2, .LBB1_2
 ; CHECK-NEXT:  // %bb.4:
-; CHECK-NEXT:    mov w8, #1 // =0x1
-; CHECK-NEXT:  .LBB1_5: // %common.ret
-; CHECK-NEXT:    mov w0, w8
+; CHECK-NEXT:    mov w0, #1 // =0x1
 ; CHECK-NEXT:    ret
-; CHECK-NEXT:  .LBB1_6:
+; CHECK-NEXT:  .LBB1_5:
 ; CHECK-NEXT:    mov w0, wzr
 ; CHECK-NEXT:    ret
 
diff --git a/llvm/test/CodeGen/AArch64/fast-isel-branch-cond-split.ll b/llvm/test/CodeGen/AArch64/fast-isel-branch-cond-split.ll
index d92bbfd7a21d6..49e31447c1c0d 100644
--- a/llvm/test/CodeGen/AArch64/fast-isel-branch-cond-split.ll
+++ b/llvm/test/CodeGen/AArch64/fast-isel-branch-cond-split.ll
@@ -4,13 +4,11 @@
 define i64 @test_or(i32 %a, i32 %b) {
 ; CHECK-LABEL: test_or:
 ; CHECK:       ; %bb.0: ; %bb1
-; CHECK-NEXT:    mov w8, w0
+; CHECK-NEXT:    cbnz w0, LBB0_2
+; CHECK-NEXT:  LBB0_1:
 ; CHECK-NEXT:    mov x0, xzr
-; CHECK-NEXT:    cbnz w8, LBB0_2
-; CHECK-NEXT:  LBB0_1: ; %common.ret
 ; CHECK-NEXT:    ret
 ; CHECK-NEXT:  LBB0_2: ; %bb1.cond.split
-; CHECK-NEXT:    mov x0, xzr
 ; CHECK-NEXT:    cbz w1, LBB0_1
 ; CHECK-NEXT:  ; %bb.3: ; %bb4
 ; CHECK-NEXT:    stp x29, x30, [sp, #-16]! ; 16-byte Folded Spill
@@ -37,13 +35,11 @@ bb4:
 define i64 @test_or_select(i32 %a, i32 %b) {
 ; CHECK-LABEL: test_or_select:
 ; CHECK:       ; %bb.0: ; %bb1
-; CHECK-NEXT:    mov w8, w0
+; CHECK-NEXT:    cbnz w0, LBB1_2
+; CHECK-NEXT:  LBB1_1:
 ; CHECK-NEXT:    mov x0, xzr
-; CHECK-NEXT:    cbnz w8, LBB1_2
-; CHECK-NEXT:  LBB1_1: ; %common.ret
 ; CHECK-NEXT:    ret
 ; CHECK-NEXT:  LBB1_2: ; %bb1.cond.split
-; CHECK-NEXT:    mov x0, xzr
 ; CHECK-NEXT:    cbz w1, LBB1_1
 ; CHECK-NEXT:  ; %bb.3: ; %bb4
 ; CHECK-NEXT:    stp x29, x30, [sp, #-16]! ; 16-byte Folded Spill
@@ -70,13 +66,11 @@ bb4:
 define i64 @test_and(i32 %a, i32 %b) {
 ; CHECK-LABEL: test_and:
 ; CHECK:       ; %bb.0: ; %bb1
-; CHECK-NEXT:    mov w8, w0
+; CHECK-NEXT:    cbnz w0, LBB2_2
+; CHECK-NEXT:  LBB2_1:
 ; CHECK-NEXT:    mov x0, xzr
-; CHECK-NEXT:    cbnz w8, LBB2_2
-; CHECK-NEXT:  LBB2_1: ; %common.ret
 ; CHECK-NEXT:    ret
 ; CHECK-NEXT:  LBB2_2: ; %bb1.cond.split
-; CHECK-NEXT:    mov x0, xzr
 ; CHECK-NEXT:    cbz w1, LBB2_1
 ; CHECK-NEXT:  ; %bb.3: ; %bb4
 ; CHECK-NEXT:    stp x29, x30, [sp, #-16]! ; 16-byte Folded Spill
@@ -103,13 +97,11 @@ bb4:
 define i64 @test_and_select(i32 %a, i32 %b) {
 ; CHECK-LABEL: test_and_select:
 ; CHECK:       ; %bb.0: ; %bb1
-; CHECK-NEXT:    mov w8, w0
+; CHECK-NEXT:    cbnz w0, LBB3_2
+; CHECK-NEXT:  LBB3_1:
 ; CHECK-NEXT:    mov x0, xzr
-; CHECK-NEXT:    cbnz w8, LBB3_2
-; CHECK-NEXT:  LBB3_1: ; %common.ret
 ; CHECK-NEXT:    ret
 ; CHECK-NEXT:  LBB3_2: ; %bb1.cond.split
-; CHECK-NEXT:    mov x0, xzr
 ; CHECK-NEXT:    cbz w1, LBB3_1
 ; CHECK-NEXT:  ; %bb.3: ; %bb4
 ; CHECK-NEXT:    stp x29, x30, [sp, #-16]! ; 16-byte Folded Spill
diff --git a/llvm/test/CodeGen/RISCV/machine-sink-load-immediate.ll b/llvm/test/CodeGen/RISCV/machine-sink-load-immediate.ll
new file mode 100644
index 0000000000000..00d10b9f2bc0c
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/machine-sink-load-immediate.ll
@@ -0,0 +1,185 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=riscv64 < %s | FileCheck %s
+
+define i1 @sink_li(ptr %text, ptr %text.addr.0) {
+; CHECK-LABEL: sink_li:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    addi sp, sp, -32
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    sd ra, 24(sp) # 8-byte Folded Spill
+; CHECK-NEXT:    sd s0, 16(sp) # 8-byte Folded Spill
+; CHECK-NEXT:    sd s1, 8(sp) # 8-byte Folded Spill
+; CHECK-NEXT:    sd s2, 0(sp) # 8-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset ra, -8
+; CHECK-NEXT:    .cfi_offset s0, -16
+; CHECK-NEXT:    .cfi_offset s1, -24
+; CHECK-NEXT:    .cfi_offset s2, -32
+; CHECK-NEXT:    mv s1, a1
+; CHECK-NEXT:    mv s0, a0
+; CHECK-NEXT:    call toupper
+; CHECK-NEXT:    li a1, 0
+; CHECK-NEXT:    beqz s0, .LBB0_26
+; CHECK-NEXT:  # %bb.1: # %while.body.preheader
+; CHECK-NEXT:    li a2, 1
+; CHECK-NEXT:    li a3, 9
+; CHECK-NEXT:    li a4, 32
+; CHECK-NEXT:  .LBB0_2: # %while.body
+; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:    bnez a2, .LBB0_4
+; CHECK-NEXT:  # %bb.3: # %while.body
+; CHECK-NEXT:    # in Loop: Header=BB0_2 Depth=1
+; CHECK-NEXT:    bne a2, a3, .LBB0_16
+; CHECK-NEXT:  .LBB0_4: # %while.body.1
+; CHECK-NEXT:    # in Loop: Header=BB0_2 Depth=1
+; CHECK-NEXT:    bnez a2, .LBB0_6
+; CHECK-NEXT:  # %bb.5: # %while.body.1
+; CHECK-NEXT:    # in Loop: Header=BB0_2 Depth=1
+; CHECK-NEXT:    bne a2, a3, .LBB0_17
+; CHECK-NEXT:  .LBB0_6: # %while.body.3
+; CHECK-NEXT:    # in Loop: Header=BB0_2 Depth=1
+; CHECK-NEXT:    bnez a2, .LBB0_8
+; CHECK-NEXT:  # %bb.7: # %while.body.3
+; CHECK-NEXT:    # in Loop: Header=BB0_2 Depth=1
+; CHECK-NEXT:    bne a2, a4, .LBB0_19
+; CHECK-NEXT:  .LBB0_8: # %while.body.4
+; CHECK-NEXT:    # in Loop: Header=BB0_2 Depth=1
+; CHECK-NEXT:    bnez a2, .LBB0_10
+; CHECK-NEXT:  # %bb.9: # %while.body.4
+; CHECK-NEXT:    # in Loop: Header=BB0_2 Depth=1
+; CHECK-NEXT:    bne a2, a4, .LBB0_21
+; CHECK-NEXT:  .LBB0_10: # %while.body.5
+; CHECK-NEXT:    # in Loop: Header=BB0_2 Depth=1
+; CHECK-NEXT:    bnez a2, .LBB0_12
+; CHECK-NEXT:  # %bb.11: # %while.body.5
+; CHECK-NEXT:    # in Loop: Header=BB0_2 Depth=1
+; CHECK-NEXT:    bne a2, a3, .LBB0_23
+; CHECK-NEXT:  .LBB0_12: # %while.body.6
+; CHECK-NEXT:    # in Loop: Header=BB0_2 Depth=1
+; CHECK-NEXT:    bnez a2, .LBB0_2
+; CHECK-NEXT:  # %bb.13: # %while.body.6
+; CHECK-NEXT:    # in Loop: Header=BB0_2 Depth=1
+; CHECK-NEXT:    beq a2, a3, .LBB0_2
+; CHECK-NEXT:  # %bb.14: # %while.body.6
+; CHECK-NEXT:    beqz a2, .LBB0_24
+; CHECK-NEXT:  # %bb.15: # %strdup.exit.split.loop.exit126
+; CHECK-NEXT:    addi s0, s1, 7
+; CHECK-NEXT:    j .LBB0_25
+; CHECK-NEXT:  .LBB0_16: # %while.body
+; CHECK-NEXT:    beqz a2, .LBB0_26
+; CHECK-NEXT:    j .LBB0_18
+; CHECK-NEXT:  .LBB0_17: # %while.body.1
+; CHECK-NEXT:    beqz a2, .LBB0_24
+; CHECK-NEXT:  .LBB0_18: # %strdup.exit.loopexit
+; CHECK-NEXT:    li s0, 0
+; CHECK-NEXT:    j .LBB0_25
+; CHECK-NEXT:  .LBB0_19: # %while.body.3
+; CHECK-NEXT:    beqz a2, .LBB0_24
+; CHECK-NEXT:  # %bb.20: # %strdup.exit.split.loop.exit120
+; CHECK-NEXT:    addi s0, s1, 4
+; CHECK-NEXT:    j .LBB0_25
+; CHECK-NEXT:  .LBB0_21: # %while.body.4
+; CHECK-NEXT:    beqz a2, .LBB0_24
+; CHECK-NEXT:  # %bb.22: # %strdup.exit.split.loop.exit122
+; CHECK-NEXT:    addi s0, s1, 5
+; CHECK-NEXT:    j .LBB0_25
+; CHECK-NEXT:  .LBB0_23: # %while.body.5
+; CHECK-NEXT:    bnez a2, .LBB0_25
+; CHECK-NEXT:  .LBB0_24:
+; CHECK-NEXT:    li a1, 0
+; CHECK-NEXT:    j .LBB0_26
+; CHECK-NEXT:  .LBB0_25: # %strdup.exit
+; CHECK-NEXT:    li s1, 0
+; CHECK-NEXT:    mv s2, a0
+; CHECK-NEXT:    li a0, 0
+; CHECK-NEXT:    mv a1, s0
+; CHECK-NEXT:    jalr s1
+; CHECK-NEXT:    li a0, 0
+; CHECK-NEXT:    mv a1, s2
+; CHECK-NEXT:    li a2, 0
+; CHECK-NEXT:    jalr s1
+; CHECK-NEXT:    li a1, 1
+; CHECK-NEXT:  .LBB0_26: # %return
+; CHECK-NEXT:    mv a0, a1
+; CHECK-NEXT:    ld ra, 24(sp) # 8-byte Folded Reload
+; CHECK-NEXT:    ld s0, 16(sp) # 8-byte Folded Reload
+; CHECK-NEXT:    ld s1, 8(sp) # 8-byte Folded Reload
+; CHECK-NEXT:    ld s2, 0(sp) # 8-byte Folded Reload
+; CHECK-NEXT:    addi sp, sp, 32
+; CHECK-NEXT:    ret
+entry:
+  %call = call i32 @toupper()
+  %tobool.not = icmp eq ptr %text, null
+  br i1 %tobool.not, label %return, label %while.body
+
+while.body:                                       ; preds = %while.body.6, %while.body.6, %entry
+  switch i8 1, label %strdup.exit.split.loop.exit114 [
+    i8 1, label %while.body.1
+    i8 9, label %while.body.1
+    i8 0, label %return
+  ]
+
+while.body.1:                                     ; preds = %while.body, %while.body
+  switch i8 1, label %strdup.exit [
+    i8 1, label %while.body.3
+    i8 9, label %while.body.3
+    i8 0, label %return
+  ]
+
+while.body.3:                                     ; preds = %while.body.1, %while.body.1
+  switch i8 1, label %strdup.exit.split.loop.exit120 [
+    i8 32, label %while.body.4
+    i8 1, label %while.body.4
+    i8 0, label %return
+  ]
+
+while.body.4:                                     ; preds = %while.body.3, %while.body.3
+  switch i8 1, label %strdup.exit.split.loop.exit122 [
+    i8 32, label %while.body.5
+    i8 1, label %while.body.5
+    i8 0, label %return
+  ]
+
+while.body.5:                                     ; preds = %while.body.4, %while.body.4
+  switch i8 1, label %strdup.exit.split.loop.exit124 [
+    i8 1, label %while.body.6
+    i8 9, label %while.body.6
+    i8 0, label %return
+  ]
+
+while.body.6:                                     ; preds = %while.body.5, %while.body.5
+  switch i8 1, label %strdup.exit.split.loop.exit126 [
+    i8 1, label %while.body
+    i8 9, label %while.body
+    i8 0, label %return
+  ]
+
+strdup.exit.split.loop.exit114:        ; preds = %while.body
+  br label %strdup.exit
+
+strdup.exit.split.loop.exit120:        ; preds = %while.body.3
+  %incdec.ptr.3.le = getelementptr i8, ptr %text.addr.0, i64 4
+  br label %strdup.exit
+
+strdup.exit.split.loop.exit122:        ; preds = %while.body.4
+  %incdec.ptr.4.le = getelementptr i8, ptr %text.addr.0, i64 5
+  br label %strdup.exit
+
+strdup.exit.split.loop.exit124:        ; preds = %while.body.5
+  br label %strdup.exit
+
+strdup.exit.split.loop.exit126:        ; preds = %while.body.6
+  %incdec.ptr.6.le = getelementptr i8, ptr %text.addr.0, i64 7
+  br label %strdup.exit
+
+strdup.exit:                           ; preds = %strdup.exit.split.loop.exit126, %strdup.exit.split.loop.exit124, %strdup.exit.split.loop.exit122, %strdup.exit.split.loop.exit120, %strdup.exit.split.loop.exit114, %while.body.1
+  %text.addr.0.lcssa = phi ptr [ null, %strdup.exit.split.loop.exit114 ], [ %incdec.ptr.3.le, %strdup.exit.split.loop.exit120 ], [ %incdec.ptr.4.le, %strdup.exit.split.loop.exit122 ], [ %text, %strdup.exit.split.loop.exit124 ], [ %incdec.ptr.6.le, %strdup.exit.split.loop.exit126 ], [ null, %while.body.1 ]
+  %call5.i = tail call ptr null(ptr null, ptr %text.addr.0.lcssa)
+  %memchr64 = tail call ptr null(ptr null, i32 %call, i64 0)
+  br label %return
+
+return:                                           ; preds = %strdup.exit, %while.body.6, %while.body.5, %while.body.4, %while.body.3, %while.body.1, %while.body, %entry
+  %retval.1 = phi i1 [ false, %entry ], [ true, %strdup.exit ], [ false, %while.body ], [ false, %while.body.1 ], [ false, %while.body.3 ], [ false, %while.body.4 ], [ false, %while.body.5 ], [ false, %while.body.6 ]
+  ret i1 %retval.1
+}
+
+declare i32 @toupper()

topperc · 2024-07-03T22:39:06Z

llvm/lib/CodeGen/MachineSink.cpp

+    // We wanted to sink the same register into the same block, consider it to
+    // be profitable.
+    if (!Res.second) {
+      // Return the source block that was previously holded off.


holded -> held

topperc · 2024-07-03T22:40:34Z

llvm/lib/CodeGen/MachineSink.cpp

+      continue;
+    Register SrcReg = Reg.isVirtual() ? TRI->lookThruCopyLike(Reg, MRI) : Reg;
+    auto Key = std::make_pair(SrcReg, To);
+    auto Res = CEMergeCandidates.insert(std::make_pair(Key, From));


You can use try_emplace instead of insert to avoid the make_pair

Good point. It's fixed now.

topperc · 2024-07-03T22:44:08Z

llvm/lib/CodeGen/MachineSink.cpp

-  if (!isWorthBreakingCriticalEdge(MI, FromBB, ToBB))
-    return false;
-
+bool MachineSinking::isLegalBreakingCriticalEdge(MachineInstr &MI,


isLegalToBreakCriticalEdge?

topperc · 2024-07-03T22:45:33Z

llvm/test/CodeGen/RISCV/machine-sink-load-immediate.ll

+; CHECK-NEXT:    sd s0, 16(sp) # 8-byte Folded Spill
+; CHECK-NEXT:    sd s1, 8(sp) # 8-byte Folded Spill
+; CHECK-NEXT:    sd s2, 0(sp) # 8-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset ra, -8


Add nounwind to drop cfi directives

wangpc-pp

LGTM. Thanks!

4e0bd3f improved early MachineLICM's capabilities to hoist COPY from physical registers out of a loop. However, it accidentally broke one of MachineSink's preconditions on sinking cheap instructions (in this case, COPY) which considered those instructions being profitable to sink only when there are at least two of them in the same def-use chain in the same basic block. So if early MachineLICM hoisted one of them out, MachineSink no longer sink rest of the cheap instructions. This results in redundant load immediate instructions from the motivating example we've seen on RISC-V. This patch fixes this by teaching MachineSink that if there is more than one demand to sink a register into the same block from different critical edges, it should be considered profitable as it increases the CSE opportunities.

alexfh · 2024-07-11T14:46:10Z

This commit causes an assertion failure when compiling some code with hwasan for aarch64:

assertion failed at llvm-project/llvm/lib/CodeGen/MachineBasicBlock.cpp:877 in void llvm::MachineBasicBlock::replaceSuccessor(MachineBasicBlock *, MachineBasicBlock *): OldI != E && "Old is not a successor of this block"
    @     0x55c77a4e3cc4  __assert_fail
    @     0x55c778a22219  llvm::MachineBasicBlock::replaceSuccessor()
    @     0x55c778a239b6  llvm::MachineBasicBlock::SplitCriticalEdge()
    @     0x55c778b558fc  (anonymous namespace)::MachineSinking::runOnMachineFunction()
    @     0x55c778a9c590  llvm::MachineFunctionPass::runOnFunction()
    @     0x55c77a195cb2  llvm::FPPassManager::runOnFunction()
    @     0x55c77a19dbc4  llvm::FPPassManager::runOnModule()
    @     0x55c77a1967a3  llvm::legacy::PassManagerImpl::run()
    @     0x55c774f5f9fc  clang::EmitBackendOutput()
    @     0x55c774bb077f  clang::BackendConsumer::HandleTranslationUnit()
    @     0x55c775b505b9  clang::ParseAST()
    @     0x55c77589d4c3  clang::FrontendAction::Execute()
    @     0x55c775817d8d  clang::CompilerInstance::ExecuteAction()
    @     0x55c77482a6d8  clang::ExecuteCompilerInvocation()
    @     0x55c77481ead6  cc1_main()

I'm currently working on a standalone reproducer.

alexfh · 2024-07-11T16:11:38Z

The reduced test case is:

target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32"
target triple = "aarch64-generic-linux-gnu"

define void @_f(ptr %project, i1 %cmp.i.i.not9.i) {
entry:
  %bf.load.i.i.i.i = load i8, ptr %project, align 1
  %tobool.i.i.i.i = icmp slt i8 %bf.load.i.i.i.i, 0
  br i1 %tobool.i.i.i.i, label %cond.true.i9.i.i, label %_ZNKSt3__u12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE16__get_short_sizeEv.exit.i.i.i

cond.true.i9.i.i:                                 ; preds = %entry
  %0 = load i64, ptr %project, align 8
  br label %_ZNKSt3__u12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE4cendEv.exit

_ZNKSt3__u12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE16__get_short_sizeEv.exit.i.i.i: ; preds = %entry
  %conv13.i.i.i.i = zext i8 %bf.load.i.i.i.i to i64
  br label %_ZNKSt3__u12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE4cendEv.exit

_ZNKSt3__u12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE4cendEv.exit: ; preds = %_ZNKSt3__u12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE16__get_short_sizeEv.exit.i.i.i, %cond.true.i9.i.i
  %cond.i8.i.i5 = phi i64 [ %0, %cond.true.i9.i.i ], [ %conv13.i.i.i.i, %_ZNKSt3__u12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE16__get_short_sizeEv.exit.i.i.i ]
  br i1 %cmp.i.i.not9.i, label %common.ret, label %for.body.i

for.cond.i:                                       ; preds = %for.body.i
  %incdec.ptr.i.i = getelementptr i8, ptr %__first.sroa.0.010.i, i64 1
  %cmp.i.i.not.i = icmp eq i64 %cond.i8.i.i5, 0
  br i1 %cmp.i.i.not.i, label %common.ret, label %for.body.i

for.body.i:                                       ; preds = %for.cond.i, %_ZNKSt3__u12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE4cendEv.exit
  %__first.sroa.0.010.i = phi ptr [ %incdec.ptr.i.i, %for.cond.i ], [ %project, %_ZNKSt3__u12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE4cendEv.exit ]
  %1 = load i8, ptr %__first.sroa.0.010.i, align 1
  %isdigit = icmp eq i8 %1, 0
  br i1 %isdigit, label %cond.false, label %for.cond.i

common.ret:                                       ; preds = %cond.end.i.thread.i.i72, %cond.false, %for.cond.i, %_ZNKSt3__u12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE4cendEv.exit
  ret void

cond.false:                                       ; preds = %for.body.i
  br i1 %tobool.i.i.i.i, label %common.ret, label %cond.end.i.thread.i.i72

cond.end.i.thread.i.i72:                          ; preds = %cond.false
  %conv13.i.i.i.i73 = zext i8 %bf.load.i.i.i.i to i64
  store i64 %conv13.i.i.i.i73, ptr %project, align 8
  br label %common.ret
}

./clang-checked-bad -O2 --target=aarch64-generic-linux-gnu -mcpu=neoverse-n1 -mtune=generic -fsanitize=hwaddress -mllvm -hwasan-inline-fast-path-checks=0 -c reduced.ll -o /dev/null
assertion failed at llvm-project/llvm/lib/CodeGen/MachineBasicBlock.cpp:877 in void llvm::MachineBasicBlock::replaceSuccessor(MachineBasicBlock *, MachineBasicBlock *): OldI != E && "Old is not a successor of this block"
    @     0x556998ae3cc4  __assert_fail
    @     0x556997022219  llvm::MachineBasicBlock::replaceSuccessor()
    @     0x5569970239b6  llvm::MachineBasicBlock::SplitCriticalEdge()
    @     0x5569971558fc  (anonymous namespace)::MachineSinking::runOnMachineFunction()
    @     0x55699709c590  llvm::MachineFunctionPass::runOnFunction()
    @     0x556998795cb2  llvm::FPPassManager::runOnFunction()
    @     0x55699879dbc4  llvm::FPPassManager::runOnModule()
    @     0x5569987967a3  llvm::legacy::PassManagerImpl::run()
    @     0x55699355f9fc  clang::EmitBackendOutput()
    @     0x5569931b6a3c  clang::CodeGenAction::ExecuteAction()
    @     0x556993e9d4c3  clang::FrontendAction::Execute()
    @     0x556993e17d8d  clang::CompilerInstance::ExecuteAction()
    @     0x556992e2a6d8  clang::ExecuteCompilerInvocation()
    @     0x556992e1ead6  cc1_main()
...

Please fix or revert. Thanks!

This reverts commit 7e2f961.

mshockwave · 2024-07-11T20:01:34Z

Hi @alexfh thanks for discovering this bug. I think you just found a really interesting issue caused by an old bug.

Here is the MachineFunction right before the crash:

# Machine code for function _f: IsSSA, TracksLiveness
Function Live Ins: $x0 in %6, $w1 in %7

bb.0.entry:
  successors: %bb.1(0x30000000), %bb.2(0x50000000); %bb.1(37.50%), %bb.2(62.50%)
  liveins: $x0, $w1
  %7:gpr32 = COPY $w1
  %6:gpr64common = COPY $x0
  %8:gpr32 = COPY %7:gpr32
  %9:gpr32 = LDRSBWui %6:gpr64common, 0 :: (load (s8) from %ir.project)
  %11:gpr64all = IMPLICIT_DEF
  %10:gpr64 = INSERT_SUBREG %11:gpr64all(tied-def 0), %9:gpr32, %subreg.sub_32
  %12:gpr64sp = ANDXri killed %10:gpr64, 4103
  %0:gpr64 = COPY %12:gpr64sp
  %1:gpr64all = COPY %12:gpr64sp
  TBZW %9:gpr32, 31, %bb.2
  B %bb.1

bb.1.cond.true.i9.i.i:
; predecessors: %bb.0
  successors: %bb.5(0x40000000), %bb.4(0x40000000); %bb.5(50.00%), %bb.4(50.00%)

  %13:gpr64 = LDRXui %6:gpr64common, 0 :: (load (s64) from %ir.project)
  %2:gpr64 = COPY %13:gpr64
  TBNZW %8:gpr32, 0, %bb.5
  B %bb.4

bb.2._ZNKSt3__u12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE16__get_short_sizeEv.exit.i.i.i:
; predecessors: %bb.0
  successors: %bb.5(0x40000000), %bb.4(0x40000000); %bb.5(50.00%), %bb.4(50.00%)

  %18:gpr64 = COPY %1:gpr64all
  TBNZW %8:gpr32, 0, %bb.5
  B %bb.4

bb.3.for.cond.i:
; predecessors: %bb.4
  successors: %bb.5(0x04000000), %bb.4(0x7c000000); %bb.5(3.12%), %bb.4(96.88%)

  %15:gpr64sp = ADDXri %5:gpr64sp, 1, 0
  %4:gpr64all = COPY %15:gpr64sp
  CBZX %19:gpr64, %bb.5
  B %bb.4

bb.4.for.body.i:
; predecessors: %bb.1, %bb.2, %bb.3
  successors: %bb.6(0x04000000), %bb.3(0x7c000000); %bb.6(3.12%), %bb.3(96.88%)

  %19:gpr64 = PHI %19:gpr64, %bb.3, %2:gpr64, %bb.1, %18:gpr64, %bb.2
  %5:gpr64sp = PHI %6:gpr64common, %bb.1, %4:gpr64all, %bb.3, %6:gpr64common, %bb.2
  %14:gpr32 = LDRBBui %5:gpr64sp, 0 :: (load (s8) from %ir.__first.sroa.0.010.i)
  CBZW killed %14:gpr32, %bb.6
  B %bb.3

bb.5.common.ret:
; predecessors: %bb.1, %bb.2, %bb.3, %bb.6, %bb.7

  RET_ReallyLR

bb.6.cond.false:
; predecessors: %bb.4
  successors: %bb.5(0x30000000), %bb.7(0x50000000); %bb.5(37.50%), %bb.7(62.50%)

  %16:gpr32 = COPY %1.sub_32:gpr64all
  TBNZW killed %16:gpr32, 7, %bb.5
  B %bb.7

bb.7.cond.end.i.thread.i.i72:
; predecessors: %bb.6
  successors: %bb.5(0x80000000); %bb.5(100.00%)

  STRXui %0:gpr64, %6:gpr64common, 0 :: (store (s64) into %ir.project)
  B %bb.5

# End machine code for function _f.

MachineSink thinks it's profitable to sink %0:gpr64 = COPY %12:gpr64sp from bb.0 to bb.4 after the critical edge between those two blocks are broken. But bb.4 is not even a successor of bb.0, hence the assertion failure.

The reason it thought bb.0 - bb.4 is even a critical edge in the first place is because bb.4 is the SuccToSinkTo block w.r.t the current block (i.e. bb.0) returned from FindSuccToSinkTo. MachineSink consider the edge between current block and SuccToSinkTo a critical edge if SuccToSinkTo has more than one predecessor -- which is true for most of the cases. However, FindSuccToSinkTo not only return a (immediate) successor block but also look beyond to all descendant blocks:

llvm-project/llvm/lib/CodeGen/MachineSink.cpp

Line 1202 in 43024a4

// Handle cases where sinking can happen but where the sink point isn't a

. Which is why FindSuccToSinkTo return bb.4 despite the fact that bb.4 is not even a successor of bb.0. (The reason I said this is an old bug because the logics of looking into descendant blocks has been there for a long time)

I think a simple fix on isLegalToBreakCriticalEdge to reject cases like this should fix the issue. I'm cooking a patch right now...

…sic blocks involved in critical edge splitting (#98540) Fix an issue in #97618 - if the two basic blocks involved are not predecessor / successor to each other, treat the candidate as illegal for critical edge splitting. Closes #98477 (checked in test copied from its comment).

4e0bd3f improved early MachineLICM's capabilities to hoist COPY from physical registers out of a loop. However, it accidentally broke one of MachineSink's preconditions on sinking cheap instructions (in this case, COPY) which considered those instructions being profitable to sink only when there are at least two of them in the same def-use chain in the same basic block. So if early MachineLICM hoisted one of them out, MachineSink no longer sink rest of the cheap instructions. This results in redundant load immediate instructions from the motivating example we've seen on RISC-V. This patch fixes this by teaching MachineSink that if there is more than one demand to sink a register into the same block from different critical edges, it should be considered profitable as it increases the CSE opportunities. This change also improves two of the AArch64's cases.

…sic blocks involved in critical edge splitting (llvm#98540) Fix an issue in llvm#97618 - if the two basic blocks involved are not predecessor / successor to each other, treat the candidate as illegal for critical edge splitting. Closes llvm#98477 (checked in test copied from its comment).

mshockwave requested review from TNorthover, topperc, XChy and wangpc-pp July 3, 2024 18:25

llvmbot added the backend:AArch64 label Jul 3, 2024

topperc reviewed Jul 3, 2024

View reviewed changes

wangpc-pp approved these changes Jul 9, 2024

View reviewed changes

mshockwave added a commit that referenced this pull request Jul 9, 2024

[test][MachineSink][RISCV] Pre-commit test for #97618

4283566

mshockwave added 2 commits July 9, 2024 10:45

Address review comments

d93577d

mshockwave force-pushed the patch/fix-machine-sink-load-imm branch from b5683f3 to d93577d Compare July 9, 2024 17:46

mshockwave merged commit 7e2f961 into llvm:main Jul 9, 2024
4 of 6 checks passed

mshockwave deleted the patch/fix-machine-sink-load-imm branch July 9, 2024 17:48

aemerson added a commit that referenced this pull request Jul 11, 2024

Revert "[MachineSink] Fix missing sinks along critical edges (#97618)"

8f45544

This reverts commit 7e2f961.

yozhu mentioned this pull request Jul 11, 2024

[MachineSink] Check predecessor/successor relationship between two basic blocks involved in critical edge splitting #98540

Merged

aaryanshukla pushed a commit to aaryanshukla/llvm-project that referenced this pull request Jul 14, 2024

[test][MachineSink][RISCV] Pre-commit test for llvm#97618

51adbdf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[MachineSink] Fix missing sinks along critical edges #97618

[MachineSink] Fix missing sinks along critical edges #97618

mshockwave commented Jul 3, 2024

llvmbot commented Jul 3, 2024

topperc Jul 3, 2024

mshockwave Jul 8, 2024

topperc Jul 3, 2024

mshockwave Jul 8, 2024

topperc Jul 3, 2024

mshockwave Jul 8, 2024

topperc Jul 3, 2024

mshockwave Jul 8, 2024

wangpc-pp left a comment

alexfh commented Jul 11, 2024

alexfh commented Jul 11, 2024

mshockwave commented Jul 11, 2024

[MachineSink] Fix missing sinks along critical edges #97618

[MachineSink] Fix missing sinks along critical edges #97618

Conversation

mshockwave commented Jul 3, 2024

llvmbot commented Jul 3, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wangpc-pp left a comment

Choose a reason for hiding this comment

alexfh commented Jul 11, 2024

alexfh commented Jul 11, 2024

mshockwave commented Jul 11, 2024