[PowerPC] Do not generate `isel` instruction if target doesn't have this instruction #72845

bzEq · 2023-11-20T09:54:26Z

When expand select_cc in finalize-isel, we should not generate isel for targets not feature it.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp

ecnelises · 2023-11-21T02:34:32Z

llvm/test/CodeGen/PowerPC/2008-10-28-f128-i32.ll

+; CHECK-NEXT:  .LBB0_18: # %bb2
+; CHECK-NEXT:    subfic 4, 3, 0
+; CHECK-NEXT:    subfe 3, 29, 30
+; CHECK-NEXT:  .LBB0_19: # %bb3


These changes seem to be not isel related, curious how they happened?

Previously, isel are eliminated by ppc-expand-isel post-ra. Now we eliminate isel after instruction selection and run multiple SSA optimization passes on MIR.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp

nemanjai · 2023-11-26T20:18:50Z

I asked a similar question on the previous patch that turned off insertion of isel in Early If Conversion...
What problem is this solving? Why is this advantageous over the previous solution to always generate isel and then eliminate it if it needs to be eliminated?

bzEq · 2023-11-27T07:47:58Z

I asked a similar question on the previous patch that turned off insertion of isel in Early If Conversion...

I'll answer all of them here.

What problem is this solving?

We observed some redundant copies in branches after PPCExpandISEL in some internal workloads, these copies can be removed if corresponding computing was sunk into the branch.

PPCExpandISEL should be getting rid of these on targets that don't support them.
Why is this advantageous over the previous solution to always generate isel and then eliminate it if it needs to be eliminated?

Yes. However, PPCExpandISEL is a late pass. If we can convert isels to branches early, we'll have MIR SSA optimizations running on these converted branches and might sink corresponding computing(those only needed in true/false branch) into their branches, thus reduce instructions executed in runtime.

Among these passes, EarlyIfConvertor is the one that might convert branches into isel. We don't want to see branches converted to isel back and forth, so I disable EarlyIfConvertor on targets not supporting isel in #72211.

nemanjai

I don't remember the details, but I remember that we decided to implement this as a late pass and maximally produce selects when the pass was first implemented. Unfortunately, the information seems not to have been preserved.
Perhaps it was to allow us to add intelligence to the pass about which ISEL instructions to convert to branches and which to leave based on HW performance characteristics. But of course, the CPU's for which ISEL is sometimes very slow are mostly becoming less of a concern - plus we never implemented such heuristics, so perhaps it is time to retire this pass and do what you're suggesting here.

It would be nice if you could highlight some changes in the test cases that are clear wins with this approach (I wasn't able to spot any with a cursory glance at the test case changes).

nemanjai · 2023-11-28T08:29:25Z

llvm/test/CodeGen/PowerPC/save-crbp-ppc32svr4.ll

@@ -13,7 +13,7 @@
 ; CHECK: addic 29, 0, 20
 ; Save CR through R12 using R29 as the stack pointer (aligned base pointer).
 ; CHECK: mfcr 12
-; CHECK: stw 12, -24(29)
+; CHECK: stw 12, -28(29)


Why does the stack offset change?

It uses one more CSR(r28) than the original case.

bzEq · 2023-11-28T10:22:29Z

But of course, the CPU's for which ISEL is sometimes very slow are mostly becoming less of a concern - plus we never implemented such heuristics

SelectOptimizePass, which is target independent, was introduced last year and is able to play this role(decide to use branch or isel), however it's not enabled by default on PowerPC. I recently did some investigation of this pass and have observed improvement in some benchmarks in SPEC2017 when pgo+lto is enabled.

chenzheng1030 · 2024-01-02T03:30:17Z

It would be nice if you could highlight some changes in the test cases that are clear wins with this approach (I wasn't able to spot any with a cursory glance at the test case changes).

I remembered we had a motivated case when -mno-isel was specified. @bzEq could you please post it and add it to this patch to show the benefit?

bzEq · 2024-01-03T03:07:22Z

@bzEq could you please post it and add it to this patch to show the benefit?

Added llvm/test/CodeGen/PowerPC/expand-isel-to-branch.ll.

bzEq · 2024-02-26T01:45:17Z

Ping.

chenzheng1030 · 2024-02-26T05:33:04Z

If I understand correctly, selecting isel/isel8 to branch instructions if hasISEL() == false in the ISEL phase instead of the late expand-isel pass can make the sink related pass sink the true/false value to the related branch.
Like the case llvm/test/CodeGen/PowerPC/expand-isel-to-branch.ll showed:
Before the patch:

;  CHECK-NEXT:    addi 5, 4, 1 // value in true branch, post-dominate the entry block
; CHECK-NEXT:    li 4, 1 // value in the false branch,  post-dominate the entry block
; CHECK-NEXT:    bc 12, 0, L..BB0_1
; CHECK-NEXT:    b L..BB0_2
; CHECK-NEXT:  L..BB0_1: # %bb
; CHECK-NEXT:    addi 4, 5, 0
; CHECK-NEXT:  L..BB0_2: # %bb
; CHECK-NEXT:    stw 4, 0(3)

after the patch:

; CHECK-NEXT:    blt 0, L..BB0_2
; CHECK-NEXT:  # %bb.1: # %bb
; CHECK-NEXT:    li 4, 1   // value in the false branch, don't post-dominate the entry block
; CHECK-NEXT:    b L..BB0_3
; CHECK-NEXT:  L..BB0_2:
; CHECK-NEXT:    addi 4, 4, 1 // value in the true branch, don't post-dominate the entry block
; CHECK-NEXT:  L..BB0_3: # %bb
; CHECK-NEXT:    stw 4, 0(3)

I think this advantage should be applied to float select instructions and vector selection instructions too? The issue is for float select instructions and vector selection instructions, there is no option to disable them. So it is not an issue that needs to be addressed here.

Since now the SELECT are selected in the DAG ISel pass with the guard hasISEL(), is it necessary to keep pass expand-isel which also depends on hasISEL() in the ppc backend? And it would be good to have a perf test.

llvmbot · 2024-02-28T03:55:40Z

@llvm/pr-subscribers-backend-powerpc

Author: Kai Luo (bzEq)

Changes

When expand select_cc in finalize-isel, we should not generate isel for targets not feature it.

Patch is 154.09 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/72845.diff

27 Files Affected:

(modified) llvm/lib/Target/PowerPC/PPCISelLowering.cpp (+51-31)
(modified) llvm/test/CodeGen/PowerPC/2008-10-28-f128-i32.ll (+31-33)
(modified) llvm/test/CodeGen/PowerPC/atomicrmw-uinc-udec-wrap.ll (+122-150)
(modified) llvm/test/CodeGen/PowerPC/ctrloops-pseudo.ll (+32-8)
(modified) llvm/test/CodeGen/PowerPC/expand-isel-to-branch.ll (+6-6)
(modified) llvm/test/CodeGen/PowerPC/fp-strict-fcmp-spe.ll (+146-114)
(modified) llvm/test/CodeGen/PowerPC/fp-to-int-to-fp.ll (+50-53)
(modified) llvm/test/CodeGen/PowerPC/fptoui-be-crash.ll (+38-40)
(modified) llvm/test/CodeGen/PowerPC/funnel-shift-rot.ll (+42-82)
(modified) llvm/test/CodeGen/PowerPC/funnel-shift.ll (+313-292)
(modified) llvm/test/CodeGen/PowerPC/i1-to-double.ll (+26-26)
(modified) llvm/test/CodeGen/PowerPC/ppcf128-constrained-fp-intrinsics.ll (+7-8)
(modified) llvm/test/CodeGen/PowerPC/pr43976.ll (+9-9)
(modified) llvm/test/CodeGen/PowerPC/pr49509.ll (+13-22)
(modified) llvm/test/CodeGen/PowerPC/save-crbp-ppc32svr4.ll (+1-1)
(modified) llvm/test/CodeGen/PowerPC/select-cc-no-isel.ll (+16-4)
(modified) llvm/test/CodeGen/PowerPC/select.ll (+51-48)
(modified) llvm/test/CodeGen/PowerPC/select_const.ll (+96-129)
(modified) llvm/test/CodeGen/PowerPC/smulfixsat.ll (+17-19)
(modified) llvm/test/CodeGen/PowerPC/spe.ll (+83-93)
(modified) llvm/test/CodeGen/PowerPC/srem-seteq-illegal-types.ll (+13-16)
(modified) llvm/test/CodeGen/PowerPC/umulfixsat.ll (+10-10)
(modified) llvm/test/CodeGen/PowerPC/umulo-128-legalisation-lowering.ll (+94-86)
(modified) llvm/test/CodeGen/PowerPC/urem-seteq-illegal-types.ll (+32-42)
(modified) llvm/test/CodeGen/PowerPC/varargs.ll (+18-21)
(modified) llvm/test/CodeGen/PowerPC/wide-scalar-shift-by-byte-multiple-legalization.ll (+29-29)
(modified) llvm/test/CodeGen/PowerPC/wide-scalar-shift-legalization.ll (+29-29)

diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
index 51becf1d5b8584..19cf1a8394b069 100644
--- a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
+++ b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
@@ -12661,6 +12661,44 @@ PPCTargetLowering::emitProbedAlloca(MachineInstr &MI,
   return TailMBB;
 }
 
+static bool IsSelectCC(MachineInstr &MI) {
+  switch (MI.getOpcode()) {
+  case PPC::SELECT_CC_I4:
+  case PPC::SELECT_CC_I8:
+  case PPC::SELECT_CC_F4:
+  case PPC::SELECT_CC_F8:
+  case PPC::SELECT_CC_F16:
+  case PPC::SELECT_CC_VRRC:
+  case PPC::SELECT_CC_VSFRC:
+  case PPC::SELECT_CC_VSSRC:
+  case PPC::SELECT_CC_VSRC:
+  case PPC::SELECT_CC_SPE4:
+  case PPC::SELECT_CC_SPE:
+    return true;
+  default:
+    return false;
+  }
+}
+
+static bool IsSelect(MachineInstr &MI) {
+  switch (MI.getOpcode()) {
+  case PPC::SELECT_I4:
+  case PPC::SELECT_I8:
+  case PPC::SELECT_F4:
+  case PPC::SELECT_F8:
+  case PPC::SELECT_F16:
+  case PPC::SELECT_SPE:
+  case PPC::SELECT_SPE4:
+  case PPC::SELECT_VRRC:
+  case PPC::SELECT_VSFRC:
+  case PPC::SELECT_VSSRC:
+  case PPC::SELECT_VSRC:
+    return true;
+  default:
+    return false;
+  }
+}
+
 MachineBasicBlock *
 PPCTargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
                                                MachineBasicBlock *BB) const {
@@ -12698,9 +12736,10 @@ PPCTargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
   MachineFunction *F = BB->getParent();
   MachineRegisterInfo &MRI = F->getRegInfo();
 
-  if (MI.getOpcode() == PPC::SELECT_CC_I4 ||
-      MI.getOpcode() == PPC::SELECT_CC_I8 || MI.getOpcode() == PPC::SELECT_I4 ||
-      MI.getOpcode() == PPC::SELECT_I8) {
+  if (Subtarget.hasISEL() &&
+      (MI.getOpcode() == PPC::SELECT_CC_I4 ||
+       MI.getOpcode() == PPC::SELECT_CC_I8 ||
+       MI.getOpcode() == PPC::SELECT_I4 || MI.getOpcode() == PPC::SELECT_I8)) {
     SmallVector<MachineOperand, 2> Cond;
     if (MI.getOpcode() == PPC::SELECT_CC_I4 ||
         MI.getOpcode() == PPC::SELECT_CC_I8)
@@ -12712,24 +12751,7 @@ PPCTargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
     DebugLoc dl = MI.getDebugLoc();
     TII->insertSelect(*BB, MI, dl, MI.getOperand(0).getReg(), Cond,
                       MI.getOperand(2).getReg(), MI.getOperand(3).getReg());
-  } else if (MI.getOpcode() == PPC::SELECT_CC_F4 ||
-             MI.getOpcode() == PPC::SELECT_CC_F8 ||
-             MI.getOpcode() == PPC::SELECT_CC_F16 ||
-             MI.getOpcode() == PPC::SELECT_CC_VRRC ||
-             MI.getOpcode() == PPC::SELECT_CC_VSFRC ||
-             MI.getOpcode() == PPC::SELECT_CC_VSSRC ||
-             MI.getOpcode() == PPC::SELECT_CC_VSRC ||
-             MI.getOpcode() == PPC::SELECT_CC_SPE4 ||
-             MI.getOpcode() == PPC::SELECT_CC_SPE ||
-             MI.getOpcode() == PPC::SELECT_F4 ||
-             MI.getOpcode() == PPC::SELECT_F8 ||
-             MI.getOpcode() == PPC::SELECT_F16 ||
-             MI.getOpcode() == PPC::SELECT_SPE ||
-             MI.getOpcode() == PPC::SELECT_SPE4 ||
-             MI.getOpcode() == PPC::SELECT_VRRC ||
-             MI.getOpcode() == PPC::SELECT_VSFRC ||
-             MI.getOpcode() == PPC::SELECT_VSSRC ||
-             MI.getOpcode() == PPC::SELECT_VSRC) {
+  } else if (IsSelectCC(MI) || IsSelect(MI)) {
     // The incoming instruction knows the destination vreg to set, the
     // condition code register to branch on, the true/false values to
     // select between, and a branch opcode to use.
@@ -12738,7 +12760,7 @@ PPCTargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
     //  ...
     //   TrueVal = ...
     //   cmpTY ccX, r1, r2
-    //   bCC copy1MBB
+    //   bCC sinkMBB
     //   fallthrough --> copy0MBB
     MachineBasicBlock *thisMBB = BB;
     MachineBasicBlock *copy0MBB = F->CreateMachineBasicBlock(LLVM_BB);
@@ -12747,6 +12769,12 @@ PPCTargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
     F->insert(It, copy0MBB);
     F->insert(It, sinkMBB);
 
+    // Set the call frame size on entry to the new basic blocks.
+    // See https://reviews.llvm.org/D156113.
+    unsigned CallFrameSize = TII->getCallFrameSizeAt(MI);
+    copy0MBB->setCallFrameSize(CallFrameSize);
+    sinkMBB->setCallFrameSize(CallFrameSize);
+
     // Transfer the remainder of BB and its successor edges to sinkMBB.
     sinkMBB->splice(sinkMBB->begin(), BB,
                     std::next(MachineBasicBlock::iterator(MI)), BB->end());
@@ -12756,15 +12784,7 @@ PPCTargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
     BB->addSuccessor(copy0MBB);
     BB->addSuccessor(sinkMBB);
 
-    if (MI.getOpcode() == PPC::SELECT_I4 || MI.getOpcode() == PPC::SELECT_I8 ||
-        MI.getOpcode() == PPC::SELECT_F4 || MI.getOpcode() == PPC::SELECT_F8 ||
-        MI.getOpcode() == PPC::SELECT_F16 ||
-        MI.getOpcode() == PPC::SELECT_SPE4 ||
-        MI.getOpcode() == PPC::SELECT_SPE ||
-        MI.getOpcode() == PPC::SELECT_VRRC ||
-        MI.getOpcode() == PPC::SELECT_VSFRC ||
-        MI.getOpcode() == PPC::SELECT_VSSRC ||
-        MI.getOpcode() == PPC::SELECT_VSRC) {
+    if (IsSelect(MI)) {
       BuildMI(BB, dl, TII->get(PPC::BC))
           .addReg(MI.getOperand(1).getReg())
           .addMBB(sinkMBB);
diff --git a/llvm/test/CodeGen/PowerPC/2008-10-28-f128-i32.ll b/llvm/test/CodeGen/PowerPC/2008-10-28-f128-i32.ll
index 0405b25e7fb032..e20fc400f80f8f 100644
--- a/llvm/test/CodeGen/PowerPC/2008-10-28-f128-i32.ll
+++ b/llvm/test/CodeGen/PowerPC/2008-10-28-f128-i32.ll
@@ -36,7 +36,7 @@ define i64 @__fixunstfdi(ppc_fp128 %a) nounwind readnone {
 ; CHECK-NEXT:  # %bb.1: # %bb5
 ; CHECK-NEXT:    li 3, 0
 ; CHECK-NEXT:    li 4, 0
-; CHECK-NEXT:    b .LBB0_17
+; CHECK-NEXT:    b .LBB0_19
 ; CHECK-NEXT:  .LBB0_2: # %bb1
 ; CHECK-NEXT:    lfd 0, 400(1)
 ; CHECK-NEXT:    lis 3, 15856
@@ -99,24 +99,22 @@ define i64 @__fixunstfdi(ppc_fp128 %a) nounwind readnone {
 ; CHECK-NEXT:    fadd 1, 28, 29
 ; CHECK-NEXT:    mtfsf 1, 0
 ; CHECK-NEXT:    lfs 0, .LCPI0_1@l(3)
-; CHECK-NEXT:    fctiwz 1, 1
-; CHECK-NEXT:    stfd 1, 152(1)
 ; CHECK-NEXT:    fcmpu 0, 28, 27
-; CHECK-NEXT:    lwz 3, 164(1)
+; CHECK-NEXT:    fctiwz 1, 1
 ; CHECK-NEXT:    fcmpu 1, 29, 0
-; CHECK-NEXT:    lwz 4, 156(1)
 ; CHECK-NEXT:    crandc 20, 6, 0
 ; CHECK-NEXT:    cror 20, 5, 20
-; CHECK-NEXT:    addis 3, 3, -32768
+; CHECK-NEXT:    stfd 1, 152(1)
 ; CHECK-NEXT:    bc 12, 20, .LBB0_4
 ; CHECK-NEXT:  # %bb.3: # %bb1
-; CHECK-NEXT:    ori 30, 4, 0
+; CHECK-NEXT:    lwz 30, 156(1)
 ; CHECK-NEXT:    b .LBB0_5
-; CHECK-NEXT:  .LBB0_4: # %bb1
-; CHECK-NEXT:    addi 30, 3, 0
+; CHECK-NEXT:  .LBB0_4:
+; CHECK-NEXT:    lwz 3, 164(1)
+; CHECK-NEXT:    addis 30, 3, -32768
 ; CHECK-NEXT:  .LBB0_5: # %bb1
-; CHECK-NEXT:    li 4, 0
 ; CHECK-NEXT:    mr 3, 30
+; CHECK-NEXT:    li 4, 0
 ; CHECK-NEXT:    bl __floatditf
 ; CHECK-NEXT:    lis 3, 17392
 ; CHECK-NEXT:    stfd 1, 208(1)
@@ -179,10 +177,10 @@ define i64 @__fixunstfdi(ppc_fp128 %a) nounwind readnone {
 ; CHECK-NEXT:    lwz 3, 168(1)
 ; CHECK-NEXT:    stw 3, 272(1)
 ; CHECK-NEXT:    lfd 31, 272(1)
-; CHECK-NEXT:    bc 12, 20, .LBB0_14
+; CHECK-NEXT:    bc 12, 20, .LBB0_13
 ; CHECK-NEXT:  # %bb.10: # %bb1
 ; CHECK-NEXT:    cror 20, 1, 3
-; CHECK-NEXT:    bc 12, 20, .LBB0_14
+; CHECK-NEXT:    bc 12, 20, .LBB0_13
 ; CHECK-NEXT:  # %bb.11: # %bb2
 ; CHECK-NEXT:    fneg 29, 31
 ; CHECK-NEXT:    stfd 29, 48(1)
@@ -223,24 +221,17 @@ define i64 @__fixunstfdi(ppc_fp128 %a) nounwind readnone {
 ; CHECK-NEXT:    fadd 1, 28, 29
 ; CHECK-NEXT:    mtfsf 1, 0
 ; CHECK-NEXT:    lfs 0, .LCPI0_3@l(3)
-; CHECK-NEXT:    fctiwz 1, 1
-; CHECK-NEXT:    stfd 1, 24(1)
 ; CHECK-NEXT:    fcmpu 0, 30, 2
-; CHECK-NEXT:    lwz 3, 36(1)
+; CHECK-NEXT:    fctiwz 1, 1
 ; CHECK-NEXT:    fcmpu 1, 31, 0
-; CHECK-NEXT:    lwz 4, 28(1)
 ; CHECK-NEXT:    crandc 20, 6, 1
 ; CHECK-NEXT:    cror 20, 4, 20
-; CHECK-NEXT:    addis 3, 3, -32768
-; CHECK-NEXT:    bc 12, 20, .LBB0_13
+; CHECK-NEXT:    stfd 1, 24(1)
+; CHECK-NEXT:    bc 12, 20, .LBB0_17
 ; CHECK-NEXT:  # %bb.12: # %bb2
-; CHECK-NEXT:    ori 3, 4, 0
-; CHECK-NEXT:    b .LBB0_13
-; CHECK-NEXT:  .LBB0_13: # %bb2
-; CHECK-NEXT:    subfic 4, 3, 0
-; CHECK-NEXT:    subfe 3, 29, 30
-; CHECK-NEXT:    b .LBB0_17
-; CHECK-NEXT:  .LBB0_14: # %bb3
+; CHECK-NEXT:    lwz 3, 28(1)
+; CHECK-NEXT:    b .LBB0_18
+; CHECK-NEXT:  .LBB0_13: # %bb3
 ; CHECK-NEXT:    stfd 31, 112(1)
 ; CHECK-NEXT:    li 3, 0
 ; CHECK-NEXT:    stw 3, 148(1)
@@ -278,22 +269,29 @@ define i64 @__fixunstfdi(ppc_fp128 %a) nounwind readnone {
 ; CHECK-NEXT:    fadd 2, 30, 31
 ; CHECK-NEXT:    mtfsf 1, 0
 ; CHECK-NEXT:    lfs 0, .LCPI0_1@l(3)
-; CHECK-NEXT:    fctiwz 2, 2
-; CHECK-NEXT:    stfd 2, 88(1)
 ; CHECK-NEXT:    fcmpu 0, 30, 1
-; CHECK-NEXT:    lwz 3, 100(1)
+; CHECK-NEXT:    fctiwz 1, 2
 ; CHECK-NEXT:    fcmpu 1, 31, 0
-; CHECK-NEXT:    lwz 4, 92(1)
 ; CHECK-NEXT:    crandc 20, 6, 0
 ; CHECK-NEXT:    cror 20, 5, 20
-; CHECK-NEXT:    addis 3, 3, -32768
+; CHECK-NEXT:    stfd 1, 88(1)
 ; CHECK-NEXT:    bc 12, 20, .LBB0_15
+; CHECK-NEXT:  # %bb.14: # %bb3
+; CHECK-NEXT:    lwz 4, 92(1)
 ; CHECK-NEXT:    b .LBB0_16
-; CHECK-NEXT:  .LBB0_15: # %bb3
-; CHECK-NEXT:    addi 4, 3, 0
+; CHECK-NEXT:  .LBB0_15:
+; CHECK-NEXT:    lwz 3, 100(1)
+; CHECK-NEXT:    addis 4, 3, -32768
 ; CHECK-NEXT:  .LBB0_16: # %bb3
 ; CHECK-NEXT:    mr 3, 30
-; CHECK-NEXT:  .LBB0_17: # %bb5
+; CHECK-NEXT:    b .LBB0_19
+; CHECK-NEXT:  .LBB0_17:
+; CHECK-NEXT:    lwz 3, 36(1)
+; CHECK-NEXT:    addis 3, 3, -32768
+; CHECK-NEXT:  .LBB0_18: # %bb2
+; CHECK-NEXT:    subfic 4, 3, 0
+; CHECK-NEXT:    subfe 3, 29, 30
+; CHECK-NEXT:  .LBB0_19: # %bb3
 ; CHECK-NEXT:    lfd 31, 456(1) # 8-byte Folded Reload
 ; CHECK-NEXT:    lfd 30, 448(1) # 8-byte Folded Reload
 ; CHECK-NEXT:    lfd 29, 440(1) # 8-byte Folded Reload
diff --git a/llvm/test/CodeGen/PowerPC/atomicrmw-uinc-udec-wrap.ll b/llvm/test/CodeGen/PowerPC/atomicrmw-uinc-udec-wrap.ll
index adbb956ba32ad9..505ac8639595fd 100644
--- a/llvm/test/CodeGen/PowerPC/atomicrmw-uinc-udec-wrap.ll
+++ b/llvm/test/CodeGen/PowerPC/atomicrmw-uinc-udec-wrap.ll
@@ -6,55 +6,50 @@ define i8 @atomicrmw_uinc_wrap_i8(ptr %ptr, i8 %val) {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    sync
 ; CHECK-NEXT:    mr 5, 3
-; CHECK-NEXT:    rlwinm 7, 5, 3, 27, 28
+; CHECK-NEXT:    rlwinm 6, 5, 3, 27, 28
 ; CHECK-NEXT:    lbz 3, 0(3)
-; CHECK-NEXT:    xori 7, 7, 24
-; CHECK-NEXT:    li 8, 255
-; CHECK-NEXT:    li 6, 0
+; CHECK-NEXT:    xori 6, 6, 24
+; CHECK-NEXT:    li 7, 255
 ; CHECK-NEXT:    clrlwi 4, 4, 24
 ; CHECK-NEXT:    rldicr 5, 5, 0, 61
-; CHECK-NEXT:    slw 8, 8, 7
+; CHECK-NEXT:    slw 7, 7, 6
 ; CHECK-NEXT:    b .LBB0_2
 ; CHECK-NEXT:  .LBB0_1: # %atomicrmw.start
 ; CHECK-NEXT:    #
-; CHECK-NEXT:    srw 3, 11, 7
-; CHECK-NEXT:    cmplw 3, 9
-; CHECK-NEXT:    beq 0, .LBB0_8
+; CHECK-NEXT:    srw 3, 10, 6
+; CHECK-NEXT:    cmplw 3, 8
+; CHECK-NEXT:    beq 0, .LBB0_7
 ; CHECK-NEXT:  .LBB0_2: # %atomicrmw.start
 ; CHECK-NEXT:    # =>This Loop Header: Depth=1
-; CHECK-NEXT:    # Child Loop BB0_6 Depth 2
-; CHECK-NEXT:    clrlwi 9, 3, 24
-; CHECK-NEXT:    addi 10, 3, 1
-; CHECK-NEXT:    cmplw 9, 4
-; CHECK-NEXT:    bc 12, 0, .LBB0_4
+; CHECK-NEXT:    # Child Loop BB0_5 Depth 2
+; CHECK-NEXT:    clrlwi 8, 3, 24
+; CHECK-NEXT:    cmplw 8, 4
+; CHECK-NEXT:    li 9, 0
+; CHECK-NEXT:    bge 0, .LBB0_4
 ; CHECK-NEXT:  # %bb.3: # %atomicrmw.start
 ; CHECK-NEXT:    #
-; CHECK-NEXT:    ori 3, 6, 0
-; CHECK-NEXT:    b .LBB0_5
+; CHECK-NEXT:    addi 9, 3, 1
 ; CHECK-NEXT:  .LBB0_4: # %atomicrmw.start
 ; CHECK-NEXT:    #
-; CHECK-NEXT:    addi 3, 10, 0
+; CHECK-NEXT:    slw 3, 9, 6
+; CHECK-NEXT:    slw 9, 8, 6
+; CHECK-NEXT:    and 3, 3, 7
+; CHECK-NEXT:    and 9, 9, 7
 ; CHECK-NEXT:  .LBB0_5: # %atomicrmw.start
-; CHECK-NEXT:    #
-; CHECK-NEXT:    slw 11, 9, 7
-; CHECK-NEXT:    slw 3, 3, 7
-; CHECK-NEXT:    and 3, 3, 8
-; CHECK-NEXT:    and 10, 11, 8
-; CHECK-NEXT:  .LBB0_6: # %atomicrmw.start
 ; CHECK-NEXT:    # Parent Loop BB0_2 Depth=1
 ; CHECK-NEXT:    # => This Inner Loop Header: Depth=2
-; CHECK-NEXT:    lwarx 12, 0, 5
-; CHECK-NEXT:    and 11, 12, 8
-; CHECK-NEXT:    cmpw 11, 10
+; CHECK-NEXT:    lwarx 11, 0, 5
+; CHECK-NEXT:    and 10, 11, 7
+; CHECK-NEXT:    cmpw 10, 9
 ; CHECK-NEXT:    bne 0, .LBB0_1
-; CHECK-NEXT:  # %bb.7: # %atomicrmw.start
+; CHECK-NEXT:  # %bb.6: # %atomicrmw.start
 ; CHECK-NEXT:    #
-; CHECK-NEXT:    andc 12, 12, 8
-; CHECK-NEXT:    or 12, 12, 3
-; CHECK-NEXT:    stwcx. 12, 0, 5
-; CHECK-NEXT:    bne 0, .LBB0_6
+; CHECK-NEXT:    andc 11, 11, 7
+; CHECK-NEXT:    or 11, 11, 3
+; CHECK-NEXT:    stwcx. 11, 0, 5
+; CHECK-NEXT:    bne 0, .LBB0_5
 ; CHECK-NEXT:    b .LBB0_1
-; CHECK-NEXT:  .LBB0_8: # %atomicrmw.end
+; CHECK-NEXT:  .LBB0_7: # %atomicrmw.end
 ; CHECK-NEXT:    lwsync
 ; CHECK-NEXT:    blr
   %result = atomicrmw uinc_wrap ptr %ptr, i8 %val seq_cst
@@ -66,55 +61,51 @@ define i16 @atomicrmw_uinc_wrap_i16(ptr %ptr, i16 %val) {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    sync
 ; CHECK-NEXT:    mr 5, 3
-; CHECK-NEXT:    li 6, 0
+; CHECK-NEXT:    li 7, 0
 ; CHECK-NEXT:    lhz 3, 0(3)
-; CHECK-NEXT:    rlwinm 7, 5, 3, 27, 27
-; CHECK-NEXT:    xori 7, 7, 16
-; CHECK-NEXT:    ori 8, 6, 65535
+; CHECK-NEXT:    rlwinm 6, 5, 3, 27, 27
+; CHECK-NEXT:    xori 6, 6, 16
+; CHECK-NEXT:    ori 7, 7, 65535
 ; CHECK-NEXT:    clrlwi 4, 4, 16
 ; CHECK-NEXT:    rldicr 5, 5, 0, 61
-; CHECK-NEXT:    slw 8, 8, 7
+; CHECK-NEXT:    slw 7, 7, 6
 ; CHECK-NEXT:    b .LBB1_2
 ; CHECK-NEXT:  .LBB1_1: # %atomicrmw.start
 ; CHECK-NEXT:    #
-; CHECK-NEXT:    srw 3, 11, 7
-; CHECK-NEXT:    cmplw 3, 9
-; CHECK-NEXT:    beq 0, .LBB1_8
+; CHECK-NEXT:    srw 3, 10, 6
+; CHECK-NEXT:    cmplw 3, 8
+; CHECK-NEXT:    beq 0, .LBB1_7
 ; CHECK-NEXT:  .LBB1_2: # %atomicrmw.start
 ; CHECK-NEXT:    # =>This Loop Header: Depth=1
-; CHECK-NEXT:    # Child Loop BB1_6 Depth 2
-; CHECK-NEXT:    clrlwi 9, 3, 16
-; CHECK-NEXT:    addi 10, 3, 1
-; CHECK-NEXT:    cmplw 9, 4
-; CHECK-NEXT:    bc 12, 0, .LBB1_4
+; CHECK-NEXT:    # Child Loop BB1_5 Depth 2
+; CHECK-NEXT:    clrlwi 8, 3, 16
+; CHECK-NEXT:    cmplw 8, 4
+; CHECK-NEXT:    li 9, 0
+; CHECK-NEXT:    bge 0, .LBB1_4
 ; CHECK-NEXT:  # %bb.3: # %atomicrmw.start
 ; CHECK-NEXT:    #
-; CHECK-NEXT:    ori 3, 6, 0
-; CHECK-NEXT:    b .LBB1_5
+; CHECK-NEXT:    addi 9, 3, 1
 ; CHECK-NEXT:  .LBB1_4: # %atomicrmw.start
 ; CHECK-NEXT:    #
-; CHECK-NEXT:    addi 3, 10, 0
+; CHECK-NEXT:    slw 3, 9, 6
+; CHECK-NEXT:    slw 9, 8, 6
+; CHECK-NEXT:    and 3, 3, 7
+; CHECK-NEXT:    and 9, 9, 7
 ; CHECK-NEXT:  .LBB1_5: # %atomicrmw.start
-; CHECK-NEXT:    #
-; CHECK-NEXT:    slw 11, 9, 7
-; CHECK-NEXT:    slw 3, 3, 7
-; CHECK-NEXT:    and 3, 3, 8
-; CHECK-NEXT:    and 10, 11, 8
-; CHECK-NEXT:  .LBB1_6: # %atomicrmw.start
 ; CHECK-NEXT:    # Parent Loop BB1_2 Depth=1
 ; CHECK-NEXT:    # => This Inner Loop Header: Depth=2
-; CHECK-NEXT:    lwarx 12, 0, 5
-; CHECK-NEXT:    and 11, 12, 8
-; CHECK-NEXT:    cmpw 11, 10
+; CHECK-NEXT:    lwarx 11, 0, 5
+; CHECK-NEXT:    and 10, 11, 7
+; CHECK-NEXT:    cmpw 10, 9
 ; CHECK-NEXT:    bne 0, .LBB1_1
-; CHECK-NEXT:  # %bb.7: # %atomicrmw.start
+; CHECK-NEXT:  # %bb.6: # %atomicrmw.start
 ; CHECK-NEXT:    #
-; CHECK-NEXT:    andc 12, 12, 8
-; CHECK-NEXT:    or 12, 12, 3
-; CHECK-NEXT:    stwcx. 12, 0, 5
-; CHECK-NEXT:    bne 0, .LBB1_6
+; CHECK-NEXT:    andc 11, 11, 7
+; CHECK-NEXT:    or 11, 11, 3
+; CHECK-NEXT:    stwcx. 11, 0, 5
+; CHECK-NEXT:    bne 0, .LBB1_5
 ; CHECK-NEXT:    b .LBB1_1
-; CHECK-NEXT:  .LBB1_8: # %atomicrmw.end
+; CHECK-NEXT:  .LBB1_7: # %atomicrmw.end
 ; CHECK-NEXT:    lwsync
 ; CHECK-NEXT:    blr
   %result = atomicrmw uinc_wrap ptr %ptr, i16 %val seq_cst
@@ -125,39 +116,34 @@ define i32 @atomicrmw_uinc_wrap_i32(ptr %ptr, i32 %val) {
 ; CHECK-LABEL: atomicrmw_uinc_wrap_i32:
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    sync
-; CHECK-NEXT:    li 6, 0
-; CHECK-NEXT:    lwz 5, 0(3)
+; CHECK-NEXT:    lwz 6, 0(3)
 ; CHECK-NEXT:    b .LBB2_2
 ; CHECK-NEXT:  .LBB2_1: # %atomicrmw.start
 ; CHECK-NEXT:    #
-; CHECK-NEXT:    cmplw 5, 7
-; CHECK-NEXT:    beq 0, .LBB2_7
+; CHECK-NEXT:    cmplw 5, 6
+; CHECK-NEXT:    mr 6, 5
+; CHECK-NEXT:    beq 0, .LBB2_6
 ; CHECK-NEXT:  .LBB2_2: # %atomicrmw.start
 ; CHECK-NEXT:    # =>This Loop Header: Depth=1
-; CHECK-NEXT:    # Child Loop BB2_5 Depth 2
-; CHECK-NEXT:    mr 7, 5
-; CHECK-NEXT:    addi 5, 5, 1
-; CHECK-NEXT:    cmplw 7, 4
-; CHECK-NEXT:    bc 12, 0, .LBB2_4
+; CHECK-NEXT:    # Child Loop BB2_4 Depth 2
+; CHECK-NEXT:    cmplw 6, 4
+; CHECK-NEXT:    li 7, 0
+; CHECK-NEXT:    bge 0, .LBB2_4
 ; CHECK-NEXT:  # %bb.3: # %atomicrmw.start
 ; CHECK-NEXT:    #
-; CHECK-NEXT:    ori 8, 6, 0
-; CHECK-NEXT:    b .LBB2_5
+; CHECK-NEXT:    addi 7, 6, 1
 ; CHECK-NEXT:  .LBB2_4: # %atomicrmw.start
-; CHECK-NEXT:    #
-; CHECK-NEXT:    addi 8, 5, 0
-; CHECK-NEXT:  .LBB2_5: # %atomicrmw.start
 ; CHECK-NEXT:    # Parent Loop BB2_2 Depth=1
 ; CHECK-NEXT:    # => This Inner Loop Header: Depth=2
 ; CHECK-NEXT:    lwarx 5, 0, 3
-; CHECK-NEXT:    cmpw 5, 7
+; CHECK-NEXT:    cmpw 5, 6
 ; CHECK-NEXT:    bne 0, .LBB2_1
-; CHECK-NEXT:  # %bb.6: # %atomicrmw.start
+; CHECK-NEXT:  # %bb.5: # %atomicrmw.start
 ; CHECK-NEXT:    #
-; CHECK-NEXT:    stwcx. 8, 0, 3
-; CHECK-NEXT:    bne 0, .LBB2_5
+; CHECK-NEXT:    stwcx. 7, 0, 3
+; CHECK-NEXT:    bne 0, .LBB2_4
 ; CHECK-NEXT:    b .LBB2_1
-; CHECK-NEXT:  .LBB2_7: # %atomicrmw.end
+; CHECK-NEXT:  .LBB2_6: # %atomicrmw.end
 ; CHECK-NEXT:    mr 3, 5
 ; CHECK-NEXT:    lwsync
 ; CHECK-NEXT:    blr
@@ -169,39 +155,34 @@ define i64 @atomicrmw_uinc_wrap_i64(ptr %ptr, i64 %val) {
 ; CHECK-LABEL: atomicrmw_uinc_wrap_i64:
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    sync
-; CHECK-NEXT:    ld 5, 0(3)
-; CHECK-NEXT:    li 6, 0
+; CHECK-NEXT:    ld 6, 0(3)
 ; CHECK-NEXT:    b .LBB3_2
 ; CHECK-NEXT:  .LBB3_1: # %atomicrmw.start
 ; CHECK-NEXT:    #
-; CHECK-NEXT:    cmpld 5, 7
-; CHECK-NEXT:    beq 0, .LBB3_7
+; CHECK-NEXT:    cmpld 5, 6
+; CHECK-NEXT:    mr 6, 5
+; CHECK-NEXT:    beq 0, .LBB3_6
 ; CHECK-NEXT:  .LBB3_2: # %atomicrmw.start
 ; CHECK-NEXT:    # =>This Loop Header: Depth=1
-; CHECK-NEXT:    # Child Loop BB3_5 Depth 2
-; CHECK-NEXT:    mr 7, 5
-; CHECK-NEXT:    addi 5, 5, 1
-; CHECK-NEXT:    cmpld 7, 4
-; CHECK-NEXT:    bc 12, 0, .LBB3_4
+; CHECK-NEXT:    # Child Loop BB3_4 Depth 2
+; CHECK-NEXT:    cmpld 6, 4
+; CHECK-NEXT:    li 7, 0
+; CHECK-NEXT:    bge 0, .LBB3_4
 ; CHECK-NEXT:  # %bb.3: # %atomicrmw.start
 ; CHECK-NEXT:    #
-; CHECK-NEXT:    ori 8, 6, 0
-; CHECK-NEXT:    b .LBB3_5
+; CHECK-NEXT:    addi 7, 6, 1
 ; CHECK-NEXT:  .LBB3_4: # %atomicrmw.start
-; CHECK-NEXT:    #
-; CHECK-NEXT:    addi 8, 5, 0
-; CHECK-NEXT:  .LBB3_5: # %atomicrmw.start
 ; CHECK-NEXT:    # Parent Loop BB3_2 Depth=1
 ; CHECK-NEXT:    # => This Inner Loop Header: Depth=2
 ; CHECK-NEXT:    ldarx 5, 0, 3
-; CHECK-NEXT:    cmpd 5, 7
+; CHECK-NEXT:    cmpd 5, 6
 ; CHECK-NEXT:    bne 0, .LBB3_1
-; CHECK-NEXT:  # %bb.6: # %atomicrmw.start
+; CHECK-NEXT:  # %bb.5: # %atomicrmw.start
 ; CHECK-NEXT:    #
-; CHECK-NEXT:    stdcx. 8, 0, 3
-; CHECK-NEXT:    bne 0, .LBB3_5
+; CHECK-NEXT:    stdcx. 7, 0, 3
+; CHECK-NEXT:    bne 0, .LBB3_4
 ; CHECK-NEXT:    b .LBB3_1
-; CHECK-NEXT:  .LBB3_7: # %atomicrmw.end
+; CHECK-NEXT:  .LBB3_6: # %atomicrmw.end
 ; CHECK-NEXT:    mr 3, 5
 ; CHECK-NEXT:    lwsync
 ; CHECK-NEXT:    blr
@@ -226,43 +207,39 @@ define i8 @atomicrmw_udec_wrap_i8(ptr %ptr, i8 %val) {
 ; CHECK-NEXT:    #
 ; CHECK-NEXT:    srw 3, 11, 7
 ; CHECK-NEXT:    cmplw 3, 9
-; CHECK-NEXT:    beq 0, .LBB4_8
+; CHECK-NEXT:    beq 0, .LBB4_7
 ; CHECK-NEXT:  .LBB4_2: # %atomicrmw.start
 ; CHECK-NEXT:    # =>This Loop Header: Depth=1
-; CHECK-NEXT:    # Child Loop BB4_6 Depth 2
+; CHECK-NEXT:    # Child Loop BB4_5 Depth 2
 ; CHECK-NEXT:    andi. 9, 3, 255
 ; CHECK-NEXT:    cmplw 1, 9, 6
-; CHECK-NEXT:    addi 10, 3, -1
 ; CHECK-NEXT:    cror 20, 2, 5
+; CHECK-NEXT:    mr 10, 4
 ; CHECK-NEXT:    bc 12, 20, .LBB4_4
 ; CHECK-NEXT:  # %bb.3: # %atomicrmw.start
 ; CHECK-NEXT:    #
-; CHECK-NEXT:    ori 3, 10, 0
-; CHECK-NEXT:    b .LBB4_5
+; CHECK-NEXT:    addi 10, 3, -1
 ; CHECK-NEXT:  .LBB4_4: # %atomicrmw.start
 ; CHECK-NEXT:    #
-; CHECK-NEXT:    addi 3, 4, 0
-; CHECK-NEXT:  .LBB4_5: # %atomicrmw.start
-; CHECK-NEXT:    #
-; CHECK-NEXT:    slw 11, 9, 7
-; CHECK-NEXT:    slw 3, 3, 7
+; CHECK-NEXT:    slw 3, 10, 7
+; CHECK-NEXT:    slw 10, 9, 7
 ; CHECK-NEXT:    and 3, 3, 8
-; CHECK-NEXT:    and 10, 11, 8
-; CHECK-NEXT:  .LBB4_6: # %atomicrmw.start
+; CHECK-NEXT:...
[truncated]

bzEq · 2024-02-29T02:17:35Z

is it necessary to keep pass expand-isel which also depends on hasISEL() in the ppc backend? And it would be good to have a perf test.

I don't see regression in SPEC2017 when -mno-isel and -bplugin_opt:-mattr=-isel are specified with PPCExpandISEL removed from the backend pipeline.
I'll investigate more with PPCExpandISEL removed and will post another patch to disable it if no further issue is found.

chenzheng1030

I don't see regression in SPEC2017 when -mno-isel and -bplugin_opt:-mattr=-isel are specified with PPCExpandISEL removed from the backend pipeline.
I'll investigate more with PPCExpandISEL removed and will post another patch to disable it if no further issue is found.

LGTM. Thanks. And the plan to remove PPCExpandISEL pass also looks good to me.

bzEq requested review from chenzheng1030, ecnelises, nemanjai, amy-kwan and stefanp-ibm November 20, 2023 09:54

ecnelises reviewed Nov 21, 2023

View reviewed changes

nemanjai reviewed Nov 28, 2023

View reviewed changes

bzEq added a commit that referenced this pull request Jan 3, 2024

[PowerPC] Precommit test for #72845. NFC.

8ae73fe

bzEq force-pushed the branch-no-isel branch from f60f5cc to 2529f4d Compare January 3, 2024 03:06

bzEq requested review from nemanjai and ecnelises January 3, 2024 03:06

bzEq added 5 commits February 28, 2024 03:24

Update tests

dd1af34

Address comments

2c8d806

Update test

0ae2104

Make verifier happy

637789f

Format

8ef0feb

bzEq force-pushed the branch-no-isel branch from 2529f4d to 3b615fd Compare February 28, 2024 03:55

llvmbot added the backend:PowerPC label Feb 28, 2024

Fix text

3b615fd

chenzheng1030 approved these changes Feb 29, 2024

View reviewed changes

bzEq merged commit d1924f0 into llvm:main Mar 1, 2024
4 of 5 checks passed

bzEq deleted the branch-no-isel branch March 26, 2024 13:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[PowerPC] Do not generate `isel` instruction if target doesn't have this instruction #72845

[PowerPC] Do not generate `isel` instruction if target doesn't have this instruction #72845

bzEq commented Nov 20, 2023

ecnelises Nov 21, 2023

bzEq Nov 22, 2023 •

edited

nemanjai commented Nov 26, 2023

bzEq commented Nov 27, 2023 •

edited

nemanjai left a comment

nemanjai Nov 28, 2023

bzEq Jan 3, 2024

bzEq commented Nov 28, 2023

chenzheng1030 commented Jan 2, 2024

bzEq commented Jan 3, 2024

bzEq commented Feb 26, 2024

chenzheng1030 commented Feb 26, 2024

llvmbot commented Feb 28, 2024

bzEq commented Feb 29, 2024

chenzheng1030 left a comment

[PowerPC] Do not generate isel instruction if target doesn't have this instruction #72845

[PowerPC] Do not generate isel instruction if target doesn't have this instruction #72845

Conversation

bzEq commented Nov 20, 2023

ecnelises Nov 21, 2023

Choose a reason for hiding this comment

bzEq Nov 22, 2023 • edited

Choose a reason for hiding this comment

nemanjai commented Nov 26, 2023

bzEq commented Nov 27, 2023 • edited

nemanjai left a comment

Choose a reason for hiding this comment

nemanjai Nov 28, 2023

Choose a reason for hiding this comment

bzEq Jan 3, 2024

Choose a reason for hiding this comment

bzEq commented Nov 28, 2023

chenzheng1030 commented Jan 2, 2024

bzEq commented Jan 3, 2024

bzEq commented Feb 26, 2024

chenzheng1030 commented Feb 26, 2024

llvmbot commented Feb 28, 2024

bzEq commented Feb 29, 2024

chenzheng1030 left a comment

Choose a reason for hiding this comment

[PowerPC] Do not generate `isel` instruction if target doesn't have this instruction #72845

[PowerPC] Do not generate `isel` instruction if target doesn't have this instruction #72845

bzEq Nov 22, 2023 •

edited

bzEq commented Nov 27, 2023 •

edited