
[RISCV] Add sink-and-fold support for RISC-V. #67602

Merged
1 commit merged into llvm:main on Oct 7, 2023

Conversation

topperc
Collaborator

@topperc topperc commented Sep 27, 2023

This uses the recently introduced sink-and-fold support in MachineSink. https://reviews.llvm.org/D152828

This enables folding ADDI into load/store addresses.

Enabling by default will be a separate PR.
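
For illustration, here is a made-up example (not taken from the patch's tests; register names and the offset are arbitrary) of the rewrite this enables, where an ADDI feeding a load's address is removed and its immediate is folded into the load's displacement:

    # before: the address is materialized by a separate ADDI
    addi a1, a0, 16
    lw   a2, 0(a1)

    # after sink-and-fold, assuming the combined offset still fits
    # in a signed 12-bit immediate:
    lw   a2, 16(a0)

The same applies to stores and to the FP load/store opcodes handled in the patch.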

@topperc topperc changed the title from "[RISCV] Enable sink-and-fold for RISC-V." to "[RISCV][WIP] Enable sink-and-fold for RISC-V." on Sep 27, 2023
@llvmbot
Collaborator

llvmbot commented Sep 27, 2023

@llvm/pr-subscribers-backend-risc-v

Changes

This uses the recently introduced sink-and-fold support in MachineSink. https://reviews.llvm.org/D152828

This enables folding ADDI into load/store addresses.

I quickly threw this together after seeing the AArch64 patches. This could be an alternative to #67024

We'd probably want to stage this as a commit to add the code and a commit to enable it. I've done no testing other than lit so far.


Patch is 56.81 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/67602.diff

8 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.cpp (+71)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.h (+7)
  • (modified) llvm/lib/Target/RISCV/RISCVRVVInitUndef.cpp (+1-1)
  • (modified) llvm/lib/Target/RISCV/RISCVTargetMachine.cpp (+8-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll (+166-245)
  • (modified) llvm/test/CodeGen/RISCV/split-offsets.ll (+10-12)
  • (modified) llvm/test/CodeGen/RISCV/srem-vector-lkk.ll (+50-50)
  • (modified) llvm/test/CodeGen/RISCV/urem-vector-lkk.ll (+50-50)
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index 6ee5e2d4c584049..7b9f8b08002c077 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -1907,6 +1907,77 @@ bool RISCVInstrInfo::verifyInstruction(const MachineInstr &MI,
   return true;
 }
 
+bool RISCVInstrInfo::canFoldIntoAddrMode(const MachineInstr &MemI,
+                                         Register Reg,
+                                         const MachineInstr &AddrI,
+                                         ExtAddrMode &AM) const {
+  switch (MemI.getOpcode()) {
+  default:
+    return false;
+  case RISCV::LB:
+  case RISCV::LBU:
+  case RISCV::LH:
+  case RISCV::LHU:
+  case RISCV::LW:
+  case RISCV::LWU:
+  case RISCV::LD:
+  case RISCV::FLH:
+  case RISCV::FLW:
+  case RISCV::FLD:
+  case RISCV::SB:
+  case RISCV::SH:
+  case RISCV::SW:
+  case RISCV::SD:
+  case RISCV::FSH:
+  case RISCV::FSW:
+  case RISCV::FSD:
+    break;
+  }
+
+  // Check the fold operand is not the loaded/stored value.
+  const MachineOperand &BaseRegOp = MemI.getOperand(0);
+  if (BaseRegOp.isReg() && BaseRegOp.getReg() == Reg)
+    return false;
+
+  if (AddrI.getOpcode() != RISCV::ADDI)
+    return false;
+
+  int64_t OldOffset = MemI.getOperand(2).getImm();
+  int64_t Disp = AddrI.getOperand(2).getImm();
+  int64_t NewOffset = OldOffset + Disp;
+  if (!STI.is64Bit())
+    NewOffset = SignExtend64<32>(NewOffset);
+
+  if (!isInt<12>(NewOffset))
+    return false;
+
+  AM.BaseReg = AddrI.getOperand(1).getReg();
+  AM.ScaledReg = 0;
+  AM.Scale = 0;
+  AM.Displacement = NewOffset;
+  AM.Form = ExtAddrMode::Formula::Basic;
+  return true;
+}
+
+MachineInstr *RISCVInstrInfo::emitLdStWithAddr(MachineInstr &MemI,
+                                               const ExtAddrMode &AM) const {
+
+  const DebugLoc &DL = MemI.getDebugLoc();
+  MachineBasicBlock &MBB = *MemI.getParent();
+
+  assert(AM.ScaledReg == 0 && AM.Scale == 0 &&
+         "Addressing mode not supported for folding");
+
+  auto B = BuildMI(MBB, MemI, DL, get(MemI.getOpcode()))
+                 .addReg(MemI.getOperand(0).getReg(),
+                         MemI.mayLoad() ? RegState::Define : 0)
+                 .addReg(AM.BaseReg)
+                 .addImm(AM.Displacement)
+                 .setMemRefs(MemI.memoperands())
+                 .setMIFlags(MemI.getFlags());
+  return B.getInstr();
+}
+
 // Return true if get the base operand, byte offset of an instruction and the
 // memory width. Width is the size of memory that is being loaded/stored.
 bool RISCVInstrInfo::getMemOperandWithOffsetWidth(
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.h b/llvm/lib/Target/RISCV/RISCVInstrInfo.h
index 99c907a98121ae3..2f8ea749b96e3d1 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.h
@@ -137,6 +137,13 @@ class RISCVInstrInfo : public RISCVGenInstrInfo {
   bool verifyInstruction(const MachineInstr &MI,
                          StringRef &ErrInfo) const override;
 
+  bool canFoldIntoAddrMode(const MachineInstr &MemI, Register Reg,
+                           const MachineInstr &AddrI,
+                           ExtAddrMode &AM) const override;
+
+  MachineInstr *emitLdStWithAddr(MachineInstr &MemI,
+                                 const ExtAddrMode &AM) const override;
+
   bool getMemOperandWithOffsetWidth(const MachineInstr &LdSt,
                                     const MachineOperand *&BaseOp,
                                     int64_t &Offset, unsigned &Width,
diff --git a/llvm/lib/Target/RISCV/RISCVRVVInitUndef.cpp b/llvm/lib/Target/RISCV/RISCVRVVInitUndef.cpp
index 7c6a89b6036fa3c..9d7660ba9a4b103 100644
--- a/llvm/lib/Target/RISCV/RISCVRVVInitUndef.cpp
+++ b/llvm/lib/Target/RISCV/RISCVRVVInitUndef.cpp
@@ -275,7 +275,7 @@ bool RISCVInitUndef::processBasicBlock(MachineFunction &MF,
       Changed |= handleSubReg(MF, MI, DLD);
     if (MI.isImplicitDef()) {
       auto DstReg = MI.getOperand(0).getReg();
-      if (isVectorRegClass(DstReg))
+      if (DstReg.isVirtual() && isVectorRegClass(DstReg))
         Changed |= handleImplicitDef(MBB, I);
     }
   }
diff --git a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
index 69a0569fccc4eca..cbf9d7f3d4872a7 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
@@ -78,6 +78,11 @@ static cl::opt<bool> EnableRISCVDeadRegisterElimination(
              " them with stores to x0"),
     cl::init(true));
 
+static cl::opt<bool>
+    EnableSinkFold("riscv-enable-sink-fold",
+                   cl::desc("Enable sinking and folding of instruction copies"),
+                   cl::init(true), cl::Hidden);
+
 extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeRISCVTarget() {
   RegisterTargetMachine<RISCVTargetMachine> X(getTheRISCV32Target());
   RegisterTargetMachine<RISCVTargetMachine> Y(getTheRISCV64Target());
@@ -242,7 +247,9 @@ namespace {
 class RISCVPassConfig : public TargetPassConfig {
 public:
   RISCVPassConfig(RISCVTargetMachine &TM, PassManagerBase &PM)
-      : TargetPassConfig(TM, PM) {}
+      : TargetPassConfig(TM, PM) {
+    setEnableSinkAndFold(EnableSinkFold);
+  }
 
   RISCVTargetMachine &getRISCVTargetMachine() const {
     return getTM<RISCVTargetMachine>();
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll
index 480e5c2f8f2b8b6..e3b06af20e5ee14 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll
@@ -12972,38 +12972,39 @@ define <4 x i32> @mgather_narrow_edge_case(ptr %base) {
 ; RV64ZVE32F-NEXT:    vmset.m v8
 ; RV64ZVE32F-NEXT:    vmv.x.s a1, v8
 ; RV64ZVE32F-NEXT:    # implicit-def: $v8
-; RV64ZVE32F-NEXT:    bnez zero, .LBB106_2
-; RV64ZVE32F-NEXT:  # %bb.1: # %cond.load
-; RV64ZVE32F-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
-; RV64ZVE32F-NEXT:    vlse32.v v8, (a0), zero
-; RV64ZVE32F-NEXT:  .LBB106_2: # %else
-; RV64ZVE32F-NEXT:    andi a3, a1, 2
-; RV64ZVE32F-NEXT:    addi a2, a0, -512
-; RV64ZVE32F-NEXT:    bnez a3, .LBB106_6
-; RV64ZVE32F-NEXT:  # %bb.3: # %else2
-; RV64ZVE32F-NEXT:    andi a3, a1, 4
-; RV64ZVE32F-NEXT:    bnez a3, .LBB106_7
-; RV64ZVE32F-NEXT:  .LBB106_4: # %else5
+; RV64ZVE32F-NEXT:    beqz zero, .LBB106_5
+; RV64ZVE32F-NEXT:  # %bb.1: # %else
+; RV64ZVE32F-NEXT:    andi a2, a1, 2
+; RV64ZVE32F-NEXT:    bnez a2, .LBB106_6
+; RV64ZVE32F-NEXT:  .LBB106_2: # %else2
+; RV64ZVE32F-NEXT:    andi a2, a1, 4
+; RV64ZVE32F-NEXT:    bnez a2, .LBB106_7
+; RV64ZVE32F-NEXT:  .LBB106_3: # %else5
 ; RV64ZVE32F-NEXT:    andi a1, a1, 8
 ; RV64ZVE32F-NEXT:    bnez a1, .LBB106_8
-; RV64ZVE32F-NEXT:  .LBB106_5: # %else8
+; RV64ZVE32F-NEXT:  .LBB106_4: # %else8
 ; RV64ZVE32F-NEXT:    ret
+; RV64ZVE32F-NEXT:  .LBB106_5: # %cond.load
+; RV64ZVE32F-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; RV64ZVE32F-NEXT:    vlse32.v v8, (a0), zero
+; RV64ZVE32F-NEXT:    andi a2, a1, 2
+; RV64ZVE32F-NEXT:    beqz a2, .LBB106_2
 ; RV64ZVE32F-NEXT:  .LBB106_6: # %cond.load1
-; RV64ZVE32F-NEXT:    lw a3, 0(a2)
+; RV64ZVE32F-NEXT:    lw a2, -512(a0)
 ; RV64ZVE32F-NEXT:    vsetivli zero, 2, e32, m1, tu, ma
-; RV64ZVE32F-NEXT:    vmv.s.x v9, a3
+; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vslideup.vi v8, v9, 1
-; RV64ZVE32F-NEXT:    andi a3, a1, 4
-; RV64ZVE32F-NEXT:    beqz a3, .LBB106_4
+; RV64ZVE32F-NEXT:    andi a2, a1, 4
+; RV64ZVE32F-NEXT:    beqz a2, .LBB106_3
 ; RV64ZVE32F-NEXT:  .LBB106_7: # %cond.load4
-; RV64ZVE32F-NEXT:    lw a0, 0(a0)
+; RV64ZVE32F-NEXT:    lw a2, 0(a0)
 ; RV64ZVE32F-NEXT:    vsetivli zero, 3, e32, m1, tu, ma
-; RV64ZVE32F-NEXT:    vmv.s.x v9, a0
+; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vslideup.vi v8, v9, 2
 ; RV64ZVE32F-NEXT:    andi a1, a1, 8
-; RV64ZVE32F-NEXT:    beqz a1, .LBB106_5
+; RV64ZVE32F-NEXT:    beqz a1, .LBB106_4
 ; RV64ZVE32F-NEXT:  .LBB106_8: # %cond.load7
-; RV64ZVE32F-NEXT:    lw a0, 0(a2)
+; RV64ZVE32F-NEXT:    lw a0, -512(a0)
 ; RV64ZVE32F-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v9, a0
 ; RV64ZVE32F-NEXT:    vslideup.vi v8, v9, 3
@@ -13480,11 +13481,10 @@ define <8 x i16> @mgather_strided_unaligned(ptr %base) {
 ; RV64ZVE32F-NEXT:    andi a2, a1, 2
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB107_2
 ; RV64ZVE32F-NEXT:  .LBB107_10: # %cond.load1
-; RV64ZVE32F-NEXT:    addi a2, a0, 4
-; RV64ZVE32F-NEXT:    lbu a3, 1(a2)
-; RV64ZVE32F-NEXT:    lbu a2, 0(a2)
-; RV64ZVE32F-NEXT:    slli a3, a3, 8
-; RV64ZVE32F-NEXT:    or a2, a3, a2
+; RV64ZVE32F-NEXT:    lbu a2, 5(a0)
+; RV64ZVE32F-NEXT:    lbu a3, 4(a0)
+; RV64ZVE32F-NEXT:    slli a2, a2, 8
+; RV64ZVE32F-NEXT:    or a2, a2, a3
 ; RV64ZVE32F-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vsetivli zero, 2, e16, m1, tu, ma
@@ -13492,64 +13492,58 @@ define <8 x i16> @mgather_strided_unaligned(ptr %base) {
 ; RV64ZVE32F-NEXT:    andi a2, a1, 4
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB107_3
 ; RV64ZVE32F-NEXT:  .LBB107_11: # %cond.load4
-; RV64ZVE32F-NEXT:    addi a2, a0, 8
-; RV64ZVE32F-NEXT:    lbu a3, 1(a2)
-; RV64ZVE32F-NEXT:    lbu a2, 0(a2)
-; RV64ZVE32F-NEXT:    slli a3, a3, 8
-; RV64ZVE32F-NEXT:    or a2, a3, a2
+; RV64ZVE32F-NEXT:    lbu a2, 9(a0)
+; RV64ZVE32F-NEXT:    lbu a3, 8(a0)
+; RV64ZVE32F-NEXT:    slli a2, a2, 8
+; RV64ZVE32F-NEXT:    or a2, a2, a3
 ; RV64ZVE32F-NEXT:    vsetivli zero, 3, e16, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vslideup.vi v8, v9, 2
 ; RV64ZVE32F-NEXT:    andi a2, a1, 8
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB107_4
 ; RV64ZVE32F-NEXT:  .LBB107_12: # %cond.load7
-; RV64ZVE32F-NEXT:    addi a2, a0, 12
-; RV64ZVE32F-NEXT:    lbu a3, 1(a2)
-; RV64ZVE32F-NEXT:    lbu a2, 0(a2)
-; RV64ZVE32F-NEXT:    slli a3, a3, 8
-; RV64ZVE32F-NEXT:    or a2, a3, a2
+; RV64ZVE32F-NEXT:    lbu a2, 13(a0)
+; RV64ZVE32F-NEXT:    lbu a3, 12(a0)
+; RV64ZVE32F-NEXT:    slli a2, a2, 8
+; RV64ZVE32F-NEXT:    or a2, a2, a3
 ; RV64ZVE32F-NEXT:    vsetivli zero, 4, e16, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vslideup.vi v8, v9, 3
 ; RV64ZVE32F-NEXT:    andi a2, a1, 16
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB107_5
 ; RV64ZVE32F-NEXT:  .LBB107_13: # %cond.load10
-; RV64ZVE32F-NEXT:    addi a2, a0, 16
-; RV64ZVE32F-NEXT:    lbu a3, 1(a2)
-; RV64ZVE32F-NEXT:    lbu a2, 0(a2)
-; RV64ZVE32F-NEXT:    slli a3, a3, 8
-; RV64ZVE32F-NEXT:    or a2, a3, a2
+; RV64ZVE32F-NEXT:    lbu a2, 17(a0)
+; RV64ZVE32F-NEXT:    lbu a3, 16(a0)
+; RV64ZVE32F-NEXT:    slli a2, a2, 8
+; RV64ZVE32F-NEXT:    or a2, a2, a3
 ; RV64ZVE32F-NEXT:    vsetivli zero, 5, e16, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vslideup.vi v8, v9, 4
 ; RV64ZVE32F-NEXT:    andi a2, a1, 32
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB107_6
 ; RV64ZVE32F-NEXT:  .LBB107_14: # %cond.load13
-; RV64ZVE32F-NEXT:    addi a2, a0, 20
-; RV64ZVE32F-NEXT:    lbu a3, 1(a2)
-; RV64ZVE32F-NEXT:    lbu a2, 0(a2)
-; RV64ZVE32F-NEXT:    slli a3, a3, 8
-; RV64ZVE32F-NEXT:    or a2, a3, a2
+; RV64ZVE32F-NEXT:    lbu a2, 21(a0)
+; RV64ZVE32F-NEXT:    lbu a3, 20(a0)
+; RV64ZVE32F-NEXT:    slli a2, a2, 8
+; RV64ZVE32F-NEXT:    or a2, a2, a3
 ; RV64ZVE32F-NEXT:    vsetivli zero, 6, e16, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vslideup.vi v8, v9, 5
 ; RV64ZVE32F-NEXT:    andi a2, a1, 64
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB107_7
 ; RV64ZVE32F-NEXT:  .LBB107_15: # %cond.load16
-; RV64ZVE32F-NEXT:    addi a2, a0, 24
-; RV64ZVE32F-NEXT:    lbu a3, 1(a2)
-; RV64ZVE32F-NEXT:    lbu a2, 0(a2)
-; RV64ZVE32F-NEXT:    slli a3, a3, 8
-; RV64ZVE32F-NEXT:    or a2, a3, a2
+; RV64ZVE32F-NEXT:    lbu a2, 25(a0)
+; RV64ZVE32F-NEXT:    lbu a3, 24(a0)
+; RV64ZVE32F-NEXT:    slli a2, a2, 8
+; RV64ZVE32F-NEXT:    or a2, a2, a3
 ; RV64ZVE32F-NEXT:    vsetivli zero, 7, e16, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vslideup.vi v8, v9, 6
 ; RV64ZVE32F-NEXT:    andi a1, a1, -128
 ; RV64ZVE32F-NEXT:    beqz a1, .LBB107_8
 ; RV64ZVE32F-NEXT:  .LBB107_16: # %cond.load19
-; RV64ZVE32F-NEXT:    addi a0, a0, 28
-; RV64ZVE32F-NEXT:    lbu a1, 1(a0)
-; RV64ZVE32F-NEXT:    lbu a0, 0(a0)
+; RV64ZVE32F-NEXT:    lbu a1, 29(a0)
+; RV64ZVE32F-NEXT:    lbu a0, 28(a0)
 ; RV64ZVE32F-NEXT:    slli a1, a1, 8
 ; RV64ZVE32F-NEXT:    or a0, a1, a0
 ; RV64ZVE32F-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
@@ -13614,8 +13608,7 @@ define <8 x i16> @mgather_strided_2xSEW(ptr %base) {
 ; RV64ZVE32F-NEXT:    andi a2, a1, 2
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB108_2
 ; RV64ZVE32F-NEXT:  .LBB108_10: # %cond.load1
-; RV64ZVE32F-NEXT:    addi a2, a0, 2
-; RV64ZVE32F-NEXT:    lh a2, 0(a2)
+; RV64ZVE32F-NEXT:    lh a2, 2(a0)
 ; RV64ZVE32F-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vsetivli zero, 2, e16, m1, tu, ma
@@ -13623,48 +13616,42 @@ define <8 x i16> @mgather_strided_2xSEW(ptr %base) {
 ; RV64ZVE32F-NEXT:    andi a2, a1, 4
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB108_3
 ; RV64ZVE32F-NEXT:  .LBB108_11: # %cond.load4
-; RV64ZVE32F-NEXT:    addi a2, a0, 8
-; RV64ZVE32F-NEXT:    lh a2, 0(a2)
+; RV64ZVE32F-NEXT:    lh a2, 8(a0)
 ; RV64ZVE32F-NEXT:    vsetivli zero, 3, e16, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vslideup.vi v8, v9, 2
 ; RV64ZVE32F-NEXT:    andi a2, a1, 8
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB108_4
 ; RV64ZVE32F-NEXT:  .LBB108_12: # %cond.load7
-; RV64ZVE32F-NEXT:    addi a2, a0, 10
-; RV64ZVE32F-NEXT:    lh a2, 0(a2)
+; RV64ZVE32F-NEXT:    lh a2, 10(a0)
 ; RV64ZVE32F-NEXT:    vsetivli zero, 4, e16, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vslideup.vi v8, v9, 3
 ; RV64ZVE32F-NEXT:    andi a2, a1, 16
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB108_5
 ; RV64ZVE32F-NEXT:  .LBB108_13: # %cond.load10
-; RV64ZVE32F-NEXT:    addi a2, a0, 16
-; RV64ZVE32F-NEXT:    lh a2, 0(a2)
+; RV64ZVE32F-NEXT:    lh a2, 16(a0)
 ; RV64ZVE32F-NEXT:    vsetivli zero, 5, e16, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vslideup.vi v8, v9, 4
 ; RV64ZVE32F-NEXT:    andi a2, a1, 32
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB108_6
 ; RV64ZVE32F-NEXT:  .LBB108_14: # %cond.load13
-; RV64ZVE32F-NEXT:    addi a2, a0, 18
-; RV64ZVE32F-NEXT:    lh a2, 0(a2)
+; RV64ZVE32F-NEXT:    lh a2, 18(a0)
 ; RV64ZVE32F-NEXT:    vsetivli zero, 6, e16, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vslideup.vi v8, v9, 5
 ; RV64ZVE32F-NEXT:    andi a2, a1, 64
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB108_7
 ; RV64ZVE32F-NEXT:  .LBB108_15: # %cond.load16
-; RV64ZVE32F-NEXT:    addi a2, a0, 24
-; RV64ZVE32F-NEXT:    lh a2, 0(a2)
+; RV64ZVE32F-NEXT:    lh a2, 24(a0)
 ; RV64ZVE32F-NEXT:    vsetivli zero, 7, e16, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vslideup.vi v8, v9, 6
 ; RV64ZVE32F-NEXT:    andi a1, a1, -128
 ; RV64ZVE32F-NEXT:    beqz a1, .LBB108_8
 ; RV64ZVE32F-NEXT:  .LBB108_16: # %cond.load19
-; RV64ZVE32F-NEXT:    addi a0, a0, 26
-; RV64ZVE32F-NEXT:    lh a0, 0(a0)
+; RV64ZVE32F-NEXT:    lh a0, 26(a0)
 ; RV64ZVE32F-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v9, a0
 ; RV64ZVE32F-NEXT:    vslideup.vi v8, v9, 7
@@ -13730,8 +13717,7 @@ define <8 x i16> @mgather_strided_2xSEW_with_offset(ptr %base) {
 ; RV64ZVE32F-NEXT:    andi a2, a1, 2
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB109_2
 ; RV64ZVE32F-NEXT:  .LBB109_10: # %cond.load1
-; RV64ZVE32F-NEXT:    addi a2, a0, 6
-; RV64ZVE32F-NEXT:    lh a2, 0(a2)
+; RV64ZVE32F-NEXT:    lh a2, 6(a0)
 ; RV64ZVE32F-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vsetivli zero, 2, e16, m1, tu, ma
@@ -13739,48 +13725,42 @@ define <8 x i16> @mgather_strided_2xSEW_with_offset(ptr %base) {
 ; RV64ZVE32F-NEXT:    andi a2, a1, 4
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB109_3
 ; RV64ZVE32F-NEXT:  .LBB109_11: # %cond.load4
-; RV64ZVE32F-NEXT:    addi a2, a0, 12
-; RV64ZVE32F-NEXT:    lh a2, 0(a2)
+; RV64ZVE32F-NEXT:    lh a2, 12(a0)
 ; RV64ZVE32F-NEXT:    vsetivli zero, 3, e16, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vslideup.vi v8, v9, 2
 ; RV64ZVE32F-NEXT:    andi a2, a1, 8
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB109_4
 ; RV64ZVE32F-NEXT:  .LBB109_12: # %cond.load7
-; RV64ZVE32F-NEXT:    addi a2, a0, 14
-; RV64ZVE32F-NEXT:    lh a2, 0(a2)
+; RV64ZVE32F-NEXT:    lh a2, 14(a0)
 ; RV64ZVE32F-NEXT:    vsetivli zero, 4, e16, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vslideup.vi v8, v9, 3
 ; RV64ZVE32F-NEXT:    andi a2, a1, 16
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB109_5
 ; RV64ZVE32F-NEXT:  .LBB109_13: # %cond.load10
-; RV64ZVE32F-NEXT:    addi a2, a0, 20
-; RV64ZVE32F-NEXT:    lh a2, 0(a2)
+; RV64ZVE32F-NEXT:    lh a2, 20(a0)
 ; RV64ZVE32F-NEXT:    vsetivli zero, 5, e16, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vslideup.vi v8, v9, 4
 ; RV64ZVE32F-NEXT:    andi a2, a1, 32
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB109_6
 ; RV64ZVE32F-NEXT:  .LBB109_14: # %cond.load13
-; RV64ZVE32F-NEXT:    addi a2, a0, 22
-; RV64ZVE32F-NEXT:    lh a2, 0(a2)
+; RV64ZVE32F-NEXT:    lh a2, 22(a0)
 ; RV64ZVE32F-NEXT:    vsetivli zero, 6, e16, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vslideup.vi v8, v9, 5
 ; RV64ZVE32F-NEXT:    andi a2, a1, 64
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB109_7
 ; RV64ZVE32F-NEXT:  .LBB109_15: # %cond.load16
-; RV64ZVE32F-NEXT:    addi a2, a0, 28
-; RV64ZVE32F-NEXT:    lh a2, 0(a2)
+; RV64ZVE32F-NEXT:    lh a2, 28(a0)
 ; RV64ZVE32F-NEXT:    vsetivli zero, 7, e16, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vslideup.vi v8, v9, 6
 ; RV64ZVE32F-NEXT:    andi a1, a1, -128
 ; RV64ZVE32F-NEXT:    beqz a1, .LBB109_8
 ; RV64ZVE32F-NEXT:  .LBB109_16: # %cond.load19
-; RV64ZVE32F-NEXT:    addi a0, a0, 30
-; RV64ZVE32F-NEXT:    lh a0, 0(a0)
+; RV64ZVE32F-NEXT:    lh a0, 30(a0)
 ; RV64ZVE32F-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v9, a0
 ; RV64ZVE32F-NEXT:    vslideup.vi v8, v9, 7
@@ -13846,8 +13826,7 @@ define <8 x i16> @mgather_reverse_unit_strided_2xSEW(ptr %base) {
 ; RV64ZVE32F-NEXT:    andi a2, a1, 2
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB110_2
 ; RV64ZVE32F-NEXT:  .LBB110_10: # %cond.load1
-; RV64ZVE32F-NEXT:    addi a2, a0, 30
-; RV64ZVE32F-NEXT:    lh a2, 0(a2)
+; RV64ZVE32F-NEXT:    lh a2, 30(a0)
 ; RV64ZVE32F-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vsetivli zero, 2, e16, m1, tu, ma
@@ -13855,48 +13834,42 @@ define <8 x i16> @mgather_reverse_unit_strided_2xSEW(ptr %base) {
 ; RV64ZVE32F-NEXT:    andi a2, a1, 4
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB110_3
 ; RV64ZVE32F-NEXT:  .LBB110_11: # %cond.load4
-; RV64ZVE32F-NEXT:    addi a2, a0, 24
-; RV64ZVE32F-NEXT:    lh a2, 0(a2)
+; RV64ZVE32F-NEXT:    lh a2, 24(a0)
 ; RV64ZVE32F-NEXT:    vsetivli zero, 3, e16, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vslideup.vi v8, v9, 2
 ; RV64ZVE32F-NEXT:    andi a2, a1, 8
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB110_4
 ; RV64ZVE32F-NEXT:  .LBB110_12: # %cond.load7
-; RV64ZVE32F-NEXT:    addi a2, a0, 26
-; RV64ZVE32F-NEXT:    lh a2, 0(a2)
+; RV64ZVE32F-NEXT:    lh a2, 26(a0)
 ; RV64ZVE32F-NEXT:    vsetivli zero, 4, e16, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vslideup.vi v8, v9, 3
 ; RV64ZVE32F-NEXT:    andi a2, a1, 16
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB110_5
 ; RV64ZVE32F-NEXT:  .LBB110_13: # %cond.load10
-; RV64ZVE32F-NEXT:    addi a2, a0, 20
-; RV64ZVE32F-NEXT:    lh a2, 0(a2)
+; RV64ZVE32F-NEXT:    lh a2, 20(a0)
 ; RV64ZVE32F-NEXT:    vsetivli zero, 5, e16, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vslideup.vi v8, v9, 4
 ; RV64ZVE32F-NEXT:    andi a2, a1, 32
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB110_6
 ; RV64ZVE32F-NEXT:  .LBB110_14: # %cond.load13
-; RV64ZVE32F-NEXT:    addi a2, a0, 22
-; RV64ZVE32F-NEXT:    lh a2, 0(a2)
+; RV64ZVE32F-NEXT:    lh a2, 22(a0)
 ; RV64ZVE32F-NEXT:    vsetivli zero, 6, e16, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vslideup.vi v8, v9, 5
 ; RV64ZVE32F-NEXT:    andi a2, a1, 64
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB110_7
 ; RV64ZVE32F-NEXT:  .LBB110_15: # %cond.load16
-; RV64ZVE32F-NEXT:    ad...
[truncated]

@luismarques luismarques marked this pull request as ready for review September 27, 2023 20:37
@github-actions

github-actions bot commented Sep 27, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@mgudim
Contributor

mgudim commented Oct 2, 2023

@topperc Sorry for the delay with the review, I was away last week. Thanks for working on this, looks great! I prefer this approach to #67024. Have you seen any changes on SPEC? In my testing, this is supposed to improve mcf (but I only have the instruction count, not cycles).

LGTM.

@topperc topperc changed the title from "[RISCV][WIP] Enable sink-and-fold for RISC-V." to "[RISCV] Add sink-and-fold support for RISC-V." on Oct 2, 2023
@topperc
Collaborator Author

topperc commented Oct 2, 2023

I've removed the enabling by default, but added the option to all currently affected tests as a demonstration.

@preames
Collaborator

preames commented Oct 2, 2023

This makes sense to me as well. LGTM

@topperc
Collaborator Author

topperc commented Oct 3, 2023

@topperc Sorry for the delay with the review, I was away last week. Thanks for working on this, looks great! I prefer this approach to #67024. Have you seen any changes on SPEC? In my testing, this is supposed to improve mcf (but I only have the instruction count, not cycles).

I can confirm that 429.mcf with the train dataset went from 17766169495 instructions to 17632812468 in my local testing. I think it was -mcpu=sifive-x280 -O3. That's a reduction of roughly 133 million instructions, or about 0.75%.

Contributor

@asb asb left a comment


LGTM.

(Question for future iterations, rather than this patch): Do we ever get an ORI we'd want to fold, due to DAGCombiner converting a+b to a|b when that's a legal transformation?
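
For concreteness, a hypothetical instance of what is being asked about (the alignment assumption and register names are made up): if a0 is known to be 8-byte aligned, DAGCombiner may turn a0 + 4 into a0 | 4, so the input could look like

    ori a1, a0, 4
    lw  a2, 0(a1)

    # a future extension could fold this the same way as ADDI,
    # i.e. lw a2, 4(a0), since the low bits of a0 are known to be zero.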

@mgudim
Contributor

mgudim commented Oct 3, 2023

I can confirm that 429.mcf with the train dataset went from 17766169495 instructions to 17632812468 in my local testing. I think it was -mcpu=sifive-x280 -O3. That's a reduction of roughly 133 million instructions, or about 0.75%.

Great, this is what I saw as well. Thanks for confirming!

Contributor

@wangpc-pp wangpc-pp left a comment


LGTM.

This uses the recently introduced sink-and-fold support in
MachineSink. https://reviews.llvm.org/D152828

This enables folding ADDI into load/store addresses.

Enabling by default will be a separate PR.
@topperc topperc merged commit 45636ec into llvm:main Oct 7, 2023
2 of 3 checks passed
@topperc topperc deleted the pr/sink-and-fold branch October 7, 2023 17:38