[RISCV] Minimally modify incoming state in transferBefore #72352

lukel97 · 2023-11-15T06:38:58Z

transferBefore currently takes an incoming state and an instruction, computes
the new state needed for the instruction, and then modifies that new state to
be more similar to the incoming state.

This patch reverses the approach by instead taking the incoming state and
modifying only the bits that are demanded by the instruction.

It is currently NFC, but later patches would rework adjustIncoming to bring back the improvements from earlier revisions of this PR.

llvmbot · 2023-11-15T06:39:32Z

@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

transferBefore currently takes an incoming state and an instruction, computes
the new state needed for the instruction, and then modifies that new state to
be more similar to the incoming state.

This patch reverses the approach by instead taking the incoming state and
modifying only the bits that are demanded by the instruction.

I find this makes things slightly easier to reason about. It also us to use the
x0, x0 form in more places, in particular with vmv.x.s since we're no longer
relying on isScalarInsertInstr, but instead reusing the logic in getDemanded.

I haven't had a chance to check, but hopefully this might also address some of
the regressions seen in #71501.

Patch is 100.25 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/72352.diff

13 Files Affected:

(modified) llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp (+53-48)
(modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-i1.ll (+10-10)
(modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll (+185-326)
(modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-scatter.ll (+42-42)
(modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-fp.ll (+21-21)
(modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-int.ll (+24-24)
(modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-unaligned.ll (+10-10)
(modified) llvm/test/CodeGen/RISCV/rvv/vreductions-fp-sdnode.ll (+8-8)
(modified) llvm/test/CodeGen/RISCV/rvv/vreductions-int-vp.ll (+4-4)
(modified) llvm/test/CodeGen/RISCV/rvv/vreductions-int.ll (+12-12)
(modified) llvm/test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.ll (+2-2)
(modified) llvm/test/CodeGen/RISCV/rvv/vsetvli-insert.ll (+1-2)
(modified) llvm/test/CodeGen/RISCV/rvv/vsetvli-regression.ll (+1-1)

diff --git a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
index 999ff9920716e87..87e595ab20cf83e 100644
--- a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
@@ -479,6 +479,8 @@ class VSETVLIInfo {
 
   unsigned getSEW() const { return SEW; }
   RISCVII::VLMUL getVLMUL() const { return VLMul; }
+  bool getTailAgnostic() const { return TailAgnostic; }
+  bool getMaskAgnostic() const { return MaskAgnostic; }
 
   bool hasNonZeroAVL(const MachineRegisterInfo &MRI) const {
     if (hasAVLImm())
@@ -1013,73 +1015,76 @@ bool RISCVInsertVSETVLI::needVSETVLI(const MachineInstr &MI,
   return true;
 }
 
-// Given an incoming state reaching MI, modifies that state so that it is minimally
-// compatible with MI.  The resulting state is guaranteed to be semantically legal
-// for MI, but may not be the state requested by MI.
+// Given an incoming state reaching MI, minimally modifies that state so that it
+// is compatible with MI. The resulting state is guaranteed to be semantically
+// legal for MI, but may not be the state requested by MI.
 void RISCVInsertVSETVLI::transferBefore(VSETVLIInfo &Info,
                                         const MachineInstr &MI) const {
   uint64_t TSFlags = MI.getDesc().TSFlags;
   if (!RISCVII::hasSEWOp(TSFlags))
     return;
 
-  const VSETVLIInfo NewInfo = computeInfoForInstr(MI, TSFlags, MRI);
+  VSETVLIInfo NewInfo = computeInfoForInstr(MI, TSFlags, MRI);
   if (Info.isValid() && !needVSETVLI(MI, NewInfo, Info))
     return;
 
-  const VSETVLIInfo PrevInfo = Info;
-  Info = NewInfo;
+  if (Info.hasSEWLMULRatioOnly() || !Info.isValid() || Info.isUnknown())
+    Info = NewInfo;
 
-  if (!RISCVII::hasVLOp(TSFlags))
-    return;
+  DemandedFields Demanded = getDemanded(MI, MRI, ST);
 
   // If we don't use LMUL or the SEW/LMUL ratio, then adjust LMUL so that we
   // maintain the SEW/LMUL ratio. This allows us to eliminate VL toggles in more
   // places.
-  DemandedFields Demanded = getDemanded(MI, MRI, ST);
   if (!Demanded.LMUL && !Demanded.SEWLMULRatio && Info.isValid() &&
-      PrevInfo.isValid() && !Info.isUnknown() && !PrevInfo.isUnknown()) {
-    if (auto NewVLMul = RISCVVType::getSameRatioLMUL(
-            PrevInfo.getSEW(), PrevInfo.getVLMUL(), Info.getSEW()))
-      Info.setVLMul(*NewVLMul);
-  }
-
-  // For vmv.s.x and vfmv.s.f, there are only two behaviors, VL = 0 and
-  // VL > 0. We can discard the user requested AVL and just use the last
-  // one if we can prove it equally zero.  This removes a vsetvli entirely
-  // if the types match or allows use of cheaper avl preserving variant
-  // if VLMAX doesn't change.  If VLMAX might change, we couldn't use
-  // the 'vsetvli x0, x0, vtype" variant, so we avoid the transform to
-  // prevent extending live range of an avl register operand.
-  // TODO: We can probably relax this for immediates.
-  if (isScalarInsertInstr(MI) && PrevInfo.isValid() &&
-      PrevInfo.hasEquallyZeroAVL(Info, *MRI) &&
-      Info.hasSameVLMAX(PrevInfo)) {
-    if (PrevInfo.hasAVLImm())
-      Info.setAVLImm(PrevInfo.getAVLImm());
-    else
-      Info.setAVLReg(PrevInfo.getAVLReg());
-    return;
+      !Info.isUnknown() && !Info.hasSEWLMULRatioOnly()) {
+    if (auto SameRatioLMUL = RISCVVType::getSameRatioLMUL(
+            Info.getSEW(), Info.getVLMUL(), NewInfo.getSEW())) {
+      NewInfo.setVLMul(*SameRatioLMUL);
+      Demanded.LMUL = true;
+    }
   }
 
-  // If AVL is defined by a vsetvli with the same VLMAX, we can
-  // replace the AVL operand with the AVL of the defining vsetvli.
-  // We avoid general register AVLs to avoid extending live ranges
-  // without being sure we can kill the original source reg entirely.
-  if (!Info.hasAVLReg() || !Info.getAVLReg().isVirtual())
-    return;
-  MachineInstr *DefMI = MRI->getVRegDef(Info.getAVLReg());
-  if (!DefMI || !isVectorConfigInstr(*DefMI))
-    return;
+  // If AVL is defined by a vsetvli with the same VLMAX, we can replace the AVL
+  // operand with the AVL of the defining vsetvli.  We avoid general register
+  // AVLs to avoid extending live ranges without being sure we can kill the
+  // original source reg entirely.
+  if (RISCVII::hasVLOp(TSFlags) && NewInfo.hasAVLReg() &&
+      NewInfo.getAVLReg().isVirtual()) {
+    MachineInstr *DefMI = MRI->getVRegDef(NewInfo.getAVLReg());
+    if (DefMI && isVectorConfigInstr(*DefMI)) {
+      VSETVLIInfo DefInfo = getInfoForVSETVLI(*DefMI);
+      if (DefInfo.hasSameVLMAX(NewInfo) &&
+          (DefInfo.hasAVLImm() || DefInfo.getAVLReg() == RISCV::X0)) {
+        if (DefInfo.hasAVLImm())
+          NewInfo.setAVLImm(DefInfo.getAVLImm());
+        else
+          NewInfo.setAVLReg(DefInfo.getAVLReg());
+      }
+    }
+  }
 
-  VSETVLIInfo DefInfo = getInfoForVSETVLI(*DefMI);
-  if (DefInfo.hasSameVLMAX(Info) &&
-      (DefInfo.hasAVLImm() || DefInfo.getAVLReg() == RISCV::X0)) {
-    if (DefInfo.hasAVLImm())
-      Info.setAVLImm(DefInfo.getAVLImm());
+  // If MI only demands that VL has the same zeroness, we only need to set the
+  // AVL if the zeroness differs, or if VLMAX changes (since that prevents us
+  // from using vsetvli x0, x0).
+  bool CanUseX0X0Form =
+      Info.hasEquallyZeroAVL(NewInfo, *MRI) && Info.hasSameVLMAX(NewInfo);
+  if (Demanded.VLAny || (Demanded.VLZeroness && !CanUseX0X0Form)) {
+    if (NewInfo.hasAVLImm())
+      Info.setAVLImm(NewInfo.getAVLImm());
     else
-      Info.setAVLReg(DefInfo.getAVLReg());
-    return;
-  }
+      Info.setAVLReg(NewInfo.getAVLReg());
+  }
+
+  Info.setVTYPE(
+      ((Demanded.LMUL || Demanded.SEWLMULRatio) ? NewInfo : Info).getVLMUL(),
+      ((Demanded.SEW || Demanded.SEWLMULRatio) ? NewInfo : Info).getSEW(),
+      // Prefer tail/mask agnostic since it can be relaxed to undisturbed later
+      // if needed.
+      (Demanded.TailPolicy ? NewInfo : Info).getTailAgnostic() ||
+          NewInfo.getTailAgnostic(),
+      (Demanded.MaskPolicy ? NewInfo : Info).getMaskAgnostic() ||
+          NewInfo.getMaskAgnostic());
 }
 
 // Given a state with which we evaluated MI (see transferBefore above for why
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-i1.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-i1.ll
index 9d689c732d7999f..7ad01ceea33b154 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-i1.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-i1.ll
@@ -106,7 +106,7 @@ define i1 @extractelt_v16i1(ptr %x, i64 %idx) nounwind {
 ; RV32-NEXT:    vsetivli zero, 16, e8, m1, ta, ma
 ; RV32-NEXT:    vle8.v v8, (a0)
 ; RV32-NEXT:    vmseq.vi v8, v8, 0
-; RV32-NEXT:    vsetivli zero, 1, e16, mf4, ta, ma
+; RV32-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
 ; RV32-NEXT:    vmv.x.s a0, v8
 ; RV32-NEXT:    srl a0, a0, a1
 ; RV32-NEXT:    andi a0, a0, 1
@@ -117,7 +117,7 @@ define i1 @extractelt_v16i1(ptr %x, i64 %idx) nounwind {
 ; RV64-NEXT:    vsetivli zero, 16, e8, m1, ta, ma
 ; RV64-NEXT:    vle8.v v8, (a0)
 ; RV64-NEXT:    vmseq.vi v8, v8, 0
-; RV64-NEXT:    vsetivli zero, 1, e16, mf4, ta, ma
+; RV64-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
 ; RV64-NEXT:    vmv.x.s a0, v8
 ; RV64-NEXT:    srl a0, a0, a1
 ; RV64-NEXT:    andi a0, a0, 1
@@ -128,7 +128,7 @@ define i1 @extractelt_v16i1(ptr %x, i64 %idx) nounwind {
 ; RV32ZBS-NEXT:    vsetivli zero, 16, e8, m1, ta, ma
 ; RV32ZBS-NEXT:    vle8.v v8, (a0)
 ; RV32ZBS-NEXT:    vmseq.vi v8, v8, 0
-; RV32ZBS-NEXT:    vsetivli zero, 1, e16, mf4, ta, ma
+; RV32ZBS-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
 ; RV32ZBS-NEXT:    vmv.x.s a0, v8
 ; RV32ZBS-NEXT:    bext a0, a0, a1
 ; RV32ZBS-NEXT:    ret
@@ -138,7 +138,7 @@ define i1 @extractelt_v16i1(ptr %x, i64 %idx) nounwind {
 ; RV64ZBS-NEXT:    vsetivli zero, 16, e8, m1, ta, ma
 ; RV64ZBS-NEXT:    vle8.v v8, (a0)
 ; RV64ZBS-NEXT:    vmseq.vi v8, v8, 0
-; RV64ZBS-NEXT:    vsetivli zero, 1, e16, mf4, ta, ma
+; RV64ZBS-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
 ; RV64ZBS-NEXT:    vmv.x.s a0, v8
 ; RV64ZBS-NEXT:    bext a0, a0, a1
 ; RV64ZBS-NEXT:    ret
@@ -155,7 +155,7 @@ define i1 @extractelt_v32i1(ptr %x, i64 %idx) nounwind {
 ; RV32-NEXT:    vsetvli zero, a2, e8, m2, ta, ma
 ; RV32-NEXT:    vle8.v v8, (a0)
 ; RV32-NEXT:    vmseq.vi v10, v8, 0
-; RV32-NEXT:    vsetivli zero, 1, e32, mf2, ta, ma
+; RV32-NEXT:    vsetvli zero, zero, e32, m8, ta, ma
 ; RV32-NEXT:    vmv.x.s a0, v10
 ; RV32-NEXT:    srl a0, a0, a1
 ; RV32-NEXT:    andi a0, a0, 1
@@ -167,7 +167,7 @@ define i1 @extractelt_v32i1(ptr %x, i64 %idx) nounwind {
 ; RV64-NEXT:    vsetvli zero, a2, e8, m2, ta, ma
 ; RV64-NEXT:    vle8.v v8, (a0)
 ; RV64-NEXT:    vmseq.vi v10, v8, 0
-; RV64-NEXT:    vsetivli zero, 1, e32, mf2, ta, ma
+; RV64-NEXT:    vsetvli zero, zero, e32, m8, ta, ma
 ; RV64-NEXT:    vmv.x.s a0, v10
 ; RV64-NEXT:    srl a0, a0, a1
 ; RV64-NEXT:    andi a0, a0, 1
@@ -179,7 +179,7 @@ define i1 @extractelt_v32i1(ptr %x, i64 %idx) nounwind {
 ; RV32ZBS-NEXT:    vsetvli zero, a2, e8, m2, ta, ma
 ; RV32ZBS-NEXT:    vle8.v v8, (a0)
 ; RV32ZBS-NEXT:    vmseq.vi v10, v8, 0
-; RV32ZBS-NEXT:    vsetivli zero, 1, e32, mf2, ta, ma
+; RV32ZBS-NEXT:    vsetvli zero, zero, e32, m8, ta, ma
 ; RV32ZBS-NEXT:    vmv.x.s a0, v10
 ; RV32ZBS-NEXT:    bext a0, a0, a1
 ; RV32ZBS-NEXT:    ret
@@ -190,7 +190,7 @@ define i1 @extractelt_v32i1(ptr %x, i64 %idx) nounwind {
 ; RV64ZBS-NEXT:    vsetvli zero, a2, e8, m2, ta, ma
 ; RV64ZBS-NEXT:    vle8.v v8, (a0)
 ; RV64ZBS-NEXT:    vmseq.vi v10, v8, 0
-; RV64ZBS-NEXT:    vsetivli zero, 1, e32, mf2, ta, ma
+; RV64ZBS-NEXT:    vsetvli zero, zero, e32, m8, ta, ma
 ; RV64ZBS-NEXT:    vmv.x.s a0, v10
 ; RV64ZBS-NEXT:    bext a0, a0, a1
 ; RV64ZBS-NEXT:    ret
@@ -221,7 +221,7 @@ define i1 @extractelt_v64i1(ptr %x, i64 %idx) nounwind {
 ; RV64-NEXT:    vsetvli zero, a2, e8, m4, ta, ma
 ; RV64-NEXT:    vle8.v v8, (a0)
 ; RV64-NEXT:    vmseq.vi v12, v8, 0
-; RV64-NEXT:    vsetivli zero, 1, e64, m1, ta, ma
+; RV64-NEXT:    vsetvli zero, a2, e64, m4, ta, ma
 ; RV64-NEXT:    vmv.x.s a0, v12
 ; RV64-NEXT:    srl a0, a0, a1
 ; RV64-NEXT:    andi a0, a0, 1
@@ -246,7 +246,7 @@ define i1 @extractelt_v64i1(ptr %x, i64 %idx) nounwind {
 ; RV64ZBS-NEXT:    vsetvli zero, a2, e8, m4, ta, ma
 ; RV64ZBS-NEXT:    vle8.v v8, (a0)
 ; RV64ZBS-NEXT:    vmseq.vi v12, v8, 0
-; RV64ZBS-NEXT:    vsetivli zero, 1, e64, m1, ta, ma
+; RV64ZBS-NEXT:    vsetvli zero, a2, e64, m4, ta, ma
 ; RV64ZBS-NEXT:    vmv.x.s a0, v12
 ; RV64ZBS-NEXT:    bext a0, a0, a1
 ; RV64ZBS-NEXT:    ret
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll
index 728cf18e1a77d8a..aec9babec33a095 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll
@@ -543,9 +543,8 @@ define <4 x i8> @mgather_truemask_v4i8(<4 x ptr> %ptrs, <4 x i8> %passthru) {
 ; RV64ZVE32F-NEXT:  .LBB9_6: # %cond.load1
 ; RV64ZVE32F-NEXT:    ld a2, 8(a0)
 ; RV64ZVE32F-NEXT:    lbu a2, 0(a2)
-; RV64ZVE32F-NEXT:    vsetvli zero, zero, e8, mf2, ta, ma
-; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vsetivli zero, 2, e8, mf4, tu, ma
+; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vslideup.vi v8, v9, 1
 ; RV64ZVE32F-NEXT:    andi a2, a1, 4
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB9_3
@@ -1267,9 +1266,8 @@ define <4 x i16> @mgather_truemask_v4i16(<4 x ptr> %ptrs, <4 x i16> %passthru) {
 ; RV64ZVE32F-NEXT:  .LBB20_6: # %cond.load1
 ; RV64ZVE32F-NEXT:    ld a2, 8(a0)
 ; RV64ZVE32F-NEXT:    lh a2, 0(a2)
-; RV64ZVE32F-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
-; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vsetivli zero, 2, e16, mf2, tu, ma
+; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vslideup.vi v8, v9, 1
 ; RV64ZVE32F-NEXT:    andi a2, a1, 4
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB20_3
@@ -2351,9 +2349,8 @@ define <4 x i32> @mgather_truemask_v4i32(<4 x ptr> %ptrs, <4 x i32> %passthru) {
 ; RV64ZVE32F-NEXT:  .LBB32_6: # %cond.load1
 ; RV64ZVE32F-NEXT:    ld a2, 8(a0)
 ; RV64ZVE32F-NEXT:    lw a2, 0(a2)
-; RV64ZVE32F-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
-; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vsetivli zero, 2, e32, m1, tu, ma
+; RV64ZVE32F-NEXT:    vmv.s.x v9, a2
 ; RV64ZVE32F-NEXT:    vslideup.vi v8, v9, 1
 ; RV64ZVE32F-NEXT:    andi a2, a1, 4
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB32_3
@@ -3844,7 +3841,7 @@ define <4 x i64> @mgather_truemask_v4i64(<4 x ptr> %ptrs, <4 x i64> %passthru) {
 ; RV32ZVE32F-NEXT:    vmv.x.s a6, v9
 ; RV32ZVE32F-NEXT:    bnez zero, .LBB45_5
 ; RV32ZVE32F-NEXT:  # %bb.1: # %cond.load
-; RV32ZVE32F-NEXT:    vsetivli zero, 1, e32, m1, ta, ma
+; RV32ZVE32F-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
 ; RV32ZVE32F-NEXT:    vmv.x.s a3, v8
 ; RV32ZVE32F-NEXT:    lw a2, 4(a3)
 ; RV32ZVE32F-NEXT:    lw a3, 0(a3)
@@ -4265,12 +4262,12 @@ define <8 x i64> @mgather_baseidx_v8i8_v8i64(ptr %base, <8 x i8> %idxs, <8 x i1>
 ; RV32ZVE32F-NEXT:    vsext.vf4 v10, v8
 ; RV32ZVE32F-NEXT:    vsll.vi v8, v10, 3
 ; RV32ZVE32F-NEXT:    vadd.vx v8, v8, a1
-; RV32ZVE32F-NEXT:    vsetivli zero, 1, e8, mf4, ta, ma
+; RV32ZVE32F-NEXT:    vsetvli zero, zero, e8, mf2, ta, ma
 ; RV32ZVE32F-NEXT:    vmv.x.s t0, v0
 ; RV32ZVE32F-NEXT:    andi a1, t0, 1
 ; RV32ZVE32F-NEXT:    beqz a1, .LBB48_7
 ; RV32ZVE32F-NEXT:  # %bb.1: # %cond.load
-; RV32ZVE32F-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
+; RV32ZVE32F-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
 ; RV32ZVE32F-NEXT:    vmv.x.s a3, v8
 ; RV32ZVE32F-NEXT:    lw a1, 4(a3)
 ; RV32ZVE32F-NEXT:    lw a3, 0(a3)
@@ -4539,12 +4536,12 @@ define <8 x i64> @mgather_baseidx_sext_v8i8_v8i64(ptr %base, <8 x i8> %idxs, <8
 ; RV32ZVE32F-NEXT:    vsext.vf4 v10, v8
 ; RV32ZVE32F-NEXT:    vsll.vi v8, v10, 3
 ; RV32ZVE32F-NEXT:    vadd.vx v8, v8, a1
-; RV32ZVE32F-NEXT:    vsetivli zero, 1, e8, mf4, ta, ma
+; RV32ZVE32F-NEXT:    vsetvli zero, zero, e8, mf2, ta, ma
 ; RV32ZVE32F-NEXT:    vmv.x.s t0, v0
 ; RV32ZVE32F-NEXT:    andi a1, t0, 1
 ; RV32ZVE32F-NEXT:    beqz a1, .LBB49_7
 ; RV32ZVE32F-NEXT:  # %bb.1: # %cond.load
-; RV32ZVE32F-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
+; RV32ZVE32F-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
 ; RV32ZVE32F-NEXT:    vmv.x.s a3, v8
 ; RV32ZVE32F-NEXT:    lw a1, 4(a3)
 ; RV32ZVE32F-NEXT:    lw a3, 0(a3)
@@ -4815,12 +4812,12 @@ define <8 x i64> @mgather_baseidx_zext_v8i8_v8i64(ptr %base, <8 x i8> %idxs, <8
 ; RV32ZVE32F-NEXT:    vzext.vf4 v10, v8
 ; RV32ZVE32F-NEXT:    vsll.vi v8, v10, 3
 ; RV32ZVE32F-NEXT:    vadd.vx v8, v8, a1
-; RV32ZVE32F-NEXT:    vsetivli zero, 1, e8, mf4, ta, ma
+; RV32ZVE32F-NEXT:    vsetvli zero, zero, e8, mf2, ta, ma
 ; RV32ZVE32F-NEXT:    vmv.x.s t0, v0
 ; RV32ZVE32F-NEXT:    andi a1, t0, 1
 ; RV32ZVE32F-NEXT:    beqz a1, .LBB50_7
 ; RV32ZVE32F-NEXT:  # %bb.1: # %cond.load
-; RV32ZVE32F-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
+; RV32ZVE32F-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
 ; RV32ZVE32F-NEXT:    vmv.x.s a3, v8
 ; RV32ZVE32F-NEXT:    lw a1, 4(a3)
 ; RV32ZVE32F-NEXT:    lw a3, 0(a3)
@@ -5098,12 +5095,12 @@ define <8 x i64> @mgather_baseidx_v8i16_v8i64(ptr %base, <8 x i16> %idxs, <8 x i
 ; RV32ZVE32F-NEXT:    vsext.vf2 v10, v8
 ; RV32ZVE32F-NEXT:    vsll.vi v8, v10, 3
 ; RV32ZVE32F-NEXT:    vadd.vx v8, v8, a1
-; RV32ZVE32F-NEXT:    vsetivli zero, 1, e8, mf4, ta, ma
+; RV32ZVE32F-NEXT:    vsetvli zero, zero, e8, mf2, ta, ma
 ; RV32ZVE32F-NEXT:    vmv.x.s t0, v0
 ; RV32ZVE32F-NEXT:    andi a1, t0, 1
 ; RV32ZVE32F-NEXT:    beqz a1, .LBB51_7
 ; RV32ZVE32F-NEXT:  # %bb.1: # %cond.load
-; RV32ZVE32F-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
+; RV32ZVE32F-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
 ; RV32ZVE32F-NEXT:    vmv.x.s a3, v8
 ; RV32ZVE32F-NEXT:    lw a1, 4(a3)
 ; RV32ZVE32F-NEXT:    lw a3, 0(a3)
@@ -5373,12 +5370,12 @@ define <8 x i64> @mgather_baseidx_sext_v8i16_v8i64(ptr %base, <8 x i16> %idxs, <
 ; RV32ZVE32F-NEXT:    vsext.vf2 v10, v8
 ; RV32ZVE32F-NEXT:    vsll.vi v8, v10, 3
 ; RV32ZVE32F-NEXT:    vadd.vx v8, v8, a1
-; RV32ZVE32F-NEXT:    vsetivli zero, 1, e8, mf4, ta, ma
+; RV32ZVE32F-NEXT:    vsetvli zero, zero, e8, mf2, ta, ma
 ; RV32ZVE32F-NEXT:    vmv.x.s t0, v0
 ; RV32ZVE32F-NEXT:    andi a1, t0, 1
 ; RV32ZVE32F-NEXT:    beqz a1, .LBB52_7
 ; RV32ZVE32F-NEXT:  # %bb.1: # %cond.load
-; RV32ZVE32F-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
+; RV32ZVE32F-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
 ; RV32ZVE32F-NEXT:    vmv.x.s a3, v8
 ; RV32ZVE32F-NEXT:    lw a1, 4(a3)
 ; RV32ZVE32F-NEXT:    lw a3, 0(a3)
@@ -5650,12 +5647,12 @@ define <8 x i64> @mgather_baseidx_zext_v8i16_v8i64(ptr %base, <8 x i16> %idxs, <
 ; RV32ZVE32F-NEXT:    vzext.vf2 v10, v8
 ; RV32ZVE32F-NEXT:    vsll.vi v8, v10, 3
 ; RV32ZVE32F-NEXT:    vadd.vx v8, v8, a1
-; RV32ZVE32F-NEXT:    vsetivli zero, 1, e8, mf4, ta, ma
+; RV32ZVE32F-NEXT:    vsetvli zero, zero, e8, mf2, ta, ma
 ; RV32ZVE32F-NEXT:    vmv.x.s t0, v0
 ; RV32ZVE32F-NEXT:    andi a1, t0, 1
 ; RV32ZVE32F-NEXT:    beqz a1, .LBB53_7
 ; RV32ZVE32F-NEXT:  # %bb.1: # %cond.load
-; RV32ZVE32F-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
+; RV32ZVE32F-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
 ; RV32ZVE32F-NEXT:    vmv.x.s a3, v8
 ; RV32ZVE32F-NEXT:    lw a1, 4(a3)
 ; RV32ZVE32F-NEXT:    lw a3, 0(a3)
@@ -5934,12 +5931,12 @@ define <8 x i64> @mgather_baseidx_v8i32_v8i64(ptr %base, <8 x i32> %idxs, <8 x i
 ; RV32ZVE32F-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
 ; RV32ZVE32F-NEXT:    vsll.vi v8, v8, 3
 ; RV32ZVE32F-NEXT:    vadd.vx v8, v8, a1
-; RV32ZVE32F-NEXT:    vsetivli zero, 1, e8, mf4, ta, ma
+; RV32ZVE32F-NEXT:    vsetvli zero, zero, e8, mf2, ta, ma
 ; RV32ZVE32F-NEXT:    vmv.x.s t0, v0
 ; RV32ZVE32F-NEXT:    andi a1, t0, 1
 ; RV32ZVE32F-NEXT:    beqz a1, .LBB54_7
 ; RV32ZVE32F-NEXT:  # %bb.1: # %cond.load
-; RV32ZVE32F-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
+; RV32ZVE32F-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
 ; RV32ZVE32F-NEXT:    vmv.x.s a3, v8
 ; RV32ZVE32F-NEXT:    lw a1, 4(a3)
 ; RV32ZVE32F-NEXT:    lw a3, 0(a3)
@@ -6207,12 +6204,12 @@ define <8 x i64> @mgather_baseidx_sext_v8i32_v8i64(ptr %base, <8 x i32> %idxs, <
 ; RV32ZVE32F-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
 ; RV32ZVE32F-NEXT:    vsll.vi v8, v8, 3
 ; RV32ZVE32F-NEXT:    vadd.vx v8, v8, a1
-; RV32ZVE32F-NEXT:    vsetivli zero, 1, e8, mf4, ta, ma
+; RV32ZVE32F-NEXT:    vsetvli zero, zero, e8, mf2, ta, ma
 ; RV32ZVE32F-NEXT:    vmv.x.s t0, v0
 ; RV32ZVE32F-NEXT:    andi a1, t0, 1
 ; RV32ZVE32F-NEXT:    beqz a1, .LBB55_7
 ; RV32ZVE32F-NEXT:  # %bb.1: # %cond.load
-; RV32ZVE32F-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
+; RV32ZVE32F-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
 ; RV32ZVE32F-NEXT:    vmv.x.s a3, v8
 ; RV32ZVE32F-NEXT:    lw a1, 4(a3)
 ; RV32ZVE32F-NEXT:    lw a3, 0(a3)
@@ -6481,12 +6478,12 @@ define <8 x i64> @mgather_baseidx_zext_v8i32_v8i64(ptr %base, <8 x i32> %idxs, <
 ; RV32ZVE32F-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
 ; RV32ZVE32F-NEXT:    vsll.vi v8, v8, 3
 ; RV32ZVE32F-NEXT:    vadd.vx v8, v8, a1
-; RV32ZVE32F-NEXT:    vsetivli zero, 1, e8, mf4, ta, ma
+; RV32ZVE32F-NEXT:    vsetvli zero, zero, e8, mf2, ta, ma
 ; RV32ZVE32F-NEXT:    vmv.x.s t0, v0
 ; RV32ZVE32F-NEXT:    andi a1, t0, 1
 ; RV32ZVE32F-NEXT:    beqz a1, .LBB56_7
 ; RV32ZVE32F-NEXT:  # %bb.1: # %cond.load
-; RV32ZVE32F-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
+; RV32ZVE32F-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
 ; RV32ZVE32F-NEXT:    vmv.x.s a3, v8
 ; RV32ZVE32F-NEXT:    lw a1, 4(a3)
 ; RV32ZVE32F-NEXT:    lw a3, 0(a3)
@@ -6779,12 +6776,12 @@ define <8 x i64> @mgather_baseidx_v8i64(ptr %base, <8 x i64> %idxs, <8 x i1> %m,
 ; RV32ZVE32F-NEXT:    vslide1down.vx v8, v8, a4
 ; RV32ZVE32F-NEXT:    vsll.vi v8, v8, 3
 ; RV32ZVE32F-NEXT:    vadd.vx v8, v8, a1
-; RV32ZVE32F-NEXT:    vsetivli zero, 1, e8, mf4, ta, ma
+; RV32ZVE32F-NEXT:    vsetvli zero, zero, e8, mf2, ta, ma
 ; RV32ZVE32F-NEXT:    vmv.x.s t0, v0
 ; RV32ZVE32F-NEXT:    andi a1, t0, 1
 ; RV32ZVE32F-NEXT:    beqz a1, .LBB57_7
 ; RV32ZVE32F-NEXT:  # %bb.1: # %cond.load
-; RV32ZVE32F-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
+; RV32ZVE32F-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
 ; RV32ZVE32F-NEXT:    vmv.x.s a2, v8
 ; RV32ZVE32F-NEXT:    lw a1, 4(a2)
 ; RV32ZVE32F-NEXT:    lw a2, 0(a2)
@@ -7216,9 +7213,8 @@ define <4 x half> @mgather_truemask_v4f16(<4 x ptr> %ptrs, <4 x half> %passthru)
 ; RV64ZVE32F-NEXT:  .LBB61_6: # %cond.load1
 ; RV64ZVE32F-NEXT:    ld a2, 8(a0)
 ; RV64ZVE32F-NEXT:    flh fa5, 0(a2)
-; RV64ZVE32...
[truncated]

preames

This seems like a generally good idea, some style clean up needed before landing.

preames · 2023-11-16T15:42:56Z

llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp

 void RISCVInsertVSETVLI::transferBefore(VSETVLIInfo &Info,
                                        const MachineInstr &MI) const {
  uint64_t TSFlags = MI.getDesc().TSFlags;
  if (!RISCVII::hasSEWOp(TSFlags))
    return;

-  const VSETVLIInfo NewInfo = computeInfoForInstr(MI, TSFlags, MRI);
+  VSETVLIInfo NewInfo = computeInfoForInstr(MI, TSFlags, MRI);


Please leave NewInfo and PrevInfo unchanged. Use Info as your mutable state variable if desired. Mixing the meaning leads to subtle confusions below.

I gave this an attempt, but couldn't figure out a good way of doing so. Mainly because the SEW/LMUL ratio "touch up" has to happen on NewInfo before we actually mutate Info, since CanUseX0X0Form is dependent on said ratio.

We were still essentially mutating NewInfo before this patch, it was just a bit obfuscated because we swapped it into Info first. Any ideas here?

llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp

The property we're explicitly looking for is whether or not MI only cares about VL zeroness and not VL itself, so we can just use DemandedFields for this. This should simplify an upcoming change in #72352

It's always guaranteed to be valid since we compute it ourselves from MI. This should simplify an upcoming change in #72352

The property we're explicitly looking for is whether or not MI only cares about VL zeroness and not VL itself, so we can just use DemandedFields for this. This should simplify an upcoming change in llvm#72352

It's always guaranteed to be valid since we compute it ourselves from MI. This should simplify an upcoming change in llvm#72352

The property we're explicitly looking for is whether or not MI only cares about VL zeroness and not VL itself, so we can just use DemandedFields for this. This should simplify an upcoming change in llvm#72352

It's always guaranteed to be valid since we compute it ourselves from MI. This should simplify an upcoming change in llvm#72352

There is an optimisation in transferBefore where if a VSETVLIInfo uses the AVL of a defining vsetvli, it uses that vsetvli's AVL provided VLMAX is the same. This patch moves it out of transferBefore and up into computeInfoForInstr to show how it isn't affected by the other optimisations in transferBefore, and to simplify the control flow by removing an early return. This should make llvm#72352 easier to reason about.

lukel97 · 2023-11-30T10:30:56Z

I have rebased this upon #73909 and added back an early exit that was responsible for a lot of test diffs. Ideally we should be able to view the diff of the four commits past #73909, but GitHub seems to error when I try this:

Here's a link to the diff instead: lukel97/llvm-project@move-avl-reg-coalescing...lukel97:llvm-project:reverse-transferBefore

lukel97 · 2023-11-30T16:37:46Z

llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp

-// Given an incoming state reaching MI, modifies that state so that it is minimally
-// compatible with MI.  The resulting state is guaranteed to be semantically legal
-// for MI, but may not be the state requested by MI.
+static VSETVLIInfo adjustIncoming(VSETVLIInfo PrevInfo, VSETVLIInfo NewInfo,


This forward definition is solely to improve the diff

preames

LGTM

…FC (#73909) There is an optimisation in transferBefore where if a VSETVLIInfo uses the AVL of a defining vsetvli, it uses that vsetvli's AVL provided VLMAX is the same. This patch moves it out of transferBefore and up into computeInfoForInstr to show how it isn't affected by the other optimisations in transferBefore, and to simplify the control flow by removing an early return. This should make #72352 easier to reason about.

transferBefore currently takes an incoming state and an instruction, computes the new state needed for the instruction, and then modifies that new state to be more similar to the incoming state. This patch reverses the approach by instead taking the incoming state and modifying only the bits that are demanded by the instruction. I find this makes things slightly easier to reason about. It also us to use the x0, x0 form in more places, in particular with vmv.x.s since we're no longer relying on isScalarInsertInstr, but instead reusing the logic in getDemanded. I haven't had a chance to check, but hopefully this might also address some of the regressions seen in llvm#71501.

lukel97 requested review from preames, topperc and wangpc-pp November 15, 2023 06:38

llvmbot added the backend:RISC-V label Nov 15, 2023

lukel97 mentioned this pull request Nov 15, 2023

[RISCV] Remove vmv.s.x and vmv.x.s lmul pseudo variants #71501

Merged

preames reviewed Nov 16, 2023

View reviewed changes

lukel97 added a commit that referenced this pull request Nov 20, 2023

[RISCV] Remove checks that MI's info is valid. NFC

933dd03

It's always guaranteed to be valid since we compute it ourselves from MI. This should simplify an upcoming change in #72352

lukel97 force-pushed the reverse-transferBefore branch from 534f7c3 to 9d9fcf6 Compare November 20, 2023 06:00

sr-tream pushed a commit to sr-tream/llvm-project that referenced this pull request Nov 20, 2023

[RISCV] Remove checks that MI's info is valid. NFC

0e964f6

It's always guaranteed to be valid since we compute it ourselves from MI. This should simplify an upcoming change in llvm#72352

zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Nov 20, 2023

[RISCV] Remove checks that MI's info is valid. NFC

4679a9f

It's always guaranteed to be valid since we compute it ourselves from MI. This should simplify an upcoming change in llvm#72352

lukel97 mentioned this pull request Nov 30, 2023

[RISCV] Move AVL coalescing logic upwards into computeInfoForInstr. NFC #73909

Merged

lukel97 force-pushed the reverse-transferBefore branch from 9d9fcf6 to 53a16ba Compare November 30, 2023 10:24

lukel97 force-pushed the reverse-transferBefore branch from 53a16ba to 731800f Compare November 30, 2023 16:34

lukel97 commented Nov 30, 2023

View reviewed changes

preames approved these changes Nov 30, 2023

View reviewed changes

lukel97 added 4 commits December 1, 2023 13:12

Update test outside of rvv directory

47b7583

Try to reduce code diff

98e51c8

Rework to keep old logic

365ebae

lukel97 force-pushed the reverse-transferBefore branch from 731800f to 365ebae Compare December 1, 2023 05:13

lukel97 merged commit cf1a979 into llvm:main Dec 1, 2023
2 of 3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RISCV] Minimally modify incoming state in transferBefore #72352

[RISCV] Minimally modify incoming state in transferBefore #72352

lukel97 commented Nov 15, 2023 •

edited

Loading

llvmbot commented Nov 15, 2023

preames left a comment

preames Nov 16, 2023

lukel97 Nov 20, 2023

lukel97 commented Nov 30, 2023

lukel97 Nov 30, 2023

preames left a comment

[RISCV] Minimally modify incoming state in transferBefore #72352

[RISCV] Minimally modify incoming state in transferBefore #72352

Conversation

lukel97 commented Nov 15, 2023 • edited Loading

llvmbot commented Nov 15, 2023

preames left a comment

Choose a reason for hiding this comment

preames Nov 16, 2023

Choose a reason for hiding this comment

lukel97 Nov 20, 2023

Choose a reason for hiding this comment

lukel97 commented Nov 30, 2023

lukel97 Nov 30, 2023

Choose a reason for hiding this comment

preames left a comment

Choose a reason for hiding this comment

lukel97 commented Nov 15, 2023 •

edited

Loading