[RISCV][VSETVLI] Prefer VTYPE for immediate known to be less than VLMAX #69759

Closed

Conversation

preames (Collaborator) commented Oct 20, 2023

If we have a vsetvli which toggles from a state with a constant AVL to a state with the same constant AVL, then the vsetvli can use the x0, x0 VL-preserving form, provided that the shared AVL is less than or equal to the smaller VLMAX of the two states (i.e. AVL=VL for both states).

VTYPE-only toggles are generally cheaper, and these patterns show up a lot with mixed-width arithmetic and with large types which have been legalized via splitting.
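To make the AVL/VLMAX arithmetic concrete, here is a minimal standalone C++ sketch of the condition being described. The struct and helper names are invented for illustration only and are not the helper added by the patch; it simply mirrors the AVL * SEW <= VLEN_min * LMUL comparison.

#include <cstdio>

// Illustrative-only sketch of "AVL <= VLMAX implies VL == AVL".
struct VtypeState {
  unsigned AVLImm;   // constant AVL from a vsetivli
  unsigned SEW;      // element width in bits
  unsigned LMulNum;  // LMUL numerator (1, 2, 4, or 8)
  unsigned LMulDen;  // LMUL denominator (1, or 2/4/8 for fractional LMUL)
};

// VL is guaranteed to equal AVL whenever AVL <= VLMAX for the smallest
// VLEN the subtarget may have, i.e. AVL * SEW <= VLENmin * LMUL.
static bool vlEqualsAVL(const VtypeState &S, unsigned VLenMin) {
  unsigned AVLInBits = S.AVLImm * S.SEW;
  return VLenMin * S.LMulNum / S.LMulDen >= AVLInBits;
}

int main() {
  // "vsetivli zero, 2, e64, m1" on a machine with VLEN >= 128:
  // VLMAX = 128 * 1 / 64 = 2, so AVL = 2 implies VL = 2, and a toggle to
  // another state with the same constant AVL does not change VL.
  VtypeState S{/*AVLImm=*/2, /*SEW=*/64, /*LMulNum=*/1, /*LMulDen=*/1};
  std::printf("VL == AVL guaranteed: %d\n", vlEqualsAVL(S, /*VLenMin=*/128));
  return 0;
}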

Meta comments for the review:

  • I noticed this opportunity in the delta from [RISCV] Adjust LMUL if not used to avoid VL toggle #69259. I honestly feel quite silly for never noticing it before, as it turned out to be both fairly trivial to implement and quite widespread.
  • There are two refactorings in the current patch: one is just code motion and a rename, the other is adding the subtarget variable to the pass. Happy to separate these out and land them if reviewers find that helpful.

llvmbot (Collaborator) commented Oct 20, 2023

@llvm/pr-subscribers-backend-risc-v

Author: Philip Reames (preames)

Changes

Patch is 148.13 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/69759.diff

36 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp (+37-30)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extload-truncstore.ll (+100-100)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-subvector.ll (+21-21)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-conv.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-interleave.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp2i.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-i2fp.ll (+14-14)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert-subvector-shuffle.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert-subvector.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-explodevector.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-exttrunc.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-interleave.ll (+7-7)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleave-store.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-scatter.ll (+122-122)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-fp.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-int.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-combine.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-unaligned.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfwadd.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfwmul.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfwsub.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vpgather.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vpscatter.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwadd.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwaddu.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwmul.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwmulsu.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwmulu.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwsub.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwsubu.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vector-interleave-fixed.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vsetvli-valid-elen-fp.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/srem-seteq-illegal-types.ll (+3-3)
diff --git a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
index 4c99da1244bf50c..bf7dd2359d9b458 100644
--- a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
@@ -720,6 +720,7 @@ struct BlockData {
 };
 
 class RISCVInsertVSETVLI : public MachineFunctionPass {
+  const RISCVSubtarget *ST;
   const TargetInstrInfo *TII;
   MachineRegisterInfo *MRI;
 
@@ -860,6 +861,28 @@ static VSETVLIInfo getInfoForVSETVLI(const MachineInstr &MI) {
   return NewInfo;
 }
 
+/// Return true if the VL value configured must be equal to the requested one.
+static bool willVLBeAVL(const VSETVLIInfo &Info, const RISCVSubtarget &ST) {
+  if (!Info.hasAVLImm())
+    // VLMAX is always the same value.
+    // TODO: Could extend to other registers by looking at the associated vreg
+    // def placement.
+    return RISCV::X0 == Info.getAVLReg();
+
+  unsigned AVL = Info.getAVLImm();
+  unsigned SEW = Info.getSEW();
+  unsigned AVLInBits = AVL * SEW;
+
+  unsigned LMul;
+  bool Fractional;
+  std::tie(LMul, Fractional) = RISCVVType::decodeVLMUL(Info.getVLMUL());
+
+  if (Fractional)
+    return ST.getRealMinVLen() / LMul >= AVLInBits;
+  return ST.getRealMinVLen() * LMul >= AVLInBits;
+}
+
+
 /// Return true if a vsetvli instruction to change from PrevInfo
 /// to Info might change the VL register.  If this returns false,
 /// the vsetvli can use the X0, X0 form.
@@ -885,6 +908,15 @@ bool RISCVInsertVSETVLI::mayChangeVL(const VSETVLIInfo &Info,
         return false;
     }
   }
+
+  // For constant AVL values less than VLMAX, we know that VL=AVL and thus
+  // if the two AVLs are the same, we know the VLs must also be.  As such,
+  // this vsetvli is not changing VL.
+  if (Info.hasAVLImm() && PrevInfo.hasAVLImm() &&
+      Info.getAVLImm() == PrevInfo.getAVLImm() &&
+      willVLBeAVL(Info, *ST) && willVLBeAVL(PrevInfo, *ST))
+    return false;
+
   return true;
 }
 
@@ -966,8 +998,7 @@ bool RISCVInsertVSETVLI::needVSETVLI(const MachineInstr &MI,
     return true;
 
   DemandedFields Used = getDemanded(MI, MRI);
-  bool HasVInstructionsF64 =
-      MI.getMF()->getSubtarget<RISCVSubtarget>().hasVInstructionsF64();
+  bool HasVInstructionsF64 = ST->hasVInstructionsF64();
 
   // A slidedown/slideup with an *undefined* merge op can freely clobber
   // elements not copied from the source vector (e.g. masked off, tail, or
@@ -1307,36 +1338,12 @@ void RISCVInsertVSETVLI::emitVSETVLIs(MachineBasicBlock &MBB) {
   }
 }
 
-/// Return true if the VL value configured must be equal to the requested one.
-static bool hasFixedResult(const VSETVLIInfo &Info, const RISCVSubtarget &ST) {
-  if (!Info.hasAVLImm())
-    // VLMAX is always the same value.
-    // TODO: Could extend to other registers by looking at the associated vreg
-    // def placement.
-    return RISCV::X0 == Info.getAVLReg();
-
-  unsigned AVL = Info.getAVLImm();
-  unsigned SEW = Info.getSEW();
-  unsigned AVLInBits = AVL * SEW;
-
-  unsigned LMul;
-  bool Fractional;
-  std::tie(LMul, Fractional) = RISCVVType::decodeVLMUL(Info.getVLMUL());
-
-  if (Fractional)
-    return ST.getRealMinVLen() / LMul >= AVLInBits;
-  return ST.getRealMinVLen() * LMul >= AVLInBits;
-}
-
 /// Perform simple partial redundancy elimination of the VSETVLI instructions
 /// we're about to insert by looking for cases where we can PRE from the
 /// beginning of one block to the end of one of its predecessors.  Specifically,
 /// this is geared to catch the common case of a fixed length vsetvl in a single
 /// block loop when it could execute once in the preheader instead.
 void RISCVInsertVSETVLI::doPRE(MachineBasicBlock &MBB) {
-  const MachineFunction &MF = *MBB.getParent();
-  const RISCVSubtarget &ST = MF.getSubtarget<RISCVSubtarget>();
-
   if (!BlockInfo[MBB.getNumber()].Pred.isUnknown())
     return;
 
@@ -1365,7 +1372,7 @@ void RISCVInsertVSETVLI::doPRE(MachineBasicBlock &MBB) {
     return;
 
   // If VL can be less than AVL, then we can't reduce the frequency of exec.
-  if (!hasFixedResult(AvailableInfo, ST))
+  if (!willVLBeAVL(AvailableInfo, *ST))
     return;
 
   // Model the effect of changing the input state of the block MBB to
@@ -1534,13 +1541,13 @@ void RISCVInsertVSETVLI::insertReadVL(MachineBasicBlock &MBB) {
 
 bool RISCVInsertVSETVLI::runOnMachineFunction(MachineFunction &MF) {
   // Skip if the vector extension is not enabled.
-  const RISCVSubtarget &ST = MF.getSubtarget<RISCVSubtarget>();
-  if (!ST.hasVInstructions())
+  ST = &MF.getSubtarget<RISCVSubtarget>();
+  if (!ST->hasVInstructions())
     return false;
 
   LLVM_DEBUG(dbgs() << "Entering InsertVSETVLI for " << MF.getName() << "\n");
 
-  TII = ST.getInstrInfo();
+  TII = ST->getInstrInfo();
   MRI = &MF.getRegInfo();
 
   assert(BlockInfo.empty() && "Expect empty block infos");
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extload-truncstore.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extload-truncstore.ll
index 4aaefb24d5aa279..16ef2bdedae7745 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extload-truncstore.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extload-truncstore.ll
@@ -144,7 +144,7 @@ define <4 x i64> @sextload_v4i8_v4i64(ptr %x) {
 ; LMULMAX1-NEXT:    vle8.v v10, (a0)
 ; LMULMAX1-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v8, v10, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf8 v9, v8
 ; LMULMAX1-NEXT:    vsext.vf8 v8, v10
 ; LMULMAX1-NEXT:    ret
@@ -167,7 +167,7 @@ define <4 x i64> @zextload_v4i8_v4i64(ptr %x) {
 ; LMULMAX1-NEXT:    vle8.v v10, (a0)
 ; LMULMAX1-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v8, v10, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf8 v9, v8
 ; LMULMAX1-NEXT:    vzext.vf8 v8, v10
 ; LMULMAX1-NEXT:    ret
@@ -214,7 +214,7 @@ define <8 x i32> @sextload_v8i8_v8i32(ptr %x) {
 ; LMULMAX1-NEXT:    vle8.v v10, (a0)
 ; LMULMAX1-NEXT:    vsetivli zero, 4, e8, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v8, v10, 4
-; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf4 v9, v8
 ; LMULMAX1-NEXT:    vsext.vf4 v8, v10
 ; LMULMAX1-NEXT:    ret
@@ -237,7 +237,7 @@ define <8 x i32> @zextload_v8i8_v8i32(ptr %x) {
 ; LMULMAX1-NEXT:    vle8.v v10, (a0)
 ; LMULMAX1-NEXT:    vsetivli zero, 4, e8, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v8, v10, 4
-; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf4 v9, v8
 ; LMULMAX1-NEXT:    vzext.vf4 v8, v10
 ; LMULMAX1-NEXT:    ret
@@ -264,13 +264,13 @@ define <8 x i64> @sextload_v8i8_v8i64(ptr %x) {
 ; LMULMAX1-NEXT:    vslidedown.vi v11, v9, 4
 ; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf8 v10, v11
-; LMULMAX1-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf4, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v12, v11, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf8 v11, v12
-; LMULMAX1-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf4, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v12, v9, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf8 v9, v12
 ; LMULMAX1-NEXT:    ret
 ;
@@ -296,13 +296,13 @@ define <8 x i64> @zextload_v8i8_v8i64(ptr %x) {
 ; LMULMAX1-NEXT:    vslidedown.vi v11, v9, 4
 ; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf8 v10, v11
-; LMULMAX1-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf4, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v12, v11, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf8 v11, v12
-; LMULMAX1-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf4, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v12, v9, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf8 v9, v12
 ; LMULMAX1-NEXT:    ret
 ;
@@ -324,7 +324,7 @@ define <16 x i16> @sextload_v16i8_v16i16(ptr %x) {
 ; LMULMAX1-NEXT:    vle8.v v10, (a0)
 ; LMULMAX1-NEXT:    vsetivli zero, 8, e8, m1, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v8, v10, 8
-; LMULMAX1-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf2 v9, v8
 ; LMULMAX1-NEXT:    vsext.vf2 v8, v10
 ; LMULMAX1-NEXT:    ret
@@ -347,7 +347,7 @@ define <16 x i16> @zextload_v16i8_v16i16(ptr %x) {
 ; LMULMAX1-NEXT:    vle8.v v10, (a0)
 ; LMULMAX1-NEXT:    vsetivli zero, 8, e8, m1, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v8, v10, 8
-; LMULMAX1-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf2 v9, v8
 ; LMULMAX1-NEXT:    vzext.vf2 v8, v10
 ; LMULMAX1-NEXT:    ret
@@ -374,13 +374,13 @@ define <16 x i32> @sextload_v16i8_v16i32(ptr %x) {
 ; LMULMAX1-NEXT:    vslidedown.vi v11, v9, 8
 ; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf4 v10, v11
-; LMULMAX1-NEXT:    vsetivli zero, 4, e8, mf2, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v12, v11, 4
-; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf4 v11, v12
-; LMULMAX1-NEXT:    vsetivli zero, 4, e8, mf2, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v12, v9, 4
-; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf4 v9, v12
 ; LMULMAX1-NEXT:    ret
 ;
@@ -406,13 +406,13 @@ define <16 x i32> @zextload_v16i8_v16i32(ptr %x) {
 ; LMULMAX1-NEXT:    vslidedown.vi v11, v9, 8
 ; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf4 v10, v11
-; LMULMAX1-NEXT:    vsetivli zero, 4, e8, mf2, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v12, v11, 4
-; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf4 v11, v12
-; LMULMAX1-NEXT:    vsetivli zero, 4, e8, mf2, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v12, v9, 4
-; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf4 v9, v12
 ; LMULMAX1-NEXT:    ret
 ;
@@ -438,29 +438,29 @@ define <16 x i64> @sextload_v16i8_v16i64(ptr %x) {
 ; LMULMAX1-NEXT:    vslidedown.vi v11, v10, 8
 ; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf8 v12, v11
-; LMULMAX1-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf4, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v13, v10, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf8 v9, v13
-; LMULMAX1-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf4, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v14, v11, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf8 v13, v14
 ; LMULMAX1-NEXT:    vsetivli zero, 4, e8, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v11, v11, 4
 ; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf8 v14, v11
-; LMULMAX1-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf4, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v11, v11, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf8 v15, v11
 ; LMULMAX1-NEXT:    vsetivli zero, 4, e8, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v11, v10, 4
 ; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf8 v10, v11
-; LMULMAX1-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf4, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v16, v11, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf8 v11, v16
 ; LMULMAX1-NEXT:    ret
 ;
@@ -470,7 +470,7 @@ define <16 x i64> @sextload_v16i8_v16i64(ptr %x) {
 ; LMULMAX4-NEXT:    vle8.v v16, (a0)
 ; LMULMAX4-NEXT:    vsetivli zero, 8, e8, m1, ta, ma
 ; LMULMAX4-NEXT:    vslidedown.vi v8, v16, 8
-; LMULMAX4-NEXT:    vsetivli zero, 8, e64, m4, ta, ma
+; LMULMAX4-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
 ; LMULMAX4-NEXT:    vsext.vf8 v12, v8
 ; LMULMAX4-NEXT:    vsext.vf8 v8, v16
 ; LMULMAX4-NEXT:    ret
@@ -490,29 +490,29 @@ define <16 x i64> @zextload_v16i8_v16i64(ptr %x) {
 ; LMULMAX1-NEXT:    vslidedown.vi v11, v10, 8
 ; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf8 v12, v11
-; LMULMAX1-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf4, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v13, v10, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf8 v9, v13
-; LMULMAX1-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf4, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v14, v11, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf8 v13, v14
 ; LMULMAX1-NEXT:    vsetivli zero, 4, e8, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v11, v11, 4
 ; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf8 v14, v11
-; LMULMAX1-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf4, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v11, v11, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf8 v15, v11
 ; LMULMAX1-NEXT:    vsetivli zero, 4, e8, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v11, v10, 4
 ; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf8 v10, v11
-; LMULMAX1-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf4, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v16, v11, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf8 v11, v16
 ; LMULMAX1-NEXT:    ret
 ;
@@ -522,7 +522,7 @@ define <16 x i64> @zextload_v16i8_v16i64(ptr %x) {
 ; LMULMAX4-NEXT:    vle8.v v16, (a0)
 ; LMULMAX4-NEXT:    vsetivli zero, 8, e8, m1, ta, ma
 ; LMULMAX4-NEXT:    vslidedown.vi v8, v16, 8
-; LMULMAX4-NEXT:    vsetivli zero, 8, e64, m4, ta, ma
+; LMULMAX4-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
 ; LMULMAX4-NEXT:    vzext.vf8 v12, v8
 ; LMULMAX4-NEXT:    vzext.vf8 v8, v16
 ; LMULMAX4-NEXT:    ret
@@ -655,7 +655,7 @@ define <4 x i64> @sextload_v4i16_v4i64(ptr %x) {
 ; LMULMAX1-NEXT:    vle16.v v10, (a0)
 ; LMULMAX1-NEXT:    vsetivli zero, 2, e16, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v8, v10, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf4 v9, v8
 ; LMULMAX1-NEXT:    vsext.vf4 v8, v10
 ; LMULMAX1-NEXT:    ret
@@ -678,7 +678,7 @@ define <4 x i64> @zextload_v4i16_v4i64(ptr %x) {
 ; LMULMAX1-NEXT:    vle16.v v10, (a0)
 ; LMULMAX1-NEXT:    vsetivli zero, 2, e16, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v8, v10, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf4 v9, v8
 ; LMULMAX1-NEXT:    vzext.vf4 v8, v10
 ; LMULMAX1-NEXT:    ret
@@ -713,7 +713,7 @@ define <8 x i32> @sextload_v8i16_v8i32(ptr %x) {
 ; LMULMAX1-NEXT:    vle16.v v10, (a0)
 ; LMULMAX1-NEXT:    vsetivli zero, 4, e16, m1, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v8, v10, 4
-; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf2 v9, v8
 ; LMULMAX1-NEXT:    vsext.vf2 v8, v10
 ; LMULMAX1-NEXT:    ret
@@ -736,7 +736,7 @@ define <8 x i32> @zextload_v8i16_v8i32(ptr %x) {
 ; LMULMAX1-NEXT:    vle16.v v10, (a0)
 ; LMULMAX1-NEXT:    vsetivli zero, 4, e16, m1, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v8, v10, 4
-; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf2 v9, v8
 ; LMULMAX1-NEXT:    vzext.vf2 v8, v10
 ; LMULMAX1-NEXT:    ret
@@ -763,13 +763,13 @@ define <8 x i64> @sextload_v8i16_v8i64(ptr %x) {
 ; LMULMAX1-NEXT:    vslidedown.vi v11, v9, 4
 ; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf4 v10, v11
-; LMULMAX1-NEXT:    vsetivli zero, 2, e16, mf2, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v12, v11, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf4 v11, v12
-; LMULMAX1-NEXT:    vsetivli zero, 2, e16, mf2, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v12, v9, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf4 v9, v12
 ; LMULMAX1-NEXT:    ret
 ;
@@ -795,13 +795,13 @@ define <8 x i64> @zextload_v8i16_v8i64(ptr %x) {
 ; LMULMAX1-NEXT:    vslidedown.vi v11, v9, 4
 ; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf4 v10, v11
-; LMULMAX1-NEXT:    vsetivli zero, 2, e16, mf2, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v12, v11, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf4 v11, v12
-; LMULMAX1-NEXT:    vsetivli zero, 2, e16, mf2, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v12, v9, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf4 v9, v12
 ; LMULMAX1-NEXT:    ret
 ;
@@ -847,12 +847,12 @@ define <16 x i32> @sextload_v16i16_v16i32(ptr %x) {
 ; LMULMAX1-NEXT:    vle16.v v12, (a0)
 ; LMULMAX1-NEXT:    vsetivli zero, 4, e16, m1, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v8, v10, 4
-; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf2 v9, v8
 ; LMULMAX1-NEXT:    vsext.vf2 v8, v10
-; LMULMAX1-NEXT:    vsetivli zero, 4, e16, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v10, v12, 4
-; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf2 v11, v10
 ; LMULMAX1-NEXT:    vsext.vf2 v10, v12
 ; LMULMAX1-NEXT:    ret
@@ -877,12 +877,12 @@ define <16 x i32> @zextload_v16i16_v16i32(ptr %x) {
 ; LMULMAX1-NEXT:    vle16.v v12, (a0)
 ; LMULMAX1-NEXT:    vsetivli zero, 4, e16, m1, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v8, v10, 4
-; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf2 v9, v8
 ; LMULMAX1-NEXT:    vzext.vf2 v8, v10
-; LMULMAX1-NEXT:    vsetivli zero, 4, e16, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
 ; LM...
[truncated]

github-actions (bot) commented:

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 9399094586aa803fb399ae508e5aa46c8728fdd6 622682a707bc53b5ee411b8d6778b77299832749 -- llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
The diff from clang-format:
diff --git a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
index bf7dd2359d9b..4b3edd237135 100644
--- a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
@@ -882,7 +882,6 @@ static bool willVLBeAVL(const VSETVLIInfo &Info, const RISCVSubtarget &ST) {
   return ST.getRealMinVLen() * LMul >= AVLInBits;
 }
 
-
 /// Return true if a vsetvli instruction to change from PrevInfo
 /// to Info might change the VL register.  If this returns false,
 /// the vsetvli can use the X0, X0 form.
@@ -913,8 +912,8 @@ bool RISCVInsertVSETVLI::mayChangeVL(const VSETVLIInfo &Info,
   // if the two AVLs are the same, we know the VLs must also be.  As such,
   // this vsetvli is not changing VL.
   if (Info.hasAVLImm() && PrevInfo.hasAVLImm() &&
-      Info.getAVLImm() == PrevInfo.getAVLImm() &&
-      willVLBeAVL(Info, *ST) && willVLBeAVL(PrevInfo, *ST))
+      Info.getAVLImm() == PrevInfo.getAVLImm() && willVLBeAVL(Info, *ST) &&
+      willVLBeAVL(PrevInfo, *ST))
     return false;
 
   return true;

@@ -406,13 +406,13 @@ define <16 x i32> @zextload_v16i8_v16i32(ptr %x) {
 ; LMULMAX1-NEXT:    vslidedown.vi v11, v9, 8
 ; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf4 v10, v11
-; LMULMAX1-NEXT:    vsetivli zero, 4, e8, mf2, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf2, ta, ma
lukel97 (Contributor) commented:

Even though we know that AVL <= VLMAX and so VL won't be changed here, is this still valid per the spec?

When rs1=x0 and rd=x0, the instruction operates as if the current vector length in vl is used as the AVL, and the resulting value is written to vl, but not to a destination register. This form can only be used when VLMAX and hence vl is not actually changed by the new SEW/LMUL ratio. Use of the instruction with a new SEW/LMUL ratio that would result in a change of VLMAX is reserved. Implementations may set vill in this case.

Since we're still changing the SEW/LMUL ratio and thus VLMAX, wouldn't this fall under the "Use of the instruction with a new SEW/LMUL ratio that would result in a change of VLMAX is reserved." clause?

preames (Collaborator, Author) replied:

You are unfortunately correct here. This is an absolutely silly choice for the specification to make. It doesn't disallow an implementation from having the sane semantics here, but you're right that we can't rely on this instruction being valid when VLMAX changes (even if VL doesn't).
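To spell out why the quoted clause bites here, a small hedged sketch (an invented helper, assuming VLEN = 128, not part of the patch): the toggles this patch creates keep VL the same but still change VLMAX, which is exactly the reserved case.

#include <cstdio>

// Illustration only: VLMAX for a given VLEN and vtype.
static unsigned vlmax(unsigned VLen, unsigned SEW, unsigned LMulNum,
                      unsigned LMulDen) {
  return VLen * LMulNum / (SEW * LMulDen);
}

int main() {
  const unsigned VLen = 128;
  // A toggle like the ones in the test diffs: {e8, mf4} -> {e64, m1}.
  unsigned Before = vlmax(VLen, /*SEW=*/8, /*LMulNum=*/1, /*LMulDen=*/4);   // 4
  unsigned After = vlmax(VLen, /*SEW=*/64, /*LMulNum=*/1, /*LMulDen=*/1);   // 2
  // VLMAX changes from 4 to 2 even though VL = 2 would remain valid, so the
  // x0, x0 encoding for this transition falls under the reserved clause.
  std::printf("VLMAX before = %u, after = %u\n", Before, After);
  return 0;
}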

topperc (Collaborator) left a comment:

LGTM. Left a couple comments for possible optimizations.

@@ -14,7 +14,7 @@ define <vscale x 8 x i32> @insert_nxv8i32_v2i32_0(<vscale x 8 x i32> %vec, ptr %
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
 ; CHECK-NEXT:    vle32.v v12, (a0)
-; CHECK-NEXT:    vsetivli zero, 2, e32, m4, tu, ma
+; CHECK-NEXT:    vsetvli zero, zero, e32, m4, tu, ma
 ; CHECK-NEXT:    vmv.v.v v8, v12
topperc (Collaborator) commented:

Can we just do a tail undisturbed load to avoid the vmv.v.v?

@@ -23,7 +23,7 @@ define void @extract_v2i8_v4i8_2(ptr %x, ptr %y) {
 ; CHECK-NEXT:    vle8.v v8, (a0)
 ; CHECK-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
 ; CHECK-NEXT:    vslidedown.vi v8, v8, 2
-; CHECK-NEXT:    vsetivli zero, 2, e8, mf8, ta, ma
+; CHECK-NEXT:    vsetvli zero, zero, e8, mf8, ta, ma
topperc (Collaborator) commented:

Seems like we could get away with just doing the store with the old vtype?

preames (Collaborator, Author) replied:

I spent more time on this than was warranted, and ended up deciding it wasn't worth it. I can do stores specifically, but reasoning about when it's legal to increase and decrease LMUL gets into all kinds of corner cases, since we have instructions which aren't regular in VL (i.e. vrgather.vv can read past VL) or in LMUL (i.e. widening operations write a wider destination than LMUL would lead you to believe). I ended up either having to handle a very narrow special case or having to write very subtle predicates for all instructions. In the end, I decided this just isn't worth it for the moment.

preames (Collaborator, Author) commented Oct 23, 2023

As pointed out by @lukel97 , the transform proposed is invalid per the specification wording. Going to take another look at these cases, but will clearly have to come at it from another angle.

preames closed this Oct 23, 2023
preames deleted the pr-riscv-vsetvli-VTYPE-toggle branch October 24, 2023 20:48