[RISCV][VSETVLI] Prefer VTYPE for immediate known to be less than VLMAX #69759

Closed

Conversation

preames (Collaborator) commented Oct 20, 2023

If we have a vsetvli which toggles from a state with a constant AVL to a state with the same constant AVL, then the vsetvli can use the x0, x0 VL-preserving form, provided that the shared AVL is less than or equal to the smaller VLMAX of the two states (i.e. AVL=VL for both states).

VTYPE-only toggles are generally cheaper, and these patterns show up a lot with mixed-width arithmetic and with large types which have been legalized via splitting.
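To make the AVL/VLMAX arithmetic concrete, here is a minimal standalone C++ sketch of the condition being described. The struct and helper names are invented for illustration only and are not the helper added by the patch; it simply mirrors the AVL * SEW <= VLEN_min * LMUL comparison.

#include <cstdio>

// Illustrative-only sketch of "AVL <= VLMAX implies VL == AVL".
struct VtypeState {
  unsigned AVLImm;   // constant AVL from a vsetivli
  unsigned SEW;      // element width in bits
  unsigned LMulNum;  // LMUL numerator (1, 2, 4, or 8)
  unsigned LMulDen;  // LMUL denominator (1, or 2/4/8 for fractional LMUL)
};

// VL is guaranteed to equal AVL whenever AVL <= VLMAX for the smallest
// VLEN the subtarget may have, i.e. AVL * SEW <= VLENmin * LMUL.
static bool vlEqualsAVL(const VtypeState &S, unsigned VLenMin) {
  unsigned AVLInBits = S.AVLImm * S.SEW;
  return VLenMin * S.LMulNum / S.LMulDen >= AVLInBits;
}

int main() {
  // "vsetivli zero, 2, e64, m1" on a machine with VLEN >= 128:
  // VLMAX = 128 * 1 / 64 = 2, so AVL = 2 implies VL = 2, and a toggle to
  // another state with the same constant AVL does not change VL.
  VtypeState S{/*AVLImm=*/2, /*SEW=*/64, /*LMulNum=*/1, /*LMulDen=*/1};
  std::printf("VL == AVL guaranteed: %d\n", vlEqualsAVL(S, /*VLenMin=*/128));
  return 0;
}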

Meta comments for the review:

  • I noticed this opportunity in the delta from [RISCV] Adjust LMUL if not used to avoid VL toggle #69259. I honestly feel quite silly for never noticing it before, as it turned out to be both fairly trivial to implement and quite widespread.
  • There are two refactorings in the current patch: one is just code motion and a rename, the other is adding the subtarget variable to the pass. Happy to separate these out and land them if reviewers find that helpful.

llvmbot (Collaborator) commented Oct 20, 2023

@llvm/pr-subscribers-backend-risc-v

Author: Philip Reames (preames)

Changes

Patch is 148.13 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/69759.diff

36 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp (+37-30)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extload-truncstore.ll (+100-100)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-subvector.ll (+21-21)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-conv.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-interleave.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp2i.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-i2fp.ll (+14-14)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert-subvector-shuffle.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert-subvector.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-explodevector.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-exttrunc.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-interleave.ll (+7-7)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleave-store.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-scatter.ll (+122-122)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-fp.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-int.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-combine.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-unaligned.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfwadd.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfwmul.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfwsub.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vpgather.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vpscatter.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwadd.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwaddu.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwmul.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwmulsu.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwmulu.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwsub.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwsubu.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vector-interleave-fixed.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vsetvli-valid-elen-fp.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/srem-seteq-illegal-types.ll (+3-3)
diff --git a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
index 4c99da1244bf50c..bf7dd2359d9b458 100644
--- a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
@@ -720,6 +720,7 @@ struct BlockData {
 };
 
 class RISCVInsertVSETVLI : public MachineFunctionPass {
+  const RISCVSubtarget *ST;
   const TargetInstrInfo *TII;
   MachineRegisterInfo *MRI;
 
@@ -860,6 +861,28 @@ static VSETVLIInfo getInfoForVSETVLI(const MachineInstr &MI) {
   return NewInfo;
 }
 
+/// Return true if the VL value configured must be equal to the requested one.
+static bool willVLBeAVL(const VSETVLIInfo &Info, const RISCVSubtarget &ST) {
+  if (!Info.hasAVLImm())
+    // VLMAX is always the same value.
+    // TODO: Could extend to other registers by looking at the associated vreg
+    // def placement.
+    return RISCV::X0 == Info.getAVLReg();
+
+  unsigned AVL = Info.getAVLImm();
+  unsigned SEW = Info.getSEW();
+  unsigned AVLInBits = AVL * SEW;
+
+  unsigned LMul;
+  bool Fractional;
+  std::tie(LMul, Fractional) = RISCVVType::decodeVLMUL(Info.getVLMUL());
+
+  if (Fractional)
+    return ST.getRealMinVLen() / LMul >= AVLInBits;
+  return ST.getRealMinVLen() * LMul >= AVLInBits;
+}
+
+
 /// Return true if a vsetvli instruction to change from PrevInfo
 /// to Info might change the VL register.  If this returns false,
 /// the vsetvli can use the X0, X0 form.
@@ -885,6 +908,15 @@ bool RISCVInsertVSETVLI::mayChangeVL(const VSETVLIInfo &Info,
         return false;
     }
   }
+
+  // For constant AVL values less than VLMAX, we know that VL=AVL and thus
+  // if the two AVLs are the same, we know the VLs must also be.  As such,
+  // this vsetvli is not changing VL.
+  if (Info.hasAVLImm() && PrevInfo.hasAVLImm() &&
+      Info.getAVLImm() == PrevInfo.getAVLImm() &&
+      willVLBeAVL(Info, *ST) && willVLBeAVL(PrevInfo, *ST))
+    return false;
+
   return true;
 }
 
@@ -966,8 +998,7 @@ bool RISCVInsertVSETVLI::needVSETVLI(const MachineInstr &MI,
     return true;
 
   DemandedFields Used = getDemanded(MI, MRI);
-  bool HasVInstructionsF64 =
-      MI.getMF()->getSubtarget<RISCVSubtarget>().hasVInstructionsF64();
+  bool HasVInstructionsF64 = ST->hasVInstructionsF64();
 
   // A slidedown/slideup with an *undefined* merge op can freely clobber
   // elements not copied from the source vector (e.g. masked off, tail, or
@@ -1307,36 +1338,12 @@ void RISCVInsertVSETVLI::emitVSETVLIs(MachineBasicBlock &MBB) {
   }
 }
 
-/// Return true if the VL value configured must be equal to the requested one.
-static bool hasFixedResult(const VSETVLIInfo &Info, const RISCVSubtarget &ST) {
-  if (!Info.hasAVLImm())
-    // VLMAX is always the same value.
-    // TODO: Could extend to other registers by looking at the associated vreg
-    // def placement.
-    return RISCV::X0 == Info.getAVLReg();
-
-  unsigned AVL = Info.getAVLImm();
-  unsigned SEW = Info.getSEW();
-  unsigned AVLInBits = AVL * SEW;
-
-  unsigned LMul;
-  bool Fractional;
-  std::tie(LMul, Fractional) = RISCVVType::decodeVLMUL(Info.getVLMUL());
-
-  if (Fractional)
-    return ST.getRealMinVLen() / LMul >= AVLInBits;
-  return ST.getRealMinVLen() * LMul >= AVLInBits;
-}
-
 /// Perform simple partial redundancy elimination of the VSETVLI instructions
 /// we're about to insert by looking for cases where we can PRE from the
 /// beginning of one block to the end of one of its predecessors.  Specifically,
 /// this is geared to catch the common case of a fixed length vsetvl in a single
 /// block loop when it could execute once in the preheader instead.
 void RISCVInsertVSETVLI::doPRE(MachineBasicBlock &MBB) {
-  const MachineFunction &MF = *MBB.getParent();
-  const RISCVSubtarget &ST = MF.getSubtarget<RISCVSubtarget>();
-
   if (!BlockInfo[MBB.getNumber()].Pred.isUnknown())
     return;
 
@@ -1365,7 +1372,7 @@ void RISCVInsertVSETVLI::doPRE(MachineBasicBlock &MBB) {
     return;
 
   // If VL can be less than AVL, then we can't reduce the frequency of exec.
-  if (!hasFixedResult(AvailableInfo, ST))
+  if (!willVLBeAVL(AvailableInfo, *ST))
     return;
 
   // Model the effect of changing the input state of the block MBB to
@@ -1534,13 +1541,13 @@ void RISCVInsertVSETVLI::insertReadVL(MachineBasicBlock &MBB) {
 
 bool RISCVInsertVSETVLI::runOnMachineFunction(MachineFunction &MF) {
   // Skip if the vector extension is not enabled.
-  const RISCVSubtarget &ST = MF.getSubtarget<RISCVSubtarget>();
-  if (!ST.hasVInstructions())
+  ST = &MF.getSubtarget<RISCVSubtarget>();
+  if (!ST->hasVInstructions())
     return false;
 
   LLVM_DEBUG(dbgs() << "Entering InsertVSETVLI for " << MF.getName() << "\n");
 
-  TII = ST.getInstrInfo();
+  TII = ST->getInstrInfo();
   MRI = &MF.getRegInfo();
 
   assert(BlockInfo.empty() && "Expect empty block infos");
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extload-truncstore.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extload-truncstore.ll
index 4aaefb24d5aa279..16ef2bdedae7745 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extload-truncstore.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extload-truncstore.ll
@@ -144,7 +144,7 @@ define <4 x i64> @sextload_v4i8_v4i64(ptr %x) {
 ; LMULMAX1-NEXT:    vle8.v v10, (a0)
 ; LMULMAX1-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v8, v10, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf8 v9, v8
 ; LMULMAX1-NEXT:    vsext.vf8 v8, v10
 ; LMULMAX1-NEXT:    ret
@@ -167,7 +167,7 @@ define <4 x i64> @zextload_v4i8_v4i64(ptr %x) {
 ; LMULMAX1-NEXT:    vle8.v v10, (a0)
 ; LMULMAX1-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v8, v10, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf8 v9, v8
 ; LMULMAX1-NEXT:    vzext.vf8 v8, v10
 ; LMULMAX1-NEXT:    ret
@@ -214,7 +214,7 @@ define <8 x i32> @sextload_v8i8_v8i32(ptr %x) {
 ; LMULMAX1-NEXT:    vle8.v v10, (a0)
 ; LMULMAX1-NEXT:    vsetivli zero, 4, e8, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v8, v10, 4
-; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf4 v9, v8
 ; LMULMAX1-NEXT:    vsext.vf4 v8, v10
 ; LMULMAX1-NEXT:    ret
@@ -237,7 +237,7 @@ define <8 x i32> @zextload_v8i8_v8i32(ptr %x) {
 ; LMULMAX1-NEXT:    vle8.v v10, (a0)
 ; LMULMAX1-NEXT:    vsetivli zero, 4, e8, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v8, v10, 4
-; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf4 v9, v8
 ; LMULMAX1-NEXT:    vzext.vf4 v8, v10
 ; LMULMAX1-NEXT:    ret
@@ -264,13 +264,13 @@ define <8 x i64> @sextload_v8i8_v8i64(ptr %x) {
 ; LMULMAX1-NEXT:    vslidedown.vi v11, v9, 4
 ; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf8 v10, v11
-; LMULMAX1-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf4, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v12, v11, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf8 v11, v12
-; LMULMAX1-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf4, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v12, v9, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf8 v9, v12
 ; LMULMAX1-NEXT:    ret
 ;
@@ -296,13 +296,13 @@ define <8 x i64> @zextload_v8i8_v8i64(ptr %x) {
 ; LMULMAX1-NEXT:    vslidedown.vi v11, v9, 4
 ; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf8 v10, v11
-; LMULMAX1-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf4, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v12, v11, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf8 v11, v12
-; LMULMAX1-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf4, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v12, v9, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf8 v9, v12
 ; LMULMAX1-NEXT:    ret
 ;
@@ -324,7 +324,7 @@ define <16 x i16> @sextload_v16i8_v16i16(ptr %x) {
 ; LMULMAX1-NEXT:    vle8.v v10, (a0)
 ; LMULMAX1-NEXT:    vsetivli zero, 8, e8, m1, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v8, v10, 8
-; LMULMAX1-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf2 v9, v8
 ; LMULMAX1-NEXT:    vsext.vf2 v8, v10
 ; LMULMAX1-NEXT:    ret
@@ -347,7 +347,7 @@ define <16 x i16> @zextload_v16i8_v16i16(ptr %x) {
 ; LMULMAX1-NEXT:    vle8.v v10, (a0)
 ; LMULMAX1-NEXT:    vsetivli zero, 8, e8, m1, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v8, v10, 8
-; LMULMAX1-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf2 v9, v8
 ; LMULMAX1-NEXT:    vzext.vf2 v8, v10
 ; LMULMAX1-NEXT:    ret
@@ -374,13 +374,13 @@ define <16 x i32> @sextload_v16i8_v16i32(ptr %x) {
 ; LMULMAX1-NEXT:    vslidedown.vi v11, v9, 8
 ; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf4 v10, v11
-; LMULMAX1-NEXT:    vsetivli zero, 4, e8, mf2, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v12, v11, 4
-; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf4 v11, v12
-; LMULMAX1-NEXT:    vsetivli zero, 4, e8, mf2, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v12, v9, 4
-; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf4 v9, v12
 ; LMULMAX1-NEXT:    ret
 ;
@@ -406,13 +406,13 @@ define <16 x i32> @zextload_v16i8_v16i32(ptr %x) {
 ; LMULMAX1-NEXT:    vslidedown.vi v11, v9, 8
 ; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf4 v10, v11
-; LMULMAX1-NEXT:    vsetivli zero, 4, e8, mf2, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v12, v11, 4
-; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf4 v11, v12
-; LMULMAX1-NEXT:    vsetivli zero, 4, e8, mf2, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v12, v9, 4
-; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf4 v9, v12
 ; LMULMAX1-NEXT:    ret
 ;
@@ -438,29 +438,29 @@ define <16 x i64> @sextload_v16i8_v16i64(ptr %x) {
 ; LMULMAX1-NEXT:    vslidedown.vi v11, v10, 8
 ; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf8 v12, v11
-; LMULMAX1-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf4, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v13, v10, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf8 v9, v13
-; LMULMAX1-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf4, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v14, v11, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf8 v13, v14
 ; LMULMAX1-NEXT:    vsetivli zero, 4, e8, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v11, v11, 4
 ; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf8 v14, v11
-; LMULMAX1-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf4, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v11, v11, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf8 v15, v11
 ; LMULMAX1-NEXT:    vsetivli zero, 4, e8, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v11, v10, 4
 ; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf8 v10, v11
-; LMULMAX1-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf4, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v16, v11, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf8 v11, v16
 ; LMULMAX1-NEXT:    ret
 ;
@@ -470,7 +470,7 @@ define <16 x i64> @sextload_v16i8_v16i64(ptr %x) {
 ; LMULMAX4-NEXT:    vle8.v v16, (a0)
 ; LMULMAX4-NEXT:    vsetivli zero, 8, e8, m1, ta, ma
 ; LMULMAX4-NEXT:    vslidedown.vi v8, v16, 8
-; LMULMAX4-NEXT:    vsetivli zero, 8, e64, m4, ta, ma
+; LMULMAX4-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
 ; LMULMAX4-NEXT:    vsext.vf8 v12, v8
 ; LMULMAX4-NEXT:    vsext.vf8 v8, v16
 ; LMULMAX4-NEXT:    ret
@@ -490,29 +490,29 @@ define <16 x i64> @zextload_v16i8_v16i64(ptr %x) {
 ; LMULMAX1-NEXT:    vslidedown.vi v11, v10, 8
 ; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf8 v12, v11
-; LMULMAX1-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf4, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v13, v10, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf8 v9, v13
-; LMULMAX1-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf4, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v14, v11, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf8 v13, v14
 ; LMULMAX1-NEXT:    vsetivli zero, 4, e8, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v11, v11, 4
 ; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf8 v14, v11
-; LMULMAX1-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf4, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v11, v11, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf8 v15, v11
 ; LMULMAX1-NEXT:    vsetivli zero, 4, e8, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v11, v10, 4
 ; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf8 v10, v11
-; LMULMAX1-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf4, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v16, v11, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf8 v11, v16
 ; LMULMAX1-NEXT:    ret
 ;
@@ -522,7 +522,7 @@ define <16 x i64> @zextload_v16i8_v16i64(ptr %x) {
 ; LMULMAX4-NEXT:    vle8.v v16, (a0)
 ; LMULMAX4-NEXT:    vsetivli zero, 8, e8, m1, ta, ma
 ; LMULMAX4-NEXT:    vslidedown.vi v8, v16, 8
-; LMULMAX4-NEXT:    vsetivli zero, 8, e64, m4, ta, ma
+; LMULMAX4-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
 ; LMULMAX4-NEXT:    vzext.vf8 v12, v8
 ; LMULMAX4-NEXT:    vzext.vf8 v8, v16
 ; LMULMAX4-NEXT:    ret
@@ -655,7 +655,7 @@ define <4 x i64> @sextload_v4i16_v4i64(ptr %x) {
 ; LMULMAX1-NEXT:    vle16.v v10, (a0)
 ; LMULMAX1-NEXT:    vsetivli zero, 2, e16, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v8, v10, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf4 v9, v8
 ; LMULMAX1-NEXT:    vsext.vf4 v8, v10
 ; LMULMAX1-NEXT:    ret
@@ -678,7 +678,7 @@ define <4 x i64> @zextload_v4i16_v4i64(ptr %x) {
 ; LMULMAX1-NEXT:    vle16.v v10, (a0)
 ; LMULMAX1-NEXT:    vsetivli zero, 2, e16, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v8, v10, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf4 v9, v8
 ; LMULMAX1-NEXT:    vzext.vf4 v8, v10
 ; LMULMAX1-NEXT:    ret
@@ -713,7 +713,7 @@ define <8 x i32> @sextload_v8i16_v8i32(ptr %x) {
 ; LMULMAX1-NEXT:    vle16.v v10, (a0)
 ; LMULMAX1-NEXT:    vsetivli zero, 4, e16, m1, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v8, v10, 4
-; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf2 v9, v8
 ; LMULMAX1-NEXT:    vsext.vf2 v8, v10
 ; LMULMAX1-NEXT:    ret
@@ -736,7 +736,7 @@ define <8 x i32> @zextload_v8i16_v8i32(ptr %x) {
 ; LMULMAX1-NEXT:    vle16.v v10, (a0)
 ; LMULMAX1-NEXT:    vsetivli zero, 4, e16, m1, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v8, v10, 4
-; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf2 v9, v8
 ; LMULMAX1-NEXT:    vzext.vf2 v8, v10
 ; LMULMAX1-NEXT:    ret
@@ -763,13 +763,13 @@ define <8 x i64> @sextload_v8i16_v8i64(ptr %x) {
 ; LMULMAX1-NEXT:    vslidedown.vi v11, v9, 4
 ; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf4 v10, v11
-; LMULMAX1-NEXT:    vsetivli zero, 2, e16, mf2, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v12, v11, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf4 v11, v12
-; LMULMAX1-NEXT:    vsetivli zero, 2, e16, mf2, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v12, v9, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf4 v9, v12
 ; LMULMAX1-NEXT:    ret
 ;
@@ -795,13 +795,13 @@ define <8 x i64> @zextload_v8i16_v8i64(ptr %x) {
 ; LMULMAX1-NEXT:    vslidedown.vi v11, v9, 4
 ; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf4 v10, v11
-; LMULMAX1-NEXT:    vsetivli zero, 2, e16, mf2, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v12, v11, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf4 v11, v12
-; LMULMAX1-NEXT:    vsetivli zero, 2, e16, mf2, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v12, v9, 2
-; LMULMAX1-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf4 v9, v12
 ; LMULMAX1-NEXT:    ret
 ;
@@ -847,12 +847,12 @@ define <16 x i32> @sextload_v16i16_v16i32(ptr %x) {
 ; LMULMAX1-NEXT:    vle16.v v12, (a0)
 ; LMULMAX1-NEXT:    vsetivli zero, 4, e16, m1, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v8, v10, 4
-; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf2 v9, v8
 ; LMULMAX1-NEXT:    vsext.vf2 v8, v10
-; LMULMAX1-NEXT:    vsetivli zero, 4, e16, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v10, v12, 4
-; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; LMULMAX1-NEXT:    vsext.vf2 v11, v10
 ; LMULMAX1-NEXT:    vsext.vf2 v10, v12
 ; LMULMAX1-NEXT:    ret
@@ -877,12 +877,12 @@ define <16 x i32> @zextload_v16i16_v16i32(ptr %x) {
 ; LMULMAX1-NEXT:    vle16.v v12, (a0)
 ; LMULMAX1-NEXT:    vsetivli zero, 4, e16, m1, ta, ma
 ; LMULMAX1-NEXT:    vslidedown.vi v8, v10, 4
-; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf2 v9, v8
 ; LMULMAX1-NEXT:    vzext.vf2 v8, v10
-; LMULMAX1-NEXT:    vsetivli zero, 4, e16, m1, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
 ; LM...
[truncated]

github-actions (bot) commented:

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 9399094586aa803fb399ae508e5aa46c8728fdd6 622682a707bc53b5ee411b8d6778b77299832749 -- llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
The diff from clang-format:
diff --git a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
index bf7dd2359d9b..4b3edd237135 100644
--- a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
@@ -882,7 +882,6 @@ static bool willVLBeAVL(const VSETVLIInfo &Info, const RISCVSubtarget &ST) {
   return ST.getRealMinVLen() * LMul >= AVLInBits;
 }
 
-
 /// Return true if a vsetvli instruction to change from PrevInfo
 /// to Info might change the VL register.  If this returns false,
 /// the vsetvli can use the X0, X0 form.
@@ -913,8 +912,8 @@ bool RISCVInsertVSETVLI::mayChangeVL(const VSETVLIInfo &Info,
   // if the two AVLs are the same, we know the VLs must also be.  As such,
   // this vsetvli is not changing VL.
   if (Info.hasAVLImm() && PrevInfo.hasAVLImm() &&
-      Info.getAVLImm() == PrevInfo.getAVLImm() &&
-      willVLBeAVL(Info, *ST) && willVLBeAVL(PrevInfo, *ST))
+      Info.getAVLImm() == PrevInfo.getAVLImm() && willVLBeAVL(Info, *ST) &&
+      willVLBeAVL(PrevInfo, *ST))
     return false;
 
   return true;

@@ -406,13 +406,13 @@ define <16 x i32> @zextload_v16i8_v16i32(ptr %x) {
 ; LMULMAX1-NEXT:    vslidedown.vi v11, v9, 8
 ; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
 ; LMULMAX1-NEXT:    vzext.vf4 v10, v11
-; LMULMAX1-NEXT:    vsetivli zero, 4, e8, mf2, ta, ma
+; LMULMAX1-NEXT:    vsetvli zero, zero, e8, mf2, ta, ma
lukel97 (Contributor) commented:

Even though we know that AVL <= VLMAX and so VL won't be changed here, is this still valid per the spec?

When rs1=x0 and rd=x0, the instruction operates as if the current vector length in vl is used as the AVL, and the resulting value is written to vl, but not to a destination register. This form can only be used when VLMAX and hence vl is not actually changed by the new SEW/LMUL ratio. Use of the instruction with a new SEW/LMUL ratio that would result in a change of VLMAX is reserved. Implementations may set vill in this case.

Since we're still changing the SEW/LMUL ratio and thus VLMAX, wouldn't this fall under the "Use of the instruction with a new SEW/LMUL ratio that would result in a change of VLMAX is reserved." clause?

preames (Collaborator, Author) replied:

You are unfortunately correct here. This is an absolutely silly choice for the specification to make. It doesn't disallow an implementation from having the sane semantics here, but you're right that we can't rely on this instruction being valid when VLMAX changes (even if VL doesn't).
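To spell out why the quoted clause bites here, a small hedged sketch (an invented helper, assuming VLEN = 128, not part of the patch): the toggles this patch creates keep VL the same but still change VLMAX, which is exactly the reserved case.

#include <cstdio>

// Illustration only: VLMAX for a given VLEN and vtype.
static unsigned vlmax(unsigned VLen, unsigned SEW, unsigned LMulNum,
                      unsigned LMulDen) {
  return VLen * LMulNum / (SEW * LMulDen);
}

int main() {
  const unsigned VLen = 128;
  // A toggle like the ones in the test diffs: {e8, mf4} -> {e64, m1}.
  unsigned Before = vlmax(VLen, /*SEW=*/8, /*LMulNum=*/1, /*LMulDen=*/4);   // 4
  unsigned After = vlmax(VLen, /*SEW=*/64, /*LMulNum=*/1, /*LMulDen=*/1);   // 2
  // VLMAX changes from 4 to 2 even though VL = 2 would remain valid, so the
  // x0, x0 encoding for this transition falls under the reserved clause.
  std::printf("VLMAX before = %u, after = %u\n", Before, After);
  return 0;
}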

topperc (Collaborator) left a comment:

LGTM. Left a couple comments for possible optimizations.

@@ -14,7 +14,7 @@ define <vscale x 8 x i32> @insert_nxv8i32_v2i32_0(<vscale x 8 x i32> %vec, ptr %
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
 ; CHECK-NEXT:    vle32.v v12, (a0)
-; CHECK-NEXT:    vsetivli zero, 2, e32, m4, tu, ma
+; CHECK-NEXT:    vsetvli zero, zero, e32, m4, tu, ma
 ; CHECK-NEXT:    vmv.v.v v8, v12
topperc (Collaborator) commented:

Can we just do a tail undisturbed load to avoid the vmv.v.v?

@@ -23,7 +23,7 @@ define void @extract_v2i8_v4i8_2(ptr %x, ptr %y) {
 ; CHECK-NEXT:    vle8.v v8, (a0)
 ; CHECK-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
 ; CHECK-NEXT:    vslidedown.vi v8, v8, 2
-; CHECK-NEXT:    vsetivli zero, 2, e8, mf8, ta, ma
+; CHECK-NEXT:    vsetvli zero, zero, e8, mf8, ta, ma
topperc (Collaborator) commented:

Seems like we could get away with just doing the store with the old vtype?

preames (Collaborator, Author) replied:

I spent more time on this than was warranted, and ended up deciding it wasn't worth it. I can do stores specifically, but reasoning about when it's legal to increase and decrease LMUL gets into all kinds of corner cases, since we have instructions which aren't regular in VL (i.e. vrgather.vv can read past VL) or in LMUL (i.e. widening operations write a wider destination than LMUL would lead you to believe). I ended up either having to handle a very narrow special case or having to write very subtle predicates for all instructions. In the end, I decided this just isn't worth it for the moment.

preames (Collaborator, Author) commented Oct 23, 2023

As pointed out by @lukel97 , the transform proposed is invalid per the specification wording. Going to take another look at these cases, but will clearly have to come at it from another angle.

preames closed this Oct 23, 2023
preames deleted the pr-riscv-vsetvli-VTYPE-toggle branch October 24, 2023 20:48