[RISCV] Reuse VL (if non-zero) when building single element vector for start of reduction chain

This is an alternative patch on a path to D137530.

The basic problem being tackled here is that we need to place a scalar into lane 0 of a vector register before our reduction instructions. Since we only care about lane 0 of the vector, we can use any VL >= 1 provided that the total amount of work performed matches the work performed for a VL=1.

This change does not contain the logic from D137530 to perform the insert at the original VT, and then extract down to LMUL1. That turns out to be a good choice, as discussion in this review has indicated there are issues around LMUL2 and above with our representation of vmv.s.x. We'd also need to be careful with the splat logic for the same reasons.

The only potentially concerning codegen change I spot here is that we stop using a broadcast load (for VL=1) and instead do a scalar load and insert. I think this is probably reasonable; if reviewers disagree, I can investigate using a broadcast load which writes to the undef lanes. If we want to do that, we should do it for VECTOR_INSERT_ELT as well, so that'll end up as its own patch series.

Differential Revision: https://reviews.llvm.org/D139656
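
The VL reasoning above can be sketched as a toy model in plain C++. This is not LLVM code: `Lane`, `VReg`, `scalarInsert`, `reduceSum`, and `lowerReduction` are hypothetical names invented for illustration. It assumes a scalar insert with VL=0 writes nothing, and a reduction with VL=0 leaves its destination (the passthru) unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Toy model only: a vector register is a list of lanes, and std::nullopt
// marks an undefined lane. None of these names exist in LLVM.
using Lane = std::optional<int64_t>;
using VReg = std::vector<Lane>;

// Scalar insert into lane 0. Assumption: with vl == 0 nothing is written;
// with any vl >= 1 only lane 0 is written.
VReg scalarInsert(VReg passthru, int64_t scalar, std::size_t vl) {
  if (vl > 0)
    passthru[0] = scalar;
  return passthru;
}

// Sum reduction: lane 0 of the result is start[0] plus the first vl lanes
// of vec. Assumption: with vl == 0 the destination keeps the passthru value.
VReg reduceSum(VReg passthru, const VReg &vec, const VReg &start,
               std::size_t vl) {
  assert(vl <= vec.size());
  if (vl == 0)
    return passthru;
  int64_t acc = start[0].value();
  for (std::size_t i = 0; i < vl; ++i)
    acc += vec[i].value();
  passthru[0] = acc;
  return passthru;
}

// Mirrors the shape of the patched lowering: reuse the reduction's VL for
// the scalar insert when it is known non-zero, otherwise fall back to VL=1
// and feed the inserted scalar in as the passthru.
int64_t lowerReduction(const VReg &vec, int64_t startValue, std::size_t vl) {
  assert(!vec.empty());
  const bool nonZeroAVL = vl != 0;
  const std::size_t innerVL = nonZeroAVL ? vl : 1;
  const VReg undef(vec.size()); // every lane undefined
  const VReg initial = scalarInsert(undef, startValue, innerVL);
  const VReg passthru = nonZeroAVL ? undef : initial;
  const VReg result = reduceSum(passthru, vec, initial, vl);
  return result[0].value(); // throws if lane 0 was never defined
}
```

In this model, calling `lowerReduction` with `vl == 0` returns the start value unchanged, because the insert still ran with VL=1 and the reduction passed that vector through. If the insert reused `vl == 0` instead, lane 0 would stay undefined and `value()` would throw.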
llvm · Dec 13, 2022 · 668cde8
1 parent 22702cc
commit 668cde8
Showing 6 changed files with 1,036 additions and 1,116 deletions.
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -5853,18 +5853,22 @@ static SDValue lowerReductionSeq(unsigned RVVOpcode, MVT ResVT,
   const MVT VecVT = Vec.getSimpleValueType();
   const MVT M1VT = getLMUL1VT(VecVT);
   const MVT XLenVT = Subtarget.getXLenVT();
+  const bool NonZeroAVL = hasNonZeroAVL(VL);
 
   // The reduction needs an LMUL1 input; do the splat at either LMUL1
   // or the original VT if fractional.
   auto InnerVT = VecVT.bitsLE(M1VT) ? VecVT : M1VT;
-  SDValue InitialValue =
-    lowerScalarInsert(StartValue, DAG.getConstant(1, DL, XLenVT),
-                      InnerVT, DL, DAG, Subtarget);
+  // We reuse the VL of the reduction to reduce vsetvli toggles if we can
+  // prove it is non-zero.  For the AVL=0 case, we need the scalar to
+  // be the result of the reduction operation.
+  auto InnerVL = NonZeroAVL ? VL : DAG.getConstant(1, DL, XLenVT);
+  SDValue InitialValue = lowerScalarInsert(StartValue, InnerVL, InnerVT, DL,
+                                           DAG, Subtarget);
   if (M1VT != InnerVT)
     InitialValue = DAG.getNode(ISD::INSERT_SUBVECTOR, DL, M1VT,
                                DAG.getUNDEF(M1VT),
                                InitialValue, DAG.getConstant(0, DL, XLenVT));
-  SDValue PassThru = hasNonZeroAVL(VL) ? DAG.getUNDEF(M1VT) : InitialValue;
+  SDValue PassThru = NonZeroAVL ? DAG.getUNDEF(M1VT) : InitialValue;
   SDValue Reduction = DAG.getNode(RVVOpcode, DL, M1VT, PassThru, Vec,
                                   InitialValue, Mask, VL);
   return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, ResVT, Reduction,
@@ -8068,7 +8072,7 @@ static SDValue combineBinOpToReduce(SDNode *N, SelectionDAG &DAG,
       ScalarV.getOpcode() != RISCVISD::VMV_V_X_VL)
     return SDValue();
 
-  if (!isOneConstant(ScalarV.getOperand(2)))
+  if (!hasNonZeroAVL(ScalarV.getOperand(2)))
     return SDValue();
 
   // Check the scalar of ScalarV is neutral element