[LV] Allow scalable vectorization with vscale = 1
This change is a bit subtle. If we have a type like <vscale x 1 x i64>, the vectorizer will currently reject vectorization. The reason is that a type like <1 x i64> is likely to get simply rescalarized, and the vectorizer doesn't want to be in the game of simple unrolling.

(I've given the example in terms of 1 x types which use a single register, but the same issue exists for any N x types which use N registers, for example with RISCV LMULs.)

This change distinguishes scalable types from fixed types under the reasoning that converting to a scalable type isn't unrolling. Because the actual vscale isn't known until runtime, using a vscale type is potentially very profitable.
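
As a rough illustration of that distinction, here is a minimal standalone sketch of the check. The function name and parameters are hypothetical stand-ins, not LLVM API: `numParts` plays the role of the result of `TTI.getNumberOfParts`, and `knownMinVF` the role of `VF.getKnownMinValue()`.

```cpp
// Hypothetical helper mirroring the decision described above; it is not
// part of LLVM. numParts == 0 stands for "number of parts unknown", which
// the real code handles separately by marking the cost invalid.
constexpr bool typeNotScalarized(unsigned numParts, unsigned knownMinVF,
                                 bool scalable) {
  return numParts != 0 &&
         (scalable ? numParts <= knownMinVF   // <vscale x 1 x i64>: accepted
                   : numParts < knownMinVF);  // <1 x i64>: treated as scalarized
}

// One register holding one (minimum) element:
static_assert(typeNotScalarized(1, 1, /*scalable=*/true),
              "scalable type with one part is not considered scalarized");
static_assert(!typeNotScalarized(1, 1, /*scalable=*/false),
              "fixed type with one part is considered scalarized");
```

The only difference between the two cases is `<=` versus `<`, so a scalable type that needs N registers for a minimum of N elements is no longer rejected.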

This makes an important, but unchecked, assumption. Specifically, the scalable type is assumed to only be legal per the cost model if there's actually a scalable register class which is distinct from the scalar domain. This is, to my knowledge, true for all targets which return non-invalid costs for scalable vector ops today, but in theory, we could have a target decide to lower scalable to fixed length vector or even scalar registers. If that ever happens, we'd need to revisit this code.

In practice, this patch unblocks scalable vectorization for ELEN types on RISCV.

Let me sketch one alternative implementation I considered. We could have restricted this to cases where we know a minimum value for vscale. Specifically, for the default +v extension for RISCV, we actually know that vscale >= 2 for ELEN types. However, doing it this way means we can't generate scalable vectors when using the various embedded vector extensions which have a minimum vscale of 1.
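
To make the trade-off concrete, here is one possible reading of that rejected alternative as a standalone sketch. Everything in it is an assumption for illustration: the function name, the `minVScale` parameter, and the idea of scaling the known minimum VF by the minimum vscale do not come from the patch.

```cpp
// Hypothetical sketch of the rejected alternative; not LLVM API.
// The strict "<" comparison is kept, but the known minimum element count
// is scaled by a lower bound on vscale when the VF is scalable.
constexpr bool typeNotScalarizedAlt(unsigned numParts, unsigned knownMinVF,
                                    bool scalable, unsigned minVScale) {
  return numParts != 0 &&
         numParts < knownMinVF * (scalable ? minVScale : 1u);
}

// With a minimum vscale of 2 (as stated above for +v and ELEN types),
// <vscale x 1 x i64> would be accepted ...
static_assert(typeNotScalarizedAlt(1, 1, /*scalable=*/true, /*minVScale=*/2),
              "accepted when vscale >= 2 is known");
// ... but with a minimum vscale of 1 (the embedded extensions) it would
// still be rejected, which is the drawback described above.
static_assert(!typeNotScalarizedAlt(1, 1, /*scalable=*/true, /*minVScale=*/1),
              "rejected when only vscale >= 1 is known");
```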

Differential Revision: https://reviews.llvm.org/D128542
preames committed Jun 27, 2022
1 parent d2dad62 commit 20dd329
Showing 2 changed files with 256 additions and 124 deletions.
15 changes: 11 additions & 4 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -6712,10 +6712,17 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I,

   bool TypeNotScalarized = false;
   if (VF.isVector() && VectorTy->isVectorTy()) {
-    unsigned NumParts = TTI.getNumberOfParts(VectorTy);
-    if (NumParts)
-      TypeNotScalarized = NumParts < VF.getKnownMinValue();
-    else
+    if (unsigned NumParts = TTI.getNumberOfParts(VectorTy)) {
+      if (VF.isScalable())
+        // <vscale x 1 x iN> is assumed to be profitable over iN because
+        // scalable registers are a distinct register class from scalar ones.
+        // If we ever find a target which wants to lower scalable vectors
+        // back to scalars, we'll need to update this code to explicitly
+        // ask TTI about the register class uses for each part.
+        TypeNotScalarized = NumParts <= VF.getKnownMinValue();
+      else
+        TypeNotScalarized = NumParts < VF.getKnownMinValue();
+    } else
       C = InstructionCost::getInvalid();
   }
   return VectorizationCostTy(C, TypeNotScalarized);