[VPlan] Add new VPScalarCastRecipe, use for IV & step trunc. #78113

fhahn · 2024-01-14T20:16:29Z

Add a new recipe to model scalar cast instructions, without relying on an underlying instruction.

This allows creating scalar casts, without relying on an underlying instruction (like the current
VPReplicateRecipe). The new recipe is used to explicitly model both truncating the induction step
and the VPDerivedIVRecipe, thus simplifying both the recipe and code needed to introduce it.

Truncating VPWidenIntOrFpInductionRecipes should also be modeled using the new recipe, as
follow-up.

Add a new recipe to model uniform-per-UF instructions, without relying on an underlying instruction. Initially, it supports uniform cast-ops and is therefore storing the result type. Not relying on an underlying instruction (like the current VPReplicateRecipe) allows to create instances without a corresponding instruction. In the future, to plan is to extend this recipe to handle all opcodes needed to replace the uniform part of VPReplicateRecipe.

llvmbot · 2024-01-14T20:16:59Z

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Add a new recipe to model uniform-per-UF instructions, without relying on an underlying instruction. Initially, it supports uniform cast-ops and is therefore storing the result type.

Not relying on an underlying instruction (like the current VPReplicateRecipe) allows to create instances without a corresponding instruction.

In the future, to plan is to extend this recipe to handle all opcodes needed to replace the uniform part of VPReplicateRecipe.

Full diff: https://github.com/llvm/llvm-project/pull/78113.diff

8 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/VPlan.h (+30)
(modified) llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (+5-1)
(modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+42-7)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+9)
(modified) llvm/lib/Transforms/Vectorize/VPlanValue.h (+1)
(modified) llvm/test/Transforms/LoopVectorize/cast-induction.ll (+3-1)
(modified) llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll (+2-1)
(modified) llvm/test/Transforms/LoopVectorize/pr46525-expander-insertpoint.ll (+1-1)

diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 4b4f4911eb6415..d5985224cccc48 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -1945,6 +1945,36 @@ class VPReplicateRecipe : public VPRecipeWithIRFlags, public VPValue {
   }
 };
 
+/// VPUniformPerUFRecipe represents an instruction with Opcode that is uniform
+/// per UF, i.e. it generates a single scalar instance per UF.
+/// TODO: at the moment, only Cast opcodes are supported, extend to support
+///       missing opcodes to replace uniform part of VPReplicateRecipe.
+class VPUniformPerUFRecipe : public VPRecipeBase, public VPValue {
+  unsigned Opcode;
+
+  /// Result type for the cast.
+  Type *ResultTy;
+
+  Value *generate(VPTransformState &State, unsigned Part);
+
+public:
+  VPUniformPerUFRecipe(Instruction::CastOps Opcode, VPValue *Op, Type *ResultTy)
+      : VPRecipeBase(VPDef::VPUniformPerUFSC, {Op}), VPValue(this),
+        Opcode(Opcode), ResultTy(ResultTy) {}
+
+  ~VPUniformPerUFRecipe() override = default;
+
+  VP_CLASSOF_IMPL(VPDef::VPWidenIntOrFpInductionSC)
+
+  void execute(VPTransformState &State) override;
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+  /// Print the recipe.
+  void print(raw_ostream &O, const Twine &Indent,
+             VPSlotTracker &SlotTracker) const override;
+#endif
+};
+
 /// A recipe for generating conditional branches on the bits of a mask.
 class VPBranchOnMaskRecipe : public VPRecipeBase {
 public:
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index 97a8a1803bbf5a..d71b0703994450 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -230,7 +230,11 @@ Type *VPTypeAnalysis::inferScalarType(const VPValue *V) {
             return V->getUnderlyingValue()->getType();
           })
           .Case<VPWidenCastRecipe>(
-              [](const VPWidenCastRecipe *R) { return R->getResultType(); });
+              [](const VPWidenCastRecipe *R) { return R->getResultType(); })
+          .Case<VPExpandSCEVRecipe>([](const VPExpandSCEVRecipe *R) {
+            return R->getSCEV()->getType();
+          });
+
   assert(ResultTy && "could not infer type for the given VPValue");
   CachedTypes[V] = ResultTy;
   return ResultTy;
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 1f844bce23102e..423504e8f7e05e 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -164,6 +164,8 @@ bool VPRecipeBase::mayHaveSideEffects() const {
     auto *R = cast<VPReplicateRecipe>(this);
     return R->getUnderlyingInstr()->mayHaveSideEffects();
   }
+  case VPUniformPerUFSC:
+    return false;
   default:
     return true;
   }
@@ -1117,13 +1119,7 @@ void VPScalarIVStepsRecipe::execute(VPTransformState &State) {
 
   // Ensure step has the same type as that of scalar IV.
   Type *BaseIVTy = BaseIV->getType()->getScalarType();
-  if (BaseIVTy != Step->getType()) {
-    // TODO: Also use VPDerivedIVRecipe when only the step needs truncating, to
-    // avoid separate truncate here.
-    assert(Step->getType()->isIntegerTy() &&
-           "Truncation requires an integer step");
-    Step = State.Builder.CreateTrunc(Step, BaseIVTy);
-  }
+  assert(BaseIVTy == Step->getType());
 
   // We build scalar steps for both integer and floating-point induction
   // variables. Here, we determine the kind of arithmetic we will perform.
@@ -1469,6 +1465,45 @@ void VPReplicateRecipe::print(raw_ostream &O, const Twine &Indent,
 }
 #endif
 
+Value *VPUniformPerUFRecipe ::generate(VPTransformState &State, unsigned Part) {
+  switch (Opcode) {
+  case Instruction::SExt:
+  case Instruction::ZExt:
+  case Instruction::Trunc: {
+    Value *Op = State.get(getOperand(0), VPIteration(Part, 0));
+    return State.Builder.CreateCast(Instruction::CastOps(Opcode), Op, ResultTy);
+  }
+  default:
+    llvm_unreachable("opcode not implemented yet");
+  }
+}
+
+void VPUniformPerUFRecipe ::execute(VPTransformState &State) {
+  bool UniformAcrossUFs = all_of(operands(), [](VPValue *Op) {
+    return Op->isDefinedOutsideVectorRegions();
+  });
+  for (unsigned Part = 0; Part != State.UF; ++Part) {
+    Value *Res;
+    // Only generate a single instance, if the recipe is uniform across all UFs.
+    if (Part > 0 && UniformAcrossUFs)
+      Res = State.get(this, VPIteration(0, 0));
+    else
+      Res = generate(State, Part);
+    State.set(this, Res, VPIteration(Part, 0));
+  }
+}
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+void VPUniformPerUFRecipe ::print(raw_ostream &O, const Twine &Indent,
+                                  VPSlotTracker &SlotTracker) const {
+  O << Indent << "UNIFORM-PER-UF ";
+  printAsOperand(O, SlotTracker);
+  O << " = " << Instruction::getOpcodeName(Opcode) << " ";
+  printOperands(O, SlotTracker);
+  O << " to " << *ResultTy;
+}
+#endif
+
 void VPBranchOnMaskRecipe::execute(VPTransformState &State) {
   assert(State.Instance && "Branch on Mask works only on single instance.");
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index b3694e74a38509..6ba8901e76aa50 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -505,6 +505,15 @@ static VPValue *createScalarIVSteps(VPlan &Plan, const InductionDescriptor &ID,
     HeaderVPBB->insert(BaseIV->getDefiningRecipe(), IP);
   }
 
+  VPTypeAnalysis TypeInfo(SE.getContext());
+  if (TypeInfo.inferScalarType(BaseIV) != TypeInfo.inferScalarType(Step)) {
+    Step = new VPUniformPerUFRecipe(Instruction::Trunc, Step,
+                                    TypeInfo.inferScalarType(BaseIV));
+    auto *VecPreheader =
+        cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSinglePredecessor());
+    VecPreheader->appendRecipe(Step->getDefiningRecipe());
+  }
+
   VPScalarIVStepsRecipe *Steps = new VPScalarIVStepsRecipe(ID, BaseIV, Step);
   HeaderVPBB->insert(Steps, IP);
   return Steps;
diff --git a/llvm/lib/Transforms/Vectorize/VPlanValue.h b/llvm/lib/Transforms/Vectorize/VPlanValue.h
index 8cc98f4abf933e..009edea39a3c43 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanValue.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanValue.h
@@ -362,6 +362,7 @@ class VPDef {
     // START: Phi-like recipes. Need to be kept together.
     VPBlendSC,
     VPPredInstPHISC,
+    VPUniformPerUFSC,
     // START: SubclassID for recipes that inherit VPHeaderPHIRecipe.
     // VPHeaderPHIRecipe need to be kept together.
     VPCanonicalIVPHISC,
diff --git a/llvm/test/Transforms/LoopVectorize/cast-induction.ll b/llvm/test/Transforms/LoopVectorize/cast-induction.ll
index c5edf9831d7d90..4121a1399c47f5 100644
--- a/llvm/test/Transforms/LoopVectorize/cast-induction.ll
+++ b/llvm/test/Transforms/LoopVectorize/cast-induction.ll
@@ -83,12 +83,14 @@ define void @cast_variable_step(i64 %step) {
 ; VF4: middle.block:
 ;
 ; IC2-LABEL: @cast_variable_step(
+; IC2:   [[TRUNC_STEP:%.+]] = trunc i64 %step to i32
+; IC2:   br label %vector.body
+
 ; IC2-LABEL: vector.body:
 ; IC2-NEXT:   [[INDEX:%.+]] = phi i64 [ 0, %vector.ph ]
 ; IC2-NEXT:   [[MUL:%.+]] = mul i64 %index, %step
 ; IC2-NEXT:   [[OFFSET_IDX:%.+]] = add i64 10, [[MUL]]
 ; IC2-NEXT:   [[TRUNC_OFF:%.+]] = trunc i64 [[OFFSET_IDX]] to i32
-; IC2-NEXT:   [[TRUNC_STEP:%.+]] = trunc i64 %step to i32
 ; IC2-NEXT:   [[STEP0:%.+]] = mul i32 0, [[TRUNC_STEP]]
 ; IC2-NEXT:   [[T0:%.+]] = add i32 [[TRUNC_OFF]], [[STEP0]]
 ; IC2-NEXT:   [[STEP1:%.+]] = mul i32 1, [[TRUNC_STEP]]
diff --git a/llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll b/llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll
index 297cd2a7c12f9a..6410a556589f94 100644
--- a/llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll
+++ b/llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll
@@ -184,6 +184,7 @@ exit:
 ; DBG-NEXT: No successors
 ; DBG-EMPTY:
 ; DBG-NEXT: vector.ph:
+; DBG-NEXT:   UNIFORM-PER-UF vp<[[CAST:%.+]]> = trunc ir<1> to i32
 ; DBG-NEXT: Successor(s): vector loop
 ; DBG-EMPTY:
 ; DBG-NEXT: <x1> vector loop: {
@@ -191,7 +192,7 @@ exit:
 ; DBG-NEXT:     EMIT vp<[[CAN_IV:%.+]]> = CANONICAL-INDUCTION
 ; DBG-NEXT:     FIRST-ORDER-RECURRENCE-PHI ir<%for> = phi ir<0>, vp<[[SCALAR_STEPS:.+]]>
 ; DBG-NEXT:     vp<[[DERIVED_IV:%.+]]> = DERIVED-IV ir<0> + vp<[[CAN_IV]]> * ir<1> (truncated to i32)
-; DBG-NEXT:     vp<[[SCALAR_STEPS]]> = SCALAR-STEPS vp<[[DERIVED_IV]]>, ir<1>
+; DBG-NEXT:     vp<[[SCALAR_STEPS]]> = SCALAR-STEPS vp<[[DERIVED_IV]]>, vp<[[CAST]]>
 ; DBG-NEXT:     EMIT vp<[[SPLICE:%.+]]> = first-order splice ir<%for>, vp<[[SCALAR_STEPS]]>
 ; DBG-NEXT:     CLONE store vp<[[SPLICE]]>, ir<%dst>
 ; DBG-NEXT:     EMIT vp<[[IV_INC:%.+]]> = add nuw vp<[[CAN_IV]]>, vp<[[VFxUF]]>
diff --git a/llvm/test/Transforms/LoopVectorize/pr46525-expander-insertpoint.ll b/llvm/test/Transforms/LoopVectorize/pr46525-expander-insertpoint.ll
index ea3de4a0fbb363..f0220f5e766b23 100644
--- a/llvm/test/Transforms/LoopVectorize/pr46525-expander-insertpoint.ll
+++ b/llvm/test/Transforms/LoopVectorize/pr46525-expander-insertpoint.ll
@@ -43,7 +43,7 @@ define void @test(i16 %x, i64 %y, ptr %ptr) {
 ; CHECK-NEXT:    [[V3:%.*]] = add i8 [[V2]], 1
 ; CHECK-NEXT:    [[CMP15:%.*]] = icmp slt i8 [[V3]], 5
 ; CHECK-NEXT:    [[IV_NEXT]] = add i64 [[IV]], [[INC]]
-; CHECK-NEXT:    br i1 [[CMP15]], label [[LOOP]], label [[LOOP_EXIT]], !llvm.loop [[LOOP2:![0-9]+]]
+; CHECK-NEXT:    br i1 [[CMP15]], label [[LOOP]], label [[LOOP_EXIT]], !llvm.loop [[LOOP3:![0-9]+]]
 ; CHECK:       loop.exit:
 ; CHECK-NEXT:    [[DIV_1:%.*]] = udiv i64 [[Y]], [[ADD]]
 ; CHECK-NEXT:    [[V1:%.*]] = add i64 [[DIV_1]], 1

ayalz

In order to cast a scalar uniform step (placed in the preheader) could we adapt VPWidenCastRecipe - which is a per-part recipe practically widening the resulting type (only) - by having it extend/truncate a given scalar or the elements of a given vector, per part - depending on whether its operand is a vector or scalar? (Could possibly rename it, and/or potentially also get the element count of the latter instead of State.VF, if preferred.)

Being uniform across VF lanes of each part, or across VF*UF, is more accurately associated with each VPValue than with recipes. The latter may propagate uniformity from their operands to the values they define, as in cast instructions. In some cases, such as integer divide (and floor) by VF or a multiple thereof, a non-uniform (consecutive, aligned on VF) operand may yield a uniform result.

ayalz · 2024-01-15T11:46:45Z

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

  // Ensure step has the same type as that of scalar IV.
  Type *BaseIVTy = BaseIV->getType()->getScalarType();
-  if (BaseIVTy != Step->getType()) {
-    // TODO: Also use VPDerivedIVRecipe when only the step needs truncating, to
-    // avoid separate truncate here.
-    assert(Step->getType()->isIntegerTy() &&
-           "Truncation requires an integer step");
-    Step = State.Builder.CreateTrunc(Step, BaseIVTy);
-  }
+  assert(BaseIVTy == Step->getType());


nit: fold comment into assert message.

This assert is still missing a message, potentially that of the comment.

Added, thanks!

fhahn · 2024-01-15T14:07:11Z

In order to cast a scalar uniform step (placed in the preheader) could we adapt VPWidenCastRecipe - which is a per-part recipe practically widening the resulting type (only) - by having it extend/truncate a given scalar or the elements of a given vector, per part - depending on whether its operand is a vector or scalar? (Could possibly rename it, and/or potentially also get the element count of the latter instead of State.VF, if preferred.)

Being uniform across VF lanes of each part, or across VF*UF, is more accurately associated with each VPValue than with recipes. The latter may propagate uniformity from their operands to the values they define, as in cast instructions. In some cases, such as integer divide (and floor) by VF or a multiple thereof, a non-uniform (consecutive, aligned on VF) operand may yield a uniform result.

We could adjust VPWidenCastRecipe to support the uniform-per-UF case by using the existing uniformity checks, but having recipes that support generating both scalar and vector code will make the individual recipes (slightly) more complicated (need extra checks and handle multiple cases in ::execute or cost estimates). Having separate recipes for the widen (vectorizing) and uniform cases would also allow to encode info explicitly and make it easier when reading the textual representation of a VPlan.

It also depends what we want to do with VPReplicateRecipe long-term, which currently models codegen for both replicating and uniform scalar codegen. Breaking up the recipe into 2 separate ones would help improve clarity and simplify codegen. As part of doing so we should try to make sure the new recipes can be created without an underlying IR instruction (which VPReplicateRecipe requires at the moment).

Note that after explicit unrolling lands, the PerUF part will be dropped.

ayalz · 2024-01-21T18:19:51Z

In order to cast a scalar uniform step (placed in the preheader) could we adapt VPWidenCastRecipe - which is a per-part recipe practically widening the resulting type (only) - by having it extend/truncate a given scalar or the elements of a given vector, per part - depending on whether its operand is a vector or scalar? (Could possibly rename it, and/or potentially also get the element count of the latter instead of State.VF, if preferred.)
Being uniform across VF lanes of each part, or across VF*UF, is more accurately associated with each VPValue than with recipes. The latter may propagate uniformity from their operands to the values they define, as in cast instructions. In some cases, such as integer divide (and floor) by VF or a multiple thereof, a non-uniform (consecutive, aligned on VF) operand may yield a uniform result.

We could adjust VPWidenCastRecipe to support the uniform-per-UF case by using the existing uniformity checks, but having recipes that support generating both scalar and vector code will make the individual recipes (slightly) more complicated (need extra checks and handle multiple cases in ::execute or cost estimates). Having separate recipes for the widen (vectorizing) and uniform cases would also allow to encode info explicitly and make it easier when reading the textual representation of a VPlan.

It also depends what we want to do with VPReplicateRecipe long-term, which currently models codegen for both replicating and uniform scalar codegen. Breaking up the recipe into 2 separate ones would help improve clarity and simplify codegen. As part of doing so we should try to make sure the new recipes can be created without an underlying IR instruction (which VPReplicateRecipe requires at the moment).

Note that after explicit unrolling lands, the PerUF part will be dropped.

The VPUniformPerUFRecipe of the current patch deals only with casts, so best name it as such, at-least for now, and define it comparable to VPWidenCastRecipe. Perhaps VPScalarCastRecipe would work - it effectively serves Invariant values placed outside the loop that are Uniform across all trip-count iterations, not only across the loop's VF and/or UF. Down the road, it should presumably be possible to query whether its VPValue operand is scalar or vector along with its uniformity across VF and/or UF or the entire Trip Count, and use that for cost and code-generation, or for asserting the case of distinct recipes.

How to refactor VPReplicateRecipe, relieving it of relying on an underlying IR instruction, and dealing with uniform cases, deserves further thoughts. Renaming may be adequate when recipe(s) generate scalar instruction(s) in general, rather than "replicate" an existing underlying Instruction. I.e., in contrast to VPReplicateCastRecipe.

One aspect of VPWidenCastRecipe is its destination type. A vector type, currently set during ::execute, can be set as soon as VF is known. I.e., when VPlan's range of VF's holds a single value - either initially or eventually during optimizeForVFAndUF.

Yes, explicit unrolling-by-UF should definitely drop per-UF logic thereby simplifying recipe::execute overall. Similar unrolling-by-VF could simplify Replicating Region logic.

fhahn · 2024-01-21T21:16:05Z

The VPUniformPerUFRecipe of the current patch deals only with casts, so best name it as such, at-least for now, and define it comparable to VPWidenCastRecipe. Perhaps VPScalarCastRecipe would work - it effectively serves Invariant values placed outside the loop that are Uniform across all trip-count iterations, not only across the loop's VF and/or UF. Down the road, it should presumably be possible to query whether its VPValue operand is scalar or vector along with its uniformity across VF and/or UF or the entire Trip Count, and use that for cost and code-generation, or for asserting the case of distinct recipes.

Thanks, for now I updated the naming to reflect the cast-only case. Codegen is only implemented for the cases where only the first lane is used (as per isOnlyFirstLaneUsed check).

How to refactor VPReplicateRecipe, relieving it of relying on an underlying IR instruction, and dealing with uniform cases, deserves further thoughts. Renaming may be adequate when recipe(s) generate scalar instruction(s) in general, rather than "replicate" an existing underlying Instruction. I.e., in contrast to VPReplicateCastRecipe.

Sounds good!

One aspect of VPWidenCastRecipe is its destination type. A vector type, currently set during ::execute, can be set as soon as VF is known. I.e., when VPlan's range of VF's holds a single value - either initially or eventually during optimizeForVFAndUF.

Yes, that may be another option down the line.

ayalz · 2024-01-22T06:47:29Z

llvm/lib/Transforms/Vectorize/VPlan.h

@@ -1338,6 +1339,34 @@ class VPWidenCastRecipe : public VPRecipeWithIRFlags {
  Type *getResultType() const { return ResultTy; }
 };

+/// VPScalarCastRecipe is a recipe o create scalar cast instructions.


Suggested change

/// VPScalarCastRecipe is a recipe o create scalar cast instructions.

/// VPScalarCastRecipe is a recipe to create scalar cast instructions.

Fixed, thanks!

ayalz · 2024-01-22T06:47:48Z

llvm/lib/Transforms/Vectorize/VPlan.h

@@ -1338,6 +1339,34 @@ class VPWidenCastRecipe : public VPRecipeWithIRFlags {
  Type *getResultType() const { return ResultTy; }
 };

+/// VPScalarCastRecipe is a recipe o create scalar cast instructions.
+class VPScalarCastRecipe : public VPRecipeBase, public VPValue {


VPSingleDefRecipe ?

Adjusted, thanks!

ayalz · 2024-01-22T06:48:53Z

llvm/lib/Transforms/Vectorize/VPlan.h

@@ -1338,6 +1339,34 @@ class VPWidenCastRecipe : public VPRecipeWithIRFlags {
  Type *getResultType() const { return ResultTy; }
 };

+/// VPScalarCastRecipe is a recipe o create scalar cast instructions.
+class VPScalarCastRecipe : public VPRecipeBase, public VPValue {
+  /// Cast instruction opcode.


nit: comment adds little knowledge if any.

Dropped, thanks!

ayalz · 2024-01-22T06:49:06Z

llvm/lib/Transforms/Vectorize/VPlan.h

+  /// Cast instruction opcode.
+  Instruction::CastOps Opcode;
+
+  /// Result type for the cast.


nit: comment adds little knowledge if any.

Removed, thanks!

ayalz · 2024-01-22T06:49:37Z

llvm/lib/Transforms/Vectorize/VPlan.h

+
+  ~VPScalarCastRecipe() override = default;
+
+  VP_CLASSOF_IMPL(VPDef::VPWidenIntOrFpInductionSC)


VPWidenIntOrFpInductionSC ?

Fixed, thanks!

ayalz · 2024-01-22T06:55:56Z

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

@@ -1469,6 +1465,47 @@ void VPReplicateRecipe::print(raw_ostream &O, const Twine &Indent,
 }
 #endif

+Value *VPScalarCastRecipe ::generate(VPTransformState &State, unsigned Part) {
+  assert(vputils::onlyFirstLaneUsed(this) &&
+         "Codegen only implemented for first lane only.");


Can simplify into an assert that the Opcode is SExt, ZExt or Trunc.

The implementation here is intentionally kept generic for any cast where only the first lane is used, so it can be used directly in #76172

ayalz · 2024-01-22T06:56:39Z

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

@@ -1469,6 +1465,47 @@ void VPReplicateRecipe::print(raw_ostream &O, const Twine &Indent,
 }
 #endif

+Value *VPScalarCastRecipe ::generate(VPTransformState &State, unsigned Part) {
+  assert(vputils::onlyFirstLaneUsed(this) &&
+         "Codegen only implemented for first lane only.");


Suggested change

"Codegen only implemented for first lane only.");

"Codegen only implemented for first lane.");

(only need a single only)

Dropped second only, thanks!

ayalz · 2024-01-22T07:00:13Z

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

+  });
+  for (unsigned Part = 0; Part != State.UF; ++Part) {
+    Value *Res;
+    // Only generate a single instance, if the recipe is uniform across all UFs.


Only a single instance is going to be generated, in the preheader? Recipe is uniform across all UFs by (current) definition (and even more so - invariant). Can assert UniformAcrossUFs holds, along with onlyFirstLaneUsed.

The implementation here is intentionally kept generic for any cast where only the first lane is used, so it can be used directly in #76172

This is currently untested / dead code.

The value returned by an EVL recipe should arguably be of the same type as that of its first operand, even if the underlying intrinsic it currently wraps insists on always returning an i32? I.e., that case should arguably be fixed by performing a trunc/sext/zext as needed when providing the adequate EVL value given a current Phi/remaining-trip-count values, to facilitate adding/subtracting them together.

Updated version of the patch now also uses it for truncating derived IV/canonical IV if needed. Those are still uniform across VFs & UFs, but not invariant.

I suppose EVL at the moment can also be considered uniform across VFs and UFs.

Both the if/else sides should be executed by tests.

ayalz · 2024-01-22T07:02:48Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

@@ -504,6 +504,15 @@ static VPValue *createScalarIVSteps(VPlan &Plan, const InductionDescriptor &ID,
    HeaderVPBB->insert(BaseIV->getDefiningRecipe(), IP);
  }

+  VPTypeAnalysis TypeInfo(SE.getContext());
+  if (TypeInfo.inferScalarType(BaseIV) != TypeInfo.inferScalarType(Step)) {
+    Step = new VPScalarCastRecipe(Instruction::Trunc, Step,


Does this recipe only generate Trunc? No need to hold an Opcode, can be renamed VPScalarTruncRecipe. Assert that the type sizes correspond to Trunc?

Yes, here the only use-case is Trunc, but the implementation of the recipe is intentionally kept generic for any cast where only the first lane is used, so it can be used directly in #76172

Assert that the type sizes correspond to Trunc?

Added, thanks!

ayalz · 2024-01-22T07:03:32Z

llvm/lib/Transforms/Vectorize/VPlanValue.h

@@ -362,6 +362,7 @@ class VPDef {
    // START: Phi-like recipes. Need to be kept together.
    VPBlendSC,
    VPPredInstPHISC,
+    VPScalarCastSC,


Placed inside Phi-like recipes interval?

Moved, thanks!

github-actions · 2024-01-22T12:53:25Z

✅ With the latest revision this PR passed the C/C++ code formatter.

ayalz · 2024-01-22T14:13:25Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ -2297,6 +2297,7 @@ emitTransformedIndex(IRBuilderBase &B, Value *Index, Value *StartValue,
                           ? B.CreateSExtOrTrunc(Index, StepTy)
                           : B.CreateCast(Instruction::SIToFP, Index, StepTy);
  if (CastedIndex != Index) {
+    assert(!isa<SExtInst>(CastedIndex));


nit: error message missing.

This was left over from testing, removed!

ayalz · 2024-01-22T15:14:46Z

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

+}
+
+void VPScalarCastRecipe ::execute(VPTransformState &State) {
+  bool UniformAcrossUFs = all_of(operands(), [](VPValue *Op) {


Suggested change

bool UniformAcrossUFs = all_of(operands(), [](VPValue *Op) {

bool LoopInvariant = all_of(operands(), [](VPValue *Op) {

this checks more than uniformity across UFs.

And if it is invariance we're checking, we can alternatively check if the recipe itself resides outside vector regions (e.g., has been LICM'd), rather than all of its operands.

The latest version extends the patch to use it also to truncate VPDerivedIVRecipe, if needed. For that, it also checks if the operands are VPDerivedIV/VPCanonicalIVPHIs, which should be considered uniform across all VFs & UFs, unless I am missing something.

ayalz · 2024-01-22T15:22:04Z

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

+  });
+  for (unsigned Part = 0; Part != State.UF; ++Part) {
+    Value *Res;
+    // Only generate a single instance, if the recipe is uniform across all UFs.


This is currently untested / dead code.

The value returned by an EVL recipe should arguably be of the same type as that of its first operand, even if the underlying intrinsic it currently wraps insists on always returning an i32? I.e., that case should arguably be fixed by performing a trunc/sext/zext as needed when providing the adequate EVL value given a current Phi/remaining-trip-count values, to facilitate adding/subtracting them together.

ayalz · 2024-01-22T15:26:29Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

@@ -504,6 +504,15 @@ static VPValue *createScalarIVSteps(VPlan &Plan, const InductionDescriptor &ID,
    HeaderVPBB->insert(BaseIV->getDefiningRecipe(), IP);
  }

+  VPTypeAnalysis TypeInfo(SE.getContext());
+  if (TypeInfo.inferScalarType(BaseIV) != TypeInfo.inferScalarType(Step)) {
+    Step = new VPScalarCastRecipe(Instruction::Trunc, Step,


Assert that the type sizes correspond to Trunc?

fhahn

Addressed latest comments, thanks! Also updated the code to use the new recipe to truncate VPDerivedIVRecipe if needed.

fhahn · 2024-01-22T21:10:30Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ -2297,6 +2297,7 @@ emitTransformedIndex(IRBuilderBase &B, Value *Index, Value *StartValue,
                           ? B.CreateSExtOrTrunc(Index, StepTy)
                           : B.CreateCast(Instruction::SIToFP, Index, StepTy);
  if (CastedIndex != Index) {
+    assert(!isa<SExtInst>(CastedIndex));


This was left over from testing, removed!

fhahn · 2024-01-22T21:11:27Z

llvm/lib/Transforms/Vectorize/VPlan.h

+  /// Cast instruction opcode.
+  Instruction::CastOps Opcode;
+
+  /// Result type for the cast.


Removed, thanks!

fhahn · 2024-01-22T21:13:02Z

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

+}
+
+void VPScalarCastRecipe ::execute(VPTransformState &State) {
+  bool UniformAcrossUFs = all_of(operands(), [](VPValue *Op) {


The latest version extends the patch to use it also to truncate VPDerivedIVRecipe, if needed. For that, it also checks if the operands are VPDerivedIV/VPCanonicalIVPHIs, which should be considered uniform across all VFs & UFs, unless I am missing something.

fhahn · 2024-01-22T21:55:54Z

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

+  });
+  for (unsigned Part = 0; Part != State.UF; ++Part) {
+    Value *Res;
+    // Only generate a single instance, if the recipe is uniform across all UFs.


Updated version of the patch now also uses it for truncating derived IV/canonical IV if needed. Those are still uniform across VFs & UFs, but not invariant.

I suppose EVL at the moment can also be considered uniform across VFs and UFs.

Both the if/else sides should be executed by tests.

fhahn · 2024-01-22T21:58:56Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

@@ -504,6 +504,15 @@ static VPValue *createScalarIVSteps(VPlan &Plan, const InductionDescriptor &ID,
    HeaderVPBB->insert(BaseIV->getDefiningRecipe(), IP);
  }

+  VPTypeAnalysis TypeInfo(SE.getContext());
+  if (TypeInfo.inferScalarType(BaseIV) != TypeInfo.inferScalarType(Step)) {
+    Step = new VPScalarCastRecipe(Instruction::Trunc, Step,


Added, thanks!

ayalz

Nice reuse and clean-up! Title and commit message deserve an update.

ayalz · 2024-01-23T11:56:24Z

llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp

@@ -230,7 +230,11 @@ Type *VPTypeAnalysis::inferScalarType(const VPValue *V) {
            return V->getUnderlyingValue()->getType();
          })
          .Case<VPWidenCastRecipe>(
-              [](const VPWidenCastRecipe *R) { return R->getResultType(); });
+              [](const VPWidenCastRecipe *R) { return R->getResultType(); })
+          .Case<VPExpandSCEVRecipe>([](const VPExpandSCEVRecipe *R) {


So now that VPDerivedIVRecipe simply mirrors the type of its first start-value operand, should it join VPCanonicalIVPHIRecipe et al.'s or VPPredInstPHIRecipe et al.'s cases above and remove its getScalarType() method? Possibly as a follow-up.

Should VPWidenIntOrFpInductionRecipe's trunc also utilize VPScalarCastRecipe, and retire its getScalarType(), possibly as a follow-up?

ayalz · 2024-01-23T12:08:25Z

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

@@ -1469,6 +1461,52 @@ void VPReplicateRecipe::print(raw_ostream &O, const Twine &Indent,
 }
 #endif

+static bool isUniformAcrossVFsAndUFs(VPScalarCastRecipe *C) {


Worth documenting, potentially leaving behind a note that uniformity should essentially be associated with VPValues.

Added comment + TODO, thanks!

ayalz · 2024-01-23T12:09:42Z

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

+         "Codegen only implemented for first lane.");
+  switch (Opcode) {
+  case Instruction::SExt:
+  case Instruction::ZExt:


Only Trunc is currently being used, still.

Yep, could remove if preferred, but State.Builder.CreateCast generically supports any cast

Would be good to either remove or make a note that SExt and ZExt are currently unused/dead cases.

Added a note, thanks!

ayalz · 2024-01-24T22:29:44Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

@@ -498,10 +498,34 @@ static VPValue *createScalarIVSteps(VPlan &Plan, const InductionDescriptor &ID,
  VPCanonicalIVPHIRecipe *CanonicalIV = Plan.getCanonicalIV();
  Type *TruncTy = TruncI ? TruncI->getType() : IVTy;
  VPValue *BaseIV = CanonicalIV;


Can we first check if Start and Step are canonical, regardless of types, in order to introduce a VPDerivedIVRecipe, i.e., remove the last Type parameter from isCanonical():

VPSingleDefRecipe *BaseIV = CanonicalIV; // If the induction needs transforming besides truncating, create a // VPDerivedIVRecipe. if (!CanonicalIV->isCanonical(ID.getKind(), StartV, Step)) { BaseIV = new VPDerivedIVRecipe(ID, StartV, CanonicalIV, Step); HeaderVPBB->insert(BaseIV, IP); }

then check if BaseIV needs to be truncated:

VPTypeAnalysis TypeInfo(SE.getContext()); Type *BaseIVTy = TypeInfo.inferScalarType(BaseIV); if (TruncI && TruncI->getType() != BaseIVTy) { Type *TruncTy = TruncI->getType(); assert(BaseIVTy->getScalarSizeInBits() > TruncTy->getScalarSizeInBits() && BaseIVTy->isIntegerTy() && "Truncation requires an integer step"); BaseIV = new VPScalarCastRecipe(Instruction::Trunc, BaseIV, TruncTy); BaseIVTy = TruncTy; HeaderVPBB->insert(BaseIV, IP); }

and finally check if Step needs to be truncated:

Type *StepTy = TypeInfo.inferScalarType(Step); if (BaseIVTy != StepTy) { assert(StepTy->getScalarSizeInBits() > BaseIVTy->getScalarSizeInBits() && StepTy->isIntegerTy() && "Not truncating."); Step = new VPScalarCastRecipe(Instruction::Trunc, Step, BaseIVTy); auto *VecPreheader = cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSinglePredecessor()); VecPreheader->appendRecipe(Step->getDefiningRecipe()); }

before creating, inserting and returning Steps? Is IVTy needed?

Updated, thanks! This works well now with the refactoring in this patch. IVTy removed.

fhahn

Comments should be addressed and title & description updated, thanks!

fhahn · 2024-01-25T11:06:30Z

llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp

@@ -230,7 +230,11 @@ Type *VPTypeAnalysis::inferScalarType(const VPValue *V) {
            return V->getUnderlyingValue()->getType();
          })
          .Case<VPWidenCastRecipe>(
-              [](const VPWidenCastRecipe *R) { return R->getResultType(); });
+              [](const VPWidenCastRecipe *R) { return R->getResultType(); })
+          .Case<VPExpandSCEVRecipe>([](const VPExpandSCEVRecipe *R) {


VPDerivedIVRecipe getScalarType would be good to remove, will do as follow-up.

VPWidenIntOrFpInductionRecipe's trunc drives truncating the start value and step, effectively creating a narrow phi. I'll check to see if this can also be modeled with the new recipe.

fhahn · 2024-01-25T11:07:37Z

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

+         "Codegen only implemented for first lane.");
+  switch (Opcode) {
+  case Instruction::SExt:
+  case Instruction::ZExt:


Yep, could remove if preferred, but State.Builder.CreateCast generically supports any cast

fhahn · 2024-01-25T11:11:58Z

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

@@ -1469,6 +1461,52 @@ void VPReplicateRecipe::print(raw_ostream &O, const Twine &Indent,
 }
 #endif

+static bool isUniformAcrossVFsAndUFs(VPScalarCastRecipe *C) {


Added comment + TODO, thanks!

fhahn · 2024-01-25T11:20:43Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

@@ -498,10 +498,34 @@ static VPValue *createScalarIVSteps(VPlan &Plan, const InductionDescriptor &ID,
  VPCanonicalIVPHIRecipe *CanonicalIV = Plan.getCanonicalIV();
  Type *TruncTy = TruncI ? TruncI->getType() : IVTy;
  VPValue *BaseIV = CanonicalIV;


Updated, thanks! This works well now with the refactoring in this patch. IVTy removed.

ayalz

Looks good to me, adding some minor suggestions.

ayalz · 2024-01-25T14:45:34Z

llvm/lib/Transforms/Vectorize/VPlan.h

@@ -859,6 +859,7 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPValue {
    case VPRecipeBase::VPWidenIntOrFpInductionSC:
    case VPRecipeBase::VPWidenPointerInductionSC:
    case VPRecipeBase::VPReductionPHISC:
+    case VPRecipeBase::VPScalarCastSC:


nit (independent of this patch): would be good to keep in order.

Yep, will do separately.

ayalz · 2024-01-25T14:52:12Z

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

+         "Codegen only implemented for first lane.");
+  switch (Opcode) {
+  case Instruction::SExt:
+  case Instruction::ZExt:


Would be good to either remove or make a note that SExt and ZExt are currently unused/dead cases.

ayalz · 2024-01-25T14:55:37Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

  VPBasicBlock *HeaderVPBB = Plan.getVectorLoopRegion()->getEntryBasicBlock();
  auto IP = HeaderVPBB->getFirstNonPhi();
  VPCanonicalIVPHIRecipe *CanonicalIV = Plan.getCanonicalIV();
-  Type *TruncTy = TruncI ? TruncI->getType() : IVTy;
  VPValue *BaseIV = CanonicalIV;


nit: defining BaseIV as a VPSingleDefRecipe* rather than VPValue* will allow removal of ->getDefiningRecipe().

Done, thanks!

ayalz · 2024-01-25T14:58:55Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+    auto *T = new VPScalarCastRecipe(Instruction::Trunc, BaseIV, TruncTy);
+    HeaderVPBB->insert(T, IP);
+    BaseIV = T;


Suggested change

auto *T = new VPScalarCastRecipe(Instruction::Trunc, BaseIV, TruncTy);

HeaderVPBB->insert(T, IP);

BaseIV = T;

ResultTy = TruncTy;

BaseIV = new VPScalarCastRecipe(Instruction::Trunc, BaseIV, ResultTy);

HeaderVPBB->insert(BaseIV, IP);

?

There's no real need in T, and more consistent to simply redefine BaseIV = new ... (..., BaseIV, ...), again, as in the proposal above?

Adjusted, thanks!

ayalz · 2024-01-25T15:34:03Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+  VPTypeAnalysis TypeInfo(SE.getContext());
+  Type *StepTy = TypeInfo.inferScalarType(Step);


Suggested change

VPTypeAnalysis TypeInfo(SE.getContext());

Type *StepTy = TypeInfo.inferScalarType(Step);

sunk, thanks!

ayalz · 2024-01-25T15:35:40Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

    HeaderVPBB->insert(BaseIV->getDefiningRecipe(), IP);
  }

+  // Truncate base induction if needed.


VPTypeAnalysis TypeInfo(SE.getContext()); Type *ResultTy = TypeInfo.inferScalarType(BaseIV);

both BaseIV and Step are subject to redefinition and truncation, perhaps aim to define the type of the result which they may truncate to.

Done, thanks! (missed originally)

ayalz · 2024-01-25T15:37:34Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+    assert(TypeInfo.inferScalarType(BaseIV)->getScalarSizeInBits() >
+               TruncTy->getScalarSizeInBits() &&
+           StepTy->isIntegerTy() && "Truncation requires an integer step");


Suggested change

assert(TypeInfo.inferScalarType(BaseIV)->getScalarSizeInBits() >

TruncTy->getScalarSizeInBits() &&

StepTy->isIntegerTy() && "Truncation requires an integer step");

assert(ResultTy->getScalarSizeInBits() >

TruncTy->getScalarSizeInBits() &&

ResultTy->isIntegerTy() && "Truncation requires an integer type");

Updated and also separate into 2 separate asserts.

ayalz · 2024-01-25T15:43:16Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+  }
+
+  // Truncate step if needed.
+  Type *BaseIVTy = TypeInfo.inferScalarType(BaseIV);


Suggested change

Type *BaseIVTy = TypeInfo.inferScalarType(BaseIV);

Type *StepTy = TypeInfo.inferScalarType(Step);

Moved, thanks!

ayalz · 2024-01-25T15:44:54Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+  if (BaseIVTy != StepTy) {
+    assert(StepTy->getScalarSizeInBits() > BaseIVTy->getScalarSizeInBits() &&
+           "Not truncating.");


Suggested change

if (BaseIVTy != StepTy) {

assert(StepTy->getScalarSizeInBits() > BaseIVTy->getScalarSizeInBits() &&

"Not truncating.");

if (StepTy != ResultTy) {

assert(StepTy->getScalarSizeInBits() > ResultTy->getScalarSizeInBits() &&

StepTy->isIntegerTy() && "Truncation requires an integer step");

Renamed thanks!

ayalz · 2024-01-25T15:45:19Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+    assert(StepTy->getScalarSizeInBits() > BaseIVTy->getScalarSizeInBits() &&
+           "Not truncating.");
+
+    Step = new VPScalarCastRecipe(Instruction::Trunc, Step, BaseIVTy);


Suggested change

Step = new VPScalarCastRecipe(Instruction::Trunc, Step, BaseIVTy);

Step = new VPScalarCastRecipe(Instruction::Trunc, Step, ResultTy);

Renamed, thanks!

fhahn

Address latest comments, thanks!

fhahn · 2024-01-25T17:50:18Z

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

+         "Codegen only implemented for first lane.");
+  switch (Opcode) {
+  case Instruction::SExt:
+  case Instruction::ZExt:


Added a note, thanks!

fhahn · 2024-01-25T17:50:34Z

llvm/lib/Transforms/Vectorize/VPlan.h

@@ -859,6 +859,7 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPValue {
    case VPRecipeBase::VPWidenIntOrFpInductionSC:
    case VPRecipeBase::VPWidenPointerInductionSC:
    case VPRecipeBase::VPReductionPHISC:
+    case VPRecipeBase::VPScalarCastSC:


Yep, will do separately.

fhahn · 2024-01-25T17:52:15Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

  VPBasicBlock *HeaderVPBB = Plan.getVectorLoopRegion()->getEntryBasicBlock();
  auto IP = HeaderVPBB->getFirstNonPhi();
  VPCanonicalIVPHIRecipe *CanonicalIV = Plan.getCanonicalIV();
-  Type *TruncTy = TruncI ? TruncI->getType() : IVTy;
  VPValue *BaseIV = CanonicalIV;


Done, thanks!

fhahn · 2024-01-25T17:54:56Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+    assert(StepTy->getScalarSizeInBits() > BaseIVTy->getScalarSizeInBits() &&
+           "Not truncating.");
+
+    Step = new VPScalarCastRecipe(Instruction::Trunc, Step, BaseIVTy);


Renamed, thanks!

fhahn · 2024-01-25T17:55:36Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+  if (BaseIVTy != StepTy) {
+    assert(StepTy->getScalarSizeInBits() > BaseIVTy->getScalarSizeInBits() &&
+           "Not truncating.");


Renamed thanks!

fhahn · 2024-01-25T17:55:45Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+  }
+
+  // Truncate step if needed.
+  Type *BaseIVTy = TypeInfo.inferScalarType(BaseIV);


Moved, thanks!

fhahn · 2024-01-25T17:56:23Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+  VPTypeAnalysis TypeInfo(SE.getContext());
+  Type *StepTy = TypeInfo.inferScalarType(Step);


sunk, thanks!

ayalz

Some comments still seem relevant, trying to further clarify.

ayalz · 2024-01-25T22:06:55Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+
+  // Truncate base induction if needed.
+  VPTypeAnalysis TypeInfo(SE.getContext());
+  Type *StepTy = TypeInfo.inferScalarType(Step);


Suggested change

Type *StepTy = TypeInfo.inferScalarType(Step);

Suffice to define StepTy later, when dealing with the truncation of Step. Here we're dealing with truncating BaseIV, and may be good to define its current/original type instead (using "BaseIVTy" may be confusing or deserves updating due to changing BaseIV and its type).

Updated, thanks!

ayalz · 2024-01-25T22:08:53Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+    Type *TruncTy = TruncI->getType();
+    assert(TypeInfo.inferScalarType(BaseIV)->getScalarSizeInBits() >
+               TruncTy->getScalarSizeInBits() &&
+           StepTy->isIntegerTy() && "Truncation requires an integer step");


Suggested change

StepTy->isIntegerTy() && "Truncation requires an integer step");

TypeInfo.inferScalarType(BaseIV)->isIntegerTy() && "Truncation requires an integer");

this should assert that the truncation of BaseIV is valid, independent of Step. As in the proposal above, which uses ResultTy.

Updated, thanks!

ayalz · 2024-01-25T22:11:48Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+    auto *T = new VPScalarCastRecipe(Instruction::Trunc, BaseIV, TruncTy);
+    HeaderVPBB->insert(T, IP);
+    BaseIV = T;


There's no real need in T, and more consistent to simply redefine BaseIV = new ... (..., BaseIV, ...), again, as in the proposal above?

ayalz · 2024-01-25T22:14:07Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+  }
+
+  // Truncate step if needed.
+  Type *ResultTy = TypeInfo.inferScalarType(BaseIV);


Define StepTy here?

Moved, thanks!

fhahn

Missed comments should be addressed now, seems I missed them in the GitHub UI somehow

fhahn · 2024-01-25T22:58:58Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

    HeaderVPBB->insert(BaseIV->getDefiningRecipe(), IP);
  }

+  // Truncate base induction if needed.


Done, thanks! (missed originally)

fhahn · 2024-01-25T23:01:37Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+
+  // Truncate base induction if needed.
+  VPTypeAnalysis TypeInfo(SE.getContext());
+  Type *StepTy = TypeInfo.inferScalarType(Step);


Updated, thanks!

fhahn · 2024-01-25T23:02:01Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+    assert(TypeInfo.inferScalarType(BaseIV)->getScalarSizeInBits() >
+               TruncTy->getScalarSizeInBits() &&
+           StepTy->isIntegerTy() && "Truncation requires an integer step");


Updated and also separate into 2 separate asserts.

fhahn · 2024-01-25T23:03:15Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+    auto *T = new VPScalarCastRecipe(Instruction::Trunc, BaseIV, TruncTy);
+    HeaderVPBB->insert(T, IP);
+    BaseIV = T;


Adjusted, thanks!

fhahn · 2024-01-25T23:03:22Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+  }
+
+  // Truncate step if needed.
+  Type *ResultTy = TypeInfo.inferScalarType(BaseIV);


Moved, thanks!

fhahn · 2024-01-25T23:07:19Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+    Type *TruncTy = TruncI->getType();
+    assert(TypeInfo.inferScalarType(BaseIV)->getScalarSizeInBits() >
+               TruncTy->getScalarSizeInBits() &&
+           StepTy->isIntegerTy() && "Truncation requires an integer step");


Updated, thanks!

ayalz

Missing an update of ResultTy, and possibly some test(s)?

ayalz · 2024-01-25T23:19:19Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+    assert(ResultTy->getScalarSizeInBits() > TruncTy->getScalarSizeInBits() &&
+           "Not truncating.");
+    assert(ResultTy->isIntegerTy() && "Truncation requires an integer type");
+    BaseIV = new VPScalarCastRecipe(Instruction::Trunc, BaseIV, TruncTy);


Ah, ResultTy should be reset to TruncTy here, so that Step will be truncated to the updated type of BaseIV, rather than its original type. (Or reset right after the asserts, so that both truncations use ResultTy.) Wonder about test coverage if all pass?

Argh, sent the update too early (or too late in the day)! Should be fixed now., there were a number of test failures.

ayalz

Looks good to me, thanks!
Adding a last minor nit.

ayalz · 2024-01-26T10:18:36Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+    auto *VecPreheader =
+        cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSinglePredecessor());


Suggested change

auto *VecPreheader =

cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSinglePredecessor());

auto *VecPreheader =

cast<VPBasicBlock>(HeaderVPBB->getSingleHierarchicalPredecessor());

?

Done, thanks!

fhahn requested review from ayalz and aniragil January 14, 2024 20:16

llvmbot added vectorization llvm:transforms labels Jan 14, 2024

ayalz reviewed Jan 15, 2024

View reviewed changes

fhahn mentioned this pull request Jan 16, 2024

[LV, VP]VP intrinsics support for the Loop Vectorizer + adding new tail-folding mode using EVL. #76172

Merged

fhahn added 2 commits January 21, 2024 19:55

Merge branch 'main' into vpuniformforuf-recipe

f5f961e

!fixup specialize to VPScalarCastRecipe for now.

6b3e52e

ayalz reviewed Jan 22, 2024

View reviewed changes

!fixup address latest comments, thanks!

9331a45

!fixup fix formatting

9988d78

ayalz reviewed Jan 22, 2024

View reviewed changes

fhahn added 2 commits January 22, 2024 16:12

Merge branch 'main' into vpuniformforuf-recipe

fe5ec9e

!fixup address comments, use to truncate VPDerivedIVRecipe.

5500bdb

fhahn commented Jan 22, 2024

View reviewed changes

ayalz reviewed Jan 24, 2024

View reviewed changes

Merge branch 'main' into vpuniformforuf-recipe

18cd7d4

fhahn changed the title ~~[VPlan] Add new VPUniformPerUFRecipe, use for step truncation.~~ [VPlan] Add new VPScalarCastRecipe, use for IV & step trunc. Jan 25, 2024

!fixup address latest comments, thanks!

f1f1eff

fhahn commented Jan 25, 2024

View reviewed changes

ayalz reviewed Jan 25, 2024

View reviewed changes

!fixup address latest comments, thanks!

d382232

fhahn commented Jan 25, 2024

View reviewed changes

ayalz reviewed Jan 25, 2024

View reviewed changes

!fixup address missed comments

6efa577

fhahn commented Jan 25, 2024

View reviewed changes

Merge branch 'main' into vpuniformforuf-recipe

e756798

ayalz reviewed Jan 25, 2024

View reviewed changes

!Fixup update ResultTy.

3c31111

ayalz approved these changes Jan 26, 2024

View reviewed changes

fhahn added 2 commits January 26, 2024 10:32

Merge branch 'main' into vpuniformforuf-recipe

dc7990d

!fixup use getSingleHierarchicalPredecessor.

adc5441

fhahn merged commit 0ab539f into llvm:main Jan 26, 2024
3 of 4 checks passed

fhahn deleted the vpuniformforuf-recipe branch January 26, 2024 11:13

	/// VPScalarCastRecipe is a recipe o create scalar cast instructions.
	/// VPScalarCastRecipe is a recipe to create scalar cast instructions.


		~VPScalarCastRecipe() override = default;

		VP_CLASSOF_IMPL(VPDef::VPWidenIntOrFpInductionSC)

	"Codegen only implemented for first lane only.");
	"Codegen only implemented for first lane.");

	bool UniformAcrossUFs = all_of(operands(), [](VPValue *Op) {
	bool LoopInvariant = all_of(operands(), [](VPValue *Op) {

		VPTypeAnalysis TypeInfo(SE.getContext());
		Type *StepTy = TypeInfo.inferScalarType(Step);

	Type *BaseIVTy = TypeInfo.inferScalarType(BaseIV);
	Type *StepTy = TypeInfo.inferScalarType(Step);

	Step = new VPScalarCastRecipe(Instruction::Trunc, Step, BaseIVTy);
	Step = new VPScalarCastRecipe(Instruction::Trunc, Step, ResultTy);

	StepTy->isIntegerTy() && "Truncation requires an integer step");
	TypeInfo.inferScalarType(BaseIV)->isIntegerTy() && "Truncation requires an integer");

		auto *VecPreheader =
		cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSinglePredecessor());

[VPlan] Add new VPScalarCastRecipe, use for IV & step trunc. #78113

[VPlan] Add new VPScalarCastRecipe, use for IV & step trunc. #78113

Conversation

fhahn commented Jan 14, 2024 • edited

llvmbot commented Jan 14, 2024

ayalz left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fhahn commented Jan 15, 2024

ayalz commented Jan 21, 2024 • edited

fhahn commented Jan 21, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

github-actions bot commented Jan 22, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fhahn left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ayalz left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fhahn left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ayalz left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fhahn commented Jan 14, 2024 •

edited

ayalz commented Jan 21, 2024 •

edited

github-actions bot commented Jan 22, 2024 •

edited