Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VPlan] Add new VPScalarCastRecipe, use for IV & step trunc. #78113

Merged
merged 15 commits into from
Jan 26, 2024

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Jan 14, 2024

Add a new recipe to model scalar cast instructions, without relying on an underlying instruction.

This allows creating scalar casts, without relying on an underlying instruction (like the current
VPReplicateRecipe). The new recipe is used to explicitly model both truncating the induction step
and the VPDerivedIVRecipe, thus simplifying both the recipe and code needed to introduce it.

Truncating VPWidenIntOrFpInductionRecipes should also be modeled using the new recipe, as
follow-up.

Add a new recipe to model uniform-per-UF instructions, without relying
on an underlying instruction. Initially, it supports uniform cast-ops
and is therefore storing the result type.

Not relying on an underlying instruction (like the current
VPReplicateRecipe) allows to create instances without a corresponding
instruction.

In the future, to plan is to extend this recipe to handle all opcodes
needed to replace the uniform part of VPReplicateRecipe.
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 14, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Add a new recipe to model uniform-per-UF instructions, without relying on an underlying instruction. Initially, it supports uniform cast-ops and is therefore storing the result type.

Not relying on an underlying instruction (like the current VPReplicateRecipe) allows to create instances without a corresponding instruction.

In the future, to plan is to extend this recipe to handle all opcodes needed to replace the uniform part of VPReplicateRecipe.


Full diff: https://github.com/llvm/llvm-project/pull/78113.diff

8 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+30)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (+5-1)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+42-7)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+9)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanValue.h (+1)
  • (modified) llvm/test/Transforms/LoopVectorize/cast-induction.ll (+3-1)
  • (modified) llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll (+2-1)
  • (modified) llvm/test/Transforms/LoopVectorize/pr46525-expander-insertpoint.ll (+1-1)
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 4b4f4911eb6415..d5985224cccc48 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -1945,6 +1945,36 @@ class VPReplicateRecipe : public VPRecipeWithIRFlags, public VPValue {
   }
 };
 
+/// VPUniformPerUFRecipe represents an instruction with Opcode that is uniform
+/// per UF, i.e. it generates a single scalar instance per UF.
+/// TODO: at the moment, only Cast opcodes are supported, extend to support
+///       missing opcodes to replace uniform part of VPReplicateRecipe.
+class VPUniformPerUFRecipe : public VPRecipeBase, public VPValue {
+  unsigned Opcode;
+
+  /// Result type for the cast.
+  Type *ResultTy;
+
+  Value *generate(VPTransformState &State, unsigned Part);
+
+public:
+  VPUniformPerUFRecipe(Instruction::CastOps Opcode, VPValue *Op, Type *ResultTy)
+      : VPRecipeBase(VPDef::VPUniformPerUFSC, {Op}), VPValue(this),
+        Opcode(Opcode), ResultTy(ResultTy) {}
+
+  ~VPUniformPerUFRecipe() override = default;
+
+  VP_CLASSOF_IMPL(VPDef::VPWidenIntOrFpInductionSC)
+
+  void execute(VPTransformState &State) override;
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+  /// Print the recipe.
+  void print(raw_ostream &O, const Twine &Indent,
+             VPSlotTracker &SlotTracker) const override;
+#endif
+};
+
 /// A recipe for generating conditional branches on the bits of a mask.
 class VPBranchOnMaskRecipe : public VPRecipeBase {
 public:
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index 97a8a1803bbf5a..d71b0703994450 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -230,7 +230,11 @@ Type *VPTypeAnalysis::inferScalarType(const VPValue *V) {
             return V->getUnderlyingValue()->getType();
           })
           .Case<VPWidenCastRecipe>(
-              [](const VPWidenCastRecipe *R) { return R->getResultType(); });
+              [](const VPWidenCastRecipe *R) { return R->getResultType(); })
+          .Case<VPExpandSCEVRecipe>([](const VPExpandSCEVRecipe *R) {
+            return R->getSCEV()->getType();
+          });
+
   assert(ResultTy && "could not infer type for the given VPValue");
   CachedTypes[V] = ResultTy;
   return ResultTy;
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 1f844bce23102e..423504e8f7e05e 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -164,6 +164,8 @@ bool VPRecipeBase::mayHaveSideEffects() const {
     auto *R = cast<VPReplicateRecipe>(this);
     return R->getUnderlyingInstr()->mayHaveSideEffects();
   }
+  case VPUniformPerUFSC:
+    return false;
   default:
     return true;
   }
@@ -1117,13 +1119,7 @@ void VPScalarIVStepsRecipe::execute(VPTransformState &State) {
 
   // Ensure step has the same type as that of scalar IV.
   Type *BaseIVTy = BaseIV->getType()->getScalarType();
-  if (BaseIVTy != Step->getType()) {
-    // TODO: Also use VPDerivedIVRecipe when only the step needs truncating, to
-    // avoid separate truncate here.
-    assert(Step->getType()->isIntegerTy() &&
-           "Truncation requires an integer step");
-    Step = State.Builder.CreateTrunc(Step, BaseIVTy);
-  }
+  assert(BaseIVTy == Step->getType());
 
   // We build scalar steps for both integer and floating-point induction
   // variables. Here, we determine the kind of arithmetic we will perform.
@@ -1469,6 +1465,45 @@ void VPReplicateRecipe::print(raw_ostream &O, const Twine &Indent,
 }
 #endif
 
+Value *VPUniformPerUFRecipe ::generate(VPTransformState &State, unsigned Part) {
+  switch (Opcode) {
+  case Instruction::SExt:
+  case Instruction::ZExt:
+  case Instruction::Trunc: {
+    Value *Op = State.get(getOperand(0), VPIteration(Part, 0));
+    return State.Builder.CreateCast(Instruction::CastOps(Opcode), Op, ResultTy);
+  }
+  default:
+    llvm_unreachable("opcode not implemented yet");
+  }
+}
+
+void VPUniformPerUFRecipe ::execute(VPTransformState &State) {
+  bool UniformAcrossUFs = all_of(operands(), [](VPValue *Op) {
+    return Op->isDefinedOutsideVectorRegions();
+  });
+  for (unsigned Part = 0; Part != State.UF; ++Part) {
+    Value *Res;
+    // Only generate a single instance, if the recipe is uniform across all UFs.
+    if (Part > 0 && UniformAcrossUFs)
+      Res = State.get(this, VPIteration(0, 0));
+    else
+      Res = generate(State, Part);
+    State.set(this, Res, VPIteration(Part, 0));
+  }
+}
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+void VPUniformPerUFRecipe ::print(raw_ostream &O, const Twine &Indent,
+                                  VPSlotTracker &SlotTracker) const {
+  O << Indent << "UNIFORM-PER-UF ";
+  printAsOperand(O, SlotTracker);
+  O << " = " << Instruction::getOpcodeName(Opcode) << " ";
+  printOperands(O, SlotTracker);
+  O << " to " << *ResultTy;
+}
+#endif
+
 void VPBranchOnMaskRecipe::execute(VPTransformState &State) {
   assert(State.Instance && "Branch on Mask works only on single instance.");
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index b3694e74a38509..6ba8901e76aa50 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -505,6 +505,15 @@ static VPValue *createScalarIVSteps(VPlan &Plan, const InductionDescriptor &ID,
     HeaderVPBB->insert(BaseIV->getDefiningRecipe(), IP);
   }
 
+  VPTypeAnalysis TypeInfo(SE.getContext());
+  if (TypeInfo.inferScalarType(BaseIV) != TypeInfo.inferScalarType(Step)) {
+    Step = new VPUniformPerUFRecipe(Instruction::Trunc, Step,
+                                    TypeInfo.inferScalarType(BaseIV));
+    auto *VecPreheader =
+        cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSinglePredecessor());
+    VecPreheader->appendRecipe(Step->getDefiningRecipe());
+  }
+
   VPScalarIVStepsRecipe *Steps = new VPScalarIVStepsRecipe(ID, BaseIV, Step);
   HeaderVPBB->insert(Steps, IP);
   return Steps;
diff --git a/llvm/lib/Transforms/Vectorize/VPlanValue.h b/llvm/lib/Transforms/Vectorize/VPlanValue.h
index 8cc98f4abf933e..009edea39a3c43 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanValue.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanValue.h
@@ -362,6 +362,7 @@ class VPDef {
     // START: Phi-like recipes. Need to be kept together.
     VPBlendSC,
     VPPredInstPHISC,
+    VPUniformPerUFSC,
     // START: SubclassID for recipes that inherit VPHeaderPHIRecipe.
     // VPHeaderPHIRecipe need to be kept together.
     VPCanonicalIVPHISC,
diff --git a/llvm/test/Transforms/LoopVectorize/cast-induction.ll b/llvm/test/Transforms/LoopVectorize/cast-induction.ll
index c5edf9831d7d90..4121a1399c47f5 100644
--- a/llvm/test/Transforms/LoopVectorize/cast-induction.ll
+++ b/llvm/test/Transforms/LoopVectorize/cast-induction.ll
@@ -83,12 +83,14 @@ define void @cast_variable_step(i64 %step) {
 ; VF4: middle.block:
 ;
 ; IC2-LABEL: @cast_variable_step(
+; IC2:   [[TRUNC_STEP:%.+]] = trunc i64 %step to i32
+; IC2:   br label %vector.body
+
 ; IC2-LABEL: vector.body:
 ; IC2-NEXT:   [[INDEX:%.+]] = phi i64 [ 0, %vector.ph ]
 ; IC2-NEXT:   [[MUL:%.+]] = mul i64 %index, %step
 ; IC2-NEXT:   [[OFFSET_IDX:%.+]] = add i64 10, [[MUL]]
 ; IC2-NEXT:   [[TRUNC_OFF:%.+]] = trunc i64 [[OFFSET_IDX]] to i32
-; IC2-NEXT:   [[TRUNC_STEP:%.+]] = trunc i64 %step to i32
 ; IC2-NEXT:   [[STEP0:%.+]] = mul i32 0, [[TRUNC_STEP]]
 ; IC2-NEXT:   [[T0:%.+]] = add i32 [[TRUNC_OFF]], [[STEP0]]
 ; IC2-NEXT:   [[STEP1:%.+]] = mul i32 1, [[TRUNC_STEP]]
diff --git a/llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll b/llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll
index 297cd2a7c12f9a..6410a556589f94 100644
--- a/llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll
+++ b/llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll
@@ -184,6 +184,7 @@ exit:
 ; DBG-NEXT: No successors
 ; DBG-EMPTY:
 ; DBG-NEXT: vector.ph:
+; DBG-NEXT:   UNIFORM-PER-UF vp<[[CAST:%.+]]> = trunc ir<1> to i32
 ; DBG-NEXT: Successor(s): vector loop
 ; DBG-EMPTY:
 ; DBG-NEXT: <x1> vector loop: {
@@ -191,7 +192,7 @@ exit:
 ; DBG-NEXT:     EMIT vp<[[CAN_IV:%.+]]> = CANONICAL-INDUCTION
 ; DBG-NEXT:     FIRST-ORDER-RECURRENCE-PHI ir<%for> = phi ir<0>, vp<[[SCALAR_STEPS:.+]]>
 ; DBG-NEXT:     vp<[[DERIVED_IV:%.+]]> = DERIVED-IV ir<0> + vp<[[CAN_IV]]> * ir<1> (truncated to i32)
-; DBG-NEXT:     vp<[[SCALAR_STEPS]]> = SCALAR-STEPS vp<[[DERIVED_IV]]>, ir<1>
+; DBG-NEXT:     vp<[[SCALAR_STEPS]]> = SCALAR-STEPS vp<[[DERIVED_IV]]>, vp<[[CAST]]>
 ; DBG-NEXT:     EMIT vp<[[SPLICE:%.+]]> = first-order splice ir<%for>, vp<[[SCALAR_STEPS]]>
 ; DBG-NEXT:     CLONE store vp<[[SPLICE]]>, ir<%dst>
 ; DBG-NEXT:     EMIT vp<[[IV_INC:%.+]]> = add nuw vp<[[CAN_IV]]>, vp<[[VFxUF]]>
diff --git a/llvm/test/Transforms/LoopVectorize/pr46525-expander-insertpoint.ll b/llvm/test/Transforms/LoopVectorize/pr46525-expander-insertpoint.ll
index ea3de4a0fbb363..f0220f5e766b23 100644
--- a/llvm/test/Transforms/LoopVectorize/pr46525-expander-insertpoint.ll
+++ b/llvm/test/Transforms/LoopVectorize/pr46525-expander-insertpoint.ll
@@ -43,7 +43,7 @@ define void @test(i16 %x, i64 %y, ptr %ptr) {
 ; CHECK-NEXT:    [[V3:%.*]] = add i8 [[V2]], 1
 ; CHECK-NEXT:    [[CMP15:%.*]] = icmp slt i8 [[V3]], 5
 ; CHECK-NEXT:    [[IV_NEXT]] = add i64 [[IV]], [[INC]]
-; CHECK-NEXT:    br i1 [[CMP15]], label [[LOOP]], label [[LOOP_EXIT]], !llvm.loop [[LOOP2:![0-9]+]]
+; CHECK-NEXT:    br i1 [[CMP15]], label [[LOOP]], label [[LOOP_EXIT]], !llvm.loop [[LOOP3:![0-9]+]]
 ; CHECK:       loop.exit:
 ; CHECK-NEXT:    [[DIV_1:%.*]] = udiv i64 [[Y]], [[ADD]]
 ; CHECK-NEXT:    [[V1:%.*]] = add i64 [[DIV_1]], 1

Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to cast a scalar uniform step (placed in the preheader) could we adapt VPWidenCastRecipe - which is a per-part recipe practically widening the resulting type (only) - by having it extend/truncate a given scalar or the elements of a given vector, per part - depending on whether its operand is a vector or scalar? (Could possibly rename it, and/or potentially also get the element count of the latter instead of State.VF, if preferred.)

Being uniform across VF lanes of each part, or across VF*UF, is more accurately associated with each VPValue than with recipes. The latter may propagate uniformity from their operands to the values they define, as in cast instructions. In some cases, such as integer divide (and floor) by VF or a multiple thereof, a non-uniform (consecutive, aligned on VF) operand may yield a uniform result.

Comment on lines 1120 to 1122
// Ensure step has the same type as that of scalar IV.
Type *BaseIVTy = BaseIV->getType()->getScalarType();
if (BaseIVTy != Step->getType()) {
// TODO: Also use VPDerivedIVRecipe when only the step needs truncating, to
// avoid separate truncate here.
assert(Step->getType()->isIntegerTy() &&
"Truncation requires an integer step");
Step = State.Builder.CreateTrunc(Step, BaseIVTy);
}
assert(BaseIVTy == Step->getType());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: fold comment into assert message.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assert is still missing a message, potentially that of the comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, thanks!

@fhahn
Copy link
Contributor Author

fhahn commented Jan 15, 2024

In order to cast a scalar uniform step (placed in the preheader) could we adapt VPWidenCastRecipe - which is a per-part recipe practically widening the resulting type (only) - by having it extend/truncate a given scalar or the elements of a given vector, per part - depending on whether its operand is a vector or scalar? (Could possibly rename it, and/or potentially also get the element count of the latter instead of State.VF, if preferred.)

Being uniform across VF lanes of each part, or across VF*UF, is more accurately associated with each VPValue than with recipes. The latter may propagate uniformity from their operands to the values they define, as in cast instructions. In some cases, such as integer divide (and floor) by VF or a multiple thereof, a non-uniform (consecutive, aligned on VF) operand may yield a uniform result.

We could adjust VPWidenCastRecipe to support the uniform-per-UF case by using the existing uniformity checks, but having recipes that support generating both scalar and vector code will make the individual recipes (slightly) more complicated (need extra checks and handle multiple cases in ::execute or cost estimates). Having separate recipes for the widen (vectorizing) and uniform cases would also allow to encode info explicitly and make it easier when reading the textual representation of a VPlan.

It also depends what we want to do with VPReplicateRecipe long-term, which currently models codegen for both replicating and uniform scalar codegen. Breaking up the recipe into 2 separate ones would help improve clarity and simplify codegen. As part of doing so we should try to make sure the new recipes can be created without an underlying IR instruction (which VPReplicateRecipe requires at the moment).

Note that after explicit unrolling lands, the PerUF part will be dropped.

@ayalz
Copy link
Collaborator

ayalz commented Jan 21, 2024

In order to cast a scalar uniform step (placed in the preheader) could we adapt VPWidenCastRecipe - which is a per-part recipe practically widening the resulting type (only) - by having it extend/truncate a given scalar or the elements of a given vector, per part - depending on whether its operand is a vector or scalar? (Could possibly rename it, and/or potentially also get the element count of the latter instead of State.VF, if preferred.)
Being uniform across VF lanes of each part, or across VF*UF, is more accurately associated with each VPValue than with recipes. The latter may propagate uniformity from their operands to the values they define, as in cast instructions. In some cases, such as integer divide (and floor) by VF or a multiple thereof, a non-uniform (consecutive, aligned on VF) operand may yield a uniform result.

We could adjust VPWidenCastRecipe to support the uniform-per-UF case by using the existing uniformity checks, but having recipes that support generating both scalar and vector code will make the individual recipes (slightly) more complicated (need extra checks and handle multiple cases in ::execute or cost estimates). Having separate recipes for the widen (vectorizing) and uniform cases would also allow to encode info explicitly and make it easier when reading the textual representation of a VPlan.

It also depends what we want to do with VPReplicateRecipe long-term, which currently models codegen for both replicating and uniform scalar codegen. Breaking up the recipe into 2 separate ones would help improve clarity and simplify codegen. As part of doing so we should try to make sure the new recipes can be created without an underlying IR instruction (which VPReplicateRecipe requires at the moment).

Note that after explicit unrolling lands, the PerUF part will be dropped.

The VPUniformPerUFRecipe of the current patch deals only with casts, so best name it as such, at-least for now, and define it comparable to VPWidenCastRecipe. Perhaps VPScalarCastRecipe would work - it effectively serves Invariant values placed outside the loop that are Uniform across all trip-count iterations, not only across the loop's VF and/or UF. Down the road, it should presumably be possible to query whether its VPValue operand is scalar or vector along with its uniformity across VF and/or UF or the entire Trip Count, and use that for cost and code-generation, or for asserting the case of distinct recipes.

How to refactor VPReplicateRecipe, relieving it of relying on an underlying IR instruction, and dealing with uniform cases, deserves further thoughts. Renaming may be adequate when recipe(s) generate scalar instruction(s) in general, rather than "replicate" an existing underlying Instruction. I.e., in contrast to VPReplicateCastRecipe.

One aspect of VPWidenCastRecipe is its destination type. A vector type, currently set during ::execute, can be set as soon as VF is known. I.e., when VPlan's range of VF's holds a single value - either initially or eventually during optimizeForVFAndUF.

Yes, explicit unrolling-by-UF should definitely drop per-UF logic thereby simplifying recipe::execute overall. Similar unrolling-by-VF could simplify Replicating Region logic.

@fhahn
Copy link
Contributor Author

fhahn commented Jan 21, 2024

The VPUniformPerUFRecipe of the current patch deals only with casts, so best name it as such, at-least for now, and define it comparable to VPWidenCastRecipe. Perhaps VPScalarCastRecipe would work - it effectively serves Invariant values placed outside the loop that are Uniform across all trip-count iterations, not only across the loop's VF and/or UF. Down the road, it should presumably be possible to query whether its VPValue operand is scalar or vector along with its uniformity across VF and/or UF or the entire Trip Count, and use that for cost and code-generation, or for asserting the case of distinct recipes.

Thanks, for now I updated the naming to reflect the cast-only case. Codegen is only implemented for the cases where only the first lane is used (as per isOnlyFirstLaneUsed check).

How to refactor VPReplicateRecipe, relieving it of relying on an underlying IR instruction, and dealing with uniform cases, deserves further thoughts. Renaming may be adequate when recipe(s) generate scalar instruction(s) in general, rather than "replicate" an existing underlying Instruction. I.e., in contrast to VPReplicateCastRecipe.

Sounds good!

One aspect of VPWidenCastRecipe is its destination type. A vector type, currently set during ::execute, can be set as soon as VF is known. I.e., when VPlan's range of VF's holds a single value - either initially or eventually during optimizeForVFAndUF.

Yes, that may be another option down the line.

@@ -1338,6 +1339,34 @@ class VPWidenCastRecipe : public VPRecipeWithIRFlags {
Type *getResultType() const { return ResultTy; }
};

/// VPScalarCastRecipe is a recipe o create scalar cast instructions.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// VPScalarCastRecipe is a recipe o create scalar cast instructions.
/// VPScalarCastRecipe is a recipe to create scalar cast instructions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks!

@@ -1338,6 +1339,34 @@ class VPWidenCastRecipe : public VPRecipeWithIRFlags {
Type *getResultType() const { return ResultTy; }
};

/// VPScalarCastRecipe is a recipe o create scalar cast instructions.
class VPScalarCastRecipe : public VPRecipeBase, public VPValue {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VPSingleDefRecipe ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted, thanks!

@@ -1338,6 +1339,34 @@ class VPWidenCastRecipe : public VPRecipeWithIRFlags {
Type *getResultType() const { return ResultTy; }
};

/// VPScalarCastRecipe is a recipe o create scalar cast instructions.
class VPScalarCastRecipe : public VPRecipeBase, public VPValue {
/// Cast instruction opcode.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: comment adds little knowledge if any.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped, thanks!

/// Cast instruction opcode.
Instruction::CastOps Opcode;

/// Result type for the cast.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: comment adds little knowledge if any.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, thanks!


~VPScalarCastRecipe() override = default;

VP_CLASSOF_IMPL(VPDef::VPWidenIntOrFpInductionSC)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VPWidenIntOrFpInductionSC ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks!

@@ -1469,6 +1465,47 @@ void VPReplicateRecipe::print(raw_ostream &O, const Twine &Indent,
}
#endif

Value *VPScalarCastRecipe ::generate(VPTransformState &State, unsigned Part) {
assert(vputils::onlyFirstLaneUsed(this) &&
"Codegen only implemented for first lane only.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can simplify into an assert that the Opcode is SExt, ZExt or Trunc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation here is intentionally kept generic for any cast where only the first lane is used, so it can be used directly in #76172

@@ -1469,6 +1465,47 @@ void VPReplicateRecipe::print(raw_ostream &O, const Twine &Indent,
}
#endif

Value *VPScalarCastRecipe ::generate(VPTransformState &State, unsigned Part) {
assert(vputils::onlyFirstLaneUsed(this) &&
"Codegen only implemented for first lane only.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Codegen only implemented for first lane only.");
"Codegen only implemented for first lane.");

(only need a single only)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped second only, thanks!

});
for (unsigned Part = 0; Part != State.UF; ++Part) {
Value *Res;
// Only generate a single instance, if the recipe is uniform across all UFs.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a single instance is going to be generated, in the preheader? Recipe is uniform across all UFs by (current) definition (and even more so - invariant). Can assert UniformAcrossUFs holds, along with onlyFirstLaneUsed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation here is intentionally kept generic for any cast where only the first lane is used, so it can be used directly in #76172

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is currently untested / dead code.

The value returned by an EVL recipe should arguably be of the same type as that of its first operand, even if the underlying intrinsic it currently wraps insists on always returning an i32? I.e., that case should arguably be fixed by performing a trunc/sext/zext as needed when providing the adequate EVL value given a current Phi/remaining-trip-count values, to facilitate adding/subtracting them together.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated version of the patch now also uses it for truncating derived IV/canonical IV if needed. Those are still uniform across VFs & UFs, but not invariant.

I suppose EVL at the moment can also be considered uniform across VFs and UFs.

Both the if/else sides should be executed by tests.

@@ -504,6 +504,15 @@ static VPValue *createScalarIVSteps(VPlan &Plan, const InductionDescriptor &ID,
HeaderVPBB->insert(BaseIV->getDefiningRecipe(), IP);
}

VPTypeAnalysis TypeInfo(SE.getContext());
if (TypeInfo.inferScalarType(BaseIV) != TypeInfo.inferScalarType(Step)) {
Step = new VPScalarCastRecipe(Instruction::Trunc, Step,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this recipe only generate Trunc? No need to hold an Opcode, can be renamed VPScalarTruncRecipe. Assert that the type sizes correspond to Trunc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, here the only use-case is Trunc, but the implementation of the recipe is intentionally kept generic for any cast where only the first lane is used, so it can be used directly in #76172

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assert that the type sizes correspond to Trunc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, thanks!

@@ -362,6 +362,7 @@ class VPDef {
// START: Phi-like recipes. Need to be kept together.
VPBlendSC,
VPPredInstPHISC,
VPScalarCastSC,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Placed inside Phi-like recipes interval?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved, thanks!

Copy link

github-actions bot commented Jan 22, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@@ -2297,6 +2297,7 @@ emitTransformedIndex(IRBuilderBase &B, Value *Index, Value *StartValue,
? B.CreateSExtOrTrunc(Index, StepTy)
: B.CreateCast(Instruction::SIToFP, Index, StepTy);
if (CastedIndex != Index) {
assert(!isa<SExtInst>(CastedIndex));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: error message missing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was left over from testing, removed!

}

void VPScalarCastRecipe ::execute(VPTransformState &State) {
bool UniformAcrossUFs = all_of(operands(), [](VPValue *Op) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bool UniformAcrossUFs = all_of(operands(), [](VPValue *Op) {
bool LoopInvariant = all_of(operands(), [](VPValue *Op) {

this checks more than uniformity across UFs.

And if it is invariance we're checking, we can alternatively check if the recipe itself resides outside vector regions (e.g., has been LICM'd), rather than all of its operands.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest version extends the patch to use it also to truncate VPDerivedIVRecipe, if needed. For that, it also checks if the operands are VPDerivedIV/VPCanonicalIVPHIs, which should be considered uniform across all VFs & UFs, unless I am missing something.

});
for (unsigned Part = 0; Part != State.UF; ++Part) {
Value *Res;
// Only generate a single instance, if the recipe is uniform across all UFs.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is currently untested / dead code.

The value returned by an EVL recipe should arguably be of the same type as that of its first operand, even if the underlying intrinsic it currently wraps insists on always returning an i32? I.e., that case should arguably be fixed by performing a trunc/sext/zext as needed when providing the adequate EVL value given a current Phi/remaining-trip-count values, to facilitate adding/subtracting them together.

@@ -504,6 +504,15 @@ static VPValue *createScalarIVSteps(VPlan &Plan, const InductionDescriptor &ID,
HeaderVPBB->insert(BaseIV->getDefiningRecipe(), IP);
}

VPTypeAnalysis TypeInfo(SE.getContext());
if (TypeInfo.inferScalarType(BaseIV) != TypeInfo.inferScalarType(Step)) {
Step = new VPScalarCastRecipe(Instruction::Trunc, Step,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assert that the type sizes correspond to Trunc?

Copy link
Contributor Author

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed latest comments, thanks! Also updated the code to use the new recipe to truncate VPDerivedIVRecipe if needed.

@@ -2297,6 +2297,7 @@ emitTransformedIndex(IRBuilderBase &B, Value *Index, Value *StartValue,
? B.CreateSExtOrTrunc(Index, StepTy)
: B.CreateCast(Instruction::SIToFP, Index, StepTy);
if (CastedIndex != Index) {
assert(!isa<SExtInst>(CastedIndex));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was left over from testing, removed!

/// Cast instruction opcode.
Instruction::CastOps Opcode;

/// Result type for the cast.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, thanks!

}

void VPScalarCastRecipe ::execute(VPTransformState &State) {
bool UniformAcrossUFs = all_of(operands(), [](VPValue *Op) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest version extends the patch to use it also to truncate VPDerivedIVRecipe, if needed. For that, it also checks if the operands are VPDerivedIV/VPCanonicalIVPHIs, which should be considered uniform across all VFs & UFs, unless I am missing something.

});
for (unsigned Part = 0; Part != State.UF; ++Part) {
Value *Res;
// Only generate a single instance, if the recipe is uniform across all UFs.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated version of the patch now also uses it for truncating derived IV/canonical IV if needed. Those are still uniform across VFs & UFs, but not invariant.

I suppose EVL at the moment can also be considered uniform across VFs and UFs.

Both the if/else sides should be executed by tests.

@@ -504,6 +504,15 @@ static VPValue *createScalarIVSteps(VPlan &Plan, const InductionDescriptor &ID,
HeaderVPBB->insert(BaseIV->getDefiningRecipe(), IP);
}

VPTypeAnalysis TypeInfo(SE.getContext());
if (TypeInfo.inferScalarType(BaseIV) != TypeInfo.inferScalarType(Step)) {
Step = new VPScalarCastRecipe(Instruction::Trunc, Step,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, thanks!

Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice reuse and clean-up! Title and commit message deserve an update.

@@ -230,7 +230,11 @@ Type *VPTypeAnalysis::inferScalarType(const VPValue *V) {
return V->getUnderlyingValue()->getType();
})
.Case<VPWidenCastRecipe>(
[](const VPWidenCastRecipe *R) { return R->getResultType(); });
[](const VPWidenCastRecipe *R) { return R->getResultType(); })
.Case<VPExpandSCEVRecipe>([](const VPExpandSCEVRecipe *R) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So now that VPDerivedIVRecipe simply mirrors the type of its first start-value operand, should it join VPCanonicalIVPHIRecipe et al.'s or VPPredInstPHIRecipe et al.'s cases above and remove its getScalarType() method? Possibly as a follow-up.

Should VPWidenIntOrFpInductionRecipe's trunc also utilize VPScalarCastRecipe, and retire its getScalarType(), possibly as a follow-up?

@@ -1469,6 +1461,52 @@ void VPReplicateRecipe::print(raw_ostream &O, const Twine &Indent,
}
#endif

static bool isUniformAcrossVFsAndUFs(VPScalarCastRecipe *C) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth documenting, potentially leaving behind a note that uniformity should essentially be associated with VPValues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment + TODO, thanks!

"Codegen only implemented for first lane.");
switch (Opcode) {
case Instruction::SExt:
case Instruction::ZExt:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only Trunc is currently being used, still.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, could remove if preferred, but State.Builder.CreateCast generically supports any cast

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to either remove or make a note that SExt and ZExt are currently unused/dead cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a note, thanks!

@@ -498,10 +498,34 @@ static VPValue *createScalarIVSteps(VPlan &Plan, const InductionDescriptor &ID,
VPCanonicalIVPHIRecipe *CanonicalIV = Plan.getCanonicalIV();
Type *TruncTy = TruncI ? TruncI->getType() : IVTy;
VPValue *BaseIV = CanonicalIV;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we first check if Start and Step are canonical, regardless of types, in order to introduce a VPDerivedIVRecipe, i.e., remove the last Type parameter from isCanonical():

  VPSingleDefRecipe *BaseIV = CanonicalIV;

  // If the induction needs transforming besides truncating, create a
  // VPDerivedIVRecipe.
  if (!CanonicalIV->isCanonical(ID.getKind(), StartV, Step)) {
    BaseIV = new VPDerivedIVRecipe(ID, StartV, CanonicalIV, Step);
    HeaderVPBB->insert(BaseIV, IP);
  }

then check if BaseIV needs to be truncated:

  VPTypeAnalysis TypeInfo(SE.getContext());
  Type *BaseIVTy = TypeInfo.inferScalarType(BaseIV);
  if (TruncI && TruncI->getType() != BaseIVTy) {
    Type *TruncTy = TruncI->getType();
    assert(BaseIVTy->getScalarSizeInBits() > TruncTy->getScalarSizeInBits() &&
             BaseIVTy->isIntegerTy() && "Truncation requires an integer step");
    BaseIV = new VPScalarCastRecipe(Instruction::Trunc, BaseIV, TruncTy);
    BaseIVTy = TruncTy;
    HeaderVPBB->insert(BaseIV, IP);
  }

and finally check if Step needs to be truncated:

  Type *StepTy = TypeInfo.inferScalarType(Step);
  if (BaseIVTy != StepTy) {
    assert(StepTy->getScalarSizeInBits() > BaseIVTy->getScalarSizeInBits() &&
           StepTy->isIntegerTy() && "Not truncating.");
    Step = new VPScalarCastRecipe(Instruction::Trunc, Step, BaseIVTy);
    auto *VecPreheader =
        cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSinglePredecessor());
    VecPreheader->appendRecipe(Step->getDefiningRecipe());
  }

before creating, inserting and returning Steps? Is IVTy needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks! This works well now with the refactoring in this patch. IVTy removed.

@fhahn fhahn changed the title [VPlan] Add new VPUniformPerUFRecipe, use for step truncation. [VPlan] Add new VPScalarCastRecipe, use for IV & step trunc. Jan 25, 2024
Copy link
Contributor Author

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments should be addressed and title & description updated, thanks!

@@ -230,7 +230,11 @@ Type *VPTypeAnalysis::inferScalarType(const VPValue *V) {
return V->getUnderlyingValue()->getType();
})
.Case<VPWidenCastRecipe>(
[](const VPWidenCastRecipe *R) { return R->getResultType(); });
[](const VPWidenCastRecipe *R) { return R->getResultType(); })
.Case<VPExpandSCEVRecipe>([](const VPExpandSCEVRecipe *R) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VPDerivedIVRecipe getScalarType would be good to remove, will do as follow-up.

VPWidenIntOrFpInductionRecipe's trunc drives truncating the start value and step, effectively creating a narrow phi. I'll check to see if this can also be modeled with the new recipe.

"Codegen only implemented for first lane.");
switch (Opcode) {
case Instruction::SExt:
case Instruction::ZExt:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, could remove if preferred, but State.Builder.CreateCast generically supports any cast

@@ -1469,6 +1461,52 @@ void VPReplicateRecipe::print(raw_ostream &O, const Twine &Indent,
}
#endif

static bool isUniformAcrossVFsAndUFs(VPScalarCastRecipe *C) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment + TODO, thanks!

@@ -498,10 +498,34 @@ static VPValue *createScalarIVSteps(VPlan &Plan, const InductionDescriptor &ID,
VPCanonicalIVPHIRecipe *CanonicalIV = Plan.getCanonicalIV();
Type *TruncTy = TruncI ? TruncI->getType() : IVTy;
VPValue *BaseIV = CanonicalIV;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks! This works well now with the refactoring in this patch. IVTy removed.

Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, adding some minor suggestions.

@@ -859,6 +859,7 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPValue {
case VPRecipeBase::VPWidenIntOrFpInductionSC:
case VPRecipeBase::VPWidenPointerInductionSC:
case VPRecipeBase::VPReductionPHISC:
case VPRecipeBase::VPScalarCastSC:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (independent of this patch): would be good to keep in order.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, will do separately.

"Codegen only implemented for first lane.");
switch (Opcode) {
case Instruction::SExt:
case Instruction::ZExt:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to either remove or make a note that SExt and ZExt are currently unused/dead cases.

VPBasicBlock *HeaderVPBB = Plan.getVectorLoopRegion()->getEntryBasicBlock();
auto IP = HeaderVPBB->getFirstNonPhi();
VPCanonicalIVPHIRecipe *CanonicalIV = Plan.getCanonicalIV();
Type *TruncTy = TruncI ? TruncI->getType() : IVTy;
VPValue *BaseIV = CanonicalIV;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: defining BaseIV as a VPSingleDefRecipe* rather than VPValue* will allow removal of ->getDefiningRecipe().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

Comment on lines 512 to 514
auto *T = new VPScalarCastRecipe(Instruction::Trunc, BaseIV, TruncTy);
HeaderVPBB->insert(T, IP);
BaseIV = T;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto *T = new VPScalarCastRecipe(Instruction::Trunc, BaseIV, TruncTy);
HeaderVPBB->insert(T, IP);
BaseIV = T;
ResultTy = TruncTy;
BaseIV = new VPScalarCastRecipe(Instruction::Trunc, BaseIV, ResultTy);
HeaderVPBB->insert(BaseIV, IP);

?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no real need in T, and more consistent to simply redefine BaseIV = new ... (..., BaseIV, ...), again, as in the proposal above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted, thanks!

Comment on lines 499 to 500
VPTypeAnalysis TypeInfo(SE.getContext());
Type *StepTy = TypeInfo.inferScalarType(Step);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VPTypeAnalysis TypeInfo(SE.getContext());
Type *StepTy = TypeInfo.inferScalarType(Step);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sunk, thanks!

HeaderVPBB->insert(BaseIV->getDefiningRecipe(), IP);
}

// Truncate base induction if needed.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  VPTypeAnalysis TypeInfo(SE.getContext());
  Type *ResultTy = TypeInfo.inferScalarType(BaseIV);

both BaseIV and Step are subject to redefinition and truncation, perhaps aim to define the type of the result which they may truncate to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks! (missed originally)

Comment on lines 509 to 511
assert(TypeInfo.inferScalarType(BaseIV)->getScalarSizeInBits() >
TruncTy->getScalarSizeInBits() &&
StepTy->isIntegerTy() && "Truncation requires an integer step");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert(TypeInfo.inferScalarType(BaseIV)->getScalarSizeInBits() >
TruncTy->getScalarSizeInBits() &&
StepTy->isIntegerTy() && "Truncation requires an integer step");
assert(ResultTy->getScalarSizeInBits() >
TruncTy->getScalarSizeInBits() &&
ResultTy->isIntegerTy() && "Truncation requires an integer type");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated and also separate into 2 separate asserts.

}

// Truncate step if needed.
Type *BaseIVTy = TypeInfo.inferScalarType(BaseIV);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Type *BaseIVTy = TypeInfo.inferScalarType(BaseIV);
Type *StepTy = TypeInfo.inferScalarType(Step);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved, thanks!

Comment on lines 519 to 521
if (BaseIVTy != StepTy) {
assert(StepTy->getScalarSizeInBits() > BaseIVTy->getScalarSizeInBits() &&
"Not truncating.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (BaseIVTy != StepTy) {
assert(StepTy->getScalarSizeInBits() > BaseIVTy->getScalarSizeInBits() &&
"Not truncating.");
if (StepTy != ResultTy) {
assert(StepTy->getScalarSizeInBits() > ResultTy->getScalarSizeInBits() &&
StepTy->isIntegerTy() && "Truncation requires an integer step");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed thanks!

assert(StepTy->getScalarSizeInBits() > BaseIVTy->getScalarSizeInBits() &&
"Not truncating.");

Step = new VPScalarCastRecipe(Instruction::Trunc, Step, BaseIVTy);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Step = new VPScalarCastRecipe(Instruction::Trunc, Step, BaseIVTy);
Step = new VPScalarCastRecipe(Instruction::Trunc, Step, ResultTy);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed, thanks!

Copy link
Contributor Author

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Address latest comments, thanks!

"Codegen only implemented for first lane.");
switch (Opcode) {
case Instruction::SExt:
case Instruction::ZExt:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a note, thanks!

@@ -859,6 +859,7 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPValue {
case VPRecipeBase::VPWidenIntOrFpInductionSC:
case VPRecipeBase::VPWidenPointerInductionSC:
case VPRecipeBase::VPReductionPHISC:
case VPRecipeBase::VPScalarCastSC:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, will do separately.

VPBasicBlock *HeaderVPBB = Plan.getVectorLoopRegion()->getEntryBasicBlock();
auto IP = HeaderVPBB->getFirstNonPhi();
VPCanonicalIVPHIRecipe *CanonicalIV = Plan.getCanonicalIV();
Type *TruncTy = TruncI ? TruncI->getType() : IVTy;
VPValue *BaseIV = CanonicalIV;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

assert(StepTy->getScalarSizeInBits() > BaseIVTy->getScalarSizeInBits() &&
"Not truncating.");

Step = new VPScalarCastRecipe(Instruction::Trunc, Step, BaseIVTy);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed, thanks!

Comment on lines 519 to 521
if (BaseIVTy != StepTy) {
assert(StepTy->getScalarSizeInBits() > BaseIVTy->getScalarSizeInBits() &&
"Not truncating.");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed thanks!

}

// Truncate step if needed.
Type *BaseIVTy = TypeInfo.inferScalarType(BaseIV);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved, thanks!

Comment on lines 499 to 500
VPTypeAnalysis TypeInfo(SE.getContext());
Type *StepTy = TypeInfo.inferScalarType(Step);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sunk, thanks!

Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments still seem relevant, trying to further clarify.


// Truncate base induction if needed.
VPTypeAnalysis TypeInfo(SE.getContext());
Type *StepTy = TypeInfo.inferScalarType(Step);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Type *StepTy = TypeInfo.inferScalarType(Step);

Suffice to define StepTy later, when dealing with the truncation of Step. Here we're dealing with truncating BaseIV, and may be good to define its current/original type instead (using "BaseIVTy" may be confusing or deserves updating due to changing BaseIV and its type).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

Type *TruncTy = TruncI->getType();
assert(TypeInfo.inferScalarType(BaseIV)->getScalarSizeInBits() >
TruncTy->getScalarSizeInBits() &&
StepTy->isIntegerTy() && "Truncation requires an integer step");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
StepTy->isIntegerTy() && "Truncation requires an integer step");
TypeInfo.inferScalarType(BaseIV)->isIntegerTy() && "Truncation requires an integer");

this should assert that the truncation of BaseIV is valid, independent of Step. As in the proposal above, which uses ResultTy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

Comment on lines 512 to 514
auto *T = new VPScalarCastRecipe(Instruction::Trunc, BaseIV, TruncTy);
HeaderVPBB->insert(T, IP);
BaseIV = T;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no real need in T, and more consistent to simply redefine BaseIV = new ... (..., BaseIV, ...), again, as in the proposal above?

}

// Truncate step if needed.
Type *ResultTy = TypeInfo.inferScalarType(BaseIV);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Define StepTy here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved, thanks!

Copy link
Contributor Author

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed comments should be addressed now, seems I missed them in the GitHub UI somehow

HeaderVPBB->insert(BaseIV->getDefiningRecipe(), IP);
}

// Truncate base induction if needed.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks! (missed originally)


// Truncate base induction if needed.
VPTypeAnalysis TypeInfo(SE.getContext());
Type *StepTy = TypeInfo.inferScalarType(Step);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

Comment on lines 509 to 511
assert(TypeInfo.inferScalarType(BaseIV)->getScalarSizeInBits() >
TruncTy->getScalarSizeInBits() &&
StepTy->isIntegerTy() && "Truncation requires an integer step");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated and also separate into 2 separate asserts.

Comment on lines 512 to 514
auto *T = new VPScalarCastRecipe(Instruction::Trunc, BaseIV, TruncTy);
HeaderVPBB->insert(T, IP);
BaseIV = T;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted, thanks!

}

// Truncate step if needed.
Type *ResultTy = TypeInfo.inferScalarType(BaseIV);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved, thanks!

Type *TruncTy = TruncI->getType();
assert(TypeInfo.inferScalarType(BaseIV)->getScalarSizeInBits() >
TruncTy->getScalarSizeInBits() &&
StepTy->isIntegerTy() && "Truncation requires an integer step");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing an update of ResultTy, and possibly some test(s)?

assert(ResultTy->getScalarSizeInBits() > TruncTy->getScalarSizeInBits() &&
"Not truncating.");
assert(ResultTy->isIntegerTy() && "Truncation requires an integer type");
BaseIV = new VPScalarCastRecipe(Instruction::Trunc, BaseIV, TruncTy);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ResultTy should be reset to TruncTy here, so that Step will be truncated to the updated type of BaseIV, rather than its original type. (Or reset right after the asserts, so that both truncations use ResultTy.) Wonder about test coverage if all pass?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh, sent the update too early (or too late in the day)! Should be fixed now., there were a number of test failures.

Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks!
Adding a last minor nit.

Comment on lines 524 to 525
auto *VecPreheader =
cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSinglePredecessor());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto *VecPreheader =
cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSinglePredecessor());
auto *VecPreheader =
cast<VPBasicBlock>(HeaderVPBB->getSingleHierarchicalPredecessor());

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

@fhahn fhahn merged commit 0ab539f into llvm:main Jan 26, 2024
3 of 4 checks passed
@fhahn fhahn deleted the vpuniformforuf-recipe branch January 26, 2024 11:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants