@fhahn fhahn commented Sep 22, 2025

Update VPReplicateRecipe::computeCost to compute costs of more replicating loads/stores.

There are 2 cases that require extra checks to match the legacy cost model:

  1. If the pointer is based on an induction, the legacy cost model passes its SCEV to getAddressComputationCost. In those cases, still fall back to the legacy cost. SCEV computations will be added as a follow-up.
  2. If a load is used as part of an address of another load, the legacy cost model skips the scalarization overhead. Those cases are currently handled by the isUsedByLoadStoreAddress helper (see the sketch below).

Note that getScalarizationOverhead also needs updating: when the legacy cost model computes the scalarization overhead, scalars have not been collected yet, so we can't check for replicating recipes to skip their cost, except for other loads. This can be further improved by modeling inserts/extracts explicitly and consistently, and computing costs for those operations directly where needed.
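
For illustration, a minimal hypothetical IR sketch of case 2 (not taken from this PR or its tests; the function and value names are made up): a scalarized load whose only use is forming the address of another load.

; Hypothetical sketch of case 2: %idx is a replicated scalar load whose
; result is only used to compute the address of a second load. The legacy
; cost model does not charge scalarization (insert/extract) overhead for
; %idx here, which the isUsedByLoadStoreAddress check mirrors.
define double @indirect_load(ptr %indices, ptr %data, i64 %i) {
  %gep.idx = getelementptr inbounds i64, ptr %indices, i64 %i
  %idx = load i64, ptr %gep.idx, align 8
  %gep.data = getelementptr inbounds double, ptr %data, i64 %idx
  %val = load double, ptr %gep.data, align 8
  ret double %val
}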


llvmbot commented Sep 22, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-vectorizers

Author: Florian Hahn (fhahn)

Full diff: https://github.com/llvm/llvm-project/pull/160053.diff

4 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+10-6)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+8-3)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanHelpers.h (+10-6)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+103-13)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index ca092dcfcb492..bc6f6c5da3aa5 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -3925,7 +3925,8 @@ void LoopVectorizationPlanner::emitInvalidCostRemarks(
       if (VF.isScalar())
         continue;
 
-      VPCostContext CostCtx(CM.TTI, *CM.TLI, *Plan, CM, CM.CostKind);
+      VPCostContext CostCtx(CM.TTI, *CM.TLI, *Plan, CM, CM.CostKind,
+                            *CM.PSE.getSE());
       precomputeCosts(*Plan, VF, CostCtx);
       auto Iter = vp_depth_first_deep(Plan->getVectorLoopRegion()->getEntry());
       for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(Iter)) {
@@ -4182,7 +4183,8 @@ VectorizationFactor LoopVectorizationPlanner::selectVectorizationFactor() {
 
       // Add on other costs that are modelled in VPlan, but not in the legacy
       // cost model.
-      VPCostContext CostCtx(CM.TTI, *CM.TLI, *P, CM, CM.CostKind);
+      VPCostContext CostCtx(CM.TTI, *CM.TLI, *P, CM, CM.CostKind,
+                            *CM.PSE.getSE());
       VPRegionBlock *VectorRegion = P->getVectorLoopRegion();
       assert(VectorRegion && "Expected to have a vector region!");
       for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
@@ -6871,7 +6873,7 @@ LoopVectorizationPlanner::precomputeCosts(VPlan &Plan, ElementCount VF,
 
 InstructionCost LoopVectorizationPlanner::cost(VPlan &Plan,
                                                ElementCount VF) const {
-  VPCostContext CostCtx(CM.TTI, *CM.TLI, Plan, CM, CM.CostKind);
+  VPCostContext CostCtx(CM.TTI, *CM.TLI, Plan, CM, CM.CostKind, *PSE.getSE());
   InstructionCost Cost = precomputeCosts(Plan, VF, CostCtx);
 
   // Now compute and add the VPlan-based cost.
@@ -7082,7 +7084,8 @@ VectorizationFactor LoopVectorizationPlanner::computeBestVF() {
   // simplifications not accounted for in the legacy cost model. If that's the
   // case, don't trigger the assertion, as the extra simplifications may cause a
   // different VF to be picked by the VPlan-based cost model.
-  VPCostContext CostCtx(CM.TTI, *CM.TLI, BestPlan, CM, CM.CostKind);
+  VPCostContext CostCtx(CM.TTI, *CM.TLI, BestPlan, CM, CM.CostKind,
+                        *CM.PSE.getSE());
   precomputeCosts(BestPlan, BestFactor.Width, CostCtx);
   // Verify that the VPlan-based and legacy cost models agree, except for VPlans
   // with early exits and plans with additional VPlan simplifications. The
@@ -8704,7 +8707,8 @@ VPlanPtr LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(
   // TODO: Enable following transform when the EVL-version of extended-reduction
   // and mulacc-reduction are implemented.
   if (!CM.foldTailWithEVL()) {
-    VPCostContext CostCtx(CM.TTI, *CM.TLI, *Plan, CM, CM.CostKind);
+    VPCostContext CostCtx(CM.TTI, *CM.TLI, *Plan, CM, CM.CostKind,
+                          *CM.PSE.getSE());
     VPlanTransforms::runPass(VPlanTransforms::convertToAbstractRecipes, *Plan,
                              CostCtx, Range);
   }
@@ -10043,7 +10047,7 @@ bool LoopVectorizePass::processLoop(Loop *L) {
     bool ForceVectorization =
         Hints.getForce() == LoopVectorizeHints::FK_Enabled;
     VPCostContext CostCtx(CM.TTI, *CM.TLI, LVP.getPlanFor(VF.Width), CM,
-                          CM.CostKind);
+                          CM.CostKind, *CM.PSE.getSE());
     if (!ForceVectorization &&
         !isOutsideLoopWorkProfitable(Checks, VF, L, PSE, CostCtx,
                                      LVP.getPlanFor(VF.Width), SEL,
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 1e6f1e3aeb0ac..52a5a9053346d 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -1778,8 +1778,10 @@ VPCostContext::getOperandInfo(VPValue *V) const {
   return TTI::getOperandInfo(V->getLiveInIRValue());
 }
 
-InstructionCost VPCostContext::getScalarizationOverhead(
-    Type *ResultTy, ArrayRef<const VPValue *> Operands, ElementCount VF) {
+InstructionCost
+VPCostContext::getScalarizationOverhead(Type *ResultTy,
+                                        ArrayRef<const VPValue *> Operands,
+                                        ElementCount VF, bool Skip) {
   if (VF.isScalar())
     return 0;
 
@@ -1799,7 +1801,10 @@ InstructionCost VPCostContext::getScalarizationOverhead(
   SmallPtrSet<const VPValue *, 4> UniqueOperands;
   SmallVector<Type *> Tys;
   for (auto *Op : Operands) {
-    if (Op->isLiveIn() || isa<VPReplicateRecipe, VPPredInstPHIRecipe>(Op) ||
+    if (Op->isLiveIn() ||
+        (!Skip && isa<VPReplicateRecipe, VPPredInstPHIRecipe>(Op)) ||
+        (isa<VPReplicateRecipe>(Op) &&
+         cast<VPReplicateRecipe>(Op)->getOpcode() == Instruction::Load) ||
         !UniqueOperands.insert(Op).second)
       continue;
     Tys.push_back(toVectorizedTy(Types.inferScalarType(Op), VF));
diff --git a/llvm/lib/Transforms/Vectorize/VPlanHelpers.h b/llvm/lib/Transforms/Vectorize/VPlanHelpers.h
index fe59774b7c838..2a8baec74b72b 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanHelpers.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanHelpers.h
@@ -349,12 +349,14 @@ struct VPCostContext {
   LoopVectorizationCostModel &CM;
   SmallPtrSet<Instruction *, 8> SkipCostComputation;
   TargetTransformInfo::TargetCostKind CostKind;
+  ScalarEvolution &SE;
 
   VPCostContext(const TargetTransformInfo &TTI, const TargetLibraryInfo &TLI,
                 const VPlan &Plan, LoopVectorizationCostModel &CM,
-                TargetTransformInfo::TargetCostKind CostKind)
+                TargetTransformInfo::TargetCostKind CostKind,
+                ScalarEvolution &SE)
       : TTI(TTI), TLI(TLI), Types(Plan), LLVMCtx(Plan.getContext()), CM(CM),
-        CostKind(CostKind) {}
+        CostKind(CostKind), SE(SE) {}
 
   /// Return the cost for \p UI with \p VF using the legacy cost model as
   /// fallback until computing the cost of all recipes migrates to VPlan.
@@ -374,10 +376,12 @@ struct VPCostContext {
 
   /// Estimate the overhead of scalarizing a recipe with result type \p ResultTy
   /// and \p Operands with \p VF. This is a convenience wrapper for the
-  /// type-based getScalarizationOverhead API.
-  InstructionCost getScalarizationOverhead(Type *ResultTy,
-                                           ArrayRef<const VPValue *> Operands,
-                                           ElementCount VF);
+  /// type-based getScalarizationOverhead API. If \p AlwaysIncludeReplicatingR
+  /// is true, always compute the cost of scalarizing replicating operands.
+  InstructionCost
+  getScalarizationOverhead(Type *ResultTy, ArrayRef<const VPValue *> Operands,
+                           ElementCount VF,
+                           bool AlwaysIncludeReplicatingR = false);
 };
 
 /// This class can be used to assign names to VPValues. For VPValues without
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 8e9c3db50319f..3029599487bf9 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -3075,6 +3075,63 @@ bool VPReplicateRecipe::shouldPack() const {
   });
 }
 
+/// Returns true if \p Ptr is a pointer computation for which the legacy cost
+/// model computes a SCEV expression when comping the address cost.
+static bool shouldUseAddressAccessSCEV(VPValue *Ptr) {
+  auto *PtrR = Ptr->getDefiningRecipe();
+  if (!PtrR || !((isa<VPReplicateRecipe>(PtrR) &&
+                  cast<VPReplicateRecipe>(PtrR)->getOpcode() ==
+                      Instruction::GetElementPtr) ||
+                 isa<VPWidenGEPRecipe>(PtrR)))
+    return false;
+
+  // We are looking for a gep with all loop invariant indices except for one
+  // which should be an induction variable.
+  unsigned NumOperands = PtrR->getNumOperands();
+  for (unsigned Idx = 1; Idx < NumOperands; ++Idx) {
+    VPValue *Opd = PtrR->getOperand(Idx);
+    if (!(Opd->isDefinedOutsideLoopRegions()) &&
+        !isa<VPScalarIVStepsRecipe, VPWidenIntOrFpInductionRecipe>(Opd))
+      return false;
+  }
+
+  return true;
+}
+
+/// Returns true of \p V is used as part of the address of another load or
+/// store.
+static bool isUsedByLoadStoreAddress(const VPUser *V) {
+  SmallPtrSet<const VPUser *, 4> Seen;
+  SmallVector<const VPUser *> WorkList = {V};
+
+  while (!WorkList.empty()) {
+    auto *Cur = dyn_cast<VPSingleDefRecipe>(WorkList.pop_back_val());
+    if (!Cur || !Seen.insert(Cur).second)
+      continue;
+
+    for (VPUser *U : Cur->users()) {
+      if (auto *InterleaveR = dyn_cast<VPInterleaveRecipe>(U))
+        if (InterleaveR->getAddr() == Cur)
+          return true;
+      if (auto *RepR = dyn_cast<VPReplicateRecipe>(U)) {
+        if (RepR->getOpcode() == Instruction::Load &&
+            RepR->getOperand(0) == Cur)
+          return true;
+        if (RepR->getOpcode() == Instruction::Store &&
+            RepR->getOperand(1) == Cur)
+          return true;
+      }
+      if (auto *MemR = dyn_cast<VPWidenMemoryRecipe>(U)) {
+        if (MemR->getAddr() == Cur && MemR->isConsecutive())
+          return true;
+      }
+    }
+
+    append_range(WorkList, cast<VPSingleDefRecipe>(Cur)->users());
+  }
+  return false;
+}
+
 InstructionCost VPReplicateRecipe::computeCost(ElementCount VF,
                                                VPCostContext &Ctx) const {
   Instruction *UI = cast<Instruction>(getUnderlyingValue());
@@ -3182,21 +3239,54 @@ InstructionCost VPReplicateRecipe::computeCost(ElementCount VF,
   }
   case Instruction::Load:
   case Instruction::Store: {
-    if (isSingleScalar()) {
-      bool IsLoad = UI->getOpcode() == Instruction::Load;
-      Type *ValTy = Ctx.Types.inferScalarType(IsLoad ? this : getOperand(0));
-      Type *ScalarPtrTy = Ctx.Types.inferScalarType(getOperand(IsLoad ? 0 : 1));
-      const Align Alignment = getLoadStoreAlignment(UI);
-      unsigned AS = getLoadStoreAddressSpace(UI);
-      TTI::OperandValueInfo OpInfo = TTI::getOperandInfo(UI->getOperand(0));
-      InstructionCost ScalarMemOpCost = Ctx.TTI.getMemoryOpCost(
-          UI->getOpcode(), ValTy, Alignment, AS, Ctx.CostKind, OpInfo, UI);
-      return ScalarMemOpCost + Ctx.TTI.getAddressComputationCost(
-                                   ScalarPtrTy, nullptr, nullptr, Ctx.CostKind);
-    }
+    if (VF.isScalable() && !isSingleScalar())
+      return InstructionCost::getInvalid();
+
     // TODO: See getMemInstScalarizationCost for how to handle replicating and
     // predicated cases.
-    break;
+    if (getParent()->getParent() && getParent()->getParent()->isReplicator())
+      break;
+
+    bool IsLoad = UI->getOpcode() == Instruction::Load;
+    // TODO: Handle cases where we need to pass a SCEV to
+    // getAddressComputationCost.
+    if (shouldUseAddressAccessSCEV(getOperand(!IsLoad)))
+      break;
+
+    Type *ValTy = Ctx.Types.inferScalarType(IsLoad ? this : getOperand(0));
+    Type *ScalarPtrTy = Ctx.Types.inferScalarType(getOperand(IsLoad ? 0 : 1));
+    const Align Alignment = getLoadStoreAlignment(UI);
+    unsigned AS = getLoadStoreAddressSpace(UI);
+    TTI::OperandValueInfo OpInfo = TTI::getOperandInfo(UI->getOperand(0));
+    InstructionCost ScalarMemOpCost = Ctx.TTI.getMemoryOpCost(
+        UI->getOpcode(), ValTy, Alignment, AS, Ctx.CostKind, OpInfo);
+
+    Type *PtrTy = isSingleScalar() ? ScalarPtrTy : toVectorTy(ScalarPtrTy, VF);
+
+    InstructionCost ScalarCost =
+        ScalarMemOpCost + Ctx.TTI.getAddressComputationCost(
+                              PtrTy, &Ctx.SE, nullptr, Ctx.CostKind);
+    if (isSingleScalar())
+      return ScalarCost;
+
+    SmallVector<const VPValue *> OpsToScalarize;
+    Type *ResultTy = Type::getVoidTy(getParent()->getPlan()->getContext());
+    // Set ResultTy and OpsToScalarize, if scalarization is needed. Currently we
+    // don't assign scalarization overhead in general, if the target prefers
+    // vectorized addressing or the loaded value is used as part of an address
+    // of another load or store.
+    if (Ctx.TTI.prefersVectorizedAddressing() ||
+        !isUsedByLoadStoreAddress(this)) {
+      if (!(IsLoad && !Ctx.TTI.prefersVectorizedAddressing()) &&
+          !(!IsLoad && Ctx.TTI.supportsEfficientVectorElementLoadStore()))
+        append_range(OpsToScalarize, operands());
+
+      if (!Ctx.TTI.supportsEfficientVectorElementLoadStore())
+        ResultTy = Ctx.Types.inferScalarType(this);
+    }
+
+    return (ScalarCost * VF.getFixedValue()) +
+           Ctx.getScalarizationOverhead(ResultTy, OpsToScalarize, VF, true);
   }
   }
 

@david-arm david-arm (Contributor) left a comment

Thanks for this @fhahn! It's great to see the legacy cost model being removed from VPlanRecipes. I had a few comments ...

InstructionCost
VPCostContext::getScalarizationOverhead(Type *ResultTy,
ArrayRef<const VPValue *> Operands,
ElementCount VF, bool Skip) {
Contributor

Should the variable be more descriptive, e.g. something like SkipNonReplicatingLoadOpCost? The reason I say this is because even when Skip is true we don't actually skip if it's a VPReplicateRecipe with opcode Load.

Contributor Author

Ah yes, I renamed it in the header but not here. Updated to AlwaysIncludeReplicatingR, thanks

// TODO: See getMemInstScalarizationCost for how to handle replicating and
// predicated cases.
break;
if (getParent()->getParent() && getParent()->getParent()->isReplicator())
Contributor

nit: Might be good to create a variable for getParent()->getParent() so it's clearer what it actually is. Is this supposed to be a VPRegionBlock?

Contributor Author

Yep, updated thanks!

return ScalarCost;

SmallVector<const VPValue *> OpsToScalarize;
Type *ResultTy = Type::getVoidTy(getParent()->getPlan()->getContext());
Contributor

I think you can get the context more easily from another type, i.e.

  Type *ResultTy = Type::getVoidTy(PtrTy->getContext());

Contributor Author

Done thanks

// don't assign scalarization overhead in general, if the target prefers
// vectorized addressing or the loaded value is used as part of an address
// of another load or store.
if (Ctx.TTI.prefersVectorizedAddressing() ||
Contributor

Perhaps create a variable to keep the result of Ctx.TTI.prefersVectorizedAddressing, since it's also used below?

Contributor Author

Done thanks

// vectorized addressing or the loaded value is used as part of an address
// of another load or store.
if (Ctx.TTI.prefersVectorizedAddressing() ||
!isUsedByLoadStoreAddress(this)) {
Contributor

I know this is only a minor thing, but it feels more logical to ask this question first, i.e. (!isUsedByLoadStoreAddress(this) || Ctx.TTI.prefersVectorizedAddressing()). It sounds like what we're saying is that if the result is used to calculate the address for a subsequent load or store then only calculate the scalarisation overhead if the target prefers vectorised addressing, since in this case we'll need to do inserts/extracts. Is that a correct understanding?

Contributor Author

It sounds like what we're saying is that if the result is used to calculate the address for a subsequent load or store then only calculate the scalarisation overhead if the target prefers vectorised addressing, since in this case we'll need to do inserts/extracts. Is that a correct understanding?

Yep exactly. The reason for checking TTI first is that it is likely cheaper than isUsedByLoadStoreAddress, which needs to walk the use list. I kept the order as-is for now.

!(!IsLoad && Ctx.TTI.supportsEfficientVectorElementLoadStore()))
append_range(OpsToScalarize, operands());

if (!Ctx.TTI.supportsEfficientVectorElementLoadStore())
Contributor

Keep a copy of the result from the first invocation in a variable?

Contributor Author

Done thanks

unsigned NumOperands = PtrR->getNumOperands();
for (unsigned Idx = 1; Idx < NumOperands; ++Idx) {
VPValue *Opd = PtrR->getOperand(Idx);
if (!(Opd->isDefinedOutsideLoopRegions()) &&
Contributor

nit: I think you can drop the brackets around Opd->isDefinedOutsideLoopRegions()

Contributor Author

Yep, removed thanks!

}

/// Returns true if \p Ptr is a pointer computation for which the legacy cost
/// model computes a SCEV expression when comping the address cost.
Contributor

nit: Should this be when computing the address cost.?

Contributor Author

Yep fixed, thanks

return true;
}

/// Returns true of \p V is used as part of the address of another load or
Contributor

nit: Return true if \p ...?

Contributor Author

Updated, thanks

RepR->getOperand(1) == Cur)
return true;
}
if (auto *MemR = dyn_cast<VPWidenMemoryRecipe>(U)) {
Contributor

Do we need to worry about histogram recipes too?

Also, do we have to worry about the EVL recipes such as VPInterleaveEVLRecipe?

Contributor Author

I updated the code to check for VPInterleaveBase. VPWidenMemoryRecipe should include both EVL and non EVL variants.

I don't think we need to handle histogram recipes here, as the address must be an AddRec in the current loop, so should not be able to depend on a load in the loop.

@david-arm david-arm (Contributor) left a comment

LGTM!

return false;

// We are looking for a GEP with all loop invariant indices except for one
// which should be an induction variable.
Contributor

nit: Does the comment need updating here as it doesn't seem to match the code? What the code really seems to do is "make sure all operands are either loop invariants or induction variables", i.e. there can be more than one induction variable.

Contributor Author

Yep, updated thanks! (original comment was just copied).
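
For illustration, a hypothetical loop (not from this PR; names are made up) matching the pattern shouldUseAddressAccessSCEV looks for, where each GEP index is either loop-invariant or based on an induction:

; Hypothetical sketch of case 1: the GEP indices are a loop-invariant
; value (%n) and an induction (%iv), so the legacy cost model passes the
; pointer's SCEV to getAddressComputationCost; the VPlan-based model
; falls back to the legacy cost for now.
define void @scev_address(ptr %base, i64 %n, i64 %tc) {
entry:
  br label %loop

loop:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
  %gep = getelementptr inbounds [16 x double], ptr %base, i64 %n, i64 %iv
  store double 0.000000e+00, ptr %gep, align 8
  %iv.next = add nuw nsw i64 %iv, 1
  %ec = icmp eq i64 %iv.next, %tc
  br i1 %ec, label %exit, label %loop

exit:
  ret void
}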

bool IsLoad = UI->getOpcode() == Instruction::Load;
// TODO: Handle cases where we need to pass a SCEV to
// getAddressComputationCost.
if (shouldUseAddressAccessSCEV(getOperand(!IsLoad)))
Contributor

nit: Sorry, didn't see this before. Since you're reusing the ptr operand below as well (see Type *ScalarPtrTy = Ctx.Types.inferScalarType(getOperand(IsLoad ? 0 : 1))) perhaps cleaner to create a variable for it?

  VPValue *PtrOp = getOperand(!IsLoad);
  if (shouldUseAddressAccessSCEV(PtrOp))
...
  Type *ScalarPtrTy = Ctx.Types.inferScalarType(PtrOp);

Contributor Author

Done thanks

@fhahn fhahn enabled auto-merge (squash) September 29, 2025 07:32
@fhahn fhahn merged commit b4be7ec into llvm:main Sep 29, 2025
7 of 9 checks passed
@fhahn fhahn deleted the vplan-replicating-cost branch September 29, 2025 08:44
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Sep 29, 2025
…:computeCost. (#160053)

mgorny commented Sep 30, 2025

Looks like this change broke building compiler-rt for me, on Gentoo/amd64:

clang++: /var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7079: llvm::VectorizationFactor llvm::LoopVectorizationPlanner::computeBestVF(): Assertion `(BestFactor.Width == LegacyVF.Width || BestPlan.hasEarlyExit() || planContainsAdditionalSimplifications(getPlanFor(BestFactor.Width), CostCtx, OrigLoop, BestFactor.Width) || planContainsAdditionalSimplifications( getPlanFor(LegacyVF.Width), CostCtx, OrigLoop, LegacyVF.Width)) && " VPlan cost model and legacy cost model disagreed"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: /var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/compiler-rt_build/lib/llvm/22/bin/clang++ -O2 -pipe -march=native --config=/etc/clang/22/gentoo-rtlib.cfg --config=/etc/clang/22/gentoo-stdlib.cfg --config=/etc/clang/22/gentoo-linker.cfg -Wall -Wno-unused-parameter -Wno-unknown-warning-option -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta -I/var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/compiler-rt/include -g -Wno-covered-switch-default -Wno-suggest-override -DGTEST_NO_LLVM_SUPPORT=1 -DGTEST_HAS_RTTI=0 -I/var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/compiler-rt/../third-party/unittest/googletest/include -I/var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/compiler-rt/../third-party/unittest/googletest -I/var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/compiler-rt/include -I/var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/compiler-rt/lib -I/var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/compiler-rt/lib/asan -I/var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/compiler-rt/lib/sanitizer_common/tests -DSANITIZER_COMMON_NO_REDEFINE_BUILTINS -fno-rtti -O2 -Wno-format -Werror=sign-compare -Wno-variadic-macros -gline-tables-only -DASAN_HAS_IGNORELIST=1 -DASAN_HAS_EXCEPTIONS=1 -DASAN_UAR=0 -m32 -c -o ASAN_NOINST_TEST_OBJECTS.gtest-all.cc.i386-calls.o /var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/third-party/unittest/googletest/src/gtest-all.cc
1.      <eof> parser at end of file
2.      Optimizer
3.      Running pass "function<eager-inv>(drop-unnecessary-assumes,float2int,lower-constant-intrinsics,loop(loop-rotate<header-duplication;no-prepare-for-lto>,loop-deletion),loop-distribute,inject-tli-mappings,loop-vectorize<no-interleave-forced-only;no-vectorize-forced-only;>,infer-alignment,loop-load-elim,instcombine<max-iterations=1;no-verify-fixpoint>,simplifycfg<bonus-inst-threshold=1;forward-switch-cond;switch-range-to-icmp;switch-to-lookup;no-keep-loops;hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,slp-vectorizer,vector-combine,instcombine<max-iterations=1;no-verify-fixpoint>,loop-unroll<O2>,transform-warning,sroa<preserve-cfg>,infer-alignment,instcombine<max-iterations=1;no-verify-fixpoint>,loop-mssa(licm<allowspeculation>),alignment-from-assumptions,loop-sink,instsimplify,div-rem-pairs,tailcallelim,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;speculate-unpredictables>)" on module "/var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/third-party/unittest/googletest/src/gtest-all.cc"
4.      Running pass "loop-vectorize<no-interleave-forced-only;no-vectorize-forced-only;>" on function "_ZN7testing8internal13edit_distance21CalculateOptimalEditsERKSt6vectorIjSaIjEES6_"
 #0 0x00007f06855ca3d3 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/usr/lib/llvm/22/lib64/libLLVM.so.22.0git98766d28+0xdca3d3)
 #1 0x00007f06855c6d74 llvm::sys::RunSignalHandlers() (/usr/lib/llvm/22/lib64/libLLVM.so.22.0git98766d28+0xdc6d74)
 #2 0x00007f06854b0478 (/usr/lib/llvm/22/lib64/libLLVM.so.22.0git98766d28+0xcb0478)
 #3 0x00007f068aa64910 (/usr/lib64/libc.so.6+0x3d910)
 #4 0x00007f068aabccbc (/usr/lib64/libc.so.6+0x95cbc)
 #5 0x00007f068aa647e6 raise (/usr/lib64/libc.so.6+0x3d7e6)
 #6 0x00007f068aa4c30b abort (/usr/lib64/libc.so.6+0x2530b)
 #7 0x00007f068aa4c275 __assert_perror_fail (/usr/lib64/libc.so.6+0x25275)
 #8 0x00007f06850bb420 (/usr/lib/llvm/22/lib64/libLLVM.so.22.0git98766d28+0x8bb420)
 #9 0x00007f0687274154 llvm::LoopVectorizePass::processLoop(llvm::Loop*) (/usr/lib/llvm/22/lib64/libLLVM.so.22.0git98766d28+0x2a74154)
#10 0x00007f068727786f llvm::LoopVectorizePass::runImpl(llvm::Function&) (/usr/lib/llvm/22/lib64/libLLVM.so.22.0git98766d28+0x2a7786f)
#11 0x00007f0687277dd9 llvm::LoopVectorizePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/usr/lib/llvm/22/lib64/libLLVM.so.22.0git98766d28+0x2a77dd9)
#12 0x00007f068897c343 (/usr/lib/llvm/22/lib64/libLLVM.so.22.0git98766d28+0x417c343)
#13 0x00007f06857e7b4b llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/usr/lib/llvm/22/lib64/libLLVM.so.22.0git98766d28+0xfe7b4b)
#14 0x00007f06883b37f3 (/usr/lib/llvm/22/lib64/libLLVM.so.22.0git98766d28+0x3bb37f3)
#15 0x00007f06857e80eb llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/usr/lib/llvm/22/lib64/libLLVM.so.22.0git98766d28+0xfe80eb)
#16 0x00007f06883b44d3 (/usr/lib/llvm/22/lib64/libLLVM.so.22.0git98766d28+0x3bb44d3)
#17 0x00007f06857e935f llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/usr/lib/llvm/22/lib64/libLLVM.so.22.0git98766d28+0xfe935f)
#18 0x00007f068cf60576 (/usr/lib/llvm/22/lib64/libclang-cpp.so.22.0git98766d28+0x2360576)
#19 0x00007f068cf648c7 clang::emitBackendOutput(clang::CompilerInstance&, clang::CodeGenOptions&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) (/usr/lib/llvm/22/lib64/libclang-cpp.so.22.0git98766d28+0x23648c7)
#20 0x00007f068d30f0a0 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/usr/lib/llvm/22/lib64/libclang-cpp.so.22.0git98766d28+0x270f0a0)
#21 0x00007f068b8edbdc clang::ParseAST(clang::Sema&, bool, bool) (/usr/lib/llvm/22/lib64/libclang-cpp.so.22.0git98766d28+0xcedbdc)
#22 0x00007f068dc96e65 clang::FrontendAction::Execute() (/usr/lib/llvm/22/lib64/libclang-cpp.so.22.0git98766d28+0x3096e65)
#23 0x00007f068dc21f5e clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/usr/lib/llvm/22/lib64/libclang-cpp.so.22.0git98766d28+0x3021f5e)
#24 0x00007f068dd316d5 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/usr/lib/llvm/22/lib64/libclang-cpp.so.22.0git98766d28+0x31316d5)
#25 0x000055a5d82abeff cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/compiler-rt_build/lib/llvm/22/bin/clang+++0x19eff)
#26 0x000055a5d82a3bf3 (/var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/compiler-rt_build/lib/llvm/22/bin/clang+++0x11bf3)
#27 0x000055a5d82a4773 (/var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/compiler-rt_build/lib/llvm/22/bin/clang+++0x12773)
#28 0x00007f068d7df16d (/usr/lib/llvm/22/lib64/libclang-cpp.so.22.0git98766d28+0x2bdf16d)
#29 0x00007f06854b05de llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/usr/lib/llvm/22/lib64/libLLVM.so.22.0git98766d28+0xcb05de)
#30 0x00007f068d7e1a0f clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const (/usr/lib/llvm/22/lib64/libclang-cpp.so.22.0git98766d28+0x2be1a0f)
#31 0x00007f068d79c0c0 clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/usr/lib/llvm/22/lib64/libclang-cpp.so.22.0git98766d28+0x2b9c0c0)
#32 0x00007f068d79c4a7 clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const (/usr/lib/llvm/22/lib64/libclang-cpp.so.22.0git98766d28+0x2b9c4a7)
#33 0x00007f068d7ac074 clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) (/usr/lib/llvm/22/lib64/libclang-cpp.so.22.0git98766d28+0x2bac074)
#34 0x000055a5d82a6334 clang_main(int, char**, llvm::ToolContext const&) (/var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/compiler-rt_build/lib/llvm/22/bin/clang+++0x14334)
#35 0x000055a5d82a1f78 main (/var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/compiler-rt_build/lib/llvm/22/bin/clang+++0xff78)
#36 0x00007f068aa4e3ae (/usr/lib64/libc.so.6+0x273ae)
#37 0x00007f068aa4e469 __libc_start_main (/usr/lib64/libc.so.6+0x27469)
#38 0x000055a5d82a1fe5 _start (/var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/compiler-rt_build/lib/llvm/22/bin/clang+++0xffe5)
clang++: error: clang frontend command failed with exit code 134 (use -v to see invocation)
clang version 22.0.0git98766d28
Target: i386-pc-linux-gnu
Thread model: posix
InstalledDir: /var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/compiler-rt_build/lib/llvm/22/bin
Build config: +assertions
Configuration file: /etc/clang/22/i386-pc-linux-gnu-clang++.cfg
Configuration file: /etc/clang/22/gentoo-rtlib.cfg
Configuration file: /etc/clang/22/gentoo-stdlib.cfg
Configuration file: /etc/clang/22/gentoo-linker.cfg
clang++: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang++: note: diagnostic msg: /var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/temp/gtest-all-2ae0f1.cpp
clang++: note: diagnostic msg: /var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/temp/gtest-all-2ae0f1.sh
clang++: note: diagnostic msg: 

********************

The reproducer (2M, unpacks to 20M): repro.tar.gz

fhahn added a commit that referenced this pull request Sep 30, 2025
…mputeCost. (#160053)"

This reverts commit b4be7ec.

See #161404 for a crash
exposed by the change. Revert while I investigate.

fhahn commented Sep 30, 2025

@mgorny thanks, reverted while I investigate

fhahn added a commit to fhahn/llvm-project that referenced this pull request Oct 2, 2025
…omputeCost. (llvm#160053)"

This reverts commit f61be43.

Recommit a small fix handling scalarization overhead consistently with
legacy cost model if a load is used directly as operand of another memory
operation, which fixes llvm#161404.

fhahn added a commit that referenced this pull request Oct 2, 2025
…omputeCost. (#160053)" (#161724)

This reverts commit f61be43.

Recommit a small fix handling scalarization overhead consistently with
legacy cost model if a load is used directly as operand of another
memory operation, which fixes
#161404.

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 2, 2025
…ores in ::computeCost. (#160053)" (#161724)

mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
…t. (llvm#160053)

mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
…mputeCost. (llvm#160053)"

mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
…omputeCost. (llvm#160053)" (llvm#161724)

@alexey-bataev (Member)

Looks like the new version still causes the crashes, working on reproducer


alexey-bataev commented Oct 3, 2025

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define double @test(ptr %0, ptr %1, ptr %2) {
.lr.ph218.preheader:
  br label %.lr.ph221

.lr.ph221:                                        ; preds = %3, %.lr.ph218.preheader
  %.lcssa225226 = phi double [ 0.000000e+00, %.lr.ph218.preheader ], [ %21, %3 ]
  br label %3

3:                                                ; preds = %3, %.lr.ph221
  %4 = phi i64 [ 0, %.lr.ph221 ], [ %22, %3 ]
  %5 = phi double [ %.lcssa225226, %.lr.ph221 ], [ %15, %3 ]
  %6 = add i64 %4, 1
  %7 = getelementptr i8, ptr %2, i64 %6
  %8 = getelementptr i64, ptr %0, i64 %6
  %9 = load i64, ptr %8, align 8
  %10 = getelementptr double, ptr %1, i64 %9
  %11 = load double, ptr %0, align 8
  %12 = fadd double %11, 0.000000e+00
  %13 = getelementptr i8, ptr %7, i64 8
  %14 = load double, ptr %13, align 8
  %15 = fmul double %12, 0.000000e+00
  %16 = fmul double %14, 0.000000e+00
  %17 = fadd double %16, 0.000000e+00
  %18 = fadd double %17, 1.000000e+00
  %19 = load double, ptr %10, align 8
  %20 = fdiv double %19, %18
  %21 = fsub double %5, %20
  %22 = add i64 %4, 1
  %exitcond253.not = icmp eq i64 %4, 1
  br i1 %exitcond253.not, label %.lr.ph221, label %3
}

opt --passes=loop-vectorize reduced.ll -S

lib/Transforms/Vectorize/LoopVectorize.cpp:7102: VectorizationFactor llvm::LoopVectorizationPlanner::computeBestVF(): Assertion `(BestFactor.Width == LegacyVF.Width || BestPlan.hasEarlyExit() || planContainsAdditionalSimplifications(getPlanFor(BestFactor.Width), CostCtx, OrigLoop, BestFactor.Width) || planContainsAdditionalSimplifications( getPlanFor(LegacyVF.Width), CostCtx, OrigLoop, LegacyVF.Width)) && " VPlan cost model and legacy cost model disagreed"' failed.

@fhahn


fhahn commented Oct 3, 2025

Thanks, should be fixed by 1d65d9c


mgorny commented Oct 4, 2025

Thanks. I've started another test run now.


mgorny commented Oct 4, 2025

b1e29ec looks good here. Thanks again!

alexey-bataev added a commit that referenced this pull request Oct 5, 2025
…s in ::computeCost. (#160053)" (#161724)"

This reverts commit 8f2466b to fix
crashes reported in commits
aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 6, 2025
…s in ::computeCost. (llvm#160053)" (llvm#161724)"

fhahn added a commit to fhahn/llvm-project that referenced this pull request Oct 6, 2025
…omputeCost. (llvm#160053)"

This reverts commit f80c0ba and 94eade6.

Recommit a small fix for targets using prefersVectorizedAddressing.

fhahn added a commit that referenced this pull request Oct 6, 2025
…omputeCost. (#160053)" (#162157)

This reverts commit f80c0ba and 94eade6.

Recommit a small fix for targets using prefersVectorizedAddressing.

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 6, 2025
…ores in ::computeCost. (#160053)" (#162157)
