@fhahn fhahn commented Sep 22, 2025

Update VPReplicateRecipe::computeCost to compute costs of more replicating loads/stores.

There are 2 cases that require extra checks to match the legacy cost model:

  1. If the pointer is based on an induction, the legacy cost model passes its SCEV to getAddressComputationCost. In those cases, still fall back to the legacy cost. SCEV computations will be added as a follow-up.
  2. If a load is used as part of an address of another load, the legacy cost model skips the scalarization overhead. Those cases are currently handled by the isUsedByLoadStoreAddress helper (see the sketch below).

Note that getScalarizationOverhead also needs updating: when the legacy cost model computes the scalarization overhead, scalars have not been collected yet, so we can't check for replicating recipes to skip their cost, except for other loads. This can be further improved by modeling inserts/extracts explicitly and consistently, and computing costs for those operations directly where needed.
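
For illustration, a minimal hypothetical IR sketch of case 2 (not taken from this PR or its tests; the function and value names are made up): a scalarized load whose only use is forming the address of another load.

; Hypothetical sketch of case 2: %idx is a replicated scalar load whose
; result is only used to compute the address of a second load. The legacy
; cost model does not charge scalarization (insert/extract) overhead for
; %idx here, which the isUsedByLoadStoreAddress check mirrors.
define double @indirect_load(ptr %indices, ptr %data, i64 %i) {
  %gep.idx = getelementptr inbounds i64, ptr %indices, i64 %i
  %idx = load i64, ptr %gep.idx, align 8
  %gep.data = getelementptr inbounds double, ptr %data, i64 %idx
  %val = load double, ptr %gep.data, align 8
  ret double %val
}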


llvmbot commented Sep 22, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-vectorizers

Author: Florian Hahn (fhahn)

Full diff: https://github.com/llvm/llvm-project/pull/160053.diff

4 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+10-6)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+8-3)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanHelpers.h (+10-6)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+103-13)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index ca092dcfcb492..bc6f6c5da3aa5 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -3925,7 +3925,8 @@ void LoopVectorizationPlanner::emitInvalidCostRemarks(
       if (VF.isScalar())
         continue;
 
-      VPCostContext CostCtx(CM.TTI, *CM.TLI, *Plan, CM, CM.CostKind);
+      VPCostContext CostCtx(CM.TTI, *CM.TLI, *Plan, CM, CM.CostKind,
+                            *CM.PSE.getSE());
       precomputeCosts(*Plan, VF, CostCtx);
       auto Iter = vp_depth_first_deep(Plan->getVectorLoopRegion()->getEntry());
       for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(Iter)) {
@@ -4182,7 +4183,8 @@ VectorizationFactor LoopVectorizationPlanner::selectVectorizationFactor() {
 
       // Add on other costs that are modelled in VPlan, but not in the legacy
       // cost model.
-      VPCostContext CostCtx(CM.TTI, *CM.TLI, *P, CM, CM.CostKind);
+      VPCostContext CostCtx(CM.TTI, *CM.TLI, *P, CM, CM.CostKind,
+                            *CM.PSE.getSE());
       VPRegionBlock *VectorRegion = P->getVectorLoopRegion();
       assert(VectorRegion && "Expected to have a vector region!");
       for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
@@ -6871,7 +6873,7 @@ LoopVectorizationPlanner::precomputeCosts(VPlan &Plan, ElementCount VF,
 
 InstructionCost LoopVectorizationPlanner::cost(VPlan &Plan,
                                                ElementCount VF) const {
-  VPCostContext CostCtx(CM.TTI, *CM.TLI, Plan, CM, CM.CostKind);
+  VPCostContext CostCtx(CM.TTI, *CM.TLI, Plan, CM, CM.CostKind, *PSE.getSE());
   InstructionCost Cost = precomputeCosts(Plan, VF, CostCtx);
 
   // Now compute and add the VPlan-based cost.
@@ -7082,7 +7084,8 @@ VectorizationFactor LoopVectorizationPlanner::computeBestVF() {
   // simplifications not accounted for in the legacy cost model. If that's the
   // case, don't trigger the assertion, as the extra simplifications may cause a
   // different VF to be picked by the VPlan-based cost model.
-  VPCostContext CostCtx(CM.TTI, *CM.TLI, BestPlan, CM, CM.CostKind);
+  VPCostContext CostCtx(CM.TTI, *CM.TLI, BestPlan, CM, CM.CostKind,
+                        *CM.PSE.getSE());
   precomputeCosts(BestPlan, BestFactor.Width, CostCtx);
   // Verify that the VPlan-based and legacy cost models agree, except for VPlans
   // with early exits and plans with additional VPlan simplifications. The
@@ -8704,7 +8707,8 @@ VPlanPtr LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(
   // TODO: Enable following transform when the EVL-version of extended-reduction
   // and mulacc-reduction are implemented.
   if (!CM.foldTailWithEVL()) {
-    VPCostContext CostCtx(CM.TTI, *CM.TLI, *Plan, CM, CM.CostKind);
+    VPCostContext CostCtx(CM.TTI, *CM.TLI, *Plan, CM, CM.CostKind,
+                          *CM.PSE.getSE());
     VPlanTransforms::runPass(VPlanTransforms::convertToAbstractRecipes, *Plan,
                              CostCtx, Range);
   }
@@ -10043,7 +10047,7 @@ bool LoopVectorizePass::processLoop(Loop *L) {
     bool ForceVectorization =
         Hints.getForce() == LoopVectorizeHints::FK_Enabled;
     VPCostContext CostCtx(CM.TTI, *CM.TLI, LVP.getPlanFor(VF.Width), CM,
-                          CM.CostKind);
+                          CM.CostKind, *CM.PSE.getSE());
     if (!ForceVectorization &&
         !isOutsideLoopWorkProfitable(Checks, VF, L, PSE, CostCtx,
                                      LVP.getPlanFor(VF.Width), SEL,
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 1e6f1e3aeb0ac..52a5a9053346d 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -1778,8 +1778,10 @@ VPCostContext::getOperandInfo(VPValue *V) const {
   return TTI::getOperandInfo(V->getLiveInIRValue());
 }
 
-InstructionCost VPCostContext::getScalarizationOverhead(
-    Type *ResultTy, ArrayRef<const VPValue *> Operands, ElementCount VF) {
+InstructionCost
+VPCostContext::getScalarizationOverhead(Type *ResultTy,
+                                        ArrayRef<const VPValue *> Operands,
+                                        ElementCount VF, bool Skip) {
   if (VF.isScalar())
     return 0;
 
@@ -1799,7 +1801,10 @@ InstructionCost VPCostContext::getScalarizationOverhead(
   SmallPtrSet<const VPValue *, 4> UniqueOperands;
   SmallVector<Type *> Tys;
   for (auto *Op : Operands) {
-    if (Op->isLiveIn() || isa<VPReplicateRecipe, VPPredInstPHIRecipe>(Op) ||
+    if (Op->isLiveIn() ||
+        (!Skip && isa<VPReplicateRecipe, VPPredInstPHIRecipe>(Op)) ||
+        (isa<VPReplicateRecipe>(Op) &&
+         cast<VPReplicateRecipe>(Op)->getOpcode() == Instruction::Load) ||
         !UniqueOperands.insert(Op).second)
       continue;
     Tys.push_back(toVectorizedTy(Types.inferScalarType(Op), VF));
diff --git a/llvm/lib/Transforms/Vectorize/VPlanHelpers.h b/llvm/lib/Transforms/Vectorize/VPlanHelpers.h
index fe59774b7c838..2a8baec74b72b 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanHelpers.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanHelpers.h
@@ -349,12 +349,14 @@ struct VPCostContext {
   LoopVectorizationCostModel &CM;
   SmallPtrSet<Instruction *, 8> SkipCostComputation;
   TargetTransformInfo::TargetCostKind CostKind;
+  ScalarEvolution &SE;
 
   VPCostContext(const TargetTransformInfo &TTI, const TargetLibraryInfo &TLI,
                 const VPlan &Plan, LoopVectorizationCostModel &CM,
-                TargetTransformInfo::TargetCostKind CostKind)
+                TargetTransformInfo::TargetCostKind CostKind,
+                ScalarEvolution &SE)
       : TTI(TTI), TLI(TLI), Types(Plan), LLVMCtx(Plan.getContext()), CM(CM),
-        CostKind(CostKind) {}
+        CostKind(CostKind), SE(SE) {}
 
   /// Return the cost for \p UI with \p VF using the legacy cost model as
   /// fallback until computing the cost of all recipes migrates to VPlan.
@@ -374,10 +376,12 @@ struct VPCostContext {
 
   /// Estimate the overhead of scalarizing a recipe with result type \p ResultTy
   /// and \p Operands with \p VF. This is a convenience wrapper for the
-  /// type-based getScalarizationOverhead API.
-  InstructionCost getScalarizationOverhead(Type *ResultTy,
-                                           ArrayRef<const VPValue *> Operands,
-                                           ElementCount VF);
+  /// type-based getScalarizationOverhead API. If \p AlwaysIncludeReplicatingR
+  /// is true, always compute the cost of scalarizing replicating operands.
+  InstructionCost
+  getScalarizationOverhead(Type *ResultTy, ArrayRef<const VPValue *> Operands,
+                           ElementCount VF,
+                           bool AlwaysIncludeReplicatingR = false);
 };
 
 /// This class can be used to assign names to VPValues. For VPValues without
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 8e9c3db50319f..3029599487bf9 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -3075,6 +3075,63 @@ bool VPReplicateRecipe::shouldPack() const {
   });
 }
 
+/// Returns true if \p Ptr is a pointer computation for which the legacy cost
+/// model computes a SCEV expression when comping the address cost.
+static bool shouldUseAddressAccessSCEV(VPValue *Ptr) {
+  auto *PtrR = Ptr->getDefiningRecipe();
+  if (!PtrR || !((isa<VPReplicateRecipe>(PtrR) &&
+                  cast<VPReplicateRecipe>(PtrR)->getOpcode() ==
+                      Instruction::GetElementPtr) ||
+                 isa<VPWidenGEPRecipe>(PtrR)))
+    return false;
+
+  // We are looking for a gep with all loop invariant indices except for one
+  // which should be an induction variable.
+  unsigned NumOperands = PtrR->getNumOperands();
+  for (unsigned Idx = 1; Idx < NumOperands; ++Idx) {
+    VPValue *Opd = PtrR->getOperand(Idx);
+    if (!(Opd->isDefinedOutsideLoopRegions()) &&
+        !isa<VPScalarIVStepsRecipe, VPWidenIntOrFpInductionRecipe>(Opd))
+      return false;
+  }
+
+  return true;
+}
+
+/// Returns true of \p V is used as part of the address of another load or
+/// store.
+static bool isUsedByLoadStoreAddress(const VPUser *V) {
+  SmallPtrSet<const VPUser *, 4> Seen;
+  SmallVector<const VPUser *> WorkList = {V};
+
+  while (!WorkList.empty()) {
+    auto *Cur = dyn_cast<VPSingleDefRecipe>(WorkList.pop_back_val());
+    if (!Cur || !Seen.insert(Cur).second)
+      continue;
+
+    for (VPUser *U : Cur->users()) {
+      if (auto *InterleaveR = dyn_cast<VPInterleaveRecipe>(U))
+        if (InterleaveR->getAddr() == Cur)
+          return true;
+      if (auto *RepR = dyn_cast<VPReplicateRecipe>(U)) {
+        if (RepR->getOpcode() == Instruction::Load &&
+            RepR->getOperand(0) == Cur)
+          return true;
+        if (RepR->getOpcode() == Instruction::Store &&
+            RepR->getOperand(1) == Cur)
+          return true;
+      }
+      if (auto *MemR = dyn_cast<VPWidenMemoryRecipe>(U)) {
+        if (MemR->getAddr() == Cur && MemR->isConsecutive())
+          return true;
+      }
+    }
+
+    append_range(WorkList, cast<VPSingleDefRecipe>(Cur)->users());
+  }
+  return false;
+}
+
 InstructionCost VPReplicateRecipe::computeCost(ElementCount VF,
                                                VPCostContext &Ctx) const {
   Instruction *UI = cast<Instruction>(getUnderlyingValue());
@@ -3182,21 +3239,54 @@ InstructionCost VPReplicateRecipe::computeCost(ElementCount VF,
   }
   case Instruction::Load:
   case Instruction::Store: {
-    if (isSingleScalar()) {
-      bool IsLoad = UI->getOpcode() == Instruction::Load;
-      Type *ValTy = Ctx.Types.inferScalarType(IsLoad ? this : getOperand(0));
-      Type *ScalarPtrTy = Ctx.Types.inferScalarType(getOperand(IsLoad ? 0 : 1));
-      const Align Alignment = getLoadStoreAlignment(UI);
-      unsigned AS = getLoadStoreAddressSpace(UI);
-      TTI::OperandValueInfo OpInfo = TTI::getOperandInfo(UI->getOperand(0));
-      InstructionCost ScalarMemOpCost = Ctx.TTI.getMemoryOpCost(
-          UI->getOpcode(), ValTy, Alignment, AS, Ctx.CostKind, OpInfo, UI);
-      return ScalarMemOpCost + Ctx.TTI.getAddressComputationCost(
-                                   ScalarPtrTy, nullptr, nullptr, Ctx.CostKind);
-    }
+    if (VF.isScalable() && !isSingleScalar())
+      return InstructionCost::getInvalid();
+
     // TODO: See getMemInstScalarizationCost for how to handle replicating and
     // predicated cases.
-    break;
+    if (getParent()->getParent() && getParent()->getParent()->isReplicator())
+      break;
+
+    bool IsLoad = UI->getOpcode() == Instruction::Load;
+    // TODO: Handle cases where we need to pass a SCEV to
+    // getAddressComputationCost.
+    if (shouldUseAddressAccessSCEV(getOperand(!IsLoad)))
+      break;
+
+    Type *ValTy = Ctx.Types.inferScalarType(IsLoad ? this : getOperand(0));
+    Type *ScalarPtrTy = Ctx.Types.inferScalarType(getOperand(IsLoad ? 0 : 1));
+    const Align Alignment = getLoadStoreAlignment(UI);
+    unsigned AS = getLoadStoreAddressSpace(UI);
+    TTI::OperandValueInfo OpInfo = TTI::getOperandInfo(UI->getOperand(0));
+    InstructionCost ScalarMemOpCost = Ctx.TTI.getMemoryOpCost(
+        UI->getOpcode(), ValTy, Alignment, AS, Ctx.CostKind, OpInfo);
+
+    Type *PtrTy = isSingleScalar() ? ScalarPtrTy : toVectorTy(ScalarPtrTy, VF);
+
+    InstructionCost ScalarCost =
+        ScalarMemOpCost + Ctx.TTI.getAddressComputationCost(
+                              PtrTy, &Ctx.SE, nullptr, Ctx.CostKind);
+    if (isSingleScalar())
+      return ScalarCost;
+
+    SmallVector<const VPValue *> OpsToScalarize;
+    Type *ResultTy = Type::getVoidTy(getParent()->getPlan()->getContext());
+    // Set ResultTy and OpsToScalarize, if scalarization is needed. Currently we
+    // don't assign scalarization overhead in general, if the target prefers
+    // vectorized addressing or the loaded value is used as part of an address
+    // of another load or store.
+    if (Ctx.TTI.prefersVectorizedAddressing() ||
+        !isUsedByLoadStoreAddress(this)) {
+      if (!(IsLoad && !Ctx.TTI.prefersVectorizedAddressing()) &&
+          !(!IsLoad && Ctx.TTI.supportsEfficientVectorElementLoadStore()))
+        append_range(OpsToScalarize, operands());
+
+      if (!Ctx.TTI.supportsEfficientVectorElementLoadStore())
+        ResultTy = Ctx.Types.inferScalarType(this);
+    }
+
+    return (ScalarCost * VF.getFixedValue()) +
+           Ctx.getScalarizationOverhead(ResultTy, OpsToScalarize, VF, true);
   }
   }
 

@david-arm david-arm (Contributor) left a comment

Thanks for this @fhahn! It's great to see the legacy cost model being removed from VPlanRecipes. I had a few comments ...

InstructionCost
VPCostContext::getScalarizationOverhead(Type *ResultTy,
ArrayRef<const VPValue *> Operands,
ElementCount VF, bool Skip) {
Contributor

Should the variable be more descriptive, e.g. something like SkipNonReplicatingLoadOpCost? The reason I say this is because even when Skip is true we don't actually skip if it's a VPReplicateRecipe with opcode Load.

Contributor Author

Ah yes, I renamed it in the header but not here. Updated to AlwaysIncludeReplicatingR, thanks

// TODO: See getMemInstScalarizationCost for how to handle replicating and
// predicated cases.
break;
if (getParent()->getParent() && getParent()->getParent()->isReplicator())
Contributor

nit: Might be good to create a variable for getParent()->getParent() so it's clearer what it actually is. Is this supposed to be a VPRegionBlock?

Contributor Author

Yep, updated thanks!

return ScalarCost;

SmallVector<const VPValue *> OpsToScalarize;
Type *ResultTy = Type::getVoidTy(getParent()->getPlan()->getContext());
Contributor

I think you can get the context more easily from another type, i.e.

  Type *ResultTy = Type::getVoidTy(PtrTy->getContext());

Contributor Author

Done thanks

// don't assign scalarization overhead in general, if the target prefers
// vectorized addressing or the loaded value is used as part of an address
// of another load or store.
if (Ctx.TTI.prefersVectorizedAddressing() ||
Contributor

Perhaps create a variable to keep the result of Ctx.TTI.prefersVectorizedAddressing, since it's also used below?

Contributor Author

Done thanks

// vectorized addressing or the loaded value is used as part of an address
// of another load or store.
if (Ctx.TTI.prefersVectorizedAddressing() ||
!isUsedByLoadStoreAddress(this)) {
Contributor

I know this is only a minor thing, but it feels more logical to ask this question first, i.e. (!isUsedByLoadStoreAddress(this) || Ctx.TTI.prefersVectorizedAddressing()). It sounds like what we're saying is that if the result is used to calculate the address for a subsequent load or store then only calculate the scalarisation overhead if the target prefers vectorised addressing, since in this case we'll need to do inserts/extracts. Is that a correct understanding?

Contributor Author

It sounds like what we're saying is that if the result is used to calculate the address for a subsequent load or store then only calculate the scalarisation overhead if the target prefers vectorised addressing, since in this case we'll need to do inserts/extracts. Is that a correct understanding?

Yep exactly. The reason for checking TTI first is that it is likely cheaper than isUsedByLoadStoreAddress, which needs to walk the use list. I kept the order as-is for now.

!(!IsLoad && Ctx.TTI.supportsEfficientVectorElementLoadStore()))
append_range(OpsToScalarize, operands());

if (!Ctx.TTI.supportsEfficientVectorElementLoadStore())
Contributor

Keep a copy of the result from the first invocation in a variable?

Contributor Author

Done thanks

unsigned NumOperands = PtrR->getNumOperands();
for (unsigned Idx = 1; Idx < NumOperands; ++Idx) {
VPValue *Opd = PtrR->getOperand(Idx);
if (!(Opd->isDefinedOutsideLoopRegions()) &&
Contributor

nit: I think you can drop the brackets around Opd->isDefinedOutsideLoopRegions()

Contributor Author

Yep, removed thanks!

}

/// Returns true if \p Ptr is a pointer computation for which the legacy cost
/// model computes a SCEV expression when comping the address cost.
Contributor

nit: Should this be when computing the address cost.?

Contributor Author

Yep fixed, thanks

return true;
}

/// Returns true of \p V is used as part of the address of another load or
Contributor

nit: Return true if \p ...?

Contributor Author

Updated, thanks

RepR->getOperand(1) == Cur)
return true;
}
if (auto *MemR = dyn_cast<VPWidenMemoryRecipe>(U)) {
Contributor

Do we need to worry about histogram recipes too?

Also, do we have to worry about the EVL recipes such as VPInterleaveEVLRecipe?

Contributor Author

I updated the code to check for VPInterleaveBase. VPWidenMemoryRecipe should include both EVL and non EVL variants.

I don't think we need to handle histogram recipes here, as the address must be an AddRec in the current loop, so should not be able to depend on a load in the loop.

@david-arm david-arm (Contributor) left a comment

LGTM!

return false;

// We are looking for a GEP with all loop invariant indices except for one
// which should be an induction variable.
Contributor

nit: Does the comment need updating here as it doesn't seem to match the code? What the code really seems to do is "make sure all operands are either loop invariants or induction variables", i.e. there can be more than one induction variable.

Contributor Author

Yep, updated thanks! (original comment was just copied).
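
For illustration, a hypothetical loop (not from this PR; names are made up) matching the pattern shouldUseAddressAccessSCEV looks for, where each GEP index is either loop-invariant or based on an induction:

; Hypothetical sketch of case 1: the GEP indices are a loop-invariant
; value (%n) and an induction (%iv), so the legacy cost model passes the
; pointer's SCEV to getAddressComputationCost; the VPlan-based model
; falls back to the legacy cost for now.
define void @scev_address(ptr %base, i64 %n, i64 %tc) {
entry:
  br label %loop

loop:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
  %gep = getelementptr inbounds [16 x double], ptr %base, i64 %n, i64 %iv
  store double 0.000000e+00, ptr %gep, align 8
  %iv.next = add nuw nsw i64 %iv, 1
  %ec = icmp eq i64 %iv.next, %tc
  br i1 %ec, label %exit, label %loop

exit:
  ret void
}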

bool IsLoad = UI->getOpcode() == Instruction::Load;
// TODO: Handle cases where we need to pass a SCEV to
// getAddressComputationCost.
if (shouldUseAddressAccessSCEV(getOperand(!IsLoad)))
Contributor

nit: Sorry, didn't see this before. Since you're reusing the ptr operand below as well (see Type *ScalarPtrTy = Ctx.Types.inferScalarType(getOperand(IsLoad ? 0 : 1))) perhaps cleaner to create a variable for it?

  VPValue *PtrOp = getOperand(!IsLoad);
  if (shouldUseAddressAccessSCEV(PtrOp))
...
  Type *ScalarPtrTy = Ctx.Types.inferScalarType(PtrOp);

Contributor Author

Done thanks

@fhahn fhahn enabled auto-merge (squash) September 29, 2025 07:32
@fhahn fhahn merged commit b4be7ec into llvm:main Sep 29, 2025
7 of 9 checks passed
@fhahn fhahn deleted the vplan-replicating-cost branch September 29, 2025 08:44
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Sep 29, 2025
…:computeCost. (#160053)

mgorny commented Sep 30, 2025

Looks like this change broke building compiler-rt for me, on Gentoo/amd64:

clang++: /var/tmp/portage/llvm-core/llvm-22.0.0.9999/work/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7079: llvm::VectorizationFactor llvm::LoopVectorizationPlanner::computeBestVF(): Assertion `(BestFactor.Width == LegacyVF.Width || BestPlan.hasEarlyExit() || planContainsAdditionalSimplifications(getPlanFor(BestFactor.Width), CostCtx, OrigLoop, BestFactor.Width) || planContainsAdditionalSimplifications( getPlanFor(LegacyVF.Width), CostCtx, OrigLoop, LegacyVF.Width)) && " VPlan cost model and legacy cost model disagreed"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: /var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/compiler-rt_build/lib/llvm/22/bin/clang++ -O2 -pipe -march=native --config=/etc/clang/22/gentoo-rtlib.cfg --config=/etc/clang/22/gentoo-stdlib.cfg --config=/etc/clang/22/gentoo-linker.cfg -Wall -Wno-unused-parameter -Wno-unknown-warning-option -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta -I/var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/compiler-rt/include -g -Wno-covered-switch-default -Wno-suggest-override -DGTEST_NO_LLVM_SUPPORT=1 -DGTEST_HAS_RTTI=0 -I/var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/compiler-rt/../third-party/unittest/googletest/include -I/var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/compiler-rt/../third-party/unittest/googletest -I/var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/compiler-rt/include -I/var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/compiler-rt/lib -I/var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/compiler-rt/lib/asan -I/var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/compiler-rt/lib/sanitizer_common/tests -DSANITIZER_COMMON_NO_REDEFINE_BUILTINS -fno-rtti -O2 -Wno-format -Werror=sign-compare -Wno-variadic-macros -gline-tables-only -DASAN_HAS_IGNORELIST=1 -DASAN_HAS_EXCEPTIONS=1 -DASAN_UAR=0 -m32 -c -o ASAN_NOINST_TEST_OBJECTS.gtest-all.cc.i386-calls.o /var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/third-party/unittest/googletest/src/gtest-all.cc
1.      <eof> parser at end of file
2.      Optimizer
3.      Running pass "function<eager-inv>(drop-unnecessary-assumes,float2int,lower-constant-intrinsics,loop(loop-rotate<header-duplication;no-prepare-for-lto>,loop-deletion),loop-distribute,inject-tli-mappings,loop-vectorize<no-interleave-forced-only;no-vectorize-forced-only;>,infer-alignment,loop-load-elim,instcombine<max-iterations=1;no-verify-fixpoint>,simplifycfg<bonus-inst-threshold=1;forward-switch-cond;switch-range-to-icmp;switch-to-lookup;no-keep-loops;hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,slp-vectorizer,vector-combine,instcombine<max-iterations=1;no-verify-fixpoint>,loop-unroll<O2>,transform-warning,sroa<preserve-cfg>,infer-alignment,instcombine<max-iterations=1;no-verify-fixpoint>,loop-mssa(licm<allowspeculation>),alignment-from-assumptions,loop-sink,instsimplify,div-rem-pairs,tailcallelim,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;speculate-unpredictables>)" on module "/var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/third-party/unittest/googletest/src/gtest-all.cc"
4.      Running pass "loop-vectorize<no-interleave-forced-only;no-vectorize-forced-only;>" on function "_ZN7testing8internal13edit_distance21CalculateOptimalEditsERKSt6vectorIjSaIjEES6_"
 #0 0x00007f06855ca3d3 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/usr/lib/llvm/22/lib64/libLLVM.so.22.0git98766d28+0xdca3d3)
 #1 0x00007f06855c6d74 llvm::sys::RunSignalHandlers() (/usr/lib/llvm/22/lib64/libLLVM.so.22.0git98766d28+0xdc6d74)
 #2 0x00007f06854b0478 (/usr/lib/llvm/22/lib64/libLLVM.so.22.0git98766d28+0xcb0478)
 #3 0x00007f068aa64910 (/usr/lib64/libc.so.6+0x3d910)
 #4 0x00007f068aabccbc (/usr/lib64/libc.so.6+0x95cbc)
 #5 0x00007f068aa647e6 raise (/usr/lib64/libc.so.6+0x3d7e6)
 #6 0x00007f068aa4c30b abort (/usr/lib64/libc.so.6+0x2530b)
 #7 0x00007f068aa4c275 __assert_perror_fail (/usr/lib64/libc.so.6+0x25275)
 #8 0x00007f06850bb420 (/usr/lib/llvm/22/lib64/libLLVM.so.22.0git98766d28+0x8bb420)
 #9 0x00007f0687274154 llvm::LoopVectorizePass::processLoop(llvm::Loop*) (/usr/lib/llvm/22/lib64/libLLVM.so.22.0git98766d28+0x2a74154)
#10 0x00007f068727786f llvm::LoopVectorizePass::runImpl(llvm::Function&) (/usr/lib/llvm/22/lib64/libLLVM.so.22.0git98766d28+0x2a7786f)
#11 0x00007f0687277dd9 llvm::LoopVectorizePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/usr/lib/llvm/22/lib64/libLLVM.so.22.0git98766d28+0x2a77dd9)
#12 0x00007f068897c343 (/usr/lib/llvm/22/lib64/libLLVM.so.22.0git98766d28+0x417c343)
#13 0x00007f06857e7b4b llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/usr/lib/llvm/22/lib64/libLLVM.so.22.0git98766d28+0xfe7b4b)
#14 0x00007f06883b37f3 (/usr/lib/llvm/22/lib64/libLLVM.so.22.0git98766d28+0x3bb37f3)
#15 0x00007f06857e80eb llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/usr/lib/llvm/22/lib64/libLLVM.so.22.0git98766d28+0xfe80eb)
#16 0x00007f06883b44d3 (/usr/lib/llvm/22/lib64/libLLVM.so.22.0git98766d28+0x3bb44d3)
#17 0x00007f06857e935f llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/usr/lib/llvm/22/lib64/libLLVM.so.22.0git98766d28+0xfe935f)
#18 0x00007f068cf60576 (/usr/lib/llvm/22/lib64/libclang-cpp.so.22.0git98766d28+0x2360576)
#19 0x00007f068cf648c7 clang::emitBackendOutput(clang::CompilerInstance&, clang::CodeGenOptions&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) (/usr/lib/llvm/22/lib64/libclang-cpp.so.22.0git98766d28+0x23648c7)
#20 0x00007f068d30f0a0 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/usr/lib/llvm/22/lib64/libclang-cpp.so.22.0git98766d28+0x270f0a0)
#21 0x00007f068b8edbdc clang::ParseAST(clang::Sema&, bool, bool) (/usr/lib/llvm/22/lib64/libclang-cpp.so.22.0git98766d28+0xcedbdc)
#22 0x00007f068dc96e65 clang::FrontendAction::Execute() (/usr/lib/llvm/22/lib64/libclang-cpp.so.22.0git98766d28+0x3096e65)
#23 0x00007f068dc21f5e clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/usr/lib/llvm/22/lib64/libclang-cpp.so.22.0git98766d28+0x3021f5e)
#24 0x00007f068dd316d5 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/usr/lib/llvm/22/lib64/libclang-cpp.so.22.0git98766d28+0x31316d5)
#25 0x000055a5d82abeff cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/compiler-rt_build/lib/llvm/22/bin/clang+++0x19eff)
#26 0x000055a5d82a3bf3 (/var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/compiler-rt_build/lib/llvm/22/bin/clang+++0x11bf3)
#27 0x000055a5d82a4773 (/var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/compiler-rt_build/lib/llvm/22/bin/clang+++0x12773)
#28 0x00007f068d7df16d (/usr/lib/llvm/22/lib64/libclang-cpp.so.22.0git98766d28+0x2bdf16d)
#29 0x00007f06854b05de llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/usr/lib/llvm/22/lib64/libLLVM.so.22.0git98766d28+0xcb05de)
#30 0x00007f068d7e1a0f clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const (/usr/lib/llvm/22/lib64/libclang-cpp.so.22.0git98766d28+0x2be1a0f)
#31 0x00007f068d79c0c0 clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/usr/lib/llvm/22/lib64/libclang-cpp.so.22.0git98766d28+0x2b9c0c0)
#32 0x00007f068d79c4a7 clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const (/usr/lib/llvm/22/lib64/libclang-cpp.so.22.0git98766d28+0x2b9c4a7)
#33 0x00007f068d7ac074 clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) (/usr/lib/llvm/22/lib64/libclang-cpp.so.22.0git98766d28+0x2bac074)
#34 0x000055a5d82a6334 clang_main(int, char**, llvm::ToolContext const&) (/var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/compiler-rt_build/lib/llvm/22/bin/clang+++0x14334)
#35 0x000055a5d82a1f78 main (/var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/compiler-rt_build/lib/llvm/22/bin/clang+++0xff78)
#36 0x00007f068aa4e3ae (/usr/lib64/libc.so.6+0x273ae)
#37 0x00007f068aa4e469 __libc_start_main (/usr/lib64/libc.so.6+0x27469)
#38 0x000055a5d82a1fe5 _start (/var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/compiler-rt_build/lib/llvm/22/bin/clang+++0xffe5)
clang++: error: clang frontend command failed with exit code 134 (use -v to see invocation)
clang version 22.0.0git98766d28
Target: i386-pc-linux-gnu
Thread model: posix
InstalledDir: /var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/work/compiler-rt_build/lib/llvm/22/bin
Build config: +assertions
Configuration file: /etc/clang/22/i386-pc-linux-gnu-clang++.cfg
Configuration file: /etc/clang/22/gentoo-rtlib.cfg
Configuration file: /etc/clang/22/gentoo-stdlib.cfg
Configuration file: /etc/clang/22/gentoo-linker.cfg
clang++: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang++: note: diagnostic msg: /var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/temp/gtest-all-2ae0f1.cpp
clang++: note: diagnostic msg: /var/tmp/portage/llvm-runtimes/compiler-rt-sanitizers-22.0.0.9999/temp/gtest-all-2ae0f1.sh
clang++: note: diagnostic msg: 

********************

The reproducer (2M, unpacks to 20M): repro.tar.gz

fhahn added a commit that referenced this pull request Sep 30, 2025
…mputeCost. (#160053)"

This reverts commit b4be7ec.

See #161404 for a crash
exposed by the change. Revert while I investigate.

fhahn commented Sep 30, 2025

@mgorny thanks, reverted while I investigate

fhahn added a commit to fhahn/llvm-project that referenced this pull request Oct 2, 2025
…omputeCost. (llvm#160053)"

This reverts commit f61be43.

Recommit a small fix handling scalarization overhead consistently with
legacy cost model if a load is used directly as operand of another memory
operation, which fixes llvm#161404.

fhahn added a commit that referenced this pull request Oct 2, 2025
…omputeCost. (#160053)" (#161724)

This reverts commit f61be43.

Recommit a small fix handling scalarization overhead consistently with
legacy cost model if a load is used directly as operand of another
memory operation, which fixes
#161404.

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 2, 2025
…ores in ::computeCost. (#160053)" (#161724)

mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
…t. (llvm#160053)

mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
…mputeCost. (llvm#160053)"

mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
…omputeCost. (llvm#160053)" (llvm#161724)

@alexey-bataev (Member)

Looks like the new version still causes the crashes, working on reproducer


alexey-bataev commented Oct 3, 2025

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define double @test(ptr %0, ptr %1, ptr %2) {
.lr.ph218.preheader:
  br label %.lr.ph221

.lr.ph221:                                        ; preds = %3, %.lr.ph218.preheader
  %.lcssa225226 = phi double [ 0.000000e+00, %.lr.ph218.preheader ], [ %21, %3 ]
  br label %3

3:                                                ; preds = %3, %.lr.ph221
  %4 = phi i64 [ 0, %.lr.ph221 ], [ %22, %3 ]
  %5 = phi double [ %.lcssa225226, %.lr.ph221 ], [ %15, %3 ]
  %6 = add i64 %4, 1
  %7 = getelementptr i8, ptr %2, i64 %6
  %8 = getelementptr i64, ptr %0, i64 %6
  %9 = load i64, ptr %8, align 8
  %10 = getelementptr double, ptr %1, i64 %9
  %11 = load double, ptr %0, align 8
  %12 = fadd double %11, 0.000000e+00
  %13 = getelementptr i8, ptr %7, i64 8
  %14 = load double, ptr %13, align 8
  %15 = fmul double %12, 0.000000e+00
  %16 = fmul double %14, 0.000000e+00
  %17 = fadd double %16, 0.000000e+00
  %18 = fadd double %17, 1.000000e+00
  %19 = load double, ptr %10, align 8
  %20 = fdiv double %19, %18
  %21 = fsub double %5, %20
  %22 = add i64 %4, 1
  %exitcond253.not = icmp eq i64 %4, 1
  br i1 %exitcond253.not, label %.lr.ph221, label %3
}

opt --passes=loop-vectorize reduced.ll -S

lib/Transforms/Vectorize/LoopVectorize.cpp:7102: VectorizationFactor llvm::LoopVectorizationPlanner::computeBestVF(): Assertion `(BestFactor.Width == LegacyVF.Width || BestPlan.hasEarlyExit() || planContainsAdditionalSimplifications(getPlanFor(BestFactor.Width), CostCtx, OrigLoop, BestFactor.Width) || planContainsAdditionalSimplifications( getPlanFor(LegacyVF.Width), CostCtx, OrigLoop, LegacyVF.Width)) && " VPlan cost model and legacy cost model disagreed"' failed.

@fhahn


fhahn commented Oct 3, 2025

Thanks, should be fixed by 1d65d9c


mgorny commented Oct 4, 2025

Thanks. I've started another test run now.


mgorny commented Oct 4, 2025

b1e29ec looks good here. Thanks again!

alexey-bataev added a commit that referenced this pull request Oct 5, 2025
…s in ::computeCost. (#160053)" (#161724)"

This reverts commit 8f2466b to fix
crashes reported in commits
aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 6, 2025
…s in ::computeCost. (llvm#160053)" (llvm#161724)"

fhahn added a commit to fhahn/llvm-project that referenced this pull request Oct 6, 2025
…omputeCost. (llvm#160053)"

This reverts commit f80c0ba and 94eade6.

Recommit a small fix for targets using prefersVectorizedAddressing.

fhahn added a commit that referenced this pull request Oct 6, 2025
…omputeCost. (#160053)" (#162157)

This reverts commit f80c0ba and 94eade6.

Recommit a small fix for targets using prefersVectorizedAddressing.

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 6, 2025
…ores in ::computeCost. (#160053)" (#162157)
