[LV] Don't create partial reductions if factor doesn't match accumulator #158603

fhahn · 2025-09-15T11:04:44Z

Check if the scale-factor of the accumulator is the same as the request ScaleFactor in tryToCreatePartialReductions.

This prevents creating partial reductions if not all instructions in the reduction chain form partial reductions. e.g. because we do not form a partial reduction for the loop exit instruction.

Currently code-gen works fine, because the scale factor of VPPartialReduction is not used during ::execute, but it means we compute incorrect cost/register pressure, because the partial reduction won't reduce to the specified scaling factor.

Check if the scale-factor of the accumulator is the same as the request ScaleFactor in tryToCreatePartialReductions. This prevents creating partial reductions if not all instructions in the reduction chain form partial reductions. e.g. because we do not form a partial reduction for the loop exit instruction. Currently code-gen works fine, because the scale factor of VPPartialReduction is not used during ::execute, but it means we compute incorrect cost/register pressure, because the partial reduction won't reduce to the specified scaling factor.

llvmbot · 2025-09-15T11:05:20Z

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Check if the scale-factor of the accumulator is the same as the request ScaleFactor in tryToCreatePartialReductions.

This prevents creating partial reductions if not all instructions in the reduction chain form partial reductions. e.g. because we do not form a partial reduction for the loop exit instruction.

Currently code-gen works fine, because the scale factor of VPPartialReduction is not used during ::execute, but it means we compute incorrect cost/register pressure, because the partial reduction won't reduce to the specified scaling factor.

Full diff: https://github.com/llvm/llvm-project/pull/158603.diff

5 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+9-2)
(modified) llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (+1-15)
(modified) llvm/lib/Transforms/Vectorize/VPlanUtils.cpp (+14)
(modified) llvm/lib/Transforms/Vectorize/VPlanUtils.h (+4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-chained.ll (+2-2)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 640a98c622f80..2caeb1bf8a695 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -8141,8 +8141,11 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe(VPSingleDefRecipe *R,
   if (isa<LoadInst>(Instr) || isa<StoreInst>(Instr))
     return tryToWidenMemory(Instr, Operands, Range);
 
-  if (std::optional<unsigned> ScaleFactor = getScalingForReduction(Instr))
-    return tryToCreatePartialReduction(Instr, Operands, ScaleFactor.value());
+  if (std::optional<unsigned> ScaleFactor = getScalingForReduction(Instr)) {
+    if (auto PartialRed =
+            tryToCreatePartialReduction(Instr, Operands, ScaleFactor.value()))
+      return PartialRed;
+  }
 
   if (!shouldWiden(Instr, Range))
     return nullptr;
@@ -8176,6 +8179,10 @@ VPRecipeBuilder::tryToCreatePartialReduction(Instruction *Reduction,
       isa<VPPartialReductionRecipe>(BinOpRecipe))
     std::swap(BinOp, Accumulator);
 
+  if (ScaleFactor !=
+      vputils::getVFScaleFactor(Accumulator->getDefiningRecipe()))
+    return nullptr;
+
   unsigned ReductionOpcode = Reduction->getOpcode();
   if (ReductionOpcode == Instruction::Sub) {
     auto *const Zero = ConstantInt::get(Reduction->getType(), 0);
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index d400ceff7797c..b4e62d39173e1 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -394,20 +394,6 @@ bool VPDominatorTree::properlyDominates(const VPRecipeBase *A,
   return Base::properlyDominates(ParentA, ParentB);
 }
 
-/// Get the VF scaling factor applied to the recipe's output, if the recipe has
-/// one.
-static unsigned getVFScaleFactor(VPRecipeBase *R) {
-  if (auto *RR = dyn_cast<VPReductionPHIRecipe>(R))
-    return RR->getVFScaleFactor();
-  if (auto *RR = dyn_cast<VPPartialReductionRecipe>(R))
-    return RR->getVFScaleFactor();
-  assert(
-      (!isa<VPInstruction>(R) || cast<VPInstruction>(R)->getOpcode() !=
-                                     VPInstruction::ReductionStartVector) &&
-      "getting scaling factor of reduction-start-vector not implemented yet");
-  return 1;
-}
-
 bool VPRegisterUsage::exceedsMaxNumRegs(const TargetTransformInfo &TTI,
                                         unsigned OverrideMaxNumRegs) const {
   return any_of(MaxLocalUsers, [&TTI, &OverrideMaxNumRegs](auto &LU) {
@@ -569,7 +555,7 @@ SmallVector<VPRegisterUsage, 8> llvm::calculateRegisterUsageForPlan(
         } else {
           // The output from scaled phis and scaled reductions actually has
           // fewer lanes than the VF.
-          unsigned ScaleFactor = getVFScaleFactor(R);
+          unsigned ScaleFactor = vputils::getVFScaleFactor(R);
           ElementCount VF = VFs[J].divideCoefficientBy(ScaleFactor);
           LLVM_DEBUG(if (VF != VFs[J]) {
             dbgs() << "LV(REG): Scaled down VF from " << VFs[J] << " to " << VF
diff --git a/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp b/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
index ddc4ad1977401..94c492f52e89f 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
@@ -141,3 +141,17 @@ VPBasicBlock *vputils::getFirstLoopHeader(VPlan &Plan, VPDominatorTree &VPDT) {
   });
   return I == DepthFirst.end() ? nullptr : cast<VPBasicBlock>(*I);
 }
+
+unsigned vputils::getVFScaleFactor(VPRecipeBase *R) {
+  if (!R)
+    return 1;
+  if (auto *RR = dyn_cast<VPReductionPHIRecipe>(R))
+    return RR->getVFScaleFactor();
+  if (auto *RR = dyn_cast<VPPartialReductionRecipe>(R))
+    return RR->getVFScaleFactor();
+  assert(
+      (!isa<VPInstruction>(R) || cast<VPInstruction>(R)->getOpcode() !=
+                                     VPInstruction::ReductionStartVector) &&
+      "getting scaling factor of reduction-start-vector not implemented yet");
+  return 1;
+}
diff --git a/llvm/lib/Transforms/Vectorize/VPlanUtils.h b/llvm/lib/Transforms/Vectorize/VPlanUtils.h
index 77c099b271717..0dd6885da6fdd 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanUtils.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanUtils.h
@@ -101,6 +101,10 @@ bool isUniformAcrossVFsAndUFs(VPValue *V);
 /// Returns the header block of the first, top-level loop, or null if none
 /// exist.
 VPBasicBlock *getFirstLoopHeader(VPlan &Plan, VPDominatorTree &VPDT);
+
+/// Get the VF scaling factor applied to the recipe's output, if the recipe has
+/// one.
+unsigned getVFScaleFactor(VPRecipeBase *R);
 } // namespace vputils
 
 //===----------------------------------------------------------------------===//
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-chained.ll b/llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-chained.ll
index c0995ec150c8d..f8b9d5db07fce 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-chained.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-chained.ll
@@ -1404,7 +1404,7 @@ define i32 @red_extended_add_chain(ptr %start, ptr %end, i32 %offset) {
 ; CHECK-NEON-NEXT:    [[NEXT_GEP:%.*]] = getelementptr i8, ptr [[START]], i64 [[INDEX]]
 ; CHECK-NEON-NEXT:    [[WIDE_LOAD:%.*]] = load <16 x i8>, ptr [[NEXT_GEP]], align 1
 ; CHECK-NEON-NEXT:    [[TMP3:%.*]] = zext <16 x i8> [[WIDE_LOAD]] to <16 x i32>
-; CHECK-NEON-NEXT:    [[PARTIAL_REDUCE:%.*]] = call <16 x i32> @llvm.experimental.vector.partial.reduce.add.v16i32.v16i32(<16 x i32> [[VEC_PHI]], <16 x i32> [[TMP3]])
+; CHECK-NEON-NEXT:    [[PARTIAL_REDUCE:%.*]] = add <16 x i32> [[VEC_PHI]], [[TMP3]]
 ; CHECK-NEON-NEXT:    [[TMP4]] = add <16 x i32> [[PARTIAL_REDUCE]], [[BROADCAST_SPLAT]]
 ; CHECK-NEON-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
 ; CHECK-NEON-NEXT:    [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
@@ -1478,7 +1478,7 @@ define i32 @red_extended_add_chain(ptr %start, ptr %end, i32 %offset) {
 ; CHECK-SVE-MAXBW-NEXT:    [[NEXT_GEP:%.*]] = getelementptr i8, ptr [[START]], i64 [[INDEX]]
 ; CHECK-SVE-MAXBW-NEXT:    [[WIDE_LOAD:%.*]] = load <vscale x 8 x i8>, ptr [[NEXT_GEP]], align 1
 ; CHECK-SVE-MAXBW-NEXT:    [[TMP7:%.*]] = zext <vscale x 8 x i8> [[WIDE_LOAD]] to <vscale x 8 x i32>
-; CHECK-SVE-MAXBW-NEXT:    [[PARTIAL_REDUCE:%.*]] = call <vscale x 8 x i32> @llvm.experimental.vector.partial.reduce.add.nxv8i32.nxv8i32(<vscale x 8 x i32> [[VEC_PHI]], <vscale x 8 x i32> [[TMP7]])
+; CHECK-SVE-MAXBW-NEXT:    [[PARTIAL_REDUCE:%.*]] = add <vscale x 8 x i32> [[VEC_PHI]], [[TMP7]]
 ; CHECK-SVE-MAXBW-NEXT:    [[TMP8]] = add <vscale x 8 x i32> [[PARTIAL_REDUCE]], [[BROADCAST_SPLAT]]
 ; CHECK-SVE-MAXBW-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP5]]
 ; CHECK-SVE-MAXBW-NEXT:    [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]

artagnon

There are two independent changes in this PR?

artagnon · 2025-09-15T20:40:55Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

-  if (std::optional<unsigned> ScaleFactor = getScalingForReduction(Instr))
-    return tryToCreatePartialReduction(Instr, Operands, ScaleFactor.value());
+  if (std::optional<unsigned> ScaleFactor = getScalingForReduction(Instr)) {
+    if (auto PartialRed =
+            tryToCreatePartialReduction(Instr, Operands, ScaleFactor.value()))
+      return PartialRed;
+  }


Bad change?

I think this is intentional? tryToCreatePartialReduction can now return nullptr on line 8185 so I presume we need to check it here.

Yes, we need to account for tryToCreatePartialReduction now possibly returning nullptr

sdesmalen-arm · 2025-09-16T08:09:12Z

llvm/lib/Transforms/Vectorize/VPlanUtils.cpp

+  assert(
+      (!isa<VPInstruction>(R) || cast<VPInstruction>(R)->getOpcode() !=
+                                     VPInstruction::ReductionStartVector) &&
+      "getting scaling factor of reduction-start-vector not implemented yet");


Will this now result in the compiler (with asserts enabled) to fail on valid code when trying to vectorise it?
If so then this needs to be a FIXME instead.

This just moves the existing code and allows R to be a nullptr, so I don't think this should introduce a new assertion failure.

sdesmalen-arm · 2025-09-16T08:39:09Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

      isa<VPPartialReductionRecipe>(BinOpRecipe))
    std::swap(BinOp, Accumulator);

+  if (ScaleFactor !=


Why does the scale-factor of the accumulator (PHI) not match that of the reduction? Is there something missing elsewhere that causes this to not match?

Possibly unrelated, but when I print the values of BinOp/Accumulator, I see that BinOp = WIDEN_REDUCTION-PHI and Accumulator = WIDEN-CAST, which is then swapped above to result in BinOp = WIDEN-CAST. That does not look like a binary-op to me, so. Is this poorly named or does the code actually assume a 'dot-product'-like operation which is not what it gets?

Why does the scale-factor of the accumulator (PHI) not match that of the reduction? Is there something missing elsewhere that causes this to not match?

Ths can happen when there's a chain and the first entry can use a partial reduction but the second entry in the chain cannot.

Possibly unrelated, but when I print the values of BinOp/Accumulator, I see that BinOp = WIDEN_REDUCTION-PHI and Accumulator = WIDEN-CAST, which is then swapped above to result in BinOp = WIDEN-CAST. That does not look like a binary-op to me, so. Is this poorly named or does the code actually assume a 'dot-product'-like operation which is not what it gets?

I think it is poorly named, as this also supports using dot product instructions for widening adds, by multiplying with 1

SamTebbs33

This looks sensible to me, thank you.

SamTebbs33 · 2025-09-19T09:22:58Z

llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-chained.ll

 ; CHECK-NEON-NEXT:    [[WIDE_LOAD:%.*]] = load <16 x i8>, ptr [[NEXT_GEP]], align 1
 ; CHECK-NEON-NEXT:    [[TMP3:%.*]] = zext <16 x i8> [[WIDE_LOAD]] to <16 x i32>
-; CHECK-NEON-NEXT:    [[PARTIAL_REDUCE:%.*]] = call <16 x i32> @llvm.experimental.vector.partial.reduce.add.v16i32.v16i32(<16 x i32> [[VEC_PHI]], <16 x i32> [[TMP3]])
+; CHECK-NEON-NEXT:    [[PARTIAL_REDUCE:%.*]] = add <16 x i32> [[VEC_PHI]], [[TMP3]]


This test should probably be renamed now that it doesn't produce a partial reduction.

thanks, updated to red_extended_add_incomplete_chain

…ons-non-narrowing

github-actions · 2025-09-24T10:42:44Z

✅ With the latest revision this PR passed the C/C++ code formatter.

…ch accumulator (#158603) Check if the scale-factor of the accumulator is the same as the request ScaleFactor in tryToCreatePartialReductions. This prevents creating partial reductions if not all instructions in the reduction chain form partial reductions. e.g. because we do not form a partial reduction for the loop exit instruction. Currently code-gen works fine, because the scale factor of VPPartialReduction is not used during ::execute, but it means we compute incorrect cost/register pressure, because the partial reduction won't reduce to the specified scaling factor. PR: llvm/llvm-project#158603

…tor (llvm#158603) Check if the scale-factor of the accumulator is the same as the request ScaleFactor in tryToCreatePartialReductions. This prevents creating partial reductions if not all instructions in the reduction chain form partial reductions. e.g. because we do not form a partial reduction for the loop exit instruction. Currently code-gen works fine, because the scale factor of VPPartialReduction is not used during ::execute, but it means we compute incorrect cost/register pressure, because the partial reduction won't reduce to the specified scaling factor. PR: llvm#158603

fhahn requested review from SamTebbs33, lukel97 and sdesmalen-arm September 15, 2025 11:04

llvmbot added vectorizers llvm:transforms labels Sep 15, 2025

artagnon reviewed Sep 15, 2025

View reviewed changes

sdesmalen-arm reviewed Sep 16, 2025

View reviewed changes

SamTebbs33 approved these changes Sep 19, 2025

View reviewed changes

fhahn added 2 commits September 24, 2025 11:23

Merge remote-tracking branch 'origin/main' into vplan-partial-reducti…

3608890

…ons-non-narrowing

!fixup update after merge, rename test

b9c39a1

!fixup fix formatting

6240599

fhahn merged commit a7b4dd4 into llvm:main Sep 24, 2025
7 of 9 checks passed

fhahn deleted the vplan-partial-reductions-non-narrowing branch September 24, 2025 11:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[LV] Don't create partial reductions if factor doesn't match accumulator #158603

[LV] Don't create partial reductions if factor doesn't match accumulator #158603

Uh oh!

fhahn commented Sep 15, 2025

Uh oh!

llvmbot commented Sep 15, 2025 •

edited

Loading

Uh oh!

artagnon left a comment

Uh oh!

artagnon Sep 15, 2025

Uh oh!

lukel97 Sep 16, 2025

Uh oh!

fhahn Sep 16, 2025

Uh oh!

sdesmalen-arm Sep 16, 2025

Uh oh!

fhahn Sep 16, 2025

Uh oh!

sdesmalen-arm Sep 16, 2025

Uh oh!

fhahn Sep 16, 2025

Uh oh!

SamTebbs33 left a comment

Uh oh!

SamTebbs33 Sep 19, 2025

Uh oh!

fhahn Sep 24, 2025

Uh oh!

github-actions bot commented Sep 24, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

[LV] Don't create partial reductions if factor doesn't match accumulator #158603

[LV] Don't create partial reductions if factor doesn't match accumulator #158603

Uh oh!

Conversation

fhahn commented Sep 15, 2025

Uh oh!

llvmbot commented Sep 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

artagnon left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SamTebbs33 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Sep 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

llvmbot commented Sep 15, 2025 •

edited

Loading

github-actions bot commented Sep 24, 2025 •

edited

Loading