SamTebbs33
Collaborator

The VPExpressionRecipe class uses a set to store its bundled recipes. If repeated recipes are bundled then the duplicates will be lost, causing the following recipes to not be at the expected place in the set.

When printing a reduce.add(mul(ext, ext)) bundle, for example, if the extends are the same then the 3rd element of the set will be the reduction, rather than the expected mul, causing a cast error. With this change, the recipes are at the expected index in the set.

Fixes #156464
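
To illustrate the index shift described above, here is a minimal standalone sketch. It assumes nothing about the real VPlan classes: `dedupKeepingOrder` and `elementAtMulIndex` are hypothetical helpers, and strings stand in for recipes. It only shows that building a sequence through an insertion-ordered set drops the repeated extend, so a consumer reading index 2 finds the reduction instead of the multiply.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for building a vector through an insertion-ordered
// set: keeps the first occurrence of each element and drops later repeats.
std::vector<std::string> dedupKeepingOrder(const std::vector<std::string> &In) {
  std::unordered_set<std::string> Seen;
  std::vector<std::string> Out;
  for (const std::string &S : In)
    if (Seen.insert(S).second)
      Out.push_back(S);
  return Out;
}

// Returns the element a consumer reads when it expects the multiply at
// index 2 of an {ext, ext, mul, red} bundle.
std::string elementAtMulIndex(const std::vector<std::string> &Bundle) {
  assert(Bundle.size() > 2 && "bundle too short to hold a multiply");
  return Bundle[2];
}
```

With the input `{"ext", "ext", "mul", "red"}`, index 2 holds `"mul"` in the plain vector but `"red"` once duplicates are dropped, which mirrors the failing cast.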

@llvmbot
Member

llvmbot commented Sep 4, 2025

@llvm/pr-subscribers-vectorizers

Author: Sam Tebbs (SamTebbs33)

Changes


Full diff: https://github.com/llvm/llvm-project/pull/156976.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+6-2)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+23-6)
  • (modified) llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll (+47)
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index bd1bee3a88887..f71cc09e3f990 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -29,6 +29,7 @@
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/SmallBitVector.h"
 #include "llvm/ADT/SmallPtrSet.h"
+#include "llvm/ADT/SmallSet.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/Twine.h"
 #include "llvm/ADT/ilist.h"
@@ -3003,8 +3004,11 @@ class VPExpressionRecipe : public VPSingleDefRecipe {
                            {Ext0, Ext1, Mul, Red}) {}
 
   ~VPExpressionRecipe() override {
-    for (auto *R : reverse(ExpressionRecipes))
-      delete R;
+    SmallSet<VPSingleDefRecipe *, 4> ExpressionRecipesSeen;
+    for (auto *R : reverse(ExpressionRecipes)) {
+      if (ExpressionRecipesSeen.insert(R).second)
+        delete R;
+    }
     for (VPValue *T : LiveInPlaceholders)
       delete T;
   }
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 5f3503d0ce57a..1c743703a1a85 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -2740,9 +2740,8 @@ VPExpressionRecipe::VPExpressionRecipe(
     ExpressionTypes ExpressionType,
     ArrayRef<VPSingleDefRecipe *> ExpressionRecipes)
     : VPSingleDefRecipe(VPDef::VPExpressionSC, {}, {}),
-      ExpressionRecipes(SetVector<VPSingleDefRecipe *>(
-                            ExpressionRecipes.begin(), ExpressionRecipes.end())
-                            .takeVector()),
+      ExpressionRecipes(SmallVector<VPSingleDefRecipe *>(
+          ExpressionRecipes.begin(), ExpressionRecipes.end())),
       ExpressionType(ExpressionType) {
   assert(!ExpressionRecipes.empty() && "Nothing to combine?");
   assert(
@@ -2776,25 +2775,43 @@ VPExpressionRecipe::VPExpressionRecipe(
       R->removeFromParent();
   }
 
+  // Keep track of how many instances of each recipe occur in the recipe list
+  SmallMapVector<VPSingleDefRecipe *, unsigned, 4> ExpressionRecipeCounts;
+  for (auto *R : ExpressionRecipes) {
+    auto *F = ExpressionRecipeCounts.find(R);
+    if (F == ExpressionRecipeCounts.end())
+      ExpressionRecipeCounts.insert(std::make_pair(R, 1));
+    else
+      F->second++;
+  }
+
   // Internalize all external operands to the expression recipes. To do so,
   // create new temporary VPValues for all operands defined by a recipe outside
   // the expression. The original operands are added as operands of the
   // VPExpressionRecipe itself.
   for (auto *R : ExpressionRecipes) {
+    auto *F = ExpressionRecipeCounts.find(R);
+    F->second--;
     for (const auto &[Idx, Op] : enumerate(R->operands())) {
       auto *Def = Op->getDefiningRecipe();
       if (Def && ExpressionRecipesAsSetOfUsers.contains(Def))
         continue;
       addOperand(Op);
-      LiveInPlaceholders.push_back(new VPValue());
-      R->setOperand(Idx, LiveInPlaceholders.back());
+      auto *Tmp = new VPValue();
+      Tmp->setUnderlyingValue(Op->getUnderlyingValue());
+      LiveInPlaceholders.push_back(Tmp);
+      // Only modify this recipe's operands if it's the last time it occurs in
+      // the recipe list
+      if (F->second == 0)
+        R->setOperand(Idx, Tmp);
     }
   }
 }
 
 void VPExpressionRecipe::decompose() {
   for (auto *R : ExpressionRecipes)
-    R->insertBefore(this);
+    if (!R->getParent())
+      R->insertBefore(this);
 
   for (const auto &[Idx, Op] : enumerate(operands()))
     LiveInPlaceholders[Idx]->replaceAllUsesWith(Op);
diff --git a/llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll b/llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll
index 2ffb8203d49dd..29167324c7a09 100644
--- a/llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll
+++ b/llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll
@@ -651,3 +651,50 @@ exit:
   %r.0.lcssa = phi i64 [ %rdx.next, %loop ]
   ret i64 %r.0.lcssa
 }
+
+define i64 @print_mulacc_duplicate_extends(ptr nocapture readonly %x, ptr nocapture readonly %y, i32 %n) {
+; CHECK-LABEL: 'print_mulacc_duplicate_extends'
+; CHECK:      VPlan 'Initial VPlan for VF={4},UF>=1' {
+; CHECK-NEXT: Live-in vp<[[VF:%.+]]> = VF
+; CHECK-NEXT: Live-in vp<[[VFxUF:%.+]]> = VF * UF
+; CHECK-NEXT: Live-in vp<[[VTC:%.+]]> = vector-trip-count
+; CHECK-NEXT: Live-in ir<%n> = original trip-count
+; CHECK-EMPTY:
+; CHECK:      vector.ph:
+; CHECK-NEXT:   EMIT vp<[[RDX_START:%.+]]> = reduction-start-vector ir<0>, ir<0>, ir<1>
+; CHECK-NEXT: Successor(s): vector loop
+; CHECK-EMPTY:
+; CHECK-NEXT: <x1> vector loop: {
+; CHECK-NEXT:   vector.body:
+; CHECK-NEXT:     EMIT vp<[[IV:%.+]]> = CANONICAL-INDUCTION ir<0>, vp<[[IV_NEXT:%.+]]>
+; CHECK-NEXT:     WIDEN-REDUCTION-PHI ir<[[RDX:%.+]]> = phi vp<[[RDX_START]]>, vp<[[RDX_NEXT:%.+]]>
+; CHECK-NEXT:     vp<[[STEPS:%.+]]> = SCALAR-STEPS vp<[[IV]]>, ir<1>
+; CHECK-NEXT:     CLONE ir<[[ARRAYIDX0:%.+]]> = getelementptr inbounds ir<%x>, vp<[[STEPS]]>
+; CHECK-NEXT:     vp<[[ADDR0:%.+]]> = vector-pointer ir<[[ARRAYIDX0]]>
+; CHECK-NEXT:     WIDEN ir<[[LOAD0:%.+]]> = load vp<[[ADDR0]]>
+; CHECK-NEXT:     EXPRESSION vp<[[RDX_NEXT:%.+]]> = ir<[[RDX]]> + reduce.sub (mul nsw (ir<[[LOAD0]]> sext to i64), (ir<[[LOAD0]]> sext to i64))
+; CHECK-NEXT:     EMIT vp<[[IV_NEXT]]> = add nuw vp<[[IV]]>, vp<[[VFxUF]]>
+; CHECK-NEXT:     EMIT branch-on-count vp<[[IV_NEXT]]>, vp<[[VTC]]>
+; CHECK-NEXT:   No successors
+; CHECK-NEXT: }
+;
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i32 [ %iv.next, %loop ], [ 0, %entry ]
+  %rdx = phi i64 [ %rdx.next, %loop ], [ 0, %entry ]
+  %arrayidx = getelementptr inbounds i16, ptr %x, i32 %iv
+  %load0 = load i16, ptr %arrayidx, align 4
+  %conv0 = sext i16 %load0 to i32
+  %mul = mul nsw i32 %conv0, %conv0
+  %conv = sext i32 %mul to i64
+  %rdx.next = sub nsw i64 %rdx, %conv
+  %iv.next = add nuw nsw i32 %iv, 1
+  %exitcond = icmp eq i32 %iv.next, %n
+  br i1 %exitcond, label %exit, label %loop
+
+exit:
+  %r.0.lcssa = phi i64 [ %rdx.next, %loop ]
+  ret i64 %r.0.lcssa
+}

@llvmbot
Member

llvmbot commented Sep 4, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Sam Tebbs (SamTebbs33)


Contributor

@fhahn fhahn left a comment

Hmm, this only impacts printing, right? It seems like it would be simpler to just handle the case where 1 extend is shared when picking the multiply recipe, by checking if ExpressionRecipes[1] is an extend or widen recipe?

  auto *Mul = cast<VPWidenRecipe>(IsExtended ? ExpressionRecipes[2]
                                               : ExpressionRecipes[0]);
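
One way to read this suggestion, as a rough sketch only: keep the deduplicated list and pick the multiply's index by checking what sits in slot 1. The `Kind` enum and `mulIndexAfterDedup` are hypothetical names for illustration and are not part of VPlan; the sketch also assumes the bundle is the extended multiply-accumulate form.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical recipe kinds, used only for this illustration.
enum class Kind { Extend, Mul, Reduce };

// Index of the multiply in a deduplicated extended mul-acc bundle:
// {ext, ext, mul, red} when the extends differ, {ext, mul, red} when one
// extend is shared and the duplicate was dropped.
std::size_t mulIndexAfterDedup(const std::vector<Kind> &Bundle) {
  assert(Bundle.size() >= 3 && Bundle[0] == Kind::Extend &&
         "expected an extended mul-acc bundle");
  return Bundle[1] == Kind::Extend ? 2 : 1;
}
```

The trade-off is that every consumer of the list would need a check like this one for each bundle shape it understands.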

Comment on lines 2801 to 2822
Tmp->setUnderlyingValue(Op->getUnderlyingValue());
LiveInPlaceholders.push_back(Tmp);
Contributor

Why do we need to set the underlying value now?

Collaborator Author

It was needed to fix a crash early on but I don't think it's needed now so I've removed it, thanks.

@SamTebbs33
Collaborator Author

Hmm, this only impacts printing, right? It seems like it would be simpler to just handle the case where 1 extend is shared when picking the multiply recipe, by checking if ExpressionRecipes[1] is an extend or widen recipe?

  auto *Mul = cast<VPWidenRecipe>(IsExtended ? ExpressionRecipes[2]
                                               : ExpressionRecipes[0]);

I think it's best to get the vector of recipes correct from the start so that we don't need to account for unexpectedness in other places. This could theoretically happen for future bundle types too and we don't want lots of checks when we can just get it right from the start of the bundle's existence.

The VPExpressionRecipe class uses a set to store its bundled recipes. If
repeated recipes are bundled then the duplicates will be lost, causing
the following recipes to not be at the expected place in the set.

When printing a reduce.add(mul(ext, ext)) bundle, if the extends are the
same then the 3rd element of the set will be the reduction, rather than
the expected mul, causing a cast error. With this change, the recipes
are at the expected index in the set.

Fixes llvm#156464
@SamTebbs33 SamTebbs33 force-pushed the vpexpression-keep-duplicates branch from e6c89c3 to 2300506 Compare September 19, 2025 12:27
}

// Keep track of how many instances of each recipe occur in the recipe list
SmallMapVector<VPSingleDefRecipe *, unsigned, 4> ExpressionRecipeCounts;
Collaborator

I'm a bit worried about a slightly different case, where the same (external) input operand is shared by different expression recipes. I don't think there would be an easy way to test that at the moment, but I can see that happening with other kinds of expressions in the future.

Comment on lines 2825 to 2826
if (F->second == 0)
R->setOperand(Idx, Tmp);
Collaborator

If you were to keep a map of Op -> new VPValue() nodes, then you can just do:

LiveInPlaceholders.push_back(new VPValue());
MyMap[Op] = LiveInPlaceholders.back();

and then have a loop at the end to replace all values of Op by their LiveInPlaceholder (i.e. new VPValue()) values for each R in ExpressionRecipes.

That would also fix the potential issue I mentioned above, where same input operand is shared by multiple recipes passed to the expression.
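
A minimal sketch of the map-based approach described in this comment, under stated assumptions: `Value`, `Recipe`, `Bundle`, and the `Internal` set are hypothetical stand-ins rather than the real VPlan types, and ownership is modelled with `std::unique_ptr` instead of manual `delete`. Every external operand use gets its own placeholder, the map remembers the latest placeholder per operand, and recipes are rewritten only after all of them have been scanned.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <set>
#include <vector>

struct Value {};  // hypothetical stand-in for VPValue
struct Recipe {   // hypothetical stand-in for a bundled recipe
  std::vector<Value *> Operands;
};

struct Bundle {
  std::vector<Value *> Operands;                     // external operands, one per use
  std::vector<std::unique_ptr<Value>> Placeholders;  // one placeholder per operand

  // Recipes may list the same pointer more than once. Internal holds the
  // values defined inside the bundle, which are not internalized.
  void internalize(const std::vector<Recipe *> &Recipes,
                   const std::set<Value *> &Internal) {
    std::map<Value *, Value *> OperandPlaceholders;
    for (Recipe *R : Recipes) {
      for (Value *Op : R->Operands) {
        if (Internal.count(Op))
          continue;
        Operands.push_back(Op);
        Placeholders.push_back(std::make_unique<Value>());
        OperandPlaceholders[Op] = Placeholders.back().get();
      }
    }
    // Rewrite operands only after the scan, so a recipe that appears twice
    // still exposes its original operands on the second visit above.
    for (Recipe *R : Recipes)
      for (Value *&Op : R->Operands) {
        auto It = OperandPlaceholders.find(Op);
        if (It != OperandPlaceholders.end())
          Op = It->second;
      }
    assert(Operands.size() == Placeholders.size());
  }
};
```

In this model a bundle built from `{Ext, Ext, Mul}` ends up with two external operands and two placeholders even though the extend is shared, so the operand count does not depend on whether recipes repeat.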

Collaborator Author

That's nice. It also means we don't have to keep track of the recipe counts. Done.

delete R;
SmallSet<VPSingleDefRecipe *, 4> ExpressionRecipesSeen;
for (auto *R : reverse(ExpressionRecipes)) {
if (ExpressionRecipesSeen.insert(R).second)
Collaborator

Can you update the comment for SmallVector<VPSingleDefRecipe *> ExpressionRecipes (line 2973) that this vector may contain duplicates?

Collaborator Author

Done.

Collaborator

This is not done?

Collaborator Author

Sorry, I did add it but must not have staged it.

Contributor

@fhahn fhahn left a comment

Hmm, this only impacts printing, right? It seems like it would be simpler to just handle the case where 1 extend is shared when picking the multiply recipe, by checking if ExpressionRecipes[1] is an extend or widen recipe?

  auto *Mul = cast<VPWidenRecipe>(IsExtended ? ExpressionRecipes[2]
                                               : ExpressionRecipes[0]);

I think it's best to get the vector of recipes correct from the start so that we don't need to account for unexpectedness in other places. This could theoretically happen for future bundle types too and we don't want lots of checks when we can just get it right from the start of the bundle's existence.

But it looks like handling duplicates requires quite a bit of extra complexity spread throughout; ideally there would be no need to look up the individual entries in the expression list.

At the moment this just impacts printing. IMO it would be better to just have a generic print function that does not inspect the entries directly, but instead prints something like

EXPRESSION vp<%x> = ExtendedReduction operands {
  bundled recipes...
}

@SamTebbs33
Collaborator Author

SamTebbs33 commented Sep 22, 2025

Hmm, this only impacts printing, right? It seems like it would be simpler to just handle the case where 1 extend is shared when picking the multiply recipe, by checking if ExpressionRecipes[1] is an extend or widen recipe?

  auto *Mul = cast<VPWidenRecipe>(IsExtended ? ExpressionRecipes[2]
                                               : ExpressionRecipes[0]);

I think it's best to get the vector of recipes correct from the start so that we don't need to account for unexpectedness in other places. This could theoretically happen for future bundle types too and we don't want lots of checks when we can just get it right from the start of the bundle's existence.

But it looks like handling duplicates requires quite a bit of extra complexity spread throughout; ideally there would be no need to look up the individual entries in the expression list.

At the moment this just impacts printing. IMO it would be better to just have a generic print function that does not inspect the entries directly, but instead prints something like

EXPRESSION vp<%x> = ExtendedReduction operands {
  bundled recipes...
}

Sorry, it doesn't just impact printing, but anything that wants to analyse the recipes at all, such as bundled partial reductions: https://github.com/llvm/llvm-project/pull/147302/files#diff-34abe4c3cd34aa7a9664bbd204834248455635ba80b8a9ba9506d8c3e6b94d95R2878

I don't quite see why it's bad to get the recipe list correct from the very beginning? Surely that's worth a little bit of complexity in the constructor?

@sdesmalen-arm
Collaborator

But it looks like handling duplicates requires quite a bit of extra complexity spread throughout

At the moment only the destructor needs to know that ExpressionRecipes contains duplicates when deleting them. What other complexity is required with regards to handling duplicates?
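
For reference, the destructor concern can be sketched in isolation. This is an illustration under assumptions, not the patch itself: `Node` and `deleteUnique` are hypothetical, and `std::unordered_set` stands in for `SmallSet`. The point is that each distinct pointer is deleted exactly once even when the owning vector lists it twice.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

struct Node {            // hypothetical stand-in for a bundled recipe
  static int LiveCount;  // number of nodes currently alive
  Node() { ++LiveCount; }
  ~Node() { --LiveCount; }
};
int Node::LiveCount = 0;

// Deletes every distinct pointer in Owned exactly once, walking in reverse
// as the patched destructor does, and returns how many nodes were deleted.
int deleteUnique(std::vector<Node *> &Owned) {
  std::unordered_set<Node *> Seen;
  int Deleted = 0;
  for (auto It = Owned.rbegin(); It != Owned.rend(); ++It) {
    assert(*It && "null entry in owned list");
    if (Seen.insert(*It).second) {
      delete *It;
      ++Deleted;
    }
  }
  Owned.clear();
  return Deleted;
}
```

Without the `Seen` check, a vector such as `{Ext, Ext, Mul, Red}` would delete `Ext` twice.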

SamTebbs33 added a commit to SamTebbs33/llvm-project that referenced this pull request Sep 22, 2025
This PR adds the ExtNegatedMulAccReduction expression type for
VPExpressionRecipe so that extend-multiply-accumulate reductions with a
negated multiply can be bundled.

Stacked PRs:

1. llvm#156976
2. -> This
3. llvm#147302
@fhahn
Contributor

fhahn commented Sep 23, 2025

Sorry, it doesn't just impact printing, but anything that wants to analyse the recipes at all, such as bundled partial reductions: https://github.com/llvm/llvm-project/pull/147302/files#diff-34abe4c3cd34aa7a9664bbd204834248455635ba80b8a9ba9506d8c3e6b94d95R2878

But can't this also be avoided naturally, by just getting the needed extend from the reduction/result recipe which is the root of the pattern, instead of relying on a specific order of operands? In the future, the operands to a reduction pattern could also contain live-ins (e.g. a constant), and in that case there would be no recipe at all.

I don't quite see why it's bad to get the recipe list correct from the very beginning?

I guess it depends on what 'correct' means here. Currently it is simply the list of recipes that are bundled together.

At the moment only the destructor needs to know that ExpressionRecipes contains duplicates when deleting them. What other complexity is required with regards to handling duplicates?

Besides the destructor, decomposition and construction also need extra complexity. Granted it is not that much, but if looking up the required information from the root instruction works as well that may allow us to keep the generic code simpler.

@SamTebbs33
Collaborator Author

Sorry, it doesn't just impact printing, but anything that wants to analyse the recipes at all, such as bundled partial reductions: https://github.com/llvm/llvm-project/pull/147302/files#diff-34abe4c3cd34aa7a9664bbd204834248455635ba80b8a9ba9506d8c3e6b94d95R2878

But can't this also be avoided naturally, by just getting the needed extend from the reduction/result recipe which is the root of the pattern, instead of relying on a specific order of operands? In the future, the operands to a reduction pattern could also contain live-ins (e.g. a constant), and in that case there would be no recipe at all.

I don't quite see why it's bad to get the recipe list correct from the very beginning?

I guess it depends on what 'correct' means here. Currently it is simply the list of recipes that are bundled together.

At the moment only the destructor needs to know that ExpressionRecipes contains duplicates when deleting them. What other complexity is required with regards to handling duplicates?

Besides the destructor, decomposition and construction also need extra complexity. Granted it is not that much, but if looking up the required information from the root instruction works as well that may allow us to keep the generic code simpler.

The problem with checking the root instruction is that you'll have to repeat the pattern matching that was already done in the transform pass. With this solution you can rely on the recipes being in the recipe list and fetch them from there, without redoing any work. With Sander's placeholder caching suggestion, the code has become a lot simpler, so I hope that's acceptable.

Collaborator

@sdesmalen-arm sdesmalen-arm left a comment

Besides the destructor, decomposition and construction also need extra complexity. Granted it is not that much, but if looking up the required information from the root instruction works as well that may allow us to keep the generic code simpler.

If you're reasoning from the idea that we'll only ever have this limited list of VPExpressions, then I guess this adds (marginally) more logic. But if we added a new expression that takes 4 external operands where expression recipes 0 and 1 could be the same, 1 and 2 could be the same, 2 and 3 could be the same, or any other combination, then it would get very confusing to figure out what the values in ExpressionRecipes mean. To me it therefore makes more sense for a VPExpression operation to always have the expected number of external operands, and to take a little care when mapping that back to the expressions when decomposing in case there was a duplicate.

addOperand(Op);
LiveInPlaceholders.push_back(new VPValue());
R->setOperand(Idx, LiveInPlaceholders.back());
if (OperandPlaceholders.find(Op) == OperandPlaceholders.end())
Collaborator

There's no need to check for duplicates, you can just do:

LiveInPlaceholders.push_back(new VPValue());                                                    
OperandPlaceholders[Op] = LiveInPlaceholders.back();

Collaborator Author

Done, but I've made a variable for the placeholder so we don't have to call back().

Comment on lines 2820 to 2826
for (auto *R : ExpressionRecipes) {
for (const auto &[Idx, Op] : enumerate(R->operands())) {
auto *Entry = OperandPlaceholders.find(Op);
if (Entry != OperandPlaceholders.end())
R->setOperand(Idx, Entry->second);
}
}
Collaborator

This can just be:

Suggested change
-  for (auto *R : ExpressionRecipes) {
-    for (const auto &[Idx, Op] : enumerate(R->operands())) {
-      auto *Entry = OperandPlaceholders.find(Op);
-      if (Entry != OperandPlaceholders.end())
-        R->setOperand(Idx, Entry->second);
-    }
-  }
+  for (auto *R : ExpressionRecipes)
+    for (auto &KV : OperandPlaceholders)
+      R->replaceUsesOfWith(KV.first, KV.second);

Collaborator Author

Much better, thanks.

}
SmallSet<VPValue *, 4> PlaceholdersSeen;
for (VPValue *T : LiveInPlaceholders) {
if (PlaceholdersSeen.insert(T).second)
Collaborator

There's no need to check for duplicates, there is a placeholder value for each operand, regardless of whether they're duplicate or not.

Collaborator Author

Currently I'm caching the placeholder for each operand so that duplicate operands share the same placeholder, so we do need to care about duplicates here to avoid deleting placeholders that have already been deleted. But I've changed it so that the same operands don't share a placeholder.

void VPExpressionRecipe::decompose() {
for (auto *R : ExpressionRecipes)
R->insertBefore(this);
if (!R->getParent())
Collaborator

This needs a comment explaining why the !R->getParent().

Collaborator Author

Done, thanks.
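
The reason for the parent check can be sketched with a toy model. The assumptions here are that `Recipe`, `Block`, and `reinsertOnce` are hypothetical, and that appending to a block stands in for `insertBefore(this)`. A recipe that appears twice in the list already has a parent on its second visit, so it must be skipped rather than inserted again.

```cpp
#include <cassert>
#include <vector>

struct Block;
struct Recipe {  // hypothetical stand-in for a bundled recipe
  Block *Parent = nullptr;
};

struct Block {  // hypothetical stand-in for a basic block
  std::vector<Recipe *> Recipes;
  void append(Recipe *R) {
    assert(!R->Parent && "recipe is already in a block");
    R->Parent = this;
    Recipes.push_back(R);
  }
};

// Re-inserts bundled recipes into BB. A duplicate entry already has a parent
// by the time it is visited again, so it is inserted only once.
void reinsertOnce(const std::vector<Recipe *> &Bundled, Block &BB) {
  for (Recipe *R : Bundled)
    if (!R->Parent)
      BB.append(R);
}
```

For a bundle listed as `{Ext, Ext, Mul, Red}`, the block receives three recipes in their original relative order.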

Successfully merging this pull request may close these issues.

VPExpressionRecipe discards duplicate recipes.
4 participants