Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SLP] Compute a shuffle mask for getGatherCost #85330

Merged
merged 6 commits into from
Mar 15, 2024

Conversation

preames
Copy link
Collaborator

@preames preames commented Mar 14, 2024

This is the second of a series of small patches to compute shuffle masks for the couple of cases where we call getShuffleCost without one. My goal is to add an invariant that all calls to getShuffleCost for fixed length vectors have a mask.

This is the second of a series of small patches to compute shuffle masks for the couple of cases where we call getShuffleCost without one. My goal is to add an invariant that all calls to getShuffleCost for fixed length vectors have a mask.
@llvmbot
Copy link
Collaborator

llvmbot commented Mar 14, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Philip Reames (preames)

Changes

This is the second of a series of small patches to compute shuffle masks for the couple of cases where we call getShuffleCost without one. My goal is to add an invariant that all calls to getShuffleCost for fixed length vectors have a mask.


Full diff: https://github.com/llvm/llvm-project/pull/85330.diff

1 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp (+4-2)
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index b4cce680e2876f..502c7a3d6bae39 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -10516,6 +10516,7 @@ InstructionCost BoUpSLP::getGatherCost(ArrayRef<Value *> VL,
           TTI->getVectorInstrCost(Instruction::InsertElement, VecTy, CostKind,
                                   I, Constant::getNullValue(VecTy), V);
   };
+  SmallVector<int> ShuffleMask(VL.size(), -1);
   for (unsigned I = 0, E = VL.size(); I < E; ++I) {
     Value *V = VL[I];
     // No need to shuffle duplicates for constants.
@@ -10526,6 +10527,7 @@ InstructionCost BoUpSLP::getGatherCost(ArrayRef<Value *> VL,
     if (!UniqueElements.insert(V).second) {
       DuplicateNonConst = true;
       ShuffledElements.setBit(I);
+      ShuffleMask[I] = I;
       continue;
     }
     EstimateInsertCost(I, V);
@@ -10535,8 +10537,8 @@ InstructionCost BoUpSLP::getGatherCost(ArrayRef<Value *> VL,
         TTI->getScalarizationOverhead(VecTy, ~ShuffledElements, /*Insert*/ true,
                                       /*Extract*/ false, CostKind);
   if (DuplicateNonConst)
-    Cost +=
-        TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc, VecTy);
+    Cost += TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,
+                                VecTy, ShuffleMask);
   return Cost;
 }
 

Copy link

github-actions bot commented Mar 14, 2024

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 33960c90258ed78b9b877b1a43e219d1cbc2efce 41152f6f711e82883625e0042d79d34d516d8db8 -- llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 10f382f248..55beebbb33 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -10542,8 +10542,8 @@ InstructionCost BoUpSLP::getGatherCost(ArrayRef<Value *> VL,
         TTI->getScalarizationOverhead(VecTy, ~ShuffledElements, /*Insert*/ true,
                                       /*Extract*/ false, CostKind);
   if (DuplicateNonConst)
-    Cost += TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,
-                                VecTy, ShuffleMask);
+    Cost += TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc, VecTy,
+                                ShuffleMask);
   return Cost;
 }
 

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp Outdated Show resolved Hide resolved
@@ -10526,6 +10527,7 @@ InstructionCost BoUpSLP::getGatherCost(ArrayRef<Value *> VL,
if (!UniqueElements.insert(V).second) {
DuplicateNonConst = true;
ShuffledElements.setBit(I);
ShuffleMask[I] = I;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not look correct. You generated identity mask here, which is not. Need to record the position of the first unique value and then use it here as index.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clearly it is time for me to stop for the day. I wrote the correct code here - and then managed to commit the wrong version. Oops. Will push a corrected version tomorrow.

preames and others added 3 commits March 15, 2024 07:56
Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
Copy link
Member

@alexey-bataev alexey-bataev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test?

@preames
Copy link
Collaborator Author

preames commented Mar 15, 2024

Can you add a test?

I have no idea how to construct an SLP test which exercises this.

Copy link
Member

@alexey-bataev alexey-bataev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG

@preames preames merged commit 0674ed7 into llvm:main Mar 15, 2024
2 of 4 checks passed
@preames preames deleted the pr-slp-mask-in-getGatherCost branch March 15, 2024 15:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants