[SLP] no need to generate extract for in-tree uses for original scala… #76077

Enna1 · 2023-12-20T17:12:40Z

…r instruction.

Before 77a609b,
we always skip in-tree uses of the vectorized scalars in buildExternalUses(),
that commit handles the case that if the in-tree use is scalar operand in vectorized instruction,
we need to generate extract for these in-tree uses.

in-tree uses remain as scalar in vectorized instructions can be 3 cases:

The pointer operand of vectorized LoadInst uses an in-tree scalar
The pointer operand of vectorized StoreInst uses an in-tree scalar
The scalar argument of vector form intrinsic uses an in-tree scalar

Generating extract for in-tree uses for vectorized instructions are implemented in BoUpSLP::vectorizeTree():

However, 77a609b
not only generates extract for vectorized instructions,
but also generates extract for original scalar instructions.
There is no need to generate extract for origin scalar instrutions,
as these scalar instructions will be replaced by vector instructions and get erased later.

This patch replaces extracts for original scalar instructions with corresponding vectorized instructions,
and remove

llvmbot · 2023-12-20T17:13:09Z

@llvm/pr-subscribers-llvm-transforms

Author: Enna1 (Enna1)

Changes

…r instruction.

Before 77a609b, we always skip in-tree uses of the vectorized scalars in buildExternalUses(), that commit handle the case that if the in-tree use is scalar operand in vectorized instruction, we need to generate extract for these in-tree uses.

in-tree uses remain as scalar in vectorized instructions can be 3 cases:

The pointer operand of vectorized LoadInst uses an in-tree scalar
The pointer operand of vectorized StoreInst uses an in-tree scalar
The scalar argument of vector form intrinsic uses an in-tree scalar

Generating extract for in-tree uses for vectorized instructions are implemented in BoUpSLP::vectorizeTree():

However, 77a609b not only generates extract for vectorized instructions, but also generate extract for original scalar instructions.

There is no need to generate extract for origin scalar instrutions, as these scalar instructions will be replaced by vector instructions and get erased later. Extract for origin scalar instrutions are also generated because in BoUpSLP::buildExternalUses() if doesInTreeUserNeedToExtract() return true, <in-tree scalar, scalar instruction use in-tree scalar> will be pushed to ExternalUses.

To omit generating extract for original scalar instruction, this patch remove doesInTreeUserNeedToExtract() check, and fold the follwing if expression to always true.

if (UseScalar != U ||
    UseEntry-&gt;State == TreeEntry::ScatterVectorize ||
    UseEntry-&gt;State == TreeEntry::PossibleStridedVectorize ||
    !doesInTreeUserNeedToExtract(Scalar, UserInst, TLI))

With this change, it is also more likely to be profitable to vectorize since we remove the unneed entries in ExternalUses and get less extraction cost. E.g. the llvm/test/Transforms/SLPVectorizer/X86/opaque-ptr.ll testcase is updated as the test2() function is successfully vectorized with this change.

Full diff: https://github.com/llvm/llvm-project/pull/76077.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp (+4-40)
(modified) llvm/test/Transforms/SLPVectorizer/X86/opaque-ptr.ll (+8-11)

diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 5c325ad8a291a2..4e52421c287250 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -771,33 +771,6 @@ static bool allSameType(ArrayRef<Value *> VL) {
   return all_of(VL.drop_front(), [&](Value *V) { return V->getType() == Ty; });
 }
 
-/// \returns True if in-tree use also needs extract. This refers to
-/// possible scalar operand in vectorized instruction.
-static bool doesInTreeUserNeedToExtract(Value *Scalar, Instruction *UserInst,
-                                        TargetLibraryInfo *TLI) {
-  unsigned Opcode = UserInst->getOpcode();
-  switch (Opcode) {
-  case Instruction::Load: {
-    LoadInst *LI = cast<LoadInst>(UserInst);
-    return (LI->getPointerOperand() == Scalar);
-  }
-  case Instruction::Store: {
-    StoreInst *SI = cast<StoreInst>(UserInst);
-    return (SI->getPointerOperand() == Scalar);
-  }
-  case Instruction::Call: {
-    CallInst *CI = cast<CallInst>(UserInst);
-    Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);
-    return any_of(enumerate(CI->args()), [&](auto &&Arg) {
-      return isVectorIntrinsicWithScalarOpAtArg(ID, Arg.index()) &&
-             Arg.value().get() == Scalar;
-    });
-  }
-  default:
-    return false;
-  }
-}
-
 /// \returns the AA location that is being access by the instruction.
 static MemoryLocation getLocation(Instruction *I) {
   if (StoreInst *SI = dyn_cast<StoreInst>(I))
@@ -4933,19 +4906,10 @@ void BoUpSLP::buildExternalUses(
 
         // Skip in-tree scalars that become vectors
         if (TreeEntry *UseEntry = getTreeEntry(U)) {
-          Value *UseScalar = UseEntry->Scalars[0];
-          // Some in-tree scalars will remain as scalar in vectorized
-          // instructions. If that is the case, the one in Lane 0 will
-          // be used.
-          if (UseScalar != U ||
-              UseEntry->State == TreeEntry::ScatterVectorize ||
-              UseEntry->State == TreeEntry::PossibleStridedVectorize ||
-              !doesInTreeUserNeedToExtract(Scalar, UserInst, TLI)) {
-            LLVM_DEBUG(dbgs() << "SLP: \tInternal user will be removed:" << *U
-                              << ".\n");
-            assert(UseEntry->State != TreeEntry::NeedToGather && "Bad state");
-            continue;
-          }
+          LLVM_DEBUG(dbgs()
+                     << "SLP: \tInternal user will be removed:" << *U << ".\n");
+          assert(UseEntry->State != TreeEntry::NeedToGather && "Bad state");
+          continue;
         }
 
         // Ignore users in the user ignore list.
diff --git a/llvm/test/Transforms/SLPVectorizer/X86/opaque-ptr.ll b/llvm/test/Transforms/SLPVectorizer/X86/opaque-ptr.ll
index a2c2e41d81b426..e2adeb911670a5 100644
--- a/llvm/test/Transforms/SLPVectorizer/X86/opaque-ptr.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/opaque-ptr.ll
@@ -52,17 +52,14 @@ define void @test(ptr %r, ptr %p, ptr %q) #0 {
 
 define void @test2(i64* %a, i64* %b) {
 ; CHECK-LABEL: @test2(
-; CHECK-NEXT:    [[A1:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 1
-; CHECK-NEXT:    [[A2:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 2
-; CHECK-NEXT:    [[I1:%.*]] = ptrtoint ptr [[A1]] to i64
-; CHECK-NEXT:    [[B3:%.*]] = getelementptr inbounds i64, ptr [[B:%.*]], i64 3
-; CHECK-NEXT:    [[I2:%.*]] = ptrtoint ptr [[B3]] to i64
-; CHECK-NEXT:    [[V1:%.*]] = load i64, ptr [[A1]], align 8
-; CHECK-NEXT:    [[V2:%.*]] = load i64, ptr [[A2]], align 8
-; CHECK-NEXT:    [[ADD1:%.*]] = add i64 [[I1]], [[V1]]
-; CHECK-NEXT:    [[ADD2:%.*]] = add i64 [[I2]], [[V2]]
-; CHECK-NEXT:    store i64 [[ADD1]], ptr [[A1]], align 8
-; CHECK-NEXT:    store i64 [[ADD2]], ptr [[A2]], align 8
+; CHECK-NEXT:    [[TMP1:%.*]] = insertelement <2 x ptr> poison, ptr [[A:%.*]], i32 0
+; CHECK-NEXT:    [[TMP2:%.*]] = insertelement <2 x ptr> [[TMP1]], ptr [[B:%.*]], i32 1
+; CHECK-NEXT:    [[TMP3:%.*]] = getelementptr i64, <2 x ptr> [[TMP2]], <2 x i64> <i64 1, i64 3>
+; CHECK-NEXT:    [[TMP4:%.*]] = ptrtoint <2 x ptr> [[TMP3]] to <2 x i64>
+; CHECK-NEXT:    [[TMP5:%.*]] = extractelement <2 x ptr> [[TMP3]], i32 0
+; CHECK-NEXT:    [[TMP6:%.*]] = load <2 x i64>, ptr [[TMP5]], align 8
+; CHECK-NEXT:    [[TMP7:%.*]] = add <2 x i64> [[TMP4]], [[TMP6]]
+; CHECK-NEXT:    store <2 x i64> [[TMP7]], ptr [[TMP5]], align 8
 ; CHECK-NEXT:    ret void
 ;
   %a1 = getelementptr inbounds i64, i64* %a, i64 1

alexey-bataev · 2023-12-20T17:45:09Z

That does not look correct, since it excludes these extracts from the cost estimation model. What I would suggest instead is to fix this check and remove
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11497-L11506
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11542-L11551
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11657-L11667

extracts.

Enna1 · 2023-12-21T07:35:07Z

That does not look correct, since it excludes these extracts from the cost estimation model.

Thanks for pointing out this!
I got it. I missed that the ExternalUser added in buildExternalUses() will do cost estimation, but the added ExternalUser added in vectorizeTree() will not.

I have updated the patch, replace extracts for original scalar instructions with corresponding vectorized instructions,
and remove
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11497-L11506
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11542-L11551
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11657-L11667
extracts.

PTAL, thanks!

alexey-bataev · 2023-12-21T13:25:57Z

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

+    // Generate extract for in-tree uses if the use is scalar operand in
+    // vectorized instruction.
+    if (auto *UserVecTE = getTreeEntry(User))
+      if (doesInTreeUserNeedToExtract(Scalar, cast<Instruction>(User), TLI))
+        User = cast<llvm::User>(UserVecTE->VectorizedValue);
+


Why need this? I meant try to fix the place where we buildingExtractUsers directly, I think it can be fixed by proper checking of the scalar element.

Why need this?

Adding new vectorized load/store/call that use in-tree scalar to ExternalUses in BoUpSLP::vectorizeTree() is removed, we still should generate extract for these new added vectorized load/store/call. As we add the original scalar load/store/call to ExternalUses in buildingExtractUsers(), so I add a check here, generate extract for new added vectorized load/store/call instead of the scalar load/store/call that will be erased later.

Take a llvm/test/Transforms/SLPVectorizer/X86/extract_in_tree_user.ll as an example.

define i32 @fn1() { entry: %0 = load ptr, ptr @a, align 8 %add.ptr = getelementptr inbounds i64, ptr %0, i64 11 %1 = ptrtoint ptr %add.ptr to i64 store i64 %1, ptr %add.ptr, align 8 %add.ptr1 = getelementptr inbounds i64, ptr %0, i64 56 %2 = ptrtoint ptr %add.ptr1 to i64 %arrayidx2 = getelementptr inbounds i64, ptr %0, i64 12 store i64 %2, ptr %arrayidx2, align 8 ; store <2 x i64> %5, ptr %add.ptr, align 8 ret i32 undef }

Without this change, when generating extracts, ExternalUses have
{%add.ptr, store i64 %1, ptr %add.ptr, align 8} and
{%add.ptr, store <2 x i64> %5, ptr %add.ptr, align 8},
what this patch want to do is only generate extract for {%add.ptr, store <2 x i64> %5, ptr %add.ptr, align 8} since there is no need to generate extract for {%add.ptr, store i64 %1, ptr %add.ptr, align 8}.

I meant try to fix the place where we buildingExtractUsers directly, I think it can be fixed by proper checking of the scalar element.

Sorry, I don't quite understand this, could you please explain this a bit more?

Ok, I see why you need this.
Instead, I suggest another approach. When building the external uses entry for such scalars, just mark that there is no exact user. In this case all uses of this scalar will be automatically replaced by extractelement and you don't need this change anymore.
But I think some adjustment in the original code also should be done.

diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp index 5c325ad8a291..e66e2a7d110e 100644 --- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp +++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp @@ -4928,33 +4928,33 @@ void BoUpSLP::buildExternalUses( if (!UserInst) continue; - if (isDeleted(UserInst)) - continue; - // Skip in-tree scalars that become vectors if (TreeEntry *UseEntry = getTreeEntry(U)) { - Value *UseScalar = UseEntry->Scalars[0]; // Some in-tree scalars will remain as scalar in vectorized - // instructions. If that is the case, the one in Lane 0 will + // instructions. If that is the case, the one in FoundLane will // be used. - if (UseScalar != U || - UseEntry->State == TreeEntry::ScatterVectorize || + if (UseEntry->State == TreeEntry::ScatterVectorize || UseEntry->State == TreeEntry::PossibleStridedVectorize || - !doesInTreeUserNeedToExtract(Scalar, UserInst, TLI)) { + !doesInTreeUserNeedToExtract( + Scalar, cast<Instruction>(UseEntry->Scalars.front()), TLI)) { LLVM_DEBUG(dbgs() << "SLP: \tInternal user will be removed:" << *U << ".\n"); assert(UseEntry->State != TreeEntry::NeedToGather && "Bad state"); continue; } + U = nullptr; } + if (isDeleted(UserInst)) + continue; + // Ignore users in the user ignore list. if (UserIgnoreList && UserIgnoreList->contains(UserInst)) continue; LLVM_DEBUG(dbgs() << "SLP: Need to extract:" << *U << " from lane " << Lane << " from " << *Scalar << ".\n"); - ExternalUses.push_back(ExternalUser(Scalar, U, FoundLane)); + ExternalUses.emplace_back(Scalar, U, FoundLane); } } }

Enna1 · 2023-12-25T16:02:41Z

Thanks for your kind response and detailed explanation.

I have updated the patch based on your diff.
I don't like the way how I relax the assertion https://github.com/llvm/llvm-project/pull/76077/files#diff-e3b933e8c46156c0eaf7cbb67866b712f69a8484bc941d10acec9d4d17b9f061L11954-L11956, But I didn't come up a better way.

PTAL.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

…r instruction. Before 77a609b, we always skip in-tree uses of the vectorized scalars in `buildExternalUses()`, that commit handles the case that if the in-tree use is scalar operand in vectorized instruction, we need to generate extract for these in-tree uses. in-tree uses remain as scalar in vectorized instructions can be 3 cases: - The pointer operand of vectorized LoadInst uses an in-tree scalar - The pointer operand of vectorized StoreInst uses an in-tree scalar - The scalar argument of vector form intrinsic uses an in-tree scalar Generating extract for in-tree uses for vectorized instructions are implemented in `BoUpSLP::vectorizeTree()`: - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11497-L11506 - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11542-L11551 - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11657-L11667 However, 77a609b not only generates extract for vectorized instructions, but also generates extract for original scalar instructions. There is no need to generate extract for origin scalar instrutions, as these scalar instructions will be replaced by vector instructions and get erased later. This patch replaces extracts for original scalar instructions with corresponding vectorized instructions, and remove - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11497-L11506 - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11542-L11551 - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11657-L11667 extracts.

alexey-bataev · 2023-12-26T17:21:40Z

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

+        continue;
+      assert((ExternallyUsedValues.count(Scalar) ||
+              any_of(Scalar->users(),
+                     [this, Scalar](llvm::User *U) {


Suggested change

[this, Scalar](llvm::User *U) {

[&](User *U) {

alexey-bataev · 2023-12-26T17:22:38Z

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

+              any_of(Scalar->users(),
+                     [this, Scalar](llvm::User *U) {
+                       TreeEntry *UseEntry = getTreeEntry(U);
+                       return UseEntry &&


Suggested change

return UseEntry &&

return UseEntry && UseEntry->State != TreeEntry::Vectorize &&

I suppose this should be UseEntry->State == TreeEntry::Vectorize ?

Yes, looks so, also should be E->State != TreeEntry::Vectorize

updated to UseEntry && UseEntry->State == TreeEntry::Vectorize && E->State == TreeEntry::Vectorize && doesInTreeUserNeedToExtract()

github-actions · 2023-12-27T02:10:39Z

✅ With the latest revision this PR passed the C/C++ code formatter.

llvmbot added vectorization llvm:transforms labels Dec 20, 2023

Enna1 requested a review from alexey-bataev December 20, 2023 17:13

Enna1 force-pushed the users/Enna1/slp-extract-in-tree-user branch from f8c9f75 to 40eba8a Compare December 21, 2023 07:26

alexey-bataev reviewed Dec 21, 2023

View reviewed changes

Enna1 force-pushed the users/Enna1/slp-extract-in-tree-user branch from 40eba8a to dd0dd25 Compare December 25, 2023 15:56

Enna1 commented Dec 25, 2023

View reviewed changes

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp Outdated Show resolved Hide resolved

Enna1 force-pushed the users/Enna1/slp-extract-in-tree-user branch from dd0dd25 to 335b5b8 Compare December 26, 2023 07:13

alexey-bataev reviewed Dec 26, 2023

View reviewed changes

update from comment

721914d

Enna1 added 2 commits December 27, 2023 10:12

clang-format

6533e84

update from comment

58c6692

Enna1 requested a review from alexey-bataev December 28, 2023 15:28

alexey-bataev approved these changes Dec 29, 2023

View reviewed changes

Enna1 merged commit a51c2f3 into main Dec 30, 2023
4 checks passed

Enna1 deleted the users/Enna1/slp-extract-in-tree-user branch December 30, 2023 02:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SLP] no need to generate extract for in-tree uses for original scala… #76077

[SLP] no need to generate extract for in-tree uses for original scala… #76077

Enna1 commented Dec 20, 2023 •

edited

llvmbot commented Dec 20, 2023

alexey-bataev commented Dec 20, 2023

Enna1 commented Dec 21, 2023

alexey-bataev Dec 21, 2023

Enna1 Dec 22, 2023

alexey-bataev Dec 22, 2023 •

edited

Enna1 commented Dec 25, 2023 •

edited

alexey-bataev Dec 26, 2023

alexey-bataev Dec 26, 2023

Enna1 Dec 27, 2023

alexey-bataev Dec 27, 2023

Enna1 Dec 29, 2023

github-actions bot commented Dec 27, 2023 •

edited

	return UseEntry &&
	return UseEntry && UseEntry->State != TreeEntry::Vectorize &&

[SLP] no need to generate extract for in-tree uses for original scala… #76077

[SLP] no need to generate extract for in-tree uses for original scala… #76077

Conversation

Enna1 commented Dec 20, 2023 • edited

llvmbot commented Dec 20, 2023

alexey-bataev commented Dec 20, 2023

Enna1 commented Dec 21, 2023

alexey-bataev Dec 21, 2023

Choose a reason for hiding this comment

Enna1 Dec 22, 2023

Choose a reason for hiding this comment

alexey-bataev Dec 22, 2023 • edited

Choose a reason for hiding this comment

Enna1 commented Dec 25, 2023 • edited

alexey-bataev Dec 26, 2023

Choose a reason for hiding this comment

alexey-bataev Dec 26, 2023

Choose a reason for hiding this comment

Enna1 Dec 27, 2023

Choose a reason for hiding this comment

alexey-bataev Dec 27, 2023

Choose a reason for hiding this comment

Enna1 Dec 29, 2023

Choose a reason for hiding this comment

github-actions bot commented Dec 27, 2023 • edited

Enna1 commented Dec 20, 2023 •

edited

alexey-bataev Dec 22, 2023 •

edited

Enna1 commented Dec 25, 2023 •

edited

github-actions bot commented Dec 27, 2023 •

edited