Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SLP] no need to generate extract for in-tree uses for original scala… #76077

Merged
merged 4 commits into from
Dec 30, 2023

Conversation

Enna1
Copy link
Contributor

@Enna1 Enna1 commented Dec 20, 2023

…r instruction.

Before 77a609b,
we always skip in-tree uses of the vectorized scalars in buildExternalUses(),
that commit handles the case that if the in-tree use is scalar operand in vectorized instruction,
we need to generate extract for these in-tree uses.

in-tree uses remain as scalar in vectorized instructions can be 3 cases:

  • The pointer operand of vectorized LoadInst uses an in-tree scalar
  • The pointer operand of vectorized StoreInst uses an in-tree scalar
  • The scalar argument of vector form intrinsic uses an in-tree scalar

Generating extract for in-tree uses for vectorized instructions are implemented in BoUpSLP::vectorizeTree():

However, 77a609b
not only generates extract for vectorized instructions,
but also generates extract for original scalar instructions.
There is no need to generate extract for origin scalar instrutions,
as these scalar instructions will be replaced by vector instructions and get erased later.

This patch replaces extracts for original scalar instructions with corresponding vectorized instructions,
and remove

@llvmbot
Copy link
Collaborator

llvmbot commented Dec 20, 2023

@llvm/pr-subscribers-llvm-transforms

Author: Enna1 (Enna1)

Changes

…r instruction.

Before 77a609b, we always skip in-tree uses of the vectorized scalars in buildExternalUses(), that commit handle the case that if the in-tree use is scalar operand in vectorized instruction, we need to generate extract for these in-tree uses.

in-tree uses remain as scalar in vectorized instructions can be 3 cases:

  • The pointer operand of vectorized LoadInst uses an in-tree scalar
  • The pointer operand of vectorized StoreInst uses an in-tree scalar
  • The scalar argument of vector form intrinsic uses an in-tree scalar

Generating extract for in-tree uses for vectorized instructions are implemented in BoUpSLP::vectorizeTree():

However, 77a609b not only generates extract for vectorized instructions, but also generate extract for original scalar instructions.

There is no need to generate extract for origin scalar instrutions, as these scalar instructions will be replaced by vector instructions and get erased later. Extract for origin scalar instrutions are also generated because in BoUpSLP::buildExternalUses() if doesInTreeUserNeedToExtract() return true, <in-tree scalar, scalar instruction use in-tree scalar> will be pushed to ExternalUses.

To omit generating extract for original scalar instruction, this patch remove doesInTreeUserNeedToExtract() check, and fold the follwing if expression to always true.

if (UseScalar != U ||
    UseEntry-&gt;State == TreeEntry::ScatterVectorize ||
    UseEntry-&gt;State == TreeEntry::PossibleStridedVectorize ||
    !doesInTreeUserNeedToExtract(Scalar, UserInst, TLI))

With this change, it is also more likely to be profitable to vectorize since we remove the unneed entries in ExternalUses and get less extraction cost. E.g. the llvm/test/Transforms/SLPVectorizer/X86/opaque-ptr.ll testcase is updated as the test2() function is successfully vectorized with this change.


Full diff: https://github.com/llvm/llvm-project/pull/76077.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp (+4-40)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/opaque-ptr.ll (+8-11)
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 5c325ad8a291a2..4e52421c287250 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -771,33 +771,6 @@ static bool allSameType(ArrayRef<Value *> VL) {
   return all_of(VL.drop_front(), [&](Value *V) { return V->getType() == Ty; });
 }
 
-/// \returns True if in-tree use also needs extract. This refers to
-/// possible scalar operand in vectorized instruction.
-static bool doesInTreeUserNeedToExtract(Value *Scalar, Instruction *UserInst,
-                                        TargetLibraryInfo *TLI) {
-  unsigned Opcode = UserInst->getOpcode();
-  switch (Opcode) {
-  case Instruction::Load: {
-    LoadInst *LI = cast<LoadInst>(UserInst);
-    return (LI->getPointerOperand() == Scalar);
-  }
-  case Instruction::Store: {
-    StoreInst *SI = cast<StoreInst>(UserInst);
-    return (SI->getPointerOperand() == Scalar);
-  }
-  case Instruction::Call: {
-    CallInst *CI = cast<CallInst>(UserInst);
-    Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);
-    return any_of(enumerate(CI->args()), [&](auto &&Arg) {
-      return isVectorIntrinsicWithScalarOpAtArg(ID, Arg.index()) &&
-             Arg.value().get() == Scalar;
-    });
-  }
-  default:
-    return false;
-  }
-}
-
 /// \returns the AA location that is being access by the instruction.
 static MemoryLocation getLocation(Instruction *I) {
   if (StoreInst *SI = dyn_cast<StoreInst>(I))
@@ -4933,19 +4906,10 @@ void BoUpSLP::buildExternalUses(
 
         // Skip in-tree scalars that become vectors
         if (TreeEntry *UseEntry = getTreeEntry(U)) {
-          Value *UseScalar = UseEntry->Scalars[0];
-          // Some in-tree scalars will remain as scalar in vectorized
-          // instructions. If that is the case, the one in Lane 0 will
-          // be used.
-          if (UseScalar != U ||
-              UseEntry->State == TreeEntry::ScatterVectorize ||
-              UseEntry->State == TreeEntry::PossibleStridedVectorize ||
-              !doesInTreeUserNeedToExtract(Scalar, UserInst, TLI)) {
-            LLVM_DEBUG(dbgs() << "SLP: \tInternal user will be removed:" << *U
-                              << ".\n");
-            assert(UseEntry->State != TreeEntry::NeedToGather && "Bad state");
-            continue;
-          }
+          LLVM_DEBUG(dbgs()
+                     << "SLP: \tInternal user will be removed:" << *U << ".\n");
+          assert(UseEntry->State != TreeEntry::NeedToGather && "Bad state");
+          continue;
         }
 
         // Ignore users in the user ignore list.
diff --git a/llvm/test/Transforms/SLPVectorizer/X86/opaque-ptr.ll b/llvm/test/Transforms/SLPVectorizer/X86/opaque-ptr.ll
index a2c2e41d81b426..e2adeb911670a5 100644
--- a/llvm/test/Transforms/SLPVectorizer/X86/opaque-ptr.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/opaque-ptr.ll
@@ -52,17 +52,14 @@ define void @test(ptr %r, ptr %p, ptr %q) #0 {
 
 define void @test2(i64* %a, i64* %b) {
 ; CHECK-LABEL: @test2(
-; CHECK-NEXT:    [[A1:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 1
-; CHECK-NEXT:    [[A2:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 2
-; CHECK-NEXT:    [[I1:%.*]] = ptrtoint ptr [[A1]] to i64
-; CHECK-NEXT:    [[B3:%.*]] = getelementptr inbounds i64, ptr [[B:%.*]], i64 3
-; CHECK-NEXT:    [[I2:%.*]] = ptrtoint ptr [[B3]] to i64
-; CHECK-NEXT:    [[V1:%.*]] = load i64, ptr [[A1]], align 8
-; CHECK-NEXT:    [[V2:%.*]] = load i64, ptr [[A2]], align 8
-; CHECK-NEXT:    [[ADD1:%.*]] = add i64 [[I1]], [[V1]]
-; CHECK-NEXT:    [[ADD2:%.*]] = add i64 [[I2]], [[V2]]
-; CHECK-NEXT:    store i64 [[ADD1]], ptr [[A1]], align 8
-; CHECK-NEXT:    store i64 [[ADD2]], ptr [[A2]], align 8
+; CHECK-NEXT:    [[TMP1:%.*]] = insertelement <2 x ptr> poison, ptr [[A:%.*]], i32 0
+; CHECK-NEXT:    [[TMP2:%.*]] = insertelement <2 x ptr> [[TMP1]], ptr [[B:%.*]], i32 1
+; CHECK-NEXT:    [[TMP3:%.*]] = getelementptr i64, <2 x ptr> [[TMP2]], <2 x i64> <i64 1, i64 3>
+; CHECK-NEXT:    [[TMP4:%.*]] = ptrtoint <2 x ptr> [[TMP3]] to <2 x i64>
+; CHECK-NEXT:    [[TMP5:%.*]] = extractelement <2 x ptr> [[TMP3]], i32 0
+; CHECK-NEXT:    [[TMP6:%.*]] = load <2 x i64>, ptr [[TMP5]], align 8
+; CHECK-NEXT:    [[TMP7:%.*]] = add <2 x i64> [[TMP4]], [[TMP6]]
+; CHECK-NEXT:    store <2 x i64> [[TMP7]], ptr [[TMP5]], align 8
 ; CHECK-NEXT:    ret void
 ;
   %a1 = getelementptr inbounds i64, i64* %a, i64 1

@alexey-bataev
Copy link
Member

@Enna1 Enna1 force-pushed the users/Enna1/slp-extract-in-tree-user branch from f8c9f75 to 40eba8a Compare December 21, 2023 07:26
@Enna1
Copy link
Contributor Author

Enna1 commented Dec 21, 2023

That does not look correct, since it excludes these extracts from the cost estimation model.

Thanks for pointing out this!
I got it. I missed that the ExternalUser added in buildExternalUses() will do cost estimation, but the added ExternalUser added in vectorizeTree() will not.

I have updated the patch, replace extracts for original scalar instructions with corresponding vectorized instructions,
and remove
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11497-L11506
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11542-L11551
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11657-L11667
extracts.

PTAL, thanks!

Comment on lines 11867 to 11872
// Generate extract for in-tree uses if the use is scalar operand in
// vectorized instruction.
if (auto *UserVecTE = getTreeEntry(User))
if (doesInTreeUserNeedToExtract(Scalar, cast<Instruction>(User), TLI))
User = cast<llvm::User>(UserVecTE->VectorizedValue);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why need this? I meant try to fix the place where we buildingExtractUsers directly, I think it can be fixed by proper checking of the scalar element.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why need this?

Adding new vectorized load/store/call that use in-tree scalar to ExternalUses in BoUpSLP::vectorizeTree() is removed, we still should generate extract for these new added vectorized load/store/call. As we add the original scalar load/store/call to ExternalUses in buildingExtractUsers(), so I add a check here, generate extract for new added vectorized load/store/call instead of the scalar load/store/call that will be erased later.

Take a llvm/test/Transforms/SLPVectorizer/X86/extract_in_tree_user.ll as an example.

define i32 @fn1() {
entry:
  %0 = load ptr, ptr @a, align 8
  %add.ptr = getelementptr inbounds i64, ptr %0, i64 11
  %1 = ptrtoint ptr %add.ptr to i64
  store i64 %1, ptr %add.ptr, align 8
  %add.ptr1 = getelementptr inbounds i64, ptr %0, i64 56
  %2 = ptrtoint ptr %add.ptr1 to i64
  %arrayidx2 = getelementptr inbounds i64, ptr %0, i64 12
  store i64 %2, ptr %arrayidx2, align 8
  ; store <2 x i64> %5, ptr %add.ptr, align 8
  ret i32 undef
}

Without this change, when generating extracts, ExternalUses have
{%add.ptr, store i64 %1, ptr %add.ptr, align 8} and
{%add.ptr, store <2 x i64> %5, ptr %add.ptr, align 8},
what this patch want to do is only generate extract for {%add.ptr, store <2 x i64> %5, ptr %add.ptr, align 8} since there is no need to generate extract for {%add.ptr, store i64 %1, ptr %add.ptr, align 8}.


I meant try to fix the place where we buildingExtractUsers directly, I think it can be fixed by proper checking of the scalar element.

Sorry, I don't quite understand this, could you please explain this a bit more?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see why you need this.
Instead, I suggest another approach. When building the external uses entry for such scalars, just mark that there is no exact user. In this case all uses of this scalar will be automatically replaced by extractelement and you don't need this change anymore.
But I think some adjustment in the original code also should be done.

diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 5c325ad8a291..e66e2a7d110e 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -4928,33 +4928,33 @@ void BoUpSLP::buildExternalUses(
         if (!UserInst)
           continue;

-        if (isDeleted(UserInst))
-          continue;
-
         // Skip in-tree scalars that become vectors
         if (TreeEntry *UseEntry = getTreeEntry(U)) {
-          Value *UseScalar = UseEntry->Scalars[0];
           // Some in-tree scalars will remain as scalar in vectorized
-          // instructions. If that is the case, the one in Lane 0 will
+          // instructions. If that is the case, the one in FoundLane will
           // be used.
-          if (UseScalar != U ||
-              UseEntry->State == TreeEntry::ScatterVectorize ||
+          if (UseEntry->State == TreeEntry::ScatterVectorize ||
               UseEntry->State == TreeEntry::PossibleStridedVectorize ||
-              !doesInTreeUserNeedToExtract(Scalar, UserInst, TLI)) {
+              !doesInTreeUserNeedToExtract(
+                  Scalar, cast<Instruction>(UseEntry->Scalars.front()), TLI)) {
             LLVM_DEBUG(dbgs() << "SLP: \tInternal user will be removed:" << *U
                               << ".\n");
             assert(UseEntry->State != TreeEntry::NeedToGather && "Bad state");
             continue;
           }
+          U = nullptr;
         }

+        if (isDeleted(UserInst))
+          continue;
+
         // Ignore users in the user ignore list.
         if (UserIgnoreList && UserIgnoreList->contains(UserInst))
           continue;

         LLVM_DEBUG(dbgs() << "SLP: Need to extract:" << *U << " from lane "
                           << Lane << " from " << *Scalar << ".\n");
-        ExternalUses.push_back(ExternalUser(Scalar, U, FoundLane));
+        ExternalUses.emplace_back(Scalar, U, FoundLane);
       }
     }
   }

@Enna1 Enna1 force-pushed the users/Enna1/slp-extract-in-tree-user branch from 40eba8a to dd0dd25 Compare December 25, 2023 15:56
@Enna1
Copy link
Contributor Author

Enna1 commented Dec 25, 2023

Thanks for your kind response and detailed explanation.

I have updated the patch based on your diff.
I don't like the way how I relax the assertion https://github.com/llvm/llvm-project/pull/76077/files#diff-e3b933e8c46156c0eaf7cbb67866b712f69a8484bc941d10acec9d4d17b9f061L11954-L11956, But I didn't come up a better way.

PTAL.

…r instruction.

Before 77a609b,
we always skip in-tree uses of the vectorized scalars in `buildExternalUses()`,
that commit handles the case that if the in-tree use is scalar operand in vectorized instruction,
we need to generate extract for these in-tree uses.

in-tree uses remain as scalar in vectorized instructions can be 3 cases:
- The pointer operand of vectorized LoadInst uses an in-tree scalar
- The pointer operand of vectorized StoreInst uses an in-tree scalar
- The scalar argument of vector form intrinsic uses an in-tree scalar

Generating extract for in-tree uses for vectorized instructions are implemented in `BoUpSLP::vectorizeTree()`:
- https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11497-L11506
- https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11542-L11551
- https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11657-L11667

However, 77a609b
not only generates extract for vectorized instructions,
but also generates extract for original scalar instructions.
There is no need to generate extract for origin scalar instrutions,
as these scalar instructions will be replaced by vector instructions and get erased later.

This patch replaces extracts for original scalar instructions with corresponding vectorized instructions,
and remove
- https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11497-L11506
- https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11542-L11551
- https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11657-L11667
extracts.
@Enna1 Enna1 force-pushed the users/Enna1/slp-extract-in-tree-user branch from dd0dd25 to 335b5b8 Compare December 26, 2023 07:13
continue;
assert((ExternallyUsedValues.count(Scalar) ||
any_of(Scalar->users(),
[this, Scalar](llvm::User *U) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[this, Scalar](llvm::User *U) {
[&](User *U) {

any_of(Scalar->users(),
[this, Scalar](llvm::User *U) {
TreeEntry *UseEntry = getTreeEntry(U);
return UseEntry &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return UseEntry &&
return UseEntry && UseEntry->State != TreeEntry::Vectorize &&

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this should be UseEntry->State == TreeEntry::Vectorize ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, looks so, also should be E->State != TreeEntry::Vectorize

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to UseEntry && UseEntry->State == TreeEntry::Vectorize && E->State == TreeEntry::Vectorize && doesInTreeUserNeedToExtract()

Copy link

github-actions bot commented Dec 27, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@Enna1 Enna1 merged commit a51c2f3 into main Dec 30, 2023
4 checks passed
@Enna1 Enna1 deleted the users/Enna1/slp-extract-in-tree-user branch December 30, 2023 02:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants