-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SLP] no need to generate extract for in-tree uses for original scala… #76077
Conversation
@llvm/pr-subscribers-llvm-transforms Author: Enna1 (Enna1) Changes…r instruction. Before 77a609b, we always skip in-tree uses of the vectorized scalars in in-tree uses remain as scalar in vectorized instructions can be 3 cases:
Generating extract for in-tree uses for vectorized instructions are implemented in BoUpSLP::vectorizeTree():
However, 77a609b not only generates extract for vectorized instructions, but also generate extract for original scalar instructions. There is no need to generate extract for origin scalar instrutions, as these scalar instructions will be replaced by vector instructions and get erased later. Extract for origin scalar instrutions are also generated because in To omit generating extract for original scalar instruction, this patch remove
With this change, it is also more likely to be profitable to vectorize since we remove the unneed entries in Full diff: https://github.com/llvm/llvm-project/pull/76077.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 5c325ad8a291a2..4e52421c287250 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -771,33 +771,6 @@ static bool allSameType(ArrayRef<Value *> VL) {
return all_of(VL.drop_front(), [&](Value *V) { return V->getType() == Ty; });
}
-/// \returns True if in-tree use also needs extract. This refers to
-/// possible scalar operand in vectorized instruction.
-static bool doesInTreeUserNeedToExtract(Value *Scalar, Instruction *UserInst,
- TargetLibraryInfo *TLI) {
- unsigned Opcode = UserInst->getOpcode();
- switch (Opcode) {
- case Instruction::Load: {
- LoadInst *LI = cast<LoadInst>(UserInst);
- return (LI->getPointerOperand() == Scalar);
- }
- case Instruction::Store: {
- StoreInst *SI = cast<StoreInst>(UserInst);
- return (SI->getPointerOperand() == Scalar);
- }
- case Instruction::Call: {
- CallInst *CI = cast<CallInst>(UserInst);
- Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);
- return any_of(enumerate(CI->args()), [&](auto &&Arg) {
- return isVectorIntrinsicWithScalarOpAtArg(ID, Arg.index()) &&
- Arg.value().get() == Scalar;
- });
- }
- default:
- return false;
- }
-}
-
/// \returns the AA location that is being access by the instruction.
static MemoryLocation getLocation(Instruction *I) {
if (StoreInst *SI = dyn_cast<StoreInst>(I))
@@ -4933,19 +4906,10 @@ void BoUpSLP::buildExternalUses(
// Skip in-tree scalars that become vectors
if (TreeEntry *UseEntry = getTreeEntry(U)) {
- Value *UseScalar = UseEntry->Scalars[0];
- // Some in-tree scalars will remain as scalar in vectorized
- // instructions. If that is the case, the one in Lane 0 will
- // be used.
- if (UseScalar != U ||
- UseEntry->State == TreeEntry::ScatterVectorize ||
- UseEntry->State == TreeEntry::PossibleStridedVectorize ||
- !doesInTreeUserNeedToExtract(Scalar, UserInst, TLI)) {
- LLVM_DEBUG(dbgs() << "SLP: \tInternal user will be removed:" << *U
- << ".\n");
- assert(UseEntry->State != TreeEntry::NeedToGather && "Bad state");
- continue;
- }
+ LLVM_DEBUG(dbgs()
+ << "SLP: \tInternal user will be removed:" << *U << ".\n");
+ assert(UseEntry->State != TreeEntry::NeedToGather && "Bad state");
+ continue;
}
// Ignore users in the user ignore list.
diff --git a/llvm/test/Transforms/SLPVectorizer/X86/opaque-ptr.ll b/llvm/test/Transforms/SLPVectorizer/X86/opaque-ptr.ll
index a2c2e41d81b426..e2adeb911670a5 100644
--- a/llvm/test/Transforms/SLPVectorizer/X86/opaque-ptr.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/opaque-ptr.ll
@@ -52,17 +52,14 @@ define void @test(ptr %r, ptr %p, ptr %q) #0 {
define void @test2(i64* %a, i64* %b) {
; CHECK-LABEL: @test2(
-; CHECK-NEXT: [[A1:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 1
-; CHECK-NEXT: [[A2:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 2
-; CHECK-NEXT: [[I1:%.*]] = ptrtoint ptr [[A1]] to i64
-; CHECK-NEXT: [[B3:%.*]] = getelementptr inbounds i64, ptr [[B:%.*]], i64 3
-; CHECK-NEXT: [[I2:%.*]] = ptrtoint ptr [[B3]] to i64
-; CHECK-NEXT: [[V1:%.*]] = load i64, ptr [[A1]], align 8
-; CHECK-NEXT: [[V2:%.*]] = load i64, ptr [[A2]], align 8
-; CHECK-NEXT: [[ADD1:%.*]] = add i64 [[I1]], [[V1]]
-; CHECK-NEXT: [[ADD2:%.*]] = add i64 [[I2]], [[V2]]
-; CHECK-NEXT: store i64 [[ADD1]], ptr [[A1]], align 8
-; CHECK-NEXT: store i64 [[ADD2]], ptr [[A2]], align 8
+; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x ptr> poison, ptr [[A:%.*]], i32 0
+; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x ptr> [[TMP1]], ptr [[B:%.*]], i32 1
+; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i64, <2 x ptr> [[TMP2]], <2 x i64> <i64 1, i64 3>
+; CHECK-NEXT: [[TMP4:%.*]] = ptrtoint <2 x ptr> [[TMP3]] to <2 x i64>
+; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x ptr> [[TMP3]], i32 0
+; CHECK-NEXT: [[TMP6:%.*]] = load <2 x i64>, ptr [[TMP5]], align 8
+; CHECK-NEXT: [[TMP7:%.*]] = add <2 x i64> [[TMP4]], [[TMP6]]
+; CHECK-NEXT: store <2 x i64> [[TMP7]], ptr [[TMP5]], align 8
; CHECK-NEXT: ret void
;
%a1 = getelementptr inbounds i64, i64* %a, i64 1
|
That does not look correct, since it excludes these extracts from the cost estimation model. What I would suggest instead is to fix this check and remove extracts. |
f8c9f75
to
40eba8a
Compare
Thanks for pointing out this! I have updated the patch, replace extracts for original scalar instructions with corresponding vectorized instructions, PTAL, thanks! |
// Generate extract for in-tree uses if the use is scalar operand in | ||
// vectorized instruction. | ||
if (auto *UserVecTE = getTreeEntry(User)) | ||
if (doesInTreeUserNeedToExtract(Scalar, cast<Instruction>(User), TLI)) | ||
User = cast<llvm::User>(UserVecTE->VectorizedValue); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why need this? I meant try to fix the place where we buildingExtractUsers directly, I think it can be fixed by proper checking of the scalar element.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why need this?
Adding new vectorized load/store/call that use in-tree scalar to ExternalUses
in BoUpSLP::vectorizeTree()
is removed, we still should generate extract for these new added vectorized load/store/call. As we add the original scalar load/store/call to ExternalUses
in buildingExtractUsers()
, so I add a check here, generate extract for new added vectorized load/store/call instead of the scalar load/store/call that will be erased later.
Take a llvm/test/Transforms/SLPVectorizer/X86/extract_in_tree_user.ll as an example.
define i32 @fn1() {
entry:
%0 = load ptr, ptr @a, align 8
%add.ptr = getelementptr inbounds i64, ptr %0, i64 11
%1 = ptrtoint ptr %add.ptr to i64
store i64 %1, ptr %add.ptr, align 8
%add.ptr1 = getelementptr inbounds i64, ptr %0, i64 56
%2 = ptrtoint ptr %add.ptr1 to i64
%arrayidx2 = getelementptr inbounds i64, ptr %0, i64 12
store i64 %2, ptr %arrayidx2, align 8
; store <2 x i64> %5, ptr %add.ptr, align 8
ret i32 undef
}
Without this change, when generating extracts, ExternalUses
have
{%add.ptr, store i64 %1, ptr %add.ptr, align 8}
and
{%add.ptr, store <2 x i64> %5, ptr %add.ptr, align 8}
,
what this patch want to do is only generate extract for {%add.ptr, store <2 x i64> %5, ptr %add.ptr, align 8}
since there is no need to generate extract for {%add.ptr, store i64 %1, ptr %add.ptr, align 8}
.
I meant try to fix the place where we buildingExtractUsers directly, I think it can be fixed by proper checking of the scalar element.
Sorry, I don't quite understand this, could you please explain this a bit more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I see why you need this.
Instead, I suggest another approach. When building the external uses entry for such scalars, just mark that there is no exact user. In this case all uses of this scalar will be automatically replaced by extractelement and you don't need this change anymore.
But I think some adjustment in the original code also should be done.
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 5c325ad8a291..e66e2a7d110e 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -4928,33 +4928,33 @@ void BoUpSLP::buildExternalUses(
if (!UserInst)
continue;
- if (isDeleted(UserInst))
- continue;
-
// Skip in-tree scalars that become vectors
if (TreeEntry *UseEntry = getTreeEntry(U)) {
- Value *UseScalar = UseEntry->Scalars[0];
// Some in-tree scalars will remain as scalar in vectorized
- // instructions. If that is the case, the one in Lane 0 will
+ // instructions. If that is the case, the one in FoundLane will
// be used.
- if (UseScalar != U ||
- UseEntry->State == TreeEntry::ScatterVectorize ||
+ if (UseEntry->State == TreeEntry::ScatterVectorize ||
UseEntry->State == TreeEntry::PossibleStridedVectorize ||
- !doesInTreeUserNeedToExtract(Scalar, UserInst, TLI)) {
+ !doesInTreeUserNeedToExtract(
+ Scalar, cast<Instruction>(UseEntry->Scalars.front()), TLI)) {
LLVM_DEBUG(dbgs() << "SLP: \tInternal user will be removed:" << *U
<< ".\n");
assert(UseEntry->State != TreeEntry::NeedToGather && "Bad state");
continue;
}
+ U = nullptr;
}
+ if (isDeleted(UserInst))
+ continue;
+
// Ignore users in the user ignore list.
if (UserIgnoreList && UserIgnoreList->contains(UserInst))
continue;
LLVM_DEBUG(dbgs() << "SLP: Need to extract:" << *U << " from lane "
<< Lane << " from " << *Scalar << ".\n");
- ExternalUses.push_back(ExternalUser(Scalar, U, FoundLane));
+ ExternalUses.emplace_back(Scalar, U, FoundLane);
}
}
}
40eba8a
to
dd0dd25
Compare
Thanks for your kind response and detailed explanation. I have updated the patch based on your diff. PTAL. |
…r instruction. Before 77a609b, we always skip in-tree uses of the vectorized scalars in `buildExternalUses()`, that commit handles the case that if the in-tree use is scalar operand in vectorized instruction, we need to generate extract for these in-tree uses. in-tree uses remain as scalar in vectorized instructions can be 3 cases: - The pointer operand of vectorized LoadInst uses an in-tree scalar - The pointer operand of vectorized StoreInst uses an in-tree scalar - The scalar argument of vector form intrinsic uses an in-tree scalar Generating extract for in-tree uses for vectorized instructions are implemented in `BoUpSLP::vectorizeTree()`: - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11497-L11506 - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11542-L11551 - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11657-L11667 However, 77a609b not only generates extract for vectorized instructions, but also generates extract for original scalar instructions. There is no need to generate extract for origin scalar instrutions, as these scalar instructions will be replaced by vector instructions and get erased later. This patch replaces extracts for original scalar instructions with corresponding vectorized instructions, and remove - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11497-L11506 - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11542-L11551 - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11657-L11667 extracts.
dd0dd25
to
335b5b8
Compare
continue; | ||
assert((ExternallyUsedValues.count(Scalar) || | ||
any_of(Scalar->users(), | ||
[this, Scalar](llvm::User *U) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[this, Scalar](llvm::User *U) { | |
[&](User *U) { |
any_of(Scalar->users(), | ||
[this, Scalar](llvm::User *U) { | ||
TreeEntry *UseEntry = getTreeEntry(U); | ||
return UseEntry && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return UseEntry && | |
return UseEntry && UseEntry->State != TreeEntry::Vectorize && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this should be UseEntry->State == TreeEntry::Vectorize
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, looks so, also should be E->State != TreeEntry::Vectorize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated to UseEntry && UseEntry->State == TreeEntry::Vectorize && E->State == TreeEntry::Vectorize && doesInTreeUserNeedToExtract()
✅ With the latest revision this PR passed the C/C++ code formatter. |
…r instruction.
Before 77a609b,
we always skip in-tree uses of the vectorized scalars in
buildExternalUses()
,that commit handles the case that if the in-tree use is scalar operand in vectorized instruction,
we need to generate extract for these in-tree uses.
in-tree uses remain as scalar in vectorized instructions can be 3 cases:
Generating extract for in-tree uses for vectorized instructions are implemented in
BoUpSLP::vectorizeTree()
:However, 77a609b
not only generates extract for vectorized instructions,
but also generates extract for original scalar instructions.
There is no need to generate extract for origin scalar instrutions,
as these scalar instructions will be replaced by vector instructions and get erased later.
This patch replaces extracts for original scalar instructions with corresponding vectorized instructions,
and remove
extracts.