-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SimplifyCFG] Use hash map to continue hoisting the common instructions #78615
base: main
Are you sure you want to change the base?
[SimplifyCFG] Use hash map to continue hoisting the common instructions #78615
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write If you have received no comments on your PR for a week, you can request a review If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-backend-arm @llvm/pr-subscribers-llvm-transforms Author: None (RouzbehPaktinat) ChangesThis patch fixes the hoisting problem discussed in this issue: #68395 Currently, SimplifyCFG tries to hoist identical instructions from successors of a basic block, only if those instructions are located at the same level in the successor blocks which causes missing many hoisting opportunities. This patch solves the problem by creating hashmaps from instructions of all successors except the first one. It then iterates over the instructions of the first successor and looks up hashmaps to find identical instructions in order to hoist them. Measurements on llvm-test-suit/MultiSource showed that it improves compile time by ~1%. Full diff: https://github.com/llvm/llvm-project/pull/78615.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index f3994b6cc39fefb..0611b581c145e95 100644
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -1526,6 +1526,17 @@ static bool shouldHoistCommonInstructions(Instruction *I1, Instruction *I2,
return true;
}
+// Hash instructions based on following factors:
+// 1- Instruction Opcode
+// 2- Instruction type
+// 3- Instruction operands
+llvm::hash_code getHash(Instruction *Instr) {
+ std::vector<Value *> operands(Instr->op_begin(), Instr->op_end());
+ return llvm::hash_combine(
+ Instr->getOpcode(), Instr->getType(),
+ hash_combine_range(operands.begin(), operands.end()));
+}
+
/// Hoist any common code in the successor blocks up into the block. This
/// function guarantees that BB dominates all successors. If EqTermsOnly is
/// given, only perform hoisting in case both blocks only contain a terminator.
@@ -1533,12 +1544,11 @@ static bool shouldHoistCommonInstructions(Instruction *I1, Instruction *I2,
/// added.
bool SimplifyCFGOpt::hoistCommonCodeFromSuccessors(BasicBlock *BB,
bool EqTermsOnly) {
- // This does very trivial matching, with limited scanning, to find identical
- // instructions in the two blocks. In particular, we don't want to get into
- // O(N1*N2*...) situations here where Ni are the sizes of these successors. As
- // such, we currently just scan for obviously identical instructions in an
- // identical order, possibly separated by the same number of non-identical
- // instructions.
+ // We first sort successors based on the number of instructions each block
+ // holds. Then for each successor we make a hashmap from its instructions,
+ // except for the first successor. After that, we iterate over the
+ // instructions of the first successor. If we find identical instructions from
+ // every other successor, we hoist all of them into the predeccessor.
unsigned int SuccSize = succ_size(BB);
if (SuccSize < 2)
return false;
@@ -1552,10 +1562,21 @@ bool SimplifyCFGOpt::hoistCommonCodeFromSuccessors(BasicBlock *BB,
auto *TI = BB->getTerminator();
+ SmallVector<BasicBlock *> SuccessorBlocks;
+ for (auto *Succ : successors(BB))
+ SuccessorBlocks.push_back(Succ);
+
+ // Sort successor blocks based on the number of instructions.
+ // This is because we always want to iterate over instructions
+ // of the smallest block.
+ llvm::stable_sort(SuccessorBlocks, [](BasicBlock *BB1, BasicBlock *BB2) {
+ return BB1->sizeWithoutDebug() < BB2->sizeWithoutDebug();
+ });
+
// The second of pair is a SkipFlags bitmask.
using SuccIterPair = std::pair<BasicBlock::iterator, unsigned>;
SmallVector<SuccIterPair, 8> SuccIterPairs;
- for (auto *Succ : successors(BB)) {
+ for (auto *Succ : SuccessorBlocks) {
BasicBlock::iterator SuccItr = Succ->begin();
if (isa<PHINode>(*SuccItr))
return false;
@@ -1589,80 +1610,121 @@ bool SimplifyCFGOpt::hoistCommonCodeFromSuccessors(BasicBlock *BB,
}
bool Changed = false;
+ auto *SuccIterPairBegin = SuccIterPairs.begin();
+ SuccIterPairBegin++;
+ auto OtherSuccIterPairRange =
+ iterator_range(SuccIterPairBegin, SuccIterPairs.end());
+ auto OtherSuccIterRange = make_first_range(OtherSuccIterPairRange);
+ using InstrFlagPair = std::pair<Instruction *, unsigned>;
+ SmallVector<DenseMap<llvm::hash_code, InstrFlagPair>, 2> OtherSuccessorsHash;
+
+ for (auto BBItrPair : OtherSuccIterRange) {
+ // Fill the hashmap for every other successor
+ DenseMap<llvm::hash_code, InstrFlagPair> hashMap;
+ unsigned skipFlag = 0;
+ Instruction *I = nullptr;
+ do {
+ I = &*BBItrPair;
+ skipFlag |= skippedInstrFlags(I);
+ hashMap[getHash(I)] = InstrFlagPair(I, skipFlag);
+ BBItrPair++;
+ } while (!I->isTerminator());
+ OtherSuccessorsHash.push_back(hashMap);
+ }
+ // Keep track of instructions skipped in the first successor
+ unsigned SkipFlagsBB1 = 0;
+ bool SameLevelHoist = true;
for (;;) {
auto *SuccIterPairBegin = SuccIterPairs.begin();
auto &BB1ItrPair = *SuccIterPairBegin++;
auto OtherSuccIterPairRange =
iterator_range(SuccIterPairBegin, SuccIterPairs.end());
- auto OtherSuccIterRange = make_first_range(OtherSuccIterPairRange);
-
Instruction *I1 = &*BB1ItrPair.first;
auto *BB1 = I1->getParent();
-
- // Skip debug info if it is not identical.
- bool AllDbgInstsAreIdentical = all_of(OtherSuccIterRange, [I1](auto &Iter) {
- Instruction *I2 = &*Iter;
- return I1->isIdenticalToWhenDefined(I2);
- });
- if (!AllDbgInstsAreIdentical) {
- while (isa<DbgInfoIntrinsic>(I1))
- I1 = &*++BB1ItrPair.first;
- for (auto &SuccIter : OtherSuccIterRange) {
- Instruction *I2 = &*SuccIter;
- while (isa<DbgInfoIntrinsic>(I2))
- I2 = &*++SuccIter;
+ bool HasIdenticalInst = true;
+
+ // Check if there are identical instructions in all other successors
+ for (auto &map : OtherSuccessorsHash) {
+ Instruction *I2 = map[getHash(I1)].first;
+ // We might face with same hash values for different instructions.
+ // If that happens, ignore the instruction.
+ if (!I2 || !I1->isIdenticalTo(I2)) {
+ HasIdenticalInst = false;
+ break;
}
}
- bool AllInstsAreIdentical = true;
- bool HasTerminator = I1->isTerminator();
- for (auto &SuccIter : OtherSuccIterRange) {
- Instruction *I2 = &*SuccIter;
- HasTerminator |= I2->isTerminator();
- if (AllInstsAreIdentical && !I1->isIdenticalToWhenDefined(I2))
- AllInstsAreIdentical = false;
+ if (!HasIdenticalInst) {
+ if (NumSkipped >= HoistCommonSkipLimit)
+ return Changed;
+ SkipFlagsBB1 |= skippedInstrFlags(I1);
+ if (SameLevelHoist) {
+ for (auto &SuccIterPair : OtherSuccIterPairRange) {
+ Instruction *I = &*SuccIterPair.first++;
+ SuccIterPair.second |= skippedInstrFlags(I);
+ }
+ }
+ NumSkipped++;
+ if (I1->isTerminator())
+ return Changed;
+ ++BB1ItrPair.first;
+ continue;
}
// If we are hoisting the terminator instruction, don't move one (making a
// broken BB), instead clone it, and remove BI.
- if (HasTerminator) {
+ if (I1->isTerminator()) {
// Even if BB, which contains only one unreachable instruction, is ignored
// at the beginning of the loop, we can hoist the terminator instruction.
// If any instructions remain in the block, we cannot hoist terminators.
- if (NumSkipped || !AllInstsAreIdentical)
+ if (NumSkipped)
return Changed;
SmallVector<Instruction *, 8> Insts;
- for (auto &SuccIter : OtherSuccIterRange)
- Insts.push_back(&*SuccIter);
+ for (auto &map : OtherSuccessorsHash) {
+ Instruction *I2 = map[getHash(I1)].first;
+ // BB holding I2 should only contain the branch instruction
+ auto itr = I2->getParent()->instructionsWithoutDebug();
+ if (&*itr.begin() != I2)
+ return Changed;
+ Insts.push_back(I2);
+ }
return hoistSuccIdenticalTerminatorToSwitchOrIf(TI, I1, Insts) || Changed;
}
- if (AllInstsAreIdentical) {
- unsigned SkipFlagsBB1 = BB1ItrPair.second;
- AllInstsAreIdentical =
- isSafeToHoistInstr(I1, SkipFlagsBB1) &&
- all_of(OtherSuccIterPairRange, [=](const auto &Pair) {
- Instruction *I2 = &*Pair.first;
- unsigned SkipFlagsBB2 = Pair.second;
- // Even if the instructions are identical, it may not
- // be safe to hoist them if we have skipped over
- // instructions with side effects or their operands
- // weren't hoisted.
- return isSafeToHoistInstr(I2, SkipFlagsBB2) &&
- shouldHoistCommonInstructions(I1, I2, TTI);
- });
+ bool SafeToHoist = isSafeToHoistInstr(I1, SkipFlagsBB1);
+ unsigned index = 0;
+ for (auto &SuccIterPair : OtherSuccIterPairRange) {
+ Instruction *I2 = OtherSuccessorsHash[index][getHash(I1)].first;
+ // If instructions of all successors are at the same level, use the
+ // skipFlag of its BB, i.e., SameLevelHoist. Otherwise, use the skipFlag
+ // that was calculated initially for this instruction in the hashmap
+ if (SameLevelHoist && I2 == (&*(SuccIterPair.first))) {
+ SafeToHoist = SafeToHoist &&
+ isSafeToHoistInstr(I2, SuccIterPair.second) &&
+ shouldHoistCommonInstructions(I1, I2, TTI);
+ } else {
+ unsigned skipFlag = OtherSuccessorsHash[index][getHash(I1)].second;
+ SafeToHoist = SafeToHoist && isSafeToHoistInstr(I2, skipFlag) &&
+ shouldHoistCommonInstructions(I1, I2, TTI);
+ SameLevelHoist = false;
+ }
+ index++;
}
- if (AllInstsAreIdentical) {
+ if (SafeToHoist) {
BB1ItrPair.first++;
+ if (SameLevelHoist) {
+ for (auto &SuccIterPair : OtherSuccIterPairRange)
+ SuccIterPair.first++;
+ }
if (isa<DbgInfoIntrinsic>(I1)) {
// The debug location is an integral part of a debug info intrinsic
// and can't be separated from it or replaced. Instead of attempting
// to merge locations, simply hoist both copies of the intrinsic.
I1->moveBeforePreserving(TI);
- for (auto &SuccIter : OtherSuccIterRange) {
- auto *I2 = &*SuccIter++;
+ for (auto &map : OtherSuccessorsHash) {
+ Instruction *I2 = map[getHash(I1)].first;
assert(isa<DbgInfoIntrinsic>(I2));
I2->moveBeforePreserving(TI);
}
@@ -1672,8 +1734,8 @@ bool SimplifyCFGOpt::hoistCommonCodeFromSuccessors(BasicBlock *BB,
// we remove the now redundant second instruction.
I1->moveBeforePreserving(TI);
BB->splice(TI->getIterator(), BB1, I1->getIterator());
- for (auto &SuccIter : OtherSuccIterRange) {
- Instruction *I2 = &*SuccIter++;
+ for (auto &map : OtherSuccessorsHash) {
+ Instruction *I2 = map[getHash(I1)].first;
assert(I2 != I1);
if (!I2->use_empty())
I2->replaceAllUsesWith(I1);
@@ -1695,9 +1757,12 @@ bool SimplifyCFGOpt::hoistCommonCodeFromSuccessors(BasicBlock *BB,
// We are about to skip over a pair of non-identical instructions. Record
// if any have characteristics that would prevent reordering instructions
// across them.
- for (auto &SuccIterPair : SuccIterPairs) {
- Instruction *I = &*SuccIterPair.first++;
- SuccIterPair.second |= skippedInstrFlags(I);
+ SkipFlagsBB1 |= skippedInstrFlags(I1);
+ if (SameLevelHoist) {
+ for (auto &SuccIterPair : OtherSuccIterPairRange) { // update flags
+ Instruction *I = &*SuccIterPair.first;
+ SuccIterPair.second |= skippedInstrFlags(I);
+ }
}
++NumSkipped;
}
@@ -1741,7 +1806,6 @@ bool SimplifyCFGOpt::hoistSuccIdenticalTerminatorToSwitchOrIf(
Value *BB2V = PN.getIncomingValueForBlock(OtherSuccTI->getParent());
if (BB1V == BB2V)
continue;
-
// In the case of an if statement, check for
// passingValueIsAlwaysUndefined here because we would rather eliminate
// undefined control flow then converting it to a select.
@@ -1810,20 +1874,16 @@ bool SimplifyCFGOpt::hoistSuccIdenticalTerminatorToSwitchOrIf(
}
}
}
-
SmallVector<DominatorTree::UpdateType, 4> Updates;
-
// Update any PHI nodes in our new successors.
for (BasicBlock *Succ : successors(BB1)) {
AddPredecessorToBlock(Succ, TIParent, BB1);
if (DTU)
Updates.push_back({DominatorTree::Insert, TIParent, Succ});
}
-
if (DTU)
for (BasicBlock *Succ : successors(TI))
Updates.push_back({DominatorTree::Delete, TIParent, Succ});
-
EraseTerminatorAndDCECond(TI);
if (DTU)
DTU->applyUpdates(Updates);
diff --git a/llvm/test/Transforms/SimplifyCFG/hoist-common-code.ll b/llvm/test/Transforms/SimplifyCFG/hoist-common-code.ll
index bfe31d8345d506c..285062455e4f5f3 100644
--- a/llvm/test/Transforms/SimplifyCFG/hoist-common-code.ll
+++ b/llvm/test/Transforms/SimplifyCFG/hoist-common-code.ll
@@ -24,6 +24,58 @@ F: ; preds = %0
ret void
}
+define void @test_unordered(ptr noalias %b, ptr noalias %c, ptr noalias %Q, ptr noalias %R, i32 %i ) {
+; CHECK-LABEL: @test_unordered(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[ldR1:%.*]] = load i32, ptr [[R:%.*]], align 8
+; CHECK-NEXT: switch i32 %i, label %bb0 [
+; CHECK-NEXT: i32 2, label %bb1
+; CHECK-NEXT: i32 3, label %bb2
+; CHECK-NEXT: ]
+; CHECK: common.ret:
+; CHECK-NEXT: ret void
+; CHECK: bb0:
+; CHECK-NEXT: [[ldQ:%.*]] = load i32, ptr [[Q:%.*]], align 8
+; CHECK-NEXT: [[mul:%.*]] = mul i32 [[ldQ:%.*]], 2
+; CHECK-NEXT: [[add:%.*]] = add i32 [[ldR1:%.*]], [[mul:%.*]]
+; CHECK-NEXT: store i32 [[add:%.*]], ptr [[c:%.*]], align 8
+; CHECK-NEXT: br label [[COMMON_RET:%.*]]
+; CHECK: bb1:
+; CHECK-NEXT: store i32 [[ldR1:%.*]], ptr [[c:%.*]], align 4
+; CHECK-NEXT: br label [[COMMON_RET:%.*]]
+; CHECK: bb2:
+; CHECK-NEXT: [[ldQ2:%.*]] = load i32, ptr [[Q:%.*]], align 8
+; CHECK-NEXT: [[sub:%.*]] = sub i32 [[ldR1:%.*]], [[ldQ2:%.*]]
+; CHECK-NEXT: store i32 [[sub:%.*]], ptr [[c:%.*]], align 8
+; CHECK-NEXT: br label [[COMMON_RET:%.*]]
+
+entry:
+ switch i32 %i, label %bb0 [
+ i32 2, label %bb1
+ i32 3, label %bb2
+ ]
+
+bb0: ; preds = %entry
+ %ldQ1 = load i32, ptr %Q, align 8
+ %mul = mul i32 %ldQ1, 2
+ %ldR1 = load i32, ptr %R, align 8
+ %add = add i32 %ldR1, %mul
+ store i32 %add, ptr %c, align 8
+ ret void
+
+bb1: ; preds = entry
+ %ldR2 = load i32, ptr %R, align 8
+ store i32 %ldR2, ptr %c
+ ret void
+
+bb2: ; preds = entry
+ %ldQ2 = load i32, ptr %Q, align 8
+ %ldR3 = load i32, ptr %R, align 8
+ %sub = sub i32 %ldR3, %ldQ2
+ store i32 %sub, ptr %c, align 8
+ ret void
+}
+
define void @test_switch(i64 %i, ptr %Q) {
; CHECK-LABEL: @test_switch(
; CHECK-NEXT: common.ret:
|
Causes a crash in stage2 clang: https://llvm-compile-time-tracker.com/show_error.php?commit=e36825ea6feefeef44cca7710d775fba287d3fb2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These things can make reviewing easier.
- Use godbot in PR descriptions
- Add a Pre-commit to show the impact of the changes. e.g. [MemCpyOpt] Combine alias metadatas when replacing byval arguments #70580
- Add a test case from the issue to PhaseOrder?
I'll start reviewing as soon as possible. :3 (I wasn't getting around to it this weekend.)
}); | ||
bool SafeToHoist = isSafeToHoistInstr(I1, SkipFlagsBB1); | ||
unsigned index = 0; | ||
for (auto &SuccIterPair : OtherSuccIterPairRange) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you keep using all_of
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So here I'm basically iterating over two vectors, OtherSuccIterPairRange
and OtherSuccessorsHash
which makes it very difficult to use all_off
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps you could use zip
?
Hi DianQK, |
bc38299
to
24ed0a8
Compare
Hi @nikic |
Could you add the corresponding test cases? I'd like to know what the problem is here. // We are about to skip over a pair of non-identical instructions. Record
// if any have characteristics that would prevent reordering instructions
// across them.
BB1ItrPair.first++;
SkipFlagsBB1 |= skippedInstrFlags(I1);
if (SameLevelHoist) {
for (auto &SuccIterPair : OtherSuccIterPairRange) { // update flags
Instruction *I = &*SuccIterPair.first;
Instruction *I = &*SuccIterPair.first++;
SuccIterPair.second |= skippedInstrFlags(I);
}
} |
The stage2 build still fails, though in a different way: https://llvm-compile-time-tracker.com/show_error.php?commit=d14b351d616d40f8b8c153923f96467e256fe931 |
✅ With the latest revision this PR passed the C/C++ code formatter. |
Problem wasn't about the skipFlags. It was a small bug, I had forgotten to erase the map entry after hoisting instructions. |
Fixed the problem. |
ed3718f
to
1190014
Compare
Can you please rebase to fix the conflict? |
1190014
to
1982871
Compare
1982871
to
210ac6f
Compare
Sure, it's done. |
Thank you, we can bootstrap successfully now. Compile-time: https://llvm-compile-time-tracker.com/compare.php?from=5ce2f73b2e5e6664d74b49ee45f11505f8306577&to=294d231e634b32489035a8b223edc612d41ed2d1&stat=instructions:u We'll want to reduce the compile-time impact if possible, but the code size numbers and the stage2 improvements looks quite promising to me. |
I'm sorry too. I'm back. New result: https://llvm-compile-time-tracker.com/compare.php?from=ca65da084c8e40ce2767da5986e2210cb53666f0&to=50d51f038124518356a22469e56c3de61e16c1e3&stat=instructions:u. Doesn't look like it's improving? Maybe I'm wrong. |
// in the hashmap execept for the ones that have the same hash as some | ||
// previous instruction in that BB. | ||
if (OtherSuccessorsHash.find(HashValue) == OtherSuccessorsHash.end()) { | ||
SmallVector<Instruction *, 2> vec{I}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we need to raise this value.
// previous instruction in that BB. | ||
if (OtherSuccessorsHash.find(HashValue) == OtherSuccessorsHash.end()) { | ||
SmallVector<Instruction *, 2> vec{I}; | ||
OtherSuccessorsHash[HashValue] = vec; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you reuse OtherSuccessorsHash.find(HashValue)
?
// first instruction with this hash value from current successor. | ||
if (OtherSuccessorsHash.find(HashValue) != OtherSuccessorsHash.end() && | ||
OtherSuccessorsHash[HashValue].size() == Index) { | ||
OtherSuccessorsHash[HashValue].push_back(I); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, too. This seems to trigger three lookups?
}); | ||
bool SafeToHoist = isSafeToHoistInstr(I1, SkipFlagsBB1); | ||
unsigned index = 0; | ||
for (auto &SuccIterPair : OtherSuccIterPairRange) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps you could use zip
?
@@ -1742,15 +1767,6 @@ bool SimplifyCFGOpt::hoistCommonCodeFromSuccessors(BasicBlock *BB, | |||
continue; | |||
} | |||
|
|||
SmallVector<Instruction *, 8> OtherInsts; | |||
if (SameLevelHoist) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you remove SameLevelHoist
?
I think it makes sense. You can build the hash map when you can't hoist at the same level. This can reduce the size of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still doing same level hoisting but those instructions are in the hash map as well. If I want to avoid putting same level instructions in the hash map, the logic for filling the map will become more complicated and needs iterating BBs more than once which brings overhead. Take this case for example:
BB1 BB2
I0 I2
I1 I1
I2 I3
Let's say instructions with same numbers are identical.
While checking I0
from BB1
, if I want to check the same level instruction of BB2
(which is I2
here), after finding that I0
and I2
are not identical I need to put I2
in the map (hoping that I will find an identical instruction from BB1 later) but that would also increase map size if no identical instruction is found. The other option is not putting I2
in the map at this point, but wait until we are done iterating BB1, then re-iterate BB2. In this way when we reach I2
in BB2, we have already added its identical instruction from BB1 so it can be added to the map. But this is not desirable as we want to iterate each BB only once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is a solution here that splits the hoist common code into three steps (function or class?).
- The current simple matching method.
- Use hash map for matching.
- Hoist the termination instruction.
Also, I think with the hash map, we can simplify the first step of the logic by removing the skip part and going straight to the second step once we encounter a difference.
Could you create a PR first to prepare for this change? Perhaps, we just need to split the current code into two (1 and 3) steps in order to insert the hash map code. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, it makes sense to split the code into 3 steps to address those cases independently. Just as a bug fix, can we merge the current code? I think it covers the problem that was mentioned in the original issue. I can work on your proposal in a separate PR a bit later :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have discussed this patch with Rouzbeh offline (before implementation and also regarding the recent discussion). About your suggestion of separating steps 1,2 and 3 above this is how I think about it.
- separating step 3 makes sense. It makes the code cleaner and also it is easy to implement.
- separating step 1 and 2 from each other sounds a little problematic for the following reasons: (a) I do not agree that separating this two steps makes the code easier to read. If it does, the improvement is marginal. I think current implementation is more elegant. (b) the separation could result in subtle issues. It is not clear when we want to go from 1 to 2 and different decisions could potentially cause different problems: either compile time overhead, or corner cases that are difficult to handle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To rephrase my second point: The proposed change is not neccarily algorithmically equivalent to the current implementation. IIUC, and the motivation is only to make the code easier to understand, then
- this improvement is not that significant.
- The work would require understanding corner cases, where the two algorithm behave differently and making sure they are handled properly.
So my question for @DianQK is could you clarify what are the benefits of separating (1) and (2).
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping @DianQK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just finding that the code here is getting more complicated. Of course, a big part of the problem here is my previous commits :3.
There is a possibility of ambiguity in this change, and I can verify my thoughts in subsequent PRs.
(Sorry, my github notifications are starting to get hard to sift through the important parts.)
OtherSuccessorsHash[NewHash].push_back(PrevUser); | ||
} else { | ||
SmallVector<Instruction *, 2> InstVec{PrevUser}; | ||
OtherSuccessorsHash[NewHash] = InstVec; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be simpler if we replaced all the instructions we need to replace at once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give more details please? I'm just updating hash value of users here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just think that we can hoist a batch of instruction when use hash map. Then we may don't need to update the hash value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. LGTM. Even though the code here has become more complex, I didn't find any significant problems.
I'll merge in the following days. That way, others may have time to come and review it.
BTW, when I have time, I'll try to simplify the code here.
OtherSuccessorsHash[NewHash].push_back(PrevUser); | ||
} else { | ||
SmallVector<Instruction *, 2> InstVec{PrevUser}; | ||
OtherSuccessorsHash[NewHash] = InstVec; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just think that we can hoist a batch of instruction when use hash map. Then we may don't need to update the hash value.
@@ -1742,15 +1767,6 @@ bool SimplifyCFGOpt::hoistCommonCodeFromSuccessors(BasicBlock *BB, | |||
continue; | |||
} | |||
|
|||
SmallVector<Instruction *, 8> OtherInsts; | |||
if (SameLevelHoist) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just finding that the code here is getting more complicated. Of course, a big part of the problem here is my previous commits :3.
There is a possibility of ambiguity in this change, and I can verify my thoughts in subsequent PRs.
(Sorry, my github notifications are starting to get hard to sift through the important parts.)
@dtcxzyw Can you please test this patch? |
std::vector<Value *> operands(Instr->op_begin(), Instr->op_end()); | ||
return llvm::hash_combine( | ||
Instr->getOpcode(), Instr->getType(), | ||
hash_combine_range(operands.begin(), operands.end())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::vector<Value *> operands(Instr->op_begin(), Instr->op_end()); | |
return llvm::hash_combine( | |
Instr->getOpcode(), Instr->getType(), | |
hash_combine_range(operands.begin(), operands.end())); | |
return llvm::hash_combine( | |
Instr->getOpcode(), Instr->getType(), | |
hash_combine_range(Instr->op_begin(), Instr->op_end())); |
TBH I think it is dangerous to calculate hashes for pointers :(
It will make the optimization result non-deterministic if it depends on the iteration order of the hashmap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right but I never iterate the hashmap, I just look it up using a key. It won't have that non-deterministic behavior.
std::iter_swap( | ||
SuccessorBBs.begin(), | ||
std::find(SuccessorBBs.begin(), SuccessorBBs.end(), SmallestBB)); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we set a limit on the size of SmallestBB
? It should be a bit larger than HoistCommonSkipLimit
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this going to regress existing behavior though? Currently basic hoisting will work independently of block size and it seems undesirable to lose that entirely. Or do you think that hoisting for large blocks will not matter?
I do agree that we need to avoid needlessly hashing large blocks, but I don't think we can do this using a simple block size limit. One possibility would be to compute hashes up to N instructions ahead rather than doing it all upfront.
It looks like a huge change :) |
TBH I don't know why |
@dtcxzyw |
IIRC that is because of functional issues in GVNHoist |
Gently ping @dtcxzyw |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I would like to see the impact to downstream :)
@@ -3647,7 +3776,6 @@ static bool FoldTwoEntryPHINode(PHINode *PN, const TargetTransformInfo &TTI, | |||
// Change the PHI node into a select instruction. | |||
Value *TrueVal = PN->getIncomingValueForBlock(IfTrue); | |||
Value *FalseVal = PN->getIncomingValueForBlock(IfFalse); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please drop the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll review this in more detail tomorrow, just a quick note...
std::iter_swap(SuccessorBBs.begin(), SmallestBBPos); | ||
BasicBlock *SmallestBB = *SmallestBBPos; | ||
|
||
if (SmallestBB->size() > BBSizeHoistLimit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These size() calls should be sizeWithoutDebug(), otherwise the debug invariance property will be violated.
Could you please rebase this so that the conflict goes away? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is really pushing the bounds of what is appropriate to do in SimplifyCFG. The code has become quite hard to understand.
My primary concern with this implementation would be with non-determinism -- I don't think that your approach of only supporting hoisting for the first instruction with a given hash is viable (unless I am misunderstanding something).
|
||
unsigned skipFlag = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unsigned skipFlag = 0; | |
unsigned SkipFlag = 0; |
Variables should start with an uppercase letter.
Instruction *I = nullptr; | ||
do { | ||
I = &*BBItrPair.first; | ||
auto HashValue = getHash(I); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto HashValue = getHash(I); | |
hash_code HashValue = getHash(I); |
auto HashValue = getHash(I); | ||
skipFlag |= skippedInstrFlags(I); | ||
// For the first successor we created hashmap for, put all instructions | ||
// in the hashmap execept for the ones that have the same hash as some |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// in the hashmap execept for the ones that have the same hash as some | |
// in the hashmap except for the ones that have the same hash as some |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I understand correctly that this implementation only supports hoisting the first instruction with a given hash? If there are multiple instructions with the same hash, then the later ones will not be hoisted?
If so, this is going to cause non-determinism. The hash depends on address values, so you will see different hash collisions between different runs and depending on that different instructions will get hoisted.
The more proper way to handle this would be to use a custom DenseMapInfo where the hash it determined as you do, but isEqual() still checks the full instruction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, the current implementation could have non-deterministic behavior in some very corner cases. I can fix it by using DenseMapInfo as you explained.
// in the hashmap execept for the ones that have the same hash as some | ||
// previous instruction in that BB. | ||
if (OtherSuccessorsHash.find(HashValue) == OtherSuccessorsHash.end()) { | ||
OtherSuccessorsHash[HashValue] = {I}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use try_emplace to avoid duplicate hash lookup.
auto HashValue = getHash(I); | ||
skipFlag |= skippedInstrFlags(I); | ||
auto &InstVec = OtherSuccessorsHash[HashValue]; | ||
// For other successors put the instrcution in the map only if there are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// For other successors put the instrcution in the map only if there are | |
// For other successors put the instruction in the map only if there are |
if (OtherSuccessorsHash.find(NewHash) != | ||
OtherSuccessorsHash.end()) { | ||
OtherSuccessorsHash[NewHash].push_back(PrevUser); | ||
} else { | ||
SmallVector<Instruction *, 2> InstVec{PrevUser}; | ||
OtherSuccessorsHash[NewHash] = InstVec; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (OtherSuccessorsHash.find(NewHash) != | |
OtherSuccessorsHash.end()) { | |
OtherSuccessorsHash[NewHash].push_back(PrevUser); | |
} else { | |
SmallVector<Instruction *, 2> InstVec{PrevUser}; | |
OtherSuccessorsHash[NewHash] = InstVec; | |
} | |
OtherSuccessorsHash[NewHash].push_back(PrevUser); |
should work, I think? It will default initialize and then push if it doesn't exist yet.
if (!I2->use_empty()) | ||
// Update hashcode of all instructions using I2 | ||
if (!I2->use_empty()) { | ||
SmallVector<llvm::Instruction *, 8> PrevUsers; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use llvm::
prefix in upstream LLVM.
// recompute the hash codes for them. We need to do this only for | ||
// the instructions located in the same block as I2 because we | ||
// initially only hashed those instructions. | ||
for (auto *user : I2->users()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for (auto *user : I2->users()) { | |
for (User *U : I2->users()) { |
%sub = sub i32 %ldR3, %ldQ2 | ||
store i32 %sub, ptr %c, align 8 | ||
ret void | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is some very weak test coverage for a change this complex. At the very least I would expect to see some tests for the correct handling of SkipFlags, as well as for the hash update here.
std::iter_swap( | ||
SuccessorBBs.begin(), | ||
std::find(SuccessorBBs.begin(), SuccessorBBs.end(), SmallestBB)); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this going to regress existing behavior though? Currently basic hoisting will work independently of block size and it seems undesirable to lose that entirely. Or do you think that hoisting for large blocks will not matter?
I do agree that we need to avoid needlessly hashing large blocks, but I don't think we can do this using a simple block size limit. One possibility would be to compute hashes up to N instructions ahead rather than doing it all upfront.
GVNHoist does conceptually seems like the right way to do this -- but at this point the pass would probably need a full re-implementation to be viable... |
SmallVector<BasicBlock *, 8> SuccessorBBs; | ||
for (auto *Succ : successors(BB)) { | ||
// If we find an unreachable instruction at the beginning of a basic block, | ||
// we can still hoist instructions from the rest of the basic blocks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you put the hasAddressTaken
check in this loop?
} while (!I->isTerminator()); | ||
|
||
unsigned Index = 1; | ||
for (auto BBItrPair : OtherSuccIterPairRange) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remain this to avoid shadow variable?
|
||
unsigned skipFlag = 0; | ||
Instruction *I = nullptr; | ||
do { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just iterating over the BB with a for(Instruction *I:BBItrPair.first) { ... if(I->isTerminator()) break; }
is a lot more clear.
unsigned Index = 1; | ||
for (auto BBItrPair : OtherSuccIterPairRange) { | ||
// Fill the hashmap for every other successor | ||
unsigned skipFlag = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
likewise shadow variable here. Maybe just add a scope to the above loop?
bool HasIdenticalInst = | ||
OtherSuccessorsHash.find(getHash(I1)) != OtherSuccessorsHash.end() && | ||
OtherInsts.size() == (SuccIterPairs.size() - 1) && | ||
all_of(OtherInsts, [&](Instruction *I2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this be any_of
and you just grab the ones that are indentical?
@@ -1788,8 +1919,6 @@ bool SimplifyCFGOpt::hoistSuccIdenticalTerminatorToSwitchOrIf( | |||
auto *BB2 = I2->getParent(); | |||
if (BI) { | |||
assert(OtherSuccTIs.size() == 1); | |||
assert(BI->getSuccessor(0) == I1->getParent()); | |||
assert(BI->getSuccessor(1) == I2->getParent()); | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: should drop braces.
}); | ||
bool SafeToHoist = isSafeToHoistInstr(I1, SkipFlagsBB1); | ||
for (auto [SuccIterPair, I2] : zip(OtherSuccIterPairRange, OtherInsts)) { | ||
// If instructions of all successors are at the same level, use the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think simpler and faster would just be put if(!SafeToHoist) break
. then get rid of the SafeToHoist &&
below.
I can fix non-deterministic behavior and try to make code simpler so that it's easier to understand. Doing that along with addressing your other comments, will the code be in a good status to be merged? |
Hi @nikic Your comment here is a bit ambiguous and it is appreciated if you could clarify. From the first sentence it looks like you have a fundamental objection to the design of the solution. But the rest of comment highlights two specific issues:
I really appreciate your time and effort to help review this PR. |
This patch fixes the hoisting problem discussed in this issue: #68395.
Currently, SimplifyCFG tries to hoist identical instructions from successors of a basic block, only if those instructions are located at the same level in the successor blocks which causes missing many hoisting opportunities.
This is root cause of the issue #68395 which is shown in the following simplified example:
Current implementation of SimplifyCFG is unable to hoist "%ldR1", "%ldR2" and "%ldR3" as can be seen here but with this patch now it can hoist them all into the
entry
block.We solve the problem by creating hashmaps from instructions of all successors except the first one. It then iterates over the instructions of the first successor and looks up hashmaps to find identical instructions in order to hoist them.
Measurements on llvm-test-suit/CTMark showed that it improves compile time by ~1%.