-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DFAJumpThreading] Avoid exploring the paths that never come back #85505
Conversation
@llvm/pr-subscribers-llvm-transforms Author: XChy (XChy) ChangesThis patch does:
Full diff: https://github.com/llvm/llvm-project/pull/85505.diff 1 Files Affected:
diff --git a/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp b/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
index e66a1a2ccb318c..d359b701646722 100644
--- a/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
+++ b/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
@@ -126,7 +126,7 @@ class SelectInstToUnfold {
explicit operator bool() const { return SI && SIUse; }
};
-void unfold(DomTreeUpdater *DTU, SelectInstToUnfold SIToUnfold,
+void unfold(DomTreeUpdater *DTU, LoopInfo *LI, SelectInstToUnfold SIToUnfold,
std::vector<SelectInstToUnfold> *NewSIsToUnfold,
std::vector<BasicBlock *> *NewBBs);
@@ -152,7 +152,7 @@ class DFAJumpThreading {
std::vector<SelectInstToUnfold> NewSIsToUnfold;
std::vector<BasicBlock *> NewBBs;
- unfold(&DTU, SIToUnfold, &NewSIsToUnfold, &NewBBs);
+ unfold(&DTU, LI, SIToUnfold, &NewSIsToUnfold, &NewBBs);
// Put newly discovered select instructions into the work list.
for (const SelectInstToUnfold &NewSIToUnfold : NewSIsToUnfold)
@@ -196,7 +196,7 @@ void createBasicBlockAndSinkSelectInst(
/// created basic blocks into \p NewBBs.
///
/// TODO: merge it with CodeGenPrepare::optimizeSelectInst() if possible.
-void unfold(DomTreeUpdater *DTU, SelectInstToUnfold SIToUnfold,
+void unfold(DomTreeUpdater *DTU, LoopInfo *LI, SelectInstToUnfold SIToUnfold,
std::vector<SelectInstToUnfold> *NewSIsToUnfold,
std::vector<BasicBlock *> *NewBBs) {
SelectInst *SI = SIToUnfold.getInst();
@@ -302,6 +302,12 @@ void unfold(DomTreeUpdater *DTU, SelectInstToUnfold SIToUnfold,
DTU->applyUpdates({{DominatorTree::Insert, StartBlock, TT},
{DominatorTree::Insert, StartBlock, FT}});
+ // Preserve loop info
+ if (Loop *L = LI->getLoopFor(SI->getParent())) {
+ for (BasicBlock *NewBB : *NewBBs)
+ L->addBasicBlockToLoop(NewBB, *LI);
+ }
+
// The select is now dead.
assert(SI->use_empty() && "Select must be dead now");
SI->eraseFromParent();
@@ -501,9 +507,10 @@ struct MainSwitch {
};
struct AllSwitchPaths {
- AllSwitchPaths(const MainSwitch *MSwitch, OptimizationRemarkEmitter *ORE)
- : Switch(MSwitch->getInstr()), SwitchBlock(Switch->getParent()),
- ORE(ORE) {}
+ AllSwitchPaths(const MainSwitch *MSwitch, OptimizationRemarkEmitter *ORE,
+ LoopInfo *LI)
+ : Switch(MSwitch->getInstr()), SwitchBlock(Switch->getParent()), ORE(ORE),
+ LI(LI) {}
std::vector<ThreadingPath> &getThreadingPaths() { return TPaths; }
unsigned getNumThreadingPaths() { return TPaths.size(); }
@@ -575,6 +582,12 @@ struct AllSwitchPaths {
Visited.insert(BB);
+ // Stop if we have reached the BB out of loop, since its successors have no
+ // impact on the DFA.
+ // TODO: Do we need to stop exploring if BB is the outer loop of the switch?
+ if (!LI->getLoopFor(BB))
+ return Res;
+
// Some blocks have multiple edges to the same successor, and this set
// is used to prevent a duplicate path from being generated
SmallSet<BasicBlock *, 4> Successors;
@@ -716,6 +729,7 @@ struct AllSwitchPaths {
BasicBlock *SwitchBlock;
OptimizationRemarkEmitter *ORE;
std::vector<ThreadingPath> TPaths;
+ LoopInfo *LI;
};
struct TransformDFA {
@@ -1283,7 +1297,7 @@ bool DFAJumpThreading::run(Function &F) {
if (!Switch.getSelectInsts().empty())
MadeChanges = true;
- AllSwitchPaths SwitchPaths(&Switch, ORE);
+ AllSwitchPaths SwitchPaths(&Switch, ORE, LI);
SwitchPaths.run();
if (SwitchPaths.getNumThreadingPaths() > 0) {
|
Can you please add something like |
Or maybe better, just add the verify call to
|
@@ -575,6 +582,12 @@ struct AllSwitchPaths { | |||
|
|||
Visited.insert(BB); | |||
|
|||
// Stop if we have reached the BB out of loop, since its successors have no | |||
// impact on the DFA. | |||
// TODO: Do we need to stop exploring if BB is the outer loop of the switch? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so. Would that cause any test changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it would break test2 in dfa-jump-threading-transform.ll. This test contains nested loops. If we don't see the outer loop, we can only thread part of switch.
I don't think we preserve the loop analysis completely. When transforming DFA, we clone and thread BBs, which may break loop analysis. This patch only preserves it before transforming DFA, so that we get the correct result in |
I don't think it's a good idea to use an analysis that's not preserved. Is it possible to preserve LI through the threading optimization? Sorry I completely missed this when looking at #85360. |
I think if we don't skip over switch statement after cloning a new path, it's possible because the cloned path is in the same loop. But meanwhile, the skip have to be done in later optimizations. See also
I'll give it a try. |
@nikic , after investigating the codebase and cases for days, I found it's very hard to preserve the loop analysis during threading (just like JumpThreading can only track loop headers). Because the switch block will be duplicated and the paths are threaded one by one, the loops are broken and created during threading. Only after threading all paths, can we determine the the loop info. But calculating the loop info after threading looks the same as not preserving it to me... I wonder why we can't use unpreserved loop analysis here, it seems that JumpThreading doesn't preserve it after DFAJumpThreading in pipeline. |
@nikic Any comments about this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This patch preserves loop info while unfolding, for use in paths(...), which looks okay to me.
I don't have an knowledgeable opinion on whether the pass should preserve the loop analysis as a whole or not. I searched a little and it looks like only loop passes are required to preserve it and (even though it operates on loops) DFAJumpThreading is a function pass. Considering that this pass does transformations in a very small number of functions I would assume that the total cost to recreate the loop analysis would be reasonable.
There isn't any technical reason you have to preserve LoopInfo, as far as I know. The only real downside is that it has to be recomputed later. That said, using LoopInfo and not preserving it is extremely rare; I'd expect explicit comments noting it in the list of preserved analysis, and at the point where the LoopInfo ceases to be valid. You should still be able to verify() the LoopInfo before it becomes invalid. Given the way the pass is structured, that can just be guarded by NDEBUG (running it once per function isn't that expensive; we guard it with EXPENSIVE_CHECKS in a lot of places because running it once per loop doesn't scale). |
My general concern with not preserving LoopInfo is that we still continue using LoopInfo even after it has been invalidated. Let's say we have two candidates for DFA jump threading in a function. When we process the first one, we modify the CFG without (fully) updating LoopInfo. Then for the second one, we query LoopInfo -- but the LoopInfo is broken now. Can this cause any issues? I find it hard to tell... |
Okay, I think I missed a key piece here: llvm-project/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp Lines 1313 to 1317 in 156ab4d
We only allow one DFA jump thread per invocation of the pass. If that's the case, then not preserving loop info is indeed okay. I'd update that comment though to say that we'd have to handle LoopInfo invalidation if we wanted to change that. |
Yes, if we process more than one candidate, that would be a problem (though there are already other underlying problems if we thread more). These problems should be solved in the future. |
4c0f84b
to
ae79f1c
Compare
ae79f1c
to
94cbd89
Compare
Verify it now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This patch does: