Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DFAJumpThreading] Avoid exploring the paths that never come back #85505

Merged
merged 2 commits into from
Apr 27, 2024

Conversation

XChy
Copy link
Member

@XChy XChy commented Mar 16, 2024

This patch does:

  • Preserve loop info when unfolding selects.
  • Reduce the search space for loop paths.

@llvmbot
Copy link
Collaborator

llvmbot commented Mar 16, 2024

@llvm/pr-subscribers-llvm-transforms

Author: XChy (XChy)

Changes

This patch does:

  • Preserve loop info when unfolding selects.
  • Reduce the search space for loop paths.

Full diff: https://github.com/llvm/llvm-project/pull/85505.diff

1 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp (+21-7)
diff --git a/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp b/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
index e66a1a2ccb318c..d359b701646722 100644
--- a/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
+++ b/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
@@ -126,7 +126,7 @@ class SelectInstToUnfold {
   explicit operator bool() const { return SI && SIUse; }
 };
 
-void unfold(DomTreeUpdater *DTU, SelectInstToUnfold SIToUnfold,
+void unfold(DomTreeUpdater *DTU, LoopInfo *LI, SelectInstToUnfold SIToUnfold,
             std::vector<SelectInstToUnfold> *NewSIsToUnfold,
             std::vector<BasicBlock *> *NewBBs);
 
@@ -152,7 +152,7 @@ class DFAJumpThreading {
 
       std::vector<SelectInstToUnfold> NewSIsToUnfold;
       std::vector<BasicBlock *> NewBBs;
-      unfold(&DTU, SIToUnfold, &NewSIsToUnfold, &NewBBs);
+      unfold(&DTU, LI, SIToUnfold, &NewSIsToUnfold, &NewBBs);
 
       // Put newly discovered select instructions into the work list.
       for (const SelectInstToUnfold &NewSIToUnfold : NewSIsToUnfold)
@@ -196,7 +196,7 @@ void createBasicBlockAndSinkSelectInst(
 /// created basic blocks into \p NewBBs.
 ///
 /// TODO: merge it with CodeGenPrepare::optimizeSelectInst() if possible.
-void unfold(DomTreeUpdater *DTU, SelectInstToUnfold SIToUnfold,
+void unfold(DomTreeUpdater *DTU, LoopInfo *LI, SelectInstToUnfold SIToUnfold,
             std::vector<SelectInstToUnfold> *NewSIsToUnfold,
             std::vector<BasicBlock *> *NewBBs) {
   SelectInst *SI = SIToUnfold.getInst();
@@ -302,6 +302,12 @@ void unfold(DomTreeUpdater *DTU, SelectInstToUnfold SIToUnfold,
   DTU->applyUpdates({{DominatorTree::Insert, StartBlock, TT},
                      {DominatorTree::Insert, StartBlock, FT}});
 
+  // Preserve loop info
+  if (Loop *L = LI->getLoopFor(SI->getParent())) {
+    for (BasicBlock *NewBB : *NewBBs)
+      L->addBasicBlockToLoop(NewBB, *LI);
+  }
+
   // The select is now dead.
   assert(SI->use_empty() && "Select must be dead now");
   SI->eraseFromParent();
@@ -501,9 +507,10 @@ struct MainSwitch {
 };
 
 struct AllSwitchPaths {
-  AllSwitchPaths(const MainSwitch *MSwitch, OptimizationRemarkEmitter *ORE)
-      : Switch(MSwitch->getInstr()), SwitchBlock(Switch->getParent()),
-        ORE(ORE) {}
+  AllSwitchPaths(const MainSwitch *MSwitch, OptimizationRemarkEmitter *ORE,
+                 LoopInfo *LI)
+      : Switch(MSwitch->getInstr()), SwitchBlock(Switch->getParent()), ORE(ORE),
+        LI(LI) {}
 
   std::vector<ThreadingPath> &getThreadingPaths() { return TPaths; }
   unsigned getNumThreadingPaths() { return TPaths.size(); }
@@ -575,6 +582,12 @@ struct AllSwitchPaths {
 
     Visited.insert(BB);
 
+    // Stop if we have reached the BB out of loop, since its successors have no
+    // impact on the DFA.
+    // TODO: Do we need to stop exploring if BB is the outer loop of the switch?
+    if (!LI->getLoopFor(BB))
+      return Res;
+
     // Some blocks have multiple edges to the same successor, and this set
     // is used to prevent a duplicate path from being generated
     SmallSet<BasicBlock *, 4> Successors;
@@ -716,6 +729,7 @@ struct AllSwitchPaths {
   BasicBlock *SwitchBlock;
   OptimizationRemarkEmitter *ORE;
   std::vector<ThreadingPath> TPaths;
+  LoopInfo *LI;
 };
 
 struct TransformDFA {
@@ -1283,7 +1297,7 @@ bool DFAJumpThreading::run(Function &F) {
     if (!Switch.getSelectInsts().empty())
       MadeChanges = true;
 
-    AllSwitchPaths SwitchPaths(&Switch, ORE);
+    AllSwitchPaths SwitchPaths(&Switch, ORE, LI);
     SwitchPaths.run();
 
     if (SwitchPaths.getNumThreadingPaths() > 0) {

@nikic
Copy link
Contributor

nikic commented Mar 16, 2024

Can you please add something like if (VerifyLoopInfo) LI->verify(DT) and add -verify-loop-info to all the DFAJumpThreading tests? Can also mark the analysis as preserved.

@nikic
Copy link
Contributor

nikic commented Mar 16, 2024

Can you please add something like if (VerifyLoopInfo) LI->verify(DT) and add -verify-loop-info to all the DFAJumpThreading tests? Can also mark the analysis as preserved.

Or maybe better, just add the verify call to

assert(DT->verify(DominatorTree::VerificationLevel::Full));
and then run it locally once with EXPENSIVE_CHECKS replaced by 1.

@@ -575,6 +582,12 @@ struct AllSwitchPaths {

Visited.insert(BB);

// Stop if we have reached the BB out of loop, since its successors have no
// impact on the DFA.
// TODO: Do we need to stop exploring if BB is the outer loop of the switch?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so. Would that cause any test changes?

Copy link
Member Author

@XChy XChy Mar 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it would break test2 in dfa-jump-threading-transform.ll. This test contains nested loops. If we don't see the outer loop, we can only thread part of switch.

@XChy
Copy link
Member Author

XChy commented Mar 16, 2024

Can you please add something like if (VerifyLoopInfo) LI->verify(DT) and add -verify-loop-info to all the DFAJumpThreading tests? Can also mark the analysis as preserved.

I don't think we preserve the loop analysis completely. When transforming DFA, we clone and thread BBs, which may break loop analysis. This patch only preserves it before transforming DFA, so that we get the correct result in AllSwitchPaths::paths.

@nikic
Copy link
Contributor

nikic commented Mar 16, 2024

Can you please add something like if (VerifyLoopInfo) LI->verify(DT) and add -verify-loop-info to all the DFAJumpThreading tests? Can also mark the analysis as preserved.

I don't think we preserve the loop analysis completely. When transforming DFA, we clone and thread BBs, which may break loop analysis. This patch only preserves it before transforming DFA, so that we get the correct result in AllSwitchPaths::paths.

I don't think it's a good idea to use an analysis that's not preserved. Is it possible to preserve LI through the threading optimization?

Sorry I completely missed this when looking at #85360.

@XChy
Copy link
Member Author

XChy commented Mar 16, 2024

Can you please add something like if (VerifyLoopInfo) LI->verify(DT) and add -verify-loop-info to all the DFAJumpThreading tests? Can also mark the analysis as preserved.

I don't think we preserve the loop analysis completely. When transforming DFA, we clone and thread BBs, which may break loop analysis. This patch only preserves it before transforming DFA, so that we get the correct result in AllSwitchPaths::paths.

I don't think it's a good idea to use an analysis that's not preserved. Is it possible to preserve LI through the threading optimization?

Sorry I completely missed this when looking at #85360.

I think if we don't skip over switch statement after cloning a new path, it's possible because the cloned path is in the same loop. But meanwhile, the skip have to be done in later optimizations. See also

/// Note that this is an optional step and would have been done in later

I'll give it a try.

@XChy
Copy link
Member Author

XChy commented Mar 21, 2024

@nikic , after investigating the codebase and cases for days, I found it's very hard to preserve the loop analysis during threading (just like JumpThreading can only track loop headers). Because the switch block will be duplicated and the paths are threaded one by one, the loops are broken and created during threading. Only after threading all paths, can we determine the the loop info. But calculating the loop info after threading looks the same as not preserving it to me...

I wonder why we can't use unpreserved loop analysis here, it seems that JumpThreading doesn't preserve it after DFAJumpThreading in pipeline.

@UsmanNadeem
Copy link
Contributor

@nikic , after investigating the codebase and cases for days, I found it's very hard to preserve the loop analysis during threading (just like JumpThreading can only track loop headers). Because the switch block will be duplicated and the paths are threaded one by one, the loops are broken and created during threading. Only after threading all paths, can we determine the the loop info. But calculating the loop info after threading looks the same as not preserving it to me...

I wonder why we can't use unpreserved loop analysis here, it seems that JumpThreading doesn't preserve it after DFAJumpThreading in pipeline.

@nikic Any comments about this?

Copy link
Contributor

@UsmanNadeem UsmanNadeem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This patch preserves loop info while unfolding, for use in paths(...), which looks okay to me.

I don't have an knowledgeable opinion on whether the pass should preserve the loop analysis as a whole or not. I searched a little and it looks like only loop passes are required to preserve it and (even though it operates on loops) DFAJumpThreading is a function pass. Considering that this pass does transformations in a very small number of functions I would assume that the total cost to recreate the loop analysis would be reasonable.

@efriedma-quic
Copy link
Collaborator

There isn't any technical reason you have to preserve LoopInfo, as far as I know. The only real downside is that it has to be recomputed later.

That said, using LoopInfo and not preserving it is extremely rare; I'd expect explicit comments noting it in the list of preserved analysis, and at the point where the LoopInfo ceases to be valid.

You should still be able to verify() the LoopInfo before it becomes invalid. Given the way the pass is structured, that can just be guarded by NDEBUG (running it once per function isn't that expensive; we guard it with EXPENSIVE_CHECKS in a lot of places because running it once per loop doesn't scale).

@nikic
Copy link
Contributor

nikic commented Apr 18, 2024

My general concern with not preserving LoopInfo is that we still continue using LoopInfo even after it has been invalidated. Let's say we have two candidates for DFA jump threading in a function. When we process the first one, we modify the CFG without (fully) updating LoopInfo. Then for the second one, we query LoopInfo -- but the LoopInfo is broken now. Can this cause any issues? I find it hard to tell...

@nikic
Copy link
Contributor

nikic commented Apr 18, 2024

Okay, I think I missed a key piece here:

// For the time being limit this optimization to occurring once in a
// function since it can change the CFG significantly. This is not a
// strict requirement but it can cause buggy behavior if there is an
// overlap of blocks in different opportunities. There is a lot of room to
// experiment with catching more opportunities here.

We only allow one DFA jump thread per invocation of the pass. If that's the case, then not preserving loop info is indeed okay. I'd update that comment though to say that we'd have to handle LoopInfo invalidation if we wanted to change that.

@XChy
Copy link
Member Author

XChy commented Apr 18, 2024

My general concern with not preserving LoopInfo is that we still continue using LoopInfo even after it has been invalidated. Let's say we have two candidates for DFA jump threading in a function. When we process the first one, we modify the CFG without (fully) updating LoopInfo. Then for the second one, we query LoopInfo -- but the LoopInfo is broken now. Can this cause any issues? I find it hard to tell...

Yes, if we process more than one candidate, that would be a problem (though there are already other underlying problems if we thread more). These problems should be solved in the future.

@XChy XChy force-pushed the dfa-jump-thread-paths-noloop branch from 4c0f84b to ae79f1c Compare April 20, 2024 15:32
@XChy XChy force-pushed the dfa-jump-thread-paths-noloop branch from ae79f1c to 94cbd89 Compare April 20, 2024 15:35
@XChy
Copy link
Member Author

XChy commented Apr 20, 2024

There isn't any technical reason you have to preserve LoopInfo, as far as I know. The only real downside is that it has to be recomputed later.

That said, using LoopInfo and not preserving it is extremely rare; I'd expect explicit comments noting it in the list of preserved analysis, and at the point where the LoopInfo ceases to be valid.

You should still be able to verify() the LoopInfo before it becomes invalid. Given the way the pass is structured, that can just be guarded by NDEBUG (running it once per function isn't that expensive; we guard it with EXPENSIVE_CHECKS in a lot of places because running it once per loop doesn't scale).

Verify it now.

Copy link
Collaborator

@efriedma-quic efriedma-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@XChy XChy merged commit c229f76 into llvm:main Apr 27, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants