[DFAJumpThreading] Avoid exploring the paths that never come back #85505

XChy · 2024-03-16T08:02:14Z

This patch does:

Preserve loop info when unfolding selects.
Reduce the search space for loop paths.

llvmbot · 2024-03-16T08:02:48Z

@llvm/pr-subscribers-llvm-transforms

Author: XChy (XChy)

Changes

This patch does:

Preserve loop info when unfolding selects.
Reduce the search space for loop paths.

Full diff: https://github.com/llvm/llvm-project/pull/85505.diff

1 Files Affected:

(modified) llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp (+21-7)

diff --git a/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp b/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
index e66a1a2ccb318c..d359b701646722 100644
--- a/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
+++ b/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
@@ -126,7 +126,7 @@ class SelectInstToUnfold {
   explicit operator bool() const { return SI && SIUse; }
 };
 
-void unfold(DomTreeUpdater *DTU, SelectInstToUnfold SIToUnfold,
+void unfold(DomTreeUpdater *DTU, LoopInfo *LI, SelectInstToUnfold SIToUnfold,
             std::vector<SelectInstToUnfold> *NewSIsToUnfold,
             std::vector<BasicBlock *> *NewBBs);
 
@@ -152,7 +152,7 @@ class DFAJumpThreading {
 
       std::vector<SelectInstToUnfold> NewSIsToUnfold;
       std::vector<BasicBlock *> NewBBs;
-      unfold(&DTU, SIToUnfold, &NewSIsToUnfold, &NewBBs);
+      unfold(&DTU, LI, SIToUnfold, &NewSIsToUnfold, &NewBBs);
 
       // Put newly discovered select instructions into the work list.
       for (const SelectInstToUnfold &NewSIToUnfold : NewSIsToUnfold)
@@ -196,7 +196,7 @@ void createBasicBlockAndSinkSelectInst(
 /// created basic blocks into \p NewBBs.
 ///
 /// TODO: merge it with CodeGenPrepare::optimizeSelectInst() if possible.
-void unfold(DomTreeUpdater *DTU, SelectInstToUnfold SIToUnfold,
+void unfold(DomTreeUpdater *DTU, LoopInfo *LI, SelectInstToUnfold SIToUnfold,
             std::vector<SelectInstToUnfold> *NewSIsToUnfold,
             std::vector<BasicBlock *> *NewBBs) {
   SelectInst *SI = SIToUnfold.getInst();
@@ -302,6 +302,12 @@ void unfold(DomTreeUpdater *DTU, SelectInstToUnfold SIToUnfold,
   DTU->applyUpdates({{DominatorTree::Insert, StartBlock, TT},
                      {DominatorTree::Insert, StartBlock, FT}});
 
+  // Preserve loop info
+  if (Loop *L = LI->getLoopFor(SI->getParent())) {
+    for (BasicBlock *NewBB : *NewBBs)
+      L->addBasicBlockToLoop(NewBB, *LI);
+  }
+
   // The select is now dead.
   assert(SI->use_empty() && "Select must be dead now");
   SI->eraseFromParent();
@@ -501,9 +507,10 @@ struct MainSwitch {
 };
 
 struct AllSwitchPaths {
-  AllSwitchPaths(const MainSwitch *MSwitch, OptimizationRemarkEmitter *ORE)
-      : Switch(MSwitch->getInstr()), SwitchBlock(Switch->getParent()),
-        ORE(ORE) {}
+  AllSwitchPaths(const MainSwitch *MSwitch, OptimizationRemarkEmitter *ORE,
+                 LoopInfo *LI)
+      : Switch(MSwitch->getInstr()), SwitchBlock(Switch->getParent()), ORE(ORE),
+        LI(LI) {}
 
   std::vector<ThreadingPath> &getThreadingPaths() { return TPaths; }
   unsigned getNumThreadingPaths() { return TPaths.size(); }
@@ -575,6 +582,12 @@ struct AllSwitchPaths {
 
     Visited.insert(BB);
 
+    // Stop if we have reached the BB out of loop, since its successors have no
+    // impact on the DFA.
+    // TODO: Do we need to stop exploring if BB is the outer loop of the switch?
+    if (!LI->getLoopFor(BB))
+      return Res;
+
     // Some blocks have multiple edges to the same successor, and this set
     // is used to prevent a duplicate path from being generated
     SmallSet<BasicBlock *, 4> Successors;
@@ -716,6 +729,7 @@ struct AllSwitchPaths {
   BasicBlock *SwitchBlock;
   OptimizationRemarkEmitter *ORE;
   std::vector<ThreadingPath> TPaths;
+  LoopInfo *LI;
 };
 
 struct TransformDFA {
@@ -1283,7 +1297,7 @@ bool DFAJumpThreading::run(Function &F) {
     if (!Switch.getSelectInsts().empty())
       MadeChanges = true;
 
-    AllSwitchPaths SwitchPaths(&Switch, ORE);
+    AllSwitchPaths SwitchPaths(&Switch, ORE, LI);
     SwitchPaths.run();
 
     if (SwitchPaths.getNumThreadingPaths() > 0) {

nikic · 2024-03-16T08:50:37Z

Can you please add something like if (VerifyLoopInfo) LI->verify(DT) and add -verify-loop-info to all the DFAJumpThreading tests? Can also mark the analysis as preserved.

nikic · 2024-03-16T08:51:51Z

Can you please add something like if (VerifyLoopInfo) LI->verify(DT) and add -verify-loop-info to all the DFAJumpThreading tests? Can also mark the analysis as preserved.

Or maybe better, just add the verify call to

llvm-project/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp

Line 1312 in dbbdee2

assert(DT->verify(DominatorTree::VerificationLevel::Full));

and then run it locally once with EXPENSIVE_CHECKS replaced by 1.

nikic · 2024-03-16T08:54:37Z

llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp

@@ -575,6 +582,12 @@ struct AllSwitchPaths {

    Visited.insert(BB);

+    // Stop if we have reached the BB out of loop, since its successors have no
+    // impact on the DFA.
+    // TODO: Do we need to stop exploring if BB is the outer loop of the switch?


I think so. Would that cause any test changes?

Yes, it would break test2 in dfa-jump-threading-transform.ll. This test contains nested loops. If we don't see the outer loop, we can only thread part of switch.

XChy · 2024-03-16T11:29:32Z

Can you please add something like if (VerifyLoopInfo) LI->verify(DT) and add -verify-loop-info to all the DFAJumpThreading tests? Can also mark the analysis as preserved.

I don't think we preserve the loop analysis completely. When transforming DFA, we clone and thread BBs, which may break loop analysis. This patch only preserves it before transforming DFA, so that we get the correct result in AllSwitchPaths::paths.

nikic · 2024-03-16T11:59:57Z

Can you please add something like if (VerifyLoopInfo) LI->verify(DT) and add -verify-loop-info to all the DFAJumpThreading tests? Can also mark the analysis as preserved.

I don't think we preserve the loop analysis completely. When transforming DFA, we clone and thread BBs, which may break loop analysis. This patch only preserves it before transforming DFA, so that we get the correct result in AllSwitchPaths::paths.

I don't think it's a good idea to use an analysis that's not preserved. Is it possible to preserve LI through the threading optimization?

Sorry I completely missed this when looking at #85360.

XChy · 2024-03-16T19:09:12Z

Can you please add something like if (VerifyLoopInfo) LI->verify(DT) and add -verify-loop-info to all the DFAJumpThreading tests? Can also mark the analysis as preserved.

I don't think we preserve the loop analysis completely. When transforming DFA, we clone and thread BBs, which may break loop analysis. This patch only preserves it before transforming DFA, so that we get the correct result in AllSwitchPaths::paths.

I don't think it's a good idea to use an analysis that's not preserved. Is it possible to preserve LI through the threading optimization?

Sorry I completely missed this when looking at #85360.

I think if we don't skip over switch statement after cloning a new path, it's possible because the cloned path is in the same loop. But meanwhile, the skip have to be done in later optimizations. See also

llvm-project/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp

Line 1173 in c9325f8

/// Note that this is an optional step and would have been done in later

I'll give it a try.

XChy · 2024-03-21T13:49:58Z

@nikic , after investigating the codebase and cases for days, I found it's very hard to preserve the loop analysis during threading (just like JumpThreading can only track loop headers). Because the switch block will be duplicated and the paths are threaded one by one, the loops are broken and created during threading. Only after threading all paths, can we determine the the loop info. But calculating the loop info after threading looks the same as not preserving it to me...

I wonder why we can't use unpreserved loop analysis here, it seems that JumpThreading doesn't preserve it after DFAJumpThreading in pipeline.

UsmanNadeem · 2024-04-01T01:27:37Z

@nikic , after investigating the codebase and cases for days, I found it's very hard to preserve the loop analysis during threading (just like JumpThreading can only track loop headers). Because the switch block will be duplicated and the paths are threaded one by one, the loops are broken and created during threading. Only after threading all paths, can we determine the the loop info. But calculating the loop info after threading looks the same as not preserving it to me...

I wonder why we can't use unpreserved loop analysis here, it seems that JumpThreading doesn't preserve it after DFAJumpThreading in pipeline.

@nikic Any comments about this?

UsmanNadeem

This patch preserves loop info while unfolding, for use in paths(...), which looks okay to me.

I don't have an knowledgeable opinion on whether the pass should preserve the loop analysis as a whole or not. I searched a little and it looks like only loop passes are required to preserve it and (even though it operates on loops) DFAJumpThreading is a function pass. Considering that this pass does transformations in a very small number of functions I would assume that the total cost to recreate the loop analysis would be reasonable.

efriedma-quic · 2024-04-17T16:58:54Z

There isn't any technical reason you have to preserve LoopInfo, as far as I know. The only real downside is that it has to be recomputed later.

That said, using LoopInfo and not preserving it is extremely rare; I'd expect explicit comments noting it in the list of preserved analysis, and at the point where the LoopInfo ceases to be valid.

You should still be able to verify() the LoopInfo before it becomes invalid. Given the way the pass is structured, that can just be guarded by NDEBUG (running it once per function isn't that expensive; we guard it with EXPENSIVE_CHECKS in a lot of places because running it once per loop doesn't scale).

nikic · 2024-04-18T00:57:19Z

My general concern with not preserving LoopInfo is that we still continue using LoopInfo even after it has been invalidated. Let's say we have two candidates for DFA jump threading in a function. When we process the first one, we modify the CFG without (fully) updating LoopInfo. Then for the second one, we query LoopInfo -- but the LoopInfo is broken now. Can this cause any issues? I find it hard to tell...

nikic · 2024-04-18T00:59:53Z

Okay, I think I missed a key piece here:

llvm-project/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp

Lines 1313 to 1317 in 156ab4d

    
           // For the time being limit this optimization to occurring once in a 
        
           // function since it can change the CFG significantly. This is not a 
        
           // strict requirement but it can cause buggy behavior if there is an 
        
           // overlap of blocks in different opportunities. There is a lot of room to 
        
           // experiment with catching more opportunities here.

We only allow one DFA jump thread per invocation of the pass. If that's the case, then not preserving loop info is indeed okay. I'd update that comment though to say that we'd have to handle LoopInfo invalidation if we wanted to change that.

XChy · 2024-04-18T09:17:34Z

My general concern with not preserving LoopInfo is that we still continue using LoopInfo even after it has been invalidated. Let's say we have two candidates for DFA jump threading in a function. When we process the first one, we modify the CFG without (fully) updating LoopInfo. Then for the second one, we query LoopInfo -- but the LoopInfo is broken now. Can this cause any issues? I find it hard to tell...

Yes, if we process more than one candidate, that would be a problem (though there are already other underlying problems if we thread more). These problems should be solved in the future.

XChy · 2024-04-20T15:56:01Z

There isn't any technical reason you have to preserve LoopInfo, as far as I know. The only real downside is that it has to be recomputed later.

That said, using LoopInfo and not preserving it is extremely rare; I'd expect explicit comments noting it in the list of preserved analysis, and at the point where the LoopInfo ceases to be valid.

You should still be able to verify() the LoopInfo before it becomes invalid. Given the way the pass is structured, that can just be guarded by NDEBUG (running it once per function isn't that expensive; we guard it with EXPENSIVE_CHECKS in a lot of places because running it once per loop doesn't scale).

Verify it now.

efriedma-quic

LGTM

XChy requested review from nikic and UsmanNadeem March 16, 2024 08:02

llvmbot added the llvm:transforms label Mar 16, 2024

nikic reviewed Mar 16, 2024

View reviewed changes

nikic mentioned this pull request Mar 16, 2024

[DFAJumpThreading] Add an early exit heuristic for unpredictable values #85015

Merged

UsmanNadeem approved these changes Apr 15, 2024

View reviewed changes

[DFAJumpThreading] Avoid exploring the paths that never come back

35bc0a3

XChy force-pushed the dfa-jump-thread-paths-noloop branch from 4c0f84b to ae79f1c Compare April 20, 2024 15:32

[DFAJumpThreading] Add checks about LoopInfo

94cbd89

XChy force-pushed the dfa-jump-thread-paths-noloop branch from ae79f1c to 94cbd89 Compare April 20, 2024 15:35

UsmanNadeem requested a review from efriedma-quic April 22, 2024 03:40

efriedma-quic approved these changes Apr 24, 2024

View reviewed changes

XChy merged commit c229f76 into llvm:main Apr 27, 2024
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[DFAJumpThreading] Avoid exploring the paths that never come back #85505

[DFAJumpThreading] Avoid exploring the paths that never come back #85505

XChy commented Mar 16, 2024

llvmbot commented Mar 16, 2024

nikic commented Mar 16, 2024

nikic commented Mar 16, 2024

nikic Mar 16, 2024

XChy Mar 16, 2024 •

edited

XChy commented Mar 16, 2024

nikic commented Mar 16, 2024

XChy commented Mar 16, 2024

XChy commented Mar 21, 2024 •

edited

UsmanNadeem commented Apr 1, 2024

UsmanNadeem left a comment

efriedma-quic commented Apr 17, 2024

nikic commented Apr 18, 2024

nikic commented Apr 18, 2024

XChy commented Apr 18, 2024

XChy commented Apr 20, 2024

efriedma-quic left a comment

[DFAJumpThreading] Avoid exploring the paths that never come back #85505

[DFAJumpThreading] Avoid exploring the paths that never come back #85505

Conversation

XChy commented Mar 16, 2024

llvmbot commented Mar 16, 2024

nikic commented Mar 16, 2024

nikic commented Mar 16, 2024

nikic Mar 16, 2024

Choose a reason for hiding this comment

XChy Mar 16, 2024 • edited

Choose a reason for hiding this comment

XChy commented Mar 16, 2024

nikic commented Mar 16, 2024

XChy commented Mar 16, 2024

XChy commented Mar 21, 2024 • edited

UsmanNadeem commented Apr 1, 2024

UsmanNadeem left a comment

Choose a reason for hiding this comment

efriedma-quic commented Apr 17, 2024

nikic commented Apr 18, 2024

nikic commented Apr 18, 2024

XChy commented Apr 18, 2024

XChy commented Apr 20, 2024

efriedma-quic left a comment

Choose a reason for hiding this comment

XChy Mar 16, 2024 •

edited

XChy commented Mar 21, 2024 •

edited