[DependenceAnalysis] Extending SIV to handle fusable loops #128782

1997alireza · 2025-02-25T22:43:31Z

When there is a dependency between two memory instructions in separate loops that have the same iteration space and depth, SIV will be able to test them and compute the direction and the distance of the dependency.

llvmbot · 2025-02-25T22:44:06Z

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: Alireza Torabian (1997alireza)

Changes

When there is a dependency between two memory instructions in separate loops, SIV will be able to test them and compute the direction and the distance of the dependency.

Patch is 48.09 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/128782.diff

4 Files Affected:

(modified) llvm/include/llvm/Analysis/DependenceAnalysis.h (+55-17)
(modified) llvm/lib/Analysis/DependenceAnalysis.cpp (+222-92)
(modified) llvm/test/Analysis/DependenceAnalysis/PreliminaryNoValidityCheckFixedSize.ll (+1-1)
(added) llvm/test/Analysis/DependenceAnalysis/SIVSeparateLoops.ll (+209)

diff --git a/llvm/include/llvm/Analysis/DependenceAnalysis.h b/llvm/include/llvm/Analysis/DependenceAnalysis.h
index 426ac757b4b0d..8e86c091c60a2 100644
--- a/llvm/include/llvm/Analysis/DependenceAnalysis.h
+++ b/llvm/include/llvm/Analysis/DependenceAnalysis.h
@@ -97,10 +97,11 @@ namespace llvm {
       bool PeelFirst : 1; // Peeling the first iteration will break dependence.
       bool PeelLast  : 1; // Peeling the last iteration will break the dependence.
       bool Splitable : 1; // Splitting the loop will break dependence.
+      bool SeparateLoops : 1; // Is performed across two separate loop nests.
       const SCEV *Distance = nullptr; // NULL implies no distance available.
       DVEntry()
           : Direction(ALL), Scalar(true), PeelFirst(false), PeelLast(false),
-            Splitable(false) {}
+            Splitable(false), SeparateLoops(false) {}
     };
 
     /// getSrc - Returns the source instruction for this dependence.
@@ -182,6 +183,10 @@ namespace llvm {
     /// the dependence.
     virtual bool isSplitable(unsigned Level) const { return false; }
 
+    /// inSeparateLoops - Returns true if this level is performed across
+    /// two separate loop nests.
+    virtual bool inSeparateLoops(unsigned Level) const { return false; }
+
     /// isScalar - Returns true if a particular level is scalar; that is,
     /// if no subscript in the source or destination mention the induction
     /// variable associated with the loop at this level.
@@ -275,6 +280,10 @@ namespace llvm {
     /// the dependence.
     bool isSplitable(unsigned Level) const override;
 
+    /// inSeparateLoops - Returns true if this level is performed across
+    /// two separate loop nests.
+    bool inSeparateLoops(unsigned Level) const override;
+
     /// isScalar - Returns true if a particular level is scalar; that is,
     /// if no subscript in the source or destination mention the induction
     /// variable associated with the loop at this level.
@@ -405,7 +414,8 @@ namespace llvm {
       const SCEV *A;
       const SCEV *B;
       const SCEV *C;
-      const Loop *AssociatedLoop;
+      const Loop *AssociatedSrcLoop;
+      const Loop *AssociatedDstLoop;
 
     public:
       /// isEmpty - Return true if the constraint is of kind Empty.
@@ -449,18 +459,25 @@ namespace llvm {
       /// Otherwise assert.
       const SCEV *getD() const;
 
-      /// getAssociatedLoop - Returns the loop associated with this constraint.
-      const Loop *getAssociatedLoop() const;
+      /// getAssociatedSrcLoop - Returns the source loop associated with this
+      /// constraint.
+      const Loop *getAssociatedSrcLoop() const;
+
+      /// getAssociatedDstLoop - Returns the destination loop associated with
+      /// this constraint.
+      const Loop *getAssociatedDstLoop() const;
 
       /// setPoint - Change a constraint to Point.
-      void setPoint(const SCEV *X, const SCEV *Y, const Loop *CurrentLoop);
+      void setPoint(const SCEV *X, const SCEV *Y, const Loop *CurrentSrcLoop,
+                    const Loop *CurrentDstLoop);
 
       /// setLine - Change a constraint to Line.
-      void setLine(const SCEV *A, const SCEV *B,
-                   const SCEV *C, const Loop *CurrentLoop);
+      void setLine(const SCEV *A, const SCEV *B, const SCEV *C,
+                   const Loop *CurrentSrcLoop, const Loop *CurrentDstLoop);
 
       /// setDistance - Change a constraint to Distance.
-      void setDistance(const SCEV *D, const Loop *CurrentLoop);
+      void setDistance(const SCEV *D, const Loop *CurrentSrcLoop,
+                       const Loop *CurrentDstLoop);
 
       /// setEmpty - Change a constraint to Empty.
       void setEmpty();
@@ -473,6 +490,10 @@ namespace llvm {
       void dump(raw_ostream &OS) const;
     };
 
+    /// Returns true if two loops are the same or they have the same upperbound
+    /// and depth
+    bool areLoopsSimilar(const Loop *SrcLoop, const Loop *DstLoop) const;
+
     /// establishNestingLevels - Examines the loop nesting of the Src and Dst
     /// instructions and establishes their shared loops. Sets the variables
     /// CommonLevels, SrcLevels, and MaxLevels.
@@ -523,10 +544,22 @@ namespace llvm {
     ///     e - 5
     ///     f - 6
     ///     g - 7 = MaxLevels
-    void establishNestingLevels(const Instruction *Src,
-                                const Instruction *Dst);
-
-    unsigned CommonLevels, SrcLevels, MaxLevels;
+    /// If ConsiderSeparateLoops is true then we also want to consider similar
+    /// seperate loops. Assume that loop nests at level c and e are similar,
+    /// meaning that they have the same upperbound and depth. Then we consider
+    /// them as a common level.
+    ///     a      - 1
+    ///     b      - 2
+    ///     <c, e> - 3 = CommonLevels
+    ///     d      - 4 = SrcLevels
+    ///     f      - 5
+    ///     g      - 6 = MaxLevels
+    /// SeparateLevels means that how many of the last common levels are
+    /// separated, which is 1 in this case.
+    void establishNestingLevels(const Instruction *Src, const Instruction *Dst,
+                                bool ConsiderSeparateLoops = false);
+
+    unsigned CommonLevels, SrcLevels, MaxLevels, SeparateLevels;
 
     /// mapSrcLoop - Given one of the loops containing the source, return
     /// its level index in our numbering scheme.
@@ -668,7 +701,8 @@ namespace llvm {
     bool strongSIVtest(const SCEV *Coeff,
                        const SCEV *SrcConst,
                        const SCEV *DstConst,
-                       const Loop *CurrentLoop,
+                       const Loop *CurrentSrcLoop,
+                       const Loop *CurrentDstLoop,
                        unsigned Level,
                        FullDependence &Result,
                        Constraint &NewConstraint) const;
@@ -686,7 +720,8 @@ namespace llvm {
     bool weakCrossingSIVtest(const SCEV *SrcCoeff,
                              const SCEV *SrcConst,
                              const SCEV *DstConst,
-                             const Loop *CurrentLoop,
+                             const Loop *CurrentSrcLoop,
+                             const Loop *CurrentDstLoop,
                              unsigned Level,
                              FullDependence &Result,
                              Constraint &NewConstraint,
@@ -705,7 +740,8 @@ namespace llvm {
                       const SCEV *DstCoeff,
                       const SCEV *SrcConst,
                       const SCEV *DstConst,
-                      const Loop *CurrentLoop,
+                      const Loop *CurrentSrcLoop,
+                      const Loop *CurrentDstLoop,
                       unsigned Level,
                       FullDependence &Result,
                       Constraint &NewConstraint) const;
@@ -723,7 +759,8 @@ namespace llvm {
     bool weakZeroSrcSIVtest(const SCEV *DstCoeff,
                             const SCEV *SrcConst,
                             const SCEV *DstConst,
-                            const Loop *CurrentLoop,
+                            const Loop *CurrentSrcLoop,
+                            const Loop *CurrentDstLoop,
                             unsigned Level,
                             FullDependence &Result,
                             Constraint &NewConstraint) const;
@@ -741,7 +778,8 @@ namespace llvm {
     bool weakZeroDstSIVtest(const SCEV *SrcCoeff,
                             const SCEV *SrcConst,
                             const SCEV *DstConst,
-                            const Loop *CurrentLoop,
+                            const Loop *CurrentSrcLoop,
+                            const Loop *CurrentDstLoop,
                             unsigned Level,
                             FullDependence &Result,
                             Constraint &NewConstraint) const;
diff --git a/llvm/lib/Analysis/DependenceAnalysis.cpp b/llvm/lib/Analysis/DependenceAnalysis.cpp
index dc0ed22dbcc0b..b947e92a6375b 100644
--- a/llvm/lib/Analysis/DependenceAnalysis.cpp
+++ b/llvm/lib/Analysis/DependenceAnalysis.cpp
@@ -104,6 +104,7 @@ STATISTIC(GCDindependence, "GCD independence");
 STATISTIC(BanerjeeApplications, "Banerjee applications");
 STATISTIC(BanerjeeIndependence, "Banerjee independence");
 STATISTIC(BanerjeeSuccesses, "Banerjee successes");
+STATISTIC(SeparateLoopsConsidered, "Separate loops considered");
 
 static cl::opt<bool>
     Delinearize("da-delinearize", cl::init(true), cl::Hidden,
@@ -377,6 +378,13 @@ bool FullDependence::isSplitable(unsigned Level) const {
 }
 
 
+// Returns true if this level is performed across two separate loop nests.
+bool FullDependence::inSeparateLoops(unsigned Level) const {
+  assert(0 < Level && Level <= Levels && "Level out of range");
+  return DV[Level - 1].SeparateLoops;
+}
+
+
 //===----------------------------------------------------------------------===//
 // DependenceInfo::Constraint methods
 
@@ -431,37 +439,52 @@ const SCEV *DependenceInfo::Constraint::getD() const {
 }
 
 
-// Returns the loop associated with this constraint.
-const Loop *DependenceInfo::Constraint::getAssociatedLoop() const {
+// Returns the source loop associated with this constraint.
+const Loop *DependenceInfo::Constraint::getAssociatedSrcLoop() const {
+  assert((Kind == Distance || Kind == Line || Kind == Point) &&
+         "Kind should be Distance, Line, or Point");
+  return AssociatedSrcLoop;
+}
+
+
+// Returns the destination loop associated with this constraint.
+const Loop *DependenceInfo::Constraint::getAssociatedDstLoop() const {
   assert((Kind == Distance || Kind == Line || Kind == Point) &&
          "Kind should be Distance, Line, or Point");
-  return AssociatedLoop;
+  return AssociatedDstLoop;
 }
 
+
 void DependenceInfo::Constraint::setPoint(const SCEV *X, const SCEV *Y,
-                                          const Loop *CurLoop) {
+                                          const Loop *CurSrcLoop,
+                                          const Loop *CurDstLoop) {
   Kind = Point;
   A = X;
   B = Y;
-  AssociatedLoop = CurLoop;
+  AssociatedSrcLoop = CurSrcLoop;
+  AssociatedDstLoop = CurDstLoop;
 }
 
 void DependenceInfo::Constraint::setLine(const SCEV *AA, const SCEV *BB,
-                                         const SCEV *CC, const Loop *CurLoop) {
+                                         const SCEV *CC, const Loop *CurSrcLoop,
+                                         const Loop *CurDstLoop) {
   Kind = Line;
   A = AA;
   B = BB;
   C = CC;
-  AssociatedLoop = CurLoop;
+  AssociatedSrcLoop = CurSrcLoop;
+  AssociatedDstLoop = CurDstLoop;
 }
 
 void DependenceInfo::Constraint::setDistance(const SCEV *D,
-                                             const Loop *CurLoop) {
+                                             const Loop *CurSrcLoop,
+                                             const Loop *CurDstLoop) {
   Kind = Distance;
   A = SE->getOne(D->getType());
   B = SE->getNegativeSCEV(A);
   C = SE->getNegativeSCEV(D);
-  AssociatedLoop = CurLoop;
+  AssociatedSrcLoop = CurSrcLoop;
+  AssociatedDstLoop = CurDstLoop;
 }
 
 void DependenceInfo::Constraint::setEmpty() { Kind = Empty; }
@@ -608,8 +631,8 @@ bool DependenceInfo::intersectConstraints(Constraint *X, const Constraint *Y) {
         ++DeltaSuccesses;
         return true;
       }
-      if (const SCEVConstant *CUB =
-          collectConstantUpperBound(X->getAssociatedLoop(), Prod1->getType())) {
+      if (const SCEVConstant *CUB = collectConstantUpperBound(
+              X->getAssociatedSrcLoop(), Prod1->getType())) {
         const APInt &UpperBound = CUB->getAPInt();
         LLVM_DEBUG(dbgs() << "\t\tupper bound = " << UpperBound << "\n");
         if (Xq.sgt(UpperBound) || Yq.sgt(UpperBound)) {
@@ -620,7 +643,8 @@ bool DependenceInfo::intersectConstraints(Constraint *X, const Constraint *Y) {
       }
       X->setPoint(SE->getConstant(Xq),
                   SE->getConstant(Yq),
-                  X->getAssociatedLoop());
+                  X->getAssociatedSrcLoop(),
+                  X->getAssociatedDstLoop());
       ++DeltaSuccesses;
       return true;
     }
@@ -656,6 +680,7 @@ bool DependenceInfo::intersectConstraints(Constraint *X, const Constraint *Y) {
 // For debugging purposes. Dumps a dependence to OS.
 void Dependence::dump(raw_ostream &OS) const {
   bool Splitable = false;
+  bool SeparatesStarted = false;
   if (isConfused())
     OS << "confused";
   else {
@@ -672,6 +697,10 @@ void Dependence::dump(raw_ostream &OS) const {
     unsigned Levels = getLevels();
     OS << " [";
     for (unsigned II = 1; II <= Levels; ++II) {
+      if (!SeparatesStarted && inSeparateLoops(II)) {
+        SeparatesStarted = true;
+        OS << "/ ";
+      }
       if (isSplitable(II))
         Splitable = true;
       if (isPeelFirst(II))
@@ -758,6 +787,35 @@ bool isLoadOrStore(const Instruction *I) {
   return false;
 }
 
+// Returns true if two loops are the same or they have the same tripcount and
+// depth
+bool DependenceInfo::areLoopsSimilar(const Loop *SrcLoop,
+                                     const Loop *DstLoop) const {
+  if (SrcLoop == DstLoop)
+    return true;
+
+  if (SrcLoop->getLoopDepth() != DstLoop->getLoopDepth())
+    return false;
+
+  if (!SrcLoop || !SrcLoop->getLoopLatch() || !DstLoop ||
+      !DstLoop->getLoopLatch())
+    return false;
+
+  const SCEV *SrcUB, *DstUP;
+  if (SE->hasLoopInvariantBackedgeTakenCount(SrcLoop))
+    SrcUB = SE->getBackedgeTakenCount(SrcLoop);
+  if (SE->hasLoopInvariantBackedgeTakenCount(DstLoop))
+    DstUP = SE->getBackedgeTakenCount(DstLoop);
+
+  if (SrcUB == nullptr || DstUP == nullptr)
+    return false;
+
+  if (SE->isKnownPredicate(ICmpInst::ICMP_EQ, SrcUB, DstUP))
+    return true;
+
+  return false;
+}
+
 
 // Examines the loop nesting of the Src and Dst
 // instructions and establishes their shared loops. Sets the variables
@@ -809,8 +867,21 @@ bool isLoadOrStore(const Instruction *I) {
 //     e - 5
 //     f - 6
 //     g - 7 = MaxLevels
+// If ConsiderSeparateLoops is true then we also want to consider similar
+// seperate loops. Assume that loop nests at level c and e are similar,
+// meaning that they have the same tripcount and depth. Then we consider
+// them as a common level.
+//     a      - 1
+//     b      - 2
+//     <c, e> - 3 = CommonLevels
+//     d      - 4 = SrcLevels
+//     f      - 5
+//     g      - 6 = MaxLevels
+// SeparateLevels means that how many of the last common levels are
+// separated, which is 1 in this case.
 void DependenceInfo::establishNestingLevels(const Instruction *Src,
-                                            const Instruction *Dst) {
+                                            const Instruction *Dst,
+                                            bool ConsiderSeparateLoops) {
   const BasicBlock *SrcBlock = Src->getParent();
   const BasicBlock *DstBlock = Dst->getParent();
   unsigned SrcLevel = LI->getLoopDepth(SrcBlock);
@@ -819,6 +890,7 @@ void DependenceInfo::establishNestingLevels(const Instruction *Src,
   const Loop *DstLoop = LI->getLoopFor(DstBlock);
   SrcLevels = SrcLevel;
   MaxLevels = SrcLevel + DstLevel;
+  SeparateLevels = 0;
   while (SrcLevel > DstLevel) {
     SrcLoop = SrcLoop->getParentLoop();
     SrcLevel--;
@@ -827,11 +899,23 @@ void DependenceInfo::establishNestingLevels(const Instruction *Src,
     DstLoop = DstLoop->getParentLoop();
     DstLevel--;
   }
-  while (SrcLoop != DstLoop) {
-    SrcLoop = SrcLoop->getParentLoop();
-    DstLoop = DstLoop->getParentLoop();
-    SrcLevel--;
-  }
+  if (ConsiderSeparateLoops) {
+    while (!areLoopsSimilar(SrcLoop, DstLoop)) {
+      SrcLoop = SrcLoop->getParentLoop();
+      DstLoop = DstLoop->getParentLoop();
+      SrcLevel--;
+    }
+    while (SrcLoop != DstLoop) {
+      SrcLoop = SrcLoop->getParentLoop();
+      DstLoop = DstLoop->getParentLoop();
+      SeparateLevels++;
+    }
+  } else
+    while (SrcLoop != DstLoop) {
+      SrcLoop = SrcLoop->getParentLoop();
+      DstLoop = DstLoop->getParentLoop();
+      SrcLevel--;
+    }
   CommonLevels = SrcLevel;
   MaxLevels -= CommonLevels;
 }
@@ -1223,8 +1307,9 @@ bool DependenceInfo::testZIV(const SCEV *Src, const SCEV *Dst,
 //
 // Return true if dependence disproved.
 bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
-                                   const SCEV *DstConst, const Loop *CurLoop,
-                                   unsigned Level, FullDependence &Result,
+                                   const SCEV *DstConst, const Loop *CurSrcLoop,
+                                   const Loop *CurDstLoop, unsigned Level,
+                                   FullDependence &Result,
                                    Constraint &NewConstraint) const {
   LLVM_DEBUG(dbgs() << "\tStrong SIV test\n");
   LLVM_DEBUG(dbgs() << "\t    Coeff = " << *Coeff);
@@ -1242,7 +1327,8 @@ bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
   LLVM_DEBUG(dbgs() << ", " << *Delta->getType() << "\n");
 
   // check that |Delta| < iteration count
-  if (const SCEV *UpperBound = collectUpperBound(CurLoop, Delta->getType())) {
+  if (const SCEV *UpperBound =
+          collectUpperBound(CurSrcLoop, Delta->getType())) {
     LLVM_DEBUG(dbgs() << "\t    UpperBound = " << *UpperBound);
     LLVM_DEBUG(dbgs() << ", " << *UpperBound->getType() << "\n");
     const SCEV *AbsDelta =
@@ -1275,7 +1361,8 @@ bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
       return true;
     }
     Result.DV[Level].Distance = SE->getConstant(Distance);
-    NewConstraint.setDistance(SE->getConstant(Distance), CurLoop);
+    NewConstraint.setDistance(SE->getConstant(Distance), CurSrcLoop,
+                              CurDstLoop);
     if (Distance.sgt(0))
       Result.DV[Level].Direction &= Dependence::DVEntry::LT;
     else if (Distance.slt(0))
@@ -1287,7 +1374,7 @@ bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
   else if (Delta->isZero()) {
     // since 0/X == 0
     Result.DV[Level].Distance = Delta;
-    NewConstraint.setDistance(Delta, CurLoop);
+    NewConstraint.setDistance(Delta, CurSrcLoop, CurDstLoop);
     Result.DV[Level].Direction &= Dependence::DVEntry::EQ;
     ++StrongSIVsuccesses;
   }
@@ -1295,13 +1382,12 @@ bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
     if (Coeff->isOne()) {
       LLVM_DEBUG(dbgs() << "\t    Distance = " << *Delta << "\n");
       Result.DV[Level].Distance = Delta; // since X/1 == X
-      NewConstraint.setDistance(Delta, CurLoop);
+      NewConstraint.setDistance(Delta, CurSrcLoop, CurDstLoop);
     }
     else {
       Result.Consistent = false;
-      NewConstraint.setLine(Coeff,
-                            SE->getNegativeSCEV(Coeff),
-                            SE->getNegativeSCEV(Delta), CurLoop);
+      NewConstraint.setLine(Coeff, SE->getNegativeSCEV(Coeff),
+                            SE->getNegativeSCEV(Delta), CurSrcLoop, CurDstLoop);
     }
 
     // maybe we can get a useful direction
@@ -1360,8 +1446,9 @@ bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
 // Return true if dependence disproved.
 bool DependenceInfo::weakCrossingSIVtest(
     const SCEV *Coeff, const SCEV *SrcConst, const SCEV *DstConst,
-    const Loop *CurLoop, unsigned Level, FullDependence &Result,
-    Constraint &NewConstraint, const SCEV *&SplitIter) const {
+    const Loop *CurSrcLoop, const Loop *CurDstLoop, unsigned Level,
+    FullDependence &Result, Constraint &NewConstraint,
+    const SCEV *&SplitIter) const {
   LLVM_DEBUG(dbgs() << "\tWeak-Crossing SIV test\n");
   LLVM_DEBUG(dbgs() << "\t    Coeff = " << *Coeff << "\n");
   LLVM_DEBUG(dbgs() << "\t    SrcConst = " << *SrcConst << "\n");
@@ -1372,7 +1459,7 @@ bool DependenceInfo::weakCrossingSIVtest(
   Result.Consistent = false;
   const SCEV *Delta = SE->getMinusSCEV(DstConst, SrcConst);
   LLVM_DEBUG(dbgs() << "\t    Delta = " << *Delta << "\n");
-  NewConstraint.setLine(Coeff, Coeff, Delta, CurLoop);
+  NewConstraint.setLine(Coeff, Coeff, Delta, CurSrcLoop, CurDstLoop);
   if (Delta->isZero()) {
     Result.DV[Level].Direction &= ~Dependence::DVEntry::LT;
     Result.DV[Level].Direction &= ~Dependence::DVEntry::GT;
@@ -1420,7 +1507,8 @@ bool DependenceInfo::weakCrossingSIVtest(
 
   // We're certain that Delta > 0 and ConstCoeff > 0.
   // Check Delta/(2*ConstCoeff) against ...
[truncated]

github-actions · 2025-02-25T22:46:46Z

✅ With the latest revision this PR passed the C/C++ code formatter.

Meinersbur

Effectively, with SeparateLoops=true, this will handle loops as if they are fused.

But it also means that the result is plain wrong for non-fused loops. E.g. the dependence distance d computed by e.g. strongSIVtest may result in d > 0, where in reality all the instances of write occur before the read because their loops are sequential, not nested.

Another concern is that the analysis makes the decision on how to loop fusion occurs. The FuseLoops pass may want to fust loops with non-equal trip count, then it has to make the decision which iterations are executed with which ones. Even in cases where the trip count matches such as

for (int i = 0; i < n; +=1)
  A[i+1] += 1;
for (int i = 0; i < n; +=1)
  A[i] += 2;

loop fusion would optimally be

for (int i = 0; i < n+1; +=1) {
  if (i > 0) A[i] += 1;
  if (i < n) A[i] += 2;
}

or after LoopBoundSplitPass etc.

A[0] += 2;
for (int i = 0; i < n+1; +=1) 
  A[i] += 3;
A[n] +=  1;

i.e. not as naive as DA would assume. Ideally, we would chose an approach that allows us to extend FuseLoops over time.

I think instead of a SeparateLoops parameter, DA should receive the info which loops are considered to be fused from FuseLoops -- otherwise they might be disagree. "SeparateLoops" isn't a good name anyway. It goes pack to the old problem that we want to analyze the result of a optimization without the optimization having been applied first. It would be great if we could leave pass-specific concerns out of DA itself, it does not scale well with the number of passes, but I concede that sometimes it might be a pragmatic solution.

llvm/include/llvm/Analysis/DependenceAnalysis.h

1997alireza · 2025-03-10T21:03:00Z

Effectively, with SeparateLoops=true, this will handle loops as if they are fused.

But it also means that the result is plain wrong for non-fused loops. E.g. the dependence distance d computed by e.g. strongSIVtest may result in d > 0, where in reality all the instances of write occur before the read because their loops are sequential, not nested.

Good point! To avoid the confusion among the original distances or directions and the new information we provide by this patch, now I will provide them in a different array. I added a new array of DVEntry named DVSeparate to include the DVEntry for separate levels and keep the DV array unchanged.

Another concern is that the analysis makes the decision on how to loop fusion occurs. The FuseLoops pass may want to fust loops with non-equal trip count, then it has to make the decision which iterations are executed with which ones. Even in cases where the trip count matches such as
for (int i = 0; i < n; +=1)
  A[i+1] += 1;
for (int i = 0; i < n; +=1)
  A[i] += 2;
loop fusion would optimally be
for (int i = 0; i < n+1; +=1) {
  if (i > 0) A[i] += 1;
  if (i < n) A[i] += 2;
}
or after LoopBoundSplitPass etc.
A[0] += 2;
for (int i = 0; i < n+1; +=1) 
  A[i] += 3;
A[n] +=  1;
i.e. not as naive as DA would assume. Ideally, we would chose an approach that allows us to extend FuseLoops over time.

Loop fusion or or any other optimization passes that want to use the analysis results can decide how to use it. For the case you mentioned, loop fusion can peel the iterations first to make the trip counts the same and then apply DA.

I think instead of a SeparateLoops parameter, DA should receive the info which loops are considered to be fused from FuseLoops -- otherwise they might be disagree. "SeparateLoops" isn't a good name anyway. It goes pack to the old problem that we want to analyze the result of a optimization without the optimization having been applied first. It would be great if we could leave pass-specific concerns out of DA itself, it does not scale well with the number of passes, but I concede that sometimes it might be a pragmatic solution.

We prefer the DA to provide information across two loops, enabling loop fusion to identify dependencies before applying the optimization.

Meinersbur

To be clear: I don't yet think this is worth integrating for a special case of loop fusion, especially if we don't even have the corresponding PR that makes use of this yet.

Meinersbur · 2025-03-11T14:23:46Z

llvm/test/Analysis/DependenceAnalysis/SIVSeparateLoops.ll

+;;  for (long int i = 0; i < n; i++) {
+;;    for (long int j = 0; j < n; j++) {
+;;      for (long int k = 0; k < n; k++) {
+;;        A[i][j][k] = i;
+;;      }
+;;      for (long int k = 0; k < n; k++) {
+;;        *B++ = A[i + 3][j + 2][k + 1];


This is the same case as @p2 from PreliminaryNoValidityCheckFixedSize.ll ?

Is replaced with another test case in which there are two separate levels now.

Meinersbur · 2025-03-11T14:49:28Z

llvm/lib/Analysis/DependenceAnalysis.cpp

+    // MIV is not handled yet on separate loops; check if there is any MIV test
+    for (unsigned P = 0; P < Pairs; ++P) {
+      Pair[P].Loops.resize(MaxLevels + 1);
+      auto classification = classifyPair(


[Style] Local variables start with capital letters; No Almost-Always-Auto

Meinersbur · 2025-03-11T14:55:34Z

llvm/lib/Analysis/DependenceAnalysis.cpp

+//     a      - 1
+//     b      - 2 = CommonLevels
+//     <c, e> - 3 : A SeparateLevel
+//     d      - 4 = SrcLevels
+//     f      - 6
+//     g      - 7 = MaxLevels


What happened with level 5?

There are lots of references to the loop numbering scheme other than SIV that don't account for the change, e.g. getSplitIteration.

Now I updated my changes such that it does not affect the loop numbering scheme anymore. Now this api will only provide an additional info which is SeparateLevels.

Meinersbur · 2025-03-11T14:56:42Z

llvm/lib/Analysis/DependenceAnalysis.cpp

+      auto classification = classifyPair(
+          Pair[P].Src, LI->getLoopFor(Src->getParent()), Pair[P].Dst,
+          LI->getLoopFor(Dst->getParent()), Pair[P].Loops);
+      if (classification == Subscript::MIV) {


MIV gets it special treatment here, with reverting to old scheme, but what about RDIV, ZIV?

classifyPair itself relies on the old counting scheme, and will classify e.g. what was RDIV before to SIV.

There is also the case that with non-fused interpretation, we actually might be able to resolve dependencies, but do not with ConsiderSeparateLoops. What ensures that does not happen?

SIV, RDIV or ZIV are able to handle pairs from different loops. RDIV and ZIV could do that before and SIV is able to handle it by this patch.

I cannot see a case where we may miss resolving any dependencies by considering separate loops optimization. Could you please clarify on that?

Meinersbur · 2025-03-11T15:00:35Z

llvm/lib/Analysis/DependenceAnalysis.cpp

+// the separate level extraction at the end of the depends api we have
+//     a      - 1
+//     b      - 2 = CommonLevels
+//     <c, e> - 3 : A SeparateLevel


Suggested change

// <c, e> - 3 : A SeparateLevel

// <c, e> - 3 = SeparateLevels

This introduced 3 different meanings of levels and it is not clear what interpretation at what point it the "correct" one. Better integrate into the text above, where SeparateLevels is zero unless ConsiderSeparateLoops.

Fixed the confusion as mentioned in another comment above.

Meinersbur · 2025-03-11T15:14:44Z

llvm/lib/Analysis/DependenceAnalysis.cpp

+    assert(Levels < Level && Level <= Levels + SeparateLevels &&
+           "Separate level out of range");
+    return DVSeparate[Level - Levels - 1].Direction;


Why not introduce a helper function that returns the correct index into DVSeparate so you don't need the if/else here?

Thanks for the comment. Added.

1997alireza · 2025-04-22T03:17:43Z

To be clear: I don't yet think this is worth integrating for a special case of loop fusion, especially if we don't even have the corresponding PR that makes use of this yet.

Thanks Michael for your time and comments. I tried to resolve them and provide a more clean patch along with changes applied to the loop fusion pass to use the info provided by DA. Please take a look and let me know what you think. I apologize for the delay, I was caught up with some other projects.

Meinersbur

There are several unrelated formatting changes in there. If it is easer, we can commit a NFC clang-format change so you don't have to worry about clang-format introducing changes anymore.

With not seeing the loop fusion code, I meant to open a seaparate commit that modifies loop fusion, so this PR does mix two concerns. See https://llvm.org/docs/GitHub.html#stacked-pull-requests.

Meinersbur · 2025-04-30T11:31:15Z

llvm/lib/Analysis/DependenceAnalysis.cpp

This goes from innermost loops to outer ones, and stop at the innermost unequal one. But what if the second innermost is equal just the outermost is not:

for (int i = 0; i < 42; ++i) { // fusable for (int j = 0; j < 21; ++j) { // not fusable } } for (int i = 0; i < 42; ++i) { // fusable for (int j = 0; j < 42; ++j) { } }

I think the i-loop should be comon levels, just the j-loops not.

Good catch! It is fixed now in this code fragment.

Meinersbur · 2025-04-30T11:32:56Z

llvm/include/llvm/Analysis/DependenceAnalysis.h

Not messing up with the level count is a good thing.

Can you write some about how the interface works for methods that take a "Separate" argument? In this example, c and e are considered similar, which are loops 3 and 5, but with Separate==true, to get the result for the fused loop, one passes just 3 (which could be considered either the loop depth which is the same of loop c and e, or the SrcLevels representing the fused loops). The pre-existing use of "level" which is not actually the loop depth (number of surrounding loops it is nested in) but a loop id in depth-first order is a bit unfortunate. Some disambiguation here may help.

[not a change request] Instead of "Separate" I think I would have preferred using something "like AssumeFusedLoops" consistently throughout, but not something that I would required for this review. "Separate" does not convey a lot of meaning, it could be loops from different functions. For methods that don't even make sense for non-fused loop such as getDistance (hence level should rather mean depth), the parameter isn't strictly nessary, a client that doesn't know about assumed-fused loops would not call it with argument beyond getLevels() anyways.

Meinersbur · 2025-04-30T11:41:07Z

llvm/include/llvm/Analysis/DependenceAnalysis.h

There is nothing in the header file that even explains what a "separate level" actually is. Rather than having this sentence not saying anything here, consider central documentation whati t means, e.g. as class comment.

Could contain:

Number of loops inside common loops that are "similar" (+definition of similar)

Have different llvm::Loop objects but can be interpreted as a single fused loop with Separate=true

Comment is added in the header file.

Meinersbur · 2025-04-30T11:45:34Z

llvm/test/Analysis/DependenceAnalysis/PreliminaryNoValidityCheckFixedSize.ll

Rather than merging two DA interpretations into the same output, have you considered keeping the separate for better understandability. E.g.:

da analyze - flow [-3 -2]! / assuming 1 fused loop: [-3 -2 -1]!

Because of how FileCheck works, this test wouldn't even needed to be changes.

1997alireza · 2025-05-30T19:46:28Z

There are several unrelated formatting changes in there. If it is easer, we can commit a NFC clang-format change so you don't have to worry about clang-format introducing changes anymore.

With not seeing the loop fusion code, I meant to open a seaparate commit that modifies loop fusion, so this PR does mix two concerns. See https://llvm.org/docs/GitHub.html#stacked-pull-requests.

A separate commit and PR has been created for the changes applied to the loop fusion pass.

Meinersbur · 2025-06-03T13:38:27Z

A separate commit and PR has been created for the changes applied to the loop fusion pass.

Do you mean 1997alireza#1 ? You already merged it into this PR so https://github.com/llvm/llvm-project/pull/128782/files includes the LoopFuse changes again. The point of Stacked PRs is to not merge them.

1997alireza · 2025-06-30T16:19:01Z

A separate commit and PR has been created for the changes applied to the loop fusion pass.

Do you mean 1997alireza#1 ? You already merged it into this PR so https://github.com/llvm/llvm-project/pull/128782/files includes the LoopFuse changes again. The point of Stacked PRs is to not merge them.

Reverted the merged PR. Now the loop fusion changes are in another PR #146383.

Meinersbur

Some bikeshedding left but otherwise the patch looks good

Meinersbur · 2025-07-02T10:30:37Z

llvm/include/llvm/Analysis/DependenceAnalysis.h

+    }
+
+    /// inSeparateLoops - Returns true if this level is a separate level, i.e.,
+    /// performed across two separate loop nests.


Suggested change

/// performed across two separate loop nests.

/// performed across two separate loop nests that a treated like a single fused loop.

llvm/include/llvm/Analysis/DependenceAnalysis.h

Meinersbur · 2025-07-02T10:51:43Z

llvm/include/llvm/Analysis/DependenceAnalysis.h

Can you write some about how the interface works for methods that take a "Separate" argument? In this example, c and e are considered similar, which are loops 3 and 5, but with Separate==true, to get the result for the fused loop, one passes just 3 (which could be considered either the loop depth which is the same of loop c and e, or the SrcLevels representing the fused loops). The pre-existing use of "level" which is not actually the loop depth (number of surrounding loops it is nested in) but a loop id in depth-first order is a bit unfortunate. Some disambiguation here may help.

[not a change request] Instead of "Separate" I think I would have preferred using something "like AssumeFusedLoops" consistently throughout, but not something that I would required for this review. "Separate" does not convey a lot of meaning, it could be loops from different functions. For methods that don't even make sense for non-fused loop such as getDistance (hence level should rather mean depth), the parameter isn't strictly nessary, a client that doesn't know about assumed-fused loops would not call it with argument beyond getLevels() anyways.

amehsan · 2025-09-16T15:51:43Z

(Sorry to bother you with my nitpicky comments.)

Let me clarify my point, using the same example.
for (int i = 0; i < M; i++) {
  if (i < 10)
    for (int j = 0; j < N; j++)
      A[i+10][j] = 42;
  for (int j = 0; j < N; j++)
    A[i][j] = 43;
}
Your argument seems correct if we are analyzing the C language (or other high-level languages). But our target is LLVM IR which does not have explicit loop constructs such as for or while statements. Thus, loop-releated information (e.g., backedge-taken count) must be inferred in some way. Loop guards can be utilized for this purpose, for instance, a predicate like N > 0 may be used to compute the backedge-taken count of the loop. From this perspective, I think it can be said that DA makes use of loop guards. The condition i < 10 in the above example would also be treated as a loop guard. I think they are both equally loop guards and it should be impossible to distinguish information related to backedge-taken count from any other information. Therefore it's safer to think that DA might be indirectly using i < 10 (although I don't know if that condition actually provides useful information).

What is relevant to DA is starting value and final value of induction variable. And the patch makes sure that those values satisfy certain conditions. That is enough to ensure correctness of the results.

amehsan · 2025-09-16T15:53:01Z

@1997alireza I meant to start a detailed review of the code, but I have been very busy. I will try to start it today or tomorrow.

kasuga-fj · 2025-09-16T18:19:48Z

I believe I have mentioned this multiple times, but the key point here is that it is checked that the two inner loops have the same backedge-taken count. How this value is calculated is irrelevant.

I apologize if I made you feel like you had to explain the same thing multiple times. However, your last comment made me realize the following new points.

How this value is calculated is irrelevant.

This is exactly the part where I cannot be confident. I'm not sure whether it really doesn’t matter how the backedge-taken counts are computed. But maybe I'm just overthinking (which is why I called myself "nitpicky" in the previous comment), and this might actually be irrelevant.

So, sorry to have made you explain at such length, but I personally still cannot feel fully confident about correctness. That said, I’ll refrain from further comments on correctness. If someone else reviews this discussion and believes it is correct, they will probably approve this PR. Thanks again for taking the time to explain it.

kasuga-fj

Let me leave other nitpicky comments.

kasuga-fj · 2025-09-16T18:22:31Z

llvm/test/Analysis/DependenceAnalysis/Fusable.ll

@@ -0,0 +1,345 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 5
+; RUN: opt < %s -disable-output "-passes=print<da>" -aa-pipeline=basic-aa 2>&1 \


I think -aa-pipeline=... is unnecessary.

kasuga-fj · 2025-09-16T18:27:16Z

llvm/test/Analysis/DependenceAnalysis/Fusable.ll

+;;      }
+;;      for (long int k = 1; k < n+1; k++) {
+;;        for (long int l = 0; l < n; l++)
+;;          *B++ = A[i + 4][j + 3][k + 2][l + 1];


I guess this comes from another test, but is *B++ really needed here? If not, I think it would be better to replace this with a simpler statement like A[i + 4][j + 3][k + 2][l + 1] = l;

Replaced and removed B.

kasuga-fj · 2025-09-16T18:34:22Z

llvm/test/Analysis/DependenceAnalysis/Fusable.ll

+
+for.body9:                                        ; preds = %for.body9.preheader, %for.body9
+  %l.02 = phi i64 [ %inc11, %for.body9 ], [ 0, %for.body9.preheader ]
+  %arrayidx12 = getelementptr inbounds [100 x [100 x [100 x i64]]], ptr %A, i64 %i.013, i64 %j.09, i64 %k.07, i64 %l.02


You may already know this, but the type information of getelementptr is planned to be removed. That is, the delinearization here will probably fail in the future, which will make the test fail. To minimize the maintenance cost, I think it's better to replace %n with constant value.

%n is removed.

kasuga-fj · 2025-09-16T18:37:06Z

llvm/test/Analysis/DependenceAnalysis/Fusable.ll

+;; No information with fused loops assumption is gnerated as MIV test is not
+;; supported on separate loops yet. 
+;;  for (long int i = 0; i < n; i++) {
+;;    for (long int j = 0; j < n; j++) {
+;;      for (long int k = 0; k < n; k++) {
+;;        for (long int l = 0; l < n; l++)
+;;          A[i][j][k][l] = i;
+;;      }
+;;      for (long int k = 1; k < n+1; k++) {
+;;        for (long int l = 0; l < n; l++)
+;;          *B++ = A[i + 4][j + 3][k + 2][k + l];
+
+define void @NonFusable0(i64 %n, ptr %A, ptr %B) nounwind uwtable ssp {


Could you please add another case where areLoopsSimilar fails due to the difference between backedge-taken counts?

Added another case.

Another case added.

kasuga-fj · 2025-09-16T19:25:22Z

The current patch seems to be saying "let's regard the two fusable loops as the same", in this example the two j-loops. If the second j-loop is "mapped" to the first j-loop, it appears to me that this may potentially imply considering the following code rather than the original:
for (int i = 0; i < M; i++) {
  if (i < 10)
    for (int j = 0; j < N; j++) {
      A[i+10][j] = 42;
      A[i][j] = 43;
    }
}

// Or, written in a form closer to LLVM IR
for (int i = 0; i < M; i++) {
  if (i < 10 && N > 0) {
    int j = 0;
    do {
      A[i+10][j] = 42;
      A[i][j] = 43;
      j++;
    } while (j < N)
  }
}
I've been thinking about why this happens, and I've come to feel that the root cause is trying to map different loops to the same Level. If this patch is modified so that CommonLevels is not changed due to fusability, then it may also address my concern.
I don't see the point of the example. If the two loops have different backedge taken counts then the new logic is not used. If they have the same iteration count then from DA point of view we have something like this:
for (int i = 0; i < M; i++) {
   <There might be some code here>
    for (int j = 0; j < N; j++) {
      A[i+10][j] = 42;
      A[i][j] = 43;
    }
} >
The key point here is that the set of instances of A[i+10][j] and A[i][j] that is considered by DA is a superset (or potentially equal) to the set of instances that will be executed at run time. So DA result is always correct, but it might be sometimes conservative.

Just to clarify, what I wanted to emphasize was: what happens if the <there might be some code here> part contains a branch instruction? At least to me, it doesn’t look like a "superset".

(What I’ve been concerned with is, for example, how the nsw/nuw flags on an SCEVAddRecExpr are deduced. I guess it uses the loop-guards...)

amehsan · 2025-09-16T19:53:07Z

I believe I have mentioned this multiple times, but the key point here is that it is checked that the two inner loops have the same backedge-taken count. How this value is calculated is irrelevant.

I apologize if I made you feel like you had to explain the same thing multiple times. However, your last comment made me realize the following new points.

How this value is calculated is irrelevant.

This is exactly the part where I cannot be confident. I'm not sure whether it really doesn’t matter how the backedge-taken counts are computed. But maybe I'm just overthinking (which is why I called myself "nitpicky" in the previous comment), and this might actually be irrelevant.

So, sorry to have made you explain at such length, but I personally still cannot feel fully confident about correctness. That said, I’ll refrain from further comments on correctness. If someone else reviews this discussion and believes it is correct, they will probably approve this PR. Thanks again for taking the time to explain it.

Thank you and I appreciate your attention to the quality of the open source compiler.

amehsan · 2025-09-16T21:21:13Z

The current patch seems to be saying "let's regard the two fusable loops as the same", in this example the two j-loops. If the second j-loop is "mapped" to the first j-loop, it appears to me that this may potentially imply considering the following code rather than the original:
for (int i = 0; i < M; i++) {
   <There might be some code here>
    for (int j = 0; j < N; j++) {
      A[i+10][j] = 42;
      A[i][j] = 43;
    }
} >
The key point here is that the set of instances of A[i+10][j] and A[i][j] that is considered by DA is a superset (or potentially equal) to the set of instances that will be executed at run time. So DA result is always correct, but it might be sometimes conservative.
Just to clarify, what I wanted to emphasize was: what happens if the <there might be some code here> part contains a branch instruction? At least to me, it doesn’t look like a "superset".

(What I’ve been concerned with is, for example, how the nsw/nuw flags on an SCEVAddRecExpr are deduced. I guess it uses the loop-guards...)

LoopInfo only recongizes natural loops. So a branch there cannot have a target that is somewhere in the middle of the loop.

Putting that case aside, when you have a branch there, you have one of the two cases: The loop is executed or it is not. DA doesn't care about the case that loop is NOT executed. DA assumes that the loop is executed. (Same thing can be said if the two loops are separate. DA assumes both loops are executed. If one or both of them are not executed then DA might be conservative. But it is still correct).

The induction variable j can go take any integer from 0 to N. Having or not having a branch there doesn't change this fact. No matter what code you have in that region (or after the inner loop), that is the range that we have for 'j'. And DA considers that.

I really suggest you make a real example that you can compile and show us that something strange happens. I don't really know what that would be, but it will really help the conversation (and save huge amount of time ) if you have a real example. If you have difficulty creating an example to demostrate your point, may be the point is not valid?

kasuga-fj · 2025-09-17T01:05:56Z

What I was concerned about is, for example, the following case:

godbolt: https://godbolt.org/z/aTjxa44qx

; for (i = 0; i < 9223372036854775806; i++) {
;   if (i < 2147483640)
;     for (j = 0; j < 2147483640; j++)
;       a[i + j * 4294967296] = 0;
;
;   for (j = 0; j < 2147483640; j++)
;     a[j * 2] = 0;
; }
;
define void @f(ptr %a) {
entry:
  br label %loop.i.header

loop.i.header:
  %i = phi i64 [ 0 , %entry ], [ %i.next, %loop.i.latch ]
  br label %loop.j.0.pr

loop.j.0.pr:
  %guard.j.0 = icmp slt i64 %i, 2147483640
  br i1 %guard.j.0, label %loop.j.0, label %loop.j.1

loop.j.0:
  %j.0 = phi i64 [ 0, %loop.j.0.pr ], [ %j.0.next, %loop.j.0 ]
  %val.0 = phi i64 [ %i, %loop.j.0.pr ], [ %val.0.next, %loop.j.0 ]
  %j.0.next = add nsw i64 %j.0, 1
  %idx.0 = getelementptr inbounds i8, ptr %a, i64 %val.0
  store i8 0, ptr %idx.0
  %val.0.next = add nsw i64 %val.0, 4294967296
  %exitcond.j.0 = icmp eq i64 %j.0.next, 2147483640
  br i1 %exitcond.j.0, label %loop.j.1, label %loop.j.0

loop.j.1:
  %j.1 = phi i64 [ 0, %loop.j.0 ], [ 0, %loop.j.0.pr ], [ %j.1.next, %loop.j.1 ]
  %val.1 = phi i64 [ 0, %loop.j.0 ], [ 0, %loop.j.0.pr ], [ %val.1.next, %loop.j.1 ]
  %idx.1 = getelementptr inbounds i8, ptr %a, i64 %val.1
  store i8 0, ptr %idx.1
  %j.1.next = add nsw i64 %j.1, 1
  %val.1.next = add nsw i64 %val.1, 2
  %exitcond.j.1 = icmp eq i64 %j.1.next, 2147483640
  br i1 %exitcond.j.1, label %loop.i.latch, label %loop.j.1

loop.i.latch:
  %i.next = add nsw i64 %i, 1
  %exitcond.i = icmp eq i64 %i.next, 9223372036854775806
  br i1 %exitcond.i, label %exit, label %loop.i.header

exit:
  ret void
}

The result of DA:

Printing analysis 'Dependence Analysis' for function 'f':
Src:  store i8 0, ptr %idx.0, align 1 --> Dst:  store i8 0, ptr %idx.0, align 1
  da analyze - none!
Src:  store i8 0, ptr %idx.0, align 1 --> Dst:  store i8 0, ptr %idx.1, align 1
  da analyze - none!
Src:  store i8 0, ptr %idx.1, align 1 --> Dst:  store i8 0, ptr %idx.1, align 1
  da analyze - consistent output [S 0]!
Compiler returned: 0

Here is the SCEV for %val.0:

 %val.0 = phi i64 [ %i, %loop.j.0.pr ], [ %val.0.next, %loop.j.0 ]
  -->  {{0,+,1}<nuw><nsw><%loop.i.header>,+,4294967296}<nuw><nsw><%loop.j.0>

The result of DA already seems incorrect. I guess the root cause is that DA currently doesn't take offset (%val.0 in this case) wrapping into account. I believe this issue will be fixed by checking the loop-guard and/or nsw/nuw properly.
Here, what I'm not confident about is, what happens when assuming the two loops (%loop.j.0 and %loop.j.1) are fused in such cases? In this specific case, probably it would be classified to MIV so it shouldn't be a problem. But what about other similar cases? Is it impossible for something like this to happen? Or am I missing something? What I personally would like to avoid is, fusable assumption leads to the problem becoming worse.

(While working on this case, I realized that this issue is probably not limited to fusion. Perhaps the analysis should always bail out when such loop guards are present...)

Anyway, I really will keep quiet this time about my concern.

EDIT: The more I look at it, the more it seems like this example points to a more fundamental bug in the DA, rather than this patch. I sincerely hope I'm misunderstanding something...

amehsan · 2025-09-17T13:06:37Z

EDIT: The more I look at it, the more it seems like this example points to a more fundamental bug in the DA, rather than this patch. I sincerely hope I'm misunderstanding something...

Yes, my first impression is that this example is not related to this patch. I will look into it in more depth and get back to you. Thanks for providing an example. This helps to have a much better conversation. (Even if unrelated to the patch, it is good to know more about the issues in DA).

amehsan · 2025-09-17T18:49:28Z

llvm/test/Analysis/DependenceAnalysis/Fusable.ll

@@ -0,0 +1,345 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 5


use lowercase for filename: fusable.ll (All test files seem to follow that style).

DA test files starts with uppercase letters.

amehsan · 2025-09-17T18:51:38Z

llvm/lib/Analysis/DependenceAnalysis.cpp

+    dumpImp(OS);
+    unsigned FusableLevels = getFusableLevels();
+    if (FusableLevels > 0) {
+      OS << "! / assuming " << FusableLevels << " fused loop(s): ";


shouldn't it be assuming " << FusableLevels << " loop levels fused: "; ?

amehsan

@1997alireza I added a couple of minor cosmetic comments. Will look into it a bit more later today or tomorrow.

amehsan · 2025-09-17T18:53:20Z

llvm/include/llvm/Analysis/DependenceAnalysis.h

+  unsigned getFusableLevels() const override { return FusableLevels; }
+
+  /// getDVEntry - Returns the DV entry associated with a regular or a
+  /// fusable level


add a period at the end

amehsan · 2025-09-17T19:15:04Z

I believe this issue will be fixed by checking the loop-guard and/or nsw/nuw properly.

Need to do more investigation. But checking the loop guard in DA is almost certainly incorrect. One evidence is that the following loop with no loop guard has the same problem

void foo2 (int *a ) {

  unsigned long long i;
  unsigned long long j;

  for (i = 0; i < 9223372036854775806; i++) {
    for (j = 0; j < 2147483640; j++) {
      a[i + j * 4294967296] = 0;
      a[j * 2] = 0;
    }
  }
}

Here, what I'm not confident about is, what happens when assuming the two loops (%loop.j.0 and %loop.j.1) are fused in such cases?

It is a completely orthogonal issue. The root cause of this particular bug might be in SCEV or in DA (We are looking into it. Please wait). But anyways, not related to this patch.

amehsan · 2025-09-17T22:53:52Z

One evidence is that the following loop

I compiled this with -O3 -fno-vectorize. Let me know if you have a problem reproducing it.

amehsan

the code LGTM assuming the issues raised so far is addressed. I will give an explicit approval after comments are addressed. I may look more and add more comments though (but I don't expect anything fundamental)

amehsan · 2025-09-18T15:40:58Z

llvm/lib/Analysis/DependenceAnalysis.cpp

+  CommonLevels += FusableLevels;
+  MaxLevels -= FusableLevels;
+  if (FusableLevels > 0) {
+    // Not all tests are handled yet over fusable loops
+    // Revoke if there are any tests other than ZIV, SIV or RDIV
+    for (unsigned P = 0; P < Pairs; ++P) {
+      SmallBitVector Loops;
+      Subscript::ClassificationKind TestClass =
+          classifyPair(Pair[P].Src, LI->getLoopFor(Src->getParent()),
+                       Pair[P].Dst, LI->getLoopFor(Dst->getParent()), Loops);
+
+      if (TestClass != Subscript::ZIV && TestClass != Subscript::SIV &&
+          TestClass != Subscript::RDIV) {
+        // Revert the levels to not consider the fusable levels
+        CommonLevels -= FusableLevels;
+        MaxLevels += FusableLevels;
+        FusableLevels = 0;
+        break;
+      }
+    }
+  }


you don't need to increment commonLevels and then decrement it. Can you write the code so that it is only incremented when it should be?

classifyPair uses CommonLevels and MaxLevels. To be able to identify the dependency types considering the fusable levels, the value of levels must be modified before calling classifyPair.

amehsan · 2025-09-18T15:43:44Z

llvm/lib/Analysis/DependenceAnalysis.cpp

+  if (FusableLevels > 0) {
+    Result.Consistent = false;
+    FusableLoopsConsidered++;
+  }
+


Other fields of Result is updated later in line 4135. Also there is no use of FusableLoopsConsidered between here and there. Is there a reason that this code is separate from the other conditional on FusableLevels > 0 ?

Relocated the Result update line.

amehsan · 2025-09-18T15:47:37Z

llvm/lib/Analysis/DependenceAnalysis.cpp


+// Returns true if two loops are the same or they have the same tripcount and
+// depth
+bool DependenceInfo::areLoopsSimilar(const Loop *SrcLoop,


you have a notion of FusableLoops everywhere in the code. What are the conditions for Fusable? I believe they are part of the conditions checked here. So

Separate conditions for fusability and name them properly in a different function.

Similar is vague and general. Use a more specific name.

Fuseable is not a good name. We need to find a better name that better reflects the conditions that you check.

I updated the terminology by defining a new term SIDLevels, which indicates the levels with the Same Iteration and Depth.
All required conditions to consider two loops as SID are in function isSID.

Thanks. You still need to change your commit message. Write your commit message so that someone who doesn't know anything about your patch understands it. (e.g. Don't use acronyms without first introducing them in the commit messag).

Also change "Same Iteration" to "Same Bounds" or "Same Iteration Space". Make sure the acronym is introduced explicitly and clearly in the comments of the code as well.

Modified the commit message as well.
I changed the acronym to "SameSD" standing for same iteration space and depth.

SameSD stands for Same iteration Space and Depth indicating the loops that are in the same nesting depth and have the same backedge count. When there is a dependency between two memory instructions in separate loops that have the same iteration space and depth, SIV will be able to test them and compute the direction and the distance of the dependency.

kasuga-fj · 2025-09-19T18:47:56Z

I believe this issue will be fixed by checking the loop-guard and/or nsw/nuw properly.

Need to do more investigation. But checking the loop guard in DA is almost certainly incorrect. One evidence is that the following loop with no loop guard has the same problem
void foo2 (int *a ) {

  unsigned long long i;
  unsigned long long j;

  for (i = 0; i < 9223372036854775806; i++) {
    for (j = 0; j < 2147483640; j++) {
      a[i + j * 4294967296] = 0;
      a[j * 2] = 0;
    }
  }
}
Here, what I'm not confident about is, what happens when assuming the two loops (%loop.j.0 and %loop.j.1) are fused in such cases?

It is a completely orthogonal issue. The root cause of this particular bug might be in SCEV or in DA (We are looking into it. Please wait). But anyways, not related to this patch.

(In the hope of saving your time)

It is known that DA has several issues (some of which you may find in GitHub issues), such as the following:

The interpretations of signed and unsigned are mixed up.
It does not take AddRec wrapping into account.
Overflows occur during the analysis.¹
- The overflow I’m referring to here is not in the original program, but in the arithmetic operations inside DA.

These issues violate the assumptions of the underlying algorithm, which can lead to incorrect results. Your example foo2 seems to be one such case (though I’m not exactly sure what happens where). I’m trying to address this problem in #154527 by proving that signed-wrap does not occur when nsw is attached to an AddRec. The example I raised (with a loop guard if (i < 2147483640)) is slightly different from yours, since the AddRec corresponding to i + j * 4294967296 has the nsw flag. However, it turns out that checking this flag alone is not sufficient. I’m almost certain now that loop guards need to be checked somewhere in DA, regardless of whether the loops can be fused or not...

I apologize for my bad communication (the more I looked into DA, the more bugs I found, so I became overly nervous...). However, I found the discussion very meaningful, and I sincerely appreciate it. Thank you.

Because of these circumstances, for example, overestimating backedge-taken count might seem conservative (at least theoretically), but in fact it could lead to incorrect outcomes in the current DA, I think. ↩

kasuga-fj

Personally, I’d prefer to avoid adding new features to an already broken pass, but the code itself LGTM.

amehsan

Thanks. LGTM as well.

amehsan · 2025-09-19T21:05:17Z

The overflow I’m referring to here is not in the original program, but in the arithmetic operations inside DA.

These issues violate the assumptions of the underlying algorithm, which can lead to incorrect results. Your example foo2 seems to be one such case (though I’m not exactly sure what happens where). I’m trying to address this problem in #154527 by proving that signed-wrap does not occur when nsw is attached to an AddRec. The example I raised (with a loop guard if (i < 2147483640)) is slightly different from yours, since the AddRec corresponding to i + j * 4294967296 has the nsw flag. However, it turns out that checking this flag alone is not sufficient. I’m almost certain now that loop guards need to be checked somewhere in DA, regardless of whether the loops can be fused or not...

No worries. Let's continue the work and we can also help to improve the quality of DA. I didn't know about your issue, and I opened a new one. We can close it if it is a duplicate.

We can continue the discussion on the issue pages. But I strongly disagree with checking the loop guards. I will add a more detailed comment on one of the issue pages.

fhahn

Personally, I’d prefer to avoid adding new features to an already broken pass, but the code itself LGTM.

FWIW I agree, increasing the complexity before fixing the soundness problem makes it even more difficult to enable any pass depending on DA

amehsan · 2025-09-21T11:37:56Z

Personally, I’d prefer to avoid adding new features to an already broken pass, but the code itself LGTM.

FWIW I agree, increasing the complexity before fixing the soundness problem makes it even more difficult to enable any pass depending on DA

I suggest this kind of comment is made when RFC is posted or in the beginning of code review. I think it is also important that development is not a frustrating experience for the people invovled.

But going forward, we are OK with focusing on correctness issues of DA. We are also going to help in that direction.

llvmbot added the llvm:analysis Includes value tracking, cost tables and constant folding label Feb 25, 2025

1997alireza force-pushed the da-siv-separate-loops branch from 68d430c to f840335 Compare February 25, 2025 22:50

Meinersbur reviewed Feb 26, 2025

View reviewed changes

llvm/include/llvm/Analysis/DependenceAnalysis.h Outdated Show resolved Hide resolved

1997alireza force-pushed the da-siv-separate-loops branch from f840335 to 3cfd2e0 Compare March 10, 2025 21:01

Meinersbur reviewed Mar 11, 2025

View reviewed changes

1997alireza force-pushed the da-siv-separate-loops branch from 3cfd2e0 to 9d511f5 Compare April 22, 2025 03:00

llvmbot added the llvm:transforms label Apr 22, 2025

1997alireza force-pushed the da-siv-separate-loops branch from 9d511f5 to 8e263a3 Compare April 22, 2025 03:06

1997alireza force-pushed the da-siv-separate-loops branch 4 times, most recently from b3df426 to 4420416 Compare April 23, 2025 15:24

CongzheUalberta requested review from fhahn and sebpop April 25, 2025 20:20

Meinersbur reviewed Apr 30, 2025

View reviewed changes

1997alireza force-pushed the da-siv-separate-loops branch 4 times, most recently from 17552ab to 21ef00f Compare May 30, 2025 18:42

1997alireza force-pushed the da-siv-separate-loops branch from 7466249 to 21ef00f Compare June 11, 2025 15:22

1997alireza requested a review from Meinersbur June 23, 2025 15:42

1997alireza mentioned this pull request Jun 30, 2025

[LoopFusion] Detecting legal dependencies for fusion using DA info #146383

Merged

Meinersbur reviewed Jul 2, 2025

View reviewed changes

kasuga-fj reviewed Sep 16, 2025

View reviewed changes

amehsan reviewed Sep 17, 2025

View reviewed changes

amehsan reviewed Sep 18, 2025

View reviewed changes

1997alireza force-pushed the da-siv-separate-loops branch 3 times, most recently from 81d0300 to b06447f Compare September 19, 2025 16:41

1997alireza force-pushed the da-siv-separate-loops branch from b06447f to b41a42f Compare September 19, 2025 17:17

kasuga-fj approved these changes Sep 19, 2025

View reviewed changes

amehsan mentioned this pull request Sep 19, 2025

[DA] Incorrect DA results likely due to mishandling of signed/unsigned comparisons #159846

Open

amehsan approved these changes Sep 19, 2025

View reviewed changes

CongzheUalberta merged commit b4a17b1 into llvm:main Sep 19, 2025
9 checks passed

fhahn reviewed Sep 20, 2025

View reviewed changes

	// <c, e> - 3 : A SeparateLevel
	// <c, e> - 3 = SeparateLevels

	/// performed across two separate loop nests.
	/// performed across two separate loop nests that a treated like a single fused loop.

		@@ -0,0 +1,345 @@
		; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 5
		; RUN: opt < %s -disable-output "-passes=print<da>" -aa-pipeline=basic-aa 2>&1 \

[DependenceAnalysis] Extending SIV to handle fusable loops #128782

[DependenceAnalysis] Extending SIV to handle fusable loops #128782

Uh oh!

Conversation

1997alireza commented Feb 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Feb 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Feb 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Meinersbur left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

1997alireza commented Mar 10, 2025

Uh oh!

Meinersbur left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

1997alireza commented Apr 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Meinersbur left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

1997alireza commented May 30, 2025

Uh oh!

Meinersbur commented Jun 3, 2025

Uh oh!

1997alireza commented Jun 30, 2025

Uh oh!

Meinersbur left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

1997alireza commented Feb 25, 2025 •

edited

Loading

llvmbot commented Feb 25, 2025 •

edited

Loading

github-actions bot commented Feb 25, 2025 •

edited

Loading

Meinersbur left a comment •

edited

Loading

1997alireza commented Apr 22, 2025 •

edited

Loading

kasuga-fj commented Sep 16, 2025 •

edited

Loading

kasuga-fj commented Sep 16, 2025 •

edited

Loading