Skip to content

Conversation

1997alireza
Copy link
Contributor

@1997alireza 1997alireza commented Feb 25, 2025

When there is a dependency between two memory instructions in separate loops that have the same iteration space and depth, SIV will be able to test them and compute the direction and the distance of the dependency.

@llvmbot llvmbot added the llvm:analysis Includes value tracking, cost tables and constant folding label Feb 25, 2025
@llvmbot
Copy link
Member

llvmbot commented Feb 25, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: Alireza Torabian (1997alireza)

Changes

When there is a dependency between two memory instructions in separate loops, SIV will be able to test them and compute the direction and the distance of the dependency.


Patch is 48.09 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/128782.diff

4 Files Affected:

  • (modified) llvm/include/llvm/Analysis/DependenceAnalysis.h (+55-17)
  • (modified) llvm/lib/Analysis/DependenceAnalysis.cpp (+222-92)
  • (modified) llvm/test/Analysis/DependenceAnalysis/PreliminaryNoValidityCheckFixedSize.ll (+1-1)
  • (added) llvm/test/Analysis/DependenceAnalysis/SIVSeparateLoops.ll (+209)
diff --git a/llvm/include/llvm/Analysis/DependenceAnalysis.h b/llvm/include/llvm/Analysis/DependenceAnalysis.h
index 426ac757b4b0d..8e86c091c60a2 100644
--- a/llvm/include/llvm/Analysis/DependenceAnalysis.h
+++ b/llvm/include/llvm/Analysis/DependenceAnalysis.h
@@ -97,10 +97,11 @@ namespace llvm {
       bool PeelFirst : 1; // Peeling the first iteration will break dependence.
       bool PeelLast  : 1; // Peeling the last iteration will break the dependence.
       bool Splitable : 1; // Splitting the loop will break dependence.
+      bool SeparateLoops : 1; // Is performed across two separate loop nests.
       const SCEV *Distance = nullptr; // NULL implies no distance available.
       DVEntry()
           : Direction(ALL), Scalar(true), PeelFirst(false), PeelLast(false),
-            Splitable(false) {}
+            Splitable(false), SeparateLoops(false) {}
     };
 
     /// getSrc - Returns the source instruction for this dependence.
@@ -182,6 +183,10 @@ namespace llvm {
     /// the dependence.
     virtual bool isSplitable(unsigned Level) const { return false; }
 
+    /// inSeparateLoops - Returns true if this level is performed across
+    /// two separate loop nests.
+    virtual bool inSeparateLoops(unsigned Level) const { return false; }
+
     /// isScalar - Returns true if a particular level is scalar; that is,
     /// if no subscript in the source or destination mention the induction
     /// variable associated with the loop at this level.
@@ -275,6 +280,10 @@ namespace llvm {
     /// the dependence.
     bool isSplitable(unsigned Level) const override;
 
+    /// inSeparateLoops - Returns true if this level is performed across
+    /// two separate loop nests.
+    bool inSeparateLoops(unsigned Level) const override;
+
     /// isScalar - Returns true if a particular level is scalar; that is,
     /// if no subscript in the source or destination mention the induction
     /// variable associated with the loop at this level.
@@ -405,7 +414,8 @@ namespace llvm {
       const SCEV *A;
       const SCEV *B;
       const SCEV *C;
-      const Loop *AssociatedLoop;
+      const Loop *AssociatedSrcLoop;
+      const Loop *AssociatedDstLoop;
 
     public:
       /// isEmpty - Return true if the constraint is of kind Empty.
@@ -449,18 +459,25 @@ namespace llvm {
       /// Otherwise assert.
       const SCEV *getD() const;
 
-      /// getAssociatedLoop - Returns the loop associated with this constraint.
-      const Loop *getAssociatedLoop() const;
+      /// getAssociatedSrcLoop - Returns the source loop associated with this
+      /// constraint.
+      const Loop *getAssociatedSrcLoop() const;
+
+      /// getAssociatedDstLoop - Returns the destination loop associated with
+      /// this constraint.
+      const Loop *getAssociatedDstLoop() const;
 
       /// setPoint - Change a constraint to Point.
-      void setPoint(const SCEV *X, const SCEV *Y, const Loop *CurrentLoop);
+      void setPoint(const SCEV *X, const SCEV *Y, const Loop *CurrentSrcLoop,
+                    const Loop *CurrentDstLoop);
 
       /// setLine - Change a constraint to Line.
-      void setLine(const SCEV *A, const SCEV *B,
-                   const SCEV *C, const Loop *CurrentLoop);
+      void setLine(const SCEV *A, const SCEV *B, const SCEV *C,
+                   const Loop *CurrentSrcLoop, const Loop *CurrentDstLoop);
 
       /// setDistance - Change a constraint to Distance.
-      void setDistance(const SCEV *D, const Loop *CurrentLoop);
+      void setDistance(const SCEV *D, const Loop *CurrentSrcLoop,
+                       const Loop *CurrentDstLoop);
 
       /// setEmpty - Change a constraint to Empty.
       void setEmpty();
@@ -473,6 +490,10 @@ namespace llvm {
       void dump(raw_ostream &OS) const;
     };
 
+    /// Returns true if two loops are the same or they have the same upperbound
+    /// and depth
+    bool areLoopsSimilar(const Loop *SrcLoop, const Loop *DstLoop) const;
+
     /// establishNestingLevels - Examines the loop nesting of the Src and Dst
     /// instructions and establishes their shared loops. Sets the variables
     /// CommonLevels, SrcLevels, and MaxLevels.
@@ -523,10 +544,22 @@ namespace llvm {
     ///     e - 5
     ///     f - 6
     ///     g - 7 = MaxLevels
-    void establishNestingLevels(const Instruction *Src,
-                                const Instruction *Dst);
-
-    unsigned CommonLevels, SrcLevels, MaxLevels;
+    /// If ConsiderSeparateLoops is true then we also want to consider similar
+    /// seperate loops. Assume that loop nests at level c and e are similar,
+    /// meaning that they have the same upperbound and depth. Then we consider
+    /// them as a common level.
+    ///     a      - 1
+    ///     b      - 2
+    ///     <c, e> - 3 = CommonLevels
+    ///     d      - 4 = SrcLevels
+    ///     f      - 5
+    ///     g      - 6 = MaxLevels
+    /// SeparateLevels means that how many of the last common levels are
+    /// separated, which is 1 in this case.
+    void establishNestingLevels(const Instruction *Src, const Instruction *Dst,
+                                bool ConsiderSeparateLoops = false);
+
+    unsigned CommonLevels, SrcLevels, MaxLevels, SeparateLevels;
 
     /// mapSrcLoop - Given one of the loops containing the source, return
     /// its level index in our numbering scheme.
@@ -668,7 +701,8 @@ namespace llvm {
     bool strongSIVtest(const SCEV *Coeff,
                        const SCEV *SrcConst,
                        const SCEV *DstConst,
-                       const Loop *CurrentLoop,
+                       const Loop *CurrentSrcLoop,
+                       const Loop *CurrentDstLoop,
                        unsigned Level,
                        FullDependence &Result,
                        Constraint &NewConstraint) const;
@@ -686,7 +720,8 @@ namespace llvm {
     bool weakCrossingSIVtest(const SCEV *SrcCoeff,
                              const SCEV *SrcConst,
                              const SCEV *DstConst,
-                             const Loop *CurrentLoop,
+                             const Loop *CurrentSrcLoop,
+                             const Loop *CurrentDstLoop,
                              unsigned Level,
                              FullDependence &Result,
                              Constraint &NewConstraint,
@@ -705,7 +740,8 @@ namespace llvm {
                       const SCEV *DstCoeff,
                       const SCEV *SrcConst,
                       const SCEV *DstConst,
-                      const Loop *CurrentLoop,
+                      const Loop *CurrentSrcLoop,
+                      const Loop *CurrentDstLoop,
                       unsigned Level,
                       FullDependence &Result,
                       Constraint &NewConstraint) const;
@@ -723,7 +759,8 @@ namespace llvm {
     bool weakZeroSrcSIVtest(const SCEV *DstCoeff,
                             const SCEV *SrcConst,
                             const SCEV *DstConst,
-                            const Loop *CurrentLoop,
+                            const Loop *CurrentSrcLoop,
+                            const Loop *CurrentDstLoop,
                             unsigned Level,
                             FullDependence &Result,
                             Constraint &NewConstraint) const;
@@ -741,7 +778,8 @@ namespace llvm {
     bool weakZeroDstSIVtest(const SCEV *SrcCoeff,
                             const SCEV *SrcConst,
                             const SCEV *DstConst,
-                            const Loop *CurrentLoop,
+                            const Loop *CurrentSrcLoop,
+                            const Loop *CurrentDstLoop,
                             unsigned Level,
                             FullDependence &Result,
                             Constraint &NewConstraint) const;
diff --git a/llvm/lib/Analysis/DependenceAnalysis.cpp b/llvm/lib/Analysis/DependenceAnalysis.cpp
index dc0ed22dbcc0b..b947e92a6375b 100644
--- a/llvm/lib/Analysis/DependenceAnalysis.cpp
+++ b/llvm/lib/Analysis/DependenceAnalysis.cpp
@@ -104,6 +104,7 @@ STATISTIC(GCDindependence, "GCD independence");
 STATISTIC(BanerjeeApplications, "Banerjee applications");
 STATISTIC(BanerjeeIndependence, "Banerjee independence");
 STATISTIC(BanerjeeSuccesses, "Banerjee successes");
+STATISTIC(SeparateLoopsConsidered, "Separate loops considered");
 
 static cl::opt<bool>
     Delinearize("da-delinearize", cl::init(true), cl::Hidden,
@@ -377,6 +378,13 @@ bool FullDependence::isSplitable(unsigned Level) const {
 }
 
 
+// Returns true if this level is performed across two separate loop nests.
+bool FullDependence::inSeparateLoops(unsigned Level) const {
+  assert(0 < Level && Level <= Levels && "Level out of range");
+  return DV[Level - 1].SeparateLoops;
+}
+
+
 //===----------------------------------------------------------------------===//
 // DependenceInfo::Constraint methods
 
@@ -431,37 +439,52 @@ const SCEV *DependenceInfo::Constraint::getD() const {
 }
 
 
-// Returns the loop associated with this constraint.
-const Loop *DependenceInfo::Constraint::getAssociatedLoop() const {
+// Returns the source loop associated with this constraint.
+const Loop *DependenceInfo::Constraint::getAssociatedSrcLoop() const {
+  assert((Kind == Distance || Kind == Line || Kind == Point) &&
+         "Kind should be Distance, Line, or Point");
+  return AssociatedSrcLoop;
+}
+
+
+// Returns the destination loop associated with this constraint.
+const Loop *DependenceInfo::Constraint::getAssociatedDstLoop() const {
   assert((Kind == Distance || Kind == Line || Kind == Point) &&
          "Kind should be Distance, Line, or Point");
-  return AssociatedLoop;
+  return AssociatedDstLoop;
 }
 
+
 void DependenceInfo::Constraint::setPoint(const SCEV *X, const SCEV *Y,
-                                          const Loop *CurLoop) {
+                                          const Loop *CurSrcLoop,
+                                          const Loop *CurDstLoop) {
   Kind = Point;
   A = X;
   B = Y;
-  AssociatedLoop = CurLoop;
+  AssociatedSrcLoop = CurSrcLoop;
+  AssociatedDstLoop = CurDstLoop;
 }
 
 void DependenceInfo::Constraint::setLine(const SCEV *AA, const SCEV *BB,
-                                         const SCEV *CC, const Loop *CurLoop) {
+                                         const SCEV *CC, const Loop *CurSrcLoop,
+                                         const Loop *CurDstLoop) {
   Kind = Line;
   A = AA;
   B = BB;
   C = CC;
-  AssociatedLoop = CurLoop;
+  AssociatedSrcLoop = CurSrcLoop;
+  AssociatedDstLoop = CurDstLoop;
 }
 
 void DependenceInfo::Constraint::setDistance(const SCEV *D,
-                                             const Loop *CurLoop) {
+                                             const Loop *CurSrcLoop,
+                                             const Loop *CurDstLoop) {
   Kind = Distance;
   A = SE->getOne(D->getType());
   B = SE->getNegativeSCEV(A);
   C = SE->getNegativeSCEV(D);
-  AssociatedLoop = CurLoop;
+  AssociatedSrcLoop = CurSrcLoop;
+  AssociatedDstLoop = CurDstLoop;
 }
 
 void DependenceInfo::Constraint::setEmpty() { Kind = Empty; }
@@ -608,8 +631,8 @@ bool DependenceInfo::intersectConstraints(Constraint *X, const Constraint *Y) {
         ++DeltaSuccesses;
         return true;
       }
-      if (const SCEVConstant *CUB =
-          collectConstantUpperBound(X->getAssociatedLoop(), Prod1->getType())) {
+      if (const SCEVConstant *CUB = collectConstantUpperBound(
+              X->getAssociatedSrcLoop(), Prod1->getType())) {
         const APInt &UpperBound = CUB->getAPInt();
         LLVM_DEBUG(dbgs() << "\t\tupper bound = " << UpperBound << "\n");
         if (Xq.sgt(UpperBound) || Yq.sgt(UpperBound)) {
@@ -620,7 +643,8 @@ bool DependenceInfo::intersectConstraints(Constraint *X, const Constraint *Y) {
       }
       X->setPoint(SE->getConstant(Xq),
                   SE->getConstant(Yq),
-                  X->getAssociatedLoop());
+                  X->getAssociatedSrcLoop(),
+                  X->getAssociatedDstLoop());
       ++DeltaSuccesses;
       return true;
     }
@@ -656,6 +680,7 @@ bool DependenceInfo::intersectConstraints(Constraint *X, const Constraint *Y) {
 // For debugging purposes. Dumps a dependence to OS.
 void Dependence::dump(raw_ostream &OS) const {
   bool Splitable = false;
+  bool SeparatesStarted = false;
   if (isConfused())
     OS << "confused";
   else {
@@ -672,6 +697,10 @@ void Dependence::dump(raw_ostream &OS) const {
     unsigned Levels = getLevels();
     OS << " [";
     for (unsigned II = 1; II <= Levels; ++II) {
+      if (!SeparatesStarted && inSeparateLoops(II)) {
+        SeparatesStarted = true;
+        OS << "/ ";
+      }
       if (isSplitable(II))
         Splitable = true;
       if (isPeelFirst(II))
@@ -758,6 +787,35 @@ bool isLoadOrStore(const Instruction *I) {
   return false;
 }
 
+// Returns true if two loops are the same or they have the same tripcount and
+// depth
+bool DependenceInfo::areLoopsSimilar(const Loop *SrcLoop,
+                                     const Loop *DstLoop) const {
+  if (SrcLoop == DstLoop)
+    return true;
+
+  if (SrcLoop->getLoopDepth() != DstLoop->getLoopDepth())
+    return false;
+
+  if (!SrcLoop || !SrcLoop->getLoopLatch() || !DstLoop ||
+      !DstLoop->getLoopLatch())
+    return false;
+
+  const SCEV *SrcUB, *DstUP;
+  if (SE->hasLoopInvariantBackedgeTakenCount(SrcLoop))
+    SrcUB = SE->getBackedgeTakenCount(SrcLoop);
+  if (SE->hasLoopInvariantBackedgeTakenCount(DstLoop))
+    DstUP = SE->getBackedgeTakenCount(DstLoop);
+
+  if (SrcUB == nullptr || DstUP == nullptr)
+    return false;
+
+  if (SE->isKnownPredicate(ICmpInst::ICMP_EQ, SrcUB, DstUP))
+    return true;
+
+  return false;
+}
+
 
 // Examines the loop nesting of the Src and Dst
 // instructions and establishes their shared loops. Sets the variables
@@ -809,8 +867,21 @@ bool isLoadOrStore(const Instruction *I) {
 //     e - 5
 //     f - 6
 //     g - 7 = MaxLevels
+// If ConsiderSeparateLoops is true then we also want to consider similar
+// seperate loops. Assume that loop nests at level c and e are similar,
+// meaning that they have the same tripcount and depth. Then we consider
+// them as a common level.
+//     a      - 1
+//     b      - 2
+//     <c, e> - 3 = CommonLevels
+//     d      - 4 = SrcLevels
+//     f      - 5
+//     g      - 6 = MaxLevels
+// SeparateLevels means that how many of the last common levels are
+// separated, which is 1 in this case.
 void DependenceInfo::establishNestingLevels(const Instruction *Src,
-                                            const Instruction *Dst) {
+                                            const Instruction *Dst,
+                                            bool ConsiderSeparateLoops) {
   const BasicBlock *SrcBlock = Src->getParent();
   const BasicBlock *DstBlock = Dst->getParent();
   unsigned SrcLevel = LI->getLoopDepth(SrcBlock);
@@ -819,6 +890,7 @@ void DependenceInfo::establishNestingLevels(const Instruction *Src,
   const Loop *DstLoop = LI->getLoopFor(DstBlock);
   SrcLevels = SrcLevel;
   MaxLevels = SrcLevel + DstLevel;
+  SeparateLevels = 0;
   while (SrcLevel > DstLevel) {
     SrcLoop = SrcLoop->getParentLoop();
     SrcLevel--;
@@ -827,11 +899,23 @@ void DependenceInfo::establishNestingLevels(const Instruction *Src,
     DstLoop = DstLoop->getParentLoop();
     DstLevel--;
   }
-  while (SrcLoop != DstLoop) {
-    SrcLoop = SrcLoop->getParentLoop();
-    DstLoop = DstLoop->getParentLoop();
-    SrcLevel--;
-  }
+  if (ConsiderSeparateLoops) {
+    while (!areLoopsSimilar(SrcLoop, DstLoop)) {
+      SrcLoop = SrcLoop->getParentLoop();
+      DstLoop = DstLoop->getParentLoop();
+      SrcLevel--;
+    }
+    while (SrcLoop != DstLoop) {
+      SrcLoop = SrcLoop->getParentLoop();
+      DstLoop = DstLoop->getParentLoop();
+      SeparateLevels++;
+    }
+  } else
+    while (SrcLoop != DstLoop) {
+      SrcLoop = SrcLoop->getParentLoop();
+      DstLoop = DstLoop->getParentLoop();
+      SrcLevel--;
+    }
   CommonLevels = SrcLevel;
   MaxLevels -= CommonLevels;
 }
@@ -1223,8 +1307,9 @@ bool DependenceInfo::testZIV(const SCEV *Src, const SCEV *Dst,
 //
 // Return true if dependence disproved.
 bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
-                                   const SCEV *DstConst, const Loop *CurLoop,
-                                   unsigned Level, FullDependence &Result,
+                                   const SCEV *DstConst, const Loop *CurSrcLoop,
+                                   const Loop *CurDstLoop, unsigned Level,
+                                   FullDependence &Result,
                                    Constraint &NewConstraint) const {
   LLVM_DEBUG(dbgs() << "\tStrong SIV test\n");
   LLVM_DEBUG(dbgs() << "\t    Coeff = " << *Coeff);
@@ -1242,7 +1327,8 @@ bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
   LLVM_DEBUG(dbgs() << ", " << *Delta->getType() << "\n");
 
   // check that |Delta| < iteration count
-  if (const SCEV *UpperBound = collectUpperBound(CurLoop, Delta->getType())) {
+  if (const SCEV *UpperBound =
+          collectUpperBound(CurSrcLoop, Delta->getType())) {
     LLVM_DEBUG(dbgs() << "\t    UpperBound = " << *UpperBound);
     LLVM_DEBUG(dbgs() << ", " << *UpperBound->getType() << "\n");
     const SCEV *AbsDelta =
@@ -1275,7 +1361,8 @@ bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
       return true;
     }
     Result.DV[Level].Distance = SE->getConstant(Distance);
-    NewConstraint.setDistance(SE->getConstant(Distance), CurLoop);
+    NewConstraint.setDistance(SE->getConstant(Distance), CurSrcLoop,
+                              CurDstLoop);
     if (Distance.sgt(0))
       Result.DV[Level].Direction &= Dependence::DVEntry::LT;
     else if (Distance.slt(0))
@@ -1287,7 +1374,7 @@ bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
   else if (Delta->isZero()) {
     // since 0/X == 0
     Result.DV[Level].Distance = Delta;
-    NewConstraint.setDistance(Delta, CurLoop);
+    NewConstraint.setDistance(Delta, CurSrcLoop, CurDstLoop);
     Result.DV[Level].Direction &= Dependence::DVEntry::EQ;
     ++StrongSIVsuccesses;
   }
@@ -1295,13 +1382,12 @@ bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
     if (Coeff->isOne()) {
       LLVM_DEBUG(dbgs() << "\t    Distance = " << *Delta << "\n");
       Result.DV[Level].Distance = Delta; // since X/1 == X
-      NewConstraint.setDistance(Delta, CurLoop);
+      NewConstraint.setDistance(Delta, CurSrcLoop, CurDstLoop);
     }
     else {
       Result.Consistent = false;
-      NewConstraint.setLine(Coeff,
-                            SE->getNegativeSCEV(Coeff),
-                            SE->getNegativeSCEV(Delta), CurLoop);
+      NewConstraint.setLine(Coeff, SE->getNegativeSCEV(Coeff),
+                            SE->getNegativeSCEV(Delta), CurSrcLoop, CurDstLoop);
     }
 
     // maybe we can get a useful direction
@@ -1360,8 +1446,9 @@ bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
 // Return true if dependence disproved.
 bool DependenceInfo::weakCrossingSIVtest(
     const SCEV *Coeff, const SCEV *SrcConst, const SCEV *DstConst,
-    const Loop *CurLoop, unsigned Level, FullDependence &Result,
-    Constraint &NewConstraint, const SCEV *&SplitIter) const {
+    const Loop *CurSrcLoop, const Loop *CurDstLoop, unsigned Level,
+    FullDependence &Result, Constraint &NewConstraint,
+    const SCEV *&SplitIter) const {
   LLVM_DEBUG(dbgs() << "\tWeak-Crossing SIV test\n");
   LLVM_DEBUG(dbgs() << "\t    Coeff = " << *Coeff << "\n");
   LLVM_DEBUG(dbgs() << "\t    SrcConst = " << *SrcConst << "\n");
@@ -1372,7 +1459,7 @@ bool DependenceInfo::weakCrossingSIVtest(
   Result.Consistent = false;
   const SCEV *Delta = SE->getMinusSCEV(DstConst, SrcConst);
   LLVM_DEBUG(dbgs() << "\t    Delta = " << *Delta << "\n");
-  NewConstraint.setLine(Coeff, Coeff, Delta, CurLoop);
+  NewConstraint.setLine(Coeff, Coeff, Delta, CurSrcLoop, CurDstLoop);
   if (Delta->isZero()) {
     Result.DV[Level].Direction &= ~Dependence::DVEntry::LT;
     Result.DV[Level].Direction &= ~Dependence::DVEntry::GT;
@@ -1420,7 +1507,8 @@ bool DependenceInfo::weakCrossingSIVtest(
 
   // We're certain that Delta > 0 and ConstCoeff > 0.
   // Check Delta/(2*ConstCoeff) against ...
[truncated]

Copy link

github-actions bot commented Feb 25, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@1997alireza 1997alireza force-pushed the da-siv-separate-loops branch from 68d430c to f840335 Compare February 25, 2025 22:50
Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Effectively, with SeparateLoops=true, this will handle loops as if they are fused.

But it also means that the result is plain wrong for non-fused loops. E.g. the dependence distance d computed by e.g. strongSIVtest may result in d > 0, where in reality all the instances of write occur before the read because their loops are sequential, not nested.

Another concern is that the analysis makes the decision on how to loop fusion occurs. The FuseLoops pass may want to fust loops with non-equal trip count, then it has to make the decision which iterations are executed with which ones. Even in cases where the trip count matches such as

for (int i = 0; i < n; +=1)
  A[i+1] += 1;
for (int i = 0; i < n; +=1)
  A[i] += 2;

loop fusion would optimally be

for (int i = 0; i < n+1; +=1) {
  if (i > 0) A[i] += 1;
  if (i < n) A[i] += 2;
}

or after LoopBoundSplitPass etc.

A[0] += 2;
for (int i = 0; i < n+1; +=1) 
  A[i] += 3;
A[n] +=  1;

i.e. not as naive as DA would assume. Ideally, we would chose an approach that allows us to extend FuseLoops over time.

I think instead of a SeparateLoops parameter, DA should receive the info which loops are considered to be fused from FuseLoops -- otherwise they might be disagree. "SeparateLoops" isn't a good name anyway. It goes pack to the old problem that we want to analyze the result of a optimization without the optimization having been applied first. It would be great if we could leave pass-specific concerns out of DA itself, it does not scale well with the number of passes, but I concede that sometimes it might be a pragmatic solution.

@1997alireza 1997alireza force-pushed the da-siv-separate-loops branch from f840335 to 3cfd2e0 Compare March 10, 2025 21:01
@1997alireza
Copy link
Contributor Author

Effectively, with SeparateLoops=true, this will handle loops as if they are fused.

But it also means that the result is plain wrong for non-fused loops. E.g. the dependence distance d computed by e.g. strongSIVtest may result in d > 0, where in reality all the instances of write occur before the read because their loops are sequential, not nested.

Good point! To avoid the confusion among the original distances or directions and the new information we provide by this patch, now I will provide them in a different array. I added a new array of DVEntry named DVSeparate to include the DVEntry for separate levels and keep the DV array unchanged.

Another concern is that the analysis makes the decision on how to loop fusion occurs. The FuseLoops pass may want to fust loops with non-equal trip count, then it has to make the decision which iterations are executed with which ones. Even in cases where the trip count matches such as

for (int i = 0; i < n; +=1)
  A[i+1] += 1;
for (int i = 0; i < n; +=1)
  A[i] += 2;

loop fusion would optimally be

for (int i = 0; i < n+1; +=1) {
  if (i > 0) A[i] += 1;
  if (i < n) A[i] += 2;
}

or after LoopBoundSplitPass etc.

A[0] += 2;
for (int i = 0; i < n+1; +=1) 
  A[i] += 3;
A[n] +=  1;

i.e. not as naive as DA would assume. Ideally, we would chose an approach that allows us to extend FuseLoops over time.

Loop fusion or or any other optimization passes that want to use the analysis results can decide how to use it. For the case you mentioned, loop fusion can peel the iterations first to make the trip counts the same and then apply DA.

I think instead of a SeparateLoops parameter, DA should receive the info which loops are considered to be fused from FuseLoops -- otherwise they might be disagree. "SeparateLoops" isn't a good name anyway. It goes pack to the old problem that we want to analyze the result of a optimization without the optimization having been applied first. It would be great if we could leave pass-specific concerns out of DA itself, it does not scale well with the number of passes, but I concede that sometimes it might be a pragmatic solution.

We prefer the DA to provide information across two loops, enabling loop fusion to identify dependencies before applying the optimization.

Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear: I don't yet think this is worth integrating for a special case of loop fusion, especially if we don't even have the corresponding PR that makes use of this yet.

Comment on lines 7 to 13
;; for (long int i = 0; i < n; i++) {
;; for (long int j = 0; j < n; j++) {
;; for (long int k = 0; k < n; k++) {
;; A[i][j][k] = i;
;; }
;; for (long int k = 0; k < n; k++) {
;; *B++ = A[i + 3][j + 2][k + 1];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same case as @p2 from PreliminaryNoValidityCheckFixedSize.ll ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is replaced with another test case in which there are two separate levels now.

// MIV is not handled yet on separate loops; check if there is any MIV test
for (unsigned P = 0; P < Pairs; ++P) {
Pair[P].Loops.resize(MaxLevels + 1);
auto classification = classifyPair(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Style] Local variables start with capital letters; No Almost-Always-Auto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Comment on lines 912 to 917
// a - 1
// b - 2 = CommonLevels
// <c, e> - 3 : A SeparateLevel
// d - 4 = SrcLevels
// f - 6
// g - 7 = MaxLevels
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happened with level 5?

There are lots of references to the loop numbering scheme other than SIV that don't account for the change, e.g. getSplitIteration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I updated my changes such that it does not affect the loop numbering scheme anymore. Now this api will only provide an additional info which is SeparateLevels.

auto classification = classifyPair(
Pair[P].Src, LI->getLoopFor(Src->getParent()), Pair[P].Dst,
LI->getLoopFor(Dst->getParent()), Pair[P].Loops);
if (classification == Subscript::MIV) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MIV gets it special treatment here, with reverting to old scheme, but what about RDIV, ZIV?

classifyPair itself relies on the old counting scheme, and will classify e.g. what was RDIV before to SIV.

There is also the case that with non-fused interpretation, we actually might be able to resolve dependencies, but do not with ConsiderSeparateLoops. What ensures that does not happen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SIV, RDIV or ZIV are able to handle pairs from different loops. RDIV and ZIV could do that before and SIV is able to handle it by this patch.

I cannot see a case where we may miss resolving any dependencies by considering separate loops optimization. Could you please clarify on that?

// the separate level extraction at the end of the depends api we have
// a - 1
// b - 2 = CommonLevels
// <c, e> - 3 : A SeparateLevel
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// <c, e> - 3 : A SeparateLevel
// <c, e> - 3 = SeparateLevels

This introduced 3 different meanings of levels and it is not clear what interpretation at what point it the "correct" one. Better integrate into the text above, where SeparateLevels is zero unless ConsiderSeparateLoops.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the confusion as mentioned in another comment above.

Comment on lines 342 to 344
assert(Levels < Level && Level <= Levels + SeparateLevels &&
"Separate level out of range");
return DVSeparate[Level - Levels - 1].Direction;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not introduce a helper function that returns the correct index into DVSeparate so you don't need the if/else here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment. Added.

@1997alireza
Copy link
Contributor Author

1997alireza commented Apr 22, 2025

To be clear: I don't yet think this is worth integrating for a special case of loop fusion, especially if we don't even have the corresponding PR that makes use of this yet.

Thanks Michael for your time and comments. I tried to resolve them and provide a more clean patch along with changes applied to the loop fusion pass to use the info provided by DA. Please take a look and let me know what you think. I apologize for the delay, I was caught up with some other projects.

@1997alireza 1997alireza force-pushed the da-siv-separate-loops branch 4 times, most recently from b3df426 to 4420416 Compare April 23, 2025 15:24
@CongzheUalberta CongzheUalberta requested review from fhahn and sebpop April 25, 2025 20:20
Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several unrelated formatting changes in there. If it is easer, we can commit a NFC clang-format change so you don't have to worry about clang-format introducing changes anymore.

With not seeing the loop fusion code, I meant to open a seaparate commit that modifies loop fusion, so this PR does mix two concerns. See https://llvm.org/docs/GitHub.html#stacked-pull-requests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This goes from innermost loops to outer ones, and stop at the innermost unequal one. But what if the second innermost is equal just the outermost is not:

for (int i = 0; i < 42; ++i) { // fusable
  for (int j = 0; j < 21; ++j) { // not fusable
  }
}
for (int i = 0; i < 42; ++i) { // fusable
  for (int j = 0; j < 42; ++j) {
  }
}

I think the i-loop should be comon levels, just the j-loops not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! It is fixed now in this code fragment.

Comment on lines 597 to 625
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not messing up with the level count is a good thing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you write some about how the interface works for methods that take a "Separate" argument? In this example, c and e are considered similar, which are loops 3 and 5, but with Separate==true, to get the result for the fused loop, one passes just 3 (which could be considered either the loop depth which is the same of loop c and e, or the SrcLevels representing the fused loops). The pre-existing use of "level" which is not actually the loop depth (number of surrounding loops it is nested in) but a loop id in depth-first order is a bit unfortunate. Some disambiguation here may help.

[not a change request] Instead of "Separate" I think I would have preferred using something "like AssumeFusedLoops" consistently throughout, but not something that I would required for this review. "Separate" does not convey a lot of meaning, it could be loops from different functions. For methods that don't even make sense for non-fused loop such as getDistance (hence level should rather mean depth), the parameter isn't strictly nessary, a client that doesn't know about assumed-fused loops would not call it with argument beyond getLevels() anyways.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is nothing in the header file that even explains what a "separate level" actually is. Rather than having this sentence not saying anything here, consider central documentation whati t means, e.g. as class comment.

Could contain:

  • Number of loops inside common loops that are "similar" (+definition of similar)
  • Have different llvm::Loop objects but can be interpreted as a single fused loop with Separate=true

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment is added in the header file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than merging two DA interpretations into the same output, have you considered keeping the separate for better understandability. E.g.:

da analyze - flow [-3 -2]! / assuming 1 fused loop: [-3 -2 -1]!

Because of how FileCheck works, this test wouldn't even needed to be changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied.

@1997alireza 1997alireza force-pushed the da-siv-separate-loops branch 4 times, most recently from 17552ab to 21ef00f Compare May 30, 2025 18:42
@1997alireza
Copy link
Contributor Author

There are several unrelated formatting changes in there. If it is easer, we can commit a NFC clang-format change so you don't have to worry about clang-format introducing changes anymore.

With not seeing the loop fusion code, I meant to open a seaparate commit that modifies loop fusion, so this PR does mix two concerns. See https://llvm.org/docs/GitHub.html#stacked-pull-requests.

A separate commit and PR has been created for the changes applied to the loop fusion pass.

@Meinersbur
Copy link
Member

A separate commit and PR has been created for the changes applied to the loop fusion pass.

Do you mean 1997alireza#1 ? You already merged it into this PR so https://github.com/llvm/llvm-project/pull/128782/files includes the LoopFuse changes again. The point of Stacked PRs is to not merge them.

@1997alireza
Copy link
Contributor Author

A separate commit and PR has been created for the changes applied to the loop fusion pass.

Do you mean 1997alireza#1 ? You already merged it into this PR so https://github.com/llvm/llvm-project/pull/128782/files includes the LoopFuse changes again. The point of Stacked PRs is to not merge them.

Reverted the merged PR. Now the loop fusion changes are in another PR #146383.

Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some bikeshedding left but otherwise the patch looks good

}

/// inSeparateLoops - Returns true if this level is a separate level, i.e.,
/// performed across two separate loop nests.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// performed across two separate loop nests.
/// performed across two separate loop nests that a treated like a single fused loop.

Comment on lines 597 to 625
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you write some about how the interface works for methods that take a "Separate" argument? In this example, c and e are considered similar, which are loops 3 and 5, but with Separate==true, to get the result for the fused loop, one passes just 3 (which could be considered either the loop depth which is the same of loop c and e, or the SrcLevels representing the fused loops). The pre-existing use of "level" which is not actually the loop depth (number of surrounding loops it is nested in) but a loop id in depth-first order is a bit unfortunate. Some disambiguation here may help.

[not a change request] Instead of "Separate" I think I would have preferred using something "like AssumeFusedLoops" consistently throughout, but not something that I would required for this review. "Separate" does not convey a lot of meaning, it could be loops from different functions. For methods that don't even make sense for non-fused loop such as getDistance (hence level should rather mean depth), the parameter isn't strictly nessary, a client that doesn't know about assumed-fused loops would not call it with argument beyond getLevels() anyways.

@amehsan
Copy link
Contributor

amehsan commented Sep 16, 2025

(Sorry to bother you with my nitpicky comments.)

Let me clarify my point, using the same example.

for (int i = 0; i < M; i++) {
  if (i < 10)
    for (int j = 0; j < N; j++)
      A[i+10][j] = 42;
  for (int j = 0; j < N; j++)
    A[i][j] = 43;
}

Your argument seems correct if we are analyzing the C language (or other high-level languages). But our target is LLVM IR which does not have explicit loop constructs such as for or while statements. Thus, loop-releated information (e.g., backedge-taken count) must be inferred in some way. Loop guards can be utilized for this purpose, for instance, a predicate like N > 0 may be used to compute the backedge-taken count of the loop. From this perspective, I think it can be said that DA makes use of loop guards. The condition i < 10 in the above example would also be treated as a loop guard. I think they are both equally loop guards and it should be impossible to distinguish information related to backedge-taken count from any other information. Therefore it's safer to think that DA might be indirectly using i < 10 (although I don't know if that condition actually provides useful information).

What is relevant to DA is starting value and final value of induction variable. And the patch makes sure that those values satisfy certain conditions. That is enough to ensure correctness of the results.

@amehsan
Copy link
Contributor

amehsan commented Sep 16, 2025

@1997alireza I meant to start a detailed review of the code, but I have been very busy. I will try to start it today or tomorrow.

@kasuga-fj
Copy link
Contributor

kasuga-fj commented Sep 16, 2025

I believe I have mentioned this multiple times, but the key point here is that it is checked that the two inner loops have the same backedge-taken count. How this value is calculated is irrelevant.

I apologize if I made you feel like you had to explain the same thing multiple times. However, your last comment made me realize the following new points.

How this value is calculated is irrelevant.

This is exactly the part where I cannot be confident. I'm not sure whether it really doesn’t matter how the backedge-taken counts are computed. But maybe I'm just overthinking (which is why I called myself "nitpicky" in the previous comment), and this might actually be irrelevant.

So, sorry to have made you explain at such length, but I personally still cannot feel fully confident about correctness. That said, I’ll refrain from further comments on correctness. If someone else reviews this discussion and believes it is correct, they will probably approve this PR. Thanks again for taking the time to explain it.

Copy link
Contributor

@kasuga-fj kasuga-fj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me leave other nitpicky comments.

@@ -0,0 +1,345 @@
; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 5
; RUN: opt < %s -disable-output "-passes=print<da>" -aa-pipeline=basic-aa 2>&1 \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think -aa-pipeline=... is unnecessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

;; }
;; for (long int k = 1; k < n+1; k++) {
;; for (long int l = 0; l < n; l++)
;; *B++ = A[i + 4][j + 3][k + 2][l + 1];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this comes from another test, but is *B++ really needed here? If not, I think it would be better to replace this with a simpler statement like A[i + 4][j + 3][k + 2][l + 1] = l;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced and removed B.


for.body9: ; preds = %for.body9.preheader, %for.body9
%l.02 = phi i64 [ %inc11, %for.body9 ], [ 0, %for.body9.preheader ]
%arrayidx12 = getelementptr inbounds [100 x [100 x [100 x i64]]], ptr %A, i64 %i.013, i64 %j.09, i64 %k.07, i64 %l.02
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may already know this, but the type information of getelementptr is planned to be removed. That is, the delinearization here will probably fail in the future, which will make the test fail. To minimize the maintenance cost, I think it's better to replace %n with constant value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

%n is removed.

Comment on lines 198 to 210
;; No information with fused loops assumption is gnerated as MIV test is not
;; supported on separate loops yet.
;; for (long int i = 0; i < n; i++) {
;; for (long int j = 0; j < n; j++) {
;; for (long int k = 0; k < n; k++) {
;; for (long int l = 0; l < n; l++)
;; A[i][j][k][l] = i;
;; }
;; for (long int k = 1; k < n+1; k++) {
;; for (long int l = 0; l < n; l++)
;; *B++ = A[i + 4][j + 3][k + 2][k + l];

define void @NonFusable0(i64 %n, ptr %A, ptr %B) nounwind uwtable ssp {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add another case where areLoopsSimilar fails due to the difference between backedge-taken counts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added another case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another case added.

@kasuga-fj
Copy link
Contributor

kasuga-fj commented Sep 16, 2025

The current patch seems to be saying "let's regard the two fusable loops as the same", in this example the two j-loops. If the second j-loop is "mapped" to the first j-loop, it appears to me that this may potentially imply considering the following code rather than the original:

for (int i = 0; i < M; i++) {
  if (i < 10)
    for (int j = 0; j < N; j++) {
      A[i+10][j] = 42;
      A[i][j] = 43;
    }
}

// Or, written in a form closer to LLVM IR
for (int i = 0; i < M; i++) {
  if (i < 10 && N > 0) {
    int j = 0;
    do {
      A[i+10][j] = 42;
      A[i][j] = 43;
      j++;
    } while (j < N)
  }
}

I've been thinking about why this happens, and I've come to feel that the root cause is trying to map different loops to the same Level. If this patch is modified so that CommonLevels is not changed due to fusability, then it may also address my concern.

I don't see the point of the example. If the two loops have different backedge taken counts then the new logic is not used. If they have the same iteration count then from DA point of view we have something like this:

for (int i = 0; i < M; i++) {
   <There might be some code here>
    for (int j = 0; j < N; j++) {
      A[i+10][j] = 42;
      A[i][j] = 43;
    }

} >
The key point here is that the set of instances of A[i+10][j] and A[i][j] that is considered by DA is a superset (or potentially equal) to the set of instances that will be executed at run time. So DA result is always correct, but it might be sometimes conservative.

Just to clarify, what I wanted to emphasize was: what happens if the <there might be some code here> part contains a branch instruction? At least to me, it doesn’t look like a "superset".

(What I’ve been concerned with is, for example, how the nsw/nuw flags on an SCEVAddRecExpr are deduced. I guess it uses the loop-guards...)

@amehsan
Copy link
Contributor

amehsan commented Sep 16, 2025

I believe I have mentioned this multiple times, but the key point here is that it is checked that the two inner loops have the same backedge-taken count. How this value is calculated is irrelevant.

I apologize if I made you feel like you had to explain the same thing multiple times. However, your last comment made me realize the following new points.

How this value is calculated is irrelevant.

This is exactly the part where I cannot be confident. I'm not sure whether it really doesn’t matter how the backedge-taken counts are computed. But maybe I'm just overthinking (which is why I called myself "nitpicky" in the previous comment), and this might actually be irrelevant.

So, sorry to have made you explain at such length, but I personally still cannot feel fully confident about correctness. That said, I’ll refrain from further comments on correctness. If someone else reviews this discussion and believes it is correct, they will probably approve this PR. Thanks again for taking the time to explain it.

Thank you and I appreciate your attention to the quality of the open source compiler.

@amehsan
Copy link
Contributor

amehsan commented Sep 16, 2025

The current patch seems to be saying "let's regard the two fusable loops as the same", in this example the two j-loops. If the second j-loop is "mapped" to the first j-loop, it appears to me that this may potentially imply considering the following code rather than the original:

for (int i = 0; i < M; i++) {
   <There might be some code here>
    for (int j = 0; j < N; j++) {
      A[i+10][j] = 42;
      A[i][j] = 43;
    }

} >
The key point here is that the set of instances of A[i+10][j] and A[i][j] that is considered by DA is a superset (or potentially equal) to the set of instances that will be executed at run time. So DA result is always correct, but it might be sometimes conservative.

Just to clarify, what I wanted to emphasize was: what happens if the <there might be some code here> part contains a branch instruction? At least to me, it doesn’t look like a "superset".

(What I’ve been concerned with is, for example, how the nsw/nuw flags on an SCEVAddRecExpr are deduced. I guess it uses the loop-guards...)

LoopInfo only recongizes natural loops. So a branch there cannot have a target that is somewhere in the middle of the loop.

Putting that case aside, when you have a branch there, you have one of the two cases: The loop is executed or it is not. DA doesn't care about the case that loop is NOT executed. DA assumes that the loop is executed. (Same thing can be said if the two loops are separate. DA assumes both loops are executed. If one or both of them are not executed then DA might be conservative. But it is still correct).

The induction variable j can go take any integer from 0 to N. Having or not having a branch there doesn't change this fact. No matter what code you have in that region (or after the inner loop), that is the range that we have for 'j'. And DA considers that.

I really suggest you make a real example that you can compile and show us that something strange happens. I don't really know what that would be, but it will really help the conversation (and save huge amount of time ) if you have a real example. If you have difficulty creating an example to demostrate your point, may be the point is not valid?

@kasuga-fj
Copy link
Contributor

kasuga-fj commented Sep 17, 2025

What I was concerned about is, for example, the following case:

godbolt: https://godbolt.org/z/aTjxa44qx

; for (i = 0; i < 9223372036854775806; i++) {
;   if (i < 2147483640)
;     for (j = 0; j < 2147483640; j++)
;       a[i + j * 4294967296] = 0;
;
;   for (j = 0; j < 2147483640; j++)
;     a[j * 2] = 0;
; }
;
define void @f(ptr %a) {
entry:
  br label %loop.i.header

loop.i.header:
  %i = phi i64 [ 0 , %entry ], [ %i.next, %loop.i.latch ]
  br label %loop.j.0.pr

loop.j.0.pr:
  %guard.j.0 = icmp slt i64 %i, 2147483640
  br i1 %guard.j.0, label %loop.j.0, label %loop.j.1

loop.j.0:
  %j.0 = phi i64 [ 0, %loop.j.0.pr ], [ %j.0.next, %loop.j.0 ]
  %val.0 = phi i64 [ %i, %loop.j.0.pr ], [ %val.0.next, %loop.j.0 ]
  %j.0.next = add nsw i64 %j.0, 1
  %idx.0 = getelementptr inbounds i8, ptr %a, i64 %val.0
  store i8 0, ptr %idx.0
  %val.0.next = add nsw i64 %val.0, 4294967296
  %exitcond.j.0 = icmp eq i64 %j.0.next, 2147483640
  br i1 %exitcond.j.0, label %loop.j.1, label %loop.j.0

loop.j.1:
  %j.1 = phi i64 [ 0, %loop.j.0 ], [ 0, %loop.j.0.pr ], [ %j.1.next, %loop.j.1 ]
  %val.1 = phi i64 [ 0, %loop.j.0 ], [ 0, %loop.j.0.pr ], [ %val.1.next, %loop.j.1 ]
  %idx.1 = getelementptr inbounds i8, ptr %a, i64 %val.1
  store i8 0, ptr %idx.1
  %j.1.next = add nsw i64 %j.1, 1
  %val.1.next = add nsw i64 %val.1, 2
  %exitcond.j.1 = icmp eq i64 %j.1.next, 2147483640
  br i1 %exitcond.j.1, label %loop.i.latch, label %loop.j.1

loop.i.latch:
  %i.next = add nsw i64 %i, 1
  %exitcond.i = icmp eq i64 %i.next, 9223372036854775806
  br i1 %exitcond.i, label %exit, label %loop.i.header

exit:
  ret void
}

The result of DA:

Printing analysis 'Dependence Analysis' for function 'f':
Src:  store i8 0, ptr %idx.0, align 1 --> Dst:  store i8 0, ptr %idx.0, align 1
  da analyze - none!
Src:  store i8 0, ptr %idx.0, align 1 --> Dst:  store i8 0, ptr %idx.1, align 1
  da analyze - none!
Src:  store i8 0, ptr %idx.1, align 1 --> Dst:  store i8 0, ptr %idx.1, align 1
  da analyze - consistent output [S 0]!
Compiler returned: 0

Here is the SCEV for %val.0:

 %val.0 = phi i64 [ %i, %loop.j.0.pr ], [ %val.0.next, %loop.j.0 ]
  -->  {{0,+,1}<nuw><nsw><%loop.i.header>,+,4294967296}<nuw><nsw><%loop.j.0>

The result of DA already seems incorrect. I guess the root cause is that DA currently doesn't take offset (%val.0 in this case) wrapping into account. I believe this issue will be fixed by checking the loop-guard and/or nsw/nuw properly.
Here, what I'm not confident about is, what happens when assuming the two loops (%loop.j.0 and %loop.j.1) are fused in such cases? In this specific case, probably it would be classified to MIV so it shouldn't be a problem. But what about other similar cases? Is it impossible for something like this to happen? Or am I missing something? What I personally would like to avoid is, fusable assumption leads to the problem becoming worse.

(While working on this case, I realized that this issue is probably not limited to fusion. Perhaps the analysis should always bail out when such loop guards are present...)

Anyway, I really will keep quiet this time about my concern.

EDIT: The more I look at it, the more it seems like this example points to a more fundamental bug in the DA, rather than this patch. I sincerely hope I'm misunderstanding something...

@amehsan
Copy link
Contributor

amehsan commented Sep 17, 2025

EDIT: The more I look at it, the more it seems like this example points to a more fundamental bug in the DA, rather than this patch. I sincerely hope I'm misunderstanding something...

Yes, my first impression is that this example is not related to this patch. I will look into it in more depth and get back to you. Thanks for providing an example. This helps to have a much better conversation. (Even if unrelated to the patch, it is good to know more about the issues in DA).

@@ -0,0 +1,345 @@
; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 5
Copy link
Contributor

@amehsan amehsan Sep 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use lowercase for filename: fusable.ll (All test files seem to follow that style).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DA test files starts with uppercase letters.

dumpImp(OS);
unsigned FusableLevels = getFusableLevels();
if (FusableLevels > 0) {
OS << "! / assuming " << FusableLevels << " fused loop(s): ";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't it be assuming " << FusableLevels << " loop levels fused: "; ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced.

Copy link
Contributor

@amehsan amehsan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@1997alireza I added a couple of minor cosmetic comments. Will look into it a bit more later today or tomorrow.

unsigned getFusableLevels() const override { return FusableLevels; }

/// getDVEntry - Returns the DV entry associated with a regular or a
/// fusable level
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a period at the end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@amehsan
Copy link
Contributor

amehsan commented Sep 17, 2025

I believe this issue will be fixed by checking the loop-guard and/or nsw/nuw properly.

Need to do more investigation. But checking the loop guard in DA is almost certainly incorrect. One evidence is that the following loop with no loop guard has the same problem

void foo2 (int *a ) {

  unsigned long long i;
  unsigned long long j;

  for (i = 0; i < 9223372036854775806; i++) {
    for (j = 0; j < 2147483640; j++) {
      a[i + j * 4294967296] = 0;
      a[j * 2] = 0;
    }
  }
}

Here, what I'm not confident about is, what happens when assuming the two loops (%loop.j.0 and %loop.j.1) are fused in such cases?

It is a completely orthogonal issue. The root cause of this particular bug might be in SCEV or in DA (We are looking into it. Please wait). But anyways, not related to this patch.

@amehsan
Copy link
Contributor

amehsan commented Sep 17, 2025

One evidence is that the following loop

I compiled this with -O3 -fno-vectorize. Let me know if you have a problem reproducing it.

Copy link
Contributor

@amehsan amehsan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the code LGTM assuming the issues raised so far is addressed. I will give an explicit approval after comments are addressed. I may look more and add more comments though (but I don't expect anything fundamental)

Comment on lines 3787 to 3852
CommonLevels += FusableLevels;
MaxLevels -= FusableLevels;
if (FusableLevels > 0) {
// Not all tests are handled yet over fusable loops
// Revoke if there are any tests other than ZIV, SIV or RDIV
for (unsigned P = 0; P < Pairs; ++P) {
SmallBitVector Loops;
Subscript::ClassificationKind TestClass =
classifyPair(Pair[P].Src, LI->getLoopFor(Src->getParent()),
Pair[P].Dst, LI->getLoopFor(Dst->getParent()), Loops);

if (TestClass != Subscript::ZIV && TestClass != Subscript::SIV &&
TestClass != Subscript::RDIV) {
// Revert the levels to not consider the fusable levels
CommonLevels -= FusableLevels;
MaxLevels += FusableLevels;
FusableLevels = 0;
break;
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need to increment commonLevels and then decrement it. Can you write the code so that it is only incremented when it should be?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

classifyPair uses CommonLevels and MaxLevels. To be able to identify the dependency types considering the fusable levels, the value of levels must be modified before calling classifyPair.

Comment on lines 3813 to 3817
if (FusableLevels > 0) {
Result.Consistent = false;
FusableLoopsConsidered++;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other fields of Result is updated later in line 4135. Also there is no use of FusableLoopsConsidered between here and there. Is there a reason that this code is separate from the other conditional on FusableLevels > 0 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relocated the Result update line.


// Returns true if two loops are the same or they have the same tripcount and
// depth
bool DependenceInfo::areLoopsSimilar(const Loop *SrcLoop,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you have a notion of FusableLoops everywhere in the code. What are the conditions for Fusable? I believe they are part of the conditions checked here. So

  1. Separate conditions for fusability and name them properly in a different function.
  2. Similar is vague and general. Use a more specific name.
  3. Fuseable is not a good name. We need to find a better name that better reflects the conditions that you check.

Copy link
Contributor Author

@1997alireza 1997alireza Sep 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the terminology by defining a new term SIDLevels, which indicates the levels with the Same Iteration and Depth.
All required conditions to consider two loops as SID are in function isSID.

Copy link
Contributor

@amehsan amehsan Sep 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. You still need to change your commit message. Write your commit message so that someone who doesn't know anything about your patch understands it. (e.g. Don't use acronyms without first introducing them in the commit messag).

Also change "Same Iteration" to "Same Bounds" or "Same Iteration Space". Make sure the acronym is introduced explicitly and clearly in the comments of the code as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified the commit message as well.
I changed the acronym to "SameSD" standing for same iteration space and depth.

@1997alireza 1997alireza force-pushed the da-siv-separate-loops branch 3 times, most recently from 81d0300 to b06447f Compare September 19, 2025 16:41
SameSD stands for Same iteration Space and Depth indicating the loops
that are in the same nesting depth and have the same backedge count.
When there is a dependency between two memory instructions in separate
loops that have the same iteration space and depth, SIV will be able to
test them and compute the direction and the distance of the dependency.
@kasuga-fj
Copy link
Contributor

I believe this issue will be fixed by checking the loop-guard and/or nsw/nuw properly.

Need to do more investigation. But checking the loop guard in DA is almost certainly incorrect. One evidence is that the following loop with no loop guard has the same problem

void foo2 (int *a ) {

  unsigned long long i;
  unsigned long long j;

  for (i = 0; i < 9223372036854775806; i++) {
    for (j = 0; j < 2147483640; j++) {
      a[i + j * 4294967296] = 0;
      a[j * 2] = 0;
    }
  }
}

Here, what I'm not confident about is, what happens when assuming the two loops (%loop.j.0 and %loop.j.1) are fused in such cases?

It is a completely orthogonal issue. The root cause of this particular bug might be in SCEV or in DA (We are looking into it. Please wait). But anyways, not related to this patch.

(In the hope of saving your time)

It is known that DA has several issues (some of which you may find in GitHub issues), such as the following:

  • The interpretations of signed and unsigned are mixed up.
  • It does not take AddRec wrapping into account.
  • Overflows occur during the analysis.1
    • The overflow I’m referring to here is not in the original program, but in the arithmetic operations inside DA.

These issues violate the assumptions of the underlying algorithm, which can lead to incorrect results. Your example foo2 seems to be one such case (though I’m not exactly sure what happens where). I’m trying to address this problem in #154527 by proving that signed-wrap does not occur when nsw is attached to an AddRec. The example I raised (with a loop guard if (i < 2147483640)) is slightly different from yours, since the AddRec corresponding to i + j * 4294967296 has the nsw flag. However, it turns out that checking this flag alone is not sufficient. I’m almost certain now that loop guards need to be checked somewhere in DA, regardless of whether the loops can be fused or not...

I apologize for my bad communication (the more I looked into DA, the more bugs I found, so I became overly nervous...). However, I found the discussion very meaningful, and I sincerely appreciate it. Thank you.

Footnotes

  1. Because of these circumstances, for example, overestimating backedge-taken count might seem conservative (at least theoretically), but in fact it could lead to incorrect outcomes in the current DA, I think.

Copy link
Contributor

@kasuga-fj kasuga-fj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I’d prefer to avoid adding new features to an already broken pass, but the code itself LGTM.

Copy link
Contributor

@amehsan amehsan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. LGTM as well.

@amehsan
Copy link
Contributor

amehsan commented Sep 19, 2025

  • The overflow I’m referring to here is not in the original program, but in the arithmetic operations inside DA.

These issues violate the assumptions of the underlying algorithm, which can lead to incorrect results. Your example foo2 seems to be one such case (though I’m not exactly sure what happens where). I’m trying to address this problem in #154527 by proving that signed-wrap does not occur when nsw is attached to an AddRec. The example I raised (with a loop guard if (i < 2147483640)) is slightly different from yours, since the AddRec corresponding to i + j * 4294967296 has the nsw flag. However, it turns out that checking this flag alone is not sufficient. I’m almost certain now that loop guards need to be checked somewhere in DA, regardless of whether the loops can be fused or not...

No worries. Let's continue the work and we can also help to improve the quality of DA. I didn't know about your issue, and I opened a new one. We can close it if it is a duplicate.

We can continue the discussion on the issue pages. But I strongly disagree with checking the loop guards. I will add a more detailed comment on one of the issue pages.

@CongzheUalberta CongzheUalberta merged commit b4a17b1 into llvm:main Sep 19, 2025
9 checks passed
Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I’d prefer to avoid adding new features to an already broken pass, but the code itself LGTM.

FWIW I agree, increasing the complexity before fixing the soundness problem makes it even more difficult to enable any pass depending on DA

@amehsan
Copy link
Contributor

amehsan commented Sep 21, 2025

Personally, I’d prefer to avoid adding new features to an already broken pass, but the code itself LGTM.

FWIW I agree, increasing the complexity before fixing the soundness problem makes it even more difficult to enable any pass depending on DA

I suggest this kind of comment is made when RFC is posted or in the beginning of code review. I think it is also important that development is not a frustrating experience for the people invovled.

But going forward, we are OK with focusing on correctness issues of DA. We are also going to help in that direction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants