-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[DependenceAnalysis] Extending SIV to handle fusable loops #128782
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-llvm-analysis Author: Alireza Torabian (1997alireza) ChangesWhen there is a dependency between two memory instructions in separate loops, SIV will be able to test them and compute the direction and the distance of the dependency. Patch is 48.09 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/128782.diff 4 Files Affected:
diff --git a/llvm/include/llvm/Analysis/DependenceAnalysis.h b/llvm/include/llvm/Analysis/DependenceAnalysis.h
index 426ac757b4b0d..8e86c091c60a2 100644
--- a/llvm/include/llvm/Analysis/DependenceAnalysis.h
+++ b/llvm/include/llvm/Analysis/DependenceAnalysis.h
@@ -97,10 +97,11 @@ namespace llvm {
bool PeelFirst : 1; // Peeling the first iteration will break dependence.
bool PeelLast : 1; // Peeling the last iteration will break the dependence.
bool Splitable : 1; // Splitting the loop will break dependence.
+ bool SeparateLoops : 1; // Is performed across two separate loop nests.
const SCEV *Distance = nullptr; // NULL implies no distance available.
DVEntry()
: Direction(ALL), Scalar(true), PeelFirst(false), PeelLast(false),
- Splitable(false) {}
+ Splitable(false), SeparateLoops(false) {}
};
/// getSrc - Returns the source instruction for this dependence.
@@ -182,6 +183,10 @@ namespace llvm {
/// the dependence.
virtual bool isSplitable(unsigned Level) const { return false; }
+ /// inSeparateLoops - Returns true if this level is performed across
+ /// two separate loop nests.
+ virtual bool inSeparateLoops(unsigned Level) const { return false; }
+
/// isScalar - Returns true if a particular level is scalar; that is,
/// if no subscript in the source or destination mention the induction
/// variable associated with the loop at this level.
@@ -275,6 +280,10 @@ namespace llvm {
/// the dependence.
bool isSplitable(unsigned Level) const override;
+ /// inSeparateLoops - Returns true if this level is performed across
+ /// two separate loop nests.
+ bool inSeparateLoops(unsigned Level) const override;
+
/// isScalar - Returns true if a particular level is scalar; that is,
/// if no subscript in the source or destination mention the induction
/// variable associated with the loop at this level.
@@ -405,7 +414,8 @@ namespace llvm {
const SCEV *A;
const SCEV *B;
const SCEV *C;
- const Loop *AssociatedLoop;
+ const Loop *AssociatedSrcLoop;
+ const Loop *AssociatedDstLoop;
public:
/// isEmpty - Return true if the constraint is of kind Empty.
@@ -449,18 +459,25 @@ namespace llvm {
/// Otherwise assert.
const SCEV *getD() const;
- /// getAssociatedLoop - Returns the loop associated with this constraint.
- const Loop *getAssociatedLoop() const;
+ /// getAssociatedSrcLoop - Returns the source loop associated with this
+ /// constraint.
+ const Loop *getAssociatedSrcLoop() const;
+
+ /// getAssociatedDstLoop - Returns the destination loop associated with
+ /// this constraint.
+ const Loop *getAssociatedDstLoop() const;
/// setPoint - Change a constraint to Point.
- void setPoint(const SCEV *X, const SCEV *Y, const Loop *CurrentLoop);
+ void setPoint(const SCEV *X, const SCEV *Y, const Loop *CurrentSrcLoop,
+ const Loop *CurrentDstLoop);
/// setLine - Change a constraint to Line.
- void setLine(const SCEV *A, const SCEV *B,
- const SCEV *C, const Loop *CurrentLoop);
+ void setLine(const SCEV *A, const SCEV *B, const SCEV *C,
+ const Loop *CurrentSrcLoop, const Loop *CurrentDstLoop);
/// setDistance - Change a constraint to Distance.
- void setDistance(const SCEV *D, const Loop *CurrentLoop);
+ void setDistance(const SCEV *D, const Loop *CurrentSrcLoop,
+ const Loop *CurrentDstLoop);
/// setEmpty - Change a constraint to Empty.
void setEmpty();
@@ -473,6 +490,10 @@ namespace llvm {
void dump(raw_ostream &OS) const;
};
+ /// Returns true if two loops are the same or they have the same upperbound
+ /// and depth
+ bool areLoopsSimilar(const Loop *SrcLoop, const Loop *DstLoop) const;
+
/// establishNestingLevels - Examines the loop nesting of the Src and Dst
/// instructions and establishes their shared loops. Sets the variables
/// CommonLevels, SrcLevels, and MaxLevels.
@@ -523,10 +544,22 @@ namespace llvm {
/// e - 5
/// f - 6
/// g - 7 = MaxLevels
- void establishNestingLevels(const Instruction *Src,
- const Instruction *Dst);
-
- unsigned CommonLevels, SrcLevels, MaxLevels;
+ /// If ConsiderSeparateLoops is true then we also want to consider similar
+ /// seperate loops. Assume that loop nests at level c and e are similar,
+ /// meaning that they have the same upperbound and depth. Then we consider
+ /// them as a common level.
+ /// a - 1
+ /// b - 2
+ /// <c, e> - 3 = CommonLevels
+ /// d - 4 = SrcLevels
+ /// f - 5
+ /// g - 6 = MaxLevels
+ /// SeparateLevels means that how many of the last common levels are
+ /// separated, which is 1 in this case.
+ void establishNestingLevels(const Instruction *Src, const Instruction *Dst,
+ bool ConsiderSeparateLoops = false);
+
+ unsigned CommonLevels, SrcLevels, MaxLevels, SeparateLevels;
/// mapSrcLoop - Given one of the loops containing the source, return
/// its level index in our numbering scheme.
@@ -668,7 +701,8 @@ namespace llvm {
bool strongSIVtest(const SCEV *Coeff,
const SCEV *SrcConst,
const SCEV *DstConst,
- const Loop *CurrentLoop,
+ const Loop *CurrentSrcLoop,
+ const Loop *CurrentDstLoop,
unsigned Level,
FullDependence &Result,
Constraint &NewConstraint) const;
@@ -686,7 +720,8 @@ namespace llvm {
bool weakCrossingSIVtest(const SCEV *SrcCoeff,
const SCEV *SrcConst,
const SCEV *DstConst,
- const Loop *CurrentLoop,
+ const Loop *CurrentSrcLoop,
+ const Loop *CurrentDstLoop,
unsigned Level,
FullDependence &Result,
Constraint &NewConstraint,
@@ -705,7 +740,8 @@ namespace llvm {
const SCEV *DstCoeff,
const SCEV *SrcConst,
const SCEV *DstConst,
- const Loop *CurrentLoop,
+ const Loop *CurrentSrcLoop,
+ const Loop *CurrentDstLoop,
unsigned Level,
FullDependence &Result,
Constraint &NewConstraint) const;
@@ -723,7 +759,8 @@ namespace llvm {
bool weakZeroSrcSIVtest(const SCEV *DstCoeff,
const SCEV *SrcConst,
const SCEV *DstConst,
- const Loop *CurrentLoop,
+ const Loop *CurrentSrcLoop,
+ const Loop *CurrentDstLoop,
unsigned Level,
FullDependence &Result,
Constraint &NewConstraint) const;
@@ -741,7 +778,8 @@ namespace llvm {
bool weakZeroDstSIVtest(const SCEV *SrcCoeff,
const SCEV *SrcConst,
const SCEV *DstConst,
- const Loop *CurrentLoop,
+ const Loop *CurrentSrcLoop,
+ const Loop *CurrentDstLoop,
unsigned Level,
FullDependence &Result,
Constraint &NewConstraint) const;
diff --git a/llvm/lib/Analysis/DependenceAnalysis.cpp b/llvm/lib/Analysis/DependenceAnalysis.cpp
index dc0ed22dbcc0b..b947e92a6375b 100644
--- a/llvm/lib/Analysis/DependenceAnalysis.cpp
+++ b/llvm/lib/Analysis/DependenceAnalysis.cpp
@@ -104,6 +104,7 @@ STATISTIC(GCDindependence, "GCD independence");
STATISTIC(BanerjeeApplications, "Banerjee applications");
STATISTIC(BanerjeeIndependence, "Banerjee independence");
STATISTIC(BanerjeeSuccesses, "Banerjee successes");
+STATISTIC(SeparateLoopsConsidered, "Separate loops considered");
static cl::opt<bool>
Delinearize("da-delinearize", cl::init(true), cl::Hidden,
@@ -377,6 +378,13 @@ bool FullDependence::isSplitable(unsigned Level) const {
}
+// Returns true if this level is performed across two separate loop nests.
+bool FullDependence::inSeparateLoops(unsigned Level) const {
+ assert(0 < Level && Level <= Levels && "Level out of range");
+ return DV[Level - 1].SeparateLoops;
+}
+
+
//===----------------------------------------------------------------------===//
// DependenceInfo::Constraint methods
@@ -431,37 +439,52 @@ const SCEV *DependenceInfo::Constraint::getD() const {
}
-// Returns the loop associated with this constraint.
-const Loop *DependenceInfo::Constraint::getAssociatedLoop() const {
+// Returns the source loop associated with this constraint.
+const Loop *DependenceInfo::Constraint::getAssociatedSrcLoop() const {
+ assert((Kind == Distance || Kind == Line || Kind == Point) &&
+ "Kind should be Distance, Line, or Point");
+ return AssociatedSrcLoop;
+}
+
+
+// Returns the destination loop associated with this constraint.
+const Loop *DependenceInfo::Constraint::getAssociatedDstLoop() const {
assert((Kind == Distance || Kind == Line || Kind == Point) &&
"Kind should be Distance, Line, or Point");
- return AssociatedLoop;
+ return AssociatedDstLoop;
}
+
void DependenceInfo::Constraint::setPoint(const SCEV *X, const SCEV *Y,
- const Loop *CurLoop) {
+ const Loop *CurSrcLoop,
+ const Loop *CurDstLoop) {
Kind = Point;
A = X;
B = Y;
- AssociatedLoop = CurLoop;
+ AssociatedSrcLoop = CurSrcLoop;
+ AssociatedDstLoop = CurDstLoop;
}
void DependenceInfo::Constraint::setLine(const SCEV *AA, const SCEV *BB,
- const SCEV *CC, const Loop *CurLoop) {
+ const SCEV *CC, const Loop *CurSrcLoop,
+ const Loop *CurDstLoop) {
Kind = Line;
A = AA;
B = BB;
C = CC;
- AssociatedLoop = CurLoop;
+ AssociatedSrcLoop = CurSrcLoop;
+ AssociatedDstLoop = CurDstLoop;
}
void DependenceInfo::Constraint::setDistance(const SCEV *D,
- const Loop *CurLoop) {
+ const Loop *CurSrcLoop,
+ const Loop *CurDstLoop) {
Kind = Distance;
A = SE->getOne(D->getType());
B = SE->getNegativeSCEV(A);
C = SE->getNegativeSCEV(D);
- AssociatedLoop = CurLoop;
+ AssociatedSrcLoop = CurSrcLoop;
+ AssociatedDstLoop = CurDstLoop;
}
void DependenceInfo::Constraint::setEmpty() { Kind = Empty; }
@@ -608,8 +631,8 @@ bool DependenceInfo::intersectConstraints(Constraint *X, const Constraint *Y) {
++DeltaSuccesses;
return true;
}
- if (const SCEVConstant *CUB =
- collectConstantUpperBound(X->getAssociatedLoop(), Prod1->getType())) {
+ if (const SCEVConstant *CUB = collectConstantUpperBound(
+ X->getAssociatedSrcLoop(), Prod1->getType())) {
const APInt &UpperBound = CUB->getAPInt();
LLVM_DEBUG(dbgs() << "\t\tupper bound = " << UpperBound << "\n");
if (Xq.sgt(UpperBound) || Yq.sgt(UpperBound)) {
@@ -620,7 +643,8 @@ bool DependenceInfo::intersectConstraints(Constraint *X, const Constraint *Y) {
}
X->setPoint(SE->getConstant(Xq),
SE->getConstant(Yq),
- X->getAssociatedLoop());
+ X->getAssociatedSrcLoop(),
+ X->getAssociatedDstLoop());
++DeltaSuccesses;
return true;
}
@@ -656,6 +680,7 @@ bool DependenceInfo::intersectConstraints(Constraint *X, const Constraint *Y) {
// For debugging purposes. Dumps a dependence to OS.
void Dependence::dump(raw_ostream &OS) const {
bool Splitable = false;
+ bool SeparatesStarted = false;
if (isConfused())
OS << "confused";
else {
@@ -672,6 +697,10 @@ void Dependence::dump(raw_ostream &OS) const {
unsigned Levels = getLevels();
OS << " [";
for (unsigned II = 1; II <= Levels; ++II) {
+ if (!SeparatesStarted && inSeparateLoops(II)) {
+ SeparatesStarted = true;
+ OS << "/ ";
+ }
if (isSplitable(II))
Splitable = true;
if (isPeelFirst(II))
@@ -758,6 +787,35 @@ bool isLoadOrStore(const Instruction *I) {
return false;
}
+// Returns true if two loops are the same or they have the same tripcount and
+// depth
+bool DependenceInfo::areLoopsSimilar(const Loop *SrcLoop,
+ const Loop *DstLoop) const {
+ if (SrcLoop == DstLoop)
+ return true;
+
+ if (SrcLoop->getLoopDepth() != DstLoop->getLoopDepth())
+ return false;
+
+ if (!SrcLoop || !SrcLoop->getLoopLatch() || !DstLoop ||
+ !DstLoop->getLoopLatch())
+ return false;
+
+ const SCEV *SrcUB, *DstUP;
+ if (SE->hasLoopInvariantBackedgeTakenCount(SrcLoop))
+ SrcUB = SE->getBackedgeTakenCount(SrcLoop);
+ if (SE->hasLoopInvariantBackedgeTakenCount(DstLoop))
+ DstUP = SE->getBackedgeTakenCount(DstLoop);
+
+ if (SrcUB == nullptr || DstUP == nullptr)
+ return false;
+
+ if (SE->isKnownPredicate(ICmpInst::ICMP_EQ, SrcUB, DstUP))
+ return true;
+
+ return false;
+}
+
// Examines the loop nesting of the Src and Dst
// instructions and establishes their shared loops. Sets the variables
@@ -809,8 +867,21 @@ bool isLoadOrStore(const Instruction *I) {
// e - 5
// f - 6
// g - 7 = MaxLevels
+// If ConsiderSeparateLoops is true then we also want to consider similar
+// seperate loops. Assume that loop nests at level c and e are similar,
+// meaning that they have the same tripcount and depth. Then we consider
+// them as a common level.
+// a - 1
+// b - 2
+// <c, e> - 3 = CommonLevels
+// d - 4 = SrcLevels
+// f - 5
+// g - 6 = MaxLevels
+// SeparateLevels means that how many of the last common levels are
+// separated, which is 1 in this case.
void DependenceInfo::establishNestingLevels(const Instruction *Src,
- const Instruction *Dst) {
+ const Instruction *Dst,
+ bool ConsiderSeparateLoops) {
const BasicBlock *SrcBlock = Src->getParent();
const BasicBlock *DstBlock = Dst->getParent();
unsigned SrcLevel = LI->getLoopDepth(SrcBlock);
@@ -819,6 +890,7 @@ void DependenceInfo::establishNestingLevels(const Instruction *Src,
const Loop *DstLoop = LI->getLoopFor(DstBlock);
SrcLevels = SrcLevel;
MaxLevels = SrcLevel + DstLevel;
+ SeparateLevels = 0;
while (SrcLevel > DstLevel) {
SrcLoop = SrcLoop->getParentLoop();
SrcLevel--;
@@ -827,11 +899,23 @@ void DependenceInfo::establishNestingLevels(const Instruction *Src,
DstLoop = DstLoop->getParentLoop();
DstLevel--;
}
- while (SrcLoop != DstLoop) {
- SrcLoop = SrcLoop->getParentLoop();
- DstLoop = DstLoop->getParentLoop();
- SrcLevel--;
- }
+ if (ConsiderSeparateLoops) {
+ while (!areLoopsSimilar(SrcLoop, DstLoop)) {
+ SrcLoop = SrcLoop->getParentLoop();
+ DstLoop = DstLoop->getParentLoop();
+ SrcLevel--;
+ }
+ while (SrcLoop != DstLoop) {
+ SrcLoop = SrcLoop->getParentLoop();
+ DstLoop = DstLoop->getParentLoop();
+ SeparateLevels++;
+ }
+ } else
+ while (SrcLoop != DstLoop) {
+ SrcLoop = SrcLoop->getParentLoop();
+ DstLoop = DstLoop->getParentLoop();
+ SrcLevel--;
+ }
CommonLevels = SrcLevel;
MaxLevels -= CommonLevels;
}
@@ -1223,8 +1307,9 @@ bool DependenceInfo::testZIV(const SCEV *Src, const SCEV *Dst,
//
// Return true if dependence disproved.
bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
- const SCEV *DstConst, const Loop *CurLoop,
- unsigned Level, FullDependence &Result,
+ const SCEV *DstConst, const Loop *CurSrcLoop,
+ const Loop *CurDstLoop, unsigned Level,
+ FullDependence &Result,
Constraint &NewConstraint) const {
LLVM_DEBUG(dbgs() << "\tStrong SIV test\n");
LLVM_DEBUG(dbgs() << "\t Coeff = " << *Coeff);
@@ -1242,7 +1327,8 @@ bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
LLVM_DEBUG(dbgs() << ", " << *Delta->getType() << "\n");
// check that |Delta| < iteration count
- if (const SCEV *UpperBound = collectUpperBound(CurLoop, Delta->getType())) {
+ if (const SCEV *UpperBound =
+ collectUpperBound(CurSrcLoop, Delta->getType())) {
LLVM_DEBUG(dbgs() << "\t UpperBound = " << *UpperBound);
LLVM_DEBUG(dbgs() << ", " << *UpperBound->getType() << "\n");
const SCEV *AbsDelta =
@@ -1275,7 +1361,8 @@ bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
return true;
}
Result.DV[Level].Distance = SE->getConstant(Distance);
- NewConstraint.setDistance(SE->getConstant(Distance), CurLoop);
+ NewConstraint.setDistance(SE->getConstant(Distance), CurSrcLoop,
+ CurDstLoop);
if (Distance.sgt(0))
Result.DV[Level].Direction &= Dependence::DVEntry::LT;
else if (Distance.slt(0))
@@ -1287,7 +1374,7 @@ bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
else if (Delta->isZero()) {
// since 0/X == 0
Result.DV[Level].Distance = Delta;
- NewConstraint.setDistance(Delta, CurLoop);
+ NewConstraint.setDistance(Delta, CurSrcLoop, CurDstLoop);
Result.DV[Level].Direction &= Dependence::DVEntry::EQ;
++StrongSIVsuccesses;
}
@@ -1295,13 +1382,12 @@ bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
if (Coeff->isOne()) {
LLVM_DEBUG(dbgs() << "\t Distance = " << *Delta << "\n");
Result.DV[Level].Distance = Delta; // since X/1 == X
- NewConstraint.setDistance(Delta, CurLoop);
+ NewConstraint.setDistance(Delta, CurSrcLoop, CurDstLoop);
}
else {
Result.Consistent = false;
- NewConstraint.setLine(Coeff,
- SE->getNegativeSCEV(Coeff),
- SE->getNegativeSCEV(Delta), CurLoop);
+ NewConstraint.setLine(Coeff, SE->getNegativeSCEV(Coeff),
+ SE->getNegativeSCEV(Delta), CurSrcLoop, CurDstLoop);
}
// maybe we can get a useful direction
@@ -1360,8 +1446,9 @@ bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
// Return true if dependence disproved.
bool DependenceInfo::weakCrossingSIVtest(
const SCEV *Coeff, const SCEV *SrcConst, const SCEV *DstConst,
- const Loop *CurLoop, unsigned Level, FullDependence &Result,
- Constraint &NewConstraint, const SCEV *&SplitIter) const {
+ const Loop *CurSrcLoop, const Loop *CurDstLoop, unsigned Level,
+ FullDependence &Result, Constraint &NewConstraint,
+ const SCEV *&SplitIter) const {
LLVM_DEBUG(dbgs() << "\tWeak-Crossing SIV test\n");
LLVM_DEBUG(dbgs() << "\t Coeff = " << *Coeff << "\n");
LLVM_DEBUG(dbgs() << "\t SrcConst = " << *SrcConst << "\n");
@@ -1372,7 +1459,7 @@ bool DependenceInfo::weakCrossingSIVtest(
Result.Consistent = false;
const SCEV *Delta = SE->getMinusSCEV(DstConst, SrcConst);
LLVM_DEBUG(dbgs() << "\t Delta = " << *Delta << "\n");
- NewConstraint.setLine(Coeff, Coeff, Delta, CurLoop);
+ NewConstraint.setLine(Coeff, Coeff, Delta, CurSrcLoop, CurDstLoop);
if (Delta->isZero()) {
Result.DV[Level].Direction &= ~Dependence::DVEntry::LT;
Result.DV[Level].Direction &= ~Dependence::DVEntry::GT;
@@ -1420,7 +1507,8 @@ bool DependenceInfo::weakCrossingSIVtest(
// We're certain that Delta > 0 and ConstCoeff > 0.
// Check Delta/(2*ConstCoeff) against ...
[truncated]
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
68d430c
to
f840335
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Effectively, with SeparateLoops=true, this will handle loops as if they are fused.
But it also means that the result is plain wrong for non-fused loops. E.g. the dependence distance d
computed by e.g. strongSIVtest
may result in d > 0
, where in reality all the instances of write occur before the read because their loops are sequential, not nested.
Another concern is that the analysis makes the decision on how to loop fusion occurs. The FuseLoops pass may want to fust loops with non-equal trip count, then it has to make the decision which iterations are executed with which ones. Even in cases where the trip count matches such as
for (int i = 0; i < n; +=1)
A[i+1] += 1;
for (int i = 0; i < n; +=1)
A[i] += 2;
loop fusion would optimally be
for (int i = 0; i < n+1; +=1) {
if (i > 0) A[i] += 1;
if (i < n) A[i] += 2;
}
or after LoopBoundSplitPass etc.
A[0] += 2;
for (int i = 0; i < n+1; +=1)
A[i] += 3;
A[n] += 1;
i.e. not as naive as DA would assume. Ideally, we would chose an approach that allows us to extend FuseLoops over time.
I think instead of a SeparateLoops
parameter, DA should receive the info which loops are considered to be fused from FuseLoops -- otherwise they might be disagree. "SeparateLoops" isn't a good name anyway. It goes pack to the old problem that we want to analyze the result of a optimization without the optimization having been applied first. It would be great if we could leave pass-specific concerns out of DA itself, it does not scale well with the number of passes, but I concede that sometimes it might be a pragmatic solution.
f840335
to
3cfd2e0
Compare
Good point! To avoid the confusion among the original distances or directions and the new information we provide by this patch, now I will provide them in a different array. I added a new array of
Loop fusion or or any other optimization passes that want to use the analysis results can decide how to use it. For the case you mentioned, loop fusion can peel the iterations first to make the trip counts the same and then apply DA.
We prefer the DA to provide information across two loops, enabling loop fusion to identify dependencies before applying the optimization. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear: I don't yet think this is worth integrating for a special case of loop fusion, especially if we don't even have the corresponding PR that makes use of this yet.
;; for (long int i = 0; i < n; i++) { | ||
;; for (long int j = 0; j < n; j++) { | ||
;; for (long int k = 0; k < n; k++) { | ||
;; A[i][j][k] = i; | ||
;; } | ||
;; for (long int k = 0; k < n; k++) { | ||
;; *B++ = A[i + 3][j + 2][k + 1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same case as @p2
from PreliminaryNoValidityCheckFixedSize.ll ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is replaced with another test case in which there are two separate levels now.
// MIV is not handled yet on separate loops; check if there is any MIV test | ||
for (unsigned P = 0; P < Pairs; ++P) { | ||
Pair[P].Loops.resize(MaxLevels + 1); | ||
auto classification = classifyPair( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Style] Local variables start with capital letters; No Almost-Always-Auto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
// a - 1 | ||
// b - 2 = CommonLevels | ||
// <c, e> - 3 : A SeparateLevel | ||
// d - 4 = SrcLevels | ||
// f - 6 | ||
// g - 7 = MaxLevels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happened with level 5?
There are lots of references to the loop numbering scheme other than SIV that don't account for the change, e.g. getSplitIteration
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I updated my changes such that it does not affect the loop numbering scheme anymore. Now this api will only provide an additional info which is SeparateLevels
.
auto classification = classifyPair( | ||
Pair[P].Src, LI->getLoopFor(Src->getParent()), Pair[P].Dst, | ||
LI->getLoopFor(Dst->getParent()), Pair[P].Loops); | ||
if (classification == Subscript::MIV) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MIV
gets it special treatment here, with reverting to old scheme, but what about RDIV, ZIV?
classifyPair
itself relies on the old counting scheme, and will classify e.g. what was RDIV before to SIV.
There is also the case that with non-fused interpretation, we actually might be able to resolve dependencies, but do not with ConsiderSeparateLoops. What ensures that does not happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SIV, RDIV or ZIV are able to handle pairs from different loops. RDIV and ZIV could do that before and SIV is able to handle it by this patch.
I cannot see a case where we may miss resolving any dependencies by considering separate loops optimization. Could you please clarify on that?
// the separate level extraction at the end of the depends api we have | ||
// a - 1 | ||
// b - 2 = CommonLevels | ||
// <c, e> - 3 : A SeparateLevel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// <c, e> - 3 : A SeparateLevel | |
// <c, e> - 3 = SeparateLevels |
This introduced 3 different meanings of levels and it is not clear what interpretation at what point it the "correct" one. Better integrate into the text above, where SeparateLevels is zero unless ConsiderSeparateLoops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the confusion as mentioned in another comment above.
assert(Levels < Level && Level <= Levels + SeparateLevels && | ||
"Separate level out of range"); | ||
return DVSeparate[Level - Levels - 1].Direction; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not introduce a helper function that returns the correct index into DVSeparate
so you don't need the if/else here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comment. Added.
3cfd2e0
to
9d511f5
Compare
9d511f5
to
8e263a3
Compare
Thanks Michael for your time and comments. I tried to resolve them and provide a more clean patch along with changes applied to the loop fusion pass to use the info provided by DA. Please take a look and let me know what you think. I apologize for the delay, I was caught up with some other projects. |
b3df426
to
4420416
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are several unrelated formatting changes in there. If it is easer, we can commit a NFC clang-format change so you don't have to worry about clang-format introducing changes anymore.
With not seeing the loop fusion code, I meant to open a seaparate commit that modifies loop fusion, so this PR does mix two concerns. See https://llvm.org/docs/GitHub.html#stacked-pull-requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This goes from innermost loops to outer ones, and stop at the innermost unequal one. But what if the second innermost is equal just the outermost is not:
for (int i = 0; i < 42; ++i) { // fusable
for (int j = 0; j < 21; ++j) { // not fusable
}
}
for (int i = 0; i < 42; ++i) { // fusable
for (int j = 0; j < 42; ++j) {
}
}
I think the i-loop should be comon levels, just the j-loops not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! It is fixed now in this code fragment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not messing up with the level count is a good thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you write some about how the interface works for methods that take a "Separate" argument? In this example, c and e are considered similar, which are loops 3 and 5, but with Separate==true
, to get the result for the fused loop, one passes just 3 (which could be considered either the loop depth which is the same of loop c and e, or the SrcLevels representing the fused loops). The pre-existing use of "level" which is not actually the loop depth (number of surrounding loops it is nested in) but a loop id in depth-first order is a bit unfortunate. Some disambiguation here may help.
[not a change request] Instead of "Separate" I think I would have preferred using something "like AssumeFusedLoops" consistently throughout, but not something that I would required for this review. "Separate" does not convey a lot of meaning, it could be loops from different functions. For methods that don't even make sense for non-fused loop such as getDistance
(hence level
should rather mean depth), the parameter isn't strictly nessary, a client that doesn't know about assumed-fused loops would not call it with argument beyond getLevels() anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is nothing in the header file that even explains what a "separate level" actually is. Rather than having this sentence not saying anything here, consider central documentation whati t means, e.g. as class comment.
Could contain:
- Number of loops inside common loops that are "similar" (+definition of similar)
- Have different
llvm::Loop
objects but can be interpreted as a single fused loop withSeparate=true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment is added in the header file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than merging two DA interpretations into the same output, have you considered keeping the separate for better understandability. E.g.:
da analyze - flow [-3 -2]! / assuming 1 fused loop: [-3 -2 -1]!
Because of how FileCheck works, this test wouldn't even needed to be changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied.
17552ab
to
21ef00f
Compare
A separate commit and PR has been created for the changes applied to the loop fusion pass. |
Do you mean 1997alireza#1 ? You already merged it into this PR so https://github.com/llvm/llvm-project/pull/128782/files includes the LoopFuse changes again. The point of Stacked PRs is to not merge them. |
7466249
to
21ef00f
Compare
Reverted the merged PR. Now the loop fusion changes are in another PR #146383. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some bikeshedding left but otherwise the patch looks good
} | ||
|
||
/// inSeparateLoops - Returns true if this level is a separate level, i.e., | ||
/// performed across two separate loop nests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// performed across two separate loop nests. | |
/// performed across two separate loop nests that a treated like a single fused loop. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you write some about how the interface works for methods that take a "Separate" argument? In this example, c and e are considered similar, which are loops 3 and 5, but with Separate==true
, to get the result for the fused loop, one passes just 3 (which could be considered either the loop depth which is the same of loop c and e, or the SrcLevels representing the fused loops). The pre-existing use of "level" which is not actually the loop depth (number of surrounding loops it is nested in) but a loop id in depth-first order is a bit unfortunate. Some disambiguation here may help.
[not a change request] Instead of "Separate" I think I would have preferred using something "like AssumeFusedLoops" consistently throughout, but not something that I would required for this review. "Separate" does not convey a lot of meaning, it could be loops from different functions. For methods that don't even make sense for non-fused loop such as getDistance
(hence level
should rather mean depth), the parameter isn't strictly nessary, a client that doesn't know about assumed-fused loops would not call it with argument beyond getLevels() anyways.
What is relevant to DA is starting value and final value of induction variable. And the patch makes sure that those values satisfy certain conditions. That is enough to ensure correctness of the results. |
@1997alireza I meant to start a detailed review of the code, but I have been very busy. I will try to start it today or tomorrow. |
I apologize if I made you feel like you had to explain the same thing multiple times. However, your last comment made me realize the following new points.
This is exactly the part where I cannot be confident. I'm not sure whether it really doesn’t matter how the backedge-taken counts are computed. But maybe I'm just overthinking (which is why I called myself "nitpicky" in the previous comment), and this might actually be irrelevant. So, sorry to have made you explain at such length, but I personally still cannot feel fully confident about correctness. That said, I’ll refrain from further comments on correctness. If someone else reviews this discussion and believes it is correct, they will probably approve this PR. Thanks again for taking the time to explain it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me leave other nitpicky comments.
@@ -0,0 +1,345 @@ | |||
; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 5 | |||
; RUN: opt < %s -disable-output "-passes=print<da>" -aa-pipeline=basic-aa 2>&1 \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think -aa-pipeline=...
is unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
;; } | ||
;; for (long int k = 1; k < n+1; k++) { | ||
;; for (long int l = 0; l < n; l++) | ||
;; *B++ = A[i + 4][j + 3][k + 2][l + 1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this comes from another test, but is *B++
really needed here? If not, I think it would be better to replace this with a simpler statement like A[i + 4][j + 3][k + 2][l + 1] = l;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced and removed B.
|
||
for.body9: ; preds = %for.body9.preheader, %for.body9 | ||
%l.02 = phi i64 [ %inc11, %for.body9 ], [ 0, %for.body9.preheader ] | ||
%arrayidx12 = getelementptr inbounds [100 x [100 x [100 x i64]]], ptr %A, i64 %i.013, i64 %j.09, i64 %k.07, i64 %l.02 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may already know this, but the type information of getelementptr
is planned to be removed. That is, the delinearization here will probably fail in the future, which will make the test fail. To minimize the maintenance cost, I think it's better to replace %n
with constant value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
%n is removed.
;; No information with fused loops assumption is gnerated as MIV test is not | ||
;; supported on separate loops yet. | ||
;; for (long int i = 0; i < n; i++) { | ||
;; for (long int j = 0; j < n; j++) { | ||
;; for (long int k = 0; k < n; k++) { | ||
;; for (long int l = 0; l < n; l++) | ||
;; A[i][j][k][l] = i; | ||
;; } | ||
;; for (long int k = 1; k < n+1; k++) { | ||
;; for (long int l = 0; l < n; l++) | ||
;; *B++ = A[i + 4][j + 3][k + 2][k + l]; | ||
|
||
define void @NonFusable0(i64 %n, ptr %A, ptr %B) nounwind uwtable ssp { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add another case where areLoopsSimilar
fails due to the difference between backedge-taken counts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added another case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another case added.
Just to clarify, what I wanted to emphasize was: what happens if the (What I’ve been concerned with is, for example, how the |
Thank you and I appreciate your attention to the quality of the open source compiler. |
LoopInfo only recongizes natural loops. So a branch there cannot have a target that is somewhere in the middle of the loop. Putting that case aside, when you have a branch there, you have one of the two cases: The loop is executed or it is not. DA doesn't care about the case that loop is NOT executed. DA assumes that the loop is executed. (Same thing can be said if the two loops are separate. DA assumes both loops are executed. If one or both of them are not executed then DA might be conservative. But it is still correct). The induction variable I really suggest you make a real example that you can compile and show us that something strange happens. I don't really know what that would be, but it will really help the conversation (and save huge amount of time ) if you have a real example. If you have difficulty creating an example to demostrate your point, may be the point is not valid? |
What I was concerned about is, for example, the following case: godbolt: https://godbolt.org/z/aTjxa44qx ; for (i = 0; i < 9223372036854775806; i++) {
; if (i < 2147483640)
; for (j = 0; j < 2147483640; j++)
; a[i + j * 4294967296] = 0;
;
; for (j = 0; j < 2147483640; j++)
; a[j * 2] = 0;
; }
;
define void @f(ptr %a) {
entry:
br label %loop.i.header
loop.i.header:
%i = phi i64 [ 0 , %entry ], [ %i.next, %loop.i.latch ]
br label %loop.j.0.pr
loop.j.0.pr:
%guard.j.0 = icmp slt i64 %i, 2147483640
br i1 %guard.j.0, label %loop.j.0, label %loop.j.1
loop.j.0:
%j.0 = phi i64 [ 0, %loop.j.0.pr ], [ %j.0.next, %loop.j.0 ]
%val.0 = phi i64 [ %i, %loop.j.0.pr ], [ %val.0.next, %loop.j.0 ]
%j.0.next = add nsw i64 %j.0, 1
%idx.0 = getelementptr inbounds i8, ptr %a, i64 %val.0
store i8 0, ptr %idx.0
%val.0.next = add nsw i64 %val.0, 4294967296
%exitcond.j.0 = icmp eq i64 %j.0.next, 2147483640
br i1 %exitcond.j.0, label %loop.j.1, label %loop.j.0
loop.j.1:
%j.1 = phi i64 [ 0, %loop.j.0 ], [ 0, %loop.j.0.pr ], [ %j.1.next, %loop.j.1 ]
%val.1 = phi i64 [ 0, %loop.j.0 ], [ 0, %loop.j.0.pr ], [ %val.1.next, %loop.j.1 ]
%idx.1 = getelementptr inbounds i8, ptr %a, i64 %val.1
store i8 0, ptr %idx.1
%j.1.next = add nsw i64 %j.1, 1
%val.1.next = add nsw i64 %val.1, 2
%exitcond.j.1 = icmp eq i64 %j.1.next, 2147483640
br i1 %exitcond.j.1, label %loop.i.latch, label %loop.j.1
loop.i.latch:
%i.next = add nsw i64 %i, 1
%exitcond.i = icmp eq i64 %i.next, 9223372036854775806
br i1 %exitcond.i, label %exit, label %loop.i.header
exit:
ret void
} The result of DA:
Here is the SCEV for
The result of DA already seems incorrect. I guess the root cause is that DA currently doesn't take offset ( (While working on this case, I realized that this issue is probably not limited to fusion. Perhaps the analysis should always bail out when such loop guards are present...) Anyway, I really will keep quiet this time about my concern. EDIT: The more I look at it, the more it seems like this example points to a more fundamental bug in the DA, rather than this patch. I sincerely hope I'm misunderstanding something... |
Yes, my first impression is that this example is not related to this patch. I will look into it in more depth and get back to you. Thanks for providing an example. This helps to have a much better conversation. (Even if unrelated to the patch, it is good to know more about the issues in DA). |
@@ -0,0 +1,345 @@ | |||
; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use lowercase for filename: fusable.ll (All test files seem to follow that style).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DA test files starts with uppercase letters.
dumpImp(OS); | ||
unsigned FusableLevels = getFusableLevels(); | ||
if (FusableLevels > 0) { | ||
OS << "! / assuming " << FusableLevels << " fused loop(s): "; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't it be assuming " << FusableLevels << " loop levels fused: ";
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@1997alireza I added a couple of minor cosmetic comments. Will look into it a bit more later today or tomorrow.
unsigned getFusableLevels() const override { return FusableLevels; } | ||
|
||
/// getDVEntry - Returns the DV entry associated with a regular or a | ||
/// fusable level |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a period at the end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Need to do more investigation. But checking the loop guard in DA is almost certainly incorrect. One evidence is that the following loop with no loop guard has the same problem
It is a completely orthogonal issue. The root cause of this particular bug might be in SCEV or in DA (We are looking into it. Please wait). But anyways, not related to this patch. |
I compiled this with -O3 -fno-vectorize. Let me know if you have a problem reproducing it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the code LGTM assuming the issues raised so far is addressed. I will give an explicit approval after comments are addressed. I may look more and add more comments though (but I don't expect anything fundamental)
CommonLevels += FusableLevels; | ||
MaxLevels -= FusableLevels; | ||
if (FusableLevels > 0) { | ||
// Not all tests are handled yet over fusable loops | ||
// Revoke if there are any tests other than ZIV, SIV or RDIV | ||
for (unsigned P = 0; P < Pairs; ++P) { | ||
SmallBitVector Loops; | ||
Subscript::ClassificationKind TestClass = | ||
classifyPair(Pair[P].Src, LI->getLoopFor(Src->getParent()), | ||
Pair[P].Dst, LI->getLoopFor(Dst->getParent()), Loops); | ||
|
||
if (TestClass != Subscript::ZIV && TestClass != Subscript::SIV && | ||
TestClass != Subscript::RDIV) { | ||
// Revert the levels to not consider the fusable levels | ||
CommonLevels -= FusableLevels; | ||
MaxLevels += FusableLevels; | ||
FusableLevels = 0; | ||
break; | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need to increment commonLevels and then decrement it. Can you write the code so that it is only incremented when it should be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
classifyPair uses CommonLevels and MaxLevels. To be able to identify the dependency types considering the fusable levels, the value of levels must be modified before calling classifyPair.
if (FusableLevels > 0) { | ||
Result.Consistent = false; | ||
FusableLoopsConsidered++; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other fields of Result is updated later in line 4135. Also there is no use of FusableLoopsConsidered
between here and there. Is there a reason that this code is separate from the other conditional on FusableLevels > 0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relocated the Result update line.
|
||
// Returns true if two loops are the same or they have the same tripcount and | ||
// depth | ||
bool DependenceInfo::areLoopsSimilar(const Loop *SrcLoop, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you have a notion of FusableLoops
everywhere in the code. What are the conditions for Fusable
? I believe they are part of the conditions checked here. So
- Separate conditions for fusability and name them properly in a different function.
Similar
is vague and general. Use a more specific name.Fuseable
is not a good name. We need to find a better name that better reflects the conditions that you check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the terminology by defining a new term SIDLevels
, which indicates the levels with the Same Iteration and Depth.
All required conditions to consider two loops as SID are in function isSID
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. You still need to change your commit message. Write your commit message so that someone who doesn't know anything about your patch understands it. (e.g. Don't use acronyms without first introducing them in the commit messag).
Also change "Same Iteration" to "Same Bounds" or "Same Iteration Space". Make sure the acronym is introduced explicitly and clearly in the comments of the code as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified the commit message as well.
I changed the acronym to "SameSD" standing for same iteration space and depth.
81d0300
to
b06447f
Compare
SameSD stands for Same iteration Space and Depth indicating the loops that are in the same nesting depth and have the same backedge count. When there is a dependency between two memory instructions in separate loops that have the same iteration space and depth, SIV will be able to test them and compute the direction and the distance of the dependency.
b06447f
to
b41a42f
Compare
(In the hope of saving your time) It is known that DA has several issues (some of which you may find in GitHub issues), such as the following:
These issues violate the assumptions of the underlying algorithm, which can lead to incorrect results. Your example I apologize for my bad communication (the more I looked into DA, the more bugs I found, so I became overly nervous...). However, I found the discussion very meaningful, and I sincerely appreciate it. Thank you. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I’d prefer to avoid adding new features to an already broken pass, but the code itself LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. LGTM as well.
No worries. Let's continue the work and we can also help to improve the quality of DA. I didn't know about your issue, and I opened a new one. We can close it if it is a duplicate. We can continue the discussion on the issue pages. But I strongly disagree with checking the loop guards. I will add a more detailed comment on one of the issue pages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I’d prefer to avoid adding new features to an already broken pass, but the code itself LGTM.
FWIW I agree, increasing the complexity before fixing the soundness problem makes it even more difficult to enable any pass depending on DA
I suggest this kind of comment is made when RFC is posted or in the beginning of code review. I think it is also important that development is not a frustrating experience for the people invovled. But going forward, we are OK with focusing on correctness issues of DA. We are also going to help in that direction. |
When there is a dependency between two memory instructions in separate loops that have the same iteration space and depth, SIV will be able to test them and compute the direction and the distance of the dependency.