-
Notifications
You must be signed in to change notification settings - Fork 14.8k
[LV] Compute SCEV for memcheck before unlinking #160326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
When generating runtime memory checks for outer loops, we split blocks and later unlink them, making the memcheck block unreachable. For instructions in unreachable blocks, ScalarEvolution returns an unknown/poison SCEV, which is treated as a constant and thus loop-invariant. The cost model then assumes the check can be hoisted and underestimates its cost. Set OuterLoop early in GeneratedRTChecks::create and compute the SCEV for MemRuntimeCheckCond before unlinking, so getCost() sees the cached expression rather than a poison constant.
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-vectorizers Author: Igor Kirillov (igogo-x86) ChangesWhen generating runtime memory checks for outer loops, we split blocks and later unlink them, making the memcheck block unreachable. For instructions in unreachable blocks, ScalarEvolution returns an unknown/poison SCEV, which is treated as a constant and thus loop-invariant. The cost model then assumes the check can be hoisted and underestimates its cost. See this code in
Set OuterLoop early in GeneratedRTChecks::create and compute the SCEV for MemRuntimeCheckCond before unlinking, so getCost() sees the cached expression rather than a poison constant. Full diff: https://github.com/llvm/llvm-project/pull/160326.diff 1 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index ca092dcfcb492..575f45b051cf6 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -1807,6 +1807,10 @@ class GeneratedRTChecks {
BasicBlock *LoopHeader = L->getHeader();
BasicBlock *Preheader = L->getLoopPreheader();
+ // Outer loop is used as part of later cost calculations (e.g. to
+ // determine if runtime checks are loop-invariant and can be hoisted).
+ OuterLoop = L->getParentLoop();
+
// Use SplitBlock to create blocks for SCEV & memory runtime checks to
// ensure the blocks are properly added to LoopInfo & DominatorTree. Those
// may be used by SCEVExpander. The blocks will be un-linked from their
@@ -1850,6 +1854,11 @@ class GeneratedRTChecks {
assert(MemRuntimeCheckCond &&
"no RT checks generated although RtPtrChecking "
"claimed checks are required");
+ // Compute SCEV while the block is reachable.
+ // After unlinking, SCEV returns unknown/poison (constant -> invariant),
+ // which makes getCost() wrongly discount hoisted checks.
+ if (OuterLoop)
+ PSE.getSE()->getSCEV(MemRuntimeCheckCond);
}
SCEVExp.eraseDeadInstructions(SCEVCheckCond);
@@ -1889,8 +1898,6 @@ class GeneratedRTChecks {
LI->removeBlock(SCEVCheckBlock);
}
- // Outer loop is used as part of the later cost calculations.
- OuterLoop = L->getParentLoop();
}
InstructionCost getCost() {
|
@david-arm Also worth checking if this patch doesn't open up the regression fixed by #160326 |
✅ With the latest revision this PR passed the C/C++ code formatter. |
Isn't #160326 this PR, which hasn't landed yet? |
Can you add a test case please? |
Ah, yes indeed - it was #76034. I added a test and it showed that my approach was also incorrect, so I had to go deeper and ask pointers for invariance |
|
||
/// Add code that checks at runtime if the accessed arrays in \p PointerChecks | ||
/// overlap. Returns the final comparator value or NULL if no check is needed. | ||
/// If \p HoistRuntimeChecks and \p TheLoop has a parent, sets \p |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Perhaps this should be something like:
/// If \p HoistRuntimeChecks is true and \p TheLoop has a parent, then \p HoistRuntimeChecks
/// is set to true when all checks are outer-loop invariant, i.e. hoistable,
/// or false otherwise.
What do you think?
Value *MemRuntimeCheckCond = nullptr; | ||
|
||
/// True if memory checks are outer-loop invariant (hoistable). | ||
/// Used to discount check cost for inner loops. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Perhaps clearer written as 'Used to discount the cost of performing runtime checks for inner loops'?
IsConflict = ChkBuilder.CreateOr(IsConflict, IsNegativeStride); | ||
} | ||
|
||
if (AllChecksHoisted) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure all these checks are necessary. Can't you just do a single check at the very end before we return from the function, i.e.
auto *SE = Exp.getSE();
auto *OuterLoop = TheLoop->getParentLoop();
AllChecksHoisted = false;
if (HoistRuntimeChecks && OuterLoop != nullptr)
AllChecksHoisted = SE->isLoopInvariant(SE->getSCEV(MemoryRuntimeCheck));
Exp.eraseDeadInstructions(MemoryRuntimeCheck);
return MemoryRuntimeCheck;
const SmallVectorImpl<RuntimePointerCheck> &PointerChecks, | ||
SCEVExpander &Expander, bool HoistRuntimeChecks = false); | ||
SCEVExpander &Expander, bool HoistRuntimeChecks, | ||
bool &AllChecksHoisted); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible/better to return this together with the runtime check using std::pair<> or somerhing like that?
} | ||
|
||
if (AllChecksHoisted) { | ||
AllChecksHoisted &= SE->isLoopInvariant(SE->getSCEV(A.Start), OuterLoop); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid going from IR value -> SCEV and just check the entries in PointerChecks
for invariance?
Constructing SCEVs here for values that may be removed later may leave dangling IR value entries in the SCEV expression cache?
When generating runtime memory checks for outer loops, we split blocks and later unlink them, making the memcheck block unreachable. For instructions in unreachable blocks, ScalarEvolution returns an unknown/poison SCEV, which is treated as a constant and thus loop-invariant. The cost model then assumes the check can be hoisted and underestimates its cost. See this code in
GeneratedRTChecks::getCost
:Set OuterLoop early in GeneratedRTChecks::create and compute the SCEV for MemRuntimeCheckCond before unlinking, so getCost() sees the cached expression rather than a poison constant.