Skip to content

Conversation

igogo-x86
Copy link
Contributor

When generating runtime memory checks for outer loops, we split blocks and later unlink them, making the memcheck block unreachable. For instructions in unreachable blocks, ScalarEvolution returns an unknown/poison SCEV, which is treated as a constant and thus loop-invariant. The cost model then assumes the check can be hoisted and underestimates its cost. See this code in GeneratedRTChecks::getCost:

        const SCEV *Cond = SE->getSCEV(MemRuntimeCheckCond);
        if (SE->isLoopInvariant(Cond, OuterLoop)) {

Set OuterLoop early in GeneratedRTChecks::create and compute the SCEV for MemRuntimeCheckCond before unlinking, so getCost() sees the cached expression rather than a poison constant.

When generating runtime memory checks for outer loops, we split
blocks and later unlink them, making the memcheck block unreachable.
For instructions in unreachable blocks, ScalarEvolution returns an
unknown/poison SCEV, which is treated as a constant and thus
loop-invariant. The cost model then assumes the check can be hoisted
and underestimates its cost.

Set OuterLoop early in GeneratedRTChecks::create and compute the
SCEV for MemRuntimeCheckCond before unlinking, so getCost() sees the
cached expression rather than a poison constant.
@llvmbot
Copy link
Member

llvmbot commented Sep 23, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-vectorizers

Author: Igor Kirillov (igogo-x86)

Changes

When generating runtime memory checks for outer loops, we split blocks and later unlink them, making the memcheck block unreachable. For instructions in unreachable blocks, ScalarEvolution returns an unknown/poison SCEV, which is treated as a constant and thus loop-invariant. The cost model then assumes the check can be hoisted and underestimates its cost. See this code in GeneratedRTChecks::getCost:

        const SCEV *Cond = SE->getSCEV(MemRuntimeCheckCond);
        if (SE->isLoopInvariant(Cond, OuterLoop)) {

Set OuterLoop early in GeneratedRTChecks::create and compute the SCEV for MemRuntimeCheckCond before unlinking, so getCost() sees the cached expression rather than a poison constant.


Full diff: https://github.com/llvm/llvm-project/pull/160326.diff

1 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+9-2)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index ca092dcfcb492..575f45b051cf6 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -1807,6 +1807,10 @@ class GeneratedRTChecks {
     BasicBlock *LoopHeader = L->getHeader();
     BasicBlock *Preheader = L->getLoopPreheader();
 
+    // Outer loop is used as part of later cost calculations (e.g. to
+    // determine if runtime checks are loop-invariant and can be hoisted).
+    OuterLoop = L->getParentLoop();
+
     // Use SplitBlock to create blocks for SCEV & memory runtime checks to
     // ensure the blocks are properly added to LoopInfo & DominatorTree. Those
     // may be used by SCEVExpander. The blocks will be un-linked from their
@@ -1850,6 +1854,11 @@ class GeneratedRTChecks {
       assert(MemRuntimeCheckCond &&
              "no RT checks generated although RtPtrChecking "
              "claimed checks are required");
+      // Compute SCEV while the block is reachable.
+      // After unlinking, SCEV returns unknown/poison (constant -> invariant),
+      // which makes getCost() wrongly discount hoisted checks.
+      if (OuterLoop)
+        PSE.getSE()->getSCEV(MemRuntimeCheckCond);
     }
 
     SCEVExp.eraseDeadInstructions(SCEVCheckCond);
@@ -1889,8 +1898,6 @@ class GeneratedRTChecks {
       LI->removeBlock(SCEVCheckBlock);
     }
 
-    // Outer loop is used as part of the later cost calculations.
-    OuterLoop = L->getParentLoop();
   }
 
   InstructionCost getCost() {

@igogo-x86
Copy link
Contributor Author

@david-arm Also worth checking if this patch doesn't open up the regression fixed by #160326

Copy link

github-actions bot commented Sep 23, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@david-arm
Copy link
Contributor

@david-arm Also worth checking if this patch doesn't open up the regression fixed by #160326

Isn't #160326 this PR, which hasn't landed yet?

@david-arm
Copy link
Contributor

Can you add a test case please?

@igogo-x86
Copy link
Contributor Author

Ah, yes indeed - it was #76034. I added a test and it showed that my approach was also incorrect, so I had to go deeper and ask pointers for invariance


/// Add code that checks at runtime if the accessed arrays in \p PointerChecks
/// overlap. Returns the final comparator value or NULL if no check is needed.
/// If \p HoistRuntimeChecks and \p TheLoop has a parent, sets \p
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Perhaps this should be something like:

  /// If \p HoistRuntimeChecks is true and \p TheLoop has a parent, then \p HoistRuntimeChecks
  /// is set to true when all checks are outer-loop invariant, i.e. hoistable,
  /// or false otherwise.

What do you think?

Value *MemRuntimeCheckCond = nullptr;

/// True if memory checks are outer-loop invariant (hoistable).
/// Used to discount check cost for inner loops.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Perhaps clearer written as 'Used to discount the cost of performing runtime checks for inner loops'?

IsConflict = ChkBuilder.CreateOr(IsConflict, IsNegativeStride);
}

if (AllChecksHoisted) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure all these checks are necessary. Can't you just do a single check at the very end before we return from the function, i.e.

  auto *SE = Exp.getSE();
  auto *OuterLoop = TheLoop->getParentLoop();
  AllChecksHoisted = false;
  if (HoistRuntimeChecks && OuterLoop != nullptr)
    AllChecksHoisted = SE->isLoopInvariant(SE->getSCEV(MemoryRuntimeCheck));

  Exp.eraseDeadInstructions(MemoryRuntimeCheck);
  return MemoryRuntimeCheck;

const SmallVectorImpl<RuntimePointerCheck> &PointerChecks,
SCEVExpander &Expander, bool HoistRuntimeChecks = false);
SCEVExpander &Expander, bool HoistRuntimeChecks,
bool &AllChecksHoisted);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible/better to return this together with the runtime check using std::pair<> or somerhing like that?

}

if (AllChecksHoisted) {
AllChecksHoisted &= SE->isLoopInvariant(SE->getSCEV(A.Start), OuterLoop);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid going from IR value -> SCEV and just check the entries in PointerChecks for invariance?

Constructing SCEVs here for values that may be removed later may leave dangling IR value entries in the SCEV expression cache?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants