Skip to content

Conversation

@fiigii
Copy link
Contributor

@fiigii fiigii commented Nov 26, 2025

This PR implements parts of #162376

  • Broader equivalence than constant index deltas:
    • Add Base-delta and Stride-delta matching for Add and GEP forms using ScalarEvolution deltas.
    • Reuse enabled for both constant and variable deltas when an available IR value dominates the user.
  • Dominance-aware dictionary instead of linear scans:
    • Tuple-keyed candidate dictionary grouped by basic block.
    • Walk the immediate-dominator chain to find the nearest dominating basis quickly and deterministically.
  • Simple cost model and best-rewrite selection:
    • Score candidate expressions and rewrites; select the highest-profit rewrite per instruction.
    • Skip rewriting when expressions are already foldable or high-efficiency.
  • Path compression for better ILP:
    • Compress chains of rewrites to a deeper dominating basis when a constant delta exists along the path, reducing dependent bumps on critical paths.
  • Dependency-aware rewrite ordering:
    • Build a dependency graph (basis, stride, variable delta producers) and rewrite in topological order.
    • This dependency graph will be needed by the next PR that adds partial strength reduction.
  • Correctness enhencment
    • Fix a correctness issue that reusing instructions with the same SCEV may introduce poison.

fiigii and others added 2 commits November 25, 2025 17:33
This PR implements parts of
llvm#162376

- **Broader equivalence than constant index deltas**:
- Add Base-delta and Stride-delta matching for Add and GEP forms using
ScalarEvolution deltas.
- Reuse enabled for both constant and variable deltas when an available
IR value dominates the user.
- **Dominance-aware dictionary instead of linear scans**:
  - Tuple-keyed candidate dictionary grouped by basic block.
- Walk the immediate-dominator chain to find the nearest dominating
basis quickly and deterministically.
- **Simple cost model and best-rewrite selection**:
- Score candidate expressions and rewrites; select the highest-profit
rewrite per instruction.
- Skip rewriting when expressions are already foldable or
high-efficiency.
- **Path compression for better ILP**:
- Compress chains of rewrites to a deeper dominating basis when a
constant delta exists along the path, reducing dependent bumps on
critical paths.
- **Dependency-aware rewrite ordering**:
- Build a dependency graph (basis, stride, variable delta producers) and
rewrite in topological order.
- This dependency graph will be needed by the next PR that adds partial
strength reduction.
@llvmbot
Copy link
Member

llvmbot commented Nov 26, 2025

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-backend-nvptx

@llvm/pr-subscribers-backend-amdgpu

Author: Fei Peng (fiigii)

Changes

This PR implements parts of #162376

  • Broader equivalence than constant index deltas:
    • Add Base-delta and Stride-delta matching for Add and GEP forms using ScalarEvolution deltas.
    • Reuse enabled for both constant and variable deltas when an available IR value dominates the user.
  • Dominance-aware dictionary instead of linear scans:
    • Tuple-keyed candidate dictionary grouped by basic block.
    • Walk the immediate-dominator chain to find the nearest dominating basis quickly and deterministically.
  • Simple cost model and best-rewrite selection:
    • Score candidate expressions and rewrites; select the highest-profit rewrite per instruction.
    • Skip rewriting when expressions are already foldable or high-efficiency.
  • Path compression for better ILP:
    • Compress chains of rewrites to a deeper dominating basis when a constant delta exists along the path, reducing dependent bumps on critical paths.
  • Dependency-aware rewrite ordering:
    • Build a dependency graph (basis, stride, variable delta producers) and rewrite in topological order.
    • This dependency graph will be needed by the next PR that adds partial strength reduction.
  • Correctness enhencment
    • Fix a correctness issue that reusing instructions with the same SCEV may introduce poison.

Patch is 181.42 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/169614.diff

18 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp (+864-276)
  • (modified) llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll (+9-11)
  • (modified) llvm/test/CodeGen/AMDGPU/dagcombine-reassociate-bug.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/idot2.ll (+6-6)
  • (modified) llvm/test/CodeGen/AMDGPU/idot4s.ll (+82-79)
  • (modified) llvm/test/CodeGen/AMDGPU/idot8u.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll (+233-234)
  • (modified) llvm/test/CodeGen/AMDGPU/splitkit-getsubrangeformask.ll (+9-9)
  • (modified) llvm/test/CodeGen/AMDGPU/waitcnt-vscnt.ll (+133-196)
  • (modified) llvm/test/Transforms/StraightLineStrengthReduce/AMDGPU/pr23975.ll (+1-1)
  • (modified) llvm/test/Transforms/StraightLineStrengthReduce/AMDGPU/reassociate-geps-and-slsr-addrspace.ll (+5-5)
  • (added) llvm/test/Transforms/StraightLineStrengthReduce/NVPTX/slsr-i8-gep.ll (+271)
  • (added) llvm/test/Transforms/StraightLineStrengthReduce/NVPTX/slsr-invalid.ll (+141)
  • (added) llvm/test/Transforms/StraightLineStrengthReduce/NVPTX/slsr-var-delta.ll (+70)
  • (added) llvm/test/Transforms/StraightLineStrengthReduce/path-compression.ll (+35)
  • (added) llvm/test/Transforms/StraightLineStrengthReduce/pick-candidate.ll (+32)
  • (modified) llvm/test/Transforms/StraightLineStrengthReduce/slsr-add.ll (+120)
  • (modified) llvm/test/Transforms/StraightLineStrengthReduce/slsr-gep.ll (+156-7)
diff --git a/llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp b/llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
index e94ad1999e32a..57d9bf5aed90b 100644
--- a/llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
+++ b/llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
@@ -12,17 +12,16 @@
 // effective in simplifying arithmetic statements derived from an unrolled loop.
 // It can also simplify the logic of SeparateConstOffsetFromGEP.
 //
-// There are many optimizations we can perform in the domain of SLSR. This file
-// for now contains only an initial step. Specifically, we look for strength
-// reduction candidates in the following forms:
+// There are many optimizations we can perform in the domain of SLSR.
+// We look for strength reduction candidates in the following forms:
 //
-// Form 1: B + i * S
-// Form 2: (B + i) * S
-// Form 3: &B[i * S]
+// Form Add: B + i * S
+// Form Mul: (B + i) * S
+// Form GEP: &B[i * S]
 //
 // where S is an integer variable, and i is a constant integer. If we found two
 // candidates S1 and S2 in the same form and S1 dominates S2, we may rewrite S2
-// in a simpler way with respect to S1. For example,
+// in a simpler way with respect to S1 (index delta). For example,
 //
 // S1: X = B + i * S
 // S2: Y = B + i' * S   => X + (i' - i) * S
@@ -35,8 +34,29 @@
 //
 // Note: (i' - i) * S is folded to the extent possible.
 //
+// For Add and GEP forms, we can also rewrite a candidate in a simpler way
+// with respect to other dominating candidates if their B or S are different
+// but other parts are the same. For example,
+//
+// Base Delta:
+// S1: X = B  + i * S
+// S2: Y = B' + i * S   => X + (B' - B)
+//
+// S1: X = &B [i * S]
+// S2: Y = &B'[i * S]   => X + (B' - B)
+//
+// Stride Delta:
+// S1: X = B + i * S
+// S2: Y = B + i * S'   => X + i * (S' - S)
+//
+// S1: X = &B[i * S]
+// S2: Y = &B[i * S']   => X + i * (S' - S)
+//
+// PS: Stride delta rewrite on Mul form is usually non-profitable, and Base
+// delta rewrite sometimes is profitable, so we do not support them on Mul.
+//
 // This rewriting is in general a good idea. The code patterns we focus on
-// usually come from loop unrolling, so (i' - i) * S is likely the same
+// usually come from loop unrolling, so the delta is likely the same
 // across iterations and can be reused. When that happens, the optimized form
 // takes only one add starting from the second iteration.
 //
@@ -47,19 +67,14 @@
 // TODO:
 //
 // - Floating point arithmetics when fast math is enabled.
-//
-// - SLSR may decrease ILP at the architecture level. Targets that are very
-//   sensitive to ILP may want to disable it. Having SLSR to consider ILP is
-//   left as future work.
-//
-// - When (i' - i) is constant but i and i' are not, we could still perform
-//   SLSR.
 
 #include "llvm/Transforms/Scalar/StraightLineStrengthReduce.h"
 #include "llvm/ADT/APInt.h"
 #include "llvm/ADT/DepthFirstIterator.h"
+#include "llvm/ADT/SetVector.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/Analysis/ScalarEvolution.h"
+#include "llvm/Analysis/ScalarEvolutionExpressions.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/Analysis/ValueTracking.h"
 #include "llvm/IR/Constants.h"
@@ -86,16 +101,19 @@
 #include <cstdint>
 #include <limits>
 #include <list>
+#include <queue>
 #include <vector>
 
 using namespace llvm;
 using namespace PatternMatch;
 
+#define DEBUG_TYPE "slsr"
+
 static const unsigned UnknownAddressSpace =
     std::numeric_limits<unsigned>::max();
 
 DEBUG_COUNTER(StraightLineStrengthReduceCounter, "slsr-counter",
-              "Controls whether rewriteCandidateWithBasis is executed.");
+              "Controls whether rewriteCandidate is executed.");
 
 namespace {
 
@@ -142,15 +160,23 @@ class StraightLineStrengthReduce {
       GEP,     // &B[..][i * S][..]
     };
 
+    enum DKind {
+      InvalidDelta, // reserved for the default constructor
+      IndexDelta,   // Delta is a constant from Index
+      BaseDelta,    // Delta is a constant or variable from Base
+      StrideDelta,  // Delta is a constant or variable from Stride
+    };
+
     Candidate() = default;
     Candidate(Kind CT, const SCEV *B, ConstantInt *Idx, Value *S,
-              Instruction *I)
-        : CandidateKind(CT), Base(B), Index(Idx), Stride(S), Ins(I) {}
+              Instruction *I, const SCEV *StrideSCEV)
+        : CandidateKind(CT), Base(B), Index(Idx), Stride(S), Ins(I),
+          StrideSCEV(StrideSCEV) {}
 
     Kind CandidateKind = Invalid;
 
     const SCEV *Base = nullptr;
-
+    // TODO: Swap Index and Stride's name.
     // Note that Index and Stride of a GEP candidate do not necessarily have the
     // same integer type. In that case, during rewriting, Stride will be
     // sign-extended or truncated to Index's type.
@@ -177,22 +203,164 @@ class StraightLineStrengthReduce {
     // Points to the immediate basis of this candidate, or nullptr if we cannot
     // find any basis for this candidate.
     Candidate *Basis = nullptr;
+
+    DKind DeltaKind = InvalidDelta;
+
+    // Store SCEV of Stride to compute delta from different strides
+    const SCEV *StrideSCEV = nullptr;
+
+    // Points to (Y - X) that will be used to rewrite this candidate.
+    Value *Delta = nullptr;
+
+    /// Cost model: Evaluate the computational efficiency of the candidate.
+    ///
+    /// Efficiency levels (higher is better):
+    ///   ZeroInst (5) - [Variable] or [Const]
+    ///   OneInstOneVar (4) - [Variable + Const] or [Variable * Const]
+    ///   OneInstTwoVar (3) - [Variable + Variable] or [Variable * Variable]
+    ///   TwoInstOneVar (2) - [Const + Const * Variable]
+    ///   TwoInstTwoVar (1) - [Variable + Const * Variable]
+    enum EfficiencyLevel : unsigned {
+      Unknown = 0,
+      TwoInstTwoVar = 1,
+      TwoInstOneVar = 2,
+      OneInstTwoVar = 3,
+      OneInstOneVar = 4,
+      ZeroInst = 5
+    };
+
+    static EfficiencyLevel
+    getComputationEfficiency(Kind CandidateKind, const ConstantInt *Index,
+                             const Value *Stride, const SCEV *Base = nullptr) {
+      bool IsConstantBase = false;
+      bool IsZeroBase = false;
+      // When evaluating the efficiency of a rewrite, if the Base's SCEV is
+      // not available, conservatively assume the base is not constant.
+      if (auto *ConstBase = dyn_cast_or_null<SCEVConstant>(Base)) {
+        IsConstantBase = true;
+        IsZeroBase = ConstBase->getValue()->isZero();
+      }
+
+      bool IsConstantStride = isa<ConstantInt>(Stride);
+      bool IsZeroStride =
+          IsConstantStride && cast<ConstantInt>(Stride)->isZero();
+      // All constants
+      if (IsConstantBase && IsConstantStride)
+        return ZeroInst;
+
+      // (Base + Index) * Stride
+      if (CandidateKind == Mul) {
+        if (IsZeroStride)
+          return ZeroInst;
+        if (Index->isZero())
+          return (IsConstantStride || IsConstantBase) ? OneInstOneVar
+                                                      : OneInstTwoVar;
+
+        if (IsConstantBase)
+          return IsZeroBase && (Index->isOne() || Index->isMinusOne())
+                     ? ZeroInst
+                     : OneInstOneVar;
+
+        if (IsConstantStride) {
+          auto *CI = cast<ConstantInt>(Stride);
+          return (CI->isOne() || CI->isMinusOne()) ? OneInstOneVar
+                                                   : TwoInstOneVar;
+        }
+        return TwoInstTwoVar;
+      }
+
+      // Base + Index * Stride
+      assert(CandidateKind == Add || CandidateKind == GEP);
+      if (Index->isZero() || IsZeroStride)
+        return ZeroInst;
+
+      bool IsSimpleIndex = Index->isOne() || Index->isMinusOne();
+
+      if (IsConstantBase)
+        return IsZeroBase ? (IsSimpleIndex ? ZeroInst : OneInstOneVar)
+                          : (IsSimpleIndex ? OneInstOneVar : TwoInstOneVar);
+
+      if (IsConstantStride)
+        return IsZeroStride ? ZeroInst : OneInstOneVar;
+
+      if (IsSimpleIndex)
+        return OneInstTwoVar;
+
+      return TwoInstTwoVar;
+    }
+
+    // Evaluate if the given delta is profitable to rewrite this candidate.
+    bool isProfitableRewrite(const Value *Delta, const DKind DeltaKind) const {
+      // This function cannot accurately evaluate the profit of whole expression
+      // with context. A candidate (B + I * S) cannot express whether this
+      // instruction needs to compute on its own (I * S), which may be shared
+      // with other candidates or may need instructions to compute.
+      // If the rewritten form has the same strength, still rewrite to
+      // (X + Delta) since it may expose more CSE opportunities on Delta, as
+      // unrolled loops usually have identical Delta for each unrolled body.
+      //
+      // Note, this function should only be used on Index Delta rewrite.
+      // Base and Stride delta need context info to evaluate the register
+      // pressure impact from variable delta.
+      return getComputationEfficiency(CandidateKind, Index, Stride, Base) <=
+             getRewriteEfficiency(Delta, DeltaKind);
+    }
+
+    // Evaluate the rewrite efficiency of this candidate with its Basis
+    EfficiencyLevel getRewriteEfficiency() const {
+      return Basis ? getRewriteEfficiency(Delta, DeltaKind) : Unknown;
+    }
+
+    // Evaluate the rewrite efficiency of this candidate with a given delta
+    EfficiencyLevel getRewriteEfficiency(const Value *Delta,
+                                         const DKind DeltaKind) const {
+      switch (DeltaKind) {
+      case BaseDelta: // [X + Delta]
+        return getComputationEfficiency(
+            CandidateKind,
+            ConstantInt::get(cast<IntegerType>(Delta->getType()), 1), Delta);
+      case StrideDelta: // [X + Index * Delta]
+        return getComputationEfficiency(CandidateKind, Index, Delta);
+      case IndexDelta: // [X + Delta * Stride]
+        return getComputationEfficiency(CandidateKind, cast<ConstantInt>(Delta),
+                                        Stride);
+      default:
+        return Unknown;
+      }
+    }
+
+    bool isHighEfficiency() const {
+      return getComputationEfficiency(CandidateKind, Index, Stride, Base) >=
+             OneInstOneVar;
+    }
+
+    // Verify that this candidate has valid delta components relative to the
+    // basis
+    bool hasValidDelta(const Candidate &Basis) const {
+      switch (DeltaKind) {
+      case IndexDelta:
+        // Index differs, Base and Stride must match
+        return Base == Basis.Base && StrideSCEV == Basis.StrideSCEV;
+      case StrideDelta:
+        // Stride differs, Base and Index must match
+        return Base == Basis.Base && Index == Basis.Index;
+      case BaseDelta:
+        // Base differs, Stride and Index must match
+        return StrideSCEV == Basis.StrideSCEV && Index == Basis.Index;
+      default:
+        return false;
+      }
+    }
   };
 
   bool runOnFunction(Function &F);
 
 private:
-  // Returns true if Basis is a basis for C, i.e., Basis dominates C and they
-  // share the same base and stride.
-  bool isBasisFor(const Candidate &Basis, const Candidate &C);
-
+  // Fetch straight-line basis for rewriting C, update C.Basis to point to it,
+  // and store the delta between C and its Basis in C.Delta.
+  void setBasisAndDeltaFor(Candidate &C);
   // Returns whether the candidate can be folded into an addressing mode.
-  bool isFoldable(const Candidate &C, TargetTransformInfo *TTI,
-                  const DataLayout *DL);
-
-  // Returns true if C is already in a simplest form and not worth being
-  // rewritten.
-  bool isSimplestForm(const Candidate &C);
+  bool isFoldable(const Candidate &C, TargetTransformInfo *TTI);
 
   // Checks whether I is in a candidate form. If so, adds all the matching forms
   // to Candidates, and tries to find the immediate basis for each of them.
@@ -216,12 +384,6 @@ class StraightLineStrengthReduce {
   // Allocate candidates and find bases for GetElementPtr instructions.
   void allocateCandidatesAndFindBasisForGEP(GetElementPtrInst *GEP);
 
-  // A helper function that scales Idx with ElementSize before invoking
-  // allocateCandidatesAndFindBasis.
-  void allocateCandidatesAndFindBasisForGEP(const SCEV *B, ConstantInt *Idx,
-                                            Value *S, uint64_t ElementSize,
-                                            Instruction *I);
-
   // Adds the given form <CT, B, Idx, S> to Candidates, and finds its immediate
   // basis.
   void allocateCandidatesAndFindBasis(Candidate::Kind CT, const SCEV *B,
@@ -229,13 +391,7 @@ class StraightLineStrengthReduce {
                                       Instruction *I);
 
   // Rewrites candidate C with respect to Basis.
-  void rewriteCandidateWithBasis(const Candidate &C, const Candidate &Basis);
-
-  // A helper function that factors ArrayIdx to a product of a stride and a
-  // constant index, and invokes allocateCandidatesAndFindBasis with the
-  // factorings.
-  void factorArrayIndex(Value *ArrayIdx, const SCEV *Base, uint64_t ElementSize,
-                        GetElementPtrInst *GEP);
+  void rewriteCandidate(const Candidate &C);
 
   // Emit code that computes the "bump" from Basis to C.
   static Value *emitBump(const Candidate &Basis, const Candidate &C,
@@ -247,12 +403,205 @@ class StraightLineStrengthReduce {
   TargetTransformInfo *TTI = nullptr;
   std::list<Candidate> Candidates;
 
-  // Temporarily holds all instructions that are unlinked (but not deleted) by
-  // rewriteCandidateWithBasis. These instructions will be actually removed
-  // after all rewriting finishes.
-  std::vector<Instruction *> UnlinkedInstructions;
+  // Map from SCEV to instructions that represent the value,
+  // instructions are sorted in depth-first order.
+  DenseMap<const SCEV *, SmallSetVector<Instruction *, 2>> SCEVToInsts;
+
+  // Record the dependency between instructions. If C.Basis == B, we would have
+  // {B.Ins -> {C.Ins, ...}}.
+  MapVector<Instruction *, std::vector<Instruction *>> DependencyGraph;
+
+  // Map between each instruction and its possible candidates.
+  DenseMap<Instruction *, SmallVector<Candidate *, 3>> RewriteCandidates;
+
+  // All instructions that have candidates sort in topological order based on
+  // dependency graph, from roots to leaves.
+  std::vector<Instruction *> SortedCandidateInsts;
+
+  // Record all instructions that are already rewritten and will be removed
+  // later.
+  std::vector<Instruction *> DeadInstructions;
+
+  // Classify candidates against Delta kind
+  class CandidateDictTy {
+  public:
+    using CandsTy = SmallVector<Candidate *, 8>;
+    using BBToCandsTy = DenseMap<const BasicBlock *, CandsTy>;
+
+  private:
+    // Index delta Basis must have the same (Base, StrideSCEV, Inst.Type)
+    using IndexDeltaKeyTy = std::tuple<const SCEV *, const SCEV *, Type *>;
+    DenseMap<IndexDeltaKeyTy, BBToCandsTy> IndexDeltaCandidates;
+
+    // Base delta Basis must have the same (StrideSCEV, Index, Inst.Type)
+    using BaseDeltaKeyTy = std::tuple<const SCEV *, ConstantInt *, Type *>;
+    DenseMap<BaseDeltaKeyTy, BBToCandsTy> BaseDeltaCandidates;
+
+    // Stride delta Basis must have the same (Base, Index, Inst.Type)
+    using StrideDeltaKeyTy = std::tuple<const SCEV *, ConstantInt *, Type *>;
+    DenseMap<StrideDeltaKeyTy, BBToCandsTy> StrideDeltaCandidates;
+
+  public:
+    // TODO: Disable index delta on GEP after we completely move
+    // from typed GEP to PtrAdd.
+    const BBToCandsTy *getCandidatesWithDeltaKind(const Candidate &C,
+                                                  Candidate::DKind K) const {
+      assert(K != Candidate::InvalidDelta);
+      if (K == Candidate::IndexDelta) {
+        IndexDeltaKeyTy IndexDeltaKey(C.Base, C.StrideSCEV, C.Ins->getType());
+        auto It = IndexDeltaCandidates.find(IndexDeltaKey);
+        if (It != IndexDeltaCandidates.end())
+          return &It->second;
+      } else if (K == Candidate::BaseDelta) {
+        BaseDeltaKeyTy BaseDeltaKey(C.StrideSCEV, C.Index, C.Ins->getType());
+        auto It = BaseDeltaCandidates.find(BaseDeltaKey);
+        if (It != BaseDeltaCandidates.end())
+          return &It->second;
+      } else {
+        assert(K == Candidate::StrideDelta);
+        StrideDeltaKeyTy StrideDeltaKey(C.Base, C.Index, C.Ins->getType());
+        auto It = StrideDeltaCandidates.find(StrideDeltaKey);
+        if (It != StrideDeltaCandidates.end())
+          return &It->second;
+      }
+      return nullptr;
+    }
+
+    // Pointers to C must remain valid until CandidateDict is cleared.
+    void add(Candidate &C) {
+      Type *ValueType = C.Ins->getType();
+      BasicBlock *BB = C.Ins->getParent();
+      IndexDeltaKeyTy IndexDeltaKey(C.Base, C.StrideSCEV, ValueType);
+      BaseDeltaKeyTy BaseDeltaKey(C.StrideSCEV, C.Index, ValueType);
+      StrideDeltaKeyTy StrideDeltaKey(C.Base, C.Index, ValueType);
+      IndexDeltaCandidates[IndexDeltaKey][BB].push_back(&C);
+      BaseDeltaCandidates[BaseDeltaKey][BB].push_back(&C);
+      StrideDeltaCandidates[StrideDeltaKey][BB].push_back(&C);
+    }
+    // Remove all mappings from set
+    void clear() {
+      IndexDeltaCandidates.clear();
+      BaseDeltaCandidates.clear();
+      StrideDeltaCandidates.clear();
+    }
+  } CandidateDict;
+
+  const SCEV *getAndRecordSCEV(Value *V) {
+    auto *S = SE->getSCEV(V);
+    if (isa<Instruction>(V) && !(isa<SCEVCouldNotCompute>(S) ||
+                                 isa<SCEVUnknown>(S) || isa<SCEVConstant>(S)))
+      SCEVToInsts[S].insert(cast<Instruction>(V));
+
+    return S;
+  }
+
+  // Get the nearest instruction before CI that represents the value of S,
+  // return nullptr if no instruction is associated with S or S is not a
+  // reusable expression.
+  Value *getNearestValueOfSCEV(const SCEV *S, const Instruction *CI) const {
+    if (isa<SCEVCouldNotCompute>(S))
+      return nullptr;
+
+    if (auto *SU = dyn_cast<SCEVUnknown>(S))
+      return SU->getValue();
+    if (auto *SC = dyn_cast<SCEVConstant>(S))
+      return SC->getValue();
+
+    auto It = SCEVToInsts.find(S);
+    if (It == SCEVToInsts.end())
+      return nullptr;
+
+    // Instructions are sorted in depth-first order, so search for the nearest
+    // instruction by walking the list in reverse order.
+    for (Instruction *I : reverse(It->second))
+      if (DT->dominates(I, CI))
+        return I;
+
+    return nullptr;
+  }
+
+  struct DeltaInfo {
+    Candidate *Cand;
+    Candidate::DKind DeltaKind;
+    Value *Delta;
+
+    DeltaInfo()
+        : Cand(nullptr), DeltaKind(Candidate::InvalidDelta), Delta(nullptr) {}
+    DeltaInfo(Candidate *Cand, Candidate::DKind DeltaKind, Value *Delta)
+        : Cand(Cand), DeltaKind(DeltaKind), Delta(Delta) {}
+    operator bool() const { return Cand != nullptr; }
+  };
+
+  friend raw_ostream &operator<<(raw_ostream &OS, const DeltaInfo &DI);
+
+  DeltaInfo compressPath(Candidate &C, Candidate *Basis) const;
+
+  Candidate *pickRewriteCandidate(Instruction *I) const;
+  void sortCandidateInstructions();
+  Value *getDelta(const Candidate &C, const Candidate &Basis,
+                  Candidate::DKind K) const;
+  static bool isSimilar(Candidate &C, Candidate &Basis, Candidate::DKind K);
+
+  // Add Basis -> C in DependencyGraph and propagate
+  // C.Stride and C.Delta's dependency to C
+  void addDependency(Candidate &C, Candidate *Basis) {
+    if (Basis)
+      DependencyGraph[Basis->Ins].emplace_back(C.Ins);
+
+    // If any candidate of Inst has a basis, then Inst will be rewritten,
+    // C must be rewritten after rewriting Inst, so we need to propagate
+    // the dependency to C
+    auto PropagateDependency = [&](Instruction *Inst) {
+      if (auto CandsIt = RewriteCandidates.find(Inst);
+          CandsIt != RewriteCandidates.end() &&
+          llvm::any_of(CandsIt->second,
+                       [](Candidate *Cand) { return Cand->Basis; }))
+        DependencyGraph[Inst].emplace_back(C.Ins);
+    };
+
+    // If C has a variable delta and the delta is a candidate,
+    // propagate its dependency to C
+    if (auto *DeltaInst = dyn_cast_or_null<Instruction>(C.Delta))
+      PropagateDependency(DeltaInst);
+
+    // If the stride is a candidate, propagate its dependency to C
+    if (auto *StrideInst = dyn_cast<Instruction>(C.Stride))
+      PropagateDependency(StrideInst);
+  };
 };
 
+inline raw_ostream &operato...
[truncated]

@github-actions
Copy link

github-actions bot commented Nov 26, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@fiigii
Copy link
Contributor Author

fiigii commented Nov 26, 2025

@akshayrdeodhar @dakersnar @Artem-B @lialan PTAL

@fiigii
Copy link
Contributor Author

fiigii commented Nov 26, 2025

This commit (6519805) solves the correctness issue on top of the original change. It calls SE->canReuseInstruction to check if reusing an instruction may introduce more poison than its SCEV. The commit also does a little refactoring to avoid calling canReuseInstruction at multiple positions.

@github-actions
Copy link

github-actions bot commented Nov 26, 2025

🐧 Linux x64 Test Results

  • 186632 tests passed
  • 4888 tests skipped

fiigii and others added 3 commits November 26, 2025 00:27
This patch fixes:

  llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp:689:45:
  error: lambda capture 'DT' is not used
  [-Werror,-Wunused-lambda-capture]

  llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp:584:1:
  error: unused function 'operator<<' [-Werror,-Wunused-function]
@Artem-B Artem-B self-requested a review November 26, 2025 19:02
Copy link
Member

Artem-B commented Nov 26, 2025

Considering that there appear to be nontrivial changes it's not exactly a re-land. It's more of a new iteration.

Would it be possible for you to re-stack you changes in a way that the original reverted patch is the starting point, so we can see what exactly has changed to fix the problem, so we do not have to go over complete patch of 2K lines of changes, again, and only deal with the incremental changes.

Including reviewers of the original patch would also be useful.

Copy link
Member

Artem-B commented Nov 26, 2025

Would it be possible for you to re-stack you changes in a way that the original reverted patch is the starting point

Hmm. looks like you already did. It's just graphite appears to hide the base, but github shows it:

image.png

image.png

@fiigii
Copy link
Contributor Author

fiigii commented Nov 26, 2025

@Artem-B Yes, please focus on the fix commit. That commit has a unit test to demonstrate the correctness issue.

Copy link
Member

@Artem-B Artem-B left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fiigii
Copy link
Contributor Author

fiigii commented Nov 26, 2025

@dakersnar @Artem-B Thanks for reviewing. After this PR, the SLSR output will become more conservative for correctness, i.e., dropping more inbounds or nsw/nuw. I am concerned that the latter optimizations may be impacted. For example, in open source LLVM's NVPTX target, load-store-vectorization runs after SLSR. More conservative overflow annotations may hurt alias analysis and consecutive analysis (SCEV), and may prevent some vectorization that is one of the most improtant optimizations for GPU. I suggest you give it a try to run LSV a bit early, before SLSR.

Additionally, I believe this pass still has some correctness issues regarding integer overflow, which have existed before my redesign. I will open follow-up PRs to solve them, and we will see more conservative output (fewer inbounds!).

@Artem-B
Copy link
Member

Artem-B commented Nov 27, 2025

If you suspect there may be unintended side effects of this patch, would it be feasible to introduce a command line option to preserve current behavior? If would give users an escape hatch if/when they run into trouble. It would also make A/B comparisons much easier, as it could be done with the same compiler, without having to rebuild it. We could also start with the patch disabled, and enable it only for benchmark runs.

I can give it a try on our benchmarks next week. If we had a command line option to enable/disable the new behavior, it would make it quite a bit easier for me to do so.

Potential perf regression concern also brings another question -- what exactly does this patch buy us? There are correctness improvements, those are always good to have. But is there any data on impact of these changes on performance? Is there an important enough use case where the change improves things substantially? It would be great to have specific data we could use if/when we need to make the call on what we're willing to trade-off if the patch triggers regressions.

@fiigii
Copy link
Contributor Author

fiigii commented Nov 27, 2025

Sure, I will add a command-line flag to disable the correctness guard and set it to false by default.

At the same time, I strongly recommend trying to run LSV earlier. We have always run LSV earlier in the pipeline than upstream LLVM, and its performance and codegen quality have consistently been good.

Update: added the flag -enable-poison-reuse-guard, default to true.

@fiigii
Copy link
Contributor Author

fiigii commented Nov 27, 2025

@lialan would you like to try this version with your project before merging?

@lialan
Copy link
Member

lialan commented Nov 27, 2025

@lialan would you like to try this version with your project before merging?

I will let you know shortly. Thanks for the reminder!

@fiigii
Copy link
Contributor Author

fiigii commented Nov 27, 2025

No rush, enjoy the holiday :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants