Reland "Redesign Straight-Line Strength Reduction (SLSR) (#162930)" #169614

fiigii · 2025-11-26T06:44:44Z

This PR implements parts of #162376

Broader equivalence than constant index deltas:
- Add Base-delta and Stride-delta matching for Add and GEP forms using ScalarEvolution deltas.
- Reuse enabled for both constant and variable deltas when an available IR value dominates the user.
Dominance-aware dictionary instead of linear scans:
- Tuple-keyed candidate dictionary grouped by basic block.
- Walk the immediate-dominator chain to find the nearest dominating basis quickly and deterministically.
Simple cost model and best-rewrite selection:
- Score candidate expressions and rewrites; select the highest-profit rewrite per instruction.
- Skip rewriting when expressions are already foldable or high-efficiency.
Path compression for better ILP:
- Compress chains of rewrites to a deeper dominating basis when a constant delta exists along the path, reducing dependent bumps on critical paths.
Dependency-aware rewrite ordering:
- Build a dependency graph (basis, stride, variable delta producers) and rewrite in topological order.
- This dependency graph will be needed by the next PR that adds partial strength reduction.
Correctness enhencment
- Fix a correctness issue that reusing instructions with the same SCEV may introduce poison.

This PR implements parts of llvm#162376 - **Broader equivalence than constant index deltas**: - Add Base-delta and Stride-delta matching for Add and GEP forms using ScalarEvolution deltas. - Reuse enabled for both constant and variable deltas when an available IR value dominates the user. - **Dominance-aware dictionary instead of linear scans**: - Tuple-keyed candidate dictionary grouped by basic block. - Walk the immediate-dominator chain to find the nearest dominating basis quickly and deterministically. - **Simple cost model and best-rewrite selection**: - Score candidate expressions and rewrites; select the highest-profit rewrite per instruction. - Skip rewriting when expressions are already foldable or high-efficiency. - **Path compression for better ILP**: - Compress chains of rewrites to a deeper dominating basis when a constant delta exists along the path, reducing dependent bumps on critical paths. - **Dependency-aware rewrite ordering**: - Build a dependency graph (basis, stride, variable delta producers) and rewrite in topological order. - This dependency graph will be needed by the next PR that adds partial strength reduction.

llvmbot · 2025-11-26T06:45:15Z

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-backend-nvptx

@llvm/pr-subscribers-backend-amdgpu

Author: Fei Peng (fiigii)

Changes

This PR implements parts of #162376

Broader equivalence than constant index deltas:
- Add Base-delta and Stride-delta matching for Add and GEP forms using ScalarEvolution deltas.
- Reuse enabled for both constant and variable deltas when an available IR value dominates the user.
Dominance-aware dictionary instead of linear scans:
- Tuple-keyed candidate dictionary grouped by basic block.
- Walk the immediate-dominator chain to find the nearest dominating basis quickly and deterministically.
Simple cost model and best-rewrite selection:
- Score candidate expressions and rewrites; select the highest-profit rewrite per instruction.
- Skip rewriting when expressions are already foldable or high-efficiency.
Path compression for better ILP:
- Compress chains of rewrites to a deeper dominating basis when a constant delta exists along the path, reducing dependent bumps on critical paths.
Dependency-aware rewrite ordering:
- Build a dependency graph (basis, stride, variable delta producers) and rewrite in topological order.
- This dependency graph will be needed by the next PR that adds partial strength reduction.
Correctness enhencment
- Fix a correctness issue that reusing instructions with the same SCEV may introduce poison.

Patch is 181.42 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/169614.diff

18 Files Affected:

(modified) llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp (+864-276)
(modified) llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll (+9-11)
(modified) llvm/test/CodeGen/AMDGPU/dagcombine-reassociate-bug.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/idot2.ll (+6-6)
(modified) llvm/test/CodeGen/AMDGPU/idot4s.ll (+82-79)
(modified) llvm/test/CodeGen/AMDGPU/idot8u.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll (+233-234)
(modified) llvm/test/CodeGen/AMDGPU/splitkit-getsubrangeformask.ll (+9-9)
(modified) llvm/test/CodeGen/AMDGPU/waitcnt-vscnt.ll (+133-196)
(modified) llvm/test/Transforms/StraightLineStrengthReduce/AMDGPU/pr23975.ll (+1-1)
(modified) llvm/test/Transforms/StraightLineStrengthReduce/AMDGPU/reassociate-geps-and-slsr-addrspace.ll (+5-5)
(added) llvm/test/Transforms/StraightLineStrengthReduce/NVPTX/slsr-i8-gep.ll (+271)
(added) llvm/test/Transforms/StraightLineStrengthReduce/NVPTX/slsr-invalid.ll (+141)
(added) llvm/test/Transforms/StraightLineStrengthReduce/NVPTX/slsr-var-delta.ll (+70)
(added) llvm/test/Transforms/StraightLineStrengthReduce/path-compression.ll (+35)
(added) llvm/test/Transforms/StraightLineStrengthReduce/pick-candidate.ll (+32)
(modified) llvm/test/Transforms/StraightLineStrengthReduce/slsr-add.ll (+120)
(modified) llvm/test/Transforms/StraightLineStrengthReduce/slsr-gep.ll (+156-7)

diff --git a/llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp b/llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
index e94ad1999e32a..57d9bf5aed90b 100644
--- a/llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
+++ b/llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
@@ -12,17 +12,16 @@
 // effective in simplifying arithmetic statements derived from an unrolled loop.
 // It can also simplify the logic of SeparateConstOffsetFromGEP.
 //
-// There are many optimizations we can perform in the domain of SLSR. This file
-// for now contains only an initial step. Specifically, we look for strength
-// reduction candidates in the following forms:
+// There are many optimizations we can perform in the domain of SLSR.
+// We look for strength reduction candidates in the following forms:
 //
-// Form 1: B + i * S
-// Form 2: (B + i) * S
-// Form 3: &B[i * S]
+// Form Add: B + i * S
+// Form Mul: (B + i) * S
+// Form GEP: &B[i * S]
 //
 // where S is an integer variable, and i is a constant integer. If we found two
 // candidates S1 and S2 in the same form and S1 dominates S2, we may rewrite S2
-// in a simpler way with respect to S1. For example,
+// in a simpler way with respect to S1 (index delta). For example,
 //
 // S1: X = B + i * S
 // S2: Y = B + i' * S   => X + (i' - i) * S
@@ -35,8 +34,29 @@
 //
 // Note: (i' - i) * S is folded to the extent possible.
 //
+// For Add and GEP forms, we can also rewrite a candidate in a simpler way
+// with respect to other dominating candidates if their B or S are different
+// but other parts are the same. For example,
+//
+// Base Delta:
+// S1: X = B  + i * S
+// S2: Y = B' + i * S   => X + (B' - B)
+//
+// S1: X = &B [i * S]
+// S2: Y = &B'[i * S]   => X + (B' - B)
+//
+// Stride Delta:
+// S1: X = B + i * S
+// S2: Y = B + i * S'   => X + i * (S' - S)
+//
+// S1: X = &B[i * S]
+// S2: Y = &B[i * S']   => X + i * (S' - S)
+//
+// PS: Stride delta rewrite on Mul form is usually non-profitable, and Base
+// delta rewrite sometimes is profitable, so we do not support them on Mul.
+//
 // This rewriting is in general a good idea. The code patterns we focus on
-// usually come from loop unrolling, so (i' - i) * S is likely the same
+// usually come from loop unrolling, so the delta is likely the same
 // across iterations and can be reused. When that happens, the optimized form
 // takes only one add starting from the second iteration.
 //
@@ -47,19 +67,14 @@
 // TODO:
 //
 // - Floating point arithmetics when fast math is enabled.
-//
-// - SLSR may decrease ILP at the architecture level. Targets that are very
-//   sensitive to ILP may want to disable it. Having SLSR to consider ILP is
-//   left as future work.
-//
-// - When (i' - i) is constant but i and i' are not, we could still perform
-//   SLSR.
 
 #include "llvm/Transforms/Scalar/StraightLineStrengthReduce.h"
 #include "llvm/ADT/APInt.h"
 #include "llvm/ADT/DepthFirstIterator.h"
+#include "llvm/ADT/SetVector.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/Analysis/ScalarEvolution.h"
+#include "llvm/Analysis/ScalarEvolutionExpressions.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/Analysis/ValueTracking.h"
 #include "llvm/IR/Constants.h"
@@ -86,16 +101,19 @@
 #include <cstdint>
 #include <limits>
 #include <list>
+#include <queue>
 #include <vector>
 
 using namespace llvm;
 using namespace PatternMatch;
 
+#define DEBUG_TYPE "slsr"
+
 static const unsigned UnknownAddressSpace =
     std::numeric_limits<unsigned>::max();
 
 DEBUG_COUNTER(StraightLineStrengthReduceCounter, "slsr-counter",
-              "Controls whether rewriteCandidateWithBasis is executed.");
+              "Controls whether rewriteCandidate is executed.");
 
 namespace {
 
@@ -142,15 +160,23 @@ class StraightLineStrengthReduce {
       GEP,     // &B[..][i * S][..]
     };
 
+    enum DKind {
+      InvalidDelta, // reserved for the default constructor
+      IndexDelta,   // Delta is a constant from Index
+      BaseDelta,    // Delta is a constant or variable from Base
+      StrideDelta,  // Delta is a constant or variable from Stride
+    };
+
     Candidate() = default;
     Candidate(Kind CT, const SCEV *B, ConstantInt *Idx, Value *S,
-              Instruction *I)
-        : CandidateKind(CT), Base(B), Index(Idx), Stride(S), Ins(I) {}
+              Instruction *I, const SCEV *StrideSCEV)
+        : CandidateKind(CT), Base(B), Index(Idx), Stride(S), Ins(I),
+          StrideSCEV(StrideSCEV) {}
 
     Kind CandidateKind = Invalid;
 
     const SCEV *Base = nullptr;
-
+    // TODO: Swap Index and Stride's name.
     // Note that Index and Stride of a GEP candidate do not necessarily have the
     // same integer type. In that case, during rewriting, Stride will be
     // sign-extended or truncated to Index's type.
@@ -177,22 +203,164 @@ class StraightLineStrengthReduce {
     // Points to the immediate basis of this candidate, or nullptr if we cannot
     // find any basis for this candidate.
     Candidate *Basis = nullptr;
+
+    DKind DeltaKind = InvalidDelta;
+
+    // Store SCEV of Stride to compute delta from different strides
+    const SCEV *StrideSCEV = nullptr;
+
+    // Points to (Y - X) that will be used to rewrite this candidate.
+    Value *Delta = nullptr;
+
+    /// Cost model: Evaluate the computational efficiency of the candidate.
+    ///
+    /// Efficiency levels (higher is better):
+    ///   ZeroInst (5) - [Variable] or [Const]
+    ///   OneInstOneVar (4) - [Variable + Const] or [Variable * Const]
+    ///   OneInstTwoVar (3) - [Variable + Variable] or [Variable * Variable]
+    ///   TwoInstOneVar (2) - [Const + Const * Variable]
+    ///   TwoInstTwoVar (1) - [Variable + Const * Variable]
+    enum EfficiencyLevel : unsigned {
+      Unknown = 0,
+      TwoInstTwoVar = 1,
+      TwoInstOneVar = 2,
+      OneInstTwoVar = 3,
+      OneInstOneVar = 4,
+      ZeroInst = 5
+    };
+
+    static EfficiencyLevel
+    getComputationEfficiency(Kind CandidateKind, const ConstantInt *Index,
+                             const Value *Stride, const SCEV *Base = nullptr) {
+      bool IsConstantBase = false;
+      bool IsZeroBase = false;
+      // When evaluating the efficiency of a rewrite, if the Base's SCEV is
+      // not available, conservatively assume the base is not constant.
+      if (auto *ConstBase = dyn_cast_or_null<SCEVConstant>(Base)) {
+        IsConstantBase = true;
+        IsZeroBase = ConstBase->getValue()->isZero();
+      }
+
+      bool IsConstantStride = isa<ConstantInt>(Stride);
+      bool IsZeroStride =
+          IsConstantStride && cast<ConstantInt>(Stride)->isZero();
+      // All constants
+      if (IsConstantBase && IsConstantStride)
+        return ZeroInst;
+
+      // (Base + Index) * Stride
+      if (CandidateKind == Mul) {
+        if (IsZeroStride)
+          return ZeroInst;
+        if (Index->isZero())
+          return (IsConstantStride || IsConstantBase) ? OneInstOneVar
+                                                      : OneInstTwoVar;
+
+        if (IsConstantBase)
+          return IsZeroBase && (Index->isOne() || Index->isMinusOne())
+                     ? ZeroInst
+                     : OneInstOneVar;
+
+        if (IsConstantStride) {
+          auto *CI = cast<ConstantInt>(Stride);
+          return (CI->isOne() || CI->isMinusOne()) ? OneInstOneVar
+                                                   : TwoInstOneVar;
+        }
+        return TwoInstTwoVar;
+      }
+
+      // Base + Index * Stride
+      assert(CandidateKind == Add || CandidateKind == GEP);
+      if (Index->isZero() || IsZeroStride)
+        return ZeroInst;
+
+      bool IsSimpleIndex = Index->isOne() || Index->isMinusOne();
+
+      if (IsConstantBase)
+        return IsZeroBase ? (IsSimpleIndex ? ZeroInst : OneInstOneVar)
+                          : (IsSimpleIndex ? OneInstOneVar : TwoInstOneVar);
+
+      if (IsConstantStride)
+        return IsZeroStride ? ZeroInst : OneInstOneVar;
+
+      if (IsSimpleIndex)
+        return OneInstTwoVar;
+
+      return TwoInstTwoVar;
+    }
+
+    // Evaluate if the given delta is profitable to rewrite this candidate.
+    bool isProfitableRewrite(const Value *Delta, const DKind DeltaKind) const {
+      // This function cannot accurately evaluate the profit of whole expression
+      // with context. A candidate (B + I * S) cannot express whether this
+      // instruction needs to compute on its own (I * S), which may be shared
+      // with other candidates or may need instructions to compute.
+      // If the rewritten form has the same strength, still rewrite to
+      // (X + Delta) since it may expose more CSE opportunities on Delta, as
+      // unrolled loops usually have identical Delta for each unrolled body.
+      //
+      // Note, this function should only be used on Index Delta rewrite.
+      // Base and Stride delta need context info to evaluate the register
+      // pressure impact from variable delta.
+      return getComputationEfficiency(CandidateKind, Index, Stride, Base) <=
+             getRewriteEfficiency(Delta, DeltaKind);
+    }
+
+    // Evaluate the rewrite efficiency of this candidate with its Basis
+    EfficiencyLevel getRewriteEfficiency() const {
+      return Basis ? getRewriteEfficiency(Delta, DeltaKind) : Unknown;
+    }
+
+    // Evaluate the rewrite efficiency of this candidate with a given delta
+    EfficiencyLevel getRewriteEfficiency(const Value *Delta,
+                                         const DKind DeltaKind) const {
+      switch (DeltaKind) {
+      case BaseDelta: // [X + Delta]
+        return getComputationEfficiency(
+            CandidateKind,
+            ConstantInt::get(cast<IntegerType>(Delta->getType()), 1), Delta);
+      case StrideDelta: // [X + Index * Delta]
+        return getComputationEfficiency(CandidateKind, Index, Delta);
+      case IndexDelta: // [X + Delta * Stride]
+        return getComputationEfficiency(CandidateKind, cast<ConstantInt>(Delta),
+                                        Stride);
+      default:
+        return Unknown;
+      }
+    }
+
+    bool isHighEfficiency() const {
+      return getComputationEfficiency(CandidateKind, Index, Stride, Base) >=
+             OneInstOneVar;
+    }
+
+    // Verify that this candidate has valid delta components relative to the
+    // basis
+    bool hasValidDelta(const Candidate &Basis) const {
+      switch (DeltaKind) {
+      case IndexDelta:
+        // Index differs, Base and Stride must match
+        return Base == Basis.Base && StrideSCEV == Basis.StrideSCEV;
+      case StrideDelta:
+        // Stride differs, Base and Index must match
+        return Base == Basis.Base && Index == Basis.Index;
+      case BaseDelta:
+        // Base differs, Stride and Index must match
+        return StrideSCEV == Basis.StrideSCEV && Index == Basis.Index;
+      default:
+        return false;
+      }
+    }
   };
 
   bool runOnFunction(Function &F);
 
 private:
-  // Returns true if Basis is a basis for C, i.e., Basis dominates C and they
-  // share the same base and stride.
-  bool isBasisFor(const Candidate &Basis, const Candidate &C);
-
+  // Fetch straight-line basis for rewriting C, update C.Basis to point to it,
+  // and store the delta between C and its Basis in C.Delta.
+  void setBasisAndDeltaFor(Candidate &C);
   // Returns whether the candidate can be folded into an addressing mode.
-  bool isFoldable(const Candidate &C, TargetTransformInfo *TTI,
-                  const DataLayout *DL);
-
-  // Returns true if C is already in a simplest form and not worth being
-  // rewritten.
-  bool isSimplestForm(const Candidate &C);
+  bool isFoldable(const Candidate &C, TargetTransformInfo *TTI);
 
   // Checks whether I is in a candidate form. If so, adds all the matching forms
   // to Candidates, and tries to find the immediate basis for each of them.
@@ -216,12 +384,6 @@ class StraightLineStrengthReduce {
   // Allocate candidates and find bases for GetElementPtr instructions.
   void allocateCandidatesAndFindBasisForGEP(GetElementPtrInst *GEP);
 
-  // A helper function that scales Idx with ElementSize before invoking
-  // allocateCandidatesAndFindBasis.
-  void allocateCandidatesAndFindBasisForGEP(const SCEV *B, ConstantInt *Idx,
-                                            Value *S, uint64_t ElementSize,
-                                            Instruction *I);
-
   // Adds the given form <CT, B, Idx, S> to Candidates, and finds its immediate
   // basis.
   void allocateCandidatesAndFindBasis(Candidate::Kind CT, const SCEV *B,
@@ -229,13 +391,7 @@ class StraightLineStrengthReduce {
                                       Instruction *I);
 
   // Rewrites candidate C with respect to Basis.
-  void rewriteCandidateWithBasis(const Candidate &C, const Candidate &Basis);
-
-  // A helper function that factors ArrayIdx to a product of a stride and a
-  // constant index, and invokes allocateCandidatesAndFindBasis with the
-  // factorings.
-  void factorArrayIndex(Value *ArrayIdx, const SCEV *Base, uint64_t ElementSize,
-                        GetElementPtrInst *GEP);
+  void rewriteCandidate(const Candidate &C);
 
   // Emit code that computes the "bump" from Basis to C.
   static Value *emitBump(const Candidate &Basis, const Candidate &C,
@@ -247,12 +403,205 @@ class StraightLineStrengthReduce {
   TargetTransformInfo *TTI = nullptr;
   std::list<Candidate> Candidates;
 
-  // Temporarily holds all instructions that are unlinked (but not deleted) by
-  // rewriteCandidateWithBasis. These instructions will be actually removed
-  // after all rewriting finishes.
-  std::vector<Instruction *> UnlinkedInstructions;
+  // Map from SCEV to instructions that represent the value,
+  // instructions are sorted in depth-first order.
+  DenseMap<const SCEV *, SmallSetVector<Instruction *, 2>> SCEVToInsts;
+
+  // Record the dependency between instructions. If C.Basis == B, we would have
+  // {B.Ins -> {C.Ins, ...}}.
+  MapVector<Instruction *, std::vector<Instruction *>> DependencyGraph;
+
+  // Map between each instruction and its possible candidates.
+  DenseMap<Instruction *, SmallVector<Candidate *, 3>> RewriteCandidates;
+
+  // All instructions that have candidates sort in topological order based on
+  // dependency graph, from roots to leaves.
+  std::vector<Instruction *> SortedCandidateInsts;
+
+  // Record all instructions that are already rewritten and will be removed
+  // later.
+  std::vector<Instruction *> DeadInstructions;
+
+  // Classify candidates against Delta kind
+  class CandidateDictTy {
+  public:
+    using CandsTy = SmallVector<Candidate *, 8>;
+    using BBToCandsTy = DenseMap<const BasicBlock *, CandsTy>;
+
+  private:
+    // Index delta Basis must have the same (Base, StrideSCEV, Inst.Type)
+    using IndexDeltaKeyTy = std::tuple<const SCEV *, const SCEV *, Type *>;
+    DenseMap<IndexDeltaKeyTy, BBToCandsTy> IndexDeltaCandidates;
+
+    // Base delta Basis must have the same (StrideSCEV, Index, Inst.Type)
+    using BaseDeltaKeyTy = std::tuple<const SCEV *, ConstantInt *, Type *>;
+    DenseMap<BaseDeltaKeyTy, BBToCandsTy> BaseDeltaCandidates;
+
+    // Stride delta Basis must have the same (Base, Index, Inst.Type)
+    using StrideDeltaKeyTy = std::tuple<const SCEV *, ConstantInt *, Type *>;
+    DenseMap<StrideDeltaKeyTy, BBToCandsTy> StrideDeltaCandidates;
+
+  public:
+    // TODO: Disable index delta on GEP after we completely move
+    // from typed GEP to PtrAdd.
+    const BBToCandsTy *getCandidatesWithDeltaKind(const Candidate &C,
+                                                  Candidate::DKind K) const {
+      assert(K != Candidate::InvalidDelta);
+      if (K == Candidate::IndexDelta) {
+        IndexDeltaKeyTy IndexDeltaKey(C.Base, C.StrideSCEV, C.Ins->getType());
+        auto It = IndexDeltaCandidates.find(IndexDeltaKey);
+        if (It != IndexDeltaCandidates.end())
+          return &It->second;
+      } else if (K == Candidate::BaseDelta) {
+        BaseDeltaKeyTy BaseDeltaKey(C.StrideSCEV, C.Index, C.Ins->getType());
+        auto It = BaseDeltaCandidates.find(BaseDeltaKey);
+        if (It != BaseDeltaCandidates.end())
+          return &It->second;
+      } else {
+        assert(K == Candidate::StrideDelta);
+        StrideDeltaKeyTy StrideDeltaKey(C.Base, C.Index, C.Ins->getType());
+        auto It = StrideDeltaCandidates.find(StrideDeltaKey);
+        if (It != StrideDeltaCandidates.end())
+          return &It->second;
+      }
+      return nullptr;
+    }
+
+    // Pointers to C must remain valid until CandidateDict is cleared.
+    void add(Candidate &C) {
+      Type *ValueType = C.Ins->getType();
+      BasicBlock *BB = C.Ins->getParent();
+      IndexDeltaKeyTy IndexDeltaKey(C.Base, C.StrideSCEV, ValueType);
+      BaseDeltaKeyTy BaseDeltaKey(C.StrideSCEV, C.Index, ValueType);
+      StrideDeltaKeyTy StrideDeltaKey(C.Base, C.Index, ValueType);
+      IndexDeltaCandidates[IndexDeltaKey][BB].push_back(&C);
+      BaseDeltaCandidates[BaseDeltaKey][BB].push_back(&C);
+      StrideDeltaCandidates[StrideDeltaKey][BB].push_back(&C);
+    }
+    // Remove all mappings from set
+    void clear() {
+      IndexDeltaCandidates.clear();
+      BaseDeltaCandidates.clear();
+      StrideDeltaCandidates.clear();
+    }
+  } CandidateDict;
+
+  const SCEV *getAndRecordSCEV(Value *V) {
+    auto *S = SE->getSCEV(V);
+    if (isa<Instruction>(V) && !(isa<SCEVCouldNotCompute>(S) ||
+                                 isa<SCEVUnknown>(S) || isa<SCEVConstant>(S)))
+      SCEVToInsts[S].insert(cast<Instruction>(V));
+
+    return S;
+  }
+
+  // Get the nearest instruction before CI that represents the value of S,
+  // return nullptr if no instruction is associated with S or S is not a
+  // reusable expression.
+  Value *getNearestValueOfSCEV(const SCEV *S, const Instruction *CI) const {
+    if (isa<SCEVCouldNotCompute>(S))
+      return nullptr;
+
+    if (auto *SU = dyn_cast<SCEVUnknown>(S))
+      return SU->getValue();
+    if (auto *SC = dyn_cast<SCEVConstant>(S))
+      return SC->getValue();
+
+    auto It = SCEVToInsts.find(S);
+    if (It == SCEVToInsts.end())
+      return nullptr;
+
+    // Instructions are sorted in depth-first order, so search for the nearest
+    // instruction by walking the list in reverse order.
+    for (Instruction *I : reverse(It->second))
+      if (DT->dominates(I, CI))
+        return I;
+
+    return nullptr;
+  }
+
+  struct DeltaInfo {
+    Candidate *Cand;
+    Candidate::DKind DeltaKind;
+    Value *Delta;
+
+    DeltaInfo()
+        : Cand(nullptr), DeltaKind(Candidate::InvalidDelta), Delta(nullptr) {}
+    DeltaInfo(Candidate *Cand, Candidate::DKind DeltaKind, Value *Delta)
+        : Cand(Cand), DeltaKind(DeltaKind), Delta(Delta) {}
+    operator bool() const { return Cand != nullptr; }
+  };
+
+  friend raw_ostream &operator<<(raw_ostream &OS, const DeltaInfo &DI);
+
+  DeltaInfo compressPath(Candidate &C, Candidate *Basis) const;
+
+  Candidate *pickRewriteCandidate(Instruction *I) const;
+  void sortCandidateInstructions();
+  Value *getDelta(const Candidate &C, const Candidate &Basis,
+                  Candidate::DKind K) const;
+  static bool isSimilar(Candidate &C, Candidate &Basis, Candidate::DKind K);
+
+  // Add Basis -> C in DependencyGraph and propagate
+  // C.Stride and C.Delta's dependency to C
+  void addDependency(Candidate &C, Candidate *Basis) {
+    if (Basis)
+      DependencyGraph[Basis->Ins].emplace_back(C.Ins);
+
+    // If any candidate of Inst has a basis, then Inst will be rewritten,
+    // C must be rewritten after rewriting Inst, so we need to propagate
+    // the dependency to C
+    auto PropagateDependency = [&](Instruction *Inst) {
+      if (auto CandsIt = RewriteCandidates.find(Inst);
+          CandsIt != RewriteCandidates.end() &&
+          llvm::any_of(CandsIt->second,
+                       [](Candidate *Cand) { return Cand->Basis; }))
+        DependencyGraph[Inst].emplace_back(C.Ins);
+    };
+
+    // If C has a variable delta and the delta is a candidate,
+    // propagate its dependency to C
+    if (auto *DeltaInst = dyn_cast_or_null<Instruction>(C.Delta))
+      PropagateDependency(DeltaInst);
+
+    // If the stride is a candidate, propagate its dependency to C
+    if (auto *StrideInst = dyn_cast<Instruction>(C.Stride))
+      PropagateDependency(StrideInst);
+  };
 };
 
+inline raw_ostream &operato...
[truncated]

github-actions · 2025-11-26T06:46:25Z

✅ With the latest revision this PR passed the C/C++ code formatter.

fiigii · 2025-11-26T06:53:03Z

@akshayrdeodhar @dakersnar @Artem-B @lialan PTAL

fiigii · 2025-11-26T07:02:39Z

This commit (6519805) solves the correctness issue on top of the original change. It calls SE->canReuseInstruction to check if reusing an instruction may introduce more poison than its SCEV. The commit also does a little refactoring to avoid calling canReuseInstruction at multiple positions.

github-actions · 2025-11-26T07:19:07Z

🐧 Linux x64 Test Results

186632 tests passed
4888 tests skipped

This patch fixes: llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp:689:45: error: lambda capture 'DT' is not used [-Werror,-Wunused-lambda-capture] llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp:584:1: error: unused function 'operator<<' [-Werror,-Wunused-function]

Artem-B · 2025-11-26T19:16:22Z

Considering that there appear to be nontrivial changes it's not exactly a re-land. It's more of a new iteration.

Would it be possible for you to re-stack you changes in a way that the original reverted patch is the starting point, so we can see what exactly has changed to fix the problem, so we do not have to go over complete patch of 2K lines of changes, again, and only deal with the incremental changes.

Including reviewers of the original patch would also be useful.

Artem-B · 2025-11-26T19:20:13Z

Would it be possible for you to re-stack you changes in a way that the original reverted patch is the starting point

Hmm. looks like you already did. It's just graphite appears to hide the base, but github shows it:

fiigii · 2025-11-26T19:31:47Z

@Artem-B Yes, please focus on the fix commit. That commit has a unit test to demonstrate the correctness issue.

llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp

llvm/test/Transforms/StraightLineStrengthReduce/path-compression.ll

Artem-B

LGTM

llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp

llvm/test/Transforms/StraightLineStrengthReduce/path-compression.ll

llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp

fiigii · 2025-11-26T23:58:59Z

@dakersnar @Artem-B Thanks for reviewing. After this PR, the SLSR output will become more conservative for correctness, i.e., dropping more inbounds or nsw/nuw. I am concerned that the latter optimizations may be impacted. For example, in open source LLVM's NVPTX target, load-store-vectorization runs after SLSR. More conservative overflow annotations may hurt alias analysis and consecutive analysis (SCEV), and may prevent some vectorization that is one of the most improtant optimizations for GPU. I suggest you give it a try to run LSV a bit early, before SLSR.

Additionally, I believe this pass still has some correctness issues regarding integer overflow, which have existed before my redesign. I will open follow-up PRs to solve them, and we will see more conservative output (fewer inbounds!).

Artem-B · 2025-11-27T00:18:29Z

If you suspect there may be unintended side effects of this patch, would it be feasible to introduce a command line option to preserve current behavior? If would give users an escape hatch if/when they run into trouble. It would also make A/B comparisons much easier, as it could be done with the same compiler, without having to rebuild it. We could also start with the patch disabled, and enable it only for benchmark runs.

I can give it a try on our benchmarks next week. If we had a command line option to enable/disable the new behavior, it would make it quite a bit easier for me to do so.

Potential perf regression concern also brings another question -- what exactly does this patch buy us? There are correctness improvements, those are always good to have. But is there any data on impact of these changes on performance? Is there an important enough use case where the change improves things substantially? It would be great to have specific data we could use if/when we need to make the call on what we're willing to trade-off if the patch triggers regressions.

fiigii · 2025-11-27T00:26:45Z

Sure, I will add a command-line flag to disable the correctness guard and set it to false by default.

At the same time, I strongly recommend trying to run LSV earlier. We have always run LSV earlier in the pipeline than upstream LLVM, and its performance and codegen quality have consistently been good.

Update: added the flag -enable-poison-reuse-guard, default to true.

fiigii · 2025-11-27T00:29:04Z

@lialan would you like to try this version with your project before merging?

lialan · 2025-11-27T01:44:20Z

@lialan would you like to try this version with your project before merging?

I will let you know shortly. Thanks for the reminder!

fiigii · 2025-11-27T01:48:14Z

No rush, enjoy the holiday :)

fiigii and others added 2 commits November 25, 2025 17:33

Prevent invlaid SCEV reuse by canReuseInstruction

6519805

llvmbot added backend:AMDGPU backend:NVPTX llvm:transforms labels Nov 26, 2025

fiigii mentioned this pull request Nov 26, 2025

Revert "Redesign Straight-Line Strength Reduction (SLSR) (#162930)" #169546

Merged

Clang-format

6912017

fiigii and others added 3 commits November 26, 2025 00:27

Fix test

a6a26fb

[Scalar] Avoid repeated hash lookups (NFC) (llvm#168217)

72efb11

dakersnar approved these changes Nov 26, 2025

View reviewed changes

Artem-B self-requested a review November 26, 2025 19:02

Artem-B reviewed Nov 26, 2025

View reviewed changes

Address review comments

b1c758c

fiigii force-pushed the feip/reland-slsr branch from e2dbf30 to b1c758c Compare November 26, 2025 21:20

Artem-B approved these changes Nov 26, 2025

View reviewed changes

llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp Show resolved Hide resolved

llvm/test/Transforms/StraightLineStrengthReduce/path-compression.ll Show resolved Hide resolved

Artem-B reviewed Nov 26, 2025

View reviewed changes

llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp Show resolved Hide resolved

Address review comments.

3dc98c9

Add a flag to control poison-reuse guard

a03535c

Reland "Redesign Straight-Line Strength Reduction (SLSR) (#162930)" #169614

Are you sure you want to change the base?

Reland "Redesign Straight-Line Strength Reduction (SLSR) (#162930)" #169614

Conversation

fiigii commented Nov 26, 2025

Uh oh!

llvmbot commented Nov 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Nov 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fiigii commented Nov 26, 2025

Uh oh!

fiigii commented Nov 26, 2025

Uh oh!

github-actions bot commented Nov 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🐧 Linux x64 Test Results

Uh oh!

Artem-B commented Nov 26, 2025

Uh oh!

Artem-B commented Nov 26, 2025

Uh oh!

fiigii commented Nov 26, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Artem-B left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

fiigii commented Nov 26, 2025

Uh oh!

Artem-B commented Nov 27, 2025

Uh oh!

fiigii commented Nov 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fiigii commented Nov 27, 2025

Uh oh!

lialan commented Nov 27, 2025

Uh oh!

fiigii commented Nov 27, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

llvmbot commented Nov 26, 2025 •

edited

Loading

github-actions bot commented Nov 26, 2025 •

edited

Loading

github-actions bot commented Nov 26, 2025 •

edited

Loading

fiigii commented Nov 27, 2025 •

edited

Loading