Skip to content

Conversation

sebpop
Copy link
Contributor

@sebpop sebpop commented Aug 25, 2025

This patch depends on #155266

SCEV was losing NSW flags during AddRec operations, causing Dependence
Analysis to add unnecessary runtime assumptions for inbounds GEPs.

This patch fixes getAddExpr: when combining AddRecs from the same loop, preserve
compatible NSW/NUW flags from input AddRecs instead of always using FlagAnyWrap.

sebpop added 2 commits August 24, 2025 02:16
Previously, LAA would incorrectly classify Write-After-Write dependencies
with negative distances as safe Forward dependencies, allowing inappropriate
vectorization of loops with bidirectional WAW dependencies.

The issue occurred in loops like:
  for(int i = 0; i < n; ++i) {
      A[(i+1)*4] = 10;  // First store
      A[i] = 100;       // Second store
  }

The dependence distance from first store to second store is negative:
{-16,+,-12}. However, this represents a bidirectional WAW dependency
that DependenceAnalysis would report as 'output [<>]!', indicating
dependency in both directions.

This patch fixes LAA to properly detect WAW dependencies with negative
distances as unsafe, preventing incorrect vectorization.

The fix adds a check specifically for Write-After-Write dependencies
before the general negative distance handling, ensuring they are
classified as Unknown (unsafe) rather than Forward (safe).

A proper fix, not implemented here because it would be a major rework of LAA,
is to implement bidirectional dependence checking that computes distances
in both directions and detects inconsistent direction vectors.
SCEV was losing NSW flags during AddRec operations, causing Dependence
Analysis to add unnecessary runtime assumptions for inbounds GEPs.

This patch fixes getAddExpr: when combining AddRecs from the same loop, preserve
compatible NSW/NUW flags from input AddRecs instead of always using FlagAnyWrap.
@sebpop sebpop requested a review from nikic as a code owner August 25, 2025 16:25
@llvmbot llvmbot added llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels Aug 25, 2025
@llvmbot
Copy link
Member

llvmbot commented Aug 25, 2025

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-transforms

Author: Sebastian Pop (sebpop)

Changes

This patch depends on #155266

SCEV was losing NSW flags during AddRec operations, causing Dependence
Analysis to add unnecessary runtime assumptions for inbounds GEPs.

This patch fixes getAddExpr: when combining AddRecs from the same loop, preserve
compatible NSW/NUW flags from input AddRecs instead of always using FlagAnyWrap.


Patch is 79.19 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/155267.diff

12 Files Affected:

  • (modified) llvm/lib/Analysis/LoopAccessAnalysis.cpp (+15)
  • (modified) llvm/lib/Analysis/ScalarEvolution.cpp (+29-2)
  • (modified) llvm/test/Analysis/Delinearization/fixed_size_array.ll (+2-2)
  • (added) llvm/test/Analysis/DependenceAnalysis/scev-nsw-flags-enable-analysis.ll (+45)
  • (modified) llvm/test/Analysis/LoopAccessAnalysis/depend_diff_types.ll (+4-3)
  • (modified) llvm/test/Analysis/LoopAccessAnalysis/forward-loop-carried.ll (+1-1)
  • (modified) llvm/test/Analysis/LoopAccessAnalysis/forward-loop-independent.ll (+1-1)
  • (added) llvm/test/Analysis/LoopAccessAnalysis/waw-negative-dependence.ll (+109)
  • (modified) llvm/test/Analysis/ScalarEvolution/different-loops-recs.ll (+15-15)
  • (added) llvm/test/Analysis/ScalarEvolution/gep-nsw-flag-preservation.ll (+100)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll (+22-53)
  • (modified) llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll (+11-73)
diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index bceddd0325276..e522d1392f7f9 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -2196,6 +2196,21 @@ MemoryDepChecker::isDependent(const MemAccessInfo &A, unsigned AIdx,
       return Dependence::Unknown;
     }
 
+    // For WAW (Write-After-Write) dependencies, negative distances in one
+    // direction can still represent unsafe dependencies. Since we only check
+    // dependencies in program order (AIdx < BIdx), a negative distance means
+    // the later write accesses memory locations before the earlier write.
+    // However, in a vectorized loop, both writes could execute simultaneously,
+    // potentially causing incorrect behavior. Therefore, WAW with negative
+    // distances should be treated as unsafe.
+    bool IsWriteAfterWrite = (AIsWrite && BIsWrite);
+    if (IsWriteAfterWrite) {
+      LLVM_DEBUG(
+          dbgs() << "LAA: WAW dependence with negative distance is unsafe\n");
+      return CheckCompletelyBeforeOrAfter() ? Dependence::NoDep
+                                            : Dependence::Unknown;
+    }
+
     bool IsTrueDataDependence = (AIsWrite && !BIsWrite);
     // Check if the first access writes to a location that is read in a later
     // iteration, where the distance between them is not a multiple of a vector
diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index d2c445f1ffaa0..9867994527c45 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -2951,25 +2951,52 @@ const SCEV *ScalarEvolution::getAddExpr(SmallVectorImpl<const SCEV *> &Ops,
       if (AddRecLoop == cast<SCEVAddRecExpr>(Ops[OtherIdx])->getLoop()) {
         // Other + {A,+,B}<L> + {C,+,D}<L>  -->  Other + {A+C,+,B+D}<L>
         SmallVector<const SCEV *, 4> AddRecOps(AddRec->operands());
+
+        // Track flags: start with the flags from the first AddRec.
+        bool AllHaveNSW = AddRec->hasNoSignedWrap();
+        bool AllHaveNUW = AddRec->hasNoUnsignedWrap();
+
         for (; OtherIdx != Ops.size() && isa<SCEVAddRecExpr>(Ops[OtherIdx]);
              ++OtherIdx) {
           const auto *OtherAddRec = cast<SCEVAddRecExpr>(Ops[OtherIdx]);
           if (OtherAddRec->getLoop() == AddRecLoop) {
+            // Update flags based on this AddRec
+            if (!OtherAddRec->hasNoSignedWrap())
+              AllHaveNSW = false;
+            if (!OtherAddRec->hasNoUnsignedWrap())
+              AllHaveNUW = false;
             for (unsigned i = 0, e = OtherAddRec->getNumOperands();
                  i != e; ++i) {
               if (i >= AddRecOps.size()) {
                 append_range(AddRecOps, OtherAddRec->operands().drop_front(i));
                 break;
               }
+              // Preserve no-wrap flags when combining AddRec operands.
+              SCEV::NoWrapFlags CombineFlags = SCEV::FlagAnyWrap;
+              if (auto *AR1 = dyn_cast<SCEVAddRecExpr>(AddRecOps[i]))
+                if (auto *AR2 =
+                        dyn_cast<SCEVAddRecExpr>(OtherAddRec->getOperand(i))) {
+                  if (AR1->hasNoSignedWrap() && AR2->hasNoSignedWrap())
+                    CombineFlags = setFlags(CombineFlags, SCEV::FlagNSW);
+                  if (AR1->hasNoUnsignedWrap() && AR2->hasNoUnsignedWrap())
+                    CombineFlags = setFlags(CombineFlags, SCEV::FlagNUW);
+                }
               SmallVector<const SCEV *, 2> TwoOps = {
                   AddRecOps[i], OtherAddRec->getOperand(i)};
-              AddRecOps[i] = getAddExpr(TwoOps, SCEV::FlagAnyWrap, Depth + 1);
+              AddRecOps[i] = getAddExpr(TwoOps, CombineFlags, Depth + 1);
             }
             Ops.erase(Ops.begin() + OtherIdx); --OtherIdx;
           }
         }
         // Step size has changed, so we cannot guarantee no self-wraparound.
-        Ops[Idx] = getAddRecExpr(AddRecOps, AddRecLoop, SCEV::FlagAnyWrap);
+        // However, preserve NSW/NUW flags if all combined AddRecs had them.
+        SCEV::NoWrapFlags FinalFlags = SCEV::FlagAnyWrap;
+        if (AllHaveNSW)
+          FinalFlags = setFlags(FinalFlags, SCEV::FlagNSW);
+        if (AllHaveNUW)
+          FinalFlags = setFlags(FinalFlags, SCEV::FlagNUW);
+
+        Ops[Idx] = getAddRecExpr(AddRecOps, AddRecLoop, FinalFlags);
         return getAddExpr(Ops, SCEV::FlagAnyWrap, Depth + 1);
       }
     }
diff --git a/llvm/test/Analysis/Delinearization/fixed_size_array.ll b/llvm/test/Analysis/Delinearization/fixed_size_array.ll
index 634850bb4a5a2..ffd0202e205ce 100644
--- a/llvm/test/Analysis/Delinearization/fixed_size_array.ll
+++ b/llvm/test/Analysis/Delinearization/fixed_size_array.ll
@@ -339,7 +339,7 @@ define void @a_i_j2k_i(ptr %a) {
 ; CHECK-LABEL: 'a_i_j2k_i'
 ; CHECK-NEXT:  Inst: store i32 1, ptr %idx, align 4
 ; CHECK-NEXT:  In Loop with Header: for.k
-; CHECK-NEXT:  AccessFunction: {{\{\{\{}}0,+,1028}<%for.i.header>,+,256}<nw><%for.j.header>,+,128}<nw><%for.k>
+; CHECK-NEXT:  AccessFunction: {{\{\{\{}}0,+,1028}<nuw><nsw><%for.i.header>,+,256}<nw><%for.j.header>,+,128}<nw><%for.k>
 ; CHECK-NEXT:  failed to delinearize
 ;
 entry:
@@ -391,7 +391,7 @@ define void @a_i_i_jk(ptr %a) {
 ; CHECK-LABEL: 'a_i_i_jk'
 ; CHECK-NEXT:  Inst: store i32 1, ptr %idx, align 4
 ; CHECK-NEXT:  In Loop with Header: for.k
-; CHECK-NEXT:  AccessFunction: {{\{\{\{}}0,+,1152}<%for.i.header>,+,4}<nw><%for.j.header>,+,4}<nw><%for.k>
+; CHECK-NEXT:  AccessFunction: {{\{\{\{}}0,+,1152}<nuw><nsw><%for.i.header>,+,4}<nw><%for.j.header>,+,4}<nw><%for.k>
 ; CHECK-NEXT:  Base offset: %a
 ; CHECK-NEXT:  ArrayDecl[UnknownSize][288] with elements of 4 bytes.
 ; CHECK-NEXT:  ArrayRef[{0,+,1}<nuw><nsw><%for.i.header>][{{\{\{}}0,+,1}<nuw><nsw><%for.j.header>,+,1}<nuw><nsw><%for.k>]
diff --git a/llvm/test/Analysis/DependenceAnalysis/scev-nsw-flags-enable-analysis.ll b/llvm/test/Analysis/DependenceAnalysis/scev-nsw-flags-enable-analysis.ll
new file mode 100644
index 0000000000000..b717ce9815341
--- /dev/null
+++ b/llvm/test/Analysis/DependenceAnalysis/scev-nsw-flags-enable-analysis.ll
@@ -0,0 +1,45 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 5
+; RUN: opt < %s -disable-output "-passes=print<da>" -aa-pipeline=basic-aa 2>&1 | FileCheck %s
+
+; Test that SCEV NSW flag preservation enables dependence analysis to work
+; correctly. Previously, SCEV would lose NSW flags when combining AddRec
+; expressions from GEP operations, causing dependence analysis to incorrectly
+; classify expressions as "wrapping" and fail analysis.
+
+define void @test_da_with_scev_flags(ptr %A) {
+; This test verifies that dependence analysis now correctly identifies
+; self-dependences when SCEV preserves NSW flags from GEP index computations.
+; CHECK-LABEL: 'test_da_with_scev_flags'
+; CHECK-NEXT:  Src: %val = load i32, ptr %gep, align 4 --> Dst: %val = load i32, ptr %gep, align 4
+; CHECK-NEXT:    da analyze - none!
+; CHECK-NEXT:  Src: %val = load i32, ptr %gep, align 4 --> Dst: store i32 %val, ptr %gep, align 4
+; CHECK-NEXT:    da analyze - consistent anti [0|<]!
+; CHECK-NEXT:  Src: store i32 %val, ptr %gep, align 4 --> Dst: store i32 %val, ptr %gep, align 4
+; CHECK-NEXT:    da analyze - none!
+;
+
+entry:
+  br label %loop
+
+loop:
+  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
+
+  ; Create NSW-flagged index computation
+  %mul = mul nsw i64 %i, 3
+  %sub = add nsw i64 %mul, -6
+
+  ; GEP that should result in SCEV: {(-2424 + %A),+,1212}<nw>
+  ; The <nw> flag should prevent false "wrapping" detection in DA
+  %gep = getelementptr inbounds [100 x i32], ptr %A, i64 %sub, i64 %sub
+
+  ; Self-dependence: should be detected as "none" (no dependence)
+  %val = load i32, ptr %gep
+  store i32 %val, ptr %gep
+
+  %i.next = add nsw i64 %i, 1
+  %cond = icmp ult i64 %i.next, 50
+  br i1 %cond, label %loop, label %exit
+
+exit:
+  ret void
+}
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/depend_diff_types.ll b/llvm/test/Analysis/LoopAccessAnalysis/depend_diff_types.ll
index 023a8c056968f..00a2ce7337d0d 100644
--- a/llvm/test/Analysis/LoopAccessAnalysis/depend_diff_types.ll
+++ b/llvm/test/Analysis/LoopAccessAnalysis/depend_diff_types.ll
@@ -1,4 +1,4 @@
-; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 4
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 5
 ; RUN: opt -S -disable-output -passes='print<access-info>' < %s 2>&1 | FileCheck %s
 
 
@@ -449,9 +449,10 @@ exit:
 define void @different_type_sizes_forward(ptr %dst) {
 ; CHECK-LABEL: 'different_type_sizes_forward'
 ; CHECK-NEXT:    loop:
-; CHECK-NEXT:      Memory dependences are safe
+; CHECK-NEXT:      Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
+; CHECK-NEXT:  Unknown data dependence.
 ; CHECK-NEXT:      Dependences:
-; CHECK-NEXT:        Forward:
+; CHECK-NEXT:        Unknown:
 ; CHECK-NEXT:            store i32 0, ptr %gep.10.iv, align 4 ->
 ; CHECK-NEXT:            store i16 1, ptr %gep.iv, align 2
 ; CHECK-EMPTY:
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/forward-loop-carried.ll b/llvm/test/Analysis/LoopAccessAnalysis/forward-loop-carried.ll
index adfd19923e921..8b00ff0ad2a59 100644
--- a/llvm/test/Analysis/LoopAccessAnalysis/forward-loop-carried.ll
+++ b/llvm/test/Analysis/LoopAccessAnalysis/forward-loop-carried.ll
@@ -70,7 +70,7 @@ define void @forward_different_access_sizes(ptr readnone %end, ptr %start) {
 ; CHECK-NEXT:            store i32 0, ptr %gep.2, align 4 ->
 ; CHECK-NEXT:            %l = load i24, ptr %gep.1, align 1
 ; CHECK-EMPTY:
-; CHECK-NEXT:        Forward:
+; CHECK-NEXT:        Unknown:
 ; CHECK-NEXT:            store i32 0, ptr %gep.2, align 4 ->
 ; CHECK-NEXT:            store i24 %l, ptr %ptr.iv, align 1
 ; CHECK-EMPTY:
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/forward-loop-independent.ll b/llvm/test/Analysis/LoopAccessAnalysis/forward-loop-independent.ll
index 7fc9958dba552..218166526d7c0 100644
--- a/llvm/test/Analysis/LoopAccessAnalysis/forward-loop-independent.ll
+++ b/llvm/test/Analysis/LoopAccessAnalysis/forward-loop-independent.ll
@@ -35,7 +35,7 @@ define void @f(ptr noalias %A, ptr noalias %B, ptr noalias %C, i64 %N) {
 ; CHECK-NEXT:            store i32 %b_p2, ptr %Aidx_next, align 4 ->
 ; CHECK-NEXT:            %a = load i32, ptr %Aidx, align 4
 ; CHECK-EMPTY:
-; CHECK-NEXT:        Forward:
+; CHECK-NEXT:        Unknown:
 ; CHECK-NEXT:            store i32 %b_p2, ptr %Aidx_next, align 4 ->
 ; CHECK-NEXT:            store i32 %b_p1, ptr %Aidx, align 4
 ; CHECK-EMPTY:
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/waw-negative-dependence.ll b/llvm/test/Analysis/LoopAccessAnalysis/waw-negative-dependence.ll
new file mode 100644
index 0000000000000..be49af7d75460
--- /dev/null
+++ b/llvm/test/Analysis/LoopAccessAnalysis/waw-negative-dependence.ll
@@ -0,0 +1,109 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -passes='print<access-info>' -disable-output %s 2>&1 | FileCheck %s
+
+; Test that LAA correctly identifies Write-After-Write dependencies with negative
+; distances as unsafe. Previously, LAA would incorrectly classify negative distance
+; WAW dependencies as safe Forward dependencies, allowing inappropriate vectorization.
+;
+; This corresponds to the loop:
+;   for(int i = 0; i < n; ++i) {
+;       A[(i+1)*4] = 10;   // First store: A[4, 8, 12, 16, ...]
+;       A[i] = 100;        // Second store: A[0, 1, 2, 3, 4, ...]
+;   }
+;
+; The dependence distance from first store to second store is negative:
+; A[i] - A[(i+1)*4] = {0,+,4} - {16,+,16} = {-16,+,-12}
+; However, the dependence from second store to first store in the next iteration
+; would be positive: A[(i+1)*4] - A[i] = {16,+,16} - {0,+,4} = {16,+,12}
+;
+; This bidirectional dependence pattern (negative in one direction, positive in the
+; other) creates a Write-After-Write dependency that is unsafe for vectorization.
+; DependenceAnalysis would report this as "output [<>]!" indicating the complex
+; dependence direction. LAA must detect this as unsafe even when only checking
+; the negative distance direction.
+
+define void @test_waw_negative_dependence(i64 %n, ptr nocapture %A) {
+; CHECK-LABEL: 'test_waw_negative_dependence'
+; CHECK-NEXT:    loop:
+; CHECK-NEXT:      Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
+; CHECK-NEXT:  Unknown data dependence.
+; CHECK-NEXT:      Dependences:
+; CHECK-NEXT:        Unknown:
+; CHECK-NEXT:            store i32 10, ptr %arrayidx1, align 4 ->
+; CHECK-NEXT:            store i32 100, ptr %arrayidx2, align 4
+; CHECK-EMPTY:
+; CHECK-NEXT:      Run-time memory checks:
+; CHECK-NEXT:      Grouped accesses:
+; CHECK-EMPTY:
+; CHECK-NEXT:      Non vectorizable stores to invariant address were not found in loop.
+; CHECK-NEXT:      SCEV assumptions:
+; CHECK-EMPTY:
+; CHECK-NEXT:      Expressions re-written:
+;
+entry:
+  %cmp8 = icmp sgt i64 %n, 0
+  br i1 %cmp8, label %loop, label %exit
+
+loop:
+  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %loop ]
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+
+  ; First store: A[(i+1)*4] = 10
+  %0 = shl nsw i64 %indvars.iv.next, 2  ; (i+1)*4
+  %arrayidx1 = getelementptr inbounds i32, ptr %A, i64 %0
+  store i32 10, ptr %arrayidx1, align 4
+
+  ; Second store: A[i] = 100
+  %arrayidx2 = getelementptr inbounds i32, ptr %A, i64 %indvars.iv
+  store i32 100, ptr %arrayidx2, align 4
+
+  %exitcond.not = icmp eq i64 %indvars.iv.next, %n
+  br i1 %exitcond.not, label %exit, label %loop
+
+exit:
+  ret void
+}
+
+; Test a similar case but with different stride to ensure the fix is general.
+define void @test_waw_negative_dependence_different_stride(i64 %n, ptr nocapture %A) {
+; CHECK-LABEL: 'test_waw_negative_dependence_different_stride'
+; CHECK-NEXT:    loop:
+; CHECK-NEXT:      Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
+; CHECK-NEXT:  Unknown data dependence.
+; CHECK-NEXT:      Dependences:
+; CHECK-NEXT:        Unknown:
+; CHECK-NEXT:            store i32 10, ptr %arrayidx1, align 4 ->
+; CHECK-NEXT:            store i32 100, ptr %arrayidx2, align 4
+; CHECK-EMPTY:
+; CHECK-NEXT:      Run-time memory checks:
+; CHECK-NEXT:      Grouped accesses:
+; CHECK-EMPTY:
+; CHECK-NEXT:      Non vectorizable stores to invariant address were not found in loop.
+; CHECK-NEXT:      SCEV assumptions:
+; CHECK-EMPTY:
+; CHECK-NEXT:      Expressions re-written:
+;
+entry:
+  %cmp8 = icmp sgt i64 %n, 0
+  br i1 %cmp8, label %loop, label %exit
+
+loop:
+  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %loop ]
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+
+  ; First store: A[(i+2)*2] = 10
+  %0 = add nsw i64 %indvars.iv, 2       ; i+2
+  %1 = shl nsw i64 %0, 1                ; (i+2)*2
+  %arrayidx1 = getelementptr inbounds i32, ptr %A, i64 %1
+  store i32 10, ptr %arrayidx1, align 4
+
+  ; Second store: A[i] = 100
+  %arrayidx2 = getelementptr inbounds i32, ptr %A, i64 %indvars.iv
+  store i32 100, ptr %arrayidx2, align 4
+
+  %exitcond.not = icmp eq i64 %indvars.iv.next, %n
+  br i1 %exitcond.not, label %exit, label %loop
+
+exit:
+  ret void
+}
diff --git a/llvm/test/Analysis/ScalarEvolution/different-loops-recs.ll b/llvm/test/Analysis/ScalarEvolution/different-loops-recs.ll
index 44bff5638bc85..b700f6cd0b432 100644
--- a/llvm/test/Analysis/ScalarEvolution/different-loops-recs.ll
+++ b/llvm/test/Analysis/ScalarEvolution/different-loops-recs.ll
@@ -23,9 +23,9 @@ define void @test_00(i1 %arg) {
 ; CHECK-NEXT:    %phi3.inc = add i32 %phi3, 3
 ; CHECK-NEXT:    --> {9,+,3}<nuw><nsw><%loop1> U: [9,502) S: [9,502) Exits: 501 LoopDispositions: { %loop1: Computable }
 ; CHECK-NEXT:    %sum1 = add i32 %phi1, %phi2
-; CHECK-NEXT:    --> {14,+,3}<%loop1> U: [14,507) S: [14,507) Exits: 506 LoopDispositions: { %loop1: Computable }
+; CHECK-NEXT:    --> {14,+,3}<nuw><nsw><%loop1> U: [14,507) S: [14,507) Exits: 506 LoopDispositions: { %loop1: Computable }
 ; CHECK-NEXT:    %sum2 = add i32 %sum1, %phi3
-; CHECK-NEXT:    --> {20,+,6}<%loop1> U: [20,1005) S: [20,1005) Exits: 1004 LoopDispositions: { %loop1: Computable }
+; CHECK-NEXT:    --> {20,+,6}<nuw><nsw><%loop1> U: [20,1005) S: [20,1005) Exits: 1004 LoopDispositions: { %loop1: Computable }
 ; CHECK-NEXT:    %phi4 = phi i32 [ 63, %loop1 ], [ %phi4.inc, %loop2 ]
 ; CHECK-NEXT:    --> {63,+,1}<nuw><nsw><%loop2> U: [63,205) S: [63,205) Exits: 204 LoopDispositions: { %loop2: Computable }
 ; CHECK-NEXT:    %phi5 = phi i32 [ 53, %loop1 ], [ %phi5.inc, %loop2 ]
@@ -39,21 +39,21 @@ define void @test_00(i1 %arg) {
 ; CHECK-NEXT:    %phi6.inc = add i32 %phi6, 3
 ; CHECK-NEXT:    --> {46,+,3}<nuw><nsw><%loop2> U: [46,470) S: [46,470) Exits: 469 LoopDispositions: { %loop2: Computable }
 ; CHECK-NEXT:    %sum3 = add i32 %phi4, %phi5
-; CHECK-NEXT:    --> {116,+,3}<%loop2> U: [116,540) S: [116,540) Exits: 539 LoopDispositions: { %loop2: Computable }
+; CHECK-NEXT:    --> {116,+,3}<nuw><nsw><%loop2> U: [116,540) S: [116,540) Exits: 539 LoopDispositions: { %loop2: Computable }
 ; CHECK-NEXT:    %sum4 = add i32 %sum3, %phi6
-; CHECK-NEXT:    --> {159,+,6}<%loop2> U: [159,1006) S: [159,1006) Exits: 1005 LoopDispositions: { %loop2: Computable }
+; CHECK-NEXT:    --> {159,+,6}<nuw><nsw><%loop2> U: [159,1006) S: [159,1006) Exits: 1005 LoopDispositions: { %loop2: Computable }
 ; CHECK-NEXT:    %s1 = add i32 %phi1, %phi4
 ; CHECK-NEXT:    --> {{\{\{}}73,+,1}<nuw><nsw><%loop1>,+,1}<nw><%loop2> U: [73,379) S: [73,379) --> 378 U: [378,379) S: [378,379)
 ; CHECK-NEXT:    %s2 = add i32 %phi5, %phi2
 ; CHECK-NEXT:    --> {{\{\{}}57,+,2}<nuw><nsw><%loop1>,+,2}<nw><%loop2> U: [57,668) S: [57,668) --> 667 U: [667,668) S: [667,668)
 ; CHECK-NEXT:    %s3 = add i32 %sum1, %sum3
-; CHECK-NEXT:    --> {{\{\{}}130,+,3}<%loop1>,+,3}<%loop2> U: [130,1046) S: [130,1046) --> 1045 U: [1045,1046) S: [1045,1046)
+; CHECK-NEXT:    --> {{\{\{}}130,+,3}<nuw><nsw><%loop1>,+,3}<nw><%loop2> U: [130,1046) S: [130,1046) --> 1045 U: [1045,1046) S: [1045,1046)
 ; CHECK-NEXT:    %s4 = add i32 %sum4, %sum2
-; CHECK-NEXT:    --> {{\{\{}}179,+,6}<%loop1>,+,6}<%loop2> U: [179,2010) S: [179,2010) --> 2009 U: [2009,2010) S: [2009,2010)
+; CHECK-NEXT:    --> {{\{\{}}179,+,6}<nuw><nsw><%loop1>,+,6}<nw><%loop2> U: [179,2010) S: [179,2010) --> 2009 U: [2009,2010) S: [2009,2010)
 ; CHECK-NEXT:    %s5 = add i32 %phi3, %sum3
-; CHECK-NEXT:    --> {{\{\{}}122,+,3}<nuw><nsw><%loop1>,+,3}<%loop2> U: [122,1038) S: [122,1038) --> 1037 U: [1037,1038) S: [1037,1038)
+; CHECK-NEXT:    --> {{\{\{}}122,+,3}<nuw><nsw><%loop1>,+,3}<nw><%loop2> U: [122,1038) S: [122,1038) --> 1037 U: [1037,1038) S: [1037,1038)
 ; CHECK-NEXT:    %s6 = add i32 %sum2, %phi6
-; CHECK-NEXT:    --> {{\{\{}}63,+,6}<%loop1>,+,3}<nw><%loop2> U: [63,1471) S: [63,1471) --> 1470 U: [1470,1471) S: [1470,1471)
+; CHECK-NEXT:    --> {{\{\{}}63,+,6}<nuw><nsw><%loop1>,+,3}<nw><%loop2> U: [63,1471) S: [63,1471) --> 1470 U: [1470,1471) S: [1470,1471)
 ; CHECK-NEXT:  Determining loop execution counts for: @test_00
 ; CHECK-NEXT:  Loop %loop2: backedge-taken count is i32 141
 ; CHECK-NEXT:  Loop %loop2: constant max backedge-taken count is i32 141
@@ -139,13 +139,13 @@ define void @test_01(i32 %a, i32 %b) {
 ; CHECK-NEXT:    %phi6.inc = add i32 %phi6, 3
 ; CHECK-NEXT:    --> {46,+,3}<nuw><nsw><%loop2> U: [46,548) S: [46,548) Exits: (46 + (3 * (((-165 + (-6 * (((-6 + (-2 * %a) + (-1 * (1 umin (-6 + (-2 * %a) + (-1 * %b) + (1000 umax (6 + (2 * %a) + %b)))))<nuw><nsw> + (-1 * %b) + (1000 umax (6 + (2 * %a) + %b))) /u 6) + (1 umin (-6 + (-2 * %a) + (-1 * %b) + (1000 umax (6 + (2 * %a) + %b)))))) + (-2 * %a) + (-2 * %b) + (-1 * (1 umin (-165 + (-6 * (((-6 + (-2 * %a) + (-1 * (1 umin (-6 + (-2 * %a) + (-1 * %b) + (1000 umax (6 + (2 * %a) + %b)))))<nuw><nsw> + (-1 * %b) + (1000 umax (6 + (2 * %a) + %b))) /u 6) + (1 umin (-6 + (-2 * %a) + (-1 * %b) + (1000 umax (6 + (2 * %a)...
[truncated]

}
// Step size has changed, so we cannot guarantee no self-wraparound.
Ops[Idx] = getAddRecExpr(AddRecOps, AddRecLoop, SCEV::FlagAnyWrap);
// However, preserve NSW/NUW flags if all combined AddRecs had them.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the logic here. Adding together two nsw addrecs doesn't necessarily result in an nsw addrec.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cross-posting previous review: #155145 (comment)

@igogo-x86
Copy link
Contributor

Is it really correct? In this example, %iv1 and %iv2 do not overlow, but their sum does:

  define void @nuw_sum_wraps() {
  entry:
    br label %loop

  loop:
    ; k = 0..20
    %k = phi i32 [ 0, %entry ], [ %k.next, %loop ]

    %iv1 = phi i8 [ 200, %entry ], [ %iv1.next, %loop ]
    %iv2 = phi i8 [ 40, %entry ], [ %iv2.next, %loop ]

    %sum = add i8 %iv1, %iv2  ; SCEV: {240,+,2} on i8

    ; Individually marked NUW
    %iv1.next = add nuw i8 %iv1, 1
    %iv2.next = add nuw i8 %iv2, 1

    %k.next = add nuw nsw i32 %k, 1
    %cond = icmp ult i32 %k.next, 21
    br i1 %cond, label %loop, label %exit

  exit:
    ret void
  }

But opt -passes="print<scalar-evolution>" -S test.ll adds this flags:

%sum = add i8 %iv1, %iv2
  -->  {-16,+,2}<nuw><nsw><%loop> U: [-16,-1) S: [-16,0)                Exits: 24               LoopDispositions: { %loop: Computable }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants