
[SCEV] Add implicit guard conditions when applying loop guards. #84309

Open
Mel-Chen wants to merge 2 commits into main

Conversation

Mel-Chen
Contributor

@Mel-Chen Mel-Chen commented Mar 7, 2024

When the guard condition has unknown operands, additional implicit
conditions will be generated.
For example, if the guard condition is

%cond0 = icmp slt i32 %start, %n

Additional conditions will be generated as follows:

%cond1 = icmp slt i32 %start, MAX_SIGNED_i32
%cond2 = icmp slt i32 MIN_SIGNED_i32, %n

The constant operands help obtain a more precise range; one
application is letting getRangeForAffineNoSelfWrappingAR provide a
more accurate range.

For truncated induction variables, this patch still cannot sharpen
the range, because the nw flag is lost.
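The effect of those extra conditions can be sketched with plain intervals. The following is an illustrative Python sketch, not LLVM's ConstantRange API, and the helper names are made up: a strict signed guard `a s< b` over i32 lets us clamp one endpoint of each otherwise full-set operand.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
FULL = (INT32_MIN, INT32_MAX)  # inclusive (lo, hi), like a full-set i32 range

def tighten_lhs(rng):
    # From `a s< b` we also know `a s< INT32_MAX`, i.e. a <= INT32_MAX - 1.
    lo, hi = rng
    return (lo, min(hi, INT32_MAX - 1))

def tighten_rhs(rng):
    # From `a s< b` we also know `INT32_MIN s< b`, i.e. b >= INT32_MIN + 1.
    lo, hi = rng
    return (max(lo, INT32_MIN + 1), hi)

# %start and %n are SCEVUnknowns, so both begin as full-set.
start = tighten_lhs(FULL)  # (-2147483648, 2147483646)
n = tighten_rhs(FULL)      # (-2147483647, 2147483647)
```

Each derived range excludes exactly one endpoint of the full set, which is enough for downstream range queries that would otherwise give up on full-set operands.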

@llvmbot
Collaborator

llvmbot commented Mar 7, 2024

@llvm/pr-subscribers-llvm-analysis

Author: Mel Chen (Mel-Chen)

Changes



Full diff: https://github.com/llvm/llvm-project/pull/84309.diff

2 Files Affected:

  • (modified) llvm/lib/Analysis/ScalarEvolution.cpp (+38-3)
  • (added) llvm/test/Analysis/ScalarEvolution/range-sharpening.ll (+76)
diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index acc0aa23107bb5..5de33f37958e1b 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -7069,7 +7069,8 @@ ConstantRange ScalarEvolution::getRangeForAffineNoSelfWrappingAR(
       IsSigned ? ICmpInst::ICMP_SLE : ICmpInst::ICMP_ULE;
   ICmpInst::Predicate GEPred =
       IsSigned ? ICmpInst::ICMP_SGE : ICmpInst::ICMP_UGE;
-  const SCEV *End = AddRec->evaluateAtIteration(MaxBECount, *this);
+  const SCEV *End = applyLoopGuards(
+      AddRec->evaluateAtIteration(MaxBECount, *this), AddRec->getLoop());
 
   // We know that there is no self-wrap. Let's take Start and End values and
   // look at all intermediate values V1, V2, ..., Vn that IndVar takes during
@@ -7098,10 +7099,10 @@ ConstantRange ScalarEvolution::getRangeForAffineNoSelfWrappingAR(
     return ConstantRange::getFull(BitWidth);
 
   if (isKnownPositive(Step) &&
-      isKnownPredicateViaConstantRanges(LEPred, Start, End))
+      isKnownViaNonRecursiveReasoning(LEPred, Start, End))
     return RangeBetween;
   if (isKnownNegative(Step) &&
-           isKnownPredicateViaConstantRanges(GEPred, Start, End))
+      isKnownViaNonRecursiveReasoning(GEPred, Start, End))
     return RangeBetween;
   return ConstantRange::getFull(BitWidth);
 }
@@ -15434,6 +15435,40 @@ const SCEV *ScalarEvolution::applyLoopGuards(const SCEV *Expr, const Loop *L) {
         const auto *LHS = getSCEV(Cmp->getOperand(0));
         const auto *RHS = getSCEV(Cmp->getOperand(1));
         CollectCondition(Predicate, LHS, RHS, RewriteMap);
+
+        // When the guard condition is of the following form,
+        //   Unknown_LHS s> Unknown_RHS or Unknown_LHS s< Unknown_RHS
+        // Two additional conditions will be added:
+        //   Unknown_LHS s> Min(Data_Type), Max(Data_Type) s> Unknown_RHS
+        //   or
+        //   Unknown_LHS s< Max(Data_Type), Min(Data_Type) s< Unknown_RHS
+        // Note that for pointer types, we will not add additional conditions,
+        // as this may not be a meaningful comparison.
+        // TODO: May also work for unsigned predicates.
+        if (CmpInst::isStrictPredicate(Predicate) &&
+            CmpInst::isSigned(Predicate) && isa<SCEVUnknown>(LHS) &&
+            isa<SCEVUnknown>(RHS) && !LHS->getType()->isPointerTy()) {
+
+          unsigned BitWidth = getTypeSizeInBits(LHS->getType());
+          APInt MaxAP = APInt::getSignedMaxValue(BitWidth);
+          APInt MinAP = APInt::getSignedMinValue(BitWidth);
+          const SCEV *MaxScev = getConstant(MaxAP);
+          const SCEV *MinScev = getConstant(MinAP);
+
+          switch (Predicate) {
+          case CmpInst::ICMP_SLT:
+            CollectCondition(Predicate, LHS, MaxScev, RewriteMap);
+            CollectCondition(Predicate, MinScev, RHS, RewriteMap);
+            break;
+          case CmpInst::ICMP_SGT:
+            CollectCondition(Predicate, LHS, MinScev, RewriteMap);
+            CollectCondition(Predicate, MaxScev, RHS, RewriteMap);
+            break;
+          default:
+            break;
+          }
+        }
+
         continue;
       }
 
diff --git a/llvm/test/Analysis/ScalarEvolution/range-sharpening.ll b/llvm/test/Analysis/ScalarEvolution/range-sharpening.ll
new file mode 100644
index 00000000000000..f1ad4c9072346e
--- /dev/null
+++ b/llvm/test/Analysis/ScalarEvolution/range-sharpening.ll
@@ -0,0 +1,76 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 4
+; RUN: opt < %s -disable-output -scalar-evolution-use-expensive-range-sharpening -passes='print<scalar-evolution>' 2>&1 | FileCheck %s
+
+define void @test_guard_less_than_non_const(ptr nocapture %a, i32 %iv_start, i32 %n) {
+; CHECK-LABEL: 'test_guard_less_than_non_const'
+; CHECK-NEXT:  Classifying expressions for: @test_guard_less_than_non_const
+; CHECK-NEXT:    %i.05 = phi i32 [ %inc, %for.body ], [ %iv_start, %entry ]
+; CHECK-NEXT:    --> {%iv_start,+,1}<nsw><%for.body> U: full-set S: [-2147483648,2147483647) Exits: (-1 + %n) LoopDispositions: { %for.body: Computable }
+; CHECK-NEXT:    %arrayidx = getelementptr inbounds i32, ptr %a, i32 %i.05
+; CHECK-NEXT:    --> {((4 * (sext i32 %iv_start to i64))<nsw> + %a),+,4}<nw><%for.body> U: full-set S: full-set Exits: ((4 * (zext i32 (-1 + (-1 * %iv_start) + %n) to i64))<nuw><nsw> + (4 * (sext i32 %iv_start to i64))<nsw> + %a) LoopDispositions: { %for.body: Computable }
+; CHECK-NEXT:    %inc = add nsw i32 %i.05, 1
+; CHECK-NEXT:    --> {(1 + %iv_start),+,1}<nsw><%for.body> U: full-set S: full-set Exits: %n LoopDispositions: { %for.body: Computable }
+; CHECK-NEXT:  Determining loop execution counts for: @test_guard_less_than_non_const
+; CHECK-NEXT:  Loop %for.body: backedge-taken count is (-1 + (-1 * %iv_start) + %n)
+; CHECK-NEXT:  Loop %for.body: constant max backedge-taken count is i32 -2
+; CHECK-NEXT:  Loop %for.body: symbolic max backedge-taken count is (-1 + (-1 * %iv_start) + %n)
+; CHECK-NEXT:  Loop %for.body: Trip multiple is 1
+;
+entry:
+  %cmp4 = icmp slt i32 %iv_start, %n
+  br i1 %cmp4, label %for.body, label %exit
+
+for.body:
+  %i.05 = phi i32 [ %inc, %for.body ], [ %iv_start, %entry ]
+  %arrayidx = getelementptr inbounds i32, ptr %a, i32 %i.05
+  store i32 %i.05, ptr %arrayidx, align 4
+  %inc = add nsw i32 %i.05, 1
+  %exitcond.not = icmp eq i32 %inc, %n
+  br i1 %exitcond.not, label %exit, label %for.body
+
+exit:
+  ret void
+}
+
+define void @test_guard_less_than_non_const_sext(ptr nocapture %a, i32 %iv_start, i32 %n) {
+; CHECK-LABEL: 'test_guard_less_than_non_const_sext'
+; CHECK-NEXT:  Classifying expressions for: @test_guard_less_than_non_const_sext
+; CHECK-NEXT:    %0 = sext i32 %iv_start to i64
+; CHECK-NEXT:    --> (sext i32 %iv_start to i64) U: [-2147483648,2147483648) S: [-2147483648,2147483648)
+; CHECK-NEXT:    %wide.trip.count = sext i32 %n to i64
+; CHECK-NEXT:    --> (sext i32 %n to i64) U: [-2147483648,2147483648) S: [-2147483648,2147483648)
+; CHECK-NEXT:    %indvars.iv = phi i64 [ %0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
+; CHECK-NEXT:    --> {(sext i32 %iv_start to i64),+,1}<nsw><%for.body> U: [-2147483648,-9223372036854775808) S: [-2147483648,2147483647) Exits: (-1 + (sext i32 %n to i64))<nsw> LoopDispositions: { %for.body: Computable }
+; CHECK-NEXT:    %arrayidx = getelementptr inbounds i32, ptr %a, i64 %indvars.iv
+; CHECK-NEXT:    --> {((4 * (sext i32 %iv_start to i64))<nsw> + %a),+,4}<nw><%for.body> U: full-set S: full-set Exits: (-4 + (4 * (sext i32 %n to i64))<nsw> + %a) LoopDispositions: { %for.body: Computable }
+; CHECK-NEXT:    %1 = trunc i64 %indvars.iv to i32
+; CHECK-NEXT:    --> {%iv_start,+,1}<%for.body> U: full-set S: full-set Exits: (-1 + %n) LoopDispositions: { %for.body: Computable }
+; CHECK-NEXT:    %indvars.iv.next = add nsw i64 %indvars.iv, 1
+; CHECK-NEXT:    --> {(1 + (sext i32 %iv_start to i64))<nsw>,+,1}<nsw><%for.body> U: [-2147483647,-9223372036854775808) S: [-2147483647,-9223372036854775808) Exits: (sext i32 %n to i64) LoopDispositions: { %for.body: Computable }
+; CHECK-NEXT:  Determining loop execution counts for: @test_guard_less_than_non_const_sext
+; CHECK-NEXT:  Loop %for.body: backedge-taken count is (-1 + (sext i32 %n to i64) + (-1 * (sext i32 %iv_start to i64))<nsw>)
+; CHECK-NEXT:  Loop %for.body: constant max backedge-taken count is i64 -2
+; CHECK-NEXT:  Loop %for.body: symbolic max backedge-taken count is (-1 + (sext i32 %n to i64) + (-1 * (sext i32 %iv_start to i64))<nsw>)
+; CHECK-NEXT:  Loop %for.body: Trip multiple is 1
+;
+entry:
+  %cmp4 = icmp slt i32 %iv_start, %n
+  br i1 %cmp4, label %for.body.preheader, label %exit
+
+for.body.preheader:
+  %0 = sext i32 %iv_start to i64
+  %wide.trip.count = sext i32 %n to i64
+  br label %for.body
+
+for.body:
+  %indvars.iv = phi i64 [ %0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
+  %arrayidx = getelementptr inbounds i32, ptr %a, i64 %indvars.iv
+  %1 = trunc i64 %indvars.iv to i32
+  store i32 %1, ptr %arrayidx, align 4
+  %indvars.iv.next = add nsw i64 %indvars.iv, 1
+  %exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
+  br i1 %exitcond.not, label %exit, label %for.body
+
+exit:
+  ret void
+}
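For context on why the sharpened `End` helps: `getRangeForAffineNoSelfWrappingAR` only returns the range between Start and End when the Start-vs-End predicate is provable. A rough Python sketch of the positive-step case follows (illustrative only; the interval comparison below is a conservative stand-in for the patch's `isKnownViaNonRecursiveReasoning(SLE, Start, End)` query):

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
FULL = (INT32_MIN, INT32_MAX)

def range_for_no_wrap_ar(start, end, step_positive):
    # start/end: inclusive (lo, hi) signed ranges of the IV's first and
    # last values. With no self-wrap and a positive step, if Start <= End
    # is provable, every intermediate value stays in [start.lo, end.hi].
    (s_lo, s_hi), (e_lo, e_hi) = start, end
    if step_positive and s_hi <= e_lo:  # stand-in for the SLE predicate query
        return (s_lo, e_hi)
    return FULL  # cannot prove Start <= End; stay conservative

# A guard-derived bound on End can shrink its range enough for the
# proof to go through where a full-set End would fail.
print(range_for_no_wrap_ar((0, 10), (20, 100), True))  # (0, 100)
print(range_for_no_wrap_ar((0, 10), FULL, True))       # full-set
```

Applying loop guards to `End` before this check, as the first hunk does, is what moves cases from the second outcome to the first.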

@preames
Collaborator

preames commented Mar 28, 2024

Happened to chat w/Craig, and he reminded me I hadn't replied here.

This patch is not unreasonable, but there are a couple of things that need to be done.

First, you're modifying code which is not enabled by default. The first step is assessing the compile time impact of enabling it. If that is prohibitive, then the rest of this line of work is moot.

Second, your actual change appears to be three independent pieces. Each of these should be its own review with targeted tests for each case. Two of them appear blocked on enabling the disabled piece of code; one you could maybe do in parallel.

I don't plan on devoting any review time to this until the first item has been resolved.

@Mel-Chen
Contributor Author

Mel-Chen commented Apr 1, 2024

Thanks for your reply, and sorry for the delayed response; I was caught up with other issues.

First, you're modifying code which is not enabled by default. The first step is assessing the compile time impact of enabling it. If that is prohibitive, then the rest of this line of work is moot.

In the meantime, I looked at the original review at https://reviews.llvm.org/D89381. This feature seems to be disabled by default not only because of the compile-time increase but also because the performance gain it brings is insufficient. I don't have experience with compile-time measurement and I'm unsure about the relevant process (benchmarks used, compile options, etc.). My current plan is to measure compile time and performance with and without -scalar-evolution-use-expensive-range-sharpening, on the SPEC 2006 or SPEC 2017 benchmarks. If the community has a standard process for evaluating compile-time impact, please let me know. (cc @nikic )

Second, your actual change appears to be three independent pieces. Each of these should be its own review with targeted tests for each case. Two of them appear blocked on enabling the disabled piece of code; one you could maybe do in parallel.

This is also bothering me. I tried testing with only the applyLoopGuards modification, but there were no changes in the lit tests. Next, I may try using asserts to find relevant cases in the benchmarks, so that when the applyLoopGuards change is split out, there are corresponding test cases demonstrating the functionality of the patch.
