[SCEV][LV] Add Stride equal to one Predicate to enable strided access versioning #77287

ShivaChen · 2024-01-08T08:59:36Z

This commit enables vectorization for the case from #71517.
The loop can't be vectorized because the backedge-taken count (BECount) is unknown.

float  s172(int xa, int xb)  {
  for (int i = xa - 1; i < 32000; i += xb)
     a[i] += b[i];
}

By assuming the stride is one and generating a runtime check to guard the vectorized loop, the case can be vectorized.
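As a rough source-level illustration of this idea (not the transformation LLVM actually emits), the versioning can be pictured as a runtime test on `xb` that selects between a unit-stride loop and the original loop. The names `s172_scalar`, `s172_versioned` and `N` are hypothetical, the arrays are passed as parameters instead of being globals, and the sketch assumes `1 <= xa` and `1 <= xb <= N` so that all indices stay in bounds and no signed overflow occurs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the array length used in the TSVC kernel.
constexpr int N = 32000;

// Reference version: the loop as written in the description.
// Assumption: 1 <= xa and 1 <= xb <= N (in-bounds indices, no int overflow).
void s172_scalar(std::vector<float> &a, const std::vector<float> &b,
                 int xa, int xb) {
  assert(xa >= 1 && xb >= 1 && xb <= N);
  for (int i = xa - 1; i < N; i += xb)
    a[i] += b[i];
}

// Versioned form: a runtime check on the stride guards a unit-stride loop.
// Only the guarded loop has a trip count that is known on entry and
// consecutive memory accesses, which is what a vectorizer wants.
void s172_versioned(std::vector<float> &a, const std::vector<float> &b,
                    int xa, int xb) {
  assert(xa >= 1 && xb >= 1 && xb <= N);
  if (xb == 1) {
    // Fast path: stride is known to be one here.
    for (int i = xa - 1; i < N; ++i)
      a[i] += b[i];
  } else {
    // Fallback: the original loop with the unknown stride.
    for (int i = xa - 1; i < N; i += xb)
      a[i] += b[i];
  }
}
```

Both functions should produce identical results for every valid input; the versioned form only makes the `xb == 1` case explicit.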

llvmbot · 2024-01-08T09:00:02Z

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: None (ShivaChen)

Changes

There is a case in TSVC that is not vectorized because the BECount is unknown.

float  s172(int xa, int xb)  {
  for (int i = xa - 1; i < 32000; i += xb)
     a[i] += b[i];
}

By assuming the stride is one and generating a runtime check to guard the vectorized loop, the case can be vectorized.

Full diff: https://github.com/llvm/llvm-project/pull/77287.diff

2 Files Affected:

(modified) llvm/lib/Analysis/ScalarEvolution.cpp (+14-1)
(modified) llvm/test/Transforms/LoopVectorize/version-mem-access.ll (+52)

diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index 623814c038a78f..3c712ead953186 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -12778,10 +12778,23 @@ ScalarEvolution::howManyLessThans(const SCEV *LHS, const SCEV *RHS,
     // The positive stride case is the same as isKnownPositive(Stride) returning
     // true (original behavior of the function).
     //
-    if (PredicatedIV || !NoWrap || !loopIsFiniteByAssumption(L) ||
+    if (PredicatedIV || !loopIsFiniteByAssumption(L) ||
         !loopHasNoAbnormalExits(L))
       return getCouldNotCompute();
 
+    // Adding Stride equal to one Predicate when there is no wrap flags.
+    // It might enable strided access versioning in LAA and calculate BECount
+    // with Stride = 1.
+    if (!NoWrap) {
+      if (AllowPredicates) {
+        const auto *One =
+            static_cast<const SCEVConstant *>(getOne(Stride->getType()));
+        Predicates.insert(getEqualPredicate(Stride, One));
+        Stride = One;
+      } else
+        return getCouldNotCompute();
+    }
+
     if (!isKnownNonZero(Stride)) {
       // If we have a step of zero, and RHS isn't invariant in L, we don't know
       // if it might eventually be greater than start and if so, on which
diff --git a/llvm/test/Transforms/LoopVectorize/version-mem-access.ll b/llvm/test/Transforms/LoopVectorize/version-mem-access.ll
index 7bf4fbd89b0eea..f1283365ef52a4 100644
--- a/llvm/test/Transforms/LoopVectorize/version-mem-access.ll
+++ b/llvm/test/Transforms/LoopVectorize/version-mem-access.ll
@@ -92,3 +92,55 @@ for.end.loopexit:
 for.end:
   ret void
 }
+
+; We can vectorize the loop by using stride = 1 to calculate iteration count
+; and generate the runtime check to guard the vectorized loop.
+
+; CHECK-LABEL: s172
+; CHECK-DAG: icmp ne i32 %xb, 1
+; CHECK: vector.body
+
+@b = global [32000 x float] zeroinitializer, align 64
+@a = global [32000 x float] zeroinitializer, align 64
+
+; for (int i = xa - 1; i < 32000; i += xb)
+;   a[i] += b[i];
+;
+define float @s172(i32 signext %xa, i32 signext %xb) mustprogress {
+entry:
+  %cmp214 = icmp slt i32 %xa, 32001
+  br i1 %cmp214, label %for.body.us.preheader, label %for.cond.cleanup
+
+for.body.us.preheader:                            ; preds = %entry
+  %sub = add i32 %xa, -1
+  %0 = sext i32 %sub to i64
+  %1 = sext i32 %xb to i64
+  br label %for.body.us
+
+for.body.us:                                      ; preds = %for.body.us.preheader, %for.cond1.for.cond.cleanup3_crit_edge.us
+  %nl.016.us = phi i32 [ %inc.us, %for.cond1.for.cond.cleanup3_crit_edge.us ], [ 0, %for.body.us.preheader ]
+  br label %for.body4.us
+
+for.body4.us:                                     ; preds = %for.body.us, %for.body4.us
+  %indvars.iv = phi i64 [ %0, %for.body.us ], [ %indvars.iv.next, %for.body4.us ]
+  %arrayidx.us = getelementptr inbounds [32000 x float], ptr @b, i64 0, i64 %indvars.iv
+  %2 = load float, ptr %arrayidx.us, align 4
+  %arrayidx6.us = getelementptr inbounds [32000 x float], ptr @a, i64 0, i64 %indvars.iv
+  %3 = load float, ptr %arrayidx6.us, align 4
+  %add.us = fadd fast float %3, %2
+  store float %add.us, ptr %arrayidx6.us, align 4
+  %indvars.iv.next = add i64 %indvars.iv, %1
+  %cmp2.us = icmp slt i64 %indvars.iv.next, 32000
+  br i1 %cmp2.us, label %for.body4.us, label %for.cond1.for.cond.cleanup3_crit_edge.us
+
+for.cond1.for.cond.cleanup3_crit_edge.us:         ; preds = %for.body4.us
+  %inc.us = add nuw nsw i32 %nl.016.us, 1
+  %exitcond.not = icmp eq i32 %inc.us, 100000
+  br i1 %exitcond.not, label %for.cond.cleanup.loopexit, label %for.body.us
+
+for.cond.cleanup.loopexit:                        ; preds = %for.cond1.for.cond.cleanup3_crit_edge.us
+  br label %for.cond.cleanup
+
+for.cond.cleanup:                                 ; preds = %for.cond.cleanup.loopexit, %entry
+  ret float undef
+}
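
To make the "stride = 1" iteration count in the test comment concrete, here is a small sketch of the arithmetic for a loop in the same rotated (do-while) shape as `for.body4.us`: the body runs, the induction variable is incremented, and the backedge is taken while the incremented value is below the bound. This is an illustration under the assumption `start < end`, not the code in `howManyLessThans`; both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Closed form for a rotated loop with unit stride.
// Assumption: start < end, so the loop guard has already passed.
int64_t backedgeTakenCountUnitStride(int64_t start, int64_t end) {
  assert(start < end);
  return end - start - 1;
}

// Reference: count backedges by simulating the rotated loop
//   do { body; iv += stride; } while (iv < end);
// Assumption: stride > 0 and no overflow of the 64-bit induction variable.
int64_t simulateBackedges(int64_t start, int64_t end, int64_t stride) {
  assert(stride > 0);
  int64_t taken = 0;
  int64_t iv = start;
  while (true) {
    int64_t next = iv + stride;
    if (!(next < end))
      break;      // exit edge taken
    ++taken;      // backedge taken
    iv = next;
  }
  return taken;
}
```

With a stride other than one the closed form above no longer applies, which is why the equality predicate on the stride has to be checked at run time.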

sjoerdmeijer · 2024-01-08T09:32:27Z

Thanks for the patch!
It's worth mentioning (in the description) that this fixes #71517

sjoerdmeijer · 2024-01-08T09:42:40Z

llvm/test/Transforms/LoopVectorize/version-mem-access.ll

+
+; CHECK-LABEL: s172
+; CHECK-DAG: icmp ne i32 %xb, 1
+; CHECK: vector.body


Nit: the checks are a bit minimal. Personally, I would like to see a bit more context, but I see there's precedent in this file for just checking for the stride-1 compare.

Hi Sjoerdmeijer,

I added more check lines to provide more context.
Thanks for the review. :-)

sjoerdmeijer · 2024-01-08T09:43:11Z

Looks like a good patch to me, but I will let @fhahn sign off on it.

ShivaChen · 2024-01-08T10:45:46Z

> Thanks for the patch! It's worth mentioning (in the description) that this fixes #71517

I have updated the description in the commit and the PR. Thanks for the suggestion!

fhahn · 2024-01-16T21:32:05Z

Thanks for the patch! This is an interesting issue. One thing I am not sure about yet is whether this has the potential to clash with the other stride versioning logic in LAA.

For the particular case at hand, I think we may not need to version: it looks like the wrap flags get dropped during IndVars before LV, but we may be able to retain the flags. This is something I am currently looking into.
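
A small illustration of why the wrap flags matter for the iteration count, using an 8-bit unsigned induction variable so that wraparound is well defined in C++. The closed form `ceil((end - start) / stride)` only holds if the induction variable never wraps; once it can wrap, the loop may run far longer than the formula predicts. Both helper names and the `maxIters` cap are hypothetical, and the 8-bit width is chosen only to keep the numbers small.

```cpp
#include <cassert>
#include <cstdint>

// Trip count of `for (i = start; i < end; i += stride)` when the
// induction variable is known not to wrap.
// Assumption: stride > 0 and start < end.
unsigned tripCountNoWrap(unsigned start, unsigned end, unsigned stride) {
  assert(stride > 0 && start < end);
  return (end - start + stride - 1) / stride;
}

// Trip count of the same loop with an 8-bit induction variable that wraps
// modulo 256. `maxIters` caps the simulation in case the loop never exits.
unsigned tripCountWrapping8(uint8_t start, uint8_t end, uint8_t stride,
                            unsigned maxIters) {
  unsigned n = 0;
  for (uint8_t i = start; i < end && n < maxIters;
       i = static_cast<uint8_t>(i + stride))
    ++n;
  return n;
}
```

For `start = 0`, `end = 250`, `stride = 1` the two functions agree, since the variable never wraps. For `stride = 100` the 8-bit variable wraps past 255 before reaching 250, so the simulated count exceeds the closed-form value of 3.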

sjoerdmeijer · 2024-01-17T08:20:20Z

> Thanks for the patch! This is an interesting issue. One thing I am not sure yet if this has potential to clash with the other stride versioning logic in LAA.
>
> For that particular case at hand, I think we may not need to version, as it looks like the wrap flags get dropped during IndVars before LV, but we may be able to retain the flags. This is something I am currently looking into.

This is slightly off-topic for this patch, but GCC has a loop-versioning pass that runs before vectorisation to deal with these sorts of cases, whereas we have some logic sprinkled around to deal with strided accesses. My question, @fhahn, is whether you would see value in a separate loop-versioning pass like GCC's?

Add s172() to version-mem-access.ll

18fab95

ShivaChen requested a review from sjoerdmeijer January 8, 2024 08:59

ShivaChen requested a review from nikic as a code owner January 8, 2024 08:59

llvmbot added llvm:analysis llvm:transforms labels Jan 8, 2024

nikic requested a review from fhahn January 8, 2024 09:10

sjoerdmeijer reviewed Jan 8, 2024

ShivaChen added 2 commits January 8, 2024 10:39

Add check line to reveal the icmp is vectorized loop guard

8917ec0

ShivaChen force-pushed the stride-1-predicate-becount branch from 2fa239c to 8917ec0 Compare January 8, 2024 10:40

ShivaChen requested a review from aschwaighofer January 11, 2024 03:23
