Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LAA] Always use DepCands when grouping runtime checks. #91196

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented May 6, 2024

Update groupChecks to always use DepCands to try and merge runtime checks. DepCands contains the dependency partition, grouping together all accessed pointers to he same underlying objects.

If we computed the dependencies, We only need to check accesses to the same underlying object, if there is an unknown dependency for this underlying object; otherwise we already proved that all accesses withing the underlying object are safe w.r.t. vectorization and we only need to check that accesses to the underlying object don't overlap with accesses to other underlying objects.

To ensure runtime checks are generated for the case with unknown dependencies, remove equivalence classes containing accesses involved in unknown dependencies.

This reduces the number of runtime checks needed in case non-constant dependence distances are found, and is in preparation for removing the restriction that the accesses need to have the same stride which was added in #88039.

@llvmbot
Copy link
Collaborator

llvmbot commented May 6, 2024

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-adt

Author: Florian Hahn (fhahn)

Changes

Update groupChecks to always use DepCands to try and merge runtime checks. DepCands contains the dependency partition, grouping together all accessed pointers to he same underlying objects.

If we computed the dependencies, We only need to check accesses to the same underlying object, if there is an unknown dependency for this underlying object; otherwise we already proved that all accesses withing the underlying object are safe w.r.t. vectorization and we only need to check that accesses to the underlying object don't overlap with accesses to other underlying objects.

To ensure runtime checks are generated for the case with unknown dependencies, remove equivalence classes containing accesses involved in unknown dependencies.

This reduces the number of runtime checks needed in case non-constant dependence distances are found, and is in preparation for removing the restriction that the accesses need to have the same stride which was added in #88039.


Patch is 22.36 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/91196.diff

8 Files Affected:

  • (modified) llvm/include/llvm/ADT/EquivalenceClasses.h (+7)
  • (modified) llvm/include/llvm/Analysis/LoopAccessAnalysis.h (+5-6)
  • (modified) llvm/lib/Analysis/LoopAccessAnalysis.cpp (+55-41)
  • (modified) llvm/test/Analysis/LoopAccessAnalysis/loops-with-indirect-reads-and-writes.ll (+8)
  • (modified) llvm/test/Analysis/LoopAccessAnalysis/multiple-strides-rt-memory-checks.ll (+12)
  • (modified) llvm/test/Analysis/LoopAccessAnalysis/positive-dependence-distance-different-access-sizes.ll (+12)
  • (modified) llvm/test/Analysis/LoopAccessAnalysis/resort-to-memchecks-only.ll (+4)
  • (modified) llvm/test/Analysis/LoopAccessAnalysis/unknown-dependence-retry-with-runtime-checks.ll (+30-31)
diff --git a/llvm/include/llvm/ADT/EquivalenceClasses.h b/llvm/include/llvm/ADT/EquivalenceClasses.h
index 4f98b84cf97d24..a45a3073231552 100644
--- a/llvm/include/llvm/ADT/EquivalenceClasses.h
+++ b/llvm/include/llvm/ADT/EquivalenceClasses.h
@@ -233,6 +233,13 @@ class EquivalenceClasses {
     return findLeader(TheMapping.find(V));
   }
 
+  /// Erase the class containing \p V.
+  void eraseClass(const ElemTy &V) {
+    if (TheMapping.find(V) == TheMapping.end())
+      return;
+    TheMapping.erase(V);
+  }
+
   /// union - Merge the two equivalence sets for the specified values, inserting
   /// them if they do not already exist in the equivalence set.
   member_iterator unionSets(const ElemTy &V1, const ElemTy &V2) {
diff --git a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
index cf998e66ee486e..11e185a687cab4 100644
--- a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
+++ b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
@@ -453,8 +453,8 @@ class RuntimePointerChecking {
 
   /// Generate the checks and store it.  This also performs the grouping
   /// of pointers to reduce the number of memchecks necessary.
-  void generateChecks(MemoryDepChecker::DepCandidates &DepCands,
-                      bool UseDependencies);
+  void generateChecks(const MemoryDepChecker &DepChecker,
+                      MemoryDepChecker::DepCandidates &DepCands);
 
   /// Returns the checks that generateChecks created. They can be used to ensure
   /// no read/write accesses overlap across all loop iterations.
@@ -521,10 +521,9 @@ class RuntimePointerChecking {
 private:
   /// Groups pointers such that a single memcheck is required
   /// between two different groups. This will clear the CheckingGroups vector
-  /// and re-compute it. We will only group dependecies if \p UseDependencies
-  /// is true, otherwise we will create a separate group for each pointer.
-  void groupChecks(MemoryDepChecker::DepCandidates &DepCands,
-                   bool UseDependencies);
+  /// and re-compute it.
+  void groupChecks(const MemoryDepChecker &DepChecker,
+                   MemoryDepChecker::DepCandidates &DepCands);
 
   /// Generate the checks and return them.
   SmallVector<RuntimePointerCheck, 4> generateChecks();
diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index ae7f0373c4e846..77b44b4894f378 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -385,9 +385,10 @@ SmallVector<RuntimePointerCheck, 4> RuntimePointerChecking::generateChecks() {
 }
 
 void RuntimePointerChecking::generateChecks(
-    MemoryDepChecker::DepCandidates &DepCands, bool UseDependencies) {
+    const MemoryDepChecker &DepChecker,
+    MemoryDepChecker::DepCandidates &DepCands) {
   assert(Checks.empty() && "Checks is not empty");
-  groupChecks(DepCands, UseDependencies);
+  groupChecks(DepChecker, DepCands);
   Checks = generateChecks();
 }
 
@@ -454,7 +455,8 @@ bool RuntimeCheckingPtrGroup::addPointer(unsigned Index, const SCEV *Start,
 }
 
 void RuntimePointerChecking::groupChecks(
-    MemoryDepChecker::DepCandidates &DepCands, bool UseDependencies) {
+    const MemoryDepChecker &DepChecker,
+    MemoryDepChecker::DepCandidates &DepCands) {
   // We build the groups from dependency candidates equivalence classes
   // because:
   //    - We know that pointers in the same equivalence class share
@@ -490,21 +492,11 @@ void RuntimePointerChecking::groupChecks(
   // us to perform an accurate check in this case.
   //
   // The above case requires that we have an UnknownDependence between
-  // accesses to the same underlying object. This cannot happen unless
-  // FoundNonConstantDistanceDependence is set, and therefore UseDependencies
-  // is also false. In this case we will use the fallback path and create
-  // separate checking groups for all pointers.
-
-  // If we don't have the dependency partitions, construct a new
-  // checking pointer group for each pointer. This is also required
-  // for correctness, because in this case we can have checking between
-  // pointers to the same underlying object.
-  if (!UseDependencies) {
-    for (unsigned I = 0; I < Pointers.size(); ++I)
-      CheckingGroups.push_back(RuntimeCheckingPtrGroup(I, *this));
-    return;
-  }
-
+  // accesses to the same underlying object. It is the caller's responsibility
+  // to clear any entries for accesses with unknown dependencies from the
+  // dependency partition (DepCands). This is required for correctness, because
+  // in this case we can have checking between pointers to the same underlying
+  // object.
   unsigned TotalComparisons = 0;
 
   DenseMap<Value *, SmallVector<unsigned>> PositionMap;
@@ -529,6 +521,13 @@ void RuntimePointerChecking::groupChecks(
     MemoryDepChecker::MemAccessInfo Access(Pointers[I].PointerValue,
                                            Pointers[I].IsWritePtr);
 
+    // If there is no entry in the dependency partition, there are no potential
+    // accesses to merge; simply add a new pointer checking group.
+    if (DepCands.findValue(Access) == DepCands.end()) {
+      CheckingGroups.push_back(RuntimeCheckingPtrGroup(I, *this));
+      continue;
+    }
+
     SmallVector<RuntimeCheckingPtrGroup, 2> Groups;
     auto LeaderI = DepCands.findValue(DepCands.getLeaderValue(Access));
 
@@ -700,8 +699,10 @@ class AccessAnalysis {
   ///
   /// Returns true if we need no check or if we do and we can generate them
   /// (i.e. the pointers have computable bounds).
-  bool canCheckPtrAtRT(RuntimePointerChecking &RtCheck, ScalarEvolution *SE,
-                       Loop *TheLoop, const DenseMap<Value *, const SCEV *> &Strides,
+  bool canCheckPtrAtRT(const MemoryDepChecker &DepChecker,
+                       RuntimePointerChecking &RtCheck, ScalarEvolution *SE,
+                       Loop *TheLoop,
+                       const DenseMap<Value *, const SCEV *> &Strides,
                        Value *&UncomputablePtr, bool ShouldCheckWrap = false);
 
   /// Goes over all memory accesses, checks whether a RT check is needed
@@ -717,12 +718,6 @@ class AccessAnalysis {
   /// dependency checking (i.e. FoundNonConstantDistanceDependence).
   bool isDependencyCheckNeeded() { return !CheckDeps.empty(); }
 
-  /// We decided that no dependence analysis would be used.  Reset the state.
-  void resetDepChecks(MemoryDepChecker &DepChecker) {
-    CheckDeps.clear();
-    DepChecker.clearDependences();
-  }
-
   MemAccessInfoList &getDependenciesToCheck() { return CheckDeps; }
 
   const DenseMap<Value *, SmallVector<const Value *, 16>> &
@@ -1107,7 +1102,8 @@ bool AccessAnalysis::createCheckForAccess(RuntimePointerChecking &RtCheck,
     // The id of the dependence set.
     unsigned DepId;
 
-    if (isDependencyCheckNeeded()) {
+    if (isDependencyCheckNeeded() &&
+        DepCands.findValue(Access) != DepCands.end()) {
       Value *Leader = DepCands.getLeaderValue(Access).getPointer();
       unsigned &LeaderId = DepSetId[Leader];
       if (!LeaderId)
@@ -1126,10 +1122,11 @@ bool AccessAnalysis::createCheckForAccess(RuntimePointerChecking &RtCheck,
   return true;
 }
 
-bool AccessAnalysis::canCheckPtrAtRT(RuntimePointerChecking &RtCheck,
-                                     ScalarEvolution *SE, Loop *TheLoop,
-                                     const DenseMap<Value *, const SCEV *> &StridesMap,
-                                     Value *&UncomputablePtr, bool ShouldCheckWrap) {
+bool AccessAnalysis::canCheckPtrAtRT(
+    const MemoryDepChecker &DepChecker, RuntimePointerChecking &RtCheck,
+    ScalarEvolution *SE, Loop *TheLoop,
+    const DenseMap<Value *, const SCEV *> &StridesMap, Value *&UncomputablePtr,
+    bool ShouldCheckWrap) {
   // Find pointers with computable bounds. We are going to use this information
   // to place a runtime bound check.
   bool CanDoRT = true;
@@ -1137,7 +1134,26 @@ bool AccessAnalysis::canCheckPtrAtRT(RuntimePointerChecking &RtCheck,
   bool MayNeedRTCheck = false;
   if (!IsRTCheckAnalysisNeeded) return true;
 
-  bool IsDepCheckNeeded = isDependencyCheckNeeded();
+  if (auto *Deps = DepChecker.getDependences()) {
+    // If there are unknown dependences, this means runtime checks are needed to
+    // ensure there's no overlap between accesses to the same underlying object.
+    // Remove the equivalence classes containing both source and destination
+    // accesses from DepCands. This ensures runtime checks will be generated
+    // between those accesses and prevents them from being grouped together.
+    for (const auto &Dep : *Deps) {
+      if (Dep.Type != MemoryDepChecker::Dependence::Unknown) {
+        assert(MemoryDepChecker::Dependence::isSafeForVectorization(Dep.Type) ==
+                   MemoryDepChecker::VectorizationSafetyStatus::Safe &&
+               "Should only skip safe dependences");
+        continue;
+      }
+      Instruction *Src = Dep.getSource(DepChecker);
+      Instruction *Dst = Dep.getDestination(DepChecker);
+      DepCands.eraseClass({getPointerOperand(Src), Src->mayWriteToMemory()});
+      DepCands.eraseClass({getPointerOperand(Dst), Dst->mayWriteToMemory()});
+    }
+  } else
+    CheckDeps.clear();
 
   // We assign a consecutive id to access from different alias sets.
   // Accesses between different groups doesn't need to be checked.
@@ -1265,7 +1281,7 @@ bool AccessAnalysis::canCheckPtrAtRT(RuntimePointerChecking &RtCheck,
   }
 
   if (MayNeedRTCheck && CanDoRT)
-    RtCheck.generateChecks(DepCands, IsDepCheckNeeded);
+    RtCheck.generateChecks(DepChecker, DepCands);
 
   LLVM_DEBUG(dbgs() << "LAA: We need to do " << RtCheck.getNumberOfChecks()
                     << " pointer comparisons.\n");
@@ -2625,9 +2641,9 @@ void LoopAccessInfo::analyzeLoop(AAResults *AA, LoopInfo *LI,
   // Find pointers with computable bounds. We are going to use this information
   // to place a runtime bound check.
   Value *UncomputablePtr = nullptr;
-  bool CanDoRTIfNeeded =
-      Accesses.canCheckPtrAtRT(*PtrRtChecking, PSE->getSE(), TheLoop,
-                               SymbolicStrides, UncomputablePtr, false);
+  bool CanDoRTIfNeeded = Accesses.canCheckPtrAtRT(
+      getDepChecker(), *PtrRtChecking, PSE->getSE(), TheLoop, SymbolicStrides,
+      UncomputablePtr, false);
   if (!CanDoRTIfNeeded) {
     auto *I = dyn_cast_or_null<Instruction>(UncomputablePtr);
     recordAnalysis("CantIdentifyArrayBounds", I) 
@@ -2651,16 +2667,14 @@ void LoopAccessInfo::analyzeLoop(AAResults *AA, LoopInfo *LI,
     if (!CanVecMem && DepChecker->shouldRetryWithRuntimeCheck()) {
       LLVM_DEBUG(dbgs() << "LAA: Retrying with memory checks\n");
 
-      // Clear the dependency checks. We assume they are not needed.
-      Accesses.resetDepChecks(*DepChecker);
-
       PtrRtChecking->reset();
       PtrRtChecking->Need = true;
 
       auto *SE = PSE->getSE();
       UncomputablePtr = nullptr;
-      CanDoRTIfNeeded = Accesses.canCheckPtrAtRT(
-          *PtrRtChecking, SE, TheLoop, SymbolicStrides, UncomputablePtr, true);
+      CanDoRTIfNeeded =
+          Accesses.canCheckPtrAtRT(getDepChecker(), *PtrRtChecking, SE, TheLoop,
+                                   SymbolicStrides, UncomputablePtr, true);
 
       // Check that we found the bounds for the pointer.
       if (!CanDoRTIfNeeded) {
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/loops-with-indirect-reads-and-writes.ll b/llvm/test/Analysis/LoopAccessAnalysis/loops-with-indirect-reads-and-writes.ll
index fd4f417e57b635..4143c70c7b6107 100644
--- a/llvm/test/Analysis/LoopAccessAnalysis/loops-with-indirect-reads-and-writes.ll
+++ b/llvm/test/Analysis/LoopAccessAnalysis/loops-with-indirect-reads-and-writes.ll
@@ -90,6 +90,10 @@ define void @test_indirect_read_loop_also_modifies_pointer_array(ptr noundef %ar
 ; CHECK-NEXT:    loop.2:
 ; CHECK-NEXT:      Memory dependences are safe with run-time checks
 ; CHECK-NEXT:      Dependences:
+; CHECK-NEXT:        Unknown:
+; CHECK-NEXT:            %l.1 = load ptr, ptr %gep.iv.1, align 8, !tbaa !0 ->
+; CHECK-NEXT:            store i64 %l.2, ptr %gep.iv.2, align 8, !tbaa !0
+; CHECK-EMPTY:
 ; CHECK-NEXT:      Run-time memory checks:
 ; CHECK-NEXT:      Check 0:
 ; CHECK-NEXT:        Comparing group ([[GRP1:0x[0-9a-f]+]]):
@@ -158,6 +162,10 @@ define void @test_indirect_write_loop_also_modifies_pointer_array(ptr noundef %a
 ; CHECK-NEXT:    loop.2:
 ; CHECK-NEXT:      Memory dependences are safe with run-time checks
 ; CHECK-NEXT:      Dependences:
+; CHECK-NEXT:        Unknown:
+; CHECK-NEXT:            %l.1 = load ptr, ptr %gep.iv.1, align 8, !tbaa !0 ->
+; CHECK-NEXT:            store ptr %l.1, ptr %gep.iv.2, align 8, !tbaa !0
+; CHECK-EMPTY:
 ; CHECK-NEXT:      Run-time memory checks:
 ; CHECK-NEXT:      Check 0:
 ; CHECK-NEXT:        Comparing group ([[GRP3:0x[0-9a-f]+]]):
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/multiple-strides-rt-memory-checks.ll b/llvm/test/Analysis/LoopAccessAnalysis/multiple-strides-rt-memory-checks.ll
index 3b4f2a170d8cbc..12e69cd0cdbe33 100644
--- a/llvm/test/Analysis/LoopAccessAnalysis/multiple-strides-rt-memory-checks.ll
+++ b/llvm/test/Analysis/LoopAccessAnalysis/multiple-strides-rt-memory-checks.ll
@@ -25,6 +25,18 @@
 ; CHECK:   .inner:
 ; CHECK-NEXT:     Memory dependences are safe with run-time checks
 ; CHECK-NEXT:     Dependences:
+; CHECK-NEXT:       Unknown:
+; CHECK-NEXT:           %4 = load i32, ptr %1, align 4 ->
+; CHECK-NEXT:           store i32 %8, ptr %6, align 4
+; CHECK-EMPTY:
+; CHECK-NEXT:       Unknown:
+; CHECK-NEXT:           %3 = load i32, ptr %2, align 4 ->
+; CHECK-NEXT:           store i32 %8, ptr %6, align 4
+; CHECK-EMPTY:
+; CHECK-NEXT:       Forward:
+; CHECK-NEXT:           %7 = load i32, ptr %6, align 4 ->
+; CHECK-NEXT:           store i32 %8, ptr %6, align 4
+; CHECK-EMPTY:
 ; CHECK-NEXT:     Run-time memory checks:
 ; CHECK:          Check 0:
 ; CHECK:          Check 1:
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/positive-dependence-distance-different-access-sizes.ll b/llvm/test/Analysis/LoopAccessAnalysis/positive-dependence-distance-different-access-sizes.ll
index 08e0bae7f05bac..41f6f676a108fc 100644
--- a/llvm/test/Analysis/LoopAccessAnalysis/positive-dependence-distance-different-access-sizes.ll
+++ b/llvm/test/Analysis/LoopAccessAnalysis/positive-dependence-distance-different-access-sizes.ll
@@ -10,6 +10,10 @@ define void @test_distance_positive_independent_via_trip_count(ptr %A) {
 ; CHECK-NEXT:    loop:
 ; CHECK-NEXT:      Memory dependences are safe with run-time checks
 ; CHECK-NEXT:      Dependences:
+; CHECK-NEXT:        Unknown:
+; CHECK-NEXT:            %l = load i8, ptr %gep.A, align 1 ->
+; CHECK-NEXT:            store i32 %ext, ptr %gep.A.400, align 4
+; CHECK-EMPTY:
 ; CHECK-NEXT:      Run-time memory checks:
 ; CHECK-NEXT:      Check 0:
 ; CHECK-NEXT:        Comparing group ([[GRP1:0x[0-9a-f]+]]):
@@ -55,6 +59,10 @@ define void @test_distance_positive_backwards(ptr %A) {
 ; CHECK-NEXT:    loop:
 ; CHECK-NEXT:      Memory dependences are safe with run-time checks
 ; CHECK-NEXT:      Dependences:
+; CHECK-NEXT:        Unknown:
+; CHECK-NEXT:            %l = load i8, ptr %gep.A, align 1 ->
+; CHECK-NEXT:            store i32 %ext, ptr %gep.A.400, align 4
+; CHECK-EMPTY:
 ; CHECK-NEXT:      Run-time memory checks:
 ; CHECK-NEXT:      Check 0:
 ; CHECK-NEXT:        Comparing group ([[GRP3:0x[0-9a-f]+]]):
@@ -98,6 +106,10 @@ define void @test_distance_positive_via_assume(ptr %A, i64 %off) {
 ; CHECK-NEXT:    loop:
 ; CHECK-NEXT:      Memory dependences are safe with run-time checks
 ; CHECK-NEXT:      Dependences:
+; CHECK-NEXT:        Unknown:
+; CHECK-NEXT:            %l = load i8, ptr %gep.A, align 1 ->
+; CHECK-NEXT:            store i32 %ext, ptr %gep.A.400, align 4
+; CHECK-EMPTY:
 ; CHECK-NEXT:      Run-time memory checks:
 ; CHECK-NEXT:      Check 0:
 ; CHECK-NEXT:        Comparing group ([[GRP5:0x[0-9a-f]+]]):
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/resort-to-memchecks-only.ll b/llvm/test/Analysis/LoopAccessAnalysis/resort-to-memchecks-only.ll
index b5ba85c6e3fa8e..7cf334c246a9ad 100644
--- a/llvm/test/Analysis/LoopAccessAnalysis/resort-to-memchecks-only.ll
+++ b/llvm/test/Analysis/LoopAccessAnalysis/resort-to-memchecks-only.ll
@@ -13,6 +13,10 @@ target triple = "x86_64-apple-macosx10.10.0"
 
 ; CHECK: Memory dependences are safe with run-time checks
 ; CHECK-NEXT: Dependences:
+; CHECK-NEXT:   Unknown:
+; CHECK-NEXT:     %loadA = load i16, ptr %arrayidxA, align 2 ->
+; CHECK-NEXT:     store i16 %mul1, ptr %arrayidxA2, align 2
+; CHECK-EMPTY:
 ; CHECK-NEXT: Run-time memory checks:
 ; CHECK-NEXT: 0:
 ; CHECK-NEXT: Comparing group
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/unknown-dependence-retry-with-runtime-checks.ll b/llvm/test/Analysis/LoopAccessAnalysis/unknown-dependence-retry-with-runtime-checks.ll
index 23ab92d75cbcdc..54a966c56f0df8 100644
--- a/llvm/test/Analysis/LoopAccessAnalysis/unknown-dependence-retry-with-runtime-checks.ll
+++ b/llvm/test/Analysis/LoopAccessAnalysis/unknown-dependence-retry-with-runtime-checks.ll
@@ -12,17 +12,20 @@ define void @test_dependence_with_non_constant_offset_and_other_accesses_to_noal
 ; CHECK-NEXT:    loop:
 ; CHECK-NEXT:      Memory dependences are safe with run-time checks
 ; CHECK-NEXT:      Dependences:
+; CHECK-NEXT:        Unknown:
+; CHECK-NEXT:            %l = load i8, ptr %gep.A, align 1 ->
+; CHECK-NEXT:            store i32 %ext, ptr %gep.A.400, align 4
+; CHECK-EMPTY:
+; CHECK-NEXT:        Forward:
+; CHECK-NEXT:            %l.2 = load i8, ptr %gep.B.1, align 1 ->
+; CHECK-NEXT:            store i8 %l.2, ptr %gep.B, align 1
+; CHECK-EMPTY:
 ; CHECK-NEXT:      Run-time memory checks:
 ; CHECK-NEXT:      Check 0:
 ; CHECK-NEXT:        Comparing group ([[GRP1:0x[0-9a-f]+]]):
 ; CHECK-NEXT:          %gep.A.400 = getelementptr inbounds i32, ptr %A.off, i64 %iv
 ; CHECK-NEXT:        Against group ([[GRP2:0x[0-9a-f]+]]):
 ; CHECK-NEXT:          %gep.A = getelementptr inbounds i8, ptr %A, i64 %iv
-; CHECK-NEXT:      Check 1:
-; CHECK-NEXT:        Comparing group ([[GRP3:0x[0-9a-f]+]]):
-; CHECK-NEXT:          %gep.B = getelementptr inbounds i8, ptr %B, i64 %iv
-; CHECK-NEXT:        Against group ([[GRP4:0x[0-9a-f]+]]):
-; CHECK-NEXT:          %gep.B.1 = getelementptr inbounds i8, ptr %B, i64 %iv.next
 ; CHECK-NEXT:      Grouped accesses:
 ; CHECK-NEXT:        Group [[GRP1]]:
 ; CHECK-NEXT:          (Low: (%off + %A) High: (404 + %off + %A))
@@ -30,12 +33,10 @@ define void @test_dependence_with_non_constant_offset_and_other_accesses_to_noal
 ; CHECK-NEXT:        Group [[GRP2]]:
 ; CHECK-NEXT:          (Low: %A High: (101 + %A))
 ; CHECK-NEXT:            Member: {%A,+,1}<nuw><%loop>
-; CHECK-NEXT:        Group [[GRP3]]:
-; CHECK-NEXT:          (Low: %B High: (101 + %B))
-; CHECK-NEXT:            Member: {%B,+,1}<nuw><%loop>
-; CHECK-NEXT:        Group [[GRP4]]:
-; CHECK-NEXT:          (Low: (1 + %B)<nuw> High: (102 + %B))
+; CHECK-NEXT:        Group [[GRP3:0x[0-9a-f]+]]:
+; CHECK-NEXT:          (Low: %B High: (102 + %B))
 ; CHECK-NEXT:            Member: {(1 + %B)<nuw>,+,1}<nuw><%loop>
+; CHECK-NEXT:            Member: {%B,+,1}<nuw><%loop>
 ; CHECK-EMPTY:
 ; CHECK-NEXT:      Non vectorizable stores to invariant address were not found in loop.
 ; CHECK-NEXT:      SCEV assumptions:
@@ -72,45 +73,43 @@ define void @test_dependence_with_non_constant_offset_and_other_accesses_to_maya
 ; CHECK-NEXT:    loop:
 ; CHECK-NEXT:      Memory dependences are safe with run-time checks
 ; CHECK-NEXT:      Dependences:
+; CHECK-NEXT:        Forward:
+; CHECK-NEXT:            %l.2 = load i8, ptr %gep.B.1, align 1 ->
+; CHECK-NEXT:            store i8 %l.2, ptr %gep.B, align 1
+; CHECK-EMPTY:
+; CHECK-NEXT:        Unknown:
+; CHECK-NEXT:            %l = load i8, ptr %gep.A, align 1 ->
+; CHECK-NEXT:            store i32 %ext, ptr %gep.A.400, align 4
+; CHECK-EMPTY:
 ; CHECK-NEXT:      Run-time memory checks:
 ; CHECK-NEXT:      Check 0:
-; CHECK-NEXT:        Comparing group ([[GRP5:0x[0-9a-f]+]]):
+; CHECK-NEXT:        Comparing group ([[GRP4:0x[0-9a-f]+]]):
 ; CHECK-NEXT:          %gep.A.400 = getelementptr inbounds i32, ptr %A.off, i64 %iv
-; CHECK-NEXT:        Against group ([[GRP6:0x[0-9a-f]+]]):
+; CHECK-NEXT:        Against group ([[GRP5:0x[0-9a-f]+]]):
+; CHECK-NEXT:          %gep.B.1 = getelementptr inbounds i8, ptr %B, i64 %iv.next
 ; CHECK-NEXT:      ...
[truncated]

@fhahn fhahn force-pushed the laa-unknown-dep-rt-checks branch from 5405539 to 8e848c7 Compare May 7, 2024 13:17
Copy link

github-actions bot commented May 7, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Update groupChecks to always use DepCands to try and merge runtime
checks. DepCands contains the dependency partition, grouping together
all accessed pointers to he same underlying objects.

If we computed the dependencies, We only need to check accesses to the
same underlying object, if there is an unknown dependency for this
underlying object; otherwise we already proved that all accesses withing
the underlying object are safe w.r.t. vectorization and we only need
to check that accesses to the underlying object don't overlap with
accesses to other underlying objects.

To ensure runtime checks are generated for the case with unknown
dependencies, remove equivalence classes containing accesses involved in
unknown dependencies.

This reduces the number of runtime checks needed in case non-constant
dependence distances are found, and is in preparation for removing the
restriction that the accesses need to have the same stride which was
added in llvm#88039.
@fhahn fhahn force-pushed the laa-unknown-dep-rt-checks branch from 8e848c7 to 1059eb7 Compare May 7, 2024 13:28
@fhahn
Copy link
Contributor Author

fhahn commented May 7, 2024

The latest version contains an update to eraseClass to erase all members in an equivalence class.

@fhahn fhahn requested a review from efriedma-quic May 8, 2024 20:25
@fhahn
Copy link
Contributor Author

fhahn commented May 13, 2024

ping :)

@fhahn
Copy link
Contributor Author

fhahn commented May 22, 2024

ping :)

@fhahn
Copy link
Contributor Author

fhahn commented May 30, 2024

ping

Copy link
Collaborator

@preames preames left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - but please consider this a weak one. The structure of this code (both before and after your change) confuses me, and while your change looks reasonable, I'm not sure I'm not missing something.

@@ -1106,7 +1101,8 @@ bool AccessAnalysis::createCheckForAccess(RuntimePointerChecking &RtCheck,
// The id of the dependence set.
unsigned DepId;

if (isDependencyCheckNeeded()) {
if (isDependencyCheckNeeded() &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style only: I think you're basically adding a variant of isDependencyChecksNeeded which is specific for a particular access, and having a well named cover function would clarify the code.

return;
auto LeaderI = findValue(getLeaderValue(V));
for (auto MI = member_begin(LeaderI), ME = member_end(); MI != ME;) {
const auto &ToErase = *MI;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const auto &ToErase = *MI;
const ElemTy &ToErase = *MI;

Comment on lines +238 to +240
if (TheMapping.find(V) == TheMapping.end())
return;
auto LeaderI = findValue(getLeaderValue(V));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (TheMapping.find(V) == TheMapping.end())
return;
auto LeaderI = findValue(getLeaderValue(V));
if (TheMapping.find(V) == TheMapping.end())
return;
iterator LeaderI = findValue(getLeaderValue(V));

@@ -233,6 +233,18 @@ class EquivalenceClasses {
return findLeader(TheMapping.find(V));
}

/// Erase the class containing \p V.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment could be clearer what "erasing" a class means. Reading the code, it removes all its members from the set.

Comment on lines +1154 to +1158
DepCands.eraseClass({getPointerOperand(Src), Src->mayWriteToMemory()});
DepCands.eraseClass({getPointerOperand(Dst), Dst->mayWriteToMemory()});
}
} else
CheckDeps.clear();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[serious] Can we do without modifying state within a query function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a look!

At the moment, canCheckPtrAtRT both checks if runtime checks can be generated and then also generates them, populating various members. While modifying DepCands here should be fine, I'll see if it is possible to restructure the code to make the check independent of actually generating runtime checks (which is for what the DepCands modifications are needed).

Comment on lines +704 to +707
bool canCheckPtrAtRT(const MemoryDepChecker &DepChecker,
RuntimePointerChecking &RtCheck, ScalarEvolution *SE,
Loop *TheLoop,
const DenseMap<Value *, const SCEV *> &Strides,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you document the (additional) parameters?

RtCheck seems to be the [out] parameter, should it stay the first?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants