Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

split load to bytes to deduce load value #72364

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mohammed-nurulhoque
Copy link
Contributor

When the last clobbering write for a load doesn't cover the whole load (example below). AnalyzeLoadAvailability would give up.
This emits temporary byte-sized loads that cover the original load, then run AnalyzeLoadAvailability on each and reconstruct the load value.

  define i32 @f(i8* %P)
    %b2 = getelementptr inbounds i8, i8* %P, i64 1
    store i8 0, i8* %b2, align 1
    store i8 0, i8* %P, align 1
    %s1 = bitcast i8* %P to i16*
    %L = load i16, i16* %s1, align 4

@llvmbot llvmbot added clang Clang issues not falling into any other category llvm:analysis llvm:transforms labels Nov 15, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Nov 15, 2023

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-clang

Author: None (mohammed-nurulhoque)

Changes

When the last clobbering write for a load doesn't cover the whole load (example below). AnalyzeLoadAvailability would give up.
This emits temporary byte-sized loads that cover the original load, then run AnalyzeLoadAvailability on each and reconstruct the load value.

  define i32 @<!-- -->f(i8* %P)
    %b2 = getelementptr inbounds i8, i8* %P, i64 1
    store i8 0, i8* %b2, align 1
    store i8 0, i8* %P, align 1
    %s1 = bitcast i8* %P to i16*
    %L = load i16, i16* %s1, align 4

Patch is 28.89 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/72364.diff

11 Files Affected:

  • (modified) clang/test/Frontend/optimization-remark-extra-analysis.c (+2-1)
  • (modified) llvm/include/llvm/Transforms/Scalar/GVN.h (+1)
  • (modified) llvm/lib/Transforms/Scalar/GVN.cpp (+104-1)
  • (modified) llvm/test/Analysis/BasicAA/2003-02-26-AccessSizeTest.ll (+6-6)
  • (modified) llvm/test/Analysis/TypeBasedAliasAnalysis/precedence.ll (+2-3)
  • (modified) llvm/test/Transforms/GVN/2009-11-12-MemDepMallocBitCast.ll (+3-3)
  • (modified) llvm/test/Transforms/GVN/no_speculative_loads_with_asan.ll (+7-13)
  • (modified) llvm/test/Transforms/GVN/opt-remarks-multiple-users.ll (+3)
  • (modified) llvm/test/Transforms/GVN/opt-remarks-non-dominating.ll (+3)
  • (added) llvm/test/Transforms/GVN/split-load-assertion.ll (+166)
  • (modified) llvm/test/Transforms/NewGVN/no_speculative_loads_with_asan.ll (+51-45)
diff --git a/clang/test/Frontend/optimization-remark-extra-analysis.c b/clang/test/Frontend/optimization-remark-extra-analysis.c
index 1a8415e69cff9ba..5af30bd0d3c35b3 100644
--- a/clang/test/Frontend/optimization-remark-extra-analysis.c
+++ b/clang/test/Frontend/optimization-remark-extra-analysis.c
@@ -6,6 +6,7 @@
 int foo(int *x, int *y) {
   int a = *x;
   *y = 2;
-  // expected-remark@+1 {{load of type i32 not eliminated}}
+  // expected-remark@+2 {{load of type i32 not eliminated}}
+  // expected-remark@+1 {{load of type i8 not eliminated}}
   return a + *x;
 }
diff --git a/llvm/include/llvm/Transforms/Scalar/GVN.h b/llvm/include/llvm/Transforms/Scalar/GVN.h
index 4ba9b74ccb005d4..4052c0d618d173d 100644
--- a/llvm/include/llvm/Transforms/Scalar/GVN.h
+++ b/llvm/include/llvm/Transforms/Scalar/GVN.h
@@ -314,6 +314,7 @@ class GVNPass : public PassInfoMixin<GVNPass> {
 
   // Helper functions of redundant load elimination
   bool processLoad(LoadInst *L);
+  bool splitAndprocessLoad(LoadInst *L);
   bool processNonLocalLoad(LoadInst *L);
   bool processAssumeIntrinsic(AssumeInst *II);
 
diff --git a/llvm/lib/Transforms/Scalar/GVN.cpp b/llvm/lib/Transforms/Scalar/GVN.cpp
index 5e58af0edc1556f..f64508a8319197f 100644
--- a/llvm/lib/Transforms/Scalar/GVN.cpp
+++ b/llvm/lib/Transforms/Scalar/GVN.cpp
@@ -112,6 +112,7 @@ static cl::opt<bool>
 GVNEnableSplitBackedgeInLoadPRE("enable-split-backedge-in-load-pre",
                                 cl::init(false));
 static cl::opt<bool> GVNEnableMemDep("enable-gvn-memdep", cl::init(true));
+static cl::opt<bool> GVNEnableSplitLoad("enable-split-load", cl::init(true));
 
 static cl::opt<uint32_t> MaxNumDeps(
     "gvn-max-num-deps", cl::Hidden, cl::init(100),
@@ -2128,6 +2129,105 @@ static void patchAndReplaceAllUsesWith(Instruction *I, Value *Repl) {
   I->replaceAllUsesWith(Repl);
 }
 
+// split load to single byte loads and check if the value can be deduced
+//
+// Example:
+// define i32 @f(i8* %P)
+// 1:  %b2 = getelementptr inbounds i8, i8* %P, i64 1
+// 2:  store i8 0, i8* %b2, align 1
+// 3:  store i8 0, i8* %P, align 1
+// 4:  %s1 = bitcast i8* %P to i16*
+// 5:  %L = load i16, i16* %s1, align 4
+//
+// The last clobbering write for the load is (3) but it doesn't cover the whole
+// read. So AnalyzeLoadAvailability would give up.
+// This function emit temporary byte-sized loads that cover the original load,
+// so that any last write covers the read. We run AnalyzeLoadAvailability on
+// each byte to try to construct the load as a constant.
+bool GVNPass::splitAndprocessLoad(LoadInst *L) {
+  if (L->isAtomic())
+    return false;
+
+  Type *LTy = L->getType();
+  if (!LTy->isIntegerTy())
+    return false;
+
+  unsigned BW = LTy->getIntegerBitWidth();
+  if (BW % 8)
+    return false;
+
+  IntegerType *ByteTy = IntegerType::getInt8Ty(LTy->getContext());
+  Type *BytePtrTy = PointerType::get(ByteTy, L->getPointerAddressSpace());
+  BitCastInst *Base = new BitCastInst(L->getPointerOperand(), BytePtrTy, "", L);
+
+  // we update this byte by byte as we deduce the load value
+  APInt ConstructedValue = APInt::getZero(BW);
+
+  for (unsigned i=0; i<BW/8; i++) {
+    Value *Offset = Constant::getIntegerValue(LTy, APInt(BW, i));
+    GetElementPtrInst *GEP = GetElementPtrInst::Create(
+                               ByteTy, Base, ArrayRef<Value*>(Offset), "", L);
+    LoadInst *ByteLoad = new LoadInst(ByteTy, GEP, "", L);
+    ByteLoad->setDebugLoc(L->getDebugLoc());
+
+    auto CleanupTmps = [&]() {
+      MD->removeInstruction(ByteLoad);
+      ByteLoad->eraseFromParent();
+      GEP->eraseFromParent();
+    };
+
+    MemDepResult ByteDep = MD->getDependency(ByteLoad);
+    // AnalyzeLoadAvailability only analyzes local deps.
+    // If dep is not def or clobber, we cannot get a value from it.
+    if (ByteDep.isNonLocal() || (!ByteDep.isDef() && !ByteDep.isClobber())) {
+      CleanupTmps();
+      Base->eraseFromParent();
+      return false;
+    }
+
+    auto ByteRes = AnalyzeLoadAvailability(ByteLoad, ByteDep, GEP);
+    // If it doesn't reduce to a constant, give up. Creating the value at runtime by
+    // shifting the bytes is not worth it to remove a load.
+    if (ByteRes &&
+          (ByteRes->isMemIntrinValue() ||
+           (ByteRes->isSimpleValue() &&
+            (isa<ConstantInt>(ByteRes->getSimpleValue()) ||
+             isa<UndefValue>(ByteRes->getSimpleValue()))))) {
+        Value *V = ByteRes->MaterializeAdjustedValue(ByteLoad, ByteLoad, *this);
+        ConstantInt *ByteConst = dyn_cast<ConstantInt>(V);
+        if (!ByteConst) {
+          // replace undef with 0. This helps optimize cases where some bits of
+          // load are constant integer, and some are undef. So we can optimize
+          // the load to a particular integer.
+          ByteConst = ConstantInt::get(ByteTy, 0);
+        }
+        ConstructedValue.insertBits(ByteConst->getValue(), 8*i);
+    } else {
+      LLVM_DEBUG(dbgs() << "GVN split load: byte " << i
+                        << " did not reduce to a constant. Giving up");
+      CleanupTmps();
+      Base->eraseFromParent();
+      return false;
+    }
+    CleanupTmps();
+  }
+  Base->eraseFromParent();
+
+  // Replace the load!
+  Constant *LoadValue = ConstantInt::get(LTy, ConstructedValue);
+  patchAndReplaceAllUsesWith(L, LoadValue);
+  markInstructionForDeletion(L);
+  if (MSSAU)
+    MSSAU->removeMemoryAccess(L);
+  ++NumGVNLoad;
+  reportLoadElim(L, LoadValue, ORE);
+  // Tell MDA to reexamine the reused pointer since we might have more
+  // information after forwarding it.
+  if (MD && LoadValue->getType()->isPtrOrPtrVectorTy())
+    MD->invalidateCachedPointerInfo(LoadValue);
+  return true;
+}
+
 /// Attempt to eliminate a load, first by eliminating it
 /// locally, and then attempting non-local elimination if that fails.
 bool GVNPass::processLoad(LoadInst *L) {
@@ -2161,8 +2261,11 @@ bool GVNPass::processLoad(LoadInst *L) {
   }
 
   auto AV = AnalyzeLoadAvailability(L, Dep, L->getPointerOperand());
-  if (!AV)
+  if (!AV) {
+    if (GVNEnableSplitLoad)
+      return splitAndprocessLoad(L);
     return false;
+  }
 
   Value *AvailableValue = AV->MaterializeAdjustedValue(L, L, *this);
 
diff --git a/llvm/test/Analysis/BasicAA/2003-02-26-AccessSizeTest.ll b/llvm/test/Analysis/BasicAA/2003-02-26-AccessSizeTest.ll
index ed97d2192b8e4b8..a0ce0fdb6e756a3 100644
--- a/llvm/test/Analysis/BasicAA/2003-02-26-AccessSizeTest.ll
+++ b/llvm/test/Analysis/BasicAA/2003-02-26-AccessSizeTest.ll
@@ -1,19 +1,19 @@
 ; This testcase makes sure that size is taken to account when alias analysis 
-; is performed.  It is not legal to delete the second load instruction because
-; the value computed by the first load instruction is changed by the store.
+; is performed. The stores rights to parts of the second load
 
 ; RUN: opt < %s -aa-pipeline=basic-aa -passes=gvn,instcombine -S | FileCheck %s
 
 define i32 @test() {
-; CHECK: %Y.DONOTREMOVE = load i32, ptr %A
-; CHECK: %Z = sub i32 0, %Y.DONOTREMOVE
+; CHECK:      @test
+; CHECK-NEXT: ret i32 -256
+
   %A = alloca i32
   store i32 0, ptr %A
   %X = load i32, ptr %A
   %C = getelementptr i8, ptr %A, i64 1
   store i8 1, ptr %C    ; Aliases %A
-  %Y.DONOTREMOVE = load i32, ptr %A
-  %Z = sub i32 %X, %Y.DONOTREMOVE
+  %D = load i32, ptr %A
+  %Z = sub i32 %X, %D
   ret i32 %Z
 }
 
diff --git a/llvm/test/Analysis/TypeBasedAliasAnalysis/precedence.ll b/llvm/test/Analysis/TypeBasedAliasAnalysis/precedence.ll
index 175bef60a4c167f..29bc1602be3b68b 100644
--- a/llvm/test/Analysis/TypeBasedAliasAnalysis/precedence.ll
+++ b/llvm/test/Analysis/TypeBasedAliasAnalysis/precedence.ll
@@ -21,13 +21,12 @@ entry:
   ret i32 %tmp3
 }
 
-; Test for PartialAlias aliasing. GVN doesn't yet eliminate the load
-; in the BasicAA case.
+; Test for PartialAlias aliasing.
 
 ; TBAA:    @offset
 ; TBAA:      ret i64 0
 ; BASICAA: @offset
-; BASICAA:   ret i64 %tmp3
+; BASICAA:   ret i64 256
 define i64 @offset(ptr %x) nounwind {
 entry:
   store i64 0, ptr %x, !tbaa !4
diff --git a/llvm/test/Transforms/GVN/2009-11-12-MemDepMallocBitCast.ll b/llvm/test/Transforms/GVN/2009-11-12-MemDepMallocBitCast.ll
index f2d4fa9d378448b..d51c86e8be54bc0 100644
--- a/llvm/test/Transforms/GVN/2009-11-12-MemDepMallocBitCast.ll
+++ b/llvm/test/Transforms/GVN/2009-11-12-MemDepMallocBitCast.ll
@@ -1,5 +1,5 @@
-; Test to make sure malloc's bitcast does not block detection of a store 
-; to aliased memory; GVN should not optimize away the load in this program.
+; Test to make sure malloc's bitcast does not block detection of a store
+; to aliased memory.
 ; RUN: opt < %s -passes=gvn -S | FileCheck %s
 
 define i64 @test() {
@@ -7,7 +7,7 @@ define i64 @test() {
   store i8 42, ptr %1
   %Y = load i64, ptr %1                               ; <i64> [#uses=1]
   ret i64 %Y
-; CHECK: %Y = load i64, ptr %1
+; CHECK: store i8 42, ptr %1, align 1
 ; CHECK: ret i64 %Y
 }
 
diff --git a/llvm/test/Transforms/GVN/no_speculative_loads_with_asan.ll b/llvm/test/Transforms/GVN/no_speculative_loads_with_asan.ll
index 0ff237b83fb3c83..9c624d2649ca78c 100644
--- a/llvm/test/Transforms/GVN/no_speculative_loads_with_asan.ll
+++ b/llvm/test/Transforms/GVN/no_speculative_loads_with_asan.ll
@@ -12,11 +12,11 @@ define i32 @TestNoAsan() {
 ; CHECK-NEXT:    [[I1:%.*]] = getelementptr inbounds i8, ptr [[I]], i64 1
 ; CHECK-NEXT:    store i8 0, ptr [[I1]], align 1
 ; CHECK-NEXT:    store i8 0, ptr [[I]], align 1
-; CHECK-NEXT:    [[I3:%.*]] = load i16, ptr [[I]], align 4
-; CHECK-NEXT:    [[I4:%.*]] = icmp eq i16 [[I3]], 0
-; CHECK-NEXT:    br i1 [[I4]], label [[BB10:%.*]], label [[BB5:%.*]]
+; CHECK-NEXT:    br i1 true, label [[BB10:%.*]], label [[BB5:%.*]]
 ; CHECK:       bb5:
 ; CHECK-NEXT:    [[I6:%.*]] = getelementptr inbounds i8, ptr [[I]], i64 2
+; CHECK-NEXT:    [[I8:%.*]] = load i16, ptr [[I6]], align 2
+; CHECK-NEXT:    [[I9:%.*]] = sext i16 [[I8]] to i32
 ; CHECK-NEXT:    br label [[BB10]]
 ; CHECK:       bb10:
 ; CHECK-NEXT:    ret i32 0
@@ -48,17 +48,14 @@ define i32 @TestAsan() sanitize_address {
 ; CHECK-NEXT:    [[I1:%.*]] = getelementptr inbounds i8, ptr [[I]], i64 1
 ; CHECK-NEXT:    store i8 0, ptr [[I1]], align 1
 ; CHECK-NEXT:    store i8 0, ptr [[I]], align 1
-; CHECK-NEXT:    [[I3:%.*]] = load i16, ptr [[I]], align 4
-; CHECK-NEXT:    [[I4:%.*]] = icmp eq i16 [[I3]], 0
-; CHECK-NEXT:    br i1 [[I4]], label [[BB10:%.*]], label [[BB5:%.*]]
+; CHECK-NEXT:    br i1 true, label [[BB10:%.*]], label [[BB5:%.*]]
 ; CHECK:       bb5:
 ; CHECK-NEXT:    [[I6:%.*]] = getelementptr inbounds i8, ptr [[I]], i64 2
 ; CHECK-NEXT:    [[I8:%.*]] = load i16, ptr [[I6]], align 2
 ; CHECK-NEXT:    [[I9:%.*]] = sext i16 [[I8]] to i32
 ; CHECK-NEXT:    br label [[BB10]]
 ; CHECK:       bb10:
-; CHECK-NEXT:    [[I11:%.*]] = phi i32 [ [[I9]], [[BB5]] ], [ 0, [[BB:%.*]] ]
-; CHECK-NEXT:    ret i32 [[I11]]
+; CHECK-NEXT:    ret i32 0
 ;
 bb:
   %i = tail call noalias ptr @_Znam(i64 2)
@@ -87,17 +84,14 @@ define i32 @TestHWAsan() sanitize_hwaddress {
 ; CHECK-NEXT:    [[I1:%.*]] = getelementptr inbounds i8, ptr [[I]], i64 1
 ; CHECK-NEXT:    store i8 0, ptr [[I1]], align 1
 ; CHECK-NEXT:    store i8 0, ptr [[I]], align 1
-; CHECK-NEXT:    [[I3:%.*]] = load i16, ptr [[I]], align 4
-; CHECK-NEXT:    [[I4:%.*]] = icmp eq i16 [[I3]], 0
-; CHECK-NEXT:    br i1 [[I4]], label [[BB10:%.*]], label [[BB5:%.*]]
+; CHECK-NEXT:    br i1 true, label [[BB10:%.*]], label [[BB5:%.*]]
 ; CHECK:       bb5:
 ; CHECK-NEXT:    [[I6:%.*]] = getelementptr inbounds i8, ptr [[I]], i64 2
 ; CHECK-NEXT:    [[I8:%.*]] = load i16, ptr [[I6]], align 2
 ; CHECK-NEXT:    [[I9:%.*]] = sext i16 [[I8]] to i32
 ; CHECK-NEXT:    br label [[BB10]]
 ; CHECK:       bb10:
-; CHECK-NEXT:    [[I11:%.*]] = phi i32 [ [[I9]], [[BB5]] ], [ 0, [[BB:%.*]] ]
-; CHECK-NEXT:    ret i32 [[I11]]
+; CHECK-NEXT:    ret i32 0
 ;
 bb:
   %i = tail call noalias ptr @_Znam(i64 2)
diff --git a/llvm/test/Transforms/GVN/opt-remarks-multiple-users.ll b/llvm/test/Transforms/GVN/opt-remarks-multiple-users.ll
index b7df74e08865efd..b8dbc6d1c32775d 100644
--- a/llvm/test/Transforms/GVN/opt-remarks-multiple-users.ll
+++ b/llvm/test/Transforms/GVN/opt-remarks-multiple-users.ll
@@ -17,6 +17,7 @@
 ; CHECK-NEXT:     DebugLoc: { File: '/tmp/s.c', Line: 1, Column: 1 }
 ; CHECK-NEXT: ...
 ; CHECK:      --- !Missed
+; CHECK:      --- !Missed
 ; CHECK-NEXT: Pass:            gvn
 ; CHECK-NEXT: Name:            LoadClobbered
 ; CHECK-NEXT: DebugLoc: { File: '/tmp/s.c', Line: 4, Column: 4 }
@@ -32,6 +33,7 @@
 ; CHECK-NEXT:   - ClobberedBy:     call
 ; CHECK-NEXT:     DebugLoc: { File: '/tmp/s.c', Line: 3, Column: 3 }
 ; CHECK-NEXT: ...
+; CHECK:      --- !Missed
 
 ; ModuleID = 'bugpoint-reduced-simplified.bc'
 source_filename = "gvn-test.c"
@@ -69,6 +71,7 @@ entry:
 ; CHECK-NEXT:     DebugLoc: { File: '/tmp/s.c', Line: 1, Column: 1 }
 ; CHECK-NEXT: ...
 ; CHECK:      --- !Missed
+; CHECK:      --- !Missed
 ; CHECK-NEXT: Pass:            gvn
 ; CHECK-NEXT: Name:            LoadClobbered
 ; CHECK-NEXT: DebugLoc: { File: '/tmp/s.c', Line: 4, Column: 4 }
diff --git a/llvm/test/Transforms/GVN/opt-remarks-non-dominating.ll b/llvm/test/Transforms/GVN/opt-remarks-non-dominating.ll
index a5f3d7ecce2b8e7..d4a337c2d9cf37f 100644
--- a/llvm/test/Transforms/GVN/opt-remarks-non-dominating.ll
+++ b/llvm/test/Transforms/GVN/opt-remarks-non-dominating.ll
@@ -42,6 +42,7 @@ if.end:                                           ; preds = %if.then, %entry
 
 declare dso_local void @clobberingFunc() local_unnamed_addr #0
 
+; CHECK:      --- !Missed
 ; CHECK:      --- !Missed
 ; CHECK-NEXT: Pass:            gvn
 ; CHECK-NEXT: Name:            LoadClobbered
@@ -59,6 +60,7 @@ declare dso_local void @clobberingFunc() local_unnamed_addr #0
 ; CHECK-NEXT:     DebugLoc: { File: '/tmp/s.c', Line: 2, Column: 2 }
 ; CHECK-NEXT: ...
 ; CHECK:      --- !Missed
+; CHECK:      --- !Missed
 ; CHECK-NEXT: Pass:            gvn
 ; CHECK-NEXT: Name:            LoadClobbered
 ; CHECK-NEXT: DebugLoc:        { File: '/tmp/s.c', Line: 5, Column: 5 }
@@ -136,6 +138,7 @@ if.end5:                                          ; preds = %if.end5.sink.split,
   ret void
 }
 
+; CHECK:      --- !Missed
 ; CHECK:      --- !Missed
 ; CHECK-NEXT: Pass:            gvn
 ; CHECK-NEXT: Name:            LoadClobbered
diff --git a/llvm/test/Transforms/GVN/split-load-assertion.ll b/llvm/test/Transforms/GVN/split-load-assertion.ll
new file mode 100644
index 000000000000000..0d7cc632d9cb3a7
--- /dev/null
+++ b/llvm/test/Transforms/GVN/split-load-assertion.ll
@@ -0,0 +1,166 @@
+; RUN: opt -S -O3 < %s
+
+; This test had caused a GVN assertion error in an earlier version
+
+target datalayout = "e-m:e-p:32:32-i64:64-n32-S128"
+target triple = "riscv32-img-unknown-elf"
+
+%"class.std::function" = type { %"class.std::_Function_base", ptr }
+%"class.std::_Function_base" = type { %"union.std::_Any_data", ptr }
+%"union.std::_Any_data" = type { %"union.std::_Nocopy_types" }
+%"union.std::_Nocopy_types" = type { { i32, i32 } }
+
+; Function Attrs: mustprogress
+define dso_local noundef i32 @_Z3foov() local_unnamed_addr personality ptr @__gxx_personality_v0 {
+entry:
+  %f = alloca %"class.std::function", align 4
+  %g = alloca %"class.std::function", align 4
+  call void @llvm.lifetime.start.p0(i64 16, ptr nonnull %f) 
+  call void @_ZNSt8functionIFivEEC2IRS0_vEEOT_(ptr noundef nonnull align 4 dereferenceable(16) %f, ptr noundef nonnull @_Z2f1v) 
+  call void @llvm.lifetime.start.p0(i64 16, ptr nonnull %g) 
+  call void @_ZNSt8functionIFivEEC2IRS0_vEEOT_(ptr noundef nonnull align 4 dereferenceable(16) %g, ptr noundef nonnull @_Z2f2v) 
+  call void @_ZNSt8functionIFivEE4swapERS1_(ptr noundef nonnull align 4 dereferenceable(16) %f, ptr noundef nonnull align 4 dereferenceable(16) %g) 
+  %call = invoke noundef i32 @_ZNKSt8functionIFivEEclEv(ptr noundef nonnull align 4 dereferenceable(16) %g)
+          to label %invoke.cont unwind label %lpad
+
+invoke.cont:                                      ; preds = %entry
+  %cmp = icmp eq i32 %call, 42
+  br i1 %cmp, label %land.rhs, label %land.end
+
+land.rhs:                                         ; preds = %invoke.cont
+  %call2 = invoke noundef i32 @_ZNKSt8functionIFivEEclEv(ptr noundef nonnull align 4 dereferenceable(16) %f)
+          to label %invoke.cont1 unwind label %lpad
+
+invoke.cont1:                                     ; preds = %land.rhs
+  %cmp3 = icmp eq i32 %call2, 0
+  %phi.cast = zext i1 %cmp3 to i32
+  br label %land.end
+
+land.end:                                         ; preds = %invoke.cont1, %invoke.cont
+  %0 = phi i32 [ 0, %invoke.cont ], [ %phi.cast, %invoke.cont1 ]
+  call void @_ZNSt14_Function_baseD2Ev(ptr noundef nonnull align 4 dereferenceable(16) %g) 
+  call void @llvm.lifetime.end.p0(i64 16, ptr nonnull %g) 
+  call void @_ZNSt14_Function_baseD2Ev(ptr noundef nonnull align 4 dereferenceable(16) %f) 
+  call void @llvm.lifetime.end.p0(i64 16, ptr nonnull %f) 
+  ret i32 %0
+
+lpad:                                             ; preds = %land.rhs, %entry
+  %1 = landingpad { ptr, i32 }
+          cleanup
+  call void @_ZNSt14_Function_baseD2Ev(ptr noundef nonnull align 4 dereferenceable(16) %g) 
+  call void @llvm.lifetime.end.p0(i64 16, ptr nonnull %g) 
+  call void @_ZNSt14_Function_baseD2Ev(ptr noundef nonnull align 4 dereferenceable(16) %f) 
+  call void @llvm.lifetime.end.p0(i64 16, ptr nonnull %f) 
+  resume { ptr, i32 } %1
+}
+
+; Function Attrs: argmemonly mustprogress nocallback nofree nosync nounwind willreturn
+declare void @llvm.lifetime.start.p0(i64 immarg %0, ptr nocapture %1) 
+
+declare dso_local noundef i32 @_Z2f1v() 
+
+; Function Attrs: nounwind
+define linkonce_odr dso_local void @_ZNSt8functionIFivEEC2IRS0_vEEOT_(ptr noundef nonnull align 4 dereferenceable(16) %this, ptr noundef nonnull %__f) unnamed_addr align 2 {
+entry:
+  call void @llvm.memset.p0.i32(ptr noundef nonnull align 4 dereferenceable(12) %this, i8 0, i32 12, i1 false)
+  call void @_ZNSt14_Function_baseC2Ev(ptr noundef nonnull align 4 dereferenceable(12) %this) 
+  %_M_invoker = getelementptr inbounds %"class.std::function", ptr %this, i32 0, i32 1
+  store ptr null, ptr %_M_invoker, align 4
+  %call = call noundef zeroext i1 @_ZNSt14_Function_base13_Base_managerIPFivEE21_M_not_empty_functionIS1_EEbPT_(ptr noundef nonnull %__f) 
+  br i1 %call, label %if.then, label %if.end
+
+if.then:                                          ; preds = %entry
+  call void @_ZNSt14_Function_base13_Base_managerIPFivEE15_M_init_functorIRS1_EEvRSt9_Any_dataOT_(ptr noundef nonnull align 4 dereferenceable(8) %this, ptr noundef nonnull %__f) 
+  store ptr @_ZNSt17_Function_handlerIFivEPS0_E9_M_invokeERKSt9_Any_data, ptr %_M_invoker, align 4
+  %_M_manager = getelementptr inbounds %"class.std::_Function_base", ptr %this, i32 0, i32 1
+  store ptr @_ZNSt17_Function_handlerIFivEPS0_E10_M_managerERSt9_Any_dataRKS3_St18_Manager_operation, ptr %_M_manager, align 4
+  br label %if.end
+
+if.end:                                           ; preds = %if.then, %entry
+  ret void
+}
+
+declare dso_local noundef i32 @_Z2f2v() 
+
+; Function Attrs: mustprogress nounwind
+define linkonce_odr dso_local void @_ZNSt8functionIFivEE4swapERS1_(ptr noundef nonnull align 4 dereferenceable(16) %this, ptr noundef nonnull align 4 dereferenceable(16) %__x) local_unnamed_addr align 2 {
+entry:
+  call void @_ZSt4swapISt9_Any_dataENSt9enable_ifIXsr6__and_ISt6__not_ISt15__is_tuple_likeIT_EESt21is_move_constructibleIS4_ESt18is_move_assignableIS4_EEE5valueEvE4typeERS4_SD_(ptr noundef nonnull align 4 dereferenceable(8) %this, ptr noundef nonnull align 4 dereferenceable(8) %__x) 
+  %_M_manager = getelementptr inbounds %"class.std::_Function_base", ptr %this, i32 0, i32 1
+  %_M_manager3 = getelementptr inbounds %"class.std::_Function_base", ptr %__x, i32 0, i32 1
+  call void @_ZSt4swapIPFbRSt9_Any_dataRKS0_St18_Manager_operationEENSt9enable_ifIXsr6__and_ISt6__not_ISt15__is_tuple_likeIT_EESt21is_move_constructibleISA_ESt18is_move_assignableISA_EEE5valueEvE4typeERSA_SJ_(ptr noundef nonnull align 4 dereferenceable(4) %_M_manager, ptr noundef nonnull align 4 dereferenceable(4) %_M_manager3) 
+  %_M_invoker = getelementptr inbounds %"class.std::function", ptr %this, i32 0, i32 1
+  %_M_invoker4 = getelementptr inbounds %"class.std::function", ptr %__x, i32 0, i32 1
+  call void @_ZSt4swapIPFiRKSt9_Any_dataEENSt9enable_ifIXsr6__and_ISt6__not_ISt15__is_tuple_likeIT_EESt21is_move_constructibleI...
[truncated]

@llvmbot
Copy link
Collaborator

llvmbot commented Nov 15, 2023

@llvm/pr-subscribers-llvm-analysis

Author: None (mohammed-nurulhoque)

Changes

When the last clobbering write for a load doesn't cover the whole load (example below). AnalyzeLoadAvailability would give up.
This emits temporary byte-sized loads that cover the original load, then run AnalyzeLoadAvailability on each and reconstruct the load value.

  define i32 @<!-- -->f(i8* %P)
    %b2 = getelementptr inbounds i8, i8* %P, i64 1
    store i8 0, i8* %b2, align 1
    store i8 0, i8* %P, align 1
    %s1 = bitcast i8* %P to i16*
    %L = load i16, i16* %s1, align 4

Patch is 28.89 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/72364.diff

11 Files Affected:

  • (modified) clang/test/Frontend/optimization-remark-extra-analysis.c (+2-1)
  • (modified) llvm/include/llvm/Transforms/Scalar/GVN.h (+1)
  • (modified) llvm/lib/Transforms/Scalar/GVN.cpp (+104-1)
  • (modified) llvm/test/Analysis/BasicAA/2003-02-26-AccessSizeTest.ll (+6-6)
  • (modified) llvm/test/Analysis/TypeBasedAliasAnalysis/precedence.ll (+2-3)
  • (modified) llvm/test/Transforms/GVN/2009-11-12-MemDepMallocBitCast.ll (+3-3)
  • (modified) llvm/test/Transforms/GVN/no_speculative_loads_with_asan.ll (+7-13)
  • (modified) llvm/test/Transforms/GVN/opt-remarks-multiple-users.ll (+3)
  • (modified) llvm/test/Transforms/GVN/opt-remarks-non-dominating.ll (+3)
  • (added) llvm/test/Transforms/GVN/split-load-assertion.ll (+166)
  • (modified) llvm/test/Transforms/NewGVN/no_speculative_loads_with_asan.ll (+51-45)
diff --git a/clang/test/Frontend/optimization-remark-extra-analysis.c b/clang/test/Frontend/optimization-remark-extra-analysis.c
index 1a8415e69cff9ba..5af30bd0d3c35b3 100644
--- a/clang/test/Frontend/optimization-remark-extra-analysis.c
+++ b/clang/test/Frontend/optimization-remark-extra-analysis.c
@@ -6,6 +6,7 @@
 int foo(int *x, int *y) {
   int a = *x;
   *y = 2;
-  // expected-remark@+1 {{load of type i32 not eliminated}}
+  // expected-remark@+2 {{load of type i32 not eliminated}}
+  // expected-remark@+1 {{load of type i8 not eliminated}}
   return a + *x;
 }
diff --git a/llvm/include/llvm/Transforms/Scalar/GVN.h b/llvm/include/llvm/Transforms/Scalar/GVN.h
index 4ba9b74ccb005d4..4052c0d618d173d 100644
--- a/llvm/include/llvm/Transforms/Scalar/GVN.h
+++ b/llvm/include/llvm/Transforms/Scalar/GVN.h
@@ -314,6 +314,7 @@ class GVNPass : public PassInfoMixin<GVNPass> {
 
   // Helper functions of redundant load elimination
   bool processLoad(LoadInst *L);
+  bool splitAndprocessLoad(LoadInst *L);
   bool processNonLocalLoad(LoadInst *L);
   bool processAssumeIntrinsic(AssumeInst *II);
 
diff --git a/llvm/lib/Transforms/Scalar/GVN.cpp b/llvm/lib/Transforms/Scalar/GVN.cpp
index 5e58af0edc1556f..f64508a8319197f 100644
--- a/llvm/lib/Transforms/Scalar/GVN.cpp
+++ b/llvm/lib/Transforms/Scalar/GVN.cpp
@@ -112,6 +112,7 @@ static cl::opt<bool>
 GVNEnableSplitBackedgeInLoadPRE("enable-split-backedge-in-load-pre",
                                 cl::init(false));
 static cl::opt<bool> GVNEnableMemDep("enable-gvn-memdep", cl::init(true));
+static cl::opt<bool> GVNEnableSplitLoad("enable-split-load", cl::init(true));
 
 static cl::opt<uint32_t> MaxNumDeps(
     "gvn-max-num-deps", cl::Hidden, cl::init(100),
@@ -2128,6 +2129,105 @@ static void patchAndReplaceAllUsesWith(Instruction *I, Value *Repl) {
   I->replaceAllUsesWith(Repl);
 }
 
+// split load to single byte loads and check if the value can be deduced
+//
+// Example:
+// define i32 @f(i8* %P)
+// 1:  %b2 = getelementptr inbounds i8, i8* %P, i64 1
+// 2:  store i8 0, i8* %b2, align 1
+// 3:  store i8 0, i8* %P, align 1
+// 4:  %s1 = bitcast i8* %P to i16*
+// 5:  %L = load i16, i16* %s1, align 4
+//
+// The last clobbering write for the load is (3) but it doesn't cover the whole
+// read. So AnalyzeLoadAvailability would give up.
+// This function emit temporary byte-sized loads that cover the original load,
+// so that any last write covers the read. We run AnalyzeLoadAvailability on
+// each byte to try to construct the load as a constant.
+bool GVNPass::splitAndprocessLoad(LoadInst *L) {
+  if (L->isAtomic())
+    return false;
+
+  Type *LTy = L->getType();
+  if (!LTy->isIntegerTy())
+    return false;
+
+  unsigned BW = LTy->getIntegerBitWidth();
+  if (BW % 8)
+    return false;
+
+  IntegerType *ByteTy = IntegerType::getInt8Ty(LTy->getContext());
+  Type *BytePtrTy = PointerType::get(ByteTy, L->getPointerAddressSpace());
+  BitCastInst *Base = new BitCastInst(L->getPointerOperand(), BytePtrTy, "", L);
+
+  // we update this byte by byte as we deduce the load value
+  APInt ConstructedValue = APInt::getZero(BW);
+
+  for (unsigned i=0; i<BW/8; i++) {
+    Value *Offset = Constant::getIntegerValue(LTy, APInt(BW, i));
+    GetElementPtrInst *GEP = GetElementPtrInst::Create(
+                               ByteTy, Base, ArrayRef<Value*>(Offset), "", L);
+    LoadInst *ByteLoad = new LoadInst(ByteTy, GEP, "", L);
+    ByteLoad->setDebugLoc(L->getDebugLoc());
+
+    auto CleanupTmps = [&]() {
+      MD->removeInstruction(ByteLoad);
+      ByteLoad->eraseFromParent();
+      GEP->eraseFromParent();
+    };
+
+    MemDepResult ByteDep = MD->getDependency(ByteLoad);
+    // AnalyzeLoadAvailability only analyzes local deps.
+    // If dep is not def or clobber, we cannot get a value from it.
+    if (ByteDep.isNonLocal() || (!ByteDep.isDef() && !ByteDep.isClobber())) {
+      CleanupTmps();
+      Base->eraseFromParent();
+      return false;
+    }
+
+    auto ByteRes = AnalyzeLoadAvailability(ByteLoad, ByteDep, GEP);
+    // If it doesn't reduce to a constant, give up. Creating the value at runtime by
+    // shifting the bytes is not worth it to remove a load.
+    if (ByteRes &&
+          (ByteRes->isMemIntrinValue() ||
+           (ByteRes->isSimpleValue() &&
+            (isa<ConstantInt>(ByteRes->getSimpleValue()) ||
+             isa<UndefValue>(ByteRes->getSimpleValue()))))) {
+        Value *V = ByteRes->MaterializeAdjustedValue(ByteLoad, ByteLoad, *this);
+        ConstantInt *ByteConst = dyn_cast<ConstantInt>(V);
+        if (!ByteConst) {
+          // replace undef with 0. This helps optimize cases where some bits of
+          // load are constant integer, and some are undef. So we can optimize
+          // the load to a particular integer.
+          ByteConst = ConstantInt::get(ByteTy, 0);
+        }
+        ConstructedValue.insertBits(ByteConst->getValue(), 8*i);
+    } else {
+      LLVM_DEBUG(dbgs() << "GVN split load: byte " << i
+                        << " did not reduce to a constant. Giving up");
+      CleanupTmps();
+      Base->eraseFromParent();
+      return false;
+    }
+    CleanupTmps();
+  }
+  Base->eraseFromParent();
+
+  // Replace the load!
+  Constant *LoadValue = ConstantInt::get(LTy, ConstructedValue);
+  patchAndReplaceAllUsesWith(L, LoadValue);
+  markInstructionForDeletion(L);
+  if (MSSAU)
+    MSSAU->removeMemoryAccess(L);
+  ++NumGVNLoad;
+  reportLoadElim(L, LoadValue, ORE);
+  // Tell MDA to reexamine the reused pointer since we might have more
+  // information after forwarding it.
+  if (MD && LoadValue->getType()->isPtrOrPtrVectorTy())
+    MD->invalidateCachedPointerInfo(LoadValue);
+  return true;
+}
+
 /// Attempt to eliminate a load, first by eliminating it
 /// locally, and then attempting non-local elimination if that fails.
 bool GVNPass::processLoad(LoadInst *L) {
@@ -2161,8 +2261,11 @@ bool GVNPass::processLoad(LoadInst *L) {
   }
 
   auto AV = AnalyzeLoadAvailability(L, Dep, L->getPointerOperand());
-  if (!AV)
+  if (!AV) {
+    if (GVNEnableSplitLoad)
+      return splitAndprocessLoad(L);
     return false;
+  }
 
   Value *AvailableValue = AV->MaterializeAdjustedValue(L, L, *this);
 
diff --git a/llvm/test/Analysis/BasicAA/2003-02-26-AccessSizeTest.ll b/llvm/test/Analysis/BasicAA/2003-02-26-AccessSizeTest.ll
index ed97d2192b8e4b8..a0ce0fdb6e756a3 100644
--- a/llvm/test/Analysis/BasicAA/2003-02-26-AccessSizeTest.ll
+++ b/llvm/test/Analysis/BasicAA/2003-02-26-AccessSizeTest.ll
@@ -1,19 +1,19 @@
 ; This testcase makes sure that size is taken to account when alias analysis 
-; is performed.  It is not legal to delete the second load instruction because
-; the value computed by the first load instruction is changed by the store.
+; is performed. The stores rights to parts of the second load
 
 ; RUN: opt < %s -aa-pipeline=basic-aa -passes=gvn,instcombine -S | FileCheck %s
 
 define i32 @test() {
-; CHECK: %Y.DONOTREMOVE = load i32, ptr %A
-; CHECK: %Z = sub i32 0, %Y.DONOTREMOVE
+; CHECK:      @test
+; CHECK-NEXT: ret i32 -256
+
   %A = alloca i32
   store i32 0, ptr %A
   %X = load i32, ptr %A
   %C = getelementptr i8, ptr %A, i64 1
   store i8 1, ptr %C    ; Aliases %A
-  %Y.DONOTREMOVE = load i32, ptr %A
-  %Z = sub i32 %X, %Y.DONOTREMOVE
+  %D = load i32, ptr %A
+  %Z = sub i32 %X, %D
   ret i32 %Z
 }
 
diff --git a/llvm/test/Analysis/TypeBasedAliasAnalysis/precedence.ll b/llvm/test/Analysis/TypeBasedAliasAnalysis/precedence.ll
index 175bef60a4c167f..29bc1602be3b68b 100644
--- a/llvm/test/Analysis/TypeBasedAliasAnalysis/precedence.ll
+++ b/llvm/test/Analysis/TypeBasedAliasAnalysis/precedence.ll
@@ -21,13 +21,12 @@ entry:
   ret i32 %tmp3
 }
 
-; Test for PartialAlias aliasing. GVN doesn't yet eliminate the load
-; in the BasicAA case.
+; Test for PartialAlias aliasing.
 
 ; TBAA:    @offset
 ; TBAA:      ret i64 0
 ; BASICAA: @offset
-; BASICAA:   ret i64 %tmp3
+; BASICAA:   ret i64 256
 define i64 @offset(ptr %x) nounwind {
 entry:
   store i64 0, ptr %x, !tbaa !4
diff --git a/llvm/test/Transforms/GVN/2009-11-12-MemDepMallocBitCast.ll b/llvm/test/Transforms/GVN/2009-11-12-MemDepMallocBitCast.ll
index f2d4fa9d378448b..d51c86e8be54bc0 100644
--- a/llvm/test/Transforms/GVN/2009-11-12-MemDepMallocBitCast.ll
+++ b/llvm/test/Transforms/GVN/2009-11-12-MemDepMallocBitCast.ll
@@ -1,5 +1,5 @@
-; Test to make sure malloc's bitcast does not block detection of a store 
-; to aliased memory; GVN should not optimize away the load in this program.
+; Test to make sure malloc's bitcast does not block detection of a store
+; to aliased memory.
 ; RUN: opt < %s -passes=gvn -S | FileCheck %s
 
 define i64 @test() {
@@ -7,7 +7,7 @@ define i64 @test() {
   store i8 42, ptr %1
   %Y = load i64, ptr %1                               ; <i64> [#uses=1]
   ret i64 %Y
-; CHECK: %Y = load i64, ptr %1
+; CHECK: store i8 42, ptr %1, align 1
 ; CHECK: ret i64 %Y
 }
 
diff --git a/llvm/test/Transforms/GVN/no_speculative_loads_with_asan.ll b/llvm/test/Transforms/GVN/no_speculative_loads_with_asan.ll
index 0ff237b83fb3c83..9c624d2649ca78c 100644
--- a/llvm/test/Transforms/GVN/no_speculative_loads_with_asan.ll
+++ b/llvm/test/Transforms/GVN/no_speculative_loads_with_asan.ll
@@ -12,11 +12,11 @@ define i32 @TestNoAsan() {
 ; CHECK-NEXT:    [[I1:%.*]] = getelementptr inbounds i8, ptr [[I]], i64 1
 ; CHECK-NEXT:    store i8 0, ptr [[I1]], align 1
 ; CHECK-NEXT:    store i8 0, ptr [[I]], align 1
-; CHECK-NEXT:    [[I3:%.*]] = load i16, ptr [[I]], align 4
-; CHECK-NEXT:    [[I4:%.*]] = icmp eq i16 [[I3]], 0
-; CHECK-NEXT:    br i1 [[I4]], label [[BB10:%.*]], label [[BB5:%.*]]
+; CHECK-NEXT:    br i1 true, label [[BB10:%.*]], label [[BB5:%.*]]
 ; CHECK:       bb5:
 ; CHECK-NEXT:    [[I6:%.*]] = getelementptr inbounds i8, ptr [[I]], i64 2
+; CHECK-NEXT:    [[I8:%.*]] = load i16, ptr [[I6]], align 2
+; CHECK-NEXT:    [[I9:%.*]] = sext i16 [[I8]] to i32
 ; CHECK-NEXT:    br label [[BB10]]
 ; CHECK:       bb10:
 ; CHECK-NEXT:    ret i32 0
@@ -48,17 +48,14 @@ define i32 @TestAsan() sanitize_address {
 ; CHECK-NEXT:    [[I1:%.*]] = getelementptr inbounds i8, ptr [[I]], i64 1
 ; CHECK-NEXT:    store i8 0, ptr [[I1]], align 1
 ; CHECK-NEXT:    store i8 0, ptr [[I]], align 1
-; CHECK-NEXT:    [[I3:%.*]] = load i16, ptr [[I]], align 4
-; CHECK-NEXT:    [[I4:%.*]] = icmp eq i16 [[I3]], 0
-; CHECK-NEXT:    br i1 [[I4]], label [[BB10:%.*]], label [[BB5:%.*]]
+; CHECK-NEXT:    br i1 true, label [[BB10:%.*]], label [[BB5:%.*]]
 ; CHECK:       bb5:
 ; CHECK-NEXT:    [[I6:%.*]] = getelementptr inbounds i8, ptr [[I]], i64 2
 ; CHECK-NEXT:    [[I8:%.*]] = load i16, ptr [[I6]], align 2
 ; CHECK-NEXT:    [[I9:%.*]] = sext i16 [[I8]] to i32
 ; CHECK-NEXT:    br label [[BB10]]
 ; CHECK:       bb10:
-; CHECK-NEXT:    [[I11:%.*]] = phi i32 [ [[I9]], [[BB5]] ], [ 0, [[BB:%.*]] ]
-; CHECK-NEXT:    ret i32 [[I11]]
+; CHECK-NEXT:    ret i32 0
 ;
 bb:
   %i = tail call noalias ptr @_Znam(i64 2)
@@ -87,17 +84,14 @@ define i32 @TestHWAsan() sanitize_hwaddress {
 ; CHECK-NEXT:    [[I1:%.*]] = getelementptr inbounds i8, ptr [[I]], i64 1
 ; CHECK-NEXT:    store i8 0, ptr [[I1]], align 1
 ; CHECK-NEXT:    store i8 0, ptr [[I]], align 1
-; CHECK-NEXT:    [[I3:%.*]] = load i16, ptr [[I]], align 4
-; CHECK-NEXT:    [[I4:%.*]] = icmp eq i16 [[I3]], 0
-; CHECK-NEXT:    br i1 [[I4]], label [[BB10:%.*]], label [[BB5:%.*]]
+; CHECK-NEXT:    br i1 true, label [[BB10:%.*]], label [[BB5:%.*]]
 ; CHECK:       bb5:
 ; CHECK-NEXT:    [[I6:%.*]] = getelementptr inbounds i8, ptr [[I]], i64 2
 ; CHECK-NEXT:    [[I8:%.*]] = load i16, ptr [[I6]], align 2
 ; CHECK-NEXT:    [[I9:%.*]] = sext i16 [[I8]] to i32
 ; CHECK-NEXT:    br label [[BB10]]
 ; CHECK:       bb10:
-; CHECK-NEXT:    [[I11:%.*]] = phi i32 [ [[I9]], [[BB5]] ], [ 0, [[BB:%.*]] ]
-; CHECK-NEXT:    ret i32 [[I11]]
+; CHECK-NEXT:    ret i32 0
 ;
 bb:
   %i = tail call noalias ptr @_Znam(i64 2)
diff --git a/llvm/test/Transforms/GVN/opt-remarks-multiple-users.ll b/llvm/test/Transforms/GVN/opt-remarks-multiple-users.ll
index b7df74e08865efd..b8dbc6d1c32775d 100644
--- a/llvm/test/Transforms/GVN/opt-remarks-multiple-users.ll
+++ b/llvm/test/Transforms/GVN/opt-remarks-multiple-users.ll
@@ -17,6 +17,7 @@
 ; CHECK-NEXT:     DebugLoc: { File: '/tmp/s.c', Line: 1, Column: 1 }
 ; CHECK-NEXT: ...
 ; CHECK:      --- !Missed
+; CHECK:      --- !Missed
 ; CHECK-NEXT: Pass:            gvn
 ; CHECK-NEXT: Name:            LoadClobbered
 ; CHECK-NEXT: DebugLoc: { File: '/tmp/s.c', Line: 4, Column: 4 }
@@ -32,6 +33,7 @@
 ; CHECK-NEXT:   - ClobberedBy:     call
 ; CHECK-NEXT:     DebugLoc: { File: '/tmp/s.c', Line: 3, Column: 3 }
 ; CHECK-NEXT: ...
+; CHECK:      --- !Missed
 
 ; ModuleID = 'bugpoint-reduced-simplified.bc'
 source_filename = "gvn-test.c"
@@ -69,6 +71,7 @@ entry:
 ; CHECK-NEXT:     DebugLoc: { File: '/tmp/s.c', Line: 1, Column: 1 }
 ; CHECK-NEXT: ...
 ; CHECK:      --- !Missed
+; CHECK:      --- !Missed
 ; CHECK-NEXT: Pass:            gvn
 ; CHECK-NEXT: Name:            LoadClobbered
 ; CHECK-NEXT: DebugLoc: { File: '/tmp/s.c', Line: 4, Column: 4 }
diff --git a/llvm/test/Transforms/GVN/opt-remarks-non-dominating.ll b/llvm/test/Transforms/GVN/opt-remarks-non-dominating.ll
index a5f3d7ecce2b8e7..d4a337c2d9cf37f 100644
--- a/llvm/test/Transforms/GVN/opt-remarks-non-dominating.ll
+++ b/llvm/test/Transforms/GVN/opt-remarks-non-dominating.ll
@@ -42,6 +42,7 @@ if.end:                                           ; preds = %if.then, %entry
 
 declare dso_local void @clobberingFunc() local_unnamed_addr #0
 
+; CHECK:      --- !Missed
 ; CHECK:      --- !Missed
 ; CHECK-NEXT: Pass:            gvn
 ; CHECK-NEXT: Name:            LoadClobbered
@@ -59,6 +60,7 @@ declare dso_local void @clobberingFunc() local_unnamed_addr #0
 ; CHECK-NEXT:     DebugLoc: { File: '/tmp/s.c', Line: 2, Column: 2 }
 ; CHECK-NEXT: ...
 ; CHECK:      --- !Missed
+; CHECK:      --- !Missed
 ; CHECK-NEXT: Pass:            gvn
 ; CHECK-NEXT: Name:            LoadClobbered
 ; CHECK-NEXT: DebugLoc:        { File: '/tmp/s.c', Line: 5, Column: 5 }
@@ -136,6 +138,7 @@ if.end5:                                          ; preds = %if.end5.sink.split,
   ret void
 }
 
+; CHECK:      --- !Missed
 ; CHECK:      --- !Missed
 ; CHECK-NEXT: Pass:            gvn
 ; CHECK-NEXT: Name:            LoadClobbered
diff --git a/llvm/test/Transforms/GVN/split-load-assertion.ll b/llvm/test/Transforms/GVN/split-load-assertion.ll
new file mode 100644
index 000000000000000..0d7cc632d9cb3a7
--- /dev/null
+++ b/llvm/test/Transforms/GVN/split-load-assertion.ll
@@ -0,0 +1,166 @@
+; RUN: opt -S -O3 < %s
+
+; This test had caused a GVN assertion error in an earlier version
+
+target datalayout = "e-m:e-p:32:32-i64:64-n32-S128"
+target triple = "riscv32-img-unknown-elf"
+
+%"class.std::function" = type { %"class.std::_Function_base", ptr }
+%"class.std::_Function_base" = type { %"union.std::_Any_data", ptr }
+%"union.std::_Any_data" = type { %"union.std::_Nocopy_types" }
+%"union.std::_Nocopy_types" = type { { i32, i32 } }
+
+; Function Attrs: mustprogress
+define dso_local noundef i32 @_Z3foov() local_unnamed_addr personality ptr @__gxx_personality_v0 {
+entry:
+  %f = alloca %"class.std::function", align 4
+  %g = alloca %"class.std::function", align 4
+  call void @llvm.lifetime.start.p0(i64 16, ptr nonnull %f) 
+  call void @_ZNSt8functionIFivEEC2IRS0_vEEOT_(ptr noundef nonnull align 4 dereferenceable(16) %f, ptr noundef nonnull @_Z2f1v) 
+  call void @llvm.lifetime.start.p0(i64 16, ptr nonnull %g) 
+  call void @_ZNSt8functionIFivEEC2IRS0_vEEOT_(ptr noundef nonnull align 4 dereferenceable(16) %g, ptr noundef nonnull @_Z2f2v) 
+  call void @_ZNSt8functionIFivEE4swapERS1_(ptr noundef nonnull align 4 dereferenceable(16) %f, ptr noundef nonnull align 4 dereferenceable(16) %g) 
+  %call = invoke noundef i32 @_ZNKSt8functionIFivEEclEv(ptr noundef nonnull align 4 dereferenceable(16) %g)
+          to label %invoke.cont unwind label %lpad
+
+invoke.cont:                                      ; preds = %entry
+  %cmp = icmp eq i32 %call, 42
+  br i1 %cmp, label %land.rhs, label %land.end
+
+land.rhs:                                         ; preds = %invoke.cont
+  %call2 = invoke noundef i32 @_ZNKSt8functionIFivEEclEv(ptr noundef nonnull align 4 dereferenceable(16) %f)
+          to label %invoke.cont1 unwind label %lpad
+
+invoke.cont1:                                     ; preds = %land.rhs
+  %cmp3 = icmp eq i32 %call2, 0
+  %phi.cast = zext i1 %cmp3 to i32
+  br label %land.end
+
+land.end:                                         ; preds = %invoke.cont1, %invoke.cont
+  %0 = phi i32 [ 0, %invoke.cont ], [ %phi.cast, %invoke.cont1 ]
+  call void @_ZNSt14_Function_baseD2Ev(ptr noundef nonnull align 4 dereferenceable(16) %g) 
+  call void @llvm.lifetime.end.p0(i64 16, ptr nonnull %g) 
+  call void @_ZNSt14_Function_baseD2Ev(ptr noundef nonnull align 4 dereferenceable(16) %f) 
+  call void @llvm.lifetime.end.p0(i64 16, ptr nonnull %f) 
+  ret i32 %0
+
+lpad:                                             ; preds = %land.rhs, %entry
+  %1 = landingpad { ptr, i32 }
+          cleanup
+  call void @_ZNSt14_Function_baseD2Ev(ptr noundef nonnull align 4 dereferenceable(16) %g) 
+  call void @llvm.lifetime.end.p0(i64 16, ptr nonnull %g) 
+  call void @_ZNSt14_Function_baseD2Ev(ptr noundef nonnull align 4 dereferenceable(16) %f) 
+  call void @llvm.lifetime.end.p0(i64 16, ptr nonnull %f) 
+  resume { ptr, i32 } %1
+}
+
+; Function Attrs: argmemonly mustprogress nocallback nofree nosync nounwind willreturn
+declare void @llvm.lifetime.start.p0(i64 immarg %0, ptr nocapture %1) 
+
+declare dso_local noundef i32 @_Z2f1v() 
+
+; Function Attrs: nounwind
+define linkonce_odr dso_local void @_ZNSt8functionIFivEEC2IRS0_vEEOT_(ptr noundef nonnull align 4 dereferenceable(16) %this, ptr noundef nonnull %__f) unnamed_addr align 2 {
+entry:
+  call void @llvm.memset.p0.i32(ptr noundef nonnull align 4 dereferenceable(12) %this, i8 0, i32 12, i1 false)
+  call void @_ZNSt14_Function_baseC2Ev(ptr noundef nonnull align 4 dereferenceable(12) %this) 
+  %_M_invoker = getelementptr inbounds %"class.std::function", ptr %this, i32 0, i32 1
+  store ptr null, ptr %_M_invoker, align 4
+  %call = call noundef zeroext i1 @_ZNSt14_Function_base13_Base_managerIPFivEE21_M_not_empty_functionIS1_EEbPT_(ptr noundef nonnull %__f) 
+  br i1 %call, label %if.then, label %if.end
+
+if.then:                                          ; preds = %entry
+  call void @_ZNSt14_Function_base13_Base_managerIPFivEE15_M_init_functorIRS1_EEvRSt9_Any_dataOT_(ptr noundef nonnull align 4 dereferenceable(8) %this, ptr noundef nonnull %__f) 
+  store ptr @_ZNSt17_Function_handlerIFivEPS0_E9_M_invokeERKSt9_Any_data, ptr %_M_invoker, align 4
+  %_M_manager = getelementptr inbounds %"class.std::_Function_base", ptr %this, i32 0, i32 1
+  store ptr @_ZNSt17_Function_handlerIFivEPS0_E10_M_managerERSt9_Any_dataRKS3_St18_Manager_operation, ptr %_M_manager, align 4
+  br label %if.end
+
+if.end:                                           ; preds = %if.then, %entry
+  ret void
+}
+
+declare dso_local noundef i32 @_Z2f2v() 
+
+; Function Attrs: mustprogress nounwind
+define linkonce_odr dso_local void @_ZNSt8functionIFivEE4swapERS1_(ptr noundef nonnull align 4 dereferenceable(16) %this, ptr noundef nonnull align 4 dereferenceable(16) %__x) local_unnamed_addr align 2 {
+entry:
+  call void @_ZSt4swapISt9_Any_dataENSt9enable_ifIXsr6__and_ISt6__not_ISt15__is_tuple_likeIT_EESt21is_move_constructibleIS4_ESt18is_move_assignableIS4_EEE5valueEvE4typeERS4_SD_(ptr noundef nonnull align 4 dereferenceable(8) %this, ptr noundef nonnull align 4 dereferenceable(8) %__x) 
+  %_M_manager = getelementptr inbounds %"class.std::_Function_base", ptr %this, i32 0, i32 1
+  %_M_manager3 = getelementptr inbounds %"class.std::_Function_base", ptr %__x, i32 0, i32 1
+  call void @_ZSt4swapIPFbRSt9_Any_dataRKS0_St18_Manager_operationEENSt9enable_ifIXsr6__and_ISt6__not_ISt15__is_tuple_likeIT_EESt21is_move_constructibleISA_ESt18is_move_assignableISA_EEE5valueEvE4typeERSA_SJ_(ptr noundef nonnull align 4 dereferenceable(4) %_M_manager, ptr noundef nonnull align 4 dereferenceable(4) %_M_manager3) 
+  %_M_invoker = getelementptr inbounds %"class.std::function", ptr %this, i32 0, i32 1
+  %_M_invoker4 = getelementptr inbounds %"class.std::function", ptr %__x, i32 0, i32 1
+  call void @_ZSt4swapIPFiRKSt9_Any_dataEENSt9enable_ifIXsr6__and_ISt6__not_ISt15__is_tuple_likeIT_EESt21is_move_constructibleI...
[truncated]

Copy link

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff dedf2c6bb5193652f6ad7d9ff9e676624c2485b7 41a9dbd39fbb866e7fc090f82d58ee6364a7cfda -- clang/test/Frontend/optimization-remark-extra-analysis.c llvm/include/llvm/Transforms/Scalar/GVN.h llvm/lib/Transforms/Scalar/GVN.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/Transforms/Scalar/GVN.cpp b/llvm/lib/Transforms/Scalar/GVN.cpp
index f64508a831..fc0b49bcd9 100644
--- a/llvm/lib/Transforms/Scalar/GVN.cpp
+++ b/llvm/lib/Transforms/Scalar/GVN.cpp
@@ -2163,10 +2163,10 @@ bool GVNPass::splitAndprocessLoad(LoadInst *L) {
   // we update this byte by byte as we deduce the load value
   APInt ConstructedValue = APInt::getZero(BW);
 
-  for (unsigned i=0; i<BW/8; i++) {
+  for (unsigned i = 0; i < BW / 8; i++) {
     Value *Offset = Constant::getIntegerValue(LTy, APInt(BW, i));
     GetElementPtrInst *GEP = GetElementPtrInst::Create(
-                               ByteTy, Base, ArrayRef<Value*>(Offset), "", L);
+        ByteTy, Base, ArrayRef<Value *>(Offset), "", L);
     LoadInst *ByteLoad = new LoadInst(ByteTy, GEP, "", L);
     ByteLoad->setDebugLoc(L->getDebugLoc());
 
@@ -2186,22 +2186,21 @@ bool GVNPass::splitAndprocessLoad(LoadInst *L) {
     }
 
     auto ByteRes = AnalyzeLoadAvailability(ByteLoad, ByteDep, GEP);
-    // If it doesn't reduce to a constant, give up. Creating the value at runtime by
-    // shifting the bytes is not worth it to remove a load.
-    if (ByteRes &&
-          (ByteRes->isMemIntrinValue() ||
-           (ByteRes->isSimpleValue() &&
-            (isa<ConstantInt>(ByteRes->getSimpleValue()) ||
-             isa<UndefValue>(ByteRes->getSimpleValue()))))) {
-        Value *V = ByteRes->MaterializeAdjustedValue(ByteLoad, ByteLoad, *this);
-        ConstantInt *ByteConst = dyn_cast<ConstantInt>(V);
-        if (!ByteConst) {
-          // replace undef with 0. This helps optimize cases where some bits of
-          // load are constant integer, and some are undef. So we can optimize
-          // the load to a particular integer.
-          ByteConst = ConstantInt::get(ByteTy, 0);
-        }
-        ConstructedValue.insertBits(ByteConst->getValue(), 8*i);
+    // If it doesn't reduce to a constant, give up. Creating the value at
+    // runtime by shifting the bytes is not worth it to remove a load.
+    if (ByteRes && (ByteRes->isMemIntrinValue() ||
+                    (ByteRes->isSimpleValue() &&
+                     (isa<ConstantInt>(ByteRes->getSimpleValue()) ||
+                      isa<UndefValue>(ByteRes->getSimpleValue()))))) {
+      Value *V = ByteRes->MaterializeAdjustedValue(ByteLoad, ByteLoad, *this);
+      ConstantInt *ByteConst = dyn_cast<ConstantInt>(V);
+      if (!ByteConst) {
+        // replace undef with 0. This helps optimize cases where some bits of
+        // load are constant integer, and some are undef. So we can optimize
+        // the load to a particular integer.
+        ByteConst = ConstantInt::get(ByteTy, 0);
+      }
+      ConstructedValue.insertBits(ByteConst->getValue(), 8 * i);
     } else {
       LLVM_DEBUG(dbgs() << "GVN split load: byte " << i
                         << " did not reduce to a constant. Giving up");

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the overall approach of assembling this from individual byte-sized loads here. What I would expect to see is more something along these lines: If we find a clobbering store that a) only clobbers some subset of the loaded bytes and b) stores a constant, then try to find the next clobbering store from there, and continue doing this until we have found all the bytes being loaded.

Comment on lines +2135 to +2140
// define i32 @f(i8* %P)
// 1: %b2 = getelementptr inbounds i8, i8* %P, i64 1
// 2: store i8 0, i8* %b2, align 1
// 3: store i8 0, i8* %P, align 1
// 4: %s1 = bitcast i8* %P to i16*
// 5: %L = load i16, i16* %s1, align 4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// define i32 @f(i8* %P)
// 1: %b2 = getelementptr inbounds i8, i8* %P, i64 1
// 2: store i8 0, i8* %b2, align 1
// 3: store i8 0, i8* %P, align 1
// 4: %s1 = bitcast i8* %P to i16*
// 5: %L = load i16, i16* %s1, align 4
// define i32 @f(ptr %P)
// 1: %b2 = getelementptr inbounds i8, ptr %P, i64 1
// 2: store i8 0, ptr %b2, align 1
// 3: store i8 0, ptr %P, align 1
// 5: %L = load i16, ptr %P, align 4


IntegerType *ByteTy = IntegerType::getInt8Ty(LTy->getContext());
Type *BytePtrTy = PointerType::get(ByteTy, L->getPointerAddressSpace());
BitCastInst *Base = new BitCastInst(L->getPointerOperand(), BytePtrTy, "", L);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary with opaque pointers.

@@ -0,0 +1,166 @@
; RUN: opt -S -O3 < %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GVN tests should use -passes=gvn only, not -O3.

@skachkov-sc
Copy link
Contributor

I think we should also check this optimization for non-byte stores, e.g.

define i32 @f(ptr %ptr)
    %ptr1 = getelementptr inbounds i8, ptr %ptr, i64 2
    store i16 0, ptr %ptr, align 2
    store i16 0, ptr %ptr1, align 2
    %L = load i32, ptr %ptr, align 4

GVN should optimize the case when 1-byte load location is fully covered with 2-byte store and extract the required byte from stored value, but it's good to check if reconstruction of 4-byte %L can be simplified after.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang Clang issues not falling into any other category llvm:analysis llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[GVN] missed load elimination when multiple stores contribute to a single value
4 participants