Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SROA] Unfold gep of index phi (round 2) #83494

Merged
merged 2 commits into from
Mar 4, 2024
Merged

Conversation

aeubanks
Copy link
Contributor

If a gep has only one phi as one of its operands and the remaining indexes are constant, we can unfold gep ptr, (phi idx1, idx2) to phi ((gep ptr, idx1), (gep ptr, idx2)).

Take care not to unfold recursive phis.

Followup to #80983.

This was initially was #83087. Initial PR did not handle allocas in entry block that weren't at the beginning of the function, causing GEPs to be inserted after the first chunk of allocas but potentially before an alloca not at the beginning. Insert GEPs at the end of the entry block instead since constants/arguments/static allocas can all be used there.

If a gep has only one phi as one of its operands and the remaining
indexes are constant, we can unfold `gep ptr, (phi idx1, idx2)` to `phi
((gep ptr, idx1), (gep ptr, idx2))`.

Take care not to unfold recursive phis.

Followup to llvm#80983.

This was initially was llvm#83087. Initial PR did not handle allocas in
entry block that weren't at the beginning of the function, causing GEPs
to be inserted after the first chunk of allocas but potentially before
an alloca not at the beginning. Insert GEPs at the end of the entry
block instead since constants/arguments/static allocas can all be used
there.
@llvmbot
Copy link
Collaborator

llvmbot commented Feb 29, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Arthur Eubanks (aeubanks)

Changes

If a gep has only one phi as one of its operands and the remaining indexes are constant, we can unfold gep ptr, (phi idx1, idx2) to phi ((gep ptr, idx1), (gep ptr, idx2)).

Take care not to unfold recursive phis.

Followup to #80983.

This was initially was #83087. Initial PR did not handle allocas in entry block that weren't at the beginning of the function, causing GEPs to be inserted after the first chunk of allocas but potentially before an alloca not at the beginning. Insert GEPs at the end of the entry block instead since constants/arguments/static allocas can all be used there.


Full diff: https://github.com/llvm/llvm-project/pull/83494.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/SROA.cpp (+74-41)
  • (modified) llvm/test/Transforms/SROA/phi-and-select.ll (+10-10)
  • (modified) llvm/test/Transforms/SROA/phi-gep.ll (+154-18)
diff --git a/llvm/lib/Transforms/Scalar/SROA.cpp b/llvm/lib/Transforms/Scalar/SROA.cpp
index fad70e8bf2861f..191e22db5e3d89 100644
--- a/llvm/lib/Transforms/Scalar/SROA.cpp
+++ b/llvm/lib/Transforms/Scalar/SROA.cpp
@@ -3956,11 +3956,11 @@ class AggLoadStoreRewriter : public InstVisitor<AggLoadStoreRewriter, bool> {
     return false;
   }
 
-  // Fold gep (select cond, ptr1, ptr2), idx
+  // Unfold gep (select cond, ptr1, ptr2), idx
   //   => select cond, gep(ptr1, idx), gep(ptr2, idx)
   // and  gep ptr, (select cond, idx1, idx2)
   //   => select cond, gep(ptr, idx1), gep(ptr, idx2)
-  bool foldGEPSelect(GetElementPtrInst &GEPI) {
+  bool unfoldGEPSelect(GetElementPtrInst &GEPI) {
     // Check whether the GEP has exactly one select operand and all indices
     // will become constant after the transform.
     SelectInst *Sel = dyn_cast<SelectInst>(GEPI.getPointerOperand());
@@ -4029,67 +4029,100 @@ class AggLoadStoreRewriter : public InstVisitor<AggLoadStoreRewriter, bool> {
     return true;
   }
 
-  // Fold gep (phi ptr1, ptr2) => phi gep(ptr1), gep(ptr2)
-  bool foldGEPPhi(GetElementPtrInst &GEPI) {
-    if (!GEPI.hasAllConstantIndices())
-      return false;
+  // Unfold gep (phi ptr1, ptr2), idx
+  //   => phi ((gep ptr1, idx), (gep ptr2, idx))
+  // and  gep ptr, (phi idx1, idx2)
+  //   => phi ((gep ptr, idx1), (gep ptr, idx2))
+  bool unfoldGEPPhi(GetElementPtrInst &GEPI) {
+    // To prevent infinitely expanding recursive phis, bail if the GEP pointer
+    // operand (looking through the phi if it is the phi we want to unfold) is
+    // an instruction besides an alloca.
+    PHINode *Phi = dyn_cast<PHINode>(GEPI.getPointerOperand());
+    auto IsInvalidPointerOperand = [](Value *V) {
+      return isa<Instruction>(V) && !isa<AllocaInst>(V);
+    };
+    if (Phi) {
+      if (any_of(Phi->operands(), IsInvalidPointerOperand))
+        return false;
+    } else {
+      if (IsInvalidPointerOperand(GEPI.getPointerOperand()))
+        return false;
+    }
+    // Check whether the GEP has exactly one phi operand (including the pointer
+    // operand) and all indices will become constant after the transform.
+    for (Value *Op : GEPI.indices()) {
+      if (auto *SI = dyn_cast<PHINode>(Op)) {
+        if (Phi)
+          return false;
 
-    PHINode *PHI = cast<PHINode>(GEPI.getPointerOperand());
-    if (GEPI.getParent() != PHI->getParent() ||
-        llvm::any_of(PHI->incoming_values(), [](Value *In) {
-          Instruction *I = dyn_cast<Instruction>(In);
-          return !I || isa<GetElementPtrInst>(I) || isa<PHINode>(I) ||
-                 succ_empty(I->getParent()) ||
-                 !I->getParent()->isLegalToHoistInto();
-        }))
+        Phi = SI;
+        if (!all_of(Phi->incoming_values(),
+                    [](Value *V) { return isa<ConstantInt>(V); }))
+          return false;
+        continue;
+      }
+
+      if (!isa<ConstantInt>(Op))
+        return false;
+    }
+
+    if (!Phi)
       return false;
 
     LLVM_DEBUG(dbgs() << "  Rewriting gep(phi) -> phi(gep):\n";
-               dbgs() << "    original: " << *PHI << "\n";
+               dbgs() << "    original: " << *Phi << "\n";
                dbgs() << "              " << GEPI << "\n";);
 
-    SmallVector<Value *, 4> Index(GEPI.indices());
-    bool IsInBounds = GEPI.isInBounds();
-    IRB.SetInsertPoint(GEPI.getParent(), GEPI.getParent()->getFirstNonPHIIt());
-    PHINode *NewPN = IRB.CreatePHI(GEPI.getType(), PHI->getNumIncomingValues(),
-                                   PHI->getName() + ".sroa.phi");
-    for (unsigned I = 0, E = PHI->getNumIncomingValues(); I != E; ++I) {
-      BasicBlock *B = PHI->getIncomingBlock(I);
-      Value *NewVal = nullptr;
-      int Idx = NewPN->getBasicBlockIndex(B);
-      if (Idx >= 0) {
-        NewVal = NewPN->getIncomingValue(Idx);
-      } else {
-        Instruction *In = cast<Instruction>(PHI->getIncomingValue(I));
+    auto GetNewOps = [&](Value *PhiOp) {
+      SmallVector<Value *> NewOps;
+      for (Value *Op : GEPI.operands())
+        if (Op == Phi)
+          NewOps.push_back(PhiOp);
+        else
+          NewOps.push_back(Op);
+      return NewOps;
+    };
 
-        IRB.SetInsertPoint(In->getParent(), std::next(In->getIterator()));
-        Type *Ty = GEPI.getSourceElementType();
-        NewVal = IRB.CreateGEP(Ty, In, Index, In->getName() + ".sroa.gep",
-                               IsInBounds);
-      }
-      NewPN->addIncoming(NewVal, B);
+    IRB.SetInsertPoint(Phi);
+    PHINode *NewPhi = IRB.CreatePHI(GEPI.getType(), Phi->getNumIncomingValues(),
+                                    Phi->getName() + ".sroa.phi");
+
+    bool IsInBounds = GEPI.isInBounds();
+    Type *SourceTy = GEPI.getSourceElementType();
+    // We only handle arguments, constants, and static allocas here, so we can
+    // insert GEPs at the end of the entry block.
+    IRB.SetInsertPoint(GEPI.getFunction()->getEntryBlock().getTerminator());
+    for (unsigned I = 0, E = Phi->getNumIncomingValues(); I != E; ++I) {
+      Value *Op = Phi->getIncomingValue(I);
+      BasicBlock *BB = Phi->getIncomingBlock(I);
+      SmallVector<Value *> NewOps = GetNewOps(Op);
+
+      Value *NewGEP =
+          IRB.CreateGEP(SourceTy, NewOps[0], ArrayRef(NewOps).drop_front(),
+                        Phi->getName() + ".sroa.gep", IsInBounds);
+      NewPhi->addIncoming(NewGEP, BB);
     }
 
     Visited.erase(&GEPI);
-    GEPI.replaceAllUsesWith(NewPN);
+    GEPI.replaceAllUsesWith(NewPhi);
     GEPI.eraseFromParent();
-    Visited.insert(NewPN);
-    enqueueUsers(*NewPN);
+    Visited.insert(NewPhi);
+    enqueueUsers(*NewPhi);
 
     LLVM_DEBUG(dbgs() << "          to: ";
                for (Value *In
-                    : NewPN->incoming_values()) dbgs()
+                    : NewPhi->incoming_values()) dbgs()
                << "\n              " << *In;
-               dbgs() << "\n              " << *NewPN << '\n');
+               dbgs() << "\n              " << *NewPhi << '\n');
 
     return true;
   }
 
   bool visitGetElementPtrInst(GetElementPtrInst &GEPI) {
-    if (foldGEPSelect(GEPI))
+    if (unfoldGEPSelect(GEPI))
       return true;
 
-    if (isa<PHINode>(GEPI.getPointerOperand()) && foldGEPPhi(GEPI))
+    if (unfoldGEPPhi(GEPI))
       return true;
 
     enqueueUsers(GEPI);
diff --git a/llvm/test/Transforms/SROA/phi-and-select.ll b/llvm/test/Transforms/SROA/phi-and-select.ll
index 54cfb10793a1ac..7c8b27c9de9c0b 100644
--- a/llvm/test/Transforms/SROA/phi-and-select.ll
+++ b/llvm/test/Transforms/SROA/phi-and-select.ll
@@ -114,13 +114,13 @@ define i32 @test3(i32 %x) {
 ; CHECK-LABEL: @test3(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    switch i32 [[X:%.*]], label [[BB0:%.*]] [
-; CHECK-NEXT:    i32 1, label [[BB1:%.*]]
-; CHECK-NEXT:    i32 2, label [[BB2:%.*]]
-; CHECK-NEXT:    i32 3, label [[BB3:%.*]]
-; CHECK-NEXT:    i32 4, label [[BB4:%.*]]
-; CHECK-NEXT:    i32 5, label [[BB5:%.*]]
-; CHECK-NEXT:    i32 6, label [[BB6:%.*]]
-; CHECK-NEXT:    i32 7, label [[BB7:%.*]]
+; CHECK-NEXT:      i32 1, label [[BB1:%.*]]
+; CHECK-NEXT:      i32 2, label [[BB2:%.*]]
+; CHECK-NEXT:      i32 3, label [[BB3:%.*]]
+; CHECK-NEXT:      i32 4, label [[BB4:%.*]]
+; CHECK-NEXT:      i32 5, label [[BB5:%.*]]
+; CHECK-NEXT:      i32 6, label [[BB6:%.*]]
+; CHECK-NEXT:      i32 7, label [[BB7:%.*]]
 ; CHECK-NEXT:    ]
 ; CHECK:       bb0:
 ; CHECK-NEXT:    br label [[EXIT:%.*]]
@@ -733,6 +733,7 @@ define void @PR20822(i1 %c1, i1 %c2, ptr %ptr) {
 ; CHECK-LABEL: @PR20822(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[F_SROA_0:%.*]] = alloca i32, align 4
+; CHECK-NEXT:    [[F1_SROA_GEP:%.*]] = getelementptr inbounds [[STRUCT_S:%.*]], ptr [[PTR:%.*]], i32 0, i32 0
 ; CHECK-NEXT:    br i1 [[C1:%.*]], label [[IF_END:%.*]], label [[FOR_COND:%.*]]
 ; CHECK:       for.cond:
 ; CHECK-NEXT:    br label [[IF_END]]
@@ -742,9 +743,8 @@ define void @PR20822(i1 %c1, i1 %c2, ptr %ptr) {
 ; CHECK:       if.then2:
 ; CHECK-NEXT:    br label [[IF_THEN5]]
 ; CHECK:       if.then5:
-; CHECK-NEXT:    [[F1:%.*]] = phi ptr [ [[PTR:%.*]], [[IF_THEN2]] ], [ [[F_SROA_0]], [[IF_END]] ]
-; CHECK-NEXT:    [[DOTFCA_0_GEP:%.*]] = getelementptr inbounds [[STRUCT_S:%.*]], ptr [[F1]], i32 0, i32 0
-; CHECK-NEXT:    store i32 0, ptr [[DOTFCA_0_GEP]], align 4
+; CHECK-NEXT:    [[F1_SROA_PHI:%.*]] = phi ptr [ [[F1_SROA_GEP]], [[IF_THEN2]] ], [ [[F_SROA_0]], [[IF_END]] ]
+; CHECK-NEXT:    store i32 0, ptr [[F1_SROA_PHI]], align 4
 ; CHECK-NEXT:    ret void
 ;
 entry:
diff --git a/llvm/test/Transforms/SROA/phi-gep.ll b/llvm/test/Transforms/SROA/phi-gep.ll
index c5aa1cdd9cf654..39a8fdc18ec761 100644
--- a/llvm/test/Transforms/SROA/phi-gep.ll
+++ b/llvm/test/Transforms/SROA/phi-gep.ll
@@ -65,15 +65,13 @@ end:
 define i32 @test_sroa_phi_gep_poison(i1 %cond) {
 ; CHECK-LABEL: @test_sroa_phi_gep_poison(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[A:%.*]] = alloca [[PAIR:%.*]], align 4
 ; CHECK-NEXT:    br i1 [[COND:%.*]], label [[IF_THEN:%.*]], label [[END:%.*]]
 ; CHECK:       if.then:
+; CHECK-NEXT:    [[PHI_SROA_PHI_SROA_SPECULATE_LOAD_IF_THEN:%.*]] = load i32, ptr poison, align 4
 ; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[PHI:%.*]] = phi ptr [ [[A]], [[ENTRY:%.*]] ], [ poison, [[IF_THEN]] ]
-; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds [[PAIR]], ptr [[PHI]], i32 0, i32 1
-; CHECK-NEXT:    [[LOAD:%.*]] = load i32, ptr [[GEP]], align 4
-; CHECK-NEXT:    ret i32 [[LOAD]]
+; CHECK-NEXT:    [[PHI_SROA_PHI_SROA_SPECULATED:%.*]] = phi i32 [ undef, [[ENTRY:%.*]] ], [ [[PHI_SROA_PHI_SROA_SPECULATE_LOAD_IF_THEN]], [[IF_THEN]] ]
+; CHECK-NEXT:    ret i32 [[PHI_SROA_PHI_SROA_SPECULATED]]
 ;
 entry:
   %a = alloca %pair, align 4
@@ -94,17 +92,13 @@ end:
 define i32 @test_sroa_phi_gep_global(i1 %cond) {
 ; CHECK-LABEL: @test_sroa_phi_gep_global(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[A:%.*]] = alloca [[PAIR:%.*]], align 4
-; CHECK-NEXT:    [[GEP_A:%.*]] = getelementptr inbounds [[PAIR]], ptr [[A]], i32 0, i32 1
-; CHECK-NEXT:    store i32 1, ptr [[GEP_A]], align 4
 ; CHECK-NEXT:    br i1 [[COND:%.*]], label [[IF_THEN:%.*]], label [[END:%.*]]
 ; CHECK:       if.then:
+; CHECK-NEXT:    [[PHI_SROA_PHI_SROA_SPECULATE_LOAD_IF_THEN:%.*]] = load i32, ptr getelementptr inbounds ([[PAIR:%.*]], ptr @g, i32 0, i32 1), align 4
 ; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[PHI:%.*]] = phi ptr [ [[A]], [[ENTRY:%.*]] ], [ @g, [[IF_THEN]] ]
-; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds [[PAIR]], ptr [[PHI]], i32 0, i32 1
-; CHECK-NEXT:    [[LOAD:%.*]] = load i32, ptr [[GEP]], align 4
-; CHECK-NEXT:    ret i32 [[LOAD]]
+; CHECK-NEXT:    [[PHI_SROA_PHI_SROA_SPECULATED:%.*]] = phi i32 [ 1, [[ENTRY:%.*]] ], [ [[PHI_SROA_PHI_SROA_SPECULATE_LOAD_IF_THEN]], [[IF_THEN]] ]
+; CHECK-NEXT:    ret i32 [[PHI_SROA_PHI_SROA_SPECULATED]]
 ;
 entry:
   %a = alloca %pair, align 4
@@ -245,7 +239,7 @@ define i32 @test_sroa_invoke_phi_gep(i1 %cond) personality ptr @__gxx_personalit
 ; CHECK-NEXT:    br i1 [[COND:%.*]], label [[CALL:%.*]], label [[END:%.*]]
 ; CHECK:       call:
 ; CHECK-NEXT:    [[B:%.*]] = invoke ptr @foo()
-; CHECK-NEXT:    to label [[END]] unwind label [[INVOKE_CATCH:%.*]]
+; CHECK-NEXT:            to label [[END]] unwind label [[INVOKE_CATCH:%.*]]
 ; CHECK:       end:
 ; CHECK-NEXT:    [[PHI:%.*]] = phi ptr [ [[A]], [[ENTRY:%.*]] ], [ [[B]], [[CALL]] ]
 ; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds [[PAIR]], ptr [[PHI]], i32 0, i32 1
@@ -253,7 +247,7 @@ define i32 @test_sroa_invoke_phi_gep(i1 %cond) personality ptr @__gxx_personalit
 ; CHECK-NEXT:    ret i32 [[LOAD]]
 ; CHECK:       invoke_catch:
 ; CHECK-NEXT:    [[RES:%.*]] = landingpad { ptr, i32 }
-; CHECK-NEXT:    catch ptr null
+; CHECK-NEXT:            catch ptr null
 ; CHECK-NEXT:    ret i32 0
 ;
 entry:
@@ -468,10 +462,10 @@ define i32 @test_sroa_phi_gep_multiple_values_from_same_block(i32 %arg) {
 ; CHECK-LABEL: @test_sroa_phi_gep_multiple_values_from_same_block(
 ; CHECK-NEXT:  bb.1:
 ; CHECK-NEXT:    switch i32 [[ARG:%.*]], label [[BB_3:%.*]] [
-; CHECK-NEXT:    i32 1, label [[BB_2:%.*]]
-; CHECK-NEXT:    i32 2, label [[BB_2]]
-; CHECK-NEXT:    i32 3, label [[BB_4:%.*]]
-; CHECK-NEXT:    i32 4, label [[BB_4]]
+; CHECK-NEXT:      i32 1, label [[BB_2:%.*]]
+; CHECK-NEXT:      i32 2, label [[BB_2]]
+; CHECK-NEXT:      i32 3, label [[BB_4:%.*]]
+; CHECK-NEXT:      i32 4, label [[BB_4]]
 ; CHECK-NEXT:    ]
 ; CHECK:       bb.2:
 ; CHECK-NEXT:    br label [[BB_4]]
@@ -504,6 +498,148 @@ bb.4:                                                ; preds = %bb.1, %bb.1, %bb
   ret i32 %load
 }
 
+define i64 @test_phi_idx_mem2reg_const(i1 %arg) {
+; CHECK-LABEL: @test_phi_idx_mem2reg_const(
+; CHECK-NEXT:  bb:
+; CHECK-NEXT:    br i1 [[ARG:%.*]], label [[BB1:%.*]], label [[BB2:%.*]]
+; CHECK:       bb1:
+; CHECK-NEXT:    br label [[END:%.*]]
+; CHECK:       bb2:
+; CHECK-NEXT:    br label [[END]]
+; CHECK:       end:
+; CHECK-NEXT:    [[PHI_SROA_PHI_SROA_SPECULATED:%.*]] = phi i64 [ 2, [[BB1]] ], [ 3, [[BB2]] ]
+; CHECK-NEXT:    [[PHI:%.*]] = phi i64 [ 0, [[BB1]] ], [ 1, [[BB2]] ]
+; CHECK-NEXT:    ret i64 [[PHI_SROA_PHI_SROA_SPECULATED]]
+;
+bb:
+  %alloca = alloca [2 x i64], align 8
+  %gep1 = getelementptr inbounds i64, ptr %alloca, i64 1
+  store i64 2, ptr %alloca
+  store i64 3, ptr %gep1
+  br i1 %arg, label %bb1, label %bb2
+
+bb1:
+  br label %end
+
+bb2:
+  br label %end
+
+end:
+  %phi = phi i64 [ 0, %bb1 ], [ 1, %bb2 ]
+  %getelementptr = getelementptr inbounds i64, ptr %alloca, i64 %phi
+  %load = load i64, ptr %getelementptr
+  ret i64 %load
+}
+
+define i64 @test_phi_idx_mem2reg_not_const(i1 %arg, i64 %idx) {
+; CHECK-LABEL: @test_phi_idx_mem2reg_not_const(
+; CHECK-NEXT:  bb:
+; CHECK-NEXT:    [[ALLOCA:%.*]] = alloca [2 x i64], align 8
+; CHECK-NEXT:    [[GEP1:%.*]] = getelementptr inbounds i64, ptr [[ALLOCA]], i64 1
+; CHECK-NEXT:    store i64 2, ptr [[ALLOCA]], align 4
+; CHECK-NEXT:    store i64 3, ptr [[GEP1]], align 4
+; CHECK-NEXT:    br i1 [[ARG:%.*]], label [[BB1:%.*]], label [[BB2:%.*]]
+; CHECK:       bb1:
+; CHECK-NEXT:    br label [[END:%.*]]
+; CHECK:       bb2:
+; CHECK-NEXT:    br label [[END]]
+; CHECK:       end:
+; CHECK-NEXT:    [[PHI:%.*]] = phi i64 [ 0, [[BB1]] ], [ [[IDX:%.*]], [[BB2]] ]
+; CHECK-NEXT:    [[GETELEMENTPTR:%.*]] = getelementptr inbounds i64, ptr [[ALLOCA]], i64 [[PHI]]
+; CHECK-NEXT:    [[LOAD:%.*]] = load i64, ptr [[GETELEMENTPTR]], align 4
+; CHECK-NEXT:    ret i64 [[LOAD]]
+;
+bb:
+  %alloca = alloca [2 x i64], align 8
+  %gep1 = getelementptr inbounds i64, ptr %alloca, i64 1
+  store i64 2, ptr %alloca
+  store i64 3, ptr %gep1
+  br i1 %arg, label %bb1, label %bb2
+
+bb1:
+  br label %end
+
+bb2:
+  br label %end
+
+end:
+  %phi = phi i64 [ 0, %bb1 ], [ %idx, %bb2 ]
+  %getelementptr = getelementptr inbounds i64, ptr %alloca, i64 %phi
+  %load = load i64, ptr %getelementptr
+  ret i64 %load
+}
+
+define i64 @test_phi_mem2reg_pointer_op_is_non_const_gep(i1 %arg, i64 %idx) {
+; CHECK-LABEL: @test_phi_mem2reg_pointer_op_is_non_const_gep(
+; CHECK-NEXT:  bb:
+; CHECK-NEXT:    [[ALLOCA:%.*]] = alloca [2 x i64], align 8
+; CHECK-NEXT:    [[GEP1:%.*]] = getelementptr inbounds i64, ptr [[ALLOCA]], i64 1
+; CHECK-NEXT:    store i64 2, ptr [[ALLOCA]], align 4
+; CHECK-NEXT:    store i64 3, ptr [[GEP1]], align 4
+; CHECK-NEXT:    br i1 [[ARG:%.*]], label [[BB1:%.*]], label [[BB2:%.*]]
+; CHECK:       bb1:
+; CHECK-NEXT:    br label [[END:%.*]]
+; CHECK:       bb2:
+; CHECK-NEXT:    br label [[END]]
+; CHECK:       end:
+; CHECK-NEXT:    [[PHI:%.*]] = phi i64 [ 0, [[BB1]] ], [ 1, [[BB2]] ]
+; CHECK-NEXT:    [[GETELEMENTPTR:%.*]] = getelementptr inbounds i64, ptr [[ALLOCA]], i64 [[IDX:%.*]]
+; CHECK-NEXT:    [[GETELEMENTPTR2:%.*]] = getelementptr inbounds i64, ptr [[GETELEMENTPTR]], i64 [[PHI]]
+; CHECK-NEXT:    [[LOAD:%.*]] = load i64, ptr [[GETELEMENTPTR]], align 4
+; CHECK-NEXT:    ret i64 [[LOAD]]
+;
+bb:
+  %alloca = alloca [2 x i64], align 8
+  %gep1 = getelementptr inbounds i64, ptr %alloca, i64 1
+  store i64 2, ptr %alloca
+  store i64 3, ptr %gep1
+  br i1 %arg, label %bb1, label %bb2
+
+bb1:
+  br label %end
+
+bb2:
+  br label %end
+
+end:
+  %phi = phi i64 [ 0, %bb1 ], [ 1, %bb2 ]
+  %getelementptr = getelementptr inbounds i64, ptr %alloca, i64 %idx
+  %getelementptr2 = getelementptr inbounds i64, ptr %getelementptr, i64 %phi
+  %load = load i64, ptr %getelementptr
+  ret i64 %load
+}
+
+define i1 @test_phi_mem2reg_entry_block_alloca_not_at_beginning(i1 %arg) {
+; CHECK-LABEL: @test_phi_mem2reg_entry_block_alloca_not_at_beginning(
+; CHECK-NEXT:  bb:
+; CHECK-NEXT:    call void @f()
+; CHECK-NEXT:    [[ALLOCA:%.*]] = alloca i64, align 8
+; CHECK-NEXT:    [[PHI_SROA_GEP:%.*]] = getelementptr i64, ptr [[ALLOCA]], i64 1
+; CHECK-NEXT:    [[PHI_SROA_GEP1:%.*]] = getelementptr i64, ptr [[ALLOCA]], i64 2
+; CHECK-NEXT:    br i1 [[ARG:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]
+; CHECK:       bb2:
+; CHECK-NEXT:    br label [[BB3]]
+; CHECK:       bb3:
+; CHECK-NEXT:    [[PHI_SROA_PHI:%.*]] = phi ptr [ [[PHI_SROA_GEP]], [[BB:%.*]] ], [ [[PHI_SROA_GEP1]], [[BB2]] ]
+; CHECK-NEXT:    [[PHI:%.*]] = phi i64 [ 1, [[BB]] ], [ 2, [[BB2]] ]
+; CHECK-NEXT:    [[ICMP:%.*]] = icmp eq ptr [[PHI_SROA_PHI]], null
+; CHECK-NEXT:    ret i1 [[ICMP]]
+;
+bb:
+  call void @f()
+  %alloca = alloca i64
+  br i1 %arg, label %bb2, label %bb3
+bb2:
+  br label %bb3
+bb3:
+  %phi = phi i64 [ 1, %bb ], [ 2, %bb2 ]
+  %gep = getelementptr i64, ptr %alloca, i64 %phi
+  %icmp = icmp eq ptr %gep, null
+  ret i1 %icmp
+}
+
+declare void @f()
+
 declare ptr @foo()
 
 declare i32 @__gxx_personality_v0(...)

@Artem-B
Copy link
Member

Artem-B commented Feb 29, 2024

Q: why not put those GEPs into appropriate incoming basic blocks. That would be a smaller-scope unfolding. I suspect those geps will be sunk into appropriate basic blocks, but it may be better if we don't have to do that.

@aeubanks
Copy link
Contributor Author

Q: why not put those GEPs into appropriate incoming basic blocks. That would be a smaller-scope unfolding. I suspect those geps will be sunk into appropriate basic blocks, but it may be better if we don't have to do that.

It doesn't really matter where the GEP is, later optimizations will place them wherever optimal.

If we want to support this unfolding for GEPs on top of allocas/constants/args, it's trickier

bb1:
bb2:
bb3:
  %i = phi [ 4, %bb1 ], [ 8, %bb2 ]
  %g1 = gep ...
  %g2 = gep %g1, %i

can't become

bb1:
  %g_bb1 = gep %g1, 4
bb2:
  %g_bb2 = gep %g1, 8
bb3:
  %g2 = phi [ %g_bb1, %bb1 ], [ %g_bb2, %bb2 ]
  %g1 = gep ...

since %g1 is defined later than its uses. So I've been trying to keep the insertion point simple.

@Artem-B
Copy link
Member

Artem-B commented Mar 1, 2024

It doesn't really matter where the GEP is, later optimizations will place them wherever optimal.

It does matter in practice. Later optimization pass may eventually place them wherever optimal, but that assumes there will be enough of those "later" passes. If we happened to be in SROA pass at the tail end of the compilation, that may not be the case.

Anyways, I'm OK with the keep-it-simple approach.

@dtcxzyw
Copy link
Member

dtcxzyw commented Mar 1, 2024

Looks like this patch still breaks my benchmark.
See also dtcxzyw/llvm-opt-benchmark#278.

I will provide a minimal reproducer.

@dtcxzyw
Copy link
Member

dtcxzyw commented Mar 1, 2024

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%struct.FloatParts128 = type { i8, i8, i32, i64, i64 }

define { i64, i64 } @float128_div(ptr %pa, i8 %cond) {
entry:
  %pa1 = alloca %struct.FloatParts128, align 8
  br label %parts128_div.exit

if.end48.i:                                       ; No predecessors!
  switch i8 %cond, label %common.ret [
    i8 1, label %parts128_div.exit
    i8 0, label %parts128_div.exit
  ]

common.ret:                                       ; preds = %parts128_div.exit, %if.end48.i
  ret { i64, i64 } zeroinitializer

parts128_div.exit:                                ; preds = %if.end48.i, %if.end48.i, %entry
  %retval.i7.0 = phi ptr [ %pa, %if.end48.i ], [ %pa, %if.end48.i ], [ %pa1, %entry ]
  %frac_lo.i = getelementptr i8, ptr %retval.i7.0, i64 16
  %0 = load i64, ptr %frac_lo.i, align 8
  br label %common.ret
}

> bin/opt -passes=sroa reduced.ll -S

PHI node has multiple entries for the same basic block with different incoming values!
  %retval.i7.0.sroa.phi = phi ptr [ %retval.i7.0.sroa.gep, %if.end48.i ], [ %retval.i7.0.sroa.gep1, %if.end48.i ], [ %pa1.sroa.0, %entry ]
label %if.end48.i
  %retval.i7.0.sroa.gep = getelementptr i8, ptr %pa, i64 16
  %retval.i7.0.sroa.gep1 = getelementptr i8, ptr %pa, i64 16
LLVM ERROR: Broken module found, compilation aborted!

@dtcxzyw dtcxzyw linked an issue Mar 1, 2024 that may be closed by this pull request
@aeubanks
Copy link
Contributor Author

aeubanks commented Mar 4, 2024

thanks for the ahead-of-time repro! pushed a commit to fix that

@aeubanks aeubanks merged commit 8848258 into llvm:main Mar 4, 2024
3 of 4 checks passed
@aeubanks aeubanks deleted the sroa-phi2 branch March 4, 2024 22:21
@wlei-llvm
Copy link
Contributor

Hi,
This breaks one of our assertion build:

Instruction does not dominate all uses!
  %3 = alloca i8, i64 %conv17169, align 16, !dbg !15551
  %__cs.0.sroa.gep = getelementptr inbounds i8, ptr %3, i64 1, !dbg !15533
Instruction does not dominate all uses!
  %4 = alloca i8, i64 %conv17, align 16, !dbg !15551
  %__cs.0.sroa.gep173 = getelementptr inbounds i8, ptr %4, i64 1, !dbg !15533
Instruction does not dominate all uses!
  %3 = alloca i8, i64 %conv17169, align 16, !dbg !15551
  %__cs.0.sroa.gep177 = getelementptr inbounds i8, ptr %3, i64 2, !dbg !15533
Instruction does not dominate all uses!
  %4 = alloca i8, i64 %conv17, align 16, !dbg !15551
  %__cs.0.sroa.gep178 = getelementptr inbounds i8, ptr %4, i64 2, !dbg !15533
Instruction does not dominate all uses!
  %3 = alloca i8, i64 %conv17145, align 16, !dbg !15820
  %__cs.0.sroa.gep = getelementptr inbounds i8, ptr %3, i64 1, !dbg !15803
Instruction does not dominate all uses!
  %4 = alloca i8, i64 %conv17, align 16, !dbg !15820
  %__cs.0.sroa.gep149 = getelementptr inbounds i8, ptr %4, i64 1, !dbg !15803
Instruction does not dominate all uses!
  %3 = alloca i8, i64 %conv17145, align 16, !dbg !15820
  %__cs.0.sroa.gep153 = getelementptr inbounds i8, ptr %3, i64 2, !dbg !15803
Instruction does not dominate all uses!
  %4 = alloca i8, i64 %conv17, align 16, !dbg !15820
  %__cs.0.sroa.gep154 = getelementptr inbounds i8, ptr %4, i64 2, !dbg !15803
fatal error: error in backend: Broken module found, compilation aborted!

Please take a look, thanks! I will try to get a reduced reproducer but posted just in case something obvious to fix.

@wlei-llvm
Copy link
Contributor

Hi
Here is the reduced repro:

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-redhat-linux-gnu"
define { ptr, i8 } @_ZNKSt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE15_M_insert_floatIdEES3_S3_RSt8ios_baseccT_() {
entry:
  %0 = alloca [45 x i8], align 16
  br label %if.end30
if.then19:                                        ; No predecessors!
  %1 = alloca i8, i64 0, align 16
  br label %if.end30
if.end30:                                         ; preds = %if.then19, %entry
  %__cs.0 = phi ptr [ %1, %if.then19 ], [ %0, %entry ]
  unreachable
lor.lhs.false44:                                  ; No predecessors!
  %arrayidx = getelementptr i8, ptr %__cs.0, i64 1
  %2 = load i8, ptr %arrayidx, align 1
  ret { ptr, i8 } zeroinitializer
}
opt  -passes=sroa  reduced.ll -S
 
 
 
Instruction does not dominate all uses!
  %0 = alloca i8, i64 0, align 16
  %__cs.0.sroa.gep = getelementptr i8, ptr %0, i64 1
LLVM ERROR: Broken module found, compilation aborted!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /home/wlei/local/upstream/llvm-build/bin/opt -passes=sroa reduced.ll -S
 #0 0x00007f6d713f9428 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/wlei/local/upstream/llvm-project/llvm/lib/Support/Unix/Signals.inc:723:13
 #1 0x00007f6d713f74d0 llvm::sys::RunSignalHandlers() /home/wlei/local/upstream/llvm-project/llvm/lib/Support/Signals.cpp:106:18
 #2 0x00007f6d713f9afd SignalHandler(int) /home/wlei/local/upstream/llvm-project/llvm/lib/Support/Unix/Signals.inc:413:1
 #3 0x00007f6d74c12d20 __restore_rt (/lib64/libpthread.so.0+0x12d20)
 #4 0x00007f6d7024e52f raise (/lib64/libc.so.6+0x4e52f)
 #5 0x00007f6d70221e65 abort (/lib64/libc.so.6+0x21e65)
 #6 0x00007f6d7134f6f6 llvm::report_fatal_error(llvm::Twine const&, bool) /home/wlei/local/upstream/llvm-project/llvm/lib/Support/ErrorHandling.cpp:125:5
 #7 0x00007f6d7134f526 (/data/users/wlei/upstream/llvm-build/bin/../lib/libLLVMSupport.so.19.0git+0x14f526)
 #8 0x00007f6d71b012de (/data/users/wlei/upstream/llvm-build/bin/../lib/libLLVMCore.so.19.0git+0x5012de)
 #9 0x00007f6d7533c66d llvm::detail::PassModel<llvm::Module, llvm::VerifierPass, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /home/wlei/local/upstream/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:90:5
#10 0x00007f6d71aca3eb llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /home/wlei/local/upstream/llvm-project/llvm/include/llvm/IR/PassManager.h:247:10
#11 0x00007f6d75334a4d llvm::SmallPtrSetImplBase::isSmall() const /home/wlei/local/upstream/llvm-project/llvm/include/llvm/ADT/SmallPtrSet.h:196:33
#12 0x00007f6d75334a4d llvm::SmallPtrSetImplBase::~SmallPtrSetImplBase() /home/wlei/local/upstream/llvm-project/llvm/include/llvm/ADT/SmallPtrSet.h:84:10
#13 0x00007f6d75334a4d llvm::PreservedAnalyses::~PreservedAnalyses() /home/wlei/local/upstream/llvm-project/llvm/include/llvm/IR/Analysis.h:109:7
#14 0x00007f6d75334a4d llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::ArrayRef<std::function<void (llvm::PassBuilder&)>>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool, bool) /home/wlei/local/upstream/llvm-project/llvm/tools/opt/NewPMDriver.cpp:547:3
#15 0x00007f6d753410ee optMain /home/wlei/local/upstream/llvm-project/llvm/tools/opt/optdriver.cpp:737:12
#16 0x00007f6d7023a7e5 __libc_start_main (/lib64/libc.so.6+0x3a7e5)
#17 0x0000000000201c1e _start (/home/wlei/local/upstream/llvm-build/bin/opt+0x201c1e)
Aborted (core dumped)

aeubanks added a commit that referenced this pull request Mar 7, 2024
@aeubanks
Copy link
Contributor Author

aeubanks commented Mar 7, 2024

sorry for the crash, should be fixed in eae4f56

@wlei-llvm
Copy link
Contributor

sorry for the crash, should be fixed in eae4f56

Confirmed there is no crash with this, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update diff February 28th 2024, 7:16:30 pm
5 participants