Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Inline][Cloning] Defer constant propagation after phi-nodes resolution #87963

Conversation

antoniofrighetto
Copy link
Contributor

@antoniofrighetto antoniofrighetto commented Apr 8, 2024

A logic issue arose when inlining via CloneAndPruneFunctionInto, which, besides cloning, performs instruction simplification as well. By the time a new cloned instruction is being simplified, phi-nodes are not remapped yet as the whole CFG needs to be processed first. As VMap state at this stage is incomplete, threadCmpOverPHI and variants could lead to unsound optimizations. This issue has been addressed by performing basic constant folding while cloning, and postponing instruction simplification once phi-nodes are revisited.

Fixes: #87534.

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 8, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Antonio Frighetto (antoniofrighetto)

Changes

A logic issue arose when inlining via CloneAndPruneFunctionInto, which, besides cloning, performs basic constant propagation as well. By the time a new cloned instruction is being simplified, phi-nodes are not remapped yet as the whole CFG needs to be processed first. As VMap state at this stage is incomplete, threadCmpOverPHI and similar could lead to unsound optimizations. This issue has been addressed by postponing simplification once phi-nodes are revisited.

Fixes: #87534.


Full diff: https://github.com/llvm/llvm-project/pull/87963.diff

6 Files Affected:

  • (modified) llvm/lib/Transforms/Utils/CloneFunction.cpp (+39-65)
  • (modified) llvm/test/Transforms/Inline/dynamic-alloca-simplified-large.ll (+2)
  • (added) llvm/test/Transforms/Inline/inline-deferred-instsimplify.ll (+78)
  • (modified) llvm/test/Transforms/Inline/prof-update-instr.ll (+1-1)
  • (modified) llvm/test/Transforms/Inline/prof-update-sample-alwaysinline.ll (-1)
  • (modified) llvm/test/Transforms/Inline/prof-update-sample.ll (-1)
diff --git a/llvm/lib/Transforms/Utils/CloneFunction.cpp b/llvm/lib/Transforms/Utils/CloneFunction.cpp
index 3eac726994ae13..627174d1b354e4 100644
--- a/llvm/lib/Transforms/Utils/CloneFunction.cpp
+++ b/llvm/lib/Transforms/Utils/CloneFunction.cpp
@@ -536,29 +536,10 @@ void PruningFunctionCloner::CloneBlock(
     // Eagerly remap operands to the newly cloned instruction, except for PHI
     // nodes for which we defer processing until we update the CFG. Also defer
     // debug intrinsic processing because they may contain use-before-defs.
-    if (!isa<PHINode>(NewInst) && !isa<DbgVariableIntrinsic>(NewInst)) {
+    if (!isa<PHINode>(NewInst) && !isa<DbgVariableIntrinsic>(NewInst))
       RemapInstruction(NewInst, VMap,
                        ModuleLevelChanges ? RF_None : RF_NoModuleLevelChanges);
 
-      // If we can simplify this instruction to some other value, simply add
-      // a mapping to that value rather than inserting a new instruction into
-      // the basic block.
-      if (Value *V =
-              simplifyInstruction(NewInst, BB->getModule()->getDataLayout())) {
-        // On the off-chance that this simplifies to an instruction in the old
-        // function, map it back into the new function.
-        if (NewFunc != OldFunc)
-          if (Value *MappedV = VMap.lookup(V))
-            V = MappedV;
-
-        if (!NewInst->mayHaveSideEffects()) {
-          VMap[&*II] = V;
-          NewInst->eraseFromParent();
-          continue;
-        }
-      }
-    }
-
     if (II->hasName())
       NewInst->setName(II->getName() + NameSuffix);
     VMap[&*II] = NewInst; // Add instruction map to value.
@@ -823,52 +804,45 @@ void llvm::CloneAndPruneIntoFromInst(Function *NewFunc, const Function *OldFunc,
     }
   }
 
-  // Make a second pass over the PHINodes now that all of them have been
-  // remapped into the new function, simplifying the PHINode and performing any
-  // recursive simplifications exposed. This will transparently update the
-  // WeakTrackingVH in the VMap. Notably, we rely on that so that if we coalesce
-  // two PHINodes, the iteration over the old PHIs remains valid, and the
-  // mapping will just map us to the new node (which may not even be a PHI
-  // node).
-  const DataLayout &DL = NewFunc->getParent()->getDataLayout();
-  SmallSetVector<const Value *, 8> Worklist;
-  for (unsigned Idx = 0, Size = PHIToResolve.size(); Idx != Size; ++Idx)
-    if (isa<PHINode>(VMap[PHIToResolve[Idx]]))
-      Worklist.insert(PHIToResolve[Idx]);
-
-  // Note that we must test the size on each iteration, the worklist can grow.
-  for (unsigned Idx = 0; Idx != Worklist.size(); ++Idx) {
-    const Value *OrigV = Worklist[Idx];
-    auto *I = dyn_cast_or_null<Instruction>(VMap.lookup(OrigV));
-    if (!I)
-      continue;
-
-    // Skip over non-intrinsic callsites, we don't want to remove any nodes from
-    // the CGSCC.
-    CallBase *CB = dyn_cast<CallBase>(I);
-    if (CB && CB->getCalledFunction() &&
-        !CB->getCalledFunction()->isIntrinsic())
-      continue;
-
-    // See if this instruction simplifies.
-    Value *SimpleV = simplifyInstruction(I, DL);
-    if (!SimpleV)
-      continue;
-
-    // Stash away all the uses of the old instruction so we can check them for
-    // recursive simplifications after a RAUW. This is cheaper than checking all
-    // uses of To on the recursive step in most cases.
-    for (const User *U : OrigV->users())
-      Worklist.insert(cast<Instruction>(U));
-
-    // Replace the instruction with its simplified value.
-    I->replaceAllUsesWith(SimpleV);
+  auto GetOriginalValueInVMap = [&](const auto &I) -> Value * {
+    for (const auto &[OrigI, NewI] : VMap) {
+      if (NewI == I)
+        return const_cast<Value *>(OrigI);
+    }
+    return nullptr;
+  };
 
-    // If the original instruction had no side effects, remove it.
-    if (isInstructionTriviallyDead(I))
-      I->eraseFromParent();
-    else
-      VMap[OrigV] = I;
+  // As phi-nodes have been now remapped, allow incremental simplification of
+  // newly-cloned instructions.
+  const DataLayout &DL = NewFunc->getParent()->getDataLayout();
+  auto *NewEntryBlock = cast<BasicBlock>(VMap[StartingBB]);
+  ReversePostOrderTraversal<BasicBlock *> RPOT(NewEntryBlock);
+  for (auto &BB : RPOT) {
+    for (BasicBlock::iterator II = BB->begin(), IE = BB->end(); II != IE;) {
+      auto *NewI = &*II++;
+      if (isa<DbgInfoIntrinsic>(NewI))
+        continue;
+
+      CallBase *CB = dyn_cast<CallBase>(NewI);
+      if (CB && CB->getCalledFunction() &&
+          !CB->getCalledFunction()->isIntrinsic())
+        continue;
+
+      if (Value *V = simplifyInstruction(NewI, DL)) {
+        if (!NewI->use_empty())
+          NewI->replaceAllUsesWith(V);
+
+        if (isInstructionTriviallyDead(NewI)) {
+          NewI->eraseFromParent();
+        } else {
+          // Did not erase it? Restore the new instruction into VMap.
+          auto *OriginalI = GetOriginalValueInVMap(NewI);
+          assert(OriginalI != nullptr &&
+                 "Previously remapped value was not found?");
+          VMap[OriginalI] = NewI;
+        }
+      }
+    }
   }
 
   // Remap debug intrinsic operands now that all values have been mapped.
diff --git a/llvm/test/Transforms/Inline/dynamic-alloca-simplified-large.ll b/llvm/test/Transforms/Inline/dynamic-alloca-simplified-large.ll
index 9b293d39c85f95..54e2010f0b0e9f 100644
--- a/llvm/test/Transforms/Inline/dynamic-alloca-simplified-large.ll
+++ b/llvm/test/Transforms/Inline/dynamic-alloca-simplified-large.ll
@@ -107,6 +107,8 @@ define void @caller3_alloca_not_in_entry(ptr %p1, i1 %b) {
 ; CHECK-NEXT:    [[COND:%.*]] = icmp eq i1 [[B]], true
 ; CHECK-NEXT:    br i1 [[COND]], label [[EXIT:%.*]], label [[SPLIT:%.*]]
 ; CHECK:       split:
+; CHECK-NEXT:    [[SAVEDSTACK:%.*]] = call ptr @llvm.stacksave.p0()
+; CHECK-NEXT:    call void @llvm.stackrestore.p0(ptr [[SAVEDSTACK]])
 ; CHECK-NEXT:    br label [[EXIT]]
 ; CHECK:       exit:
 ; CHECK-NEXT:    ret void
diff --git a/llvm/test/Transforms/Inline/inline-deferred-instsimplify.ll b/llvm/test/Transforms/Inline/inline-deferred-instsimplify.ll
new file mode 100644
index 00000000000000..f02d03688f0397
--- /dev/null
+++ b/llvm/test/Transforms/Inline/inline-deferred-instsimplify.ll
@@ -0,0 +1,78 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt < %s -passes=inline -S | FileCheck %s
+; RUN: opt < %s -passes='cgscc(inline)' -S | FileCheck %s
+
+%struct.a = type { i32, i32, i32, i32, i32 }
+
+@g_var = global %struct.a { i32 1, i32 0, i32 0, i32 0, i32 0 }, align 8
+@other_g_var = global %struct.a zeroinitializer, align 4
+
+define void @callee(ptr noundef byval(%struct.a) align 8 %ptr) {
+; CHECK-LABEL: define void @callee(
+; CHECK-SAME: ptr noundef byval([[STRUCT_A:%.*]]) align 8 [[PTR:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[VAL:%.*]] = load i32, ptr [[PTR]], align 8
+; CHECK-NEXT:    [[DOTNOT:%.*]] = icmp eq i32 [[VAL]], 0
+; CHECK-NEXT:    br i1 [[DOTNOT]], label [[CHECK_POINTERS_ARE_EQUAL:%.*]], label [[STORE_PTR_IN_GVAR:%.*]]
+; CHECK:       store_ptr_in_gvar:
+; CHECK-NEXT:    store ptr [[PTR]], ptr @other_g_var, align 8
+; CHECK-NEXT:    br label [[CHECK_POINTERS_ARE_EQUAL]]
+; CHECK:       check_pointers_are_equal:
+; CHECK-NEXT:    [[PHI:%.*]] = phi ptr [ [[PTR]], [[STORE_PTR_IN_GVAR]] ], [ @other_g_var, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[DOTNOT1:%.*]] = icmp eq ptr [[PHI]], [[PTR]]
+; CHECK-NEXT:    br i1 [[DOTNOT1]], label [[RETURN:%.*]], label [[ABORT:%.*]]
+; CHECK:       abort:
+; CHECK-NEXT:    call void @abort()
+; CHECK-NEXT:    unreachable
+; CHECK:       return:
+; CHECK-NEXT:    ret void
+;
+entry:
+  %val = load i32, ptr %ptr, align 8
+  %.not = icmp eq i32 %val, 0
+  br i1 %.not, label %check_pointers_are_equal, label %store_ptr_in_gvar
+
+store_ptr_in_gvar:                                ; preds = %entry
+  store ptr %ptr, ptr @other_g_var, align 8
+  br label %check_pointers_are_equal
+
+check_pointers_are_equal:                         ; preds = %store_ptr_in_gvar, %entry
+  %phi = phi ptr [ %ptr, %store_ptr_in_gvar ], [ @other_g_var, %entry ]
+  %.not1 = icmp eq ptr %phi, %ptr
+  br i1 %.not1, label %return, label %abort
+
+abort:                                            ; preds = %check_pointers_are_equal
+  call void @abort()
+  unreachable
+
+return:                                           ; preds = %check_pointers_are_equal
+  ret void
+}
+
+define i32 @main() {
+; CHECK-LABEL: define i32 @main() {
+; CHECK-NEXT:    [[G_VAR:%.*]] = alloca [[STRUCT_A:%.*]], align 8
+; CHECK-NEXT:    call void @llvm.lifetime.start.p0(i64 20, ptr [[G_VAR]])
+; CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 1 [[G_VAR]], ptr align 1 @g_var, i64 20, i1 false)
+; CHECK-NEXT:    [[VAL_I:%.*]] = load i32, ptr [[G_VAR]], align 8
+; CHECK-NEXT:    [[DOTNOT_I:%.*]] = icmp eq i32 [[VAL_I]], 0
+; CHECK-NEXT:    br i1 [[DOTNOT_I]], label [[CHECK_POINTERS_ARE_EQUAL_I:%.*]], label [[STORE_PTR_IN_GVAR_I:%.*]]
+; CHECK:       store_ptr_in_gvar.i:
+; CHECK-NEXT:    store ptr [[G_VAR]], ptr @other_g_var, align 8
+; CHECK-NEXT:    br label [[CHECK_POINTERS_ARE_EQUAL_I]]
+; CHECK:       check_pointers_are_equal.i:
+; CHECK-NEXT:    [[PHI_I:%.*]] = phi ptr [ [[G_VAR]], [[STORE_PTR_IN_GVAR_I]] ], [ @other_g_var, [[TMP0:%.*]] ]
+; CHECK-NEXT:    [[DOTNOT1_I:%.*]] = icmp eq ptr [[PHI_I]], [[G_VAR]]
+; CHECK-NEXT:    br i1 [[DOTNOT1_I]], label [[CALLEE_EXIT:%.*]], label [[ABORT_I:%.*]]
+; CHECK:       abort.i:
+; CHECK-NEXT:    call void @abort()
+; CHECK-NEXT:    unreachable
+; CHECK:       callee.exit:
+; CHECK-NEXT:    call void @llvm.lifetime.end.p0(i64 20, ptr [[G_VAR]])
+; CHECK-NEXT:    ret i32 0
+;
+  call void @callee(ptr noundef byval(%struct.a) align 8 @g_var)
+  ret i32 0
+}
+
+declare void @abort()
diff --git a/llvm/test/Transforms/Inline/prof-update-instr.ll b/llvm/test/Transforms/Inline/prof-update-instr.ll
index 38dfa67dcacdc5..01a515bc8e293e 100644
--- a/llvm/test/Transforms/Inline/prof-update-instr.ll
+++ b/llvm/test/Transforms/Inline/prof-update-instr.ll
@@ -52,6 +52,6 @@ define void @caller() !prof !21 {
 !21 = !{!"function_entry_count", i64 400}
 attributes #0 = { alwaysinline }
 ; CHECK: ![[ENTRY_COUNT]] = !{!"function_entry_count", i64 600}
-; CHECK: ![[COUNT_IND_CALLEE1]] = !{!"VP", i32 0, i64 200, i64 111, i64 100, i64 222, i64 60, i64 333, i64 40}
+; CHECK: ![[COUNT_IND_CALLEE1]] = !{!"VP", i32 0, i64 120, i64 111, i64 60, i64 222, i64 36, i64 333, i64 24}
 ; CHECK: ![[COUNT_IND_CALLEE]] = !{!"VP", i32 0, i64 84, i64 111, i64 48, i64 222, i64 24, i64 333, i64 12}
 ; CHECK: ![[COUNT_IND_CALLER]] = !{!"VP", i32 0, i64 56, i64 111, i64 32, i64 222, i64 16, i64 333, i64 8}
diff --git a/llvm/test/Transforms/Inline/prof-update-sample-alwaysinline.ll b/llvm/test/Transforms/Inline/prof-update-sample-alwaysinline.ll
index d6b771e2629d2e..8af4d89663a4da 100644
--- a/llvm/test/Transforms/Inline/prof-update-sample-alwaysinline.ll
+++ b/llvm/test/Transforms/Inline/prof-update-sample-alwaysinline.ll
@@ -53,7 +53,6 @@ define void @caller() {
 !18 = !{!"VP", i32 0, i64 140, i64 111, i64 80, i64 222, i64 40, i64 333, i64 20}
 attributes #0 = { alwaysinline }
 ; CHECK: ![[ENTRY_COUNT]] = !{!"function_entry_count", i64 600}
-; CHECK: ![[COUNT_CALLEE1]] = !{!"branch_weights", i32 2000}
 ; CHECK: ![[COUNT_CALLEE]] = !{!"branch_weights", i32 1200}
 ; CHECK: ![[COUNT_IND_CALLEE]] = !{!"VP", i32 0, i64 84, i64 111, i64 48, i64 222, i64 24, i64 333, i64 12}
 ; CHECK: ![[COUNT_CALLER]] = !{!"branch_weights", i32 800}
diff --git a/llvm/test/Transforms/Inline/prof-update-sample.ll b/llvm/test/Transforms/Inline/prof-update-sample.ll
index 6cdd70e84e0c6d..e09b859b698120 100644
--- a/llvm/test/Transforms/Inline/prof-update-sample.ll
+++ b/llvm/test/Transforms/Inline/prof-update-sample.ll
@@ -52,7 +52,6 @@ define void @caller() {
 !17 = !{!"branch_weights", i32 400}
 !18 = !{!"VP", i32 0, i64 140, i64 111, i64 80, i64 222, i64 40, i64 333, i64 20}
 ; CHECK: ![[ENTRY_COUNT]] = !{!"function_entry_count", i64 600}
-; CHECK: ![[COUNT_CALLEE1]] = !{!"branch_weights", i32 2000}
 ; CHECK: ![[COUNT_CALLEE]] = !{!"branch_weights", i32 1200}
 ; CHECK: ![[COUNT_IND_CALLEE]] = !{!"VP", i32 0, i64 84, i64 111, i64 48, i64 222, i64 24, i64 333, i64 12}
 ; CHECK: ![[COUNT_CALLER]] = !{!"branch_weights", i32 800}

@antoniofrighetto antoniofrighetto force-pushed the feature/cloning-postpone-instsimplify branch from fc09adb to 7a5218d Compare April 11, 2024 17:28
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the compile-time impact of this change?

return const_cast<Value *>(OrigI);
}
return nullptr;
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very inefficient if many values are cloned and simplified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had the same concern, though in practice I would expect it not to happen that often as most of the times the instruction would be trivially dead? Alternatively, we could iterate over all the instructions of the old function, looking up VMap each time to get the new instruction and see if that can be simplified? Not sure if it’d be worth it though. Compile time does not look excessively bad: http://llvm-compile-time-tracker.com/compare.php?from=d34a2c2adb2a4f1dc262c5756d3725caa4ea2571&to=79475c2bf9493e51dcdf46e49b8bd15da326e912&stat=instructions:u. Also, IIRC RPOT was outperforming classic iteration in a few settings (stage2-O3).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikic, rewrote it as I was not too much happy with iterating over VMap either. It looks overall slightly better now, although it seems this was not the actual issue as the regression on sqlite3 is still there: http://llvm-compile-time-tracker.com/compare.php?from=17b86d5978af8d171fa28763a9e5eba3ce93713a&to=6870569453fb583c81cb1a8ac7ef321dd7d2aaa7&stat=instructions:u. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW avoiding instruction simplification outright leads to overall worse regressions despite some little amelioration here and there: http://llvm-compile-time-tracker.com/compare.php?from=d34a2c2adb2a4f1dc262c5756d3725caa4ea2571&to=f9307dc124eba7bb019310c9de043166e494f7a3&stat=instructions:u.

Copy link
Contributor

@nikic nikic Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to understand what causes the sqlite3 regression. My theory is that this is because the block cloning is lazy: If we see that the branch condition has folded to a constant, we may only clone one successor.

I think that after this change that logic is essentially dead and we'll always clone the whole function. The whole function might be much larger than just the part reachable via folded conditions. Especially if you take into account that InlineCost does the same thing, so inlining a very large function may be considered cheap if most of it is dead based on caller information.

If this theory is true, then what we might want to do is to still eagerly perform constant folding, and then do the InstSimplify pass afterwards. (We could also test doing only constant folding...)

auto *OriginalI = GetOriginalValueInVMap(&NewI);
assert(OriginalI != nullptr &&
"Previously remapped value was not found?");
VMap[OriginalI] = &NewI;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I don't understand what this code is doing. It seems like a no-op? It finds the VMap entry OriginalI -> &NewI and then replaces it with ... &NewI?

Was this supposed to replace with V? Also why don't we need to do that if we erase the instruction? Won't we leave behind dangling pointers in VMap with the current implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replaceAllUsesWith will drop the old ValueHandleBase in VMap, and update it with the newly-simplified value via ValueHandleBase::ValueIsRAUWd. If we do not erase the instruction, we have to manually restore NewI. I must have missed something while refactoring the code, since clearly we have to retrieve OriginalI through the just-replaced value V, not via NewI, as VMap state has changed. No need to do anything when the instruction gets erased. Fixed it now, thanks.

@antoniofrighetto antoniofrighetto force-pushed the feature/cloning-postpone-instsimplify branch 3 times, most recently from 8d960d1 to 6870569 Compare April 17, 2024 08:44
@antoniofrighetto antoniofrighetto force-pushed the feature/cloning-postpone-instsimplify branch from 6870569 to 8944a84 Compare April 20, 2024 13:52
@antoniofrighetto
Copy link
Contributor Author

@nikic, I confirm performing basic constant folding on average decreases the number of blocks being cloned; I think we should be more on track now (http://llvm-compile-time-tracker.com/compare.php?from=eec268e41000faad50b7bf977c9f1161e4006bab&to=65847e8c168123ad328683f96b1b37cd269ea85c&stat=instructions:u). I think we still want to do InstSimplify after cloning, since not doing it leads to regressions in stage2 scenario (http://llvm-compile-time-tracker.com/compare.php?from=60baaf153d05a7828b304050aba461ebb66232c6&to=78c72fd92fa1b22fd2d113bb4e821b484322aadd&stat=instructions:u), besides the following local failures:

********************
Failed Tests (8):
  LLVM :: DebugInfo/Generic/assignment-tracking/inline/shared-alloca.ll
  LLVM :: Transforms/Inline/byval.ll
  LLVM :: Transforms/Inline/inline_cleanup.ll
  LLVM :: Transforms/Inline/nonnull.ll
  LLVM :: Transforms/Inline/pr50270.ll
  LLVM :: Transforms/Inline/pr50589.ll
  LLVM :: Transforms/Inline/ptr-diff.ll
  LLVM :: Transforms/Inline/simplify-instruction-computeKnownFPClass-context.ll

@antoniofrighetto antoniofrighetto force-pushed the feature/cloning-postpone-instsimplify branch from 8944a84 to f646eaa Compare April 22, 2024 07:11
if (!NewInst->mayHaveSideEffects()) {
// Eagerly constant fold the newly cloned instruction. If successful, add
// a mapping to the new value.
if (Value *V = ConstantFoldInstruction(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth explicitly noting that non-constant operands might not incomplete at this point, so we don't want to try to do simplifications based on them.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. A bit unfortunate that we have to do it this way, but I don't really see a good alternative.

// a mapping to the new value.
if (Value *V = ConstantFoldInstruction(
NewInst, BB->getModule()->getDataLayout())) {
if (wouldInstructionBeTriviallyDead(NewInst)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd personally use isInstructionTriviallyDead here just to be more obvious (even though we know that there are no uses at this point.)

@antoniofrighetto antoniofrighetto force-pushed the feature/cloning-postpone-instsimplify branch from f646eaa to c455aaa Compare April 24, 2024 14:54
A logic issue arose when inlining via `CloneAndPruneFunctionInto`,
which, besides cloning, performs instruction simplification as well.
By the time a new cloned instruction is being simplified, phi-nodes
are not remapped yet as the whole CFG needs to be processed first.
As `VMap` state at this stage is incomplete, `threadCmpOverPHI` and
variants could lead to unsound optimizations. This issue has been
addressed by performing basic constant folding while cloning, and
postponing instruction simplification once phi-nodes are revisited.

Fixes: llvm#87534.
@antoniofrighetto antoniofrighetto force-pushed the feature/cloning-postpone-instsimplify branch from c455aaa to a61f9fe Compare April 24, 2024 14:55
@antoniofrighetto antoniofrighetto merged commit a61f9fe into llvm:main Apr 24, 2024
3 of 4 checks passed
@dtcxzyw
Copy link
Member

dtcxzyw commented Apr 24, 2024

This patch breaks our downstream CI: dtcxzyw/llvm-opt-benchmark#550

Reduced testcase:

; bin/opt -passes=inline -S test.ll

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define void @_ZN4core4iter6traits8iterator8Iterator3zip17h5482e9cf1625f275E.llvm.18097807149992804452() {
"_ZN123_$LT$$RF$alloc..collections..btree..map..BTreeMap$LT$K$C$V$C$A$GT$$u20$as$u20$core..iter..traits..collect..IntoIterator$GT$9into_iter17haee589ca14fbbed3E.exit":
  %0 = load ptr, ptr null, align 8
  %.not.i.i = icmp eq ptr %0, null
  %1 = load i64, ptr null, align 8
  %.sroa.1.0 = select i1 %.not.i.i, i64 undef, i64 %1
  ret void
}

define noundef i1 @"_ZN98_$LT$alloc..collections..btree..map..BTreeMap$LT$K$C$V$C$A$GT$$u20$as$u20$core..cmp..PartialEq$GT$2eq17h320375e90a5cf86fE"() {
  call void @_ZN4core4iter6traits8iterator8Iterator3zip17h5482e9cf1625f275E.llvm.18097807149992804452()
  ret i1 false
}
opt: /home/dtcxzyw/WorkSpace/Projects/compilers/llvm-project/llvm/include/llvm/IR/User.h:170: llvm::Value* llvm::User::getOperand(unsigned int) const: Assertion `i < NumUserOperands && "getOperand() out of range!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: bin/opt -passes=inline reduced.ll -S
 #0 0x0000750d65c06c10 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/libLLVMSupport.so.19.0git+0x206c10)
 #1 0x0000750d65c03c1f llvm::sys::RunSignalHandlers() (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/libLLVMSupport.so.19.0git+0x203c1f)
 #2 0x0000750d65c03d75 SignalHandler(int) Signals.cpp:0:0
 #3 0x0000750d65642520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #4 0x0000750d656969fc __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #5 0x0000750d656969fc __pthread_kill_internal ./nptl/pthread_kill.c:78:10
 #6 0x0000750d656969fc pthread_kill ./nptl/pthread_kill.c:89:10
 #7 0x0000750d65642476 gsignal ./signal/../sysdeps/posix/raise.c:27:6
 #8 0x0000750d656287f3 abort ./stdlib/abort.c:81:7
 #9 0x0000750d6562871b _nl_load_domain ./intl/loadmsgcat.c:1177:9
#10 0x0000750d65639e96 (/lib/x86_64-linux-gnu/libc.so.6+0x39e96)
#11 0x0000750d608cbb18 (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libLLVMAnalysis.so.19.0git+0x4cbb18)
#12 0x0000750d608d5e42 bool handleGuaranteedNonPoisonOps<llvm::mustTriggerUB(llvm::Instruction const*, llvm::SmallPtrSetImpl<llvm::Value const*> const&)::'lambda'(llvm::Value const*)>(llvm::Instruction const*, llvm::mustTriggerUB(llvm::Instruction const*, llvm::SmallPtrSetImpl<llvm::Value const*> const&)::'lambda'(llvm::Value const*) const&) ValueTracking.cpp:0:0
#13 0x0000750d608d71f8 programUndefinedIfUndefOrPoison(llvm::Value const*, bool) ValueTracking.cpp:0:0
#14 0x0000750d608d7c0e isGuaranteedNotToBeUndefOrPoison(llvm::Value const*, llvm::AssumptionCache*, llvm::Instruction const*, llvm::DominatorTree const*, unsigned int, UndefPoisonKind) ValueTracking.cpp:0:0
#15 0x0000750d608ea785 llvm::impliesPoison(llvm::Value const*, llvm::Value const*) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libLLVMAnalysis.so.19.0git+0x4ea785)
#16 0x0000750d606c761b simplifySelectInst(llvm::Value*, llvm::Value*, llvm::Value*, llvm::SimplifyQuery const&, unsigned int) InstructionSimplify.cpp:0:0
#17 0x0000750d606b659d simplifyInstructionWithOperands(llvm::Instruction*, llvm::ArrayRef<llvm::Value*>, llvm::SimplifyQuery const&, unsigned int) InstructionSimplify.cpp:0:0
#18 0x0000750d606c8bce llvm::simplifyInstruction(llvm::Instruction*, llvm::SimplifyQuery const&) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libLLVMAnalysis.so.19.0git+0x2c8bce)
#19 0x0000750d60cdc165 llvm::CloneAndPruneIntoFromInst(llvm::Function*, llvm::Function const*, llvm::Instruction const*, llvm::ValueMap<llvm::Value const*, llvm::WeakTrackingVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false>>>&, bool, llvm::SmallVectorImpl<llvm::ReturnInst*>&, char const*, llvm::ClonedCodeInfo*) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libLLVMTransformUtils.so.19.0git+0xdc165)
#20 0x0000750d60cdcc89 llvm::CloneAndPruneFunctionInto(llvm::Function*, llvm::Function const*, llvm::ValueMap<llvm::Value const*, llvm::WeakTrackingVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false>>>&, bool, llvm::SmallVectorImpl<llvm::ReturnInst*>&, char const*, llvm::ClonedCodeInfo*) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libLLVMTransformUtils.so.19.0git+0xdcc89)
#21 0x0000750d60d5351f llvm::InlineFunction(llvm::CallBase&, llvm::InlineFunctionInfo&, bool, llvm::AAResults*, bool, llvm::Function*) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libLLVMTransformUtils.so.19.0git+0x15351f)
#22 0x0000750d61e2f1f9 llvm::InlinerPass::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libLLVMipo.so.19.0git+0x22f1f9)
#23 0x0000750d62ec6106 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::InlinerPass, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libLLVMPasses.so.19.0git+0xc6106)
#24 0x0000750d60583ff0 llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libLLVMAnalysis.so.19.0git+0x183ff0)
#25 0x0000750d62ec2b96 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libLLVMPasses.so.19.0git+0xc2b96)
#26 0x0000750d605861a4 llvm::ModuleToPostOrderCGSCCPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libLLVMAnalysis.so.19.0git+0x1861a4)
#27 0x0000750d62ec2b46 llvm::detail::PassModel<llvm::Module, llvm::ModuleToPostOrderCGSCCPassAdaptor, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libLLVMPasses.so.19.0git+0xc2b46)
#28 0x0000750d5ff475c1 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/../lib/libLLVMCore.so.19.0git+0x3475c1)
#29 0x0000750d660c51d4 llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::ArrayRef<std::function<void (llvm::PassBuilder&)>>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool, bool) (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/libLLVMOptDriver.so.19.0git+0x2d1d4)
#30 0x0000750d660d2378 optMain (/home/dtcxzyw/WorkSpace/Projects/compilers/LLVM/llvm-build/bin/../lib/libLLVMOptDriver.so.19.0git+0x3a378)
#31 0x0000750d65629d90 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#32 0x0000750d65629e40 call_init ./csu/../csu/libc-start.c:128:20
#33 0x0000750d65629e40 __libc_start_main ./csu/../csu/libc-start.c:379:5
#34 0x00005bc002bc8095 _start (bin/opt+0x1095)

opt crashes when processing ret void in handleGuaranteedWellDefinedOps:

case Instruction::Ret:
if (I->getFunction()->hasRetAttribute(Attribute::NoUndef) &&
Handle(I->getOperand(0)))
return true;

I don't know why the ret attribute noundef exists in a function that returns void.

@UsmanNadeem
Copy link
Contributor

@antoniofrighetto
Copy link
Contributor Author

The issue looks orthogonal and incidentally happens to be triggered by the patch, I'll take a look soon.

@efriedma-quic
Copy link
Collaborator

@antoniofrighetto the first rule of buildbots is that they stay green. No matter what the root cause is, if you break the bots, you revert first, and sort it out later.

@nikic
Copy link
Contributor

nikic commented Apr 25, 2024

See #87482 for discussion about this issue.

Possibly this change also provide a way to fix that one properly, by delaying simplification until after the returns have been rewritten...

@danilaml
Copy link
Collaborator

I've noticed that before this PR, certain calls were simplified-away and now they are not because of

      // Skip over non-intrinsic callsites, we don't want to remove any nodes
      // from the CGSCC.
      CallBase *CB = dyn_cast<CallBase>(NewI);
      if (CB && CB->getCalledFunction() &&
          !CB->getCalledFunction()->isIntrinsic())
        continue;

I'm not clear if this check is still required. And if it is, was the fact that it was missing in CloneBLock before a bug? Shouldn't it still be added there, since ConstantFold can still eliminate some non-intrinsic calls (i.e. some math functions)? It was added by @majnemer in 5554eda

@nikic
Copy link
Contributor

nikic commented May 16, 2024

@danilaml It should be safe to drop. There used to be VMap-based CG updates for the legacy PM, but the relevant function went away entirely in fa6ea7a.

@danilaml
Copy link
Collaborator

@nikic made a PR with the check removal here: #92577

danilaml added a commit that referenced this pull request May 20, 2024
We do not need to concern ourselves with CGSCC since all remaining CG
related updates went away in fa6ea7a as
pointed out by @nikic in
#87963 (comment).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

wrong code at -O1 on x86_64-linux-gnu
8 participants