From ae7f08812e0995481eb345cecc5dd4529829ba44 Mon Sep 17 00:00:00 2001
From: Roman Lebedev <lebedev.ri@gmail.com>
Date: Sun, 16 Aug 2020 23:27:56 +0300
Subject: [PATCH] [InstCombine] Aggregate reconstruction simplification
 (PR47060)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This pattern happens in clang C++ exception lowering code, on unwind branch.
We end up having a `landingpad` block after each `invoke`, where RAII
cleanup is performed, and the elements of an aggregate `{i8*, i32}`
holding exception info are `extractvalue`'d, and we then branch to common block
that takes extracted `i8*` and `i32` elements (via `phi` nodes),
form a new aggregate, and finally `resume`'s the exception.

The problem is that, if the cleanup block is effectively empty,
it shouldn't be there, there shouldn't be that `landingpad` and `resume`,
said `invoke` should be a  `call`.

Indeed, we do that simplification in e.g. SimplifyCFG `SimplifyCFGOpt::simplifyResume()`.
But the thing is, all this extra `extractvalue` + `phi` + `insertvalue` cruft,
while it is pointless, does not look like "empty cleanup block".
So the `SimplifyCFGOpt::simplifyResume()` fails, and the exception is has
higher cost than it could have on unwind branch :S

This doesn't happen *that* often, but it will basically happen once per C++
function with complex CFG that called more than one other function
that isn't known to be `nounwind`.

I think, this is a missing fold in InstCombine, so i've implemented it.

I think, the algorithm/implementation is rather self-explanatory:
1. Find a chain of `insertvalue`'s that fully tell us the initializer of the aggregate.
2. For each element, try to find from which aggregate it was extracted.
   If it was extracted from the aggregate with identical type,
   from identical element index, great.
3. If all elements were found to have been extracted from the same aggregate,
   then we can just use said original source aggregate directly,
   instead of re-creating it.
4. If we fail to find said aggregate when looking only in the current block,
   we need be PHI-aware - we might have different source aggregate when coming
   from each predecessor.

I'm not sure if this already handles everything, and there are some FIXME's,
i'll deal with all that later in followups.

I'd be fine with going with post-commit review here code-wise,
but just in case there are thoughts, i'm posting this.

On RawSpeed, for example, this has the following effect:
```
| statistic name                                    | baseline | proposed |     Δ |       % | abs(%) |
|---------------------------------------------------|---------:|---------:|------:|--------:|-------:|
| instcombine.NumAggregateReconstructionsSimplified |        0 |     1253 |  1253 |   0.00% |  0.00% |
| simplifycfg.NumInvokes                            |      948 |     1355 |   407 |  42.93% | 42.93% |
| instcount.NumInsertValueInst                      |     4382 |     3210 | -1172 | -26.75% | 26.75% |
| simplifycfg.NumSinkCommonCode                     |      574 |      458 |  -116 | -20.21% | 20.21% |
| simplifycfg.NumSinkCommonInstrs                   |     1154 |      921 |  -233 | -20.19% | 20.19% |
| instcount.NumExtractValueInst                     |    29017 |    26397 | -2620 |  -9.03% |  9.03% |
| instcombine.NumDeadInst                           |   166618 |   174705 |  8087 |   4.85% |  4.85% |
| instcount.NumPHIInst                              |    51526 |    50678 |  -848 |  -1.65% |  1.65% |
| instcount.NumLandingPadInst                       |    20865 |    20609 |  -256 |  -1.23% |  1.23% |
| instcount.NumInvokeInst                           |    34023 |    33675 |  -348 |  -1.02% |  1.02% |
| simplifycfg.NumSimpl                              |   113634 |   114708 |  1074 |   0.95% |  0.95% |
| instcombine.NumSunkInst                           |    15030 |    14930 |  -100 |  -0.67% |  0.67% |
| instcount.TotalBlocks                             |   219544 |   219024 |  -520 |  -0.24% |  0.24% |
| instcombine.NumCombined                           |   644562 |   645805 |  1243 |   0.19% |  0.19% |
| instcount.TotalInsts                              |  2139506 |  2135377 | -4129 |  -0.19% |  0.19% |
| instcount.NumBrInst                               |   156988 |   156821 |  -167 |  -0.11% |  0.11% |
| instcount.NumCallInst                             |  1206144 |  1207076 |   932 |   0.08% |  0.08% |
| instcount.NumResumeInst                           |     5193 |     5190 |    -3 |  -0.06% |  0.06% |
| asm-printer.EmittedInsts                          |   948580 |   948299 |  -281 |  -0.03% |  0.03% |
| instcount.TotalFuncs                              |    11509 |    11507 |    -2 |  -0.02% |  0.02% |
| inline.NumDeleted                                 |    97595 |    97597 |     2 |   0.00% |  0.00% |
| inline.NumInlined                                 |   210514 |   210522 |     8 |   0.00% |  0.00% |
```
So we manage to increase the amount of `invoke` -> `call` conversions in SimplifyCFG by almost a half,
and there is a very apparent decrease in instruction and basic block count.

On vanilla llvm-test-suite:
```
| statistic name                                    | baseline | proposed |     Δ |       % | abs(%) |
|---------------------------------------------------|---------:|---------:|------:|--------:|-------:|
| instcombine.NumAggregateReconstructionsSimplified |        0 |      744 |   744 |   0.00% |  0.00% |
| instcount.NumInsertValueInst                      |     2705 |     2053 |  -652 | -24.10% | 24.10% |
| simplifycfg.NumInvokes                            |     1212 |     1424 |   212 |  17.49% | 17.49% |
| instcount.NumExtractValueInst                     |    21681 |    20139 | -1542 |  -7.11% |  7.11% |
| simplifycfg.NumSinkCommonInstrs                   |    14575 |    14361 |  -214 |  -1.47% |  1.47% |
| simplifycfg.NumSinkCommonCode                     |     6815 |     6743 |   -72 |  -1.06% |  1.06% |
| instcount.NumLandingPadInst                       |    14851 |    14712 |  -139 |  -0.94% |  0.94% |
| instcount.NumInvokeInst                           |    27510 |    27332 |  -178 |  -0.65% |  0.65% |
| instcombine.NumDeadInst                           |  1438173 |  1443371 |  5198 |   0.36% |  0.36% |
| instcount.NumResumeInst                           |     2880 |     2872 |    -8 |  -0.28% |  0.28% |
| instcombine.NumSunkInst                           |    55187 |    55076 |  -111 |  -0.20% |  0.20% |
| instcount.NumPHIInst                              |   321366 |   320916 |  -450 |  -0.14% |  0.14% |
| instcount.TotalBlocks                             |   886816 |   886493 |  -323 |  -0.04% |  0.04% |
| instcount.TotalInsts                              |  7663845 |  7661108 | -2737 |  -0.04% |  0.04% |
| simplifycfg.NumSimpl                              |   886791 |   887171 |   380 |   0.04% |  0.04% |
| instcount.NumCallInst                             |   553552 |   553733 |   181 |   0.03% |  0.03% |
| instcombine.NumCombined                           |  3200512 |  3201202 |   690 |   0.02% |  0.02% |
| instcount.NumBrInst                               |   741794 |   741656 |  -138 |  -0.02% |  0.02% |
| simplifycfg.NumHoistCommonInstrs                  |    14443 |    14445 |     2 |   0.01% |  0.01% |
| asm-printer.EmittedInsts                          |  7978085 |  7977916 |  -169 |   0.00% |  0.00% |
| inline.NumDeleted                                 |    73188 |    73189 |     1 |   0.00% |  0.00% |
| inline.NumInlined                                 |   291959 |   291968 |     9 |   0.00% |  0.00% |
```
Roughly similar effect, less instructions and blocks total.

See also: rGe492f0e03b01a5e4ec4b6333abb02d303c3e479e.

Compile-time wise, this appears to be roughly geomean-neutral:
http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=instructions

And this is a win size-wize in general:
http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=size-text

See https://bugs.llvm.org/show_bug.cgi?id=47060

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D85787
---
 clang/test/CodeGenCXX/nrvo.cpp                |   4 +-
 .../InstCombine/InstCombineInternal.h         |   2 +
 .../InstCombine/InstCombineVectorOps.cpp      | 246 ++++++++++++++++++
 .../InstCombine/InstructionCombining.cpp      |  12 +-
 .../InstCombine/aggregate-reconstruction.ll   |  39 +--
 .../phi-aware-aggregate-reconstruction.ll     |  18 +-
 6 files changed, 267 insertions(+), 54 deletions(-)

diff --git a/clang/test/CodeGenCXX/nrvo.cpp b/clang/test/CodeGenCXX/nrvo.cpp
index 61b9364497aed..1a5e63cd7fd71 100644
--- a/clang/test/CodeGenCXX/nrvo.cpp
+++ b/clang/test/CodeGenCXX/nrvo.cpp
@@ -85,8 +85,8 @@ X test2(bool B) {
   // %lpad: landing pad for ctor of 'y', dtor of 'y'
   // CHECK-EH:      [[CAUGHTVAL:%.*]] = landingpad { i8*, i32 }
   // CHECK-EH-NEXT:   cleanup
-  // CHECK-EH-NEXT: extractvalue { i8*, i32 } [[CAUGHTVAL]], 0
-  // CHECK-EH-NEXT: extractvalue { i8*, i32 } [[CAUGHTVAL]], 1
+  // CHECK-EH-03-NEXT: extractvalue { i8*, i32 } [[CAUGHTVAL]], 0
+  // CHECK-EH-03-NEXT: extractvalue { i8*, i32 } [[CAUGHTVAL]], 1
   // CHECK-EH-NEXT: br label
   // -> %eh.cleanup
 
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineInternal.h b/llvm/lib/Transforms/InstCombine/InstCombineInternal.h
index b67ceb75eb722..92169ffde22b9 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineInternal.h
+++ b/llvm/lib/Transforms/InstCombine/InstCombineInternal.h
@@ -158,6 +158,8 @@ class LLVM_LIBRARY_VISIBILITY InstCombinerImpl final
   Instruction *visitFenceInst(FenceInst &FI);
   Instruction *visitSwitchInst(SwitchInst &SI);
   Instruction *visitReturnInst(ReturnInst &RI);
+  Instruction *
+  foldAggregateConstructionIntoAggregateReuse(InsertValueInst &OrigIVI);
   Instruction *visitInsertValueInst(InsertValueInst &IV);
   Instruction *visitInsertElementInst(InsertElementInst &IE);
   Instruction *visitExtractElementInst(ExtractElementInst &EI);
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp b/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
index bcda851028933..67041ff63f8c4 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
@@ -18,6 +18,7 @@
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/SmallBitVector.h"
 #include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/Statistic.h"
 #include "llvm/Analysis/InstructionSimplify.h"
 #include "llvm/Analysis/VectorUtils.h"
 #include "llvm/IR/BasicBlock.h"
@@ -46,6 +47,10 @@ using namespace PatternMatch;
 
 #define DEBUG_TYPE "instcombine"
 
+STATISTIC(NumAggregateReconstructionsSimplified,
+          "Number of aggregate reconstructions turned into reuse of the "
+          "original aggregate");
+
 /// Return true if the value is cheaper to scalarize than it is to leave as a
 /// vector operation. IsConstantExtractIndex indicates whether we are extracting
 /// one known element from a vector constant.
@@ -694,6 +699,243 @@ static ShuffleOps collectShuffleElements(Value *V, SmallVectorImpl<int> &Mask,
   return std::make_pair(V, nullptr);
 }
 
+/// Look for chain of insertvalue's that fully define an aggregate, and trace
+/// back the values inserted, see if they are all were extractvalue'd from
+/// the same source aggregate from the exact same element indexes.
+/// If they were, just reuse the source aggregate.
+/// This potentially deals with PHI indirections.
+Instruction *InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse(
+    InsertValueInst &OrigIVI) {
+  BasicBlock *UseBB = OrigIVI.getParent();
+  Type *AggTy = OrigIVI.getType();
+  unsigned NumAggElts;
+  switch (AggTy->getTypeID()) {
+  case Type::StructTyID:
+    NumAggElts = AggTy->getStructNumElements();
+    break;
+  case Type::ArrayTyID:
+    NumAggElts = AggTy->getArrayNumElements();
+    break;
+  default:
+    llvm_unreachable("Unhandled aggregate type?");
+  }
+
+  // Arbitrary aggregate size cut-off. Motivation for limit of 2 is to be able
+  // to handle clang C++ exception struct (which is hardcoded as {i8*, i32}),
+  // FIXME: any interesting patterns to be caught with larger limit?
+  assert(NumAggElts > 0 && "Aggregate should have elements.");
+  if (NumAggElts > 2)
+    return nullptr;
+
+  static constexpr auto NotFound = None;
+  static constexpr auto FoundMismatch = nullptr;
+
+  // Try to find a value of each element of an aggregate.
+  // FIXME: deal with more complex, not one-dimensional, aggregate types
+  SmallVector<Optional<Value *>, 2> AggElts(NumAggElts, NotFound);
+
+  // Do we know values for each element of the aggregate?
+  auto KnowAllElts = [&AggElts]() {
+    return all_of(AggElts,
+                  [](Optional<Value *> Elt) { return Elt != NotFound; });
+  };
+
+  int Depth = 0;
+
+  // Arbitrary `insertvalue` visitation depth limit. Let's be okay with
+  // every element being overwritten twice, which should never happen.
+  static const int DepthLimit = 2 * NumAggElts;
+
+  // Recurse up the chain of `insertvalue` aggregate operands until either we've
+  // reconstructed full initializer or can't visit any more `insertvalue`'s.
+  for (InsertValueInst *CurrIVI = &OrigIVI;
+       Depth < DepthLimit && CurrIVI && !KnowAllElts();
+       CurrIVI = dyn_cast<InsertValueInst>(CurrIVI->getAggregateOperand()),
+                       ++Depth) {
+    Value *InsertedValue = CurrIVI->getInsertedValueOperand();
+    ArrayRef<unsigned int> Indices = CurrIVI->getIndices();
+
+    // Don't bother with more than single-level aggregates.
+    if (Indices.size() != 1)
+      return nullptr; // FIXME: deal with more complex aggregates?
+
+    // Now, we may have already previously recorded the value for this element
+    // of an aggregate. If we did, that means the CurrIVI will later be
+    // overwritten with the already-recorded value. But if not, let's record it!
+    Optional<Value *> &Elt = AggElts[Indices.front()];
+    Elt = Elt.getValueOr(InsertedValue);
+
+    // FIXME: should we handle chain-terminating undef base operand?
+  }
+
+  // Was that sufficient to deduce the full initializer for the aggregate?
+  if (!KnowAllElts())
+    return nullptr; // Give up then.
+
+  // We now want to find the source[s] of the aggregate elements we've found.
+  // And with "source" we mean the original aggregate[s] from which
+  // the inserted elements were extracted. This may require PHI translation.
+
+  enum class SourceAggregate {
+    /// When analyzing the value that was inserted into an aggregate, we did
+    /// not manage to find defining `extractvalue` instruction to analyze.
+    NotFound,
+    /// When analyzing the value that was inserted into an aggregate, we did
+    /// manage to find defining `extractvalue` instruction[s], and everything
+    /// matched perfectly - aggregate type, element insertion/extraction index.
+    Found,
+    /// When analyzing the value that was inserted into an aggregate, we did
+    /// manage to find defining `extractvalue` instruction, but there was
+    /// a mismatch: either the source type from which the extraction was didn't
+    /// match the aggregate type into which the insertion was,
+    /// or the extraction/insertion channels mismatched,
+    /// or different elements had different source aggregates.
+    FoundMismatch
+  };
+  auto Describe = [](Optional<Value *> SourceAggregate) {
+    if (SourceAggregate == NotFound)
+      return SourceAggregate::NotFound;
+    if (*SourceAggregate == FoundMismatch)
+      return SourceAggregate::FoundMismatch;
+    return SourceAggregate::Found;
+  };
+
+  // Given the value \p Elt that was being inserted into element \p EltIdx of an
+  // aggregate AggTy, see if \p Elt was originally defined by an
+  // appropriate extractvalue (same element index, same aggregate type).
+  // If found, return the source aggregate from which the extraction was.
+  // If \p PredBB is provided, does PHI translation of an \p Elt first.
+  auto FindSourceAggregate =
+      [&](Value *Elt, unsigned EltIdx,
+          Optional<BasicBlock *> PredBB) -> Optional<Value *> {
+    // For now(?), only deal with, at most, a single level of PHI indirection.
+    if (PredBB)
+      Elt = Elt->DoPHITranslation(UseBB, *PredBB);
+    // FIXME: deal with multiple levels of PHI indirection?
+
+    // Did we find an extraction?
+    auto *EVI = dyn_cast<ExtractValueInst>(Elt);
+    if (!EVI)
+      return NotFound;
+
+    Value *SourceAggregate = EVI->getAggregateOperand();
+
+    // Is the extraction from the same type into which the insertion was?
+    if (SourceAggregate->getType() != AggTy)
+      return FoundMismatch;
+    // And the element index doesn't change between extraction and insertion?
+    if (EVI->getNumIndices() != 1 || EltIdx != EVI->getIndices().front())
+      return FoundMismatch;
+
+    return SourceAggregate; // SourceAggregate::Found
+  };
+
+  // Given elements AggElts that were constructing an aggregate OrigIVI,
+  // see if we can find appropriate source aggregate for each of the elements,
+  // and see it's the same aggregate for each element. If so, return it.
+  auto FindCommonSourceAggregate =
+      [&](Optional<BasicBlock *> PredBB) -> Optional<Value *> {
+    Optional<Value *> SourceAggregate;
+
+    for (auto I : enumerate(AggElts)) {
+      assert(Describe(SourceAggregate) != SourceAggregate::FoundMismatch &&
+             "We don't store nullptr in SourceAggregate!");
+      assert((Describe(SourceAggregate) == SourceAggregate::Found) ==
+                 (I.index() != 0) &&
+             "SourceAggregate should be valid after the the first element,");
+
+      // For this element, is there a plausible source aggregate?
+      // FIXME: we could special-case undef element, IFF we know that in the
+      //        source aggregate said element isn't poison.
+      Optional<Value *> SourceAggregateForElement =
+          FindSourceAggregate(*I.value(), I.index(), PredBB);
+
+      // Okay, what have we found? Does that correlate with previous findings?
+
+      // Regardless of whether or not we have previously found source
+      // aggregate for previous elements (if any), if we didn't find one for
+      // this element, passthrough whatever we have just found.
+      if (Describe(SourceAggregateForElement) != SourceAggregate::Found)
+        return SourceAggregateForElement;
+
+      // Okay, we have found source aggregate for this element.
+      // Let's see what we already know from previous elements, if any.
+      switch (Describe(SourceAggregate)) {
+      case SourceAggregate::NotFound:
+        // This is apparently the first element that we have examined.
+        SourceAggregate = SourceAggregateForElement; // Record the aggregate!
+        continue; // Great, now look at next element.
+      case SourceAggregate::Found:
+        // We have previously already successfully examined other elements.
+        // Is this the same source aggregate we've found for other elements?
+        if (*SourceAggregateForElement != *SourceAggregate)
+          return FoundMismatch;
+        continue; // Still the same aggregate, look at next element.
+      case SourceAggregate::FoundMismatch:
+        llvm_unreachable("Can't happen. We would have early-exited then.");
+      };
+    }
+
+    assert(Describe(SourceAggregate) == SourceAggregate::Found &&
+           "Must be a valid Value");
+    return *SourceAggregate;
+  };
+
+  Optional<Value *> SourceAggregate;
+
+  // Can we find the source aggregate without looking at predecessors?
+  SourceAggregate = FindCommonSourceAggregate(/*PredBB=*/None);
+  if (Describe(SourceAggregate) != SourceAggregate::NotFound) {
+    if (Describe(SourceAggregate) == SourceAggregate::FoundMismatch)
+      return nullptr; // Conflicting source aggregates!
+    ++NumAggregateReconstructionsSimplified;
+    return replaceInstUsesWith(OrigIVI, *SourceAggregate);
+  }
+
+  // If we didn't manage to find source aggregate without looking at
+  // predecessors, and there are no predecessors to look at, then we're done.
+  if (pred_empty(UseBB))
+    return nullptr;
+
+  // Okay, apparently we need to look at predecessors.
+
+  // Arbitrary predecessor count limit.
+  static const int PredCountLimit = 64;
+  // Don't bother if there are too many predecessors.
+  if (UseBB->hasNPredecessorsOrMore(PredCountLimit + 1))
+    return nullptr;
+
+  // For each predecessor, what is the source aggregate,
+  // from which all the elements were originally extracted from?
+  // Note that we want for the map to have stable iteration order!
+  SmallMapVector<BasicBlock *, Value *, 4> SourceAggregates;
+  for (BasicBlock *Pred : predecessors(UseBB)) {
+    std::pair<decltype(SourceAggregates)::iterator, bool> IV =
+        SourceAggregates.insert({Pred, nullptr});
+    // Did we already evaluate this predecessor?
+    if (!IV.second)
+      continue;
+
+    // Let's hope that when coming from predecessor Pred, all elements of the
+    // aggregate produced by OrigIVI must have been originally extracted from
+    // the same aggregate. Is that so? Can we find said original aggregate?
+    SourceAggregate = FindCommonSourceAggregate(Pred);
+    if (Describe(SourceAggregate) != SourceAggregate::Found)
+      return nullptr; // Give up.
+    IV.first->second = *SourceAggregate;
+  }
+
+  // All good! Now we just need to thread the source aggregates here.
+  auto *PHI = PHINode::Create(AggTy, SourceAggregates.size(),
+                              OrigIVI.getName() + ".merged");
+  for (const std::pair<BasicBlock *, Value *> &SourceAggregate :
+       SourceAggregates)
+    PHI->addIncoming(SourceAggregate.second, SourceAggregate.first);
+
+  ++NumAggregateReconstructionsSimplified;
+  return PHI;
+};
+
 /// Try to find redundant insertvalue instructions, like the following ones:
 ///  %0 = insertvalue { i8, i32 } undef, i8 %x, 0
 ///  %1 = insertvalue { i8, i32 } %0,    i8 %y, 0
@@ -726,6 +968,10 @@ Instruction *InstCombinerImpl::visitInsertValueInst(InsertValueInst &I) {
 
   if (IsRedundant)
     return replaceInstUsesWith(I, I.getOperand(0));
+
+  if (Instruction *NewI = foldAggregateConstructionIntoAggregateReuse(I))
+    return NewI;
+
   return nullptr;
 }
 
diff --git a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
index 8142d1104c8be..cb6e77aa029cd 100644
--- a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
@@ -3647,10 +3647,14 @@ bool InstCombinerImpl::run() {
         BasicBlock *InstParent = I->getParent();
         BasicBlock::iterator InsertPos = I->getIterator();
 
-        // If we replace a PHI with something that isn't a PHI, fix up the
-        // insertion point.
-        if (!isa<PHINode>(Result) && isa<PHINode>(InsertPos))
-          InsertPos = InstParent->getFirstInsertionPt();
+        // Are we replace a PHI with something that isn't a PHI, or vice versa?
+        if (isa<PHINode>(Result) != isa<PHINode>(I)) {
+          // We need to fix up the insertion point.
+          if (isa<PHINode>(I)) // PHI -> Non-PHI
+            InsertPos = InstParent->getFirstInsertionPt();
+          else // Non-PHI -> PHI
+            InsertPos = InstParent->getFirstNonPHI()->getIterator();
+        }
 
         InstParent->getInstList().insert(InsertPos, Result);
 
diff --git a/llvm/test/Transforms/InstCombine/aggregate-reconstruction.ll b/llvm/test/Transforms/InstCombine/aggregate-reconstruction.ll
index c5f39ddce0425..427151f55f68c 100644
--- a/llvm/test/Transforms/InstCombine/aggregate-reconstruction.ll
+++ b/llvm/test/Transforms/InstCombine/aggregate-reconstruction.ll
@@ -13,11 +13,7 @@ declare void @usei32i32agg({ i32, i32 })
 ; We should just return the source aggregate.
 define { i32, i32 } @test0({ i32, i32 } %srcagg) {
 ; CHECK-LABEL: @test0(
-; CHECK-NEXT:    [[I0:%.*]] = extractvalue { i32, i32 } [[SRCAGG:%.*]], 0
-; CHECK-NEXT:    [[I1:%.*]] = extractvalue { i32, i32 } [[SRCAGG]], 1
-; CHECK-NEXT:    [[I2:%.*]] = insertvalue { i32, i32 } undef, i32 [[I0]], 0
-; CHECK-NEXT:    [[I3:%.*]] = insertvalue { i32, i32 } [[I2]], i32 [[I1]], 1
-; CHECK-NEXT:    ret { i32, i32 } [[I3]]
+; CHECK-NEXT:    ret { i32, i32 } [[SRCAGG:%.*]]
 ;
   %i0 = extractvalue { i32, i32 } %srcagg, 0
   %i1 = extractvalue { i32, i32 } %srcagg, 1
@@ -29,11 +25,7 @@ define { i32, i32 } @test0({ i32, i32 } %srcagg) {
 ; Arrays are still aggregates
 define [2 x i32] @test1([2 x i32] %srcagg) {
 ; CHECK-LABEL: @test1(
-; CHECK-NEXT:    [[I0:%.*]] = extractvalue [2 x i32] [[SRCAGG:%.*]], 0
-; CHECK-NEXT:    [[I1:%.*]] = extractvalue [2 x i32] [[SRCAGG]], 1
-; CHECK-NEXT:    [[I2:%.*]] = insertvalue [2 x i32] undef, i32 [[I0]], 0
-; CHECK-NEXT:    [[I3:%.*]] = insertvalue [2 x i32] [[I2]], i32 [[I1]], 1
-; CHECK-NEXT:    ret [2 x i32] [[I3]]
+; CHECK-NEXT:    ret [2 x i32] [[SRCAGG:%.*]]
 ;
   %i0 = extractvalue [2 x i32] %srcagg, 0
   %i1 = extractvalue [2 x i32] %srcagg, 1
@@ -83,11 +75,7 @@ define {{ i32, i32 }} @test3({{ i32, i32 }} %srcagg) {
 ; This is fine, however, all elements are on the same level
 define { i32, { i32 } } @test4({ i32, { i32 } } %srcagg) {
 ; CHECK-LABEL: @test4(
-; CHECK-NEXT:    [[I0:%.*]] = extractvalue { i32, { i32 } } [[SRCAGG:%.*]], 0
-; CHECK-NEXT:    [[I1:%.*]] = extractvalue { i32, { i32 } } [[SRCAGG]], 1
-; CHECK-NEXT:    [[I2:%.*]] = insertvalue { i32, { i32 } } undef, i32 [[I0]], 0
-; CHECK-NEXT:    [[I3:%.*]] = insertvalue { i32, { i32 } } [[I2]], { i32 } [[I1]], 1
-; CHECK-NEXT:    ret { i32, { i32 } } [[I3]]
+; CHECK-NEXT:    ret { i32, { i32 } } [[SRCAGG:%.*]]
 ;
   %i0 = extractvalue { i32, { i32 } } %srcagg, 0
   %i1 = extractvalue { i32, { i32 } } %srcagg, 1
@@ -216,8 +204,7 @@ define { i32, i32 } @test12({ i32, i32 } %srcagg) {
 ; CHECK-NEXT:    call void @usei32(i32 [[I1]])
 ; CHECK-NEXT:    [[I2:%.*]] = insertvalue { i32, i32 } undef, i32 [[I0]], 0
 ; CHECK-NEXT:    call void @usei32i32agg({ i32, i32 } [[I2]])
-; CHECK-NEXT:    [[I3:%.*]] = insertvalue { i32, i32 } [[I2]], i32 [[I1]], 1
-; CHECK-NEXT:    ret { i32, i32 } [[I3]]
+; CHECK-NEXT:    ret { i32, i32 } [[SRCAGG]]
 ;
   %i0 = extractvalue { i32, i32 } %srcagg, 0
   call void @usei32(i32 %i0)
@@ -233,11 +220,7 @@ define { i32, i32 } @test12({ i32, i32 } %srcagg) {
 ; overwritten with %i0, so all is fine.
 define { i32, i32 } @test13({ i32, i32 } %srcagg) {
 ; CHECK-LABEL: @test13(
-; CHECK-NEXT:    [[I0:%.*]] = extractvalue { i32, i32 } [[SRCAGG:%.*]], 0
-; CHECK-NEXT:    [[I1:%.*]] = extractvalue { i32, i32 } [[SRCAGG]], 1
-; CHECK-NEXT:    [[I3:%.*]] = insertvalue { i32, i32 } undef, i32 [[I0]], 0
-; CHECK-NEXT:    [[I4:%.*]] = insertvalue { i32, i32 } [[I3]], i32 [[I1]], 1
-; CHECK-NEXT:    ret { i32, i32 } [[I4]]
+; CHECK-NEXT:    ret { i32, i32 } [[SRCAGG:%.*]]
 ;
   %i0 = extractvalue { i32, i32 } %srcagg, 0
   %i1 = extractvalue { i32, i32 } %srcagg, 1
@@ -283,11 +266,7 @@ define { i32, i32 } @test16({ i32, i32 } %srcagg) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    br label [[END:%.*]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[I0:%.*]] = extractvalue { i32, i32 } [[SRCAGG:%.*]], 0
-; CHECK-NEXT:    [[I1:%.*]] = extractvalue { i32, i32 } [[SRCAGG]], 1
-; CHECK-NEXT:    [[I2:%.*]] = insertvalue { i32, i32 } undef, i32 [[I0]], 0
-; CHECK-NEXT:    [[I3:%.*]] = insertvalue { i32, i32 } [[I2]], i32 [[I1]], 1
-; CHECK-NEXT:    ret { i32, i32 } [[I3]]
+; CHECK-NEXT:    ret { i32, i32 } [[SRCAGG:%.*]]
 ;
 entry:
   br label %end
@@ -308,11 +287,7 @@ define { i32, i32 } @test17({ i32, i32 } %srcagg0, { i32, i32 } %srcagg1, i1 %c)
 ; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
 ; CHECK-NEXT:    [[SRCAGG_PHI:%.*]] = phi { i32, i32 } [ [[SRCAGG0:%.*]], [[ENTRY:%.*]] ], [ [[SRCAGG1:%.*]], [[INTERMEDIATE]] ]
-; CHECK-NEXT:    [[I0:%.*]] = extractvalue { i32, i32 } [[SRCAGG_PHI]], 0
-; CHECK-NEXT:    [[I1:%.*]] = extractvalue { i32, i32 } [[SRCAGG_PHI]], 1
-; CHECK-NEXT:    [[I2:%.*]] = insertvalue { i32, i32 } undef, i32 [[I0]], 0
-; CHECK-NEXT:    [[I3:%.*]] = insertvalue { i32, i32 } [[I2]], i32 [[I1]], 1
-; CHECK-NEXT:    ret { i32, i32 } [[I3]]
+; CHECK-NEXT:    ret { i32, i32 } [[SRCAGG_PHI]]
 ;
 entry:
   br i1 %c, label %intermediate, label %end
diff --git a/llvm/test/Transforms/InstCombine/phi-aware-aggregate-reconstruction.ll b/llvm/test/Transforms/InstCombine/phi-aware-aggregate-reconstruction.ll
index 8ea52c4e5b228..2338a696d187a 100644
--- a/llvm/test/Transforms/InstCombine/phi-aware-aggregate-reconstruction.ll
+++ b/llvm/test/Transforms/InstCombine/phi-aware-aggregate-reconstruction.ll
@@ -17,21 +17,14 @@ define { i32, i32 } @test0({ i32, i32 } %agg_left, { i32, i32 } %agg_right, i1 %
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    br i1 [[C:%.*]], label [[LEFT:%.*]], label [[RIGHT:%.*]]
 ; CHECK:       left:
-; CHECK-NEXT:    [[I0:%.*]] = extractvalue { i32, i32 } [[AGG_LEFT:%.*]], 0
-; CHECK-NEXT:    [[I2:%.*]] = extractvalue { i32, i32 } [[AGG_LEFT]], 1
 ; CHECK-NEXT:    call void @foo()
 ; CHECK-NEXT:    br label [[END:%.*]]
 ; CHECK:       right:
-; CHECK-NEXT:    [[I3:%.*]] = extractvalue { i32, i32 } [[AGG_RIGHT:%.*]], 0
-; CHECK-NEXT:    [[I4:%.*]] = extractvalue { i32, i32 } [[AGG_RIGHT]], 1
 ; CHECK-NEXT:    call void @bar()
 ; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[I5:%.*]] = phi i32 [ [[I0]], [[LEFT]] ], [ [[I3]], [[RIGHT]] ]
-; CHECK-NEXT:    [[I6:%.*]] = phi i32 [ [[I2]], [[LEFT]] ], [ [[I4]], [[RIGHT]] ]
+; CHECK-NEXT:    [[I8:%.*]] = phi { i32, i32 } [ [[AGG_RIGHT:%.*]], [[RIGHT]] ], [ [[AGG_LEFT:%.*]], [[LEFT]] ]
 ; CHECK-NEXT:    call void @baz()
-; CHECK-NEXT:    [[I7:%.*]] = insertvalue { i32, i32 } undef, i32 [[I5]], 0
-; CHECK-NEXT:    [[I8:%.*]] = insertvalue { i32, i32 } [[I7]], i32 [[I6]], 1
 ; CHECK-NEXT:    ret { i32, i32 } [[I8]]
 ;
 entry:
@@ -278,24 +271,17 @@ define { i32, i32 } @test5({ i32, i32 } %agg_left, { i32, i32 } %agg_right, i1 %
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    br i1 [[C0:%.*]], label [[LEFT:%.*]], label [[RIGHT:%.*]]
 ; CHECK:       left:
-; CHECK-NEXT:    [[I0:%.*]] = extractvalue { i32, i32 } [[AGG_LEFT:%.*]], 0
-; CHECK-NEXT:    [[I2:%.*]] = extractvalue { i32, i32 } [[AGG_LEFT]], 1
 ; CHECK-NEXT:    call void @foo()
 ; CHECK-NEXT:    br label [[MIDDLE:%.*]]
 ; CHECK:       right:
-; CHECK-NEXT:    [[I3:%.*]] = extractvalue { i32, i32 } [[AGG_RIGHT:%.*]], 0
-; CHECK-NEXT:    [[I4:%.*]] = extractvalue { i32, i32 } [[AGG_RIGHT]], 1
 ; CHECK-NEXT:    call void @bar()
 ; CHECK-NEXT:    br label [[MIDDLE]]
 ; CHECK:       middle:
-; CHECK-NEXT:    [[I5:%.*]] = phi i32 [ [[I0]], [[LEFT]] ], [ [[I3]], [[RIGHT]] ], [ [[I5]], [[MIDDLE]] ]
-; CHECK-NEXT:    [[I6:%.*]] = phi i32 [ [[I2]], [[LEFT]] ], [ [[I4]], [[RIGHT]] ], [ [[I6]], [[MIDDLE]] ]
+; CHECK-NEXT:    [[I8:%.*]] = phi { i32, i32 } [ [[I8]], [[MIDDLE]] ], [ [[AGG_RIGHT:%.*]], [[RIGHT]] ], [ [[AGG_LEFT:%.*]], [[LEFT]] ]
 ; CHECK-NEXT:    call void @baz()
 ; CHECK-NEXT:    [[C1:%.*]] = call i1 @geni1()
 ; CHECK-NEXT:    br i1 [[C1]], label [[END:%.*]], label [[MIDDLE]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[I7:%.*]] = insertvalue { i32, i32 } undef, i32 [[I5]], 0
-; CHECK-NEXT:    [[I8:%.*]] = insertvalue { i32, i32 } [[I7]], i32 [[I6]], 1
 ; CHECK-NEXT:    ret { i32, i32 } [[I8]]
 ;
 entry: