[SLP]Improve minbitwidth analysis. #84536

alexey-bataev
Member

This improves the overall minbitwidth analysis in SLP. It allows analysis of
trees with store/insertelement root nodes. Also, instead of using a single
minbitwidth detected at the very first analysis stage, it tries to detect
the best one for each trunc/ext subtree in the graph and uses it for that
subtree.
This results in better code and lower vector register pressure.
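
As a scalar illustration of why a trunc/ext subtree can legally run in a narrower type, the sketch below compares an add done in 32 bits and truncated to 16 bits against the same add done directly in 16 bits. It is a standalone example under stated assumptions, not the SLP implementation: `addWide`, `addNarrow`, and `narrowingIsExact` are hypothetical names, and the inputs are assumed to be zero-extended 8-bit values.

```cpp
#include <cstdint>

// Wide form: zext both i8 inputs to i32, add, then trunc the sum to i16.
inline std::uint16_t addWide(std::uint8_t A, std::uint8_t B) {
  std::uint32_t Sum =
      static_cast<std::uint32_t>(A) + static_cast<std::uint32_t>(B);
  return static_cast<std::uint16_t>(Sum);
}

// Narrow form: zext to i16 only and add there; no i32 values, no trunc.
inline std::uint16_t addNarrow(std::uint8_t A, std::uint8_t B) {
  return static_cast<std::uint16_t>(static_cast<std::uint16_t>(A) +
                                    static_cast<std::uint16_t>(B));
}

// Exhaustively checks that both forms agree for every pair of i8 inputs.
inline bool narrowingIsExact() {
  for (unsigned A = 0; A <= 255; ++A)
    for (unsigned B = 0; B <= 255; ++B)
      if (addWide(static_cast<std::uint8_t>(A), static_cast<std::uint8_t>(B)) !=
          addNarrow(static_cast<std::uint8_t>(A), static_cast<std::uint8_t>(B)))
        return false;
  return true;
}
```

With vectors, the narrow form needs half the register width per lane, which is where the lower register pressure would come from.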

Metric: size..text

Program                                                                               results     results0   diff
test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test     92549.00     92609.00   0.1%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test                663381.00    663493.00   0.0%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test                 663381.00    663493.00   0.0%
test-suite :: MultiSource/Benchmarks/Bullet/bullet.test                             307182.00    307214.00   0.0%
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test          1394420.00   1394484.00   0.0%
test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test           1394420.00   1394484.00   0.0%
test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test             2040257.00   2040273.00   0.0%

test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test          12396098.00  12395858.00  -0.0%
test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test                       909944.00    909768.00  -0.0%

SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant - 4 scalar
instructions remain scalar (good).
Spec2017/x264 - the whole function idct4x4dc is vectorized using <16 x i16>
instead of <16 x i32>, and the zext/trunc are removed. In other places the
last vector zext/sext is removed and replaced by an extractelement + scalar
zext/sext pair.
MultiSource/Benchmarks/Bullet/bullet - reduce or <4 x i32> is replaced by
reduce or <4 x i8>.
Spec2017/imagick - removed an extra zext from 2 packs of operations.
Spec2017/parest - removed an extra zext, replaced by extractelement + scalar
zext.
Spec2017/blender - a whole bunch of vector zext/sext are replaced by
extractelement + scalar zext/sext; some extra code is vectorized in smaller
types.
Spec2006/gobmk - fixed cost estimation; some small code remains scalar.
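
The bullet change (reduce or on <4 x i8> instead of <4 x i32>) can be illustrated with a scalar model. This is only a sketch: it assumes the four lanes originate from 8-bit values that were zero-extended, which the note above does not state explicitly, and `reduceOrWide` / `reduceOrNarrow` are hypothetical names.

```cpp
#include <array>
#include <cstdint>

// Wide form: zext every lane to 32 bits first, then OR-reduce in 32 bits.
inline std::uint32_t reduceOrWide(const std::array<std::uint8_t, 4> &Lanes) {
  std::uint32_t Acc = 0;
  for (std::uint8_t Lane : Lanes)
    Acc |= static_cast<std::uint32_t>(Lane);
  return Acc;
}

// Narrow form: OR-reduce in 8 bits, then a single scalar zext of the result.
inline std::uint32_t reduceOrNarrow(const std::array<std::uint8_t, 4> &Lanes) {
  std::uint8_t Acc = 0;
  for (std::uint8_t Lane : Lanes)
    Acc = static_cast<std::uint8_t>(Acc | Lane);
  return static_cast<std::uint32_t>(Acc);
}
```

Because OR never sets a bit above the widest input bit, both forms return the same value, and the narrow form replaces a vector zext with one scalar zext.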

Original Pull Request: #84334

The patch has the same functionality (no test changes, no changes in
benchmarks) as the original patch; it only adds some compile-time
improvements and fixes for the xxhash unittest failure discovered in the
previous version of the patch.

Created using spr 1.3.5
@alexey-bataev alexey-bataev merged commit 2bd369b into main Mar 8, 2024
4 of 5 checks passed
@alexey-bataev alexey-bataev deleted the users/alexey-bataev/spr/slpimprove-minbitwidth-analysis-1 branch March 8, 2024 18:57
@llvmbot
Collaborator

llvmbot commented Mar 8, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Alexey Bataev (alexey-bataev)

Changes

Patch is 71.99 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/84536.diff

15 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp (+481-195)
  • (modified) llvm/test/Transforms/SLPVectorizer/AArch64/ext-trunc.ll (+5-4)
  • (modified) llvm/test/Transforms/SLPVectorizer/AArch64/getelementptr2.ll (+2-2)
  • (modified) llvm/test/Transforms/SLPVectorizer/AArch64/reduce-add-i64.ll (+5-15)
  • (modified) llvm/test/Transforms/SLPVectorizer/RISCV/reductions.ll (+4-3)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/PR35777.ll (+5-4)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/int-bitcast-minbitwidth.ll (+1-1)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/minbitwidth-multiuse-with-insertelement.ll (+8-9)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/minbitwidth-transformed-operand.ll (+13-8)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/minimum-sizes.ll (+24-19)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/phi-undef-input.ll (+12-12)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/resched.ll (+16-16)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/reused-reductions-with-minbitwidth.ll (+4-6)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/store-insertelement-minbitwidth.ll (+12-10)
  • (modified) llvm/test/Transforms/SLPVectorizer/alt-cmp-vectorize.ll (+2-2)
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 36dc9094538ae9..a5c34bfbf9b4c3 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -1085,6 +1085,9 @@ class BoUpSLP {
       BS->clear();
     }
     MinBWs.clear();
+    ReductionBitWidth = 0;
+    CastMaxMinBWSizes.reset();
+    TruncNodes.clear();
     InstrElementSize.clear();
     UserIgnoreList = nullptr;
     PostponedGathers.clear();
@@ -2287,6 +2290,7 @@ class BoUpSLP {
   void clearReductionData() {
     AnalyzedReductionsRoots.clear();
     AnalyzedReductionVals.clear();
+    AnalyzedMinBWVals.clear();
   }
   /// Checks if the given value is gathered in one of the nodes.
   bool isAnyGathered(const SmallDenseSet<Value *> &Vals) const {
@@ -2307,9 +2311,11 @@ class BoUpSLP {
   /// constant and to be demoted. Required to correctly identify constant nodes
   /// to be demoted.
   bool collectValuesToDemote(
-      Value *V, SmallVectorImpl<Value *> &ToDemote,
+      Value *V, bool IsProfitableToDemoteRoot, unsigned &BitWidth,
+      SmallVectorImpl<Value *> &ToDemote,
       DenseMap<Instruction *, SmallVector<unsigned>> &DemotedConsts,
-      SmallVectorImpl<Value *> &Roots, DenseSet<Value *> &Visited) const;
+      DenseSet<Value *> &Visited, unsigned &MaxDepthLevel,
+      bool &IsProfitableToDemote) const;
 
   /// Check if the operands on the edges \p Edges of the \p UserTE allows
   /// reordering (i.e. the operands can be reordered because they have only one
@@ -2375,6 +2381,10 @@ class BoUpSLP {
   /// \ returns the graph entry for the \p Idx operand of the \p E entry.
   const TreeEntry *getOperandEntry(const TreeEntry *E, unsigned Idx) const;
 
+  /// \returns Cast context for the given graph node.
+  TargetTransformInfo::CastContextHint
+  getCastContextHint(const TreeEntry &TE) const;
+
   /// \returns the cost of the vectorizable entry.
   InstructionCost getEntryCost(const TreeEntry *E,
                                ArrayRef<Value *> VectorizedVals,
@@ -2925,11 +2935,18 @@ class BoUpSLP {
       }
       assert(!BundleMember && "Bundle and VL out of sync");
     } else {
-      MustGather.insert(VL.begin(), VL.end());
       // Build a map for gathered scalars to the nodes where they are used.
+      bool AllConstsOrCasts = true;
       for (Value *V : VL)
-        if (!isConstant(V))
+        if (!isConstant(V)) {
+          auto *I = dyn_cast<CastInst>(V);
+          AllConstsOrCasts &= I && I->getType()->isIntegerTy();
           ValueToGatherNodes.try_emplace(V).first->getSecond().insert(Last);
+        }
+      if (AllConstsOrCasts)
+        CastMaxMinBWSizes =
+            std::make_pair(std::numeric_limits<unsigned>::max(), 1);
+      MustGather.insert(VL.begin(), VL.end());
     }
 
     if (UserTreeIdx.UserTE)
@@ -3054,6 +3071,10 @@ class BoUpSLP {
   /// Set of hashes for the list of reduction values already being analyzed.
   DenseSet<size_t> AnalyzedReductionVals;
 
+  /// Values, already been analyzed for mininmal bitwidth and found to be
+  /// non-profitable.
+  DenseSet<Value *> AnalyzedMinBWVals;
+
   /// A list of values that need to extracted out of the tree.
   /// This list holds pairs of (Internal Scalar : External User). External User
   /// can be nullptr, it means that this Internal Scalar will be used later,
@@ -3629,6 +3650,18 @@ class BoUpSLP {
   /// value must be signed-extended, rather than zero-extended, back to its
   /// original width.
   DenseMap<const TreeEntry *, std::pair<uint64_t, bool>> MinBWs;
+
+  /// Final size of the reduced vector, if the current graph represents the
+  /// input for the reduction and it was possible to narrow the size of the
+  /// reduction.
+  unsigned ReductionBitWidth = 0;
+
+  /// If the tree contains any zext/sext/trunc nodes, contains max-min pair of
+  /// type sizes, used in the tree.
+  std::optional<std::pair<unsigned, unsigned>> CastMaxMinBWSizes;
+
+  /// Indices of the vectorized trunc nodes.
+  DenseSet<unsigned> TruncNodes;
 };
 
 } // end namespace slpvectorizer
@@ -6539,8 +6572,29 @@ void BoUpSLP::buildTree_rec(ArrayRef<Value *> VL, unsigned Depth,
     case Instruction::Trunc:
     case Instruction::FPTrunc:
     case Instruction::BitCast: {
+      auto [PrevMaxBW, PrevMinBW] = CastMaxMinBWSizes.value_or(
+          std::make_pair(std::numeric_limits<unsigned>::min(),
+                         std::numeric_limits<unsigned>::max()));
+      if (ShuffleOrOp == Instruction::ZExt ||
+          ShuffleOrOp == Instruction::SExt) {
+        CastMaxMinBWSizes = std::make_pair(
+            std::max<unsigned>(DL->getTypeSizeInBits(VL0->getType()),
+                               PrevMaxBW),
+            std::min<unsigned>(
+                DL->getTypeSizeInBits(VL0->getOperand(0)->getType()),
+                PrevMinBW));
+      } else if (ShuffleOrOp == Instruction::Trunc) {
+        CastMaxMinBWSizes = std::make_pair(
+            std::max<unsigned>(
+                DL->getTypeSizeInBits(VL0->getOperand(0)->getType()),
+                PrevMaxBW),
+            std::min<unsigned>(DL->getTypeSizeInBits(VL0->getType()),
+                               PrevMinBW));
+        TruncNodes.insert(VectorizableTree.size());
+      }
       TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,
                                    ReuseShuffleIndicies);
+
       LLVM_DEBUG(dbgs() << "SLP: added a vector of casts.\n");
 
       TE->setOperandsInOrder();
@@ -8362,6 +8416,22 @@ const BoUpSLP::TreeEntry *BoUpSLP::getOperandEntry(const TreeEntry *E,
   return It->get();
 }
 
+TTI::CastContextHint BoUpSLP::getCastContextHint(const TreeEntry &TE) const {
+  if (TE.State == TreeEntry::ScatterVectorize ||
+      TE.State == TreeEntry::StridedVectorize)
+    return TTI::CastContextHint::GatherScatter;
+  if (TE.State == TreeEntry::Vectorize && TE.getOpcode() == Instruction::Load &&
+      !TE.isAltShuffle()) {
+    if (TE.ReorderIndices.empty())
+      return TTI::CastContextHint::Normal;
+    SmallVector<int> Mask;
+    inversePermutation(TE.ReorderIndices, Mask);
+    if (ShuffleVectorInst::isReverseMask(Mask, Mask.size()))
+      return TTI::CastContextHint::Reversed;
+  }
+  return TTI::CastContextHint::None;
+}
+
 InstructionCost
 BoUpSLP::getEntryCost(const TreeEntry *E, ArrayRef<Value *> VectorizedVals,
                       SmallPtrSetImpl<Value *> &CheckedExtracts) {
@@ -8384,6 +8454,7 @@ BoUpSLP::getEntryCost(const TreeEntry *E, ArrayRef<Value *> VectorizedVals,
   // If we have computed a smaller type for the expression, update VecTy so
   // that the costs will be accurate.
   auto It = MinBWs.find(E);
+  Type *OrigScalarTy = ScalarTy;
   if (It != MinBWs.end()) {
     ScalarTy = IntegerType::get(F->getContext(), It->second.first);
     VecTy = FixedVectorType::get(ScalarTy, VL.size());
@@ -8441,24 +8512,11 @@ BoUpSLP::getEntryCost(const TreeEntry *E, ArrayRef<Value *> VectorizedVals,
     UsedScalars.set(I);
   }
   auto GetCastContextHint = [&](Value *V) {
-    if (const TreeEntry *OpTE = getTreeEntry(V)) {
-      if (OpTE->State == TreeEntry::ScatterVectorize ||
-          OpTE->State == TreeEntry::StridedVectorize)
-        return TTI::CastContextHint::GatherScatter;
-      if (OpTE->State == TreeEntry::Vectorize &&
-          OpTE->getOpcode() == Instruction::Load && !OpTE->isAltShuffle()) {
-        if (OpTE->ReorderIndices.empty())
-          return TTI::CastContextHint::Normal;
-        SmallVector<int> Mask;
-        inversePermutation(OpTE->ReorderIndices, Mask);
-        if (ShuffleVectorInst::isReverseMask(Mask, Mask.size()))
-          return TTI::CastContextHint::Reversed;
-      }
-    } else {
-      InstructionsState SrcState = getSameOpcode(E->getOperand(0), *TLI);
-      if (SrcState.getOpcode() == Instruction::Load && !SrcState.isAltShuffle())
-        return TTI::CastContextHint::GatherScatter;
-    }
+    if (const TreeEntry *OpTE = getTreeEntry(V))
+      return getCastContextHint(*OpTE);
+    InstructionsState SrcState = getSameOpcode(E->getOperand(0), *TLI);
+    if (SrcState.getOpcode() == Instruction::Load && !SrcState.isAltShuffle())
+      return TTI::CastContextHint::GatherScatter;
     return TTI::CastContextHint::None;
   };
   auto GetCostDiff =
@@ -8507,8 +8565,6 @@ BoUpSLP::getEntryCost(const TreeEntry *E, ArrayRef<Value *> VectorizedVals,
               TTI::CastContextHint CCH = GetCastContextHint(VL0);
               VecCost += TTI->getCastInstrCost(VecOpcode, UserVecTy, VecTy, CCH,
                                                CostKind);
-              ScalarCost += Sz * TTI->getCastInstrCost(VecOpcode, UserScalarTy,
-                                                       ScalarTy, CCH, CostKind);
             }
           }
         }
@@ -8525,7 +8581,7 @@ BoUpSLP::getEntryCost(const TreeEntry *E, ArrayRef<Value *> VectorizedVals,
     InstructionCost ScalarCost = 0;
     InstructionCost VecCost = 0;
     std::tie(ScalarCost, VecCost) = getGEPCosts(
-        *TTI, Ptrs, BasePtr, E->getOpcode(), CostKind, ScalarTy, VecTy);
+        *TTI, Ptrs, BasePtr, E->getOpcode(), CostKind, OrigScalarTy, VecTy);
     LLVM_DEBUG(dumpTreeCosts(E, 0, VecCost, ScalarCost,
                              "Calculated GEPs cost for Tree"));
 
@@ -8572,7 +8628,7 @@ BoUpSLP::getEntryCost(const TreeEntry *E, ArrayRef<Value *> VectorizedVals,
           NumElts = ATy->getNumElements();
         else
           NumElts = AggregateTy->getStructNumElements();
-        SrcVecTy = FixedVectorType::get(ScalarTy, NumElts);
+        SrcVecTy = FixedVectorType::get(OrigScalarTy, NumElts);
       }
       if (I->hasOneUse()) {
         Instruction *Ext = I->user_back();
@@ -8740,13 +8796,7 @@ BoUpSLP::getEntryCost(const TreeEntry *E, ArrayRef<Value *> VectorizedVals,
       }
     }
     auto GetScalarCost = [&](unsigned Idx) -> InstructionCost {
-      // Do not count cost here if minimum bitwidth is in effect and it is just
-      // a bitcast (here it is just a noop).
-      if (VecOpcode != Opcode && VecOpcode == Instruction::BitCast)
-        return TTI::TCC_Free;
-      auto *VI = VL0->getOpcode() == Opcode
-                     ? cast<Instruction>(UniqueValues[Idx])
-                     : nullptr;
+      auto *VI = cast<Instruction>(UniqueValues[Idx]);
       return TTI->getCastInstrCost(Opcode, VL0->getType(),
                                    VL0->getOperand(0)->getType(),
                                    TTI::getCastContextHint(VI), CostKind, VI);
@@ -8789,7 +8839,7 @@ BoUpSLP::getEntryCost(const TreeEntry *E, ArrayRef<Value *> VectorizedVals,
                                        ? CmpInst::BAD_FCMP_PREDICATE
                                        : CmpInst::BAD_ICMP_PREDICATE;
 
-      return TTI->getCmpSelInstrCost(E->getOpcode(), ScalarTy,
+      return TTI->getCmpSelInstrCost(E->getOpcode(), OrigScalarTy,
                                      Builder.getInt1Ty(), CurrentPred, CostKind,
                                      VI);
     };
@@ -8844,7 +8894,7 @@ BoUpSLP::getEntryCost(const TreeEntry *E, ArrayRef<Value *> VectorizedVals,
       TTI::OperandValueInfo Op2Info =
           TTI::getOperandInfo(VI->getOperand(OpIdx));
       SmallVector<const Value *> Operands(VI->operand_values());
-      return TTI->getArithmeticInstrCost(ShuffleOrOp, ScalarTy, CostKind,
+      return TTI->getArithmeticInstrCost(ShuffleOrOp, OrigScalarTy, CostKind,
                                          Op1Info, Op2Info, Operands, VI);
     };
     auto GetVectorCost = [=](InstructionCost CommonCost) {
@@ -8863,9 +8913,9 @@ BoUpSLP::getEntryCost(const TreeEntry *E, ArrayRef<Value *> VectorizedVals,
   case Instruction::Load: {
     auto GetScalarCost = [&](unsigned Idx) {
       auto *VI = cast<LoadInst>(UniqueValues[Idx]);
-      return TTI->getMemoryOpCost(Instruction::Load, ScalarTy, VI->getAlign(),
-                                  VI->getPointerAddressSpace(), CostKind,
-                                  TTI::OperandValueInfo(), VI);
+      return TTI->getMemoryOpCost(Instruction::Load, OrigScalarTy,
+                                  VI->getAlign(), VI->getPointerAddressSpace(),
+                                  CostKind, TTI::OperandValueInfo(), VI);
     };
     auto *LI0 = cast<LoadInst>(VL0);
     auto GetVectorCost = [&](InstructionCost CommonCost) {
@@ -8908,9 +8958,9 @@ BoUpSLP::getEntryCost(const TreeEntry *E, ArrayRef<Value *> VectorizedVals,
     auto GetScalarCost = [=](unsigned Idx) {
       auto *VI = cast<StoreInst>(VL[Idx]);
       TTI::OperandValueInfo OpInfo = TTI::getOperandInfo(VI->getValueOperand());
-      return TTI->getMemoryOpCost(Instruction::Store, ScalarTy, VI->getAlign(),
-                                  VI->getPointerAddressSpace(), CostKind,
-                                  OpInfo, VI);
+      return TTI->getMemoryOpCost(Instruction::Store, OrigScalarTy,
+                                  VI->getAlign(), VI->getPointerAddressSpace(),
+                                  CostKind, OpInfo, VI);
     };
     auto *BaseSI =
         cast<StoreInst>(IsReorder ? VL[E->ReorderIndices.front()] : VL0);
@@ -9772,6 +9822,44 @@ InstructionCost BoUpSLP::getTreeCost(ArrayRef<Value *> VectorizedVals) {
     Cost -= InsertCost;
   }
 
+  // Add the cost for reduced value resize (if required).
+  if (ReductionBitWidth != 0) {
+    assert(UserIgnoreList && "Expected reduction tree.");
+    const TreeEntry &E = *VectorizableTree.front().get();
+    auto It = MinBWs.find(&E);
+    if (It != MinBWs.end() && It->second.first != ReductionBitWidth) {
+      unsigned SrcSize = It->second.first;
+      unsigned DstSize = ReductionBitWidth;
+      unsigned Opcode = Instruction::Trunc;
+      if (SrcSize < DstSize)
+        Opcode = It->second.second ? Instruction::SExt : Instruction::ZExt;
+      auto *SrcVecTy =
+          FixedVectorType::get(Builder.getIntNTy(SrcSize), E.getVectorFactor());
+      auto *DstVecTy =
+          FixedVectorType::get(Builder.getIntNTy(DstSize), E.getVectorFactor());
+      TTI::CastContextHint CCH = getCastContextHint(E);
+      InstructionCost CastCost;
+      switch (E.getOpcode()) {
+      case Instruction::SExt:
+      case Instruction::ZExt:
+      case Instruction::Trunc: {
+        const TreeEntry *OpTE = getOperandEntry(&E, 0);
+        CCH = getCastContextHint(*OpTE);
+        break;
+      }
+      default:
+        break;
+      }
+      CastCost += TTI->getCastInstrCost(Opcode, DstVecTy, SrcVecTy, CCH,
+                                        TTI::TCK_RecipThroughput);
+      Cost += CastCost;
+      LLVM_DEBUG(dbgs() << "SLP: Adding cost " << CastCost
+                        << " for final resize for reduction from " << SrcVecTy
+                        << " to " << DstVecTy << "\n";
+                 dbgs() << "SLP: Current total cost = " << Cost << "\n");
+    }
+  }
+
 #ifndef NDEBUG
   SmallString<256> Str;
   {
@@ -10042,7 +10130,7 @@ BoUpSLP::isGatherShuffledSingleRegisterEntry(
             continue;
           VTE = *It->getSecond().begin();
           // Iterate through all vectorized nodes.
-          auto *MIt = find_if(It->getSecond(), [](const TreeEntry *MTE) {
+          auto *MIt = find_if(It->getSecond(), [&](const TreeEntry *MTE) {
             return MTE->State == TreeEntry::Vectorize;
           });
           if (MIt == It->getSecond().end())
@@ -10053,11 +10141,6 @@ BoUpSLP::isGatherShuffledSingleRegisterEntry(
       Instruction &LastBundleInst = getLastInstructionInBundle(VTE);
       if (&LastBundleInst == TEInsertPt || !CheckOrdering(&LastBundleInst))
         continue;
-      auto It = MinBWs.find(VTE);
-      // If vectorize node is demoted - do not match.
-      if (It != MinBWs.end() &&
-          It->second.first != DL->getTypeSizeInBits(V->getType()))
-        continue;
       VToTEs.insert(VTE);
     }
     if (VToTEs.empty())
@@ -10105,6 +10188,57 @@ BoUpSLP::isGatherShuffledSingleRegisterEntry(
     return std::nullopt;
   }
 
+  // Filter out entries with larger bitwidth of elements.
+  Type *ScalarTy = VL.front()->getType();
+  unsigned BitWidth = 0;
+  if (ScalarTy->isIntegerTy()) {
+    // Check if the used TEs supposed to be resized and choose the best
+    // candidates.
+    BitWidth = DL->getTypeStoreSize(ScalarTy);
+    if (TEUseEI.UserTE->getOpcode() != Instruction::Select ||
+        TEUseEI.EdgeIdx != 0) {
+      auto UserIt = MinBWs.find(TEUseEI.UserTE);
+      if (UserIt != MinBWs.end())
+        BitWidth = UserIt->second.second;
+    }
+    // Check if the used TEs supposed to be resized and choose the best
+    // candidates.
+    unsigned NodesBitWidth = 0;
+    auto CheckBitwidth = [&](const TreeEntry &TE) {
+      unsigned TEBitWidth = BitWidth;
+      auto UserIt = MinBWs.find(TEUseEI.UserTE);
+      if (UserIt != MinBWs.end())
+        TEBitWidth = UserIt->second.second;
+      if (BitWidth <= TEBitWidth) {
+        if (NodesBitWidth == 0)
+          NodesBitWidth = TEBitWidth;
+        return NodesBitWidth == TEBitWidth;
+      }
+      return false;
+    };
+    for (auto [Idx, Set] : enumerate(UsedTEs)) {
+      DenseSet<const TreeEntry *> ForRemoval;
+      for (const TreeEntry *TE : Set) {
+        if (!CheckBitwidth(*TE))
+          ForRemoval.insert(TE);
+      }
+      // All elements must be removed - remove the whole container.
+      if (ForRemoval.size() == Set.size()) {
+        Set.clear();
+        continue;
+      }
+      for (const TreeEntry *TE : ForRemoval)
+        Set.erase(TE);
+    }
+    for (auto *It = UsedTEs.begin(); It != UsedTEs.end();) {
+      if (It->empty()) {
+        UsedTEs.erase(It);
+        continue;
+      }
+      std::advance(It, 1);
+    }
+  }
+
   unsigned VF = 0;
   if (UsedTEs.size() == 1) {
     // Keep the order to avoid non-determinism.
@@ -12929,7 +13063,21 @@ Value *BoUpSLP::vectorizeTree(
   Builder.ClearInsertionPoint();
   InstrElementSize.clear();
 
-  return VectorizableTree[0]->VectorizedValue;
+  const TreeEntry &RootTE = *VectorizableTree.front().get();
+  Value *Vec = RootTE.VectorizedValue;
+  if (auto It = MinBWs.find(&RootTE); ReductionBitWidth != 0 &&
+                                      It != MinBWs.end() &&
+                                      ReductionBitWidth != It->second.first) {
+    IRBuilder<>::InsertPointGuard Guard(Builder);
+    Builder.SetInsertPoint(ReductionRoot->getParent(),
+                           ReductionRoot->getIterator());
+    Vec = Builder.CreateIntCast(
+        Vec,
+        VectorType::get(Builder.getIntNTy(ReductionBitWidth),
+                        cast<VectorType>(Vec->getType())->getElementCount()),
+        It->second.second);
+  }
+  return Vec;
 }
 
 void BoUpSLP::optimizeGatherSequence() {
@@ -13749,23 +13897,48 @@ unsigned BoUpSLP::getVectorElementSize(Value *V) {
 // smaller type with a truncation. We collect the values that will be demoted
 // in ToDemote and additional roots that require investigating in Roots.
 bool BoUpSLP::collectValuesToDemote(
-    Value *V, SmallVectorImpl<Value *> &ToDemote,
+    Value *V, bool IsProfitableToDemoteRoot, unsigned &BitWidth,
+    SmallVectorImpl<Value *> &ToDemote,
     DenseMap<Instruction *, SmallVector<unsigned>> &DemotedConsts,
-    SmallVectorImpl<Value *> &Roots, DenseSet<Value *> &Visited) const {
+    DenseSet<Value *> &Visited, unsigned &MaxDepthLevel,
+    bool &IsProfitableToDemote) const {
   // We can always demote constants.
-  if (isa<Constant>(V))
+  if (isa<Constant>(V)) {
+    MaxDepthLevel = 1;
     return true;
+  }
+
+  if (DL->getTypeSizeInBits(V->getType()) == BitWidth) {
+    MaxDepthLevel = 1;
+    return true;
+  }
 
   // If the value is not a vectorized instruction in the expression and not used
   // by the insertelement instruction and not used in multiple vector nodes, it
   // cannot be demoted.
+  // TODO: improve handling of gathered values and others.
   auto *I = dyn_cast<Instruction>(V);
-  if (!I || !getTreeEntry(I) || MultiNodeScalars.contains(I) ||
-      !Visited.insert(I).second || all_of(I->users(), [&](User *U) {
+  const TreeEntry *ITE = I ? getTreeEntry(I) : nullptr;
+  if (!ITE || !Visited.insert(I).second || MultiNodeScalars.contains(I) ||
+      all_of(I->users(), [&](User *U) {
         return isa<InsertElementInst>(U) && !getTreeEntry(U);
       }))
     return false;
 
+  auto IsPotentiallyTruncated = [&](Value *V, unsigned &BitWidth) -> bool {
+    if (MultiNodeScalars.contain...
[truncated]
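
The cast-size bookkeeping added to `buildTree_rec` in the hunk above can be read in isolation roughly as follows. This is a standalone sketch, not LLVM code: `CastKind`, `MaxMin`, and `recordCast` are hypothetical names, and plain bit counts stand in for the `DL->getTypeSizeInBits(...)` calls in the patch.

```cpp
#include <algorithm>
#include <limits>
#include <optional>
#include <utility>

// Hypothetical stand-in for the cast opcodes handled in the hunk.
enum class CastKind { ZExt, SExt, Trunc, Other };

// (max, min) pair of integer sizes seen on casts so far, modelled on the
// CastMaxMinBWSizes member introduced by the patch.
using MaxMin = std::optional<std::pair<unsigned, unsigned>>;

// Folds one cast with the given source/destination widths into the pair.
inline MaxMin recordCast(MaxMin Prev, CastKind Kind, unsigned SrcBits,
                         unsigned DstBits) {
  // Neutral starting values when no cast has been recorded yet.
  auto [PrevMax, PrevMin] =
      Prev.value_or(std::make_pair(std::numeric_limits<unsigned>::min(),
                                   std::numeric_limits<unsigned>::max()));
  switch (Kind) {
  case CastKind::ZExt:
  case CastKind::SExt:
    // Extension: the destination is the wide side, the source the narrow one.
    return std::make_pair(std::max(DstBits, PrevMax),
                          std::min(SrcBits, PrevMin));
  case CastKind::Trunc:
    // Truncation: the source is the wide side, the destination the narrow one.
    return std::make_pair(std::max(SrcBits, PrevMax),
                          std::min(DstBits, PrevMin));
  case CastKind::Other:
    break;
  }
  // Other casts leave the recorded sizes untouched, as in the hunk.
  return Prev;
}
```

The sketch leaves out the `TruncNodes` index tracking and the gather-node case, which the patch handles separately.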

github-actions bot commented Mar 8, 2024

⚠️ We detected that you are using a GitHub private e-mail address to contribute to the repo.
Please turn off Keep my email addresses private setting in your account.
See LLVM Discourse for more information.

alexey-bataev added a commit that referenced this pull request Mar 14, 2024
Reviewers:

Pull Request: #84536
@chapuni
Contributor

chapuni commented Mar 15, 2024

Looks like 7f21678 broke aarch64 bootstrap.

https://lab.llvm.org/buildbot/#/builders/176
(and our private builder)

@alexey-bataev
Member Author

Can you provide a reproducer? Unable to reproduce it myself

@mstorsjo
Member

I'm also seeing issues with this; I see failed asserts in some cases, and successful compilation but incorrect results in others. I'll follow up with repro instructions in a moment.

@alexey-bataev
Member Author

Thanks, appreciate it.

@alexey-bataev
Member Author

I'll revert the patches for now, but please provide reproducers; they will help to investigate the issues.

@mstorsjo
Member

I've run into two separate issues:

typedef char *a;
int b, c;
void d() {
  int colctr = b, colsum, i, j;
  a e, f, g;
  long h;
  for (; c;) {
    colsum = *e++ + *f++;
    j = e[0] + f[0];
    i = colsum;
    for (; colctr; colctr--) {
      e++;
      f++;
      j = e[0] + f[0];
      *g++ = i = colsum;
      colsum = j;
    }
    h = i + j;
    *g = h;
  }
}

Compiled like this:

$ clang -target i686-w64-mingw32 -O2 -c repro.c
clang: ../include/llvm/ADT/DenseMap.h:1270: llvm::DenseMapIterator<KeyT, ValueT,
 KeyInfoT, Bucket, IsConst>::value_type* llvm::DenseMapIterator<KeyT, ValueT, Ke
yInfoT, Bucket, IsConst>::operator->() const [with KeyT = const llvm::slpvectori
zer::BoUpSLP::TreeEntry*; ValueT = std::pair<long unsigned int, bool>; KeyInfoT 
= llvm::DenseMapInfo<const llvm::slpvectorizer::BoUpSLP::TreeEntry*, void>; Buck
et = llvm::detail::DenseMapPair<const llvm::slpvectorizer::BoUpSLP::TreeEntry*, 
std::pair<long unsigned int, bool> >; bool IsConst = false; llvm::DenseMapIterat
or<KeyT, ValueT, KeyInfoT, Bucket, IsConst>::pointer = llvm::detail::DenseMapPai
r<const llvm::slpvectorizer::BoUpSLP::TreeEntry*, std::pair<long unsigned int, b
ool> >*]: Assertion `Ptr != End && "dereferencing end() iterator"' failed.

This regressed in ea429e1 / #84363, and is a clear and simple assert failure.

This PR, and commit 7f21678, caused miscompilations (code doing the wrong thing at runtime), noticeable in ffmpeg. (I have also observed test failures in openh264; I haven't had time to verify whether they are caused by this same commit, but it seems plausible.)

To repro that, do something along these lines:

$ git clone https://github.com/ffmpeg/ffmpeg
$ mkdir ffmpeg-build
$ cd ffmpeg-build
$ ../ffmpeg/configure --cc=clang --samples=../ffmpeg-samples
$ make fate-rsync # fetching input data to tests
$ make -j$(nproc) fate

I've reproduced these failures with i686-mingw, x86_64-mingw, aarch64-mingw and aarch64-linux targets - I presume it's reproducible for x86_64 linux as well but I haven't tried that.
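
For readers unfamiliar with the `dereferencing end() iterator` assertion in the first issue above: it fires when the iterator returned by a failed map lookup is dereferenced. The sketch below shows the general pattern and a checked alternative. It is only an illustration, not the SLPVectorizer code: `std::unordered_map` stands in for `llvm::DenseMap`, the empty `TreeEntry` struct and the `lookup` helper are hypothetical, and the value type is copied from the assertion message.

```cpp
#include <optional>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for llvm::slpvectorizer::BoUpSLP::TreeEntry.
struct TreeEntry {};

// Same key/value shape as in the assertion message; std::unordered_map is
// used here instead of llvm::DenseMap so the example needs no LLVM headers.
using EntryMap =
    std::unordered_map<const TreeEntry *, std::pair<unsigned long, bool>>;

// Checked lookup: returns the stored pair, or std::nullopt when the entry
// was never recorded. Writing `map.find(entry)->second` without the end()
// check is the kind of access the DenseMap assertion guards against.
std::optional<std::pair<unsigned long, bool>>
lookup(const EntryMap &map, const TreeEntry *entry) {
  auto it = map.find(entry);
  if (it == map.end())
    return std::nullopt;
  return it->second;
}
```

With a plain `std::unordered_map` the unchecked access is undefined behaviour; an assertions-enabled LLVM build reports it as the failure shown in the log.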

@alexey-bataev
Member Author

Thanks, will investigate

@mstorsjo
Member

> Thanks, will investigate

Thanks! If fixing it takes long, I'd appreciate a revert, but otherwise I'm fine waiting a bit for a forward fix if that's possible.

@alexey-bataev
Member Author

As I said, I'll revert it; better to commit fixed versions. Will revert ASAP.

alexey-bataev added a commit that referenced this pull request Mar 15, 2024
alexey-bataev added a commit that referenced this pull request Mar 15, 2024
…false."

This reverts commit e4b7724 to fix the
issues reported in #84536.
alexey-bataev added a commit that referenced this pull request Mar 15, 2024
This reverts commit 7f21678 to fix
issues reported in #84536.
@goldsteinn
Contributor

goldsteinn commented Mar 15, 2024

Think this also is causing a failure when building llvm-test-suite (which might be easier to reproduce):

FAILED: MultiSource/Benchmarks/MiBench/consumer-jpeg/CMakeFiles/consumer-jpeg.dir/jdsample.c.o 
/home/noah/programs/opensource/llvm-dev/src/llvm-test-suite/build/tools/timeit --summary MultiSource/Benchmarks/MiBench/consumer-jpeg/CMakeFiles/consumer-jpeg.dir/jdsample.c.o.time /home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/clang -DNDEBUG  -fuse-ld=lld --ld-path=/home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/ld.lld -O3    -O3 -DNDEBUG   -w -Werror=date-time -MD -MT MultiSource/Benchmarks/MiBench/consumer-jpeg/CMakeFiles/consumer-jpeg.dir/jdsample.c.o -MF MultiSource/Benchmarks/MiBench/consumer-jpeg/CMakeFiles/consumer-jpeg.dir/jdsample.c.o.d -o MultiSource/Benchmarks/MiBench/consumer-jpeg/CMakeFiles/consumer-jpeg.dir/jdsample.c.o -c /home/noah/programs/opensource/llvm-dev/src/llvm-test-suite/MultiSource/Benchmarks/MiBench/consumer-jpeg/jdsample.c
clang: /home/noah/programs/opensource/llvm-dev/src/llvm-project/llvm/include/llvm/ADT/DenseMap.h:1270: pointer llvm::DenseMapIterator<const llvm::slpvectorizer::BoUpSLP::TreeEntry *, std::pair<unsigned long, bool>>::operator->() const [KeyT = const llvm::slpvectorizer::BoUpSLP::TreeEntry *, ValueT = std::pair<unsigned long, bool>, KeyInfoT = llvm::DenseMapInfo<const llvm::slpvectorizer::BoUpSLP::TreeEntry *>, Bucket = llvm::detail::DenseMapPair<const llvm::slpvectorizer::BoUpSLP::TreeEntry *, std::pair<unsigned long, bool>>, IsConst = false]: Assertion `Ptr != End && "dereferencing end() iterator"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/clang -DNDEBUG -fuse-ld=lld --ld-path=/home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/ld.lld -O3 -O3 -DNDEBUG -w -Werror=date-time -MD -MT MultiSource/Benchmarks/MiBench/consumer-jpeg/CMakeFiles/consumer-jpeg.dir/jdsample.c.o -MF MultiSource/Benchmarks/MiBench/consumer-jpeg/CMakeFiles/consumer-jpeg.dir/jdsample.c.o.d -o MultiSource/Benchmarks/MiBench/consumer-jpeg/CMakeFiles/consumer-jpeg.dir/jdsample.c.o -c /home/noah/programs/opensource/llvm-dev/src/llvm-test-suite/MultiSource/Benchmarks/MiBench/consumer-jpeg/jdsample.c
1.	<eof> parser at end of file
2.	Optimizer
 #0 0x00007f82fe244c18 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/../lib/libLLVMSupport.so.19.0git+0x244c18)
 #1 0x00007f82fe2425f0 llvm::sys::RunSignalHandlers() (/home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/../lib/libLLVMSupport.so.19.0git+0x2425f0)
 #2 0x00007f82fe16d566 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #3 0x00007f82fd842520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #4 0x00007f82fd8969fc __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #5 0x00007f82fd8969fc __pthread_kill_internal ./nptl/pthread_kill.c:78:10
 #6 0x00007f82fd8969fc pthread_kill ./nptl/pthread_kill.c:89:10
 #7 0x00007f82fd842476 gsignal ./signal/../sysdeps/posix/raise.c:27:6
 #8 0x00007f82fd8287f3 abort ./stdlib/abort.c:81:7
 #9 0x00007f82fd82871b _nl_load_domain ./intl/loadmsgcat.c:1177:9
#10 0x00007f82fd839e96 (/lib/x86_64-linux-gnu/libc.so.6+0x39e96)
#11 0x00007f83015ce8d7 (/home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/../lib/libLLVMVectorize.so.19.0git+0x1ce8d7)
#12 0x00007f83015e7b36 llvm::slpvectorizer::BoUpSLP::vectorizeTree(llvm::slpvectorizer::BoUpSLP::TreeEntry*, bool) (/home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/../lib/libLLVMVectorize.so.19.0git+0x1e7b36)
#13 0x00007f83015ee39f llvm::slpvectorizer::BoUpSLP::vectorizeTree(llvm::MapVector<llvm::Value*, llvm::SmallVector<llvm::Instruction*, 2u>, llvm::DenseMap<llvm::Value*, unsigned int, llvm::DenseMapInfo<llvm::Value*, void>, llvm::detail::DenseMapPair<llvm::Value*, unsigned int>>, llvm::SmallVector<std::pair<llvm::Value*, llvm::SmallVector<llvm::Instruction*, 2u>>, 0u>> const&, llvm::SmallVectorImpl<std::pair<llvm::Value*, llvm::Value*>>&, llvm::Instruction*) (/home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/../lib/libLLVMVectorize.so.19.0git+0x1ee39f)
#14 0x00007f83015ee055 llvm::slpvectorizer::BoUpSLP::vectorizeTree() (/home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/../lib/libLLVMVectorize.so.19.0git+0x1ee055)
#15 0x00007f8301604d7e llvm::SLPVectorizerPass::tryToVectorizeList(llvm::ArrayRef<llvm::Value*>, llvm::slpvectorizer::BoUpSLP&, bool) (/home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/../lib/libLLVMVectorize.so.19.0git+0x204d7e)
#16 0x00007f830160a694 bool tryToVectorizeSequence<llvm::Value>(llvm::SmallVectorImpl<llvm::Value*>&, llvm::function_ref<bool (llvm::Value*, llvm::Value*)>, llvm::function_ref<bool (llvm::Value*, llvm::Value*)>, llvm::function_ref<bool (llvm::ArrayRef<llvm::Value*>, bool)>, bool, llvm::slpvectorizer::BoUpSLP&) SLPVectorizer.cpp:0:0
#17 0x00007f83015ff4f3 llvm::SLPVectorizerPass::vectorizeChainsInBlock(llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&) (/home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/../lib/libLLVMVectorize.so.19.0git+0x1ff4f3)
#18 0x00007f83015fd19c llvm::SLPVectorizerPass::runImpl(llvm::Function&, llvm::ScalarEvolution*, llvm::TargetTransformInfo*, llvm::TargetLibraryInfo*, llvm::AAResults*, llvm::LoopInfo*, llvm::DominatorTree*, llvm::AssumptionCache*, llvm::DemandedBits*, llvm::OptimizationRemarkEmitter*) (/home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/../lib/libLLVMVectorize.so.19.0git+0x1fd19c)
#19 0x00007f83015fc49e llvm::SLPVectorizerPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/../lib/libLLVMVectorize.so.19.0git+0x1fc49e)
#20 0x00007f82fcfd1e9d llvm::detail::PassModel<llvm::Function, llvm::SLPVectorizerPass, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilder.cpp:0:0
#21 0x00007f82fe925fff llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/../lib/libLLVMCore.so.19.0git+0x525fff)
#22 0x00007f83085d65cd llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) AMDGPUTargetMachine.cpp:0:0
#23 0x00007f82fe92b37b llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/../lib/libLLVMCore.so.19.0git+0x52b37b)
#24 0x00007f83085d636d llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) AMDGPUTargetMachine.cpp:0:0
#25 0x00007f82fe924aef llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/../lib/libLLVMCore.so.19.0git+0x524aef)
#26 0x00007f8302f2683a (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>&, std::unique_ptr<llvm::ToolOutputFile, std::default_delete<llvm::ToolOutputFile>>&, clang::BackendConsumer*) BackendUtil.cpp:0:0
#27 0x00007f8302f19bd9 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) (/home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/../lib/libclangCodeGen.so.19.0git+0x319bd9)
#28 0x00007f83033c00dc clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/../lib/libclangCodeGen.so.19.0git+0x7c00dc)
#29 0x00007f82fc6e2379 clang::ParseAST(clang::Sema&, bool, bool) (/home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/../lib/../lib/libclangParse.so.19.0git+0x81379)
#30 0x00007f83011fdec4 clang::FrontendAction::Execute() (/home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/../lib/libclangFrontend.so.19.0git+0x1fdec4)
#31 0x00007f830114e7dd clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/../lib/libclangFrontend.so.19.0git+0x14e7dd)
#32 0x00007f8304f7bc47 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/../lib/libclangFrontendTool.so.19.0git+0x5c47)
#33 0x000055d326570e14 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/clang+0x1ae14)
#34 0x000055d32656d1c0 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#35 0x00007f8300d779c9 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::$_0>(long) Job.cpp:0:0
#36 0x00007f82fe16d286 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/../lib/libLLVMSupport.so.19.0git+0x16d286)
#37 0x00007f8300d7697f clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const (/home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/../lib/libclangDriver.so.19.0git+0x17697f)
#38 0x00007f8300d259cc clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/../lib/libclangDriver.so.19.0git+0x1259cc)
#39 0x00007f8300d25fa7 clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const (/home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/../lib/libclangDriver.so.19.0git+0x125fa7)
#40 0x00007f8300d4c609 clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) (/home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/../lib/libclangDriver.so.19.0git+0x14c609)
#41 0x000055d32656c28d clang_main(int, char**, llvm::ToolContext const&) (/home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/clang+0x1628d)
#42 0x000055d32657fd43 main (/home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/clang+0x29d43)
#43 0x00007f82fd829d90 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#44 0x00007f82fd829e40 call_init ./csu/../csu/libc-start.c:128:20
#45 0x00007f82fd829e40 __libc_start_main ./csu/../csu/libc-start.c:379:5
#46 0x000055d326569535 _start (/home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin/clang+0x13535)
clang: error: clang frontend command failed with exit code 134 (use -v to see invocation)
clang version 19.0.0git (git@github.com:llvm/llvm-project.git c3a1eb6207d85cb37ea29306481b40c9f6402309)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/noah/programs/opensource/llvm-dev/src/llvm-project/build/bin
clang: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang: note: diagnostic msg: /tmp/jdsample-da2de5.c
clang: note: diagnostic msg: /tmp/jdsample-da2de5.sh
clang: note: diagnostic msg: 

********************

@alexey-bataev
Member Author

The patches were reverted, it should be fixed for now

@mstorsjo
Member

> As I said, I'll revert it; better to commit fixed versions. Will revert ASAP.

Thanks, much appreciated!

> Think this also is causing a failure when building llvm-test-suite (which might be easier to reproduce):
>
> FAILED: MultiSource/Benchmarks/MiBench/consumer-jpeg/CMakeFiles/consumer-jpeg.dir/jdsample.c.o

I think this is the same issue as I had reduced above; building libjpeg (in some form) triggers failed asserts in jdsample.c and jcsample.c, the one above that I had reduced was from jcsample.c.

But the cases that trigger no asserts but produce wrong results at runtime are probably harder to sort out.

@alexey-bataev
Member Author

> > As I said, I'll revert it; better to commit fixed versions. Will revert ASAP.
>
> Thanks, much appreciated!
>
> > Think this also is causing a failure when building llvm-test-suite (which might be easier to reproduce):
> >
> > FAILED: MultiSource/Benchmarks/MiBench/consumer-jpeg/CMakeFiles/consumer-jpeg.dir/jdsample.c.o
>
> I think this is the same issue as I had reduced above; building libjpeg (in some form) triggers failed asserts in jdsample.c and jcsample.c, the one above that I had reduced was from jcsample.c.
>
> But the cases that trigger no asserts but produce wrong results at runtime are probably harder to sort out.

Thanks, I fixed one small bug and am currently investigating the last one.

alexey-bataev added a commit that referenced this pull request Mar 19, 2024
@ZequanWu
Contributor

The reland 31eaf86 causes clang crash:

clang-cl: ../../llvm/lib/IR/Instructions.cpp:737: void llvm::CallInst::init(FunctionType *, Value *, ArrayRef<Value *>, ArrayRef<OperandBundleDef>, const Twine &): Assertion `(i >= FTy->getNumParams() || FTy->getParamType(i) == Args[i]->getType()) && "Calling a function with a bad signature!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: clang-cl -cc1 -triple x86_64-pc-windows-msvc19.34.0 -emit-obj -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name performance_metrics_overlay.cc -mrelocation-model pic -pic-level 2 -fmerge-all-constants -fno-delete-null-pointer-checks -mframe-pointer=none -relaxed-aliasing -ffp-contract=off -fno-rounding-math -mconstructor-aliases -fms-volatile -funwind-tables=2 -target-cpu x86-64 -target-feature +sse3 -mllvm -x86-asm-syntax=intel -tune-cpu generic -D_MT -flto-visibility-public-std --dependent-lib=libcmt --dependent-lib=oldnames --show-includes -fno-rtti-data -stack-protector 2 -fdiagnostics-format msvc -cfguard-no-checks -gcodeview -gcodeview-ghash -gno-codeview-command-line -debug-info-kind=line-tables-only -fdebug-compilation-dir=. -object-file-name=obj\\media\\cast\\sender\\performance_metrics_overlay.obj -mllvm -crash-diagnostics-dir=../../tools/clang/crashreports -ffunction-sections -fcoverage-compilation-dir=. 
-D USE_AURA=1 -D MEMORY_TOOL_REPLACES_ALLOCATOR -D ADDRESS_SANITIZER -D _HAS_NODISCARD -D _CRT_NONSTDC_NO_WARNINGS -D _WINSOCK_DEPRECATED_NO_WARNINGS -D _LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_EXTENSIVE -D CR_CLANG_REVISION=\"llvmorg-19-init-6054-g9fb85b09-1\" -D _LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS -D CR_LIBCXX_REVISION=80307e66e74bae927fb8709a549859e777e3bf0b -D TEMP_REBUILD_HACK -D __STD_C -D _CRT_RAND_S -D _CRT_SECURE_NO_DEPRECATE -D _SCL_SECURE_NO_DEPRECATE -D _ATL_NO_OPENGL -D _WINDOWS -D CERT_CHAIN_PARA_HAS_EXTRA_FIELDS -D PSAPI_VERSION=2 -D WIN32 -D _SECURE_ATL -D WINAPI_FAMILY=WINAPI_FAMILY_DESKTOP_APP -D WIN32_LEAN_AND_MEAN -D NOMINMAX -D _UNICODE -D UNICODE -D NTDDI_VERSION=NTDDI_WIN10_NI -D _WIN32_WINNT=0x0A00 -D WINVER=0x0A00 -D NDEBUG -D NVALGRIND -D DYNAMIC_ANNOTATIONS_ENABLED=0 -D BASE_USE_PERFETTO_CLIENT_LIBRARY=1 -D ENABLE_IPC_FUZZER -D LIBYUV_DISABLE_NEON -D LIBYUV_DISABLE_LSX -D LIBYUV_DISABLE_LASX -D SK_ENABLE_SKSL -D SK_UNTIL_CRBUG_1187654_IS_FIXED -D SK_USER_CONFIG_HEADER=\"../../skia/config/SkUserConfig.h\" -D SK_WIN_FONTMGR_NO_SIMULATIONS -D SK_DISABLE_LEGACY_INIT_DECODERS -D SK_SLUG_DISABLE_LEGACY_DESERIALIZE -D SK_DISABLE_LEGACY_VULKAN_BACKENDSEMAPHORE -D SK_DISABLE_LEGACY_CREATE_CHARACTERIZATION -D SK_DISABLE_LEGACY_VULKAN_MUTABLE_TEXTURE_STATE -D SK_CODEC_DECODES_JPEG -D SK_ENCODE_JPEG -D SK_ENCODE_PNG -D SK_ENCODE_WEBP -D GR_GL_FUNCTION_TYPE=__stdcall -D SK_GANESH -D SK_GPU_WORKAROUNDS_HEADER=\"gpu/config/gpu_driver_bug_workaround_autogen.h\" -D SK_GL -D SK_VULKAN=1 -D SK_GRAPHITE -D SK_DAWN -D VK_USE_PLATFORM_WIN32_KHR -D USE_EGL -D GOOGLE_PROTOBUF_NO_RTTI -D GOOGLE_PROTOBUF_NO_STATIC_INITIALIZER -D GOOGLE_PROTOBUF_INTERNAL_DONATE_STEAL_INLINE=0 -D U_USING_ICU_NAMESPACE=0 -D U_ENABLE_DYLOAD=0 -D USE_CHROMIUM_ICU=1 -D U_ENABLE_TRACING=1 -D U_ENABLE_RESOURCE_TRACING=0 -D U_STATIC_IMPLEMENTATION -D ICU_UTIL_DATA_IMPL=ICU_UTIL_DATA_FILE -D WEBRTC_ENABLE_AVX2 -D RTC_ENABLE_WIN_WGC -D WEBRTC_NON_STATIC_TRACE_EVENT_HANDLERS=0 -D 
WEBRTC_CHROMIUM_BUILD -D WEBRTC_WIN -D ABSL_ALLOCATOR_NOTHROW=1 -D LOGGING_INSIDE_WEBRTC -D ENABLE_TRACE_LOGGING -D CRASHPAD_ZLIB_SOURCE_EXTERNAL -D LEVELDB_PLATFORM_CHROMIUM=1 -D __WRL_ENABLE_FUNCTION_STATICS__ -D __DATE__= -D __TIME__= -D __TIMESTAMP__= -D PROTOBUF_ALLOW_DEPRECATED=1 -ffile-reproducible -O2 -WCL4 -Wimplicit-fallthrough -Wextra-semi -Wunreachable-code-aggressive -Wthread-safety -Wno-missing-field-initializers -Wno-unused-parameter -Wno-psabi -Wloop-analysis -Wno-unneeded-internal-declaration -Wno-nonportable-include-path -Wno-cast-function-type -Wno-ignored-pragma-optimize -Wno-deprecated-builtins -Wno-bitfield-constant-conversion -Wno-deprecated-this-capture -Wno-invalid-offsetof -Wno-vla-extension -Wno-thread-safety-reference-return -Wshadow -Wno-builtin-macro-redefined -Wheader-hygiene -Wstring-conversion -Wtautological-overlap-compare -Wno-redundant-parens -Wno-redundant-parens -Wenum-compare-conditional -Wno-c++11-narrowing-const-reference -Wno-trigraphs -fdeprecated-macro -ferror-limit 19 -fsanitize=address -fno-sanitize-memory-param-retval -fsanitize-address-use-after-scope -fsanitize-address-globals-dead-stripping -fno-sanitize-address-use-odr-indicator -fno-assume-sane-operator-new -fno-use-cxa-atexit -fms-extensions -fms-compatibility -fms-compatibility-version=19.34 -std=c++20 -fno-implicit-modules -fskip-odr-check-in-gmf -Qn -fcolor-diagnostics -vectorize-loops -vectorize-slp -faddrsig -x c++ performance_metrics_overlay-b41337.cpp
1.      <eof> parser at end of file
2.      Optimizer
 #0 0x00005576b9397188 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x3af5188)
 #1 0x00005576b9394ebe llvm::sys::RunSignalHandlers() (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x3af2ebe)
 #2 0x00005576b939781d SignalHandler(int) Signals.cpp:0:0
 #3 0x00007f85a0f7b510 (/lib/x86_64-linux-gnu/libc.so.6+0x3c510)
 #4 0x00007f85a0fc916c __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #5 0x00007f85a0f7b472 raise ./signal/../sysdeps/posix/raise.c:27:6
 #6 0x00007f85a0f654b2 abort ./stdlib/abort.c:81:7
 #7 0x00007f85a0f653d5 _nl_load_domain ./intl/loadmsgcat.c:1177:9
 #8 0x00007f85a0f743a2 (/lib/x86_64-linux-gnu/libc.so.6+0x353a2)
 #9 0x00005576b912c0ae llvm::CallInst::init(llvm::FunctionType*, llvm::Value*, llvm::ArrayRef<llvm::Value*>, llvm::ArrayRef<llvm::OperandBundleDefT<llvm::Value*>>, llvm::Twine const&) (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x388a0ae)
#10 0x00005576b7eee0ed llvm::CallInst::Create(llvm::FunctionType*, llvm::Value*, llvm::ArrayRef<llvm::Value*>, llvm::ArrayRef<llvm::OperandBundleDefT<llvm::Value*>>, llvm::Twine const&, llvm::Instruction*) CodeGenModule.cpp:0:0
#11 0x00005576b7f36890 llvm::IRBuilderBase::CreateCall(llvm::FunctionType*, llvm::Value*, llvm::ArrayRef<llvm::Value*>, llvm::ArrayRef<llvm::OperandBundleDefT<llvm::Value*>>, llvm::Twine const&, llvm::MDNode*) CGCall.cpp:0:0
#12 0x00005576bb7f4c16 llvm::slpvectorizer::BoUpSLP::vectorizeTree(llvm::slpvectorizer::BoUpSLP::TreeEntry*, bool) (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x5f52c16)
#13 0x00005576bb7f2327 llvm::slpvectorizer::BoUpSLP::vectorizeOperand(llvm::slpvectorizer::BoUpSLP::TreeEntry*, unsigned int, bool) (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x5f50327)
#14 0x00005576bb7f3709 llvm::slpvectorizer::BoUpSLP::vectorizeTree(llvm::slpvectorizer::BoUpSLP::TreeEntry*, bool) (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x5f51709)
#15 0x00005576bb7f2327 llvm::slpvectorizer::BoUpSLP::vectorizeOperand(llvm::slpvectorizer::BoUpSLP::TreeEntry*, unsigned int, bool) (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x5f50327)
#16 0x00005576bb7f3f4b llvm::slpvectorizer::BoUpSLP::vectorizeTree(llvm::slpvectorizer::BoUpSLP::TreeEntry*, bool) (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x5f51f4b)
#17 0x00005576bb7f2327 llvm::slpvectorizer::BoUpSLP::vectorizeOperand(llvm::slpvectorizer::BoUpSLP::TreeEntry*, unsigned int, bool) (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x5f50327)
#18 0x00005576bb7f385b llvm::slpvectorizer::BoUpSLP::vectorizeTree(llvm::slpvectorizer::BoUpSLP::TreeEntry*, bool) (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x5f5185b)
#19 0x00005576bb7f2327 llvm::slpvectorizer::BoUpSLP::vectorizeOperand(llvm::slpvectorizer::BoUpSLP::TreeEntry*, unsigned int, bool) (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x5f50327)
#20 0x00005576bb7f477a llvm::slpvectorizer::BoUpSLP::vectorizeTree(llvm::slpvectorizer::BoUpSLP::TreeEntry*, bool) (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x5f5277a)
#21 0x00005576bb7f81e1 llvm::slpvectorizer::BoUpSLP::vectorizeTree(llvm::MapVector<llvm::Value*, llvm::SmallVector<llvm::Instruction*, 2u>, llvm::DenseMap<llvm::Value*, unsigned int, llvm::DenseMapInfo<llvm::Value*, void>, llvm::detail::DenseMapPair<llvm::Value*, unsigned int>>, llvm::SmallVector<std::pair<llvm::Value*, llvm::SmallVector<llvm::Instruction*, 2u>>, 0u>> const&, llvm::SmallVectorImpl<std::pair<llvm::Value*, llvm::Value*>>&, llvm::Instruction*) (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x5f561e1)
#22 0x00005576bb7f7ff2 llvm::slpvectorizer::BoUpSLP::vectorizeTree() (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x5f55ff2)
#23 0x00005576bb8084de llvm::SLPVectorizerPass::vectorizeStoreChain(llvm::ArrayRef<llvm::Value*>, llvm::slpvectorizer::BoUpSLP&, unsigned int, unsigned int) (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x5f664de)
#24 0x00005576bb809b64 llvm::SLPVectorizerPass::vectorizeStores(llvm::ArrayRef<llvm::StoreInst*>, llvm::slpvectorizer::BoUpSLP&)::$_0::operator()(std::set<std::pair<unsigned int, int>, llvm::SLPVectorizerPass::vectorizeStores(llvm::ArrayRef<llvm::StoreInst*>, llvm::slpvectorizer::BoUpSLP&)::StoreDistCompare, std::allocator<std::pair<unsigned int, int>>> const&) const SLPVectorizer.cpp:0:0
#25 0x00005576bb808ffb llvm::SLPVectorizerPass::vectorizeStores(llvm::ArrayRef<llvm::StoreInst*>, llvm::slpvectorizer::BoUpSLP&) (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x5f66ffb)
#26 0x00005576bb805b3b llvm::SLPVectorizerPass::vectorizeStoreChains(llvm::slpvectorizer::BoUpSLP&) (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x5f63b3b)
#27 0x00005576bb804e9b llvm::SLPVectorizerPass::runImpl(llvm::Function&, llvm::ScalarEvolution*, llvm::TargetTransformInfo*, llvm::TargetLibraryInfo*, llvm::AAResults*, llvm::LoopInfo*, llvm::DominatorTree*, llvm::AssumptionCache*, llvm::DemandedBits*, llvm::OptimizationRemarkEmitter*) (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x5f62e9b)
#28 0x00005576bb80479f llvm::SLPVectorizerPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x5f6279f)
#29 0x00005576bb35ea1d llvm::detail::PassModel<llvm::Function, llvm::SLPVectorizerPass, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilder.cpp:0:0
#30 0x00005576b919d7c0 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x38fb7c0)
#31 0x00005576b845857d llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) BackendUtil.cpp:0:0
#32 0x00005576b919c472 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x38fa472)
#33 0x00005576b8454d2d llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) BackendUtil.cpp:0:0
#34 0x00005576b919ca52 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x38faa52)
#35 0x00005576b844b03c (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>&, std::unique_ptr<llvm::ToolOutputFile, std::default_delete<llvm::ToolOutputFile>>&, clang::BackendConsumer*) BackendUtil.cpp:0:0
#36 0x00005576b8443acc clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x2ba1acc)
#37 0x00005576b845d4f5 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x2bbb4f5)
#38 0x00005576ba357a09 clang::ParseAST(clang::Sema&, bool, bool) (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x4ab5a09)
#39 0x00005576b86f2d5f clang::FrontendAction::Execute() (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x2e50d5f)
#40 0x00005576b866a28d clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x2dc828d)
#41 0x00005576b87ce307 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x2f2c307)
#42 0x00005576b7e1e015 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x257c015)
#43 0x00005576b7e2c669 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#44 0x00005576b7e2b663 clang_main(int, char**, llvm::ToolContext const&) (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x2589663)
#45 0x00005576b7e2d447 main (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x258b447)
#46 0x00007f85a0f666ca __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:74:3
#47 0x00007f85a0f66785 call_init ./csu/../csu/libc-start.c:128:20
#48 0x00007f85a0f66785 __libc_start_main ./csu/../csu/libc-start.c:347:5
#49 0x00005576b7e1cd11 _start (/usr/local/google/home/zequanwu/workspace/llvm/out/gn/bin/clang+0x257ad11)

@alexey-bataev
Member Author

The reland 31eaf86 causes clang crash:

clang-cl: ../../llvm/lib/IR/Instructions.cpp:737: void llvm::CallInst::init(FunctionType *, Value *, ArrayRef<Value *>, ArrayRef<OperandBundleDef>, const Twine &): Assertion `(i >= FTy->getNumParams() || FTy->getParamType(i) == Args[i]->getType()) && "Calling a function with a bad signature!"' failed.

I need a reproducer.

@alexey-bataev
Member Author

The reland 31eaf86 causes clang crash:

clang-cl: ../../llvm/lib/IR/Instructions.cpp:737: void llvm::CallInst::init(FunctionType *, Value *, ArrayRef<Value *>, ArrayRef<OperandBundleDef>, const Twine &): Assertion `(i >= FTy->getNumParams() || FTy->getParamType(i) == Args[i]->getType()) && "Calling a function with a bad signature!"' failed.

I most likely have a fix for it already, but I really need a reproducer.

@ZequanWu
Contributor

I'm trying to reduce it.

@ZequanWu
Contributor

ZequanWu commented Mar 21, 2024

To repro: opt -O2 reduced.txt

reduced.txt

@alexey-bataev
Member Author

Will fix it ASAP

@alexey-bataev
Member Author

reduced.txt

Should be fixed in 3942bd2

@ZequanWu
Contributor

Thanks for the quick fix.

chencha3 pushed a commit to chencha3/llvm-project that referenced this pull request Mar 23, 2024
This improves the overall minbitwidth analysis in SLP. It allows
analyzing trees with store/insertelement root nodes. Also, instead of
using a single minbitwidth detected at the very first analysis stage,
it tries to detect the best one for each trunc/ext subtree in the graph
and uses it for that subtree.
This results in better code and less vector register pressure.
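
To make the per-subtree idea concrete, here is a minimal Python sketch. It is not the SLP vectorizer's actual algorithm: it assumes a toy model in which each trunc/ext subtree is just a list of unsigned lane values, it models the "single minbitwidth" as the widest requirement in the graph, and the names `min_bit_width` and `subtree_widths` are hypothetical. It compares one width forced on the whole graph with a width chosen per subtree.

```python
def min_bit_width(values):
    """Smallest power-of-two width (at least 8) that holds every unsigned value."""
    needed = max(max(v.bit_length() for v in values), 1)
    width = 8
    while width < needed:
        width *= 2
    return width


def subtree_widths(subtrees):
    """Map each (hypothetical) subtree name to its own minimal width."""
    return {name: min_bit_width(vals) for name, vals in subtrees.items()}


# Toy graph: two trunc/ext subtrees with different value ranges (made-up data).
subtrees = {
    "pixels": [0, 17, 200, 255],       # every value fits in 8 bits
    "sums": [300, 1020, 65535, 4096],  # the largest value needs 16 bits
}

per_subtree = subtree_widths(subtrees)
single = max(per_subtree.values())  # assumption: one width for the whole graph

lanes = 4
bits_single = single * lanes * len(subtrees)
bits_per_subtree = sum(w * lanes for w in per_subtree.values())

print(per_subtree)
print(bits_single, bits_per_subtree)
```

In this toy model the per-subtree choice needs fewer total vector bits than the single width, which is the kind of register-pressure saving the description refers to.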

Metric: size..text

Program                                                                               results     results0   diff
test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test     92549.00     92609.00   0.1%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test                663381.00    663493.00   0.0%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test                 663381.00    663493.00   0.0%
test-suite :: MultiSource/Benchmarks/Bullet/bullet.test                             307182.00    307214.00   0.0%
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test          1394420.00   1394484.00   0.0%
test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test           1394420.00   1394484.00   0.0%
test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test             2040257.00   2040273.00   0.0%

test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test          12396098.00  12395858.00  -0.0%
test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test                       909944.00    909768.00  -0.0%

SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant - 4 scalar
instructions remain scalar (good).
Spec2017/x264 - the whole function idct4x4dc is vectorized using
<16 x i16> instead of <16 x i32>, and the zext/trunc are removed. In
other places the last vector zext/sext is removed and replaced by an
extractelement + scalar zext/sext pair.
MultiSource/Benchmarks/Bullet/bullet - reduce or <4 x i32> is replaced
by reduce or <4 x i8>.
Spec2017/imagick - removed an extra zext from 2 packs of operations.
Spec2017/parest - removed an extra zext, replaced by extractelement +
scalar zext.
Spec2017/blender - a whole bunch of vector zext/sext are replaced by
extractelement + scalar zext/sext, and some extra code is vectorized in
smaller types.
Spec2006/gobmk - fixed cost estimation; some small code remains scalar.
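
The x264 and bullet notes rest on a general property of wraparound arithmetic: when a result is truncated to N bits anyway, add, sub, mul, and bitwise operations can be carried out in N bits directly, so the surrounding zext/trunc pair is not needed. The Python sketch below checks that property on random data. It models vector lanes as plain integers, and `wide_then_trunc` and `narrow_direct` are illustrative names, not LLVM APIs.

```python
import random

MASK16 = (1 << 16) - 1
MASK32 = (1 << 32) - 1


def wide_then_trunc(a, b):
    """zext i16 -> i32, compute in 32 bits, then trunc back to i16 (per lane)."""
    out = []
    for x, y in zip(a, b):
        wide = ((x + y) * 3 - y) & MASK32  # 32-bit wraparound arithmetic
        out.append(wide & MASK16)          # trunc to i16
    return out


def narrow_direct(a, b):
    """The same computation carried out directly in 16 bits."""
    return [(((x + y) & MASK16) * 3 - y) & MASK16 for x, y in zip(a, b)]


random.seed(0)
a = [random.randrange(1 << 16) for _ in range(16)]
b = [random.randrange(1 << 16) for _ in range(16)]

same = wide_then_trunc(a, b) == narrow_direct(a, b)
print(same)
```

With 16 lanes, the narrow form occupies 16 x 16 = 256 bits where the wide form occupies 16 x 32 = 512 bits. The property does not hold for operations whose low bits depend on high bits, such as division or right shifts, which is why an analysis is needed before narrowing.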

Original Pull Request: llvm#84334

The patch has the same functionality as the original patch (no test
changes, no changes in benchmarks). It only adds some compile-time
improvements and fixes for the xxhash unittest issue discovered in the
previous version of the patch.

Reviewers:

Pull Request: llvm#84536