[MemProf] Ensure node merging happens for newly created nodes #151593

Merged: 1 commit, Aug 1, 2025

teresajohnson
Contributor

We weren't performing node merging on newly created nodes in some cases.
Fix this by repeatedly iterating over a node's callers until no
unvisited callers are found. I confirmed that for several large
applications the maximum iteration count is 3 (meaning work was only
needed on the first 2 iterations, as expected; the third finds nothing
new and ends the loop). This could be made more elegant in the future,
for example with a worklist, but it is a simple and effective solution.
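
To illustrate the iteration pattern outside of LLVM, here is a minimal sketch on a toy graph. `ToyNode`, `visitCallersFirst`, `runDemo`, and the `OnVisit` callback are hypothetical names invented for this example; they are not the `CallsiteContextGraph` types, and `OnVisit` merely stands in for the merge step, which may attach new callers to a node while its callers are being processed.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <vector>

// Hypothetical toy node; not the LLVM CallsiteContextGraph node type.
struct ToyNode {
  int Id;
  std::vector<ToyNode *> Callers;
};

// Visit N's callers before N itself, rescanning N's caller list until a full
// pass finds no unvisited caller. OnVisit stands in for the merge step and may
// append new callers to any node. Returns the number of passes made for N
// (0 if N was already visited).
unsigned visitCallersFirst(ToyNode *N, std::set<ToyNode *> &Visited,
                           const std::function<void(ToyNode *)> &OnVisit) {
  if (!Visited.insert(N).second)
    return 0;
  bool FoundUnvisited = true;
  unsigned Iters = 0;
  while (FoundUnvisited) {
    ++Iters;
    FoundUnvisited = false;
    // Copy the list: the recursive call may grow N->Callers, which would
    // invalidate iterators into the original vector.
    std::vector<ToyNode *> Callers = N->Callers;
    for (ToyNode *Caller : Callers) {
      // An unvisited caller means the list may change, so scan it again.
      if (!Visited.count(Caller))
        FoundUnvisited = true;
      visitCallersFirst(Caller, Visited, OnVisit);
    }
  }
  OnVisit(N);
  return Iters;
}

struct DemoResult {
  unsigned Iters;
  std::vector<int> Order;
};

// Root starts with one caller (A). Processing A attaches a new caller (Extra)
// to Root, which only a second pass over Root's callers can discover.
DemoResult runDemo() {
  ToyNode Root{0, {}}, A{1, {}}, Extra{2, {}};
  Root.Callers.push_back(&A);
  std::vector<int> Order;
  std::set<ToyNode *> Visited;
  auto OnVisit = [&](ToyNode *N) {
    Order.push_back(N->Id);
    if (N == &A)
      Root.Callers.push_back(&Extra);
  };
  unsigned Iters = visitCallersFirst(&Root, Visited, OnVisit);
  return DemoResult{Iters, Order};
}
```

In this toy setup `runDemo()` makes three passes for `Root`: the first visits `A`, the second picks up `Extra`, and the third finds nothing unvisited and stops, which mirrors the "max iterations is 3, work on the first 2" observation above.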

Also fix a bug exposed by the test case: the FullLTO handling in
updateCall now gets the function for a call instruction via the existing
getCalleeFunc method, which looks through aliases if needed.
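
As a rough illustration of why the alias case matters, here is a toy model of the lookup, assuming a direct lookup only succeeds when the call operand is itself a function. `ToyValue`, `directCallee`, and `calleeThroughAlias` are hypothetical names for this sketch and do not reproduce the LLVM API or the body of `getCalleeFunc`.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an IR value that is either a function or an alias.
struct ToyValue {
  enum Kind { Function, Alias } K;
  std::string Name;
  const ToyValue *Aliasee; // Only meaningful when K == Alias.
};

// Direct lookup: yields a result only if the operand is itself a function.
const ToyValue *directCallee(const ToyValue *Operand) {
  return Operand && Operand->K == ToyValue::Function ? Operand : nullptr;
}

// Alias-aware lookup: follow alias links until a non-alias value (or null)
// is reached, then apply the direct lookup.
const ToyValue *calleeThroughAlias(const ToyValue *Operand) {
  while (Operand && Operand->K == ToyValue::Alias)
    Operand = Operand->Aliasee;
  return directCallee(Operand);
}
```

With an alias such as `_Znwm` pointing at `TCMallocInternalNew` (names borrowed from the test's alias list purely for illustration), the direct lookup yields null, and dereferencing that result would crash, while the alias-aware lookup returns the underlying function.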

@llvmbot
Member

llvmbot commented Jul 31, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Teresa Johnson (teresajohnson)


Patch is 534.86 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/151593.diff

4 Files Affected:

  • (modified) llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp (+37-9)
  • (added) llvm/test/Transforms/MemProfContextDisambiguation/iterative_merge.ll (+1103)
  • (modified) llvm/test/Transforms/MemProfContextDisambiguation/mergenodes.ll (+3)
  • (modified) llvm/test/Transforms/MemProfContextDisambiguation/mergenodes2.ll (+3)
diff --git a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
index c009c1e0e018b..b8c99f1f33891 100644
--- a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
+++ b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
@@ -99,6 +99,9 @@ STATISTIC(SkippedCallsCloning,
           "Number of calls skipped during cloning due to unexpected operand");
 STATISTIC(MismatchedCloneAssignments,
           "Number of callsites assigned to call multiple non-matching clones");
+STATISTIC(TotalMergeInvokes, "Number of merge invocations for nodes");
+STATISTIC(TotalMergeIters, "Number of merge iterations for nodes");
+STATISTIC(MaxMergeIters, "Max merge iterations for nodes");
 
 static cl::opt<std::string> DotFilePathPrefix(
     "memprof-dot-file-path-prefix", cl::init(""), cl::Hidden,
@@ -109,6 +112,11 @@ static cl::opt<bool> ExportToDot("memprof-export-to-dot", cl::init(false),
                                  cl::Hidden,
                                  cl::desc("Export graph to dot files."));
 
+// TODO: Remove this option once new handling is validated more widely.
+static cl::opt<bool> DoMergeIteration(
+    "memprof-merge-iteration", cl::init(true), cl::Hidden,
+    cl::desc("Iteratively apply merging on a node to catch new callers"));
+
 // How much of the graph to export to dot.
 enum DotScope {
   All,     // The full CCG graph.
@@ -3995,7 +4003,7 @@ IndexCallsiteContextGraph::getAllocationCallType(const CallInfo &Call) const {
 
 void ModuleCallsiteContextGraph::updateCall(CallInfo &CallerCall,
                                             FuncInfo CalleeFunc) {
-  auto *CurF = cast<CallBase>(CallerCall.call())->getCalledFunction();
+  auto *CurF = getCalleeFunc(CallerCall.call());
   auto NewCalleeCloneNo = CalleeFunc.cloneNo();
   if (isMemProfClone(*CurF)) {
     // If we already assigned this callsite to call a specific non-default
@@ -4191,16 +4199,36 @@ void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::mergeClones(
   if (!Inserted.second)
     return;
 
-  // Make a copy since the recursive call may move a caller edge to a new
-  // callee, messing up the iterator.
-  auto CallerEdges = Node->CallerEdges;
-  for (auto CallerEdge : CallerEdges) {
-    // Skip any caller edge moved onto a different callee during recursion.
-    if (CallerEdge->Callee != Node)
-      continue;
-    mergeClones(CallerEdge->Caller, Visited, ContextIdToAllocationNode);
+  // Iteratively perform merging on this node to handle new caller nodes created
+  // during the recursive traversal. We could do something more elegant such as
+  // maintain a worklist, but this is a simple approach that doesn't cause a
+  // measureable compile time effect, as most nodes don't have many caller
+  // edges to check.
+  bool FoundUnvisited = true;
+  unsigned Iters = 0;
+  while (FoundUnvisited) {
+    Iters++;
+    FoundUnvisited = false;
+    // Make a copy since the recursive call may move a caller edge to a new
+    // callee, messing up the iterator.
+    auto CallerEdges = Node->CallerEdges;
+    for (auto CallerEdge : CallerEdges) {
+      // Skip any caller edge moved onto a different callee during recursion.
+      if (CallerEdge->Callee != Node)
+        continue;
+      // If we found an unvisited caller, note that we should check the caller
+      // edges again as mergeClones may add or change caller nodes.
+      if (DoMergeIteration && !Visited.contains(CallerEdge->Caller))
+        FoundUnvisited = true;
+      mergeClones(CallerEdge->Caller, Visited, ContextIdToAllocationNode);
+    }
   }
 
+  TotalMergeInvokes++;
+  TotalMergeIters += Iters;
+  if (Iters > MaxMergeIters)
+    MaxMergeIters = Iters;
+
   // Merge for this node after we handle its callers.
   mergeNodeCalleeClones(Node, Visited, ContextIdToAllocationNode);
 }
diff --git a/llvm/test/Transforms/MemProfContextDisambiguation/iterative_merge.ll b/llvm/test/Transforms/MemProfContextDisambiguation/iterative_merge.ll
new file mode 100644
index 0000000000000..b681ecdc0dccb
--- /dev/null
+++ b/llvm/test/Transforms/MemProfContextDisambiguation/iterative_merge.ll
@@ -0,0 +1,1103 @@
+;; Test for iterative node merging. This is an llvm-reduced version of the xalancbmk
+;; benchmark with FullLTO and memprof.
+
+;; -stats requires asserts
+; REQUIRES: asserts
+
+; RUN: opt -passes=memprof-context-disambiguation -supports-hot-cold-new -stats \
+; RUN:	-memprof-merge-iteration=false %s -S 2>&1 | FileCheck %s --check-prefix=NOITER
+
+; RUN: opt -passes=memprof-context-disambiguation -supports-hot-cold-new -stats \
+; RUN:	-memprof-merge-iteration=true %s -S 2>&1 | FileCheck %s --check-prefix=ITER
+
+; RUN: opt -passes=memprof-context-disambiguation -supports-hot-cold-new -stats \
+; RUN:	%s -S 2>&1 | FileCheck %s --check-prefix=ITER
+
+; NOITER-NOT: _ZN10xalanc_1_8L11doTranscodeEPKcjRNSt3__u6vectorItNS2_9allocatorItEEEEb.memprof.2
+; NOITER: 7 memprof-context-disambiguation - Number of function clones created during whole program analysis
+; NOITER: 1 memprof-context-disambiguation - Max merge iterations for nodes
+; NOITER: 2 memprof-context-disambiguation - Number of new nodes created during merging
+
+; ITER: _ZN10xalanc_1_8L11doTranscodeEPKcjRNSt3__u6vectorItNS2_9allocatorItEEEEb.memprof.2
+; ITER: 8 memprof-context-disambiguation - Number of function clones created during whole program analysis
+; ITER: 3 memprof-context-disambiguation - Max merge iterations for nodes
+; ITER: 3 memprof-context-disambiguation - Number of new nodes created during merging
+
+; ModuleID = 'reduced.bc'
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-grtev4-linux-gnu"
+
+%"class.xercesc_2_5::XMLNumber" = type { %"class.xercesc_2_5::XMLEnumerator" }
+%"class.xercesc_2_5::XMLEnumerator" = type { ptr }
+
+@llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 65535, ptr @_GLOBAL__sub_I_XPath.cpp, ptr null }]
+@_ZTVN10xalanc_1_822FunctionNormalizeSpaceE = constant { [11 x ptr] } { [11 x ptr] [ptr null, ptr null, ptr null, ptr null, ptr null, ptr @_ZNK10xalanc_1_822FunctionNormalizeSpace7executeERNS_21XPathExecutionContextEPNS_9XalanNodeEPKN11xercesc_2_57LocatorE, ptr @_ZNK10xalanc_1_822FunctionNormalizeSpace7executeERNS_21XPathExecutionContextEPNS_9XalanNodeENS_10XObjectPtrEPKN11xercesc_2_57LocatorE, ptr null, ptr null, ptr null, ptr null] }
+@_ZTVN10__cxxabiv121__vmi_class_type_infoE = constant { [10 x ptr] } zeroinitializer
+@_ZTVN10__cxxabiv119__pointer_type_infoE = constant { [7 x ptr] } zeroinitializer
+@_ZTVSt13bad_exception = constant { [5 x ptr] } { [5 x ptr] [ptr null, ptr @_ZTISt13bad_exception, ptr @_ZNSt13bad_exceptionD1Ev, ptr null, ptr null] }
+@_ZTISt13bad_exception = constant { ptr, ptr, ptr } { ptr getelementptr inbounds (ptr, ptr @_ZTVN10__cxxabiv120__si_class_type_infoE, i64 2), ptr null, ptr @_ZTISt9exception }
+@_ZTISt9bad_alloc = constant { ptr, ptr, ptr } { ptr getelementptr inbounds (ptr, ptr @_ZTVN10__cxxabiv120__si_class_type_infoE, i64 2), ptr null, ptr @_ZTISt9exception }
+@_ZTVSt8bad_cast = constant { [5 x ptr] } { [5 x ptr] [ptr null, ptr @_ZTISt8bad_cast, ptr @_ZNSt8bad_castD1Ev, ptr null, ptr null] }
+@_ZTVSt10bad_typeid = constant { [5 x ptr] } { [5 x ptr] [ptr null, ptr @_ZTISt10bad_typeid, ptr @_ZNSt10bad_typeidD1Ev, ptr null, ptr null] }
+@_ZTVN10__cxxabiv117__class_type_infoE = constant { [10 x ptr] } zeroinitializer
+@_ZTISt8bad_cast = constant { ptr, ptr, ptr } { ptr getelementptr inbounds (ptr, ptr @_ZTVN10__cxxabiv120__si_class_type_infoE, i64 2), ptr null, ptr @_ZTISt9exception }
+@_ZTVN10__cxxabiv120__si_class_type_infoE = constant { [10 x ptr] } zeroinitializer
+@_ZTISt9exception = constant { ptr, ptr } { ptr getelementptr inbounds (ptr, ptr @_ZTVN10__cxxabiv117__class_type_infoE, i64 2), ptr null }
+@_ZTISt10bad_typeid = constant { ptr, ptr, ptr } { ptr getelementptr inbounds (ptr, ptr @_ZTVN10__cxxabiv120__si_class_type_infoE, i64 2), ptr null, ptr @_ZTISt9exception }
+@_ZTVSt9exception = constant { [5 x ptr] } { [5 x ptr] [ptr null, ptr @_ZTISt9exception, ptr @_ZNSt9exceptionD2Ev, ptr null, ptr null] }
+
+@_ZN10xalanc_1_814XalanDOMStringC1EPKcj = alias void (ptr, ptr, i32), ptr @_ZN10xalanc_1_814XalanDOMStringC2EPKcj
+@_Znwm = alias ptr (i64), ptr @TCMallocInternalNew
+@_ZdlPvm = alias void (ptr, i64), ptr @TCMallocInternalDeleteSized
+@_Znam = alias ptr (i64), ptr @TCMallocInternalNew
+@_ZdaPv = alias void (ptr), ptr @TCMallocInternalDelete
+@_ZdlPv = alias void (ptr), ptr @TCMallocInternalDelete
+@_ZnwmRKSt9nothrow_t = alias ptr (i64, ptr), ptr @TCMallocInternalNewNothrow
+@_ZnamRKSt9nothrow_t = alias ptr (i64, ptr), ptr @TCMallocInternalNewNothrow
+@_ZdlPvRKSt9nothrow_t = alias void (ptr, ptr), ptr @TCMallocInternalDelete
+@_ZdaPvRKSt9nothrow_t = alias void (ptr, ptr), ptr @TCMallocInternalDelete
+@_ZnwmSt11align_val_t = alias ptr (i64, i64), ptr @TCMallocInternalNewAligned
+@_ZnwmSt11align_val_tRKSt9nothrow_t = alias ptr (i64, i64, ptr), ptr @TCMallocInternalNewAlignedNothrow
+@_ZdlPvSt11align_val_t = alias void (ptr, i64), ptr @TCMallocInternalDelete
+@_ZdlPvSt11align_val_tRKSt9nothrow_t = alias void (ptr, i64, ptr), ptr @TCMallocInternalDelete
+@_ZdlPvmSt11align_val_t = alias void (ptr, i64, i64), ptr @TCMallocInternalDeleteSizedAligned
+@_ZnamSt11align_val_t = alias ptr (i64, i64), ptr @TCMallocInternalNewAligned
+@_ZnamSt11align_val_tRKSt9nothrow_t = alias ptr (i64, i64, ptr), ptr @TCMallocInternalNewAlignedNothrow
+@_ZdaPvSt11align_val_t = alias void (ptr, i64), ptr @TCMallocInternalDelete
+@_ZdaPvSt11align_val_tRKSt9nothrow_t = alias void (ptr, i64, ptr), ptr @TCMallocInternalDelete
+@_ZdaPvmSt11align_val_t = alias void (ptr, i64, i64), ptr @TCMallocInternalDeleteSizedAligned
+@_ZNSt13exception_ptrD1Ev = alias void (ptr), ptr @_ZNSt13exception_ptrD2Ev
+@_ZNSt13exception_ptrC1ERKS_ = alias void (ptr, ptr), ptr @_ZNSt13exception_ptrC2ERKS_
+@_ZNSt13bad_exceptionD1Ev = alias void (ptr), ptr @_ZNSt9exceptionD2Ev
+@_ZNSt8bad_castD1Ev = alias void (ptr), ptr @_ZNSt8bad_castD2Ev
+@_ZNSt10bad_typeidD1Ev = alias void (ptr), ptr @_ZNSt10bad_typeidD2Ev
+
+define ptr @_ZNSt3__u6vectorItNS_9allocatorItEEE7reserveEm() {
+  %1 = tail call ptr @_Znwm(i64 0), !memprof !29, !callsite !592
+  ret ptr %1
+}
+
+; Function Attrs: cold
+declare void @_ZN10xalanc_1_88FunctionC2Ev() #0
+
+define void @_ZN10xalanc_1_812FunctionLangC2Ev() {
+  call void @_ZN10xalanc_1_88FunctionC2Ev()
+  call void @_ZN10xalanc_1_814XalanDOMStringC1EPKcj(ptr null, ptr null, i32 0), !callsite !593
+  ret void
+}
+
+define void @_ZN10xalanc_1_822FunctionNormalizeSpaceC2Ev(ptr %0) {
+  store ptr @_ZTVN10xalanc_1_822FunctionNormalizeSpaceE, ptr %0, align 8
+  ret void
+}
+
+define void @_ZNK10xalanc_1_822FunctionNormalizeSpace7executeERNS_21XPathExecutionContextEPNS_9XalanNodeEPKN11xercesc_2_57LocatorE() {
+  call void @_ZN10xalanc_1_818XalanMessageLoader10getMessageENS_13XalanMessages5CodesEPKcS4_S4_S4_()
+  ret void
+}
+
+define ptr @_ZNK10xalanc_1_822FunctionNormalizeSpace7executeERNS_21XPathExecutionContextEPNS_9XalanNodeENS_10XObjectPtrEPKN11xercesc_2_57LocatorE() {
+  %1 = call ptr @_ZNK10xalanc_1_822FunctionNormalizeSpace9normalizeERNS_21XPathExecutionContextERKNS_10XObjectPtrE()
+  ret ptr %1
+}
+
+define ptr @_ZNK10xalanc_1_822FunctionNormalizeSpace9normalizeERNS_21XPathExecutionContextERKNS_10XObjectPtrE() {
+  %1 = load ptr, ptr null, align 8
+  %2 = getelementptr i8, ptr %1, i64 72
+  %3 = load ptr, ptr %2, align 8
+  %4 = tail call ptr %3(ptr null)
+  %5 = call ptr @_ZNK10xalanc_1_822FunctionNormalizeSpace9normalizeERNS_21XPathExecutionContextERKNS_14XalanDOMStringE()
+  ret ptr %5
+}
+
+define ptr @_ZNK10xalanc_1_822FunctionNormalizeSpace9normalizeERNS_21XPathExecutionContextERKNS_14XalanDOMStringE() {
+  %1 = call ptr @_ZNSt3__u6vectorItNS_9allocatorItEEE7reserveEm()
+  ret ptr %1
+}
+
+declare i64 @mbstowcs()
+
+define void @_GLOBAL__sub_I_XPath.cpp() {
+  tail call void @_ZN10xalanc_1_818XPathFunctionTableC2Eb()
+  ret void
+}
+
+define void @_ZN10xalanc_1_818XPathFunctionTableC2Eb() {
+  call void @_ZN10xalanc_1_818XPathFunctionTable11CreateTableEv()
+  ret void
+}
+
+define void @_ZN10xalanc_1_818XPathFunctionTable11CreateTableEv() {
+  %1 = alloca %"class.xercesc_2_5::XMLNumber", align 8
+  call void @_ZN10xalanc_1_812FunctionLangC2Ev()
+  call void @_ZN10xalanc_1_822FunctionNormalizeSpaceC2Ev(ptr %1)
+  ret void
+}
+
+define void @_ZN10xalanc_1_814XalanDOMStringC2EPKcj(ptr %0, ptr %1, i32 %2) #1 {
+  %4 = call ptr @_ZN10xalanc_1_814XalanDOMString6appendEPKcj(ptr %0, ptr %1, i32 %2), !callsite !594
+  ret void
+}
+
+; Function Attrs: cold
+define ptr @_ZN10xalanc_1_814XalanDOMString6appendEPKcj(ptr %0, ptr %1, i32 %2) #0 {
+  %4 = load i32, ptr %0, align 8
+  %5 = icmp eq i32 %4, 0
+  br i1 %5, label %common.ret, label %6
+
+common.ret:                                       ; preds = %3
+  tail call fastcc void @_ZN10xalanc_1_8L11doTranscodeEPKcjRNSt3__u6vectorItNS2_9allocatorItEEEEb(ptr %1, i32 %2, ptr %0, i1 true), !callsite !595
+  ret ptr %0
+
+6:                                                ; preds = %3
+  call fastcc void @_ZN10xalanc_1_8L11doTranscodeEPKcjRNSt3__u6vectorItNS2_9allocatorItEEEEb(ptr null, i32 1, ptr null, i1 false)
+  unreachable
+}
+
+define fastcc void @_ZN10xalanc_1_8L11doTranscodeEPKcjRNSt3__u6vectorItNS2_9allocatorItEEEEb(ptr %0, i32 %1, ptr %2, i1 %3) !prof !596 {
+  %5 = icmp eq i32 %1, 1
+  br i1 %5, label %6, label %9
+
+6:                                                ; preds = %4
+  %7 = call fastcc i1 @_ZN10xalanc_1_8L28doTranscodeFromLocalCodePageEPKcjbRNSt3__u6vectorItNS2_9allocatorItEEEEb(ptr %0, ptr %2, i1 %3)
+  br i1 %7, label %11, label %8
+
+8:                                                ; preds = %6
+  ret void
+
+9:                                                ; preds = %4
+  %10 = call fastcc i1 @_ZN10xalanc_1_8L28doTranscodeFromLocalCodePageEPKcjbRNSt3__u6vectorItNS2_9allocatorItEEEEb(ptr %0, ptr null, i1 false), !callsite !597
+  br label %11
+
+11:                                               ; preds = %9, %6
+  ret void
+}
+
+define fastcc i1 @_ZN10xalanc_1_8L28doTranscodeFromLocalCodePageEPKcjbRNSt3__u6vectorItNS2_9allocatorItEEEEb(ptr %0, ptr %1, i1 %2) {
+  %4 = icmp eq ptr %0, null
+  br i1 %4, label %5, label %7
+
+5:                                                ; preds = %3
+  %6 = load i64, ptr %1, align 8
+  %cond = icmp eq i64 %6, 0
+  ret i1 %cond
+
+7:                                                ; preds = %3
+  %8 = call i64 @mbstowcs()
+  %9 = zext i1 %2 to i64
+  call void @_ZNSt3__u6vectorIwNS_9allocatorIwEEE8__appendEm(), !callsite !598
+  ret i1 false
+}
+
+define void @_ZNSt3__u6vectorIwNS_9allocatorIwEEE8__appendEm() {
+  %1 = tail call ptr @_Znwm(i64 0), !memprof !599, !callsite !768
+  ret void
+}
+
+; Function Attrs: cold
+define void @_ZN10xalanc_1_826XalanInMemoryMessageLoaderC2Ev() #0 {
+  call void @_ZN10xalanc_1_814XalanDOMStringC1EPKcj(ptr null, ptr null, i32 0), !callsite !769
+  ret void
+}
+
+define void @_ZN10xalanc_1_818XalanMessageLoader12createLoaderEv() {
+  %1 = tail call ptr @_Znwm(i64 0)
+  call void @_ZN10xalanc_1_826XalanInMemoryMessageLoaderC2Ev(), !callsite !770
+  ret void
+}
+
+define void @_ZN10xalanc_1_818XalanMessageLoader10getMessageENS_13XalanMessages5CodesEPKcS4_S4_S4_() {
+  tail call void @_ZN10xalanc_1_818XalanMessageLoader12createLoaderEv()
+  ret void
+}
+
+define void @TCMallocInternalDeleteSized() {
+  ret void
+}
+
+; Function Attrs: nobuiltin noinline
+define ptr @TCMallocInternalNew(i64 %0) #2 {
+  ret ptr null
+}
+
+define void @TCMallocInternalDelete() {
+  ret void
+}
+
+define i64 @TCMallocInternalNewNothrow() {
+  ret i64 0
+}
+
+define i64 @TCMallocInternalNewAligned() {
+  ret i64 0
+}
+
+define i64 @TCMallocInternalNewAlignedNothrow() {
+  ret i64 0
+}
+
+define void @TCMallocInternalDeleteSizedAligned() {
+  ret void
+}
+
+define i1 @_ZSt18uncaught_exceptionv() {
+  ret i1 false
+}
+
+define void @_ZNSt13exception_ptrD2Ev() {
+  ret void
+}
+
+define void @_ZNSt13exception_ptrC2ERKS_() {
+  ret void
+}
+
+define ptr @_ZNSt13exception_ptraSERKS_() {
+  ret ptr null
+}
+
+define void @_ZSt17rethrow_exceptionSt13exception_ptr() {
+  unreachable
+}
+
+define void @_ZSt17__throw_bad_allocv() {
+  unreachable
+}
+
+define void @__cxa_bad_cast() {
+  unreachable
+}
+
+define ptr @__cxa_allocate_exception() {
+  ret ptr null
+}
+
+define ptr @__cxa_begin_catch() {
+  ret ptr null
+}
+
+define void @__cxa_free_exception() {
+  ret void
+}
+
+define void @__cxa_throw() {
+  unreachable
+}
+
+define void @__cxa_end_catch() {
+  ret void
+}
+
+define ptr @__cxa_current_exception_type() {
+  ret ptr null
+}
+
+define void @__cxa_rethrow() {
+  ret void
+}
+
+define void @_ZSt9terminatev() {
+  ret void
+}
+
+define i32 @__gxx_personality_v0() {
+  ret i32 0
+}
+
+define void @__cxa_call_unexpected() {
+  ret void
+}
+
+define ptr @__dynamic_cast() {
+  ret ptr null
+}
+
+define void @_ZNSt9exceptionD2Ev() {
+  ret void
+}
+
+define void @_ZNSt8bad_castD2Ev() {
+  ret void
+}
+
+define void @_ZNSt10bad_typeidD2Ev() {
+  ret void
+}
+
+attributes #0 = { cold }
+attributes #1 = { "target-features"="+aes" }
+attributes #2 = { nobuiltin noinline }
+
+!llvm.module.flags = !{!0}
+
+!0 = !{i32 1, !"ProfileSummary", !1}
+!1 = !{!2, !3, !4, !5, !6, !7, !8, !9, !10, !11}
+!2 = !{!"ProfileFormat", !"InstrProf"}
+!3 = !{!"TotalCount", i64 331263925478}
+!4 = !{!"MaxCount", i64 89521949747}
+!5 = !{!"MaxInternalCount", i64 89521949747}
+!6 = !{!"MaxFunctionCount", i64 14842374247}
+!7 = !{!"NumCounts", i64 80529}
+!8 = !{!"NumFunctions", i64 13237}
+!9 = !{!"IsPartialProfile", i64 0}
+!10 = !{!"PartialProfileRatio", double 0.000000e+00}
+!11 = !{!"DetailedSummary", !12}
+!12 = !{!13, !14, !15, !16, !17, !18, !19, !20, !21, !22, !23, !24, !25, !26, !27, !28}
+!13 = !{i32 10000, i64 89521949747, i32 1}
+!14 = !{i32 100000, i64 89521949747, i32 1}
+!15 = !{i32 200000, i64 89521949747, i32 1}
+!16 = !{i32 300000, i64 89454229684, i32 2}
+!17 = !{i32 400000, i64 89454229684, i32 2}
+!18 = !{i32 500000, i64 89454229684, i32 2}
+!19 = !{i32 600000, i64 28686354153, i32 3}
+!20 = !{i32 700000, i64 12169900676, i32 5}
+!21 = !{i32 800000, i64 2585869019, i32 9}
+!22 = !{i32 900000, i64 1189366531, i32 32}
+!23 = !{i32 950000, i64 137116556, i32 82}
+!24 = !{i32 990000, i64 24641624, i32 286}
+!25 = !{i32 999000, i64 832911, i32 881}
+!26 = !{i32 999900, i64 110792, i32 1739}
+!27 = !{i32 999990, i64 20910, i32 2245}
+!28 = !{i32 999999, i64 650, i32 2817}
+!29 = !{!30, !32, !34, !36, !38, !40, !42, !44, !46, !48, !50, !52, !54, !56, !58, !60, !62, !64, !66, !68, !70, !72, !74, !76, !78, !80, !82, !84, !86, !88, !90, !92, !94, !96, !98, !100, !102, !104, !106, !108, !110, !112, !114, !116, !118, !120, !122, !124, !126, !128, !130, !132, !134, !136, !138, !140, !142, !144, !146, !148, !150, !152, !154, !156, !158, !160, !162, !164, !166, !168, !170, !172, !174, !176, !178, !180, !182, !184, !186, !188, !190, !192, !194, !196, !198, !200, !202, !204, !206, !208, !210, !212, !214, !216, !218, !220, !222, !224, !226, !228, !230, !232, !234, !236, !238, !240, !242, !244, !246, !248, !250, !252, !254, !256, !258, !260, !262, !264, !266, !268, !270, !272, !274, !276, !278, !280, !282, !284, !286, !288, !290, !292, !294, !296, !298, !300, !302, !304, !306, !308, !310, !312, !314, !316, !318, !320, !322, !324, !326, !328, !330, !332, !334, !336, !338, !340, !342, !344, !346, !348, !350, !352, !354, !356, !358, !360, !362, !364, !366, !368, !370, !372, !374, !376, !378, !380, !382, !384, !386, !388, !390, !392, !394, !396, !398, !400, !402, !404, !406, !408, !410, !412, !414, !416, !418, !420, !422, !424, !426, !428, !430, !432, !434, !436, !438, !440, !442, !444, !446, !448, !450, !452, !454, !456, !458, !460, !462, !464, !466, !468, !470, !472, !474, !476, !478, !480, !482, !484, !486, !488, !490, !492, !494, !496, !498, !500, !502, !504, !506, !508, !510, !512, !514, !516, !518, !520, !522, !524, !526, !528, !530, !532, !534, !536, !538, !540, !542, !544, !546, !548, !550, !552, !554, !556, !558, !560, !562, !564, !566, !568, !570, !572, !574, !576, !578, !580, !582, !584, !586, !588, !590}
+!30 = !{!31, !"cold"}
+!31 = !{i64 761518489666860826, i64 -1420336805534834351, i64 -2943078617660248973, i64 3500755695426091485, i64 4378935957859808257, i64 445663428903236269...
[truncated]

Contributor

@snehasish snehasish left a comment

lgtm

@teresajohnson teresajohnson merged commit dc90472 into llvm:main Aug 1, 2025
11 checks passed
krishna2803 pushed a commit to krishna2803/llvm-project that referenced this pull request Aug 12, 2025