[MemProf] Handle empty stack context during ThinLTO cloning #81008

teresajohnson · 2024-02-07T16:17:10Z

Fix for assert after PR#78264.

Handle the case where the MIB context is empty after skipping the
callsite context, because the callsite context is actually longer than
the MIB context. Presumably this happened as a result of inlining, but
in theory the metadata should have been replaced with an attribute in
that case. Need to investigate why this is occuring, but for now handle
this gracefully to fix the build regression.

Fix for assert after PR#78264. Handle the case where the MIB context is empty after skipping the callsite context, because the callsite context is actually longer than the MIB context. Presumably this happened as a result of inlining, but in theory the metadata should have been replaced with an attribute in that case. Need to investigate why this is occuring, but for now handle this gracefully to fix the build regression.

llvmbot · 2024-02-07T16:17:43Z

@llvm/pr-subscribers-lto

@llvm/pr-subscribers-llvm-transforms

Author: Teresa Johnson (teresajohnson)

Changes

Fix for assert after PR#78264.

Handle the case where the MIB context is empty after skipping the
callsite context, because the callsite context is actually longer than
the MIB context. Presumably this happened as a result of inlining, but
in theory the metadata should have been replaced with an attribute in
that case. Need to investigate why this is occuring, but for now handle
this gracefully to fix the build regression.

Full diff: https://github.com/llvm/llvm-project/pull/81008.diff

2 Files Affected:

(modified) llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp (+5-1)
(added) llvm/test/ThinLTO/X86/memprof-import-fix.ll (+36)

diff --git a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
index d81239dd49e4cc..323f33fcc862dc 100644
--- a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
+++ b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
@@ -3475,7 +3475,11 @@ bool MemProfContextDisambiguation::applyImport(Module &M) {
             auto ContextIterBegin =
                 StackContext.beginAfterSharedPrefix(CallsiteContext);
             // Skip the checking on the first iteration.
-            uint64_t LastStackContextId = *ContextIterBegin == 0 ? 1 : 0;
+            uint64_t LastStackContextId =
+                (ContextIterBegin != StackContext.end() &&
+                 *ContextIterBegin == 0)
+                    ? 1
+                    : 0;
             for (auto ContextIter = ContextIterBegin;
                  ContextIter != StackContext.end(); ++ContextIter) {
               // If this is a direct recursion, simply skip the duplicate
diff --git a/llvm/test/ThinLTO/X86/memprof-import-fix.ll b/llvm/test/ThinLTO/X86/memprof-import-fix.ll
new file mode 100644
index 00000000000000..aeb54057976a3a
--- /dev/null
+++ b/llvm/test/ThinLTO/X86/memprof-import-fix.ll
@@ -0,0 +1,36 @@
+;; Test to ensure that importing of cloning decisions does not assert when
+;; the callsite context is longer than the MIB context.
+;; FIXME: Presumably this happened as a result of inlining, but in theory the
+;; metadata should have been replaced with an attribute in that case. Need to
+;; investigate why this is occuring.
+
+; RUN: opt -thinlto-bc %s >%t.o
+; RUN: llvm-lto2 run %t.o -enable-memprof-context-disambiguation \
+; RUN:	-supports-hot-cold-new \
+; RUN:	-r=%t.o,main,plx \
+; RUN:	-r=%t.o,_Znam, \
+; RUN:	-save-temps \
+; RUN:	-o %t.out
+
+; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IR
+;; The call to new is not changed in this case.
+; IR: call ptr @_Znam(i64 0)
+
+source_filename = "memprof-import-fix.ll"
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+define i32 @main() #0 {
+entry:
+  %call = call ptr @_Znam(i64 0), !memprof !0, !callsite !3
+  ret i32 0
+}
+
+declare ptr @_Znam(i64)
+
+attributes #0 = { noinline optnone }
+
+!0 = !{!1}
+!1 = !{!2, !"notcold"}
+!2 = !{i64 9086428284934609951}
+!3 = !{i64 9086428284934609951, i64 -5964873800580613432}

snehasish

lgtm

lifengxiang1025 · 2024-02-08T07:05:01Z

Presumably this happened as a result of inlining, but
in theory the metadata should have been replaced with an attribute in
that case.

BTW, I'm investigating one problems now that IR metadata doesn't match function summary. I found one function foo(template function) is linkonce linkage after thin_compile and it has many copies. Copy1 inlines it's callee and copy2 doesn't. So copy1 has one callsite metadata but copy2 doesn't. Copy1's linkage is weak_odr and copy2's linkage is AvailableExternallyLinkage in function summary after thin_link. The problem happens in thin_backend, compiler choose copy2's definition and copy1's summary. And finally trigger an assertion.

teresajohnson · 2024-02-08T14:05:54Z

Presumably this happened as a result of inlining, but
in theory the metadata should have been replaced with an attribute in
that case.

BTW, I'm investigating one problems now that IR metadata doesn't match function summary. I found one function foo(template function) is linkonce linkage after thin_compile and it has many copies. Copy1 inlines it's callee and copy2 doesn't. So copy1 has one callsite metadata but copy2 doesn't. Copy1's linkage is weak_odr and copy2's linkage is AvailableExternallyLinkage in function summary after thin_link. The problem happens in thin_backend, compiler choose copy2's definition and copy1's summary. And finally trigger an assertion.

Hmm, the fact that copy1 has weak_odr linkage and copy2 is available_externally indicates that the linker chose copy1 as the prevailing copy. I suppose we must have decided to import copy2 instead of copy1 for some reason (it can happen, e.g. copy1 might be over the instruction threshold due to the additional inlining, but copy2 wasn't). The code in the backend for cloning does try to locate the summary in that module, but it is the module we are compiling which isn't the module of where we imported from (we no longer have that info). We fall back to selecting the first copy, which is likely the prevailing copy:

llvm-project/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp

Lines 3376 to 3381 in fe8a62c

    
           auto *GVSummary = 
        
               ImportSummary->findSummaryInModule(TheFnVI, M.getModuleIdentifier()); 
        
           if (!GVSummary) 
        
             // Must have been imported, use the first summary (might be multiple if 
        
             // this was a linkonce_odr). 
        
             GVSummary = TheFnVI.getSummaryList().front().get();

I'm surprised I haven't hit this issue yet...need to think about the best way to solve this. A couple of possibilities off the top of my head:

Restrict importing to import only the prevailing def (we have this info in the importer, it is already used for global var importing).
Carry some extra info on imported defs (e.g. metadata) indicating the original module.

1 is pretty straightforward. We may lose some importing but it is only in some corner cases like this one. However, then in the cloning code I link to above we rely on the first linked copy being the prevailing one (likely but not 100% guaranteed). So 2 may be better overall.

Problem description: #81008 (comment) Solution: #81008 (comment) (choose plan2)

llvmbot added LTO Link time optimization (regular/full LTO or ThinLTO) llvm:transforms labels Feb 7, 2024

teresajohnson requested a review from lifengxiang1025 February 7, 2024 16:17

teresajohnson mentioned this pull request Feb 7, 2024

[MemProf] Fix assert when exists direct recursion #78264

Merged

teresajohnson requested a review from snehasish February 7, 2024 16:56

snehasish approved these changes Feb 7, 2024

View reviewed changes

teresajohnson merged commit 9eed899 into llvm:main Feb 7, 2024
7 checks passed

lifengxiang1025 mentioned this pull request Mar 2, 2024

[MemProf] Match function's summary and definition strictly #83665

Merged

lifengxiang1025 added a commit that referenced this pull request Mar 12, 2024

[MemProf] Match function's summary and definition strictly (#83665)

e40cabf

Problem description: #81008 (comment) Solution: #81008 (comment) (choose plan2)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[MemProf] Handle empty stack context during ThinLTO cloning #81008

[MemProf] Handle empty stack context during ThinLTO cloning #81008

teresajohnson commented Feb 7, 2024

llvmbot commented Feb 7, 2024 •

edited

Loading

snehasish left a comment

lifengxiang1025 commented Feb 8, 2024

teresajohnson commented Feb 8, 2024

[MemProf] Handle empty stack context during ThinLTO cloning #81008

[MemProf] Handle empty stack context during ThinLTO cloning #81008

Conversation

teresajohnson commented Feb 7, 2024

llvmbot commented Feb 7, 2024 • edited Loading

snehasish left a comment

Choose a reason for hiding this comment

lifengxiang1025 commented Feb 8, 2024

teresajohnson commented Feb 8, 2024

llvmbot commented Feb 7, 2024 •

edited

Loading