Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MemProf] Handle empty stack context during ThinLTO cloning #81008

Merged
merged 1 commit into from
Feb 7, 2024

Conversation

teresajohnson
Copy link
Contributor

Fix for assert after PR#78264.

Handle the case where the MIB context is empty after skipping the
callsite context, because the callsite context is actually longer than
the MIB context. Presumably this happened as a result of inlining, but
in theory the metadata should have been replaced with an attribute in
that case. Need to investigate why this is occuring, but for now handle
this gracefully to fix the build regression.

Fix for assert after PR#78264.

Handle the case where the MIB context is empty after skipping the
callsite context, because the callsite context is actually longer than
the MIB context. Presumably this happened as a result of inlining, but
in theory the metadata should have been replaced with an attribute in
that case. Need to investigate why this is occuring, but for now handle
this gracefully to fix the build regression.
@llvmbot llvmbot added LTO Link time optimization (regular/full LTO or ThinLTO) llvm:transforms labels Feb 7, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Feb 7, 2024

@llvm/pr-subscribers-lto

@llvm/pr-subscribers-llvm-transforms

Author: Teresa Johnson (teresajohnson)

Changes

Fix for assert after PR#78264.

Handle the case where the MIB context is empty after skipping the
callsite context, because the callsite context is actually longer than
the MIB context. Presumably this happened as a result of inlining, but
in theory the metadata should have been replaced with an attribute in
that case. Need to investigate why this is occuring, but for now handle
this gracefully to fix the build regression.


Full diff: https://github.com/llvm/llvm-project/pull/81008.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp (+5-1)
  • (added) llvm/test/ThinLTO/X86/memprof-import-fix.ll (+36)
diff --git a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
index d81239dd49e4cc..323f33fcc862dc 100644
--- a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
+++ b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
@@ -3475,7 +3475,11 @@ bool MemProfContextDisambiguation::applyImport(Module &M) {
             auto ContextIterBegin =
                 StackContext.beginAfterSharedPrefix(CallsiteContext);
             // Skip the checking on the first iteration.
-            uint64_t LastStackContextId = *ContextIterBegin == 0 ? 1 : 0;
+            uint64_t LastStackContextId =
+                (ContextIterBegin != StackContext.end() &&
+                 *ContextIterBegin == 0)
+                    ? 1
+                    : 0;
             for (auto ContextIter = ContextIterBegin;
                  ContextIter != StackContext.end(); ++ContextIter) {
               // If this is a direct recursion, simply skip the duplicate
diff --git a/llvm/test/ThinLTO/X86/memprof-import-fix.ll b/llvm/test/ThinLTO/X86/memprof-import-fix.ll
new file mode 100644
index 00000000000000..aeb54057976a3a
--- /dev/null
+++ b/llvm/test/ThinLTO/X86/memprof-import-fix.ll
@@ -0,0 +1,36 @@
+;; Test to ensure that importing of cloning decisions does not assert when
+;; the callsite context is longer than the MIB context.
+;; FIXME: Presumably this happened as a result of inlining, but in theory the
+;; metadata should have been replaced with an attribute in that case. Need to
+;; investigate why this is occuring.
+
+; RUN: opt -thinlto-bc %s >%t.o
+; RUN: llvm-lto2 run %t.o -enable-memprof-context-disambiguation \
+; RUN:	-supports-hot-cold-new \
+; RUN:	-r=%t.o,main,plx \
+; RUN:	-r=%t.o,_Znam, \
+; RUN:	-save-temps \
+; RUN:	-o %t.out
+
+; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IR
+;; The call to new is not changed in this case.
+; IR: call ptr @_Znam(i64 0)
+
+source_filename = "memprof-import-fix.ll"
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+define i32 @main() #0 {
+entry:
+  %call = call ptr @_Znam(i64 0), !memprof !0, !callsite !3
+  ret i32 0
+}
+
+declare ptr @_Znam(i64)
+
+attributes #0 = { noinline optnone }
+
+!0 = !{!1}
+!1 = !{!2, !"notcold"}
+!2 = !{i64 9086428284934609951}
+!3 = !{i64 9086428284934609951, i64 -5964873800580613432}

Copy link
Contributor

@snehasish snehasish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@teresajohnson teresajohnson merged commit 9eed899 into llvm:main Feb 7, 2024
7 checks passed
@lifengxiang1025
Copy link
Contributor

Presumably this happened as a result of inlining, but
in theory the metadata should have been replaced with an attribute in
that case.

BTW, I'm investigating one problems now that IR metadata doesn't match function summary. I found one function foo(template function) is linkonce linkage after thin_compile and it has many copies. Copy1 inlines it's callee and copy2 doesn't. So copy1 has one callsite metadata but copy2 doesn't. Copy1's linkage is weak_odr and copy2's linkage is AvailableExternallyLinkage in function summary after thin_link. The problem happens in thin_backend, compiler choose copy2's definition and copy1's summary. And finally trigger an assertion.

@teresajohnson
Copy link
Contributor Author

Presumably this happened as a result of inlining, but
in theory the metadata should have been replaced with an attribute in
that case.

BTW, I'm investigating one problems now that IR metadata doesn't match function summary. I found one function foo(template function) is linkonce linkage after thin_compile and it has many copies. Copy1 inlines it's callee and copy2 doesn't. So copy1 has one callsite metadata but copy2 doesn't. Copy1's linkage is weak_odr and copy2's linkage is AvailableExternallyLinkage in function summary after thin_link. The problem happens in thin_backend, compiler choose copy2's definition and copy1's summary. And finally trigger an assertion.

Hmm, the fact that copy1 has weak_odr linkage and copy2 is available_externally indicates that the linker chose copy1 as the prevailing copy. I suppose we must have decided to import copy2 instead of copy1 for some reason (it can happen, e.g. copy1 might be over the instruction threshold due to the additional inlining, but copy2 wasn't). The code in the backend for cloning does try to locate the summary in that module, but it is the module we are compiling which isn't the module of where we imported from (we no longer have that info). We fall back to selecting the first copy, which is likely the prevailing copy:

auto *GVSummary =
ImportSummary->findSummaryInModule(TheFnVI, M.getModuleIdentifier());
if (!GVSummary)
// Must have been imported, use the first summary (might be multiple if
// this was a linkonce_odr).
GVSummary = TheFnVI.getSummaryList().front().get();

I'm surprised I haven't hit this issue yet...need to think about the best way to solve this. A couple of possibilities off the top of my head:

  1. Restrict importing to import only the prevailing def (we have this info in the importer, it is already used for global var importing).
  2. Carry some extra info on imported defs (e.g. metadata) indicating the original module.

1 is pretty straightforward. We may lose some importing but it is only in some corner cases like this one. However, then in the cloning code I link to above we rely on the first linked copy being the prevailing one (likely but not 100% guaranteed). So 2 may be better overall.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:transforms LTO Link time optimization (regular/full LTO or ThinLTO)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants