Skip to content

Conversation

@teresajohnson
Copy link
Contributor

@teresajohnson teresajohnson commented Oct 18, 2025

We are scanning through every single definition of a vtable across all
translation units which is unnecessary in most cases.

If this is a local, we want to make sure there isn't another local with
the same GUID due to it having the same relative path. However, we were
always scanning through every single summary in all cases.

We can now check the new HasLocal flag added in PR164647 ahead of the loop,
instead of checking on every iteration.

This cut down a large thin link by around 6%, which was over half the
time it spent in WPD.

Note that we previously took the last conforming vtable summary, and now
we use the first. This caused a test difference in one somewhat
contrived test for vtables in comdats.

@llvmbot llvmbot added LTO Link time optimization (regular/full LTO or ThinLTO) llvm:transforms labels Oct 18, 2025
@llvmbot
Copy link
Member

llvmbot commented Oct 18, 2025

@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-lto

Author: Teresa Johnson (teresajohnson)

Changes

We are scanning through every single definition of a vtable across all
translation units which is unnecessary in most cases.

If this is a local, we want to make sure there isn't another local with
the same GUID due to it having the same relative path. However, we were
always scanning through every single summary in all cases.

We shouldn't have locals and non-locals with the same GUID since local
GUIDs are computed from "path:name" and non-locals from just "name". So
once we find a non-local vtable summary meeting the other constraints we
can just use it.

This cut down a large thin link by around 6%, which was over half the
time it spent in WPD.

Note that we previously took the last conforming vtable summary, and now
we use the first. This caused a test difference in one somewhat
contrived test for vtables in comdats.


Full diff: https://github.com/llvm/llvm-project/pull/164046.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp (+9-1)
  • (modified) llvm/test/ThinLTO/X86/devirt_external_comdat_same_guid.ll (+5-5)
diff --git a/llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp b/llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
index 2d5cb8268ffdd..e37e0d3142224 100644
--- a/llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
+++ b/llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
@@ -1142,7 +1142,11 @@ bool DevirtIndex::tryFindVirtualCallTargets(
         if (LocalFound)
           return false;
         LocalFound = true;
-      }
+      } else
+        // Don't expect to find a mix of locals and non-locals (due to path
+        // prefix for locals one should never have the same GUID as a
+        // non-local).
+        assert(!LocalFound);
       auto *CurVS = cast<GlobalVarSummary>(S->getBaseObject());
       if (!CurVS->vTableFuncs().empty() ||
           // Previously clang did not attach the necessary type metadata to
@@ -1158,6 +1162,10 @@ bool DevirtIndex::tryFindVirtualCallTargets(
         // with public LTO visibility.
         if (VS->getVCallVisibility() == GlobalObject::VCallVisibilityPublic)
           return false;
+        // Unless VS is a local, we don't need to keep looking through the rest
+        // of the summaries.
+        if (!LocalFound)
+          break;
       }
     }
     // There will be no VS if all copies are available_externally having no
diff --git a/llvm/test/ThinLTO/X86/devirt_external_comdat_same_guid.ll b/llvm/test/ThinLTO/X86/devirt_external_comdat_same_guid.ll
index 1f0737b719254..d0b8d14777f52 100644
--- a/llvm/test/ThinLTO/X86/devirt_external_comdat_same_guid.ll
+++ b/llvm/test/ThinLTO/X86/devirt_external_comdat_same_guid.ll
@@ -24,9 +24,9 @@
 ; RUN: llvm-dis %t5.1.4.opt.bc -o - | FileCheck %s --check-prefix=CHECK-IR1
 ; RUN: llvm-dis %t5.2.4.opt.bc -o - | FileCheck %s --check-prefix=CHECK-IR2
 
-; PRINT-DAG: Devirtualized call to {{.*}} (_ZN1A1nEi)
+; PRINT-DAG: Devirtualized call to {{.*}} (_ZN1B1nEi)
 
-; REMARK-DAG: single-impl: devirtualized a call to _ZN1A1nEi
+; REMARK-DAG: single-impl: devirtualized a call to _ZN1B1nEi
 
 target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
 target triple = "x86_64-grtev4-linux-gnu"
@@ -55,7 +55,7 @@ entry:
   ret i32 0
 }
 
-; CHECK-IR1: define i32 @test(
+; CHECK-IR1: define noundef i32 @test(
 define i32 @test(ptr %obj, i32 %a) {
 entry:
   %vtable = load ptr, ptr %obj
@@ -65,7 +65,7 @@ entry:
   %fptr1 = load ptr, ptr %fptrptr, align 8
 
   ; Check that the call was devirtualized.
-  ; CHECK-IR1: tail call i32 {{.*}}@_ZN1A1nEi
+  ; CHECK-IR1: tail call i32 {{.*}}@_ZN1B1nEi
   %call = tail call i32 %fptr1(ptr nonnull %obj, i32 %a)
 
   ret i32 %call
@@ -73,7 +73,7 @@ entry:
 
 ; CHECK-IR2: define i32 @test2
 ; Check that the call was devirtualized.
-; CHECK-IR2:   tail call i32 @_ZN1A1nEi
+; CHECK-IR2:   tail call i32 @_ZN1B1nEi
 
 declare i1 @llvm.type.test(ptr, metadata)
 declare void @llvm.assume(i1)

Copy link
Contributor

@mingmingl-llvm mingmingl-llvm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad for committing ef97725 directly when I actually meant to do suggest edit. Not an excuse but in the other code review tool that I'm more familiar with, editing a file directly gives suggested edit..

The commit itself means to use bracket following

// Use braces for the `if` block to keep it uniform with the `else` block.
if (isa<FunctionDecl>(D)) {
handleFunctionDecl(D);
} else {
// In this `else` case, it is necessary that we explain the situation with
// this surprisingly long comment, so it would be unclear without the braces
// whether the following statement is in the scope of the `if`.
handleOtherDecl(D);
}

LGTM except the nits.

// Don't expect to find a mix of locals and non-locals (due to path
// prefix for locals one should never have the same GUID as a
// non-local).
assert(!LocalFound);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert(!LocalFound);
assert(!LocalFound && "Local and non-local should have different GUIDs");

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://llvm.org/docs/CodingStandards.html#assert-liberally suggests put some kind of error message in the assertion statement, which is printed if the assertion is tripped

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to update this PR to rebase it on top of PR 164647, and this assert is no longer relevant in any case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

If I understand the series of reworked optimization correctly, the idea is to skip loop iteration selectively when GlobalValueSummaryList bit states shows it's safe to do so. Statistically this optimization applies for many summary lists and LTO indexing will speed up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. PR has been updated, I'll wait until the tests finish running but let me know if you have any other concerns.

@mingmingl-llvm
Copy link
Contributor

mingmingl-llvm commented Oct 18, 2025

Note that we previously took the last conforming vtable summary, and now
we use the first. This caused a test difference in one somewhat
contrived test for vtables in comdats.

IIUC this is an artifact of llvm-lto2 run %t3.o %t4.o and summary iteration order.

Before this change the loop uses the last summary, and after this change the loop exits after finding the first summary, so WAI (the break takes effect).

@teresajohnson
Copy link
Contributor Author

Note that we previously took the last conforming vtable summary, and now
we use the first. This caused a test difference in one somewhat
contrived test for vtables in comdats.

IIUC this is an artifact of llvm-lto2 run %t3.o %t4.o and summary iteration order.

Before this change the loop uses the last summary, and after this change the loop exits after finding the first summary, so WAI (the break takes effect).

Correct.

We are scanning through every single definition of a vtable across all
translation units which is unnecessary in most cases.

If this is a local, we want to make sure there isn't another local with
the same GUID due to it having the same relative path. However, we were
always scanning through every single summary in all cases.

We shouldn't have locals and non-locals with the same GUID since local
GUIDs are computed from "path:name" and non-locals from just "name". So
once we find a non-local vtable summary meeting the other constraints we
can just use it.

This cut down a large thin link by around 6%, which was over half the
time it spent in WPD.

Note that we previously took the last conforming vtable summary, and now
we use the first. This caused a test difference in one somewhat
contrived test for vtables in comdats.
// and therefore the same GUID. This can happen if there isn't enough
// distinguishing path when compiling the source file. In that case we
// conservatively return false early.
if (P.VTableVI.hasLocal() && P.VTableVI.getSummaryList().size() > 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit pick on the compile error: I think ValueInfo struct needs to have a method to expose hasLocal state, something like

struct ValueInfo {
  ... 
  bool hasLocal() const {
    return getRef().second.hasLocal();
  }
  ...
};

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have that locally, but the branch seems hopelessly messed up right now despite force pushing etc. I may have to close this PR and start a new one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I think my latest force push added them

// with public LTO visibility.
if (VS->getVCallVisibility() == GlobalObject::VCallVisibilityPublic)
return false;
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this change, loop iterates all VS and returns false if any VS has public visibility.

After this change, the visibility check happens on the first VS. Is my understanding correct that we assumes one-definition-rule and all vtables with the same GUID has the same visibility?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, per https://clang.llvm.org/docs/LTOVisibility.html: "A class’s LTO visibility is treated as an ODR-relevant property of its definition, so it must be consistent between translation units."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pointer. The optimization on finding representative vtable value summary LGTM.

// with public LTO visibility.
if (VS->getVCallVisibility() == GlobalObject::VCallVisibilityPublic)
return false;
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pointer. The optimization on finding representative vtable value summary LGTM.

@teresajohnson teresajohnson merged commit 9c7b304 into llvm:main Oct 23, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

llvm:ir llvm:transforms LTO Link time optimization (regular/full LTO or ThinLTO)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants