-
Notifications
You must be signed in to change notification settings - Fork 15k
[WPD] Reduce ThinLTO link time by avoiding unnecessary summary analysis #164046
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@llvm/pr-subscribers-llvm-ir @llvm/pr-subscribers-lto Author: Teresa Johnson (teresajohnson) ChangesWe are scanning through every single definition of a vtable across all If this is a local, we want to make sure there isn't another local with We shouldn't have locals and non-locals with the same GUID since local This cut down a large thin link by around 6%, which was over half the Note that we previously took the last conforming vtable summary, and now Full diff: https://github.com/llvm/llvm-project/pull/164046.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp b/llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
index 2d5cb8268ffdd..e37e0d3142224 100644
--- a/llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
+++ b/llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
@@ -1142,7 +1142,11 @@ bool DevirtIndex::tryFindVirtualCallTargets(
if (LocalFound)
return false;
LocalFound = true;
- }
+ } else
+ // Don't expect to find a mix of locals and non-locals (due to path
+ // prefix for locals one should never have the same GUID as a
+ // non-local).
+ assert(!LocalFound);
auto *CurVS = cast<GlobalVarSummary>(S->getBaseObject());
if (!CurVS->vTableFuncs().empty() ||
// Previously clang did not attach the necessary type metadata to
@@ -1158,6 +1162,10 @@ bool DevirtIndex::tryFindVirtualCallTargets(
// with public LTO visibility.
if (VS->getVCallVisibility() == GlobalObject::VCallVisibilityPublic)
return false;
+ // Unless VS is a local, we don't need to keep looking through the rest
+ // of the summaries.
+ if (!LocalFound)
+ break;
}
}
// There will be no VS if all copies are available_externally having no
diff --git a/llvm/test/ThinLTO/X86/devirt_external_comdat_same_guid.ll b/llvm/test/ThinLTO/X86/devirt_external_comdat_same_guid.ll
index 1f0737b719254..d0b8d14777f52 100644
--- a/llvm/test/ThinLTO/X86/devirt_external_comdat_same_guid.ll
+++ b/llvm/test/ThinLTO/X86/devirt_external_comdat_same_guid.ll
@@ -24,9 +24,9 @@
; RUN: llvm-dis %t5.1.4.opt.bc -o - | FileCheck %s --check-prefix=CHECK-IR1
; RUN: llvm-dis %t5.2.4.opt.bc -o - | FileCheck %s --check-prefix=CHECK-IR2
-; PRINT-DAG: Devirtualized call to {{.*}} (_ZN1A1nEi)
+; PRINT-DAG: Devirtualized call to {{.*}} (_ZN1B1nEi)
-; REMARK-DAG: single-impl: devirtualized a call to _ZN1A1nEi
+; REMARK-DAG: single-impl: devirtualized a call to _ZN1B1nEi
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"
@@ -55,7 +55,7 @@ entry:
ret i32 0
}
-; CHECK-IR1: define i32 @test(
+; CHECK-IR1: define noundef i32 @test(
define i32 @test(ptr %obj, i32 %a) {
entry:
%vtable = load ptr, ptr %obj
@@ -65,7 +65,7 @@ entry:
%fptr1 = load ptr, ptr %fptrptr, align 8
; Check that the call was devirtualized.
- ; CHECK-IR1: tail call i32 {{.*}}@_ZN1A1nEi
+ ; CHECK-IR1: tail call i32 {{.*}}@_ZN1B1nEi
%call = tail call i32 %fptr1(ptr nonnull %obj, i32 %a)
ret i32 %call
@@ -73,7 +73,7 @@ entry:
; CHECK-IR2: define i32 @test2
; Check that the call was devirtualized.
-; CHECK-IR2: tail call i32 @_ZN1A1nEi
+; CHECK-IR2: tail call i32 @_ZN1B1nEi
declare i1 @llvm.type.test(ptr, metadata)
declare void @llvm.assume(i1)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad for committing ef97725 directly when I actually meant to do suggest edit. Not an excuse but in the other code review tool that I'm more familiar with, editing a file directly gives suggested edit..
The commit itself means to use bracket following
llvm-project/llvm/docs/CodingStandards.rst
Lines 1732 to 1740 in a3557c3
| // Use braces for the `if` block to keep it uniform with the `else` block. | |
| if (isa<FunctionDecl>(D)) { | |
| handleFunctionDecl(D); | |
| } else { | |
| // In this `else` case, it is necessary that we explain the situation with | |
| // this surprisingly long comment, so it would be unclear without the braces | |
| // whether the following statement is in the scope of the `if`. | |
| handleOtherDecl(D); | |
| } |
LGTM except the nits.
| // Don't expect to find a mix of locals and non-locals (due to path | ||
| // prefix for locals one should never have the same GUID as a | ||
| // non-local). | ||
| assert(!LocalFound); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| assert(!LocalFound); | |
| assert(!LocalFound && "Local and non-local should have different GUIDs"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://llvm.org/docs/CodingStandards.html#assert-liberally suggests put some kind of error message in the assertion statement, which is printed if the assertion is tripped
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to update this PR to rebase it on top of PR 164647, and this assert is no longer relevant in any case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
If I understand the series of reworked optimization correctly, the idea is to skip loop iteration selectively when GlobalValueSummaryList bit states shows it's safe to do so. Statistically this optimization applies for many summary lists and LTO indexing will speed up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. PR has been updated, I'll wait until the tests finish running but let me know if you have any other concerns.
IIUC this is an artifact of Before this change the loop uses the last summary, and after this change the loop exits after finding the first summary, so WAI (the |
Correct. |
We are scanning through every single definition of a vtable across all translation units which is unnecessary in most cases. If this is a local, we want to make sure there isn't another local with the same GUID due to it having the same relative path. However, we were always scanning through every single summary in all cases. We shouldn't have locals and non-locals with the same GUID since local GUIDs are computed from "path:name" and non-locals from just "name". So once we find a non-local vtable summary meeting the other constraints we can just use it. This cut down a large thin link by around 6%, which was over half the time it spent in WPD. Note that we previously took the last conforming vtable summary, and now we use the first. This caused a test difference in one somewhat contrived test for vtables in comdats.
ef97725 to
0c3d827
Compare
| // and therefore the same GUID. This can happen if there isn't enough | ||
| // distinguishing path when compiling the source file. In that case we | ||
| // conservatively return false early. | ||
| if (P.VTableVI.hasLocal() && P.VTableVI.getSummaryList().size() > 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit pick on the compile error: I think ValueInfo struct needs to have a method to expose hasLocal state, something like
struct ValueInfo {
...
bool hasLocal() const {
return getRef().second.hasLocal();
}
...
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have that locally, but the branch seems hopelessly messed up right now despite force pushing etc. I may have to close this PR and start a new one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok I think my latest force push added them
| // with public LTO visibility. | ||
| if (VS->getVCallVisibility() == GlobalObject::VCallVisibilityPublic) | ||
| return false; | ||
| break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before this change, loop iterates all VS and returns false if any VS has public visibility.
After this change, the visibility check happens on the first VS. Is my understanding correct that we assumes one-definition-rule and all vtables with the same GUID has the same visibility?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, per https://clang.llvm.org/docs/LTOVisibility.html: "A class’s LTO visibility is treated as an ODR-relevant property of its definition, so it must be consistent between translation units."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the pointer. The optimization on finding representative vtable value summary LGTM.
38bbb4e to
3cb75fd
Compare
| // with public LTO visibility. | ||
| if (VS->getVCallVisibility() == GlobalObject::VCallVisibilityPublic) | ||
| return false; | ||
| break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the pointer. The optimization on finding representative vtable value summary LGTM.
We are scanning through every single definition of a vtable across all
translation units which is unnecessary in most cases.
If this is a local, we want to make sure there isn't another local with
the same GUID due to it having the same relative path. However, we were
always scanning through every single summary in all cases.
We can now check the new HasLocal flag added in PR164647 ahead of the loop,
instead of checking on every iteration.
This cut down a large thin link by around 6%, which was over half the
time it spent in WPD.
Note that we previously took the last conforming vtable summary, and now
we use the first. This caused a test difference in one somewhat
contrived test for vtables in comdats.