-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. #81442
base: main
Are you sure you want to change the base?
Conversation
Created using spr 1.3.4 [skip ci]
…hCond from versionCallSite
…h vtable-based comparison
✅ With the latest revision this PR passed the C/C++ code formatter. |
indirect-call-promotion with vtable profiles. Clang's `-fwhole-program-vtables` is required for this optimization to take place. If `-fwhole-program-vtables` is not enabled, this change is no-op. Function-comparison (before): VTable-comparison (after): Key changes: 1. Find out virtual calls and the vtables they come from. - The ICP relies on type intrinsic `llvm.type.test` and `llvm.public.type.test` to find out virtual calls and the compatible vtables, and relies on type metadata to find the address point (offset) for comparison. 2. ICP pass does cost-benefit analysis and compares vtable only when both conditions are met 1) The function addressing and vtable load can sink to indirect fallback, and the indirect fallback is cold block 2) The number of vtables for a function candidate is within (option specified) threshold. 3. Sink the function addressing and vtable load instruction to indirect fallback. - The sink helper functions are simplified versions of `InstCombinerImpl::tryToSinkInstruction`. - The helper functions to handle debug intrinsics are copied from `InstCombinerImpl::tryToSinkInstructionDbgValues` and `InstCombinerImpl::tryToSinkInstructionDbgVariableRecords` into Transforms/Utils/Local.cpp. Ideally only one copy should exist for inst-combine, icp and other passes. 4. Keep value profiles updated 1) Update vtable value profiles after inline 2) For either function-based comparison or vtable-based comparison, update both vtable and indirect call value profiles.
54adf41
to
ff3c219
Compare
@llvm/pr-subscribers-clang @llvm/pr-subscribers-llvm-analysis Author: Mingming Liu (minglotus-6) ChangesClang's
Key changes:
Patch is 83.36 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/81442.diff 13 Files Affected:
diff --git a/compiler-rt/test/profile/Linux/instrprof-vtable-value-prof.cpp b/compiler-rt/test/profile/Linux/instrprof-vtable-value-prof.cpp
index e51805bdf923c..73921adcc0c15 100644
--- a/compiler-rt/test/profile/Linux/instrprof-vtable-value-prof.cpp
+++ b/compiler-rt/test/profile/Linux/instrprof-vtable-value-prof.cpp
@@ -5,59 +5,61 @@
// ld.lld: error: /lib/../lib64/Scrt1.o: ABI version 1 is not supported
// UNSUPPORTED: ppc && host-byteorder-big-endian
-// RUN: %clangxx_pgogen -fuse-ld=lld -O2 -g -fprofile-generate=. -mllvm -enable-vtable-value-profiling %s -o %t-test
-// RUN: env LLVM_PROFILE_FILE=%t-test.profraw %t-test
+// RUN: rm -rf %t && mkdir %t && cd %t
+
+// RUN: %clangxx_pgogen -fuse-ld=lld -O2 -fprofile-generate=. -mllvm -enable-vtable-value-profiling %s -o test
+// RUN: env LLVM_PROFILE_FILE=test.profraw ./test
// Show vtable profiles from raw profile.
-// RUN: llvm-profdata show --function=main --ic-targets --show-vtables %t-test.profraw | FileCheck %s --check-prefixes=COMMON,RAW
+// RUN: llvm-profdata show --function=main --ic-targets --show-vtables test.profraw | FileCheck %s --check-prefixes=COMMON,RAW
// Generate indexed profile from raw profile and show the data.
-// RUN: llvm-profdata merge %t-test.profraw -o %t-test.profdata
-// RUN: llvm-profdata show --function=main --ic-targets --show-vtables %t-test.profdata | FileCheck %s --check-prefixes=COMMON,INDEXED
+// RUN: llvm-profdata merge test.profraw -o test.profdata
+// RUN: llvm-profdata show --function=main --ic-targets --show-vtables test.profdata | FileCheck %s --check-prefixes=COMMON,INDEXED
// Generate text profile from raw and indexed profiles respectively and show the data.
-// RUN: llvm-profdata merge --text %t-test.profraw -o %t-raw.proftext
-// RUN: llvm-profdata show --function=main --ic-targets --show-vtables --text %t-raw.proftext | FileCheck %s --check-prefix=ICTEXT
-// RUN: llvm-profdata merge --text %t-test.profdata -o %t-indexed.proftext
-// RUN: llvm-profdata show --function=main --ic-targets --show-vtables --text %t-indexed.proftext | FileCheck %s --check-prefix=ICTEXT
+// RUN: llvm-profdata merge --text test.profraw -o raw.proftext
+// RUN: llvm-profdata show --function=main --ic-targets --show-vtables --text raw.proftext | FileCheck %s --check-prefix=ICTEXT
+// RUN: llvm-profdata merge --text test.profdata -o indexed.proftext
+// RUN: llvm-profdata show --function=main --ic-targets --show-vtables --text indexed.proftext | FileCheck %s --check-prefix=ICTEXT
// Generate indexed profile from text profiles and show the data
-// RUN: llvm-profdata merge --binary %t-raw.proftext -o %t-text.profraw
-// RUN: llvm-profdata show --function=main --ic-targets --show-vtables %t-text.profraw | FileCheck %s --check-prefixes=COMMON,INDEXED
-// RUN: llvm-profdata merge --binary %t-indexed.proftext -o %t-text.profdata
-// RUN: llvm-profdata show --function=main --ic-targets --show-vtables %t-text.profdata | FileCheck %s --check-prefixes=COMMON,INDEXED
+// RUN: llvm-profdata merge --binary raw.proftext -o text.profraw
+// RUN: llvm-profdata show --function=main --ic-targets --show-vtables text.profraw | FileCheck %s --check-prefixes=COMMON,INDEXED
+// RUN: llvm-profdata merge --binary indexed.proftext -o text.profdata
+// RUN: llvm-profdata show --function=main --ic-targets --show-vtables text.profdata | FileCheck %s --check-prefixes=COMMON,INDEXED
// COMMON: Counters:
// COMMON-NEXT: main:
-// COMMON-NEXT: Hash: 0x0f9a16fe6d398548
-// COMMON-NEXT: Counters: 2
+// COMMON-NEXT: Hash: 0x068617320ec408a0
+// COMMON-NEXT: Counters: 4
// COMMON-NEXT: Indirect Call Site Count: 2
// COMMON-NEXT: Number of instrumented vtables: 2
// RAW: Indirect Target Results:
-// RAW-NEXT: [ 0, _ZN8Derived15func1Eii, 250 ] (25.00%)
-// RAW-NEXT: [ 0, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func1Eii, 750 ] (75.00%)
-// RAW-NEXT: [ 1, _ZN8Derived15func2Eii, 250 ] (25.00%)
-// RAW-NEXT: [ 1, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func2Eii, 750 ] (75.00%)
+// RAW-NEXT: [ 0, _ZN8Derived14funcEii, 50 ] (25.00%)
+// RAW-NEXT: [ 0, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived24funcEii, 150 ] (75.00%)
+// RAW-NEXT: [ 1, _ZN8Derived1D0Ev, 250 ] (25.00%)
+// RAW-NEXT: [ 1, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived2D0Ev, 750 ] (75.00%)
// RAW-NEXT: VTable Results:
-// RAW-NEXT: [ 0, _ZTV8Derived1, 250 ] (25.00%)
-// RAW-NEXT: [ 0, {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E, 750 ] (75.00%)
+// RAW-NEXT: [ 0, _ZTV8Derived1, 50 ] (25.00%)
+// RAW-NEXT: [ 0, {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E, 150 ] (75.00%)
// RAW-NEXT: [ 1, _ZTV8Derived1, 250 ] (25.00%)
// RAW-NEXT: [ 1, {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E, 750 ] (75.00%)
// INDEXED: Indirect Target Results:
-// INDEXED-NEXT: [ 0, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func1Eii, 750 ] (75.00%)
-// INDEXED-NEXT: [ 0, _ZN8Derived15func1Eii, 250 ] (25.00%)
-// INDEXED-NEXT: [ 1, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func2Eii, 750 ] (75.00%)
-// INDEXED-NEXT: [ 1, _ZN8Derived15func2Eii, 250 ] (25.00%)
+// INDEXED-NEXT: [ 0, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived24funcEii, 150 ] (75.00%)
+// INDEXED-NEXT: [ 0, _ZN8Derived14funcEii, 50 ] (25.00%)
+// INDEXED-NEXT: [ 1, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived2D0Ev, 750 ] (75.00%)
+// INDEXED-NEXT: [ 1, _ZN8Derived1D0Ev, 250 ] (25.00%)
// INDEXED-NEXT: VTable Results:
-// INDEXED-NEXT: [ 0, {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E, 750 ] (75.00%)
-// INDEXED-NEXT: [ 0, _ZTV8Derived1, 250 ] (25.00%)
+// INDEXED-NEXT: [ 0, {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E, 150 ] (75.00%)
+// INDEXED-NEXT: [ 0, _ZTV8Derived1, 50 ] (25.00%)
// INDEXED-NEXT: [ 1, {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E, 750 ] (75.00%)
// INDEXED-NEXT: [ 1, _ZTV8Derived1, 250 ] (25.00%)
// COMMON: Instrumentation level: IR entry_first = 0
// COMMON-NEXT: Functions shown: 1
-// COMMON-NEXT: Total functions: 6
+// COMMON-NEXT: Total functions: 7
// COMMON-NEXT: Maximum function count: 1000
-// COMMON-NEXT: Maximum internal block count: 250
+// COMMON-NEXT: Maximum internal block count: 1000
// COMMON-NEXT: Statistics for indirect call sites profile:
// COMMON-NEXT: Total number of sites: 2
// COMMON-NEXT: Total number of sites with values: 2
@@ -76,11 +78,13 @@
// ICTEXT: :ir
// ICTEXT: main
// ICTEXT: # Func Hash:
-// ICTEXT: 1124236338992350536
+// ICTEXT: 470088714870327456
// ICTEXT: # Num Counters:
-// ICTEXT: 2
+// ICTEXT: 4
// ICTEXT: # Counter Values:
// ICTEXT: 1000
+// ICTEXT: 1000
+// ICTEXT: 200
// ICTEXT: 1
// ICTEXT: # Num Value Kinds:
// ICTEXT: 2
@@ -89,41 +93,50 @@
// ICTEXT: # NumValueSites:
// ICTEXT: 2
// ICTEXT: 2
-// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func1Eii:750
-// ICTEXT: _ZN8Derived15func1Eii:250
+// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived24funcEii:150
+// ICTEXT: _ZN8Derived14funcEii:50
// ICTEXT: 2
-// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func2Eii:750
-// ICTEXT: _ZN8Derived15func2Eii:250
+// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived2D0Ev:750
+// ICTEXT: _ZN8Derived1D0Ev:250
// ICTEXT: # ValueKind = IPVK_VTableTarget:
// ICTEXT: 2
// ICTEXT: # NumValueSites:
// ICTEXT: 2
// ICTEXT: 2
-// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E:750
-// ICTEXT: _ZTV8Derived1:250
+// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E:150
+// ICTEXT: _ZTV8Derived1:50
// ICTEXT: 2
// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E:750
// ICTEXT: _ZTV8Derived1:250
+// Test indirect call promotion transformation using vtable profiles.
+// RUN: %clangxx -fprofile-use=test.profdata -fuse-ld=lld -flto=thin -fwhole-program-vtables -O2 -mllvm -enable-vtable-value-profiling -mllvm -icp-enable-vtable-cmp -Rpass=pgo-icall-prom %s 2>&1 | FileCheck %s --check-prefix=REMARK --implicit-check-not="!VP"
+
+// REMARK: Promote indirect call to _ZN12_GLOBAL__N_18Derived24funcEii with count 150 out of 200, compare 1 vtables and sink 1 instructions
+// REMARK: Promote indirect call to _ZN8Derived14funcEii with count 50 out of 50, compare 1 vtables and sink 1 instructions
+// REMARK: Promote indirect call to _ZN12_GLOBAL__N_18Derived2D0Ev with count 750 out of 1000, compare 1 vtables and sink 2 instructions
+// REMARK: Promote indirect call to _ZN8Derived1D0Ev with count 250 out of 250, compare 1 vtables and sink 2 instructions
+
#include <cstdio>
#include <cstdlib>
class Base {
public:
- virtual int func1(int a, int b) = 0;
- virtual int func2(int a, int b) = 0;
+ virtual int func(int a, int b) = 0;
+
+ virtual ~Base() {};
};
class Derived1 : public Base {
public:
- int func1(int a, int b) override { return a + b; }
+ int func(int a, int b) override { return a * b; }
- int func2(int a, int b) override { return a * b; }
+ ~Derived1() {}
};
namespace {
class Derived2 : public Base {
public:
- int func1(int a, int b) override { return a - b; }
+ int func(int a, int b) override { return a * (a - b); }
- int func2(int a, int b) override { return a * (a - b); }
+ ~Derived2() {}
};
} // namespace
__attribute__((noinline)) Base *createType(int a) {
@@ -140,7 +153,10 @@ int main(int argc, char **argv) {
int a = rand();
int b = rand();
Base *ptr = createType(i);
- sum += ptr->func1(a, b) + ptr->func2(b, a);
+ if (i % 5 == 0)
+ sum += ptr->func(b, a);
+
+ delete ptr;
}
printf("sum is %d\n", sum);
return 0;
diff --git a/llvm/include/llvm/Analysis/IndirectCallPromotionAnalysis.h b/llvm/include/llvm/Analysis/IndirectCallPromotionAnalysis.h
index 8a05e913a9106..eda672d7d50ee 100644
--- a/llvm/include/llvm/Analysis/IndirectCallPromotionAnalysis.h
+++ b/llvm/include/llvm/Analysis/IndirectCallPromotionAnalysis.h
@@ -57,7 +57,7 @@ class ICallPromotionAnalysis {
///
/// The returned array space is owned by this class, and overwritten on
/// subsequent calls.
- ArrayRef<InstrProfValueData>
+ MutableArrayRef<InstrProfValueData>
getPromotionCandidatesForInstruction(const Instruction *I, uint32_t &NumVals,
uint64_t &TotalCount,
uint32_t &NumCandidates);
diff --git a/llvm/include/llvm/Analysis/IndirectCallVisitor.h b/llvm/include/llvm/Analysis/IndirectCallVisitor.h
index 66c972572b06c..f070e83c41689 100644
--- a/llvm/include/llvm/Analysis/IndirectCallVisitor.h
+++ b/llvm/include/llvm/Analysis/IndirectCallVisitor.h
@@ -37,6 +37,9 @@ struct PGOIndirectCallVisitor : public InstVisitor<PGOIndirectCallVisitor> {
// A heuristic is used to find the address feeding instructions.
static Instruction *tryGetVTableInstruction(CallBase *CB) {
assert(CB != nullptr && "Caller guaranteed");
+ if (!CB->isIndirectCall())
+ return nullptr;
+
LoadInst *LI = dyn_cast<LoadInst>(CB->getCalledOperand());
if (LI != nullptr) {
diff --git a/llvm/include/llvm/Transforms/Utils/Local.h b/llvm/include/llvm/Transforms/Utils/Local.h
index 6937ec8dfd21c..5535a722a40fe 100644
--- a/llvm/include/llvm/Transforms/Utils/Local.h
+++ b/llvm/include/llvm/Transforms/Utils/Local.h
@@ -316,6 +316,15 @@ void salvageDebugInfoForDbgValues(Instruction &I,
ArrayRef<DbgVariableIntrinsic *> Insns,
ArrayRef<DbgVariableRecord *> DPInsns);
+void tryToSinkInstructionDbgValues(
+ Instruction *I, BasicBlock::iterator InsertPos, BasicBlock *SrcBlock,
+ BasicBlock *DestBlock, SmallVectorImpl<DbgVariableIntrinsic *> &DbgUsers);
+
+void tryToSinkInstructionDPValues(
+ Instruction *I, BasicBlock::iterator InsertPos, BasicBlock *SrcBlock,
+ BasicBlock *DestBlock,
+ SmallVectorImpl<DbgVariableRecord *> &DbgVariableRecords);
+
/// Given an instruction \p I and DIExpression \p DIExpr operating on
/// it, append the effects of \p I to the DIExpression operand list
/// \p Ops, or return \p nullptr if it cannot be salvaged.
diff --git a/llvm/lib/Analysis/IndirectCallPromotionAnalysis.cpp b/llvm/lib/Analysis/IndirectCallPromotionAnalysis.cpp
index ab53717eb889a..643c155ba6d7e 100644
--- a/llvm/lib/Analysis/IndirectCallPromotionAnalysis.cpp
+++ b/llvm/lib/Analysis/IndirectCallPromotionAnalysis.cpp
@@ -87,7 +87,7 @@ uint32_t ICallPromotionAnalysis::getProfitablePromotionCandidates(
return I;
}
-ArrayRef<InstrProfValueData>
+MutableArrayRef<InstrProfValueData>
ICallPromotionAnalysis::getPromotionCandidatesForInstruction(
const Instruction *I, uint32_t &NumVals, uint64_t &TotalCount,
uint32_t &NumCandidates) {
@@ -96,8 +96,8 @@ ICallPromotionAnalysis::getPromotionCandidatesForInstruction(
ValueDataArray.get(), NumVals, TotalCount);
if (!Res) {
NumCandidates = 0;
- return ArrayRef<InstrProfValueData>();
+ return MutableArrayRef<InstrProfValueData>();
}
NumCandidates = getProfitablePromotionCandidates(I, NumVals, TotalCount);
- return ArrayRef<InstrProfValueData>(ValueDataArray.get(), NumVals);
+ return MutableArrayRef<InstrProfValueData>(ValueDataArray.get(), NumVals);
}
diff --git a/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp b/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
index 23a7c6a20aecb..4de0aaef8d7ca 100644
--- a/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
+++ b/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
@@ -13,13 +13,17 @@
//===----------------------------------------------------------------------===//
#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Analysis/IndirectCallPromotionAnalysis.h"
#include "llvm/Analysis/IndirectCallVisitor.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/Analysis/ProfileSummaryInfo.h"
+#include "llvm/Analysis/TypeMetadataUtils.h"
+#include "llvm/IR/DebugInfo.h"
#include "llvm/IR/DiagnosticInfo.h"
+#include "llvm/IR/Dominators.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Instructions.h"
@@ -37,6 +41,7 @@
#include "llvm/Transforms/Instrumentation.h"
#include "llvm/Transforms/Instrumentation/PGOInstrumentation.h"
#include "llvm/Transforms/Utils/CallPromotionUtils.h"
+#include "llvm/Transforms/Utils/Local.h"
#include <cassert>
#include <cstdint>
#include <memory>
@@ -51,6 +56,8 @@ using namespace llvm;
STATISTIC(NumOfPGOICallPromotion, "Number of indirect call promotions.");
STATISTIC(NumOfPGOICallsites, "Number of indirect call candidate sites.");
+extern cl::opt<unsigned> MaxNumVTableAnnotations;
+
// Command line option to disable indirect-call promotion with the default as
// false. This is for debug purpose.
static cl::opt<bool> DisableICP("disable-icp", cl::init(false), cl::Hidden,
@@ -103,13 +110,202 @@ static cl::opt<bool>
ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
cl::desc("Dump IR after transformation happens"));
+// This option is meant to be used by LLVM regression test and test the
+// transformation that compares vtables.
+static cl::opt<bool> ICPEnableVTableCmp(
+ "icp-enable-vtable-cmp", cl::init(false), cl::Hidden,
+ cl::desc("If ThinLTO and WPD is enabled and this option is true, "
+ "indirect-call promotion pass will compare vtables rather than "
+ "functions for speculative devirtualization of virtual calls."
+ " If set to false, indirect-call promotion pass will always "
+ "compare functions."));
+
+static cl::opt<float>
+ ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99),
+ cl::Hidden,
+ cl::desc("Percentage of vtable count to compare"));
+
+static cl::opt<int> ICPNumAdditionalVTableLast(
+ "icp-num-additional-vtable-last", cl::init(0), cl::Hidden,
+ cl::desc("The number of additional instruction for the last candidate"));
+
namespace {
+using VTableAddressPointOffsetValMap =
+ SmallDenseMap<const GlobalVariable *, SmallDenseMap<int, Constant *, 4>, 8>;
+
+// A struct to collect type information for a virtual call site.
+struct VirtualCallSiteInfo {
+ // The offset from the address point to virtual function in the vtable.
+ uint64_t FunctionOffset;
+ // The instruction that computes the address point of vtable.
+ Instruction *VPtr;
+ // The compatible type used in LLVM type intrinsics.
+ StringRef CompatibleTypeStr;
+};
+
+// The key is a virtual call, and value is its type information.
+using VirtualCallSiteTypeInfoMap =
+ SmallDenseMap<const CallBase *, VirtualCallSiteInfo, 8>;
+
+// Find the offset where type string is `CompatibleType`.
+static std::optional<uint64_t>
+getCompatibleTypeOffset(const GlobalVariable &VTableVar,
+ StringRef CompatibleType) {
+ SmallVector<MDNode *, 2> Types; // type metadata associated with a vtable.
+ VTableVar.getMetadata(LLVMContext::MD_type, Types);
+
+ for (MDNode *Type : Types)
+ if (auto *TypeId = dyn_cast<MDString>(Type->getOperand(1).get());
+ TypeId && TypeId->getString() == CompatibleType)
+
+ return cast<ConstantInt>(
+ cast<ConstantAsMetadata>(Type->getOperand(0))->getValue())
+ ->getZExtValue();
+
+ return std::nullopt;
+}
+
+// Returns a constant representing the vtable's address point specified by the
+// offset.
+static Constant *getVTableAddressPointOffset(GlobalVariable *VTable,
+ uint32_t AddressPointOffset) {
+ Module &M = *VTable->getParent();
+ LLVMContext &Context = M.getContext();
+ assert(AddressPointOffset <
+ M.getDataLayout().getTypeAllocSize(VTable->getValueType()) &&
+ "Out-of-bound access");
+
+ return ConstantExpr::getInBoundsGetElementPtr(
+ Type::getInt8Ty(Context), VTable,
+ llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset));
+}
+
+// Returns the basic block in which `Inst` by `Use`.
+static BasicBlock *getUserBasicBlock(Instruction *Inst, unsigned int OperandNo,
+ Instruction *UserInst) {
+ if (PHINode *PN = dyn_cast<PHINode>(UserInst))
+ return PN->getIncomingBlock(
+ PHINode::getIncomingValueNumForOperand(OperandNo));
+
+ return UserInst->getParent();
+}
+
+// `DestBB` is a suitable basic block to sink `Inst` into when the following
+// conditions are true:
+// 1) `Inst->getParent()` is the sole predecessor of `DestBB`. This way `DestBB`
+// is dominated by `Inst->getParent()` and we don't need to sink across a
+// critical edge.
+// 2) `Inst` have users and all users are in `DestBB`.
+static bool isDestBBSuitableForSink(Instruction *Inst, BasicBlock *DestBB) {
+ BasicBlock *BB = Inst->getParent();
+ assert(Inst->getParent() != DestBB &&
+ BB->getTerminator()->getNumSuccessors() == 2 &&
+ "Caller should guarantee");
+ // Do not sink across a critical edge for simplicity.
+ if (DestBB->getUniquePredecessor() != BB)
+ return false;
+
+ // Now we know BB dominates DestBB.
+ BasicBlock *UserBB = nullptr;
+ for (Use &Use : Inst->uses()) {
+ User *User = Use.getUser();
+ // Do checked cast since IR verifier guarantees that the user of an
+ // instruction must be an instruction. See `Verifier::visitInstruction`.
+ Instruction *UserInst = cast<Instruction>(User);
+ // We can sink debug or pseudo instructions together with Inst.
+ if (UserInst->isDebugOrPseudoInst())
+ continue;
+ UserBB = getUserBasicBlock(Inst, Use.getOperandNo(), UserInst);
+ // Do not...
[truncated]
|
Can we rely on instcombine to do the sinking? (the patch summary has a small error -- the comparison instruction in the example should be %cond = icmp eq ptr %vtable, @vtable-address-point |
Good question. https://gcc.godbolt.org/z/G7Paj7h37 has two examples, |
thanks. I corrected it. |
@@ -1967,11 +1969,23 @@ void llvm::updateProfileCallee( | |||
uint64_t CloneEntryCount = PriorEntryCount - NewEntryCount; | |||
for (auto Entry : *VMap) { | |||
if (isa<CallInst>(Entry.first)) | |||
if (auto *CI = dyn_cast_or_null<CallInst>(Entry.second)) | |||
if (auto *CI = dyn_cast_or_null<CallInst>(Entry.second)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the tryGetVTableInstuction can handle non-call/invoke instruction (returns null), then the code duplication can be avoided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to not further duplicate the code. How about something like this?
if (isa<CallBase>(Entry.first)) {
if(auto* CB = dyn_cast_or_null<CallBase>(Entry.Base)) {
CB->updateProfWeight(...
Instruction* VPtr = PGOIndirectCallVisitor::tryGetVTableInstruction(CB);
if (VPtr)
scaleProfData(VPtr...)
}
CallBase
is the base class of five IR instructions [1] now. Among them CoroAwaitSuspendInst
[1] and GCStatePointInst
[2] are relatively new; the rest three are well understood. Despite more derived classes of CallBase
might be added for new cases, the code above should be future-proof since updateProfWeight
and scaleProfData
are programmed to !prof
(not CallBase
).
[1]
llvm-project/llvm/lib/Transforms/Coroutines/CoroInstr.h
Lines 85 to 88 in 692ae54
class LLVM_LIBRARY_VISIBILITY CoroAwaitSuspendInst : public CallBase { | |
enum { AwaiterArg, FrameArg, WrapperArg }; | |
public: |
[2]
llvm-project/llvm/include/llvm/IR/Statepoint.h
Lines 61 to 63 in 692ae54
class GCStatepointInst : public CallBase { | |
public: | |
GCStatepointInst() = delete; |
@@ -316,6 +316,15 @@ void salvageDebugInfoForDbgValues(Instruction &I, | |||
ArrayRef<DbgVariableIntrinsic *> Insns, | |||
ArrayRef<DbgVariableRecord *> DPInsns); | |||
|
|||
void tryToSinkInstructionDbgValues( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add description for the methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug intrinsics are not handled in this PR, I un-do Local.h/cpp
changes (and overlooked this comment).
The updated compiler-rt test (instrprof-vtable-value-prof.cpp) enabled -g
and tested IR for the first callsite.
llvm/lib/Transforms/Utils/Local.cpp
Outdated
@@ -2538,6 +2538,190 @@ Value *getSalvageOpsForIcmpOp(ICmpInst *Icmp, uint64_t CurrentLocOps, | |||
return Icmp->getOperand(0); | |||
} | |||
|
|||
void llvm::tryToSinkInstructionDbgValues( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there test coverage for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The gep
and load
sequence doesn't have debug values (so far) so it's not easy to test within this PR.
IMO helper functions that handles debug value along instruction sink (e.g.
void InstCombinerImpl::tryToSinkInstructionDbgValues( |
cl::Hidden, | ||
cl::desc("Percentage of vtable count to compare")); | ||
|
||
static cl::opt<int> ICPNumAdditionalVTableLast( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment for this option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
SmallVector<MDNode *, 2> Types; // type metadata associated with a vtable. | ||
VTableVar.getMetadata(LLVMContext::MD_type, Types); | ||
|
||
for (MDNode *Type : Types) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why can it have more than one types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type metadata are generated for compatible vtables and virtual functions.
https://gcc.godbolt.org/z/rr5YY7cdv is an example to show the !type
metadata list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are the compatible types aka the static types that can actually contain a pointer to this vtable. E.g. something statically declared to have type Base*
can point to a Derived*
but not vice versa. So the Base vtable is only compatible with the Base*
type and a Derived vtable is compatible with both Base*
and Derived*
types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you are getting here is the address point offset of the given compatible type within the given vtable. See also https://llvm.org/docs/TypeMetadata.html#representing-type-information-using-type-metadata. It might be good to add a comment describing this more explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the function comment based on Teresa's input, and renamed the function to getAddressPointOffset
} | ||
|
||
// Returns the basic block in which `Inst` by `Use`. | ||
static BasicBlock *getUserBasicBlock(Instruction *Inst, unsigned int OperandNo, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is "Inst" used in the function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the catch. I removed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meanwhile, I can find a few places where the gist of helper function is used. I'm planning to have some canonical helper functions maybe in llvm/lib/Transforms/Util
directory.
Examples:
llvm-project/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
Lines 1139 to 1143 in ce5b371
BasicBlock *UseBB = User->getParent(); if (PHINode *P = dyn_cast<PHINode>(User)) { unsigned i = PHINode::getIncomingValueNumForOperand(U.getOperandNo()); UseBB = P->getIncomingBlock(i); llvm-project/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
Lines 3517 to 3520 in ce5b371
const BasicBlock *UseBB = !isa<PHINode>(UserInst) ? UserInst->getParent() : cast<PHINode>(UserInst)->getIncomingBlock( PHINode::getIncomingValueNumForOperand(U.getOperandNo())); - (made closer to this function by my change [nfc][InstCombine]Find PHI incoming block by operand number #93249)
llvm-project/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
Lines 5011 to 5014 in 5f243b3
// Special handling for Phi nodes - get the block the use occurs in. BasicBlock *UserBB = UserInst->getParent(); if (PHINode *PN = dyn_cast<PHINode>(UserInst)) UserBB = PN->getIncomingBlock(U);
|
||
int SinkCount = 0; | ||
// FIXME: Find a way to bail out of the loop. | ||
for (Instruction &I : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does the code assume 1 instruction to be dropped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lines to check instruction sink legality (https://github.com/llvm/llvm-project/pull/81442/files#diff-a95d1ac8a0da69713fcb3346135d4b219f0a73920318d2549495620ea215191bR237-R267) doesn't rely on this. Instead, those lines are strict about compile-time correctness.
In reality, getelementptr
and load
should sink to indirect fallback for real-world cases. When it's not legal to sink them, the implementation here won't do it.
- Note
getelementptr
andload
are two IR instructions but likely one machine instruction after instruction selection. - As indirect-call-promotion interleaves with instruction sink, we begin to see simple blocks like
else-bb
(3 IR instructions) starting from the second candidate of one indirect call site.
bb0:
vtable = load obj
cmp = icmp eq vtable, ...
br cmp if-bb, else-bb
if-bb:
call direct-call
br bb-merge
else-bb:
func-addr = getelementptr
func-ptr = load func-addr
call func-ptr
br bb-merge
bb-merge:
res = phi ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question is about the drop_begin(..) call, but it is correct as the last instruction is always control transfer instruction that should be skipped. Perhaps add a brief comment: try to sink all eligible instructions in OriginalBB in reverse order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question is about the drop_begin(..) call, but it is correct as the last instruction is always control transfer instruction that should be skipped.
Ah I see, thanks for clarifying. I added a brief comment.
Meanwhile pass OriginalBB
as function argument rather than VPtr
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't looked at the tests yet, but some comments/questions from a first pass through most of the code
" If set to false, indirect-call promotion pass will always " | ||
"compare functions.")); | ||
|
||
static cl::opt<float> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
namespace { | ||
|
||
using VTableAddressPointOffsetValMap = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comment describing the contents
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
SmallVector<MDNode *, 2> Types; // type metadata associated with a vtable. | ||
VTableVar.getMetadata(LLVMContext::MD_type, Types); | ||
|
||
for (MDNode *Type : Types) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are the compatible types aka the static types that can actually contain a pointer to this vtable. E.g. something statically declared to have type Base*
can point to a Derived*
but not vice versa. So the Base vtable is only compatible with the Base*
type and a Derived vtable is compatible with both Base*
and Derived*
types.
SmallVector<MDNode *, 2> Types; // type metadata associated with a vtable. | ||
VTableVar.getMetadata(LLVMContext::MD_type, Types); | ||
|
||
for (MDNode *Type : Types) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you are getting here is the address point offset of the given compatible type within the given vtable. See also https://llvm.org/docs/TypeMetadata.html#representing-type-information-using-type-metadata. It might be good to add a comment describing this more explicitly.
llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset)); | ||
} | ||
|
||
// Returns the basic block in which `Inst` by `Use`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite follow the comment, can you expand it a bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original comment is broken. I updated it.
|
||
computeVirtualCallSiteTypeInfoMap(M, MAM, VirtualCSInfo); | ||
|
||
// This map records states across functions in an LLVM IR module. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by "records states"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VTableAddressPointOffsetVal
stores the vtable address points. The vtable address point of a given <vtable, address point offset>
is static (doesn't change after being computed once).
IndirectCallPromoter::getOrCreateVTableAddressPointVar
creates the map entry the first time a <vtable, offset>
pair is seen, as promoteIndirectCalls
processes an IR module and calls IndirectCallPromoter
repeatedly on each function.
I reworded the comments. PTAL.
// Right now only llvm.type.test is used to find out virtual call sites. | ||
// With ThinLTO and whole-program-devirtualization, llvm.type.test and | ||
// llvm.public.type.test are emitted, and llvm.public.type.test is either | ||
// refined to llvm.type.test or dropped before indirect-call-promotion pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we then need to do the below analysis on llvm.public.type.test? ICP should be after it gets converted or dropped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If ICP pass is disabled in prelink, we don't need to analyze llvm.public.type.test
as you point out. llvm.public.type.test
was analyzed here because I didn't disable prelink ICP for compiler-rt test.
Now I updated compiler-rt test to disable prelink ICP and enable postlink ICP, and skip the analysis for llvm.public.type.test
.
return Iter->second; | ||
} | ||
|
||
Instruction *IndirectCallPromoter::computeVTableInfos( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a lot going on in this function - can you add some more description at the top describing how this works and/or some more comments before each block of code within (particularly the latter half of the function)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document output parameters and return value around class method declaration, and added more comments in the function (e.g., one example with detailed steps) as suggested.
VTableGUIDCountsMap &VTableGUIDCounts) { | ||
SmallVector<uint64_t, 4> PromotedFuncCount; | ||
for (const auto &Candidate : Candidates) { | ||
uint64_t IfCount = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name IfCount isn't very intuitive to me, suggest using a different name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This variable represents the 'if-branch' count of the branch weights for conditional devirtualization. I removed it. Now Candidate.Count
represents if branch count and TotalFuncCount - Candidate.Count
represents else branch count.
for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) | ||
SumVTableCount += VTableCount; | ||
|
||
for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment describing the intuition behind the way the counts are being updated here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
|
||
int SinkCount = 0; | ||
// FIXME: Find a way to bail out of the loop. | ||
for (Instruction &I : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question is about the drop_begin(..) call, but it is correct as the last instruction is always control transfer instruction that should be skipped. Perhaps add a brief comment: try to sink all eligible instructions in OriginalBB in reverse order.
return false; | ||
|
||
// Do not sink convergent call instructions. | ||
if (const auto *C = dyn_cast<CallBase>(I)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a common utility function to do the checks below or add a TODO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added FIXME: Move the sink eligibility check below to a utility function in Transforms/Utils/ directory.
.
The sink correctness check is a copy of the core logic in inst-combine pass (after removing one irrelevant case and making the check stricter by removing this)
llvm-project/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
Lines 4684 to 4726 in 1752740
// Cannot move control-flow-involving, volatile loads, vaarg, etc. | |
if (isa<PHINode>(I) || I->isEHPad() || I->mayThrow() || !I->willReturn() || | |
I->isTerminator()) | |
return false; | |
// Do not sink static or dynamic alloca instructions. Static allocas must | |
// remain in the entry block, and dynamic allocas must not be sunk in between | |
// a stacksave / stackrestore pair, which would incorrectly shorten its | |
// lifetime. | |
if (isa<AllocaInst>(I)) | |
return false; | |
// Do not sink into catchswitch blocks. | |
if (isa<CatchSwitchInst>(DestBlock->getTerminator())) | |
return false; | |
// Do not sink convergent call instructions. | |
if (auto *CI = dyn_cast<CallInst>(I)) { | |
if (CI->isConvergent()) | |
return false; | |
} | |
// Unless we can prove that the memory write isn't visibile except on the | |
// path we're sinking to, we must bail. | |
if (I->mayWriteToMemory()) { | |
if (!SoleWriteToDeadLocal(I, TLI)) | |
return false; | |
} | |
// We can only sink load instructions if there is nothing between the load and | |
// the end of block that could change the value. | |
if (I->mayReadFromMemory()) { | |
// We don't want to do any sophisticated alias analysis, so we only check | |
// the instructions after I in I's parent block if we try to sink to its | |
// successor block. | |
if (DestBlock->getUniquePredecessor() != I->getParent()) | |
return false; | |
for (BasicBlock::iterator Scan = std::next(I->getIterator()), | |
E = I->getParent()->end(); | |
Scan != E; ++Scan) | |
if (Scan->mayWriteToMemory()) | |
return false; | |
} |
for (BasicBlock::iterator Scan = std::next(I->getIterator()), | ||
E = I->getParent()->end(); | ||
Scan != E; ++Scan) | ||
if (Scan->mayWriteToMemory()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory, aliasing check can be done here, but probably not important in reality for the code pattern involved.
Clang's
-fwhole-program-vtables
is required for this optimization to take place. If-fwhole-program-vtables
is not enabled, this change is no-op.Key changes:
llvm.type.test
andllvm.public.type.test
to find out virtual calls and thecompatible vtables, and relies on type metadata to find the address point for comparison.
InstCombinerImpl::tryToSinkInstruction
. Currently debug intrinsics are not handled. IdeallyInstCombinerImpl::tryToSinkInstructionDbgValues
andInstCombinerImpl::tryToSinkInstructionDbgVariableRecords
could be moved into Transforms/Utils/Local.cpp (or another util cpp file) to handle debug intrinsics when moving instructions across basic blocks.update both vtable and indirect call value profiles.