Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. #81442

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

minglotus-6
Copy link
Contributor

@minglotus-6 minglotus-6 commented Feb 12, 2024

Clang's -fwhole-program-vtables is required for this optimization to take place. If -fwhole-program-vtables is not enabled, this change is no-op.

  • Function-comparison (before):
%vtable = load ptr, ptr %obj
%vfn = getelementptr inbounds ptr, ptr %vtable, i64 1
%func = load ptr, ptr %vfn
%cond = icmp eq ptr %func, @callee
br i1 %cond, label bb1, label bb2:

bb1:
   call @callee

bb2:
   call %func
  • VTable-comparison (after):
%vtable = load ptr, ptr %obj
%cond = icmp eq ptr %vtable, @vtable-address-point
br i1 %cond, label bb1, label bb2:

bb1:
   call @callee

bb2:
  %vfn = getelementptr inbounds ptr, ptr %vtable, i64 1
  %func = load ptr, ptr %vfn
  call %func

Key changes:

  1. Find out virtual calls and the vtables they come from.
    • The ICP relies on type intrinsic llvm.type.test and llvm.public.type.test to find out virtual calls and the
      compatible vtables, and relies on type metadata to find the address point for comparison.
  2. ICP pass does cost-benefit analysis and compares vtable only when the number of vtables for a function candidate is within (option specified) threshold.
  3. Sink the function addressing and vtable load instruction to indirect fallback.
    • The sink helper functions are simplified versions of InstCombinerImpl::tryToSinkInstruction. Currently debug intrinsics are not handled. Ideally InstCombinerImpl::tryToSinkInstructionDbgValues and InstCombinerImpl::tryToSinkInstructionDbgVariableRecords could be moved into Transforms/Utils/Local.cpp (or another util cpp file) to handle debug intrinsics when moving instructions across basic blocks.
  4. Keep value profiles updated
    1. Update vtable value profiles after inline
    2. For either function-based comparison or vtable-based comparison,
      update both vtable and indirect call value profiles.

@minglotus-6 minglotus-6 changed the base branch from main to users/minglotus-6/spr/vcsv February 12, 2024 06:30
Copy link

github-actions bot commented Feb 12, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Base automatically changed from users/minglotus-6/spr/vcsv to main May 19, 2024 23:33
@minglotus-6 minglotus-6 changed the base branch from main to users/minglotus-6/spr/vcsv May 19, 2024 23:34
@minglotus-6 minglotus-6 changed the base branch from users/minglotus-6/spr/vcsv to main May 20, 2024 04:29
@minglotus-6 minglotus-6 changed the title [TypeProf][IndirectCallPromotion]Implement vtable-based transformation [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. May 28, 2024
indirect-call-promotion with vtable profiles.

Clang's `-fwhole-program-vtables` is required for this optimization to
take place. If `-fwhole-program-vtables` is not enabled, this change is
no-op.

Function-comparison (before):

VTable-comparison (after):

Key changes:
1. Find out virtual calls and the vtables they come from.
   - The ICP relies on type intrinsic `llvm.type.test` and
     `llvm.public.type.test` to find out virtual calls and the
     compatible vtables, and relies on type metadata to find the address
     point (offset) for comparison.
2. ICP pass does cost-benefit analysis and compares vtable only when
   both conditions are met
   1) The function addressing and vtable load can sink to indirect
      fallback, and the indirect fallback is cold block
   2) The number of vtables for a function candidate is within
      (option specified) threshold.
3. Sink the function addressing and vtable load instruction to indirect
   fallback.
   - The sink helper functions are simplified versions of
     `InstCombinerImpl::tryToSinkInstruction`.
   - The helper functions to handle debug intrinsics are copied from
     `InstCombinerImpl::tryToSinkInstructionDbgValues` and
     `InstCombinerImpl::tryToSinkInstructionDbgVariableRecords` into
     Transforms/Utils/Local.cpp. Ideally only one copy should exist
     for inst-combine, icp and other passes.
4. Keep value profiles updated
   1) Update vtable value profiles after inline
   2) For either function-based comparison or vtable-based comparison,
      update both vtable and indirect call value profiles.
@minglotus-6 minglotus-6 force-pushed the users/minglotus-6/spr/icpass branch from 54adf41 to ff3c219 Compare May 28, 2024 15:52
@minglotus-6 minglotus-6 marked this pull request as ready for review May 28, 2024 15:53
@llvmbot
Copy link
Collaborator

llvmbot commented May 28, 2024

@llvm/pr-subscribers-clang
@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-pgo

@llvm/pr-subscribers-llvm-analysis

Author: Mingming Liu (minglotus-6)

Changes

Clang's -fwhole-program-vtables is required for this optimization to take place. If -fwhole-program-vtables is not enabled, this change is no-op.

  • Function-comparison (before):
%vtable = load ptr, ptr %obj
%vfn = getelementptr inbounds ptr, ptr %vtable, i64 1
%func = load ptr, ptr %vfn
%cond = icmp eq ptr %func, @<!-- -->callee
br i1 %cond, label bb1, label bb2:

bb1:
   call @<!-- -->callee

bb2:
   call %func
  • VTable-comparison (after):
%vtable = load ptr, ptr %obj
%cond = icmp eq ptr %func, @<!-- -->vtable-address-point
br i1 %cond, label bb1, label bb2:

bb1:
   call @<!-- -->callee

bb2:
  %vfn = getelementptr inbounds ptr, ptr %vtable, i64 1
  %func = load ptr, ptr %vfn
  call %func

Key changes:

  1. Find out virtual calls and the vtables they come from.
    • The ICP relies on type intrinsic llvm.type.test and llvm.public.type.test to find out virtual calls and the
      compatible vtables, and relies on type metadata to find the address point for comparison.
  2. ICP pass does cost-benefit analysis and compares vtable only when the number of vtables for a function candidate is within (option specified) threshold.
  3. Sink the function addressing and vtable load instruction to indirect fallback.
    • The sink helper functions are simplified versions of
      InstCombinerImpl::tryToSinkInstruction.
    • The helper functions to handle debug intrinsics are copied from
      InstCombinerImpl::tryToSinkInstructionDbgValues and
      InstCombinerImpl::tryToSinkInstructionDbgVariableRecords into
      Transforms/Utils/Local.cpp. Ideally only one copy should exist
      for inst-combine, icp and other passes.
  4. Keep value profiles updated
    1. Update vtable value profiles after inline
    2. For either function-based comparison or vtable-based comparison,
      update both vtable and indirect call value profiles.

Patch is 83.36 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/81442.diff

13 Files Affected:

  • (modified) compiler-rt/test/profile/Linux/instrprof-vtable-value-prof.cpp (+60-44)
  • (modified) llvm/include/llvm/Analysis/IndirectCallPromotionAnalysis.h (+1-1)
  • (modified) llvm/include/llvm/Analysis/IndirectCallVisitor.h (+3)
  • (modified) llvm/include/llvm/Transforms/Utils/Local.h (+9)
  • (modified) llvm/lib/Analysis/IndirectCallPromotionAnalysis.cpp (+3-3)
  • (modified) llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp (+595-32)
  • (modified) llvm/lib/Transforms/Utils/InlineFunction.cpp (+31-5)
  • (modified) llvm/lib/Transforms/Utils/Local.cpp (+184)
  • (modified) llvm/test/Transforms/Inline/update_invoke_prof.ll (+46-28)
  • (modified) llvm/test/Transforms/Inline/update_value_profile.ll (+29-25)
  • (added) llvm/test/Transforms/PGOProfile/icp_vtable_cmp.ll (+139)
  • (added) llvm/test/Transforms/PGOProfile/icp_vtable_invoke.ll (+127)
  • (added) llvm/test/Transforms/PGOProfile/icp_vtable_tail_call.ll (+67)
diff --git a/compiler-rt/test/profile/Linux/instrprof-vtable-value-prof.cpp b/compiler-rt/test/profile/Linux/instrprof-vtable-value-prof.cpp
index e51805bdf923c..73921adcc0c15 100644
--- a/compiler-rt/test/profile/Linux/instrprof-vtable-value-prof.cpp
+++ b/compiler-rt/test/profile/Linux/instrprof-vtable-value-prof.cpp
@@ -5,59 +5,61 @@
 // ld.lld: error: /lib/../lib64/Scrt1.o: ABI version 1 is not supported
 // UNSUPPORTED: ppc && host-byteorder-big-endian
 
-// RUN: %clangxx_pgogen -fuse-ld=lld -O2 -g -fprofile-generate=. -mllvm -enable-vtable-value-profiling %s -o %t-test
-// RUN: env LLVM_PROFILE_FILE=%t-test.profraw %t-test
+// RUN: rm -rf %t && mkdir %t && cd %t
+
+// RUN: %clangxx_pgogen -fuse-ld=lld -O2 -fprofile-generate=. -mllvm -enable-vtable-value-profiling %s -o test
+// RUN: env LLVM_PROFILE_FILE=test.profraw ./test
 
 // Show vtable profiles from raw profile.
-// RUN: llvm-profdata show --function=main --ic-targets --show-vtables %t-test.profraw | FileCheck %s --check-prefixes=COMMON,RAW
+// RUN: llvm-profdata show --function=main --ic-targets --show-vtables test.profraw | FileCheck %s --check-prefixes=COMMON,RAW
 
 // Generate indexed profile from raw profile and show the data.
-// RUN: llvm-profdata merge %t-test.profraw -o %t-test.profdata
-// RUN: llvm-profdata show --function=main --ic-targets --show-vtables %t-test.profdata | FileCheck %s --check-prefixes=COMMON,INDEXED
+// RUN: llvm-profdata merge test.profraw -o test.profdata
+// RUN: llvm-profdata show --function=main --ic-targets --show-vtables test.profdata | FileCheck %s --check-prefixes=COMMON,INDEXED
 
 // Generate text profile from raw and indexed profiles respectively and show the data.
-// RUN: llvm-profdata merge --text %t-test.profraw -o %t-raw.proftext
-// RUN: llvm-profdata show --function=main --ic-targets --show-vtables --text %t-raw.proftext | FileCheck %s --check-prefix=ICTEXT
-// RUN: llvm-profdata merge --text %t-test.profdata -o %t-indexed.proftext
-// RUN: llvm-profdata show --function=main --ic-targets --show-vtables --text %t-indexed.proftext | FileCheck %s --check-prefix=ICTEXT
+// RUN: llvm-profdata merge --text test.profraw -o raw.proftext
+// RUN: llvm-profdata show --function=main --ic-targets --show-vtables --text raw.proftext | FileCheck %s --check-prefix=ICTEXT
+// RUN: llvm-profdata merge --text test.profdata -o indexed.proftext
+// RUN: llvm-profdata show --function=main --ic-targets --show-vtables --text indexed.proftext | FileCheck %s --check-prefix=ICTEXT
 
 // Generate indexed profile from text profiles and show the data
-// RUN: llvm-profdata merge --binary %t-raw.proftext -o %t-text.profraw
-// RUN: llvm-profdata show --function=main --ic-targets --show-vtables %t-text.profraw | FileCheck %s --check-prefixes=COMMON,INDEXED
-// RUN: llvm-profdata merge --binary %t-indexed.proftext -o %t-text.profdata
-// RUN: llvm-profdata show --function=main --ic-targets --show-vtables %t-text.profdata | FileCheck %s --check-prefixes=COMMON,INDEXED
+// RUN: llvm-profdata merge --binary raw.proftext -o text.profraw
+// RUN: llvm-profdata show --function=main --ic-targets --show-vtables text.profraw | FileCheck %s --check-prefixes=COMMON,INDEXED
+// RUN: llvm-profdata merge --binary indexed.proftext -o text.profdata
+// RUN: llvm-profdata show --function=main --ic-targets --show-vtables text.profdata | FileCheck %s --check-prefixes=COMMON,INDEXED
 
 // COMMON: Counters:
 // COMMON-NEXT:  main:
-// COMMON-NEXT:  Hash: 0x0f9a16fe6d398548
-// COMMON-NEXT:  Counters: 2
+// COMMON-NEXT:  Hash: 0x068617320ec408a0
+// COMMON-NEXT:  Counters: 4
 // COMMON-NEXT:  Indirect Call Site Count: 2
 // COMMON-NEXT:  Number of instrumented vtables: 2
 // RAW:  Indirect Target Results:
-// RAW-NEXT:       [  0, _ZN8Derived15func1Eii,        250 ] (25.00%)
-// RAW-NEXT:       [  0, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func1Eii,        750 ] (75.00%)
-// RAW-NEXT:       [  1, _ZN8Derived15func2Eii,        250 ] (25.00%)
-// RAW-NEXT:       [  1, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func2Eii,        750 ] (75.00%)
+// RAW-NEXT:       [  0, _ZN8Derived14funcEii,        50 ] (25.00%)
+// RAW-NEXT:       [  0, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived24funcEii,        150 ] (75.00%)
+// RAW-NEXT:       [  1, _ZN8Derived1D0Ev,        250 ] (25.00%)
+// RAW-NEXT:       [  1, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived2D0Ev,        750 ] (75.00%)
 // RAW-NEXT:  VTable Results:
-// RAW-NEXT:       [  0, _ZTV8Derived1,        250 ] (25.00%)
-// RAW-NEXT:       [  0, {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E,        750 ] (75.00%)
+// RAW-NEXT:       [  0, _ZTV8Derived1,        50 ] (25.00%)
+// RAW-NEXT:       [  0, {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E,        150 ] (75.00%)
 // RAW-NEXT:       [  1, _ZTV8Derived1,        250 ] (25.00%)
 // RAW-NEXT:       [  1, {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E,        750 ] (75.00%)
 // INDEXED:     Indirect Target Results:
-// INDEXED-NEXT:         [  0, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func1Eii,        750 ] (75.00%)
-// INDEXED-NEXT:         [  0, _ZN8Derived15func1Eii,        250 ] (25.00%)
-// INDEXED-NEXT:         [  1, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func2Eii,        750 ] (75.00%)
-// INDEXED-NEXT:         [  1, _ZN8Derived15func2Eii,        250 ] (25.00%)
+// INDEXED-NEXT:         [  0, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived24funcEii,        150 ] (75.00%)
+// INDEXED-NEXT:         [  0, _ZN8Derived14funcEii,        50 ] (25.00%)
+// INDEXED-NEXT:         [  1, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived2D0Ev,        750 ] (75.00%)
+// INDEXED-NEXT:         [  1, _ZN8Derived1D0Ev,        250 ] (25.00%)
 // INDEXED-NEXT:     VTable Results:
-// INDEXED-NEXT:         [  0, {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E,        750 ] (75.00%)
-// INDEXED-NEXT:         [  0, _ZTV8Derived1,        250 ] (25.00%)
+// INDEXED-NEXT:         [  0, {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E,        150 ] (75.00%)
+// INDEXED-NEXT:         [  0, _ZTV8Derived1,        50 ] (25.00%)
 // INDEXED-NEXT:         [  1, {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E,        750 ] (75.00%)
 // INDEXED-NEXT:         [  1, _ZTV8Derived1,        250 ] (25.00%)
 // COMMON: Instrumentation level: IR  entry_first = 0
 // COMMON-NEXT: Functions shown: 1
-// COMMON-NEXT: Total functions: 6
+// COMMON-NEXT: Total functions: 7
 // COMMON-NEXT: Maximum function count: 1000
-// COMMON-NEXT: Maximum internal block count: 250
+// COMMON-NEXT: Maximum internal block count: 1000
 // COMMON-NEXT: Statistics for indirect call sites profile:
 // COMMON-NEXT:   Total number of sites: 2
 // COMMON-NEXT:   Total number of sites with values: 2
@@ -76,11 +78,13 @@
 // ICTEXT: :ir
 // ICTEXT: main
 // ICTEXT: # Func Hash:
-// ICTEXT: 1124236338992350536
+// ICTEXT: 470088714870327456
 // ICTEXT: # Num Counters:
-// ICTEXT: 2
+// ICTEXT: 4
 // ICTEXT: # Counter Values:
 // ICTEXT: 1000
+// ICTEXT: 1000
+// ICTEXT: 200
 // ICTEXT: 1
 // ICTEXT: # Num Value Kinds:
 // ICTEXT: 2
@@ -89,41 +93,50 @@
 // ICTEXT: # NumValueSites:
 // ICTEXT: 2
 // ICTEXT: 2
-// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func1Eii:750
-// ICTEXT: _ZN8Derived15func1Eii:250
+// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived24funcEii:150
+// ICTEXT: _ZN8Derived14funcEii:50
 // ICTEXT: 2
-// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func2Eii:750
-// ICTEXT: _ZN8Derived15func2Eii:250
+// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived2D0Ev:750
+// ICTEXT: _ZN8Derived1D0Ev:250
 // ICTEXT: # ValueKind = IPVK_VTableTarget:
 // ICTEXT: 2
 // ICTEXT: # NumValueSites:
 // ICTEXT: 2
 // ICTEXT: 2
-// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E:750
-// ICTEXT: _ZTV8Derived1:250
+// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E:150
+// ICTEXT: _ZTV8Derived1:50
 // ICTEXT: 2
 // ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E:750
 // ICTEXT: _ZTV8Derived1:250
 
+// Test indirect call promotion transformation using vtable profiles.
+// RUN: %clangxx -fprofile-use=test.profdata -fuse-ld=lld -flto=thin -fwhole-program-vtables -O2 -mllvm -enable-vtable-value-profiling -mllvm -icp-enable-vtable-cmp -Rpass=pgo-icall-prom %s 2>&1 | FileCheck %s --check-prefix=REMARK --implicit-check-not="!VP"
+
+// REMARK: Promote indirect call to _ZN12_GLOBAL__N_18Derived24funcEii with count 150 out of 200, compare 1 vtables and sink 1 instructions
+// REMARK: Promote indirect call to _ZN8Derived14funcEii with count 50 out of 50, compare 1 vtables and sink 1 instructions
+// REMARK: Promote indirect call to _ZN12_GLOBAL__N_18Derived2D0Ev with count 750 out of 1000, compare 1 vtables and sink 2 instructions
+// REMARK: Promote indirect call to _ZN8Derived1D0Ev with count 250 out of 250, compare 1 vtables and sink 2 instructions
+
 #include <cstdio>
 #include <cstdlib>
 class Base {
 public:
-  virtual int func1(int a, int b) = 0;
-  virtual int func2(int a, int b) = 0;
+  virtual int func(int a, int b) = 0;
+
+  virtual ~Base() {};
 };
 class Derived1 : public Base {
 public:
-  int func1(int a, int b) override { return a + b; }
+  int func(int a, int b) override { return a * b; }
 
-  int func2(int a, int b) override { return a * b; }
+  ~Derived1() {}
 };
 namespace {
 class Derived2 : public Base {
 public:
-  int func1(int a, int b) override { return a - b; }
+  int func(int a, int b) override { return a * (a - b); }
 
-  int func2(int a, int b) override { return a * (a - b); }
+  ~Derived2() {}
 };
 } // namespace
 __attribute__((noinline)) Base *createType(int a) {
@@ -140,7 +153,10 @@ int main(int argc, char **argv) {
     int a = rand();
     int b = rand();
     Base *ptr = createType(i);
-    sum += ptr->func1(a, b) + ptr->func2(b, a);
+    if (i % 5 == 0)
+      sum += ptr->func(b, a);
+
+    delete ptr;
   }
   printf("sum is %d\n", sum);
   return 0;
diff --git a/llvm/include/llvm/Analysis/IndirectCallPromotionAnalysis.h b/llvm/include/llvm/Analysis/IndirectCallPromotionAnalysis.h
index 8a05e913a9106..eda672d7d50ee 100644
--- a/llvm/include/llvm/Analysis/IndirectCallPromotionAnalysis.h
+++ b/llvm/include/llvm/Analysis/IndirectCallPromotionAnalysis.h
@@ -57,7 +57,7 @@ class ICallPromotionAnalysis {
   ///
   /// The returned array space is owned by this class, and overwritten on
   /// subsequent calls.
-  ArrayRef<InstrProfValueData>
+  MutableArrayRef<InstrProfValueData>
   getPromotionCandidatesForInstruction(const Instruction *I, uint32_t &NumVals,
                                        uint64_t &TotalCount,
                                        uint32_t &NumCandidates);
diff --git a/llvm/include/llvm/Analysis/IndirectCallVisitor.h b/llvm/include/llvm/Analysis/IndirectCallVisitor.h
index 66c972572b06c..f070e83c41689 100644
--- a/llvm/include/llvm/Analysis/IndirectCallVisitor.h
+++ b/llvm/include/llvm/Analysis/IndirectCallVisitor.h
@@ -37,6 +37,9 @@ struct PGOIndirectCallVisitor : public InstVisitor<PGOIndirectCallVisitor> {
   // A heuristic is used to find the address feeding instructions.
   static Instruction *tryGetVTableInstruction(CallBase *CB) {
     assert(CB != nullptr && "Caller guaranteed");
+    if (!CB->isIndirectCall())
+      return nullptr;
+
     LoadInst *LI = dyn_cast<LoadInst>(CB->getCalledOperand());
 
     if (LI != nullptr) {
diff --git a/llvm/include/llvm/Transforms/Utils/Local.h b/llvm/include/llvm/Transforms/Utils/Local.h
index 6937ec8dfd21c..5535a722a40fe 100644
--- a/llvm/include/llvm/Transforms/Utils/Local.h
+++ b/llvm/include/llvm/Transforms/Utils/Local.h
@@ -316,6 +316,15 @@ void salvageDebugInfoForDbgValues(Instruction &I,
                                   ArrayRef<DbgVariableIntrinsic *> Insns,
                                   ArrayRef<DbgVariableRecord *> DPInsns);
 
+void tryToSinkInstructionDbgValues(
+    Instruction *I, BasicBlock::iterator InsertPos, BasicBlock *SrcBlock,
+    BasicBlock *DestBlock, SmallVectorImpl<DbgVariableIntrinsic *> &DbgUsers);
+
+void tryToSinkInstructionDPValues(
+    Instruction *I, BasicBlock::iterator InsertPos, BasicBlock *SrcBlock,
+    BasicBlock *DestBlock,
+    SmallVectorImpl<DbgVariableRecord *> &DbgVariableRecords);
+
 /// Given an instruction \p I and DIExpression \p DIExpr operating on
 /// it, append the effects of \p I to the DIExpression operand list
 /// \p Ops, or return \p nullptr if it cannot be salvaged.
diff --git a/llvm/lib/Analysis/IndirectCallPromotionAnalysis.cpp b/llvm/lib/Analysis/IndirectCallPromotionAnalysis.cpp
index ab53717eb889a..643c155ba6d7e 100644
--- a/llvm/lib/Analysis/IndirectCallPromotionAnalysis.cpp
+++ b/llvm/lib/Analysis/IndirectCallPromotionAnalysis.cpp
@@ -87,7 +87,7 @@ uint32_t ICallPromotionAnalysis::getProfitablePromotionCandidates(
   return I;
 }
 
-ArrayRef<InstrProfValueData>
+MutableArrayRef<InstrProfValueData>
 ICallPromotionAnalysis::getPromotionCandidatesForInstruction(
     const Instruction *I, uint32_t &NumVals, uint64_t &TotalCount,
     uint32_t &NumCandidates) {
@@ -96,8 +96,8 @@ ICallPromotionAnalysis::getPromotionCandidatesForInstruction(
                                ValueDataArray.get(), NumVals, TotalCount);
   if (!Res) {
     NumCandidates = 0;
-    return ArrayRef<InstrProfValueData>();
+    return MutableArrayRef<InstrProfValueData>();
   }
   NumCandidates = getProfitablePromotionCandidates(I, NumVals, TotalCount);
-  return ArrayRef<InstrProfValueData>(ValueDataArray.get(), NumVals);
+  return MutableArrayRef<InstrProfValueData>(ValueDataArray.get(), NumVals);
 }
diff --git a/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp b/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
index 23a7c6a20aecb..4de0aaef8d7ca 100644
--- a/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
+++ b/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
@@ -13,13 +13,17 @@
 //===----------------------------------------------------------------------===//
 
 #include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Analysis/IndirectCallPromotionAnalysis.h"
 #include "llvm/Analysis/IndirectCallVisitor.h"
 #include "llvm/Analysis/OptimizationRemarkEmitter.h"
 #include "llvm/Analysis/ProfileSummaryInfo.h"
+#include "llvm/Analysis/TypeMetadataUtils.h"
+#include "llvm/IR/DebugInfo.h"
 #include "llvm/IR/DiagnosticInfo.h"
+#include "llvm/IR/Dominators.h"
 #include "llvm/IR/Function.h"
 #include "llvm/IR/InstrTypes.h"
 #include "llvm/IR/Instructions.h"
@@ -37,6 +41,7 @@
 #include "llvm/Transforms/Instrumentation.h"
 #include "llvm/Transforms/Instrumentation/PGOInstrumentation.h"
 #include "llvm/Transforms/Utils/CallPromotionUtils.h"
+#include "llvm/Transforms/Utils/Local.h"
 #include <cassert>
 #include <cstdint>
 #include <memory>
@@ -51,6 +56,8 @@ using namespace llvm;
 STATISTIC(NumOfPGOICallPromotion, "Number of indirect call promotions.");
 STATISTIC(NumOfPGOICallsites, "Number of indirect call candidate sites.");
 
+extern cl::opt<unsigned> MaxNumVTableAnnotations;
+
 // Command line option to disable indirect-call promotion with the default as
 // false. This is for debug purpose.
 static cl::opt<bool> DisableICP("disable-icp", cl::init(false), cl::Hidden,
@@ -103,13 +110,202 @@ static cl::opt<bool>
     ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
                  cl::desc("Dump IR after transformation happens"));
 
+// This option is meant to be used by LLVM regression test and test the
+// transformation that compares vtables.
+static cl::opt<bool> ICPEnableVTableCmp(
+    "icp-enable-vtable-cmp", cl::init(false), cl::Hidden,
+    cl::desc("If ThinLTO and WPD is enabled and this option is true, "
+             "indirect-call promotion pass will compare vtables rather than "
+             "functions for speculative devirtualization of virtual calls."
+             " If set to false, indirect-call promotion pass will always "
+             "compare functions."));
+
+static cl::opt<float>
+    ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99),
+                             cl::Hidden,
+                             cl::desc("Percentage of vtable count to compare"));
+
+static cl::opt<int> ICPNumAdditionalVTableLast(
+    "icp-num-additional-vtable-last", cl::init(0), cl::Hidden,
+    cl::desc("The number of additional instruction for the last candidate"));
+
 namespace {
 
+using VTableAddressPointOffsetValMap =
+    SmallDenseMap<const GlobalVariable *, SmallDenseMap<int, Constant *, 4>, 8>;
+
+// A struct to collect type information for a virtual call site.
+struct VirtualCallSiteInfo {
+  // The offset from the address point to virtual function in the vtable.
+  uint64_t FunctionOffset;
+  // The instruction that computes the address point of vtable.
+  Instruction *VPtr;
+  // The compatible type used in LLVM type intrinsics.
+  StringRef CompatibleTypeStr;
+};
+
+// The key is a virtual call, and value is its type information.
+using VirtualCallSiteTypeInfoMap =
+    SmallDenseMap<const CallBase *, VirtualCallSiteInfo, 8>;
+
+// Find the offset where type string is `CompatibleType`.
+static std::optional<uint64_t>
+getCompatibleTypeOffset(const GlobalVariable &VTableVar,
+                        StringRef CompatibleType) {
+  SmallVector<MDNode *, 2> Types; // type metadata associated with a vtable.
+  VTableVar.getMetadata(LLVMContext::MD_type, Types);
+
+  for (MDNode *Type : Types)
+    if (auto *TypeId = dyn_cast<MDString>(Type->getOperand(1).get());
+        TypeId && TypeId->getString() == CompatibleType)
+
+      return cast<ConstantInt>(
+                 cast<ConstantAsMetadata>(Type->getOperand(0))->getValue())
+          ->getZExtValue();
+
+  return std::nullopt;
+}
+
+// Returns a constant representing the vtable's address point specified by the
+// offset.
+static Constant *getVTableAddressPointOffset(GlobalVariable *VTable,
+                                             uint32_t AddressPointOffset) {
+  Module &M = *VTable->getParent();
+  LLVMContext &Context = M.getContext();
+  assert(AddressPointOffset <
+             M.getDataLayout().getTypeAllocSize(VTable->getValueType()) &&
+         "Out-of-bound access");
+
+  return ConstantExpr::getInBoundsGetElementPtr(
+      Type::getInt8Ty(Context), VTable,
+      llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset));
+}
+
+// Returns the basic block in which `Inst` by `Use`.
+static BasicBlock *getUserBasicBlock(Instruction *Inst, unsigned int OperandNo,
+                                     Instruction *UserInst) {
+  if (PHINode *PN = dyn_cast<PHINode>(UserInst))
+    return PN->getIncomingBlock(
+        PHINode::getIncomingValueNumForOperand(OperandNo));
+
+  return UserInst->getParent();
+}
+
+// `DestBB` is a suitable basic block to sink `Inst` into when the following
+// conditions are true:
+// 1) `Inst->getParent()` is the sole predecessor of `DestBB`. This way `DestBB`
+//    is dominated by `Inst->getParent()` and we don't need to sink across a
+//    critical edge.
+// 2) `Inst` have users and all users are in `DestBB`.
+static bool isDestBBSuitableForSink(Instruction *Inst, BasicBlock *DestBB) {
+  BasicBlock *BB = Inst->getParent();
+  assert(Inst->getParent() != DestBB &&
+         BB->getTerminator()->getNumSuccessors() == 2 &&
+         "Caller should guarantee");
+  // Do not sink across a critical edge for simplicity.
+  if (DestBB->getUniquePredecessor() != BB)
+    return false;
+
+  // Now we know BB dominates DestBB.
+  BasicBlock *UserBB = nullptr;
+  for (Use &Use : Inst->uses()) {
+    User *User = Use.getUser();
+    // Do checked cast since IR verifier guarantees that the user of an
+    // instruction must be an instruction. See `Verifier::visitInstruction`.
+    Instruction *UserInst = cast<Instruction>(User);
+    // We can sink debug or pseudo instructions together with Inst.
+    if (UserInst->isDebugOrPseudoInst())
+      continue;
+    UserBB = getUserBasicBlock(Inst, Use.getOperandNo(), UserInst);
+    // Do not...
[truncated]

@david-xl
Copy link
Contributor

Can we rely on instcombine to do the sinking?

(the patch summary has a small error -- the comparison instruction in the example should be %cond = icmp eq ptr %vtable, @vtable-address-point

@minglotus-6
Copy link
Contributor Author

Can we rely on instcombine to do the sinking?

Good question. https://gcc.godbolt.org/z/G7Paj7h37 has two examples, instcombine can sink the first but cannot see through the second. I filed #88960 to discuss this, and I think it's more reliable to interleave the instruction sink with vtable transformations. I do think it's very preferred to re-use the helper functions like tryToSinkInstructionDbgValues and tryToSinkInstructionDbgVariableRecords.

@minglotus-6
Copy link
Contributor Author

(the patch summary has a small error -- the comparison instruction in the example should be %cond = icmp eq ptr %vtable, @vtable-address-point

thanks. I corrected it.

1. In Transforms/Utils/Local.{h,cpp}, remove debug intrinisc helper
   functions, and make it a TODO to handle debug intrinsic when sinking
   instructions.
2. In compiler-rt test, added lines to test IR are expected.
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen labels May 28, 2024
@minglotus-6 minglotus-6 removed clang:codegen clang Clang issues not falling into any other category labels May 28, 2024
@@ -1967,11 +1969,23 @@ void llvm::updateProfileCallee(
uint64_t CloneEntryCount = PriorEntryCount - NewEntryCount;
for (auto Entry : *VMap) {
if (isa<CallInst>(Entry.first))
if (auto *CI = dyn_cast_or_null<CallInst>(Entry.second))
if (auto *CI = dyn_cast_or_null<CallInst>(Entry.second)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the tryGetVTableInstuction can handle non-call/invoke instruction (returns null), then the code duplication can be avoided.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to not further duplicate the code. How about something like this?

if (isa<CallBase>(Entry.first)) {
  if(auto* CB = dyn_cast_or_null<CallBase>(Entry.Base)) {
    CB->updateProfWeight(...
    Instruction* VPtr = PGOIndirectCallVisitor::tryGetVTableInstruction(CB);
    if (VPtr)
      scaleProfData(VPtr...)
}

CallBase is the base class of five IR instructions [1] now. Among them CoroAwaitSuspendInst [1] and GCStatePointInst [2] are relatively new; the rest three are well understood. Despite more derived classes of CallBase might be added for new cases, the code above should be future-proof since updateProfWeight and scaleProfData are programmed to !prof (not CallBase).

[1]

class LLVM_LIBRARY_VISIBILITY CoroAwaitSuspendInst : public CallBase {
enum { AwaiterArg, FrameArg, WrapperArg };
public:

[2]
class GCStatepointInst : public CallBase {
public:
GCStatepointInst() = delete;

@@ -316,6 +316,15 @@ void salvageDebugInfoForDbgValues(Instruction &I,
ArrayRef<DbgVariableIntrinsic *> Insns,
ArrayRef<DbgVariableRecord *> DPInsns);

void tryToSinkInstructionDbgValues(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add description for the methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug intrinsics are not handled in this PR, I un-do Local.h/cpp changes (and overlooked this comment).

The updated compiler-rt test (instrprof-vtable-value-prof.cpp) enabled -g and tested IR for the first callsite.

@@ -2538,6 +2538,190 @@ Value *getSalvageOpsForIcmpOp(ICmpInst *Icmp, uint64_t CurrentLocOps,
return Icmp->getOperand(0);
}

void llvm::tryToSinkInstructionDbgValues(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there test coverage for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The gep and load sequence doesn't have debug values (so far) so it's not easy to test within this PR.

IMO helper functions that handles debug value along instruction sink (e.g.

void InstCombinerImpl::tryToSinkInstructionDbgValues(
) will help other passes (e.g., simplify-cfg, etc) that moves instruction around. And test cases can be added for these shared helper functions.

cl::Hidden,
cl::desc("Percentage of vtable count to compare"));

static cl::opt<int> ICPNumAdditionalVTableLast(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment for this option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

SmallVector<MDNode *, 2> Types; // type metadata associated with a vtable.
VTableVar.getMetadata(LLVMContext::MD_type, Types);

for (MDNode *Type : Types)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can it have more than one types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type metadata are generated for compatible vtables and virtual functions.

https://gcc.godbolt.org/z/rr5YY7cdv is an example to show the !type metadata list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are the compatible types aka the static types that can actually contain a pointer to this vtable. E.g. something statically declared to have type Base* can point to a Derived* but not vice versa. So the Base vtable is only compatible with the Base* type and a Derived vtable is compatible with both Base* and Derived* types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you are getting here is the address point offset of the given compatible type within the given vtable. See also https://llvm.org/docs/TypeMetadata.html#representing-type-information-using-type-metadata. It might be good to add a comment describing this more explicitly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the function comment based on Teresa's input, and renamed the function to getAddressPointOffset

}

// Returns the basic block in which `Inst` by `Use`.
static BasicBlock *getUserBasicBlock(Instruction *Inst, unsigned int OperandNo,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is "Inst" used in the function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the catch. I removed it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meanwhile, I can find a few places where the gist of helper function is used. I'm planning to have some canonical helper functions maybe in llvm/lib/Transforms/Util directory.

Examples:

  1. BasicBlock *UseBB = User->getParent();
    if (PHINode *P = dyn_cast<PHINode>(User)) {
    unsigned i =
    PHINode::getIncomingValueNumForOperand(U.getOperandNo());
    UseBB = P->getIncomingBlock(i);
  2. const BasicBlock *UseBB = !isa<PHINode>(UserInst) ?
    UserInst->getParent() :
    cast<PHINode>(UserInst)->getIncomingBlock(
    PHINode::getIncomingValueNumForOperand(U.getOperandNo()));
  3. (made closer to this function by my change [nfc][InstCombine]Find PHI incoming block by operand number #93249)
    // Special handling for Phi nodes - get the block the use occurs in.
    BasicBlock *UserBB = UserInst->getParent();
    if (PHINode *PN = dyn_cast<PHINode>(UserInst))
    UserBB = PN->getIncomingBlock(U);


int SinkCount = 0;
// FIXME: Find a way to bail out of the loop.
for (Instruction &I :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the code assume 1 instruction to be dropped?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lines to check instruction sink legality (https://github.com/llvm/llvm-project/pull/81442/files#diff-a95d1ac8a0da69713fcb3346135d4b219f0a73920318d2549495620ea215191bR237-R267) doesn't rely on this. Instead, those lines are strict about compile-time correctness.

In reality, getelementptr and load should sink to indirect fallback for real-world cases. When it's not legal to sink them, the implementation here won't do it.

  • Note getelementptr and load are two IR instructions but likely one machine instruction after instruction selection.
  • As indirect-call-promotion interleaves with instruction sink, we begin to see simple blocks like else-bb (3 IR instructions) starting from the second candidate of one indirect call site.
bb0:
   vtable = load obj
   cmp = icmp eq vtable, ...
   br cmp if-bb, else-bb
if-bb:
   call direct-call
   br bb-merge
else-bb: 
   func-addr = getelementptr
   func-ptr = load func-addr
   call func-ptr
   br bb-merge
bb-merge:
   res = phi ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is about the drop_begin(..) call, but it is correct as the last instruction is always control transfer instruction that should be skipped. Perhaps add a brief comment: try to sink all eligible instructions in OriginalBB in reverse order.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is about the drop_begin(..) call, but it is correct as the last instruction is always control transfer instruction that should be skipped.

Ah I see, thanks for clarifying. I added a brief comment.

Meanwhile pass OriginalBB as function argument rather than VPtr.

Copy link
Contributor

@teresajohnson teresajohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at the tests yet, but some comments/questions from a first pass through most of the code

" If set to false, indirect-call promotion pass will always "
"compare functions."));

static cl::opt<float>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

namespace {

using VTableAddressPointOffsetValMap =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment describing the contents

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

SmallVector<MDNode *, 2> Types; // type metadata associated with a vtable.
VTableVar.getMetadata(LLVMContext::MD_type, Types);

for (MDNode *Type : Types)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are the compatible types aka the static types that can actually contain a pointer to this vtable. E.g. something statically declared to have type Base* can point to a Derived* but not vice versa. So the Base vtable is only compatible with the Base* type and a Derived vtable is compatible with both Base* and Derived* types.

SmallVector<MDNode *, 2> Types; // type metadata associated with a vtable.
VTableVar.getMetadata(LLVMContext::MD_type, Types);

for (MDNode *Type : Types)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you are getting here is the address point offset of the given compatible type within the given vtable. See also https://llvm.org/docs/TypeMetadata.html#representing-type-information-using-type-metadata. It might be good to add a comment describing this more explicitly.

llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset));
}

// Returns the basic block in which `Inst` by `Use`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite follow the comment, can you expand it a bit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original comment is broken. I updated it.


computeVirtualCallSiteTypeInfoMap(M, MAM, VirtualCSInfo);

// This map records states across functions in an LLVM IR module.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by "records states"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VTableAddressPointOffsetVal stores the vtable address points. The vtable address point of a given <vtable, address point offset> is static (doesn't change after being computed once).

IndirectCallPromoter::getOrCreateVTableAddressPointVar creates the map entry the first time a <vtable, offset> pair is seen, as promoteIndirectCalls processes an IR module and calls IndirectCallPromoter repeatedly on each function.

I reworded the comments. PTAL.

// Right now only llvm.type.test is used to find out virtual call sites.
// With ThinLTO and whole-program-devirtualization, llvm.type.test and
// llvm.public.type.test are emitted, and llvm.public.type.test is either
// refined to llvm.type.test or dropped before indirect-call-promotion pass.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we then need to do the below analysis on llvm.public.type.test? ICP should be after it gets converted or dropped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ICP pass is disabled in prelink, we don't need to analyze llvm.public.type.test as you point out. llvm.public.type.test was analyzed here because I didn't disable prelink ICP for compiler-rt test.

Now I updated compiler-rt test to disable prelink ICP and enable postlink ICP, and skip the analysis for llvm.public.type.test.

return Iter->second;
}

Instruction *IndirectCallPromoter::computeVTableInfos(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a lot going on in this function - can you add some more description at the top describing how this works and/or some more comments before each block of code within (particularly the latter half of the function)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document output parameters and return value around class method declaration, and added more comments in the function (e.g., one example with detailed steps) as suggested.

VTableGUIDCountsMap &VTableGUIDCounts) {
SmallVector<uint64_t, 4> PromotedFuncCount;
for (const auto &Candidate : Candidates) {
uint64_t IfCount = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name IfCount isn't very intuitive to me, suggest using a different name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable represents the 'if-branch' count of the branch weights for conditional devirtualization. I removed it. Now Candidate.Count represents if branch count and TotalFuncCount - Candidate.Count represents else branch count.

for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts)
SumVTableCount += VTableCount;

for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment describing the intuition behind the way the counts are being updated here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.


int SinkCount = 0;
// FIXME: Find a way to bail out of the loop.
for (Instruction &I :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is about the drop_begin(..) call, but it is correct as the last instruction is always control transfer instruction that should be skipped. Perhaps add a brief comment: try to sink all eligible instructions in OriginalBB in reverse order.

return false;

// Do not sink convergent call instructions.
if (const auto *C = dyn_cast<CallBase>(I))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a common utility function to do the checks below or add a TODO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added FIXME: Move the sink eligibility check below to a utility function in Transforms/Utils/ directory..

The sink correctness check is a copy of the core logic in inst-combine pass (after removing one irrelevant case and making the check stricter by removing this)

// Cannot move control-flow-involving, volatile loads, vaarg, etc.
if (isa<PHINode>(I) || I->isEHPad() || I->mayThrow() || !I->willReturn() ||
I->isTerminator())
return false;
// Do not sink static or dynamic alloca instructions. Static allocas must
// remain in the entry block, and dynamic allocas must not be sunk in between
// a stacksave / stackrestore pair, which would incorrectly shorten its
// lifetime.
if (isa<AllocaInst>(I))
return false;
// Do not sink into catchswitch blocks.
if (isa<CatchSwitchInst>(DestBlock->getTerminator()))
return false;
// Do not sink convergent call instructions.
if (auto *CI = dyn_cast<CallInst>(I)) {
if (CI->isConvergent())
return false;
}
// Unless we can prove that the memory write isn't visibile except on the
// path we're sinking to, we must bail.
if (I->mayWriteToMemory()) {
if (!SoleWriteToDeadLocal(I, TLI))
return false;
}
// We can only sink load instructions if there is nothing between the load and
// the end of block that could change the value.
if (I->mayReadFromMemory()) {
// We don't want to do any sophisticated alias analysis, so we only check
// the instructions after I in I's parent block if we try to sink to its
// successor block.
if (DestBlock->getUniquePredecessor() != I->getParent())
return false;
for (BasicBlock::iterator Scan = std::next(I->getIterator()),
E = I->getParent()->end();
Scan != E; ++Scan)
if (Scan->mayWriteToMemory())
return false;
}

for (BasicBlock::iterator Scan = std::next(I->getIterator()),
E = I->getParent()->end();
Scan != E; ++Scan)
if (Scan->mayWriteToMemory())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, aliasing check can be done here, but probably not important in reality for the code pattern involved.

1. Resolve review comments.
2. Handle vtable's PGO name, like what we do for indirect-call
   promotion.
   - InstrProf.h/cpp and PGOInstrumentation.cpp are modified.
3. Make use of 'MaxNumVTableAnnotations' in PGOInstrumentation.cpp
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants