Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PseudoProbe] Mix block and call probe ID in lexical order #75092

Merged
merged 4 commits into from
Apr 3, 2024

Conversation

wlei-llvm
Copy link
Contributor

Before all the call probe ids are after block ids, in this change, it mixed the call probe and block probe by reordering them in lexical(line-number) order. For example:

main():
BB1
if(...) 
  BB2 foo(..);   
else 
  BB3 bar(...);
BB4

Before the profile is

main
 1: ..
 2: ..
 3: ...
 4: ...
 5: foo ...
 6: bar ...

Now the new order is

 main
 1: ..
 2: ..
 3: foo ...
 4: ...
 5: bar ...
 6: ...

This can potentially make it more tolerant of profile mismatch, either from stale profile or frontend change. e.g. before if we add one block, even the block is the last one, all the call probes are shifted and mismatched. Moreover, this makes better use of call-anchor based stale profile matching. Blocks are matched based on the closest anchor, there would be more anchors used for the matching, reduce the mismatch scope.

@llvmbot llvmbot added clang Clang issues not falling into any other category llvm:transforms labels Dec 11, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Dec 11, 2023

@llvm/pr-subscribers-pgo
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-llvm-transforms

Author: Lei Wang (wlei-llvm)

Changes

Before all the call probe ids are after block ids, in this change, it mixed the call probe and block probe by reordering them in lexical(line-number) order. For example:

main():
BB1
if(...) 
  BB2 foo(..);   
else 
  BB3 bar(...);
BB4

Before the profile is

main
 1: ..
 2: ..
 3: ...
 4: ...
 5: foo ...
 6: bar ...

Now the new order is

 main
 1: ..
 2: ..
 3: foo ...
 4: ...
 5: bar ...
 6: ...

This can potentially make it more tolerant of profile mismatch, either from stale profile or frontend change. e.g. before if we add one block, even the block is the last one, all the call probes are shifted and mismatched. Moreover, this makes better use of call-anchor based stale profile matching. Blocks are matched based on the closest anchor, there would be more anchors used for the matching, reduce the mismatch scope.


Full diff: https://github.com/llvm/llvm-project/pull/75092.diff

11 Files Affected:

  • (modified) clang/test/CodeGen/pseudo-probe-emit.c (+4-4)
  • (modified) llvm/include/llvm/Transforms/IPO/SampleProfileProbe.h (+1-2)
  • (modified) llvm/lib/Transforms/IPO/SampleProfileProbe.cpp (+3-13)
  • (modified) llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-profile.prof (+4-4)
  • (modified) llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-update.prof (+4-4)
  • (modified) llvm/test/Transforms/SampleProfile/pseudo-probe-dangle.ll (+6-6)
  • (modified) llvm/test/Transforms/SampleProfile/pseudo-probe-discriminator.ll (+3-3)
  • (modified) llvm/test/Transforms/SampleProfile/pseudo-probe-profile-metadata-2.ll (+7-8)
  • (modified) llvm/test/Transforms/SampleProfile/pseudo-probe-profile.ll (+11-11)
  • (modified) llvm/test/Transforms/SampleProfile/pseudo-probe-update.ll (+5-6)
  • (modified) llvm/test/Transforms/SampleProfile/pseudo-probe-verify.ll (+8-8)
diff --git a/clang/test/CodeGen/pseudo-probe-emit.c b/clang/test/CodeGen/pseudo-probe-emit.c
index c7a3f7e6d5b02..360f831e84294 100644
--- a/clang/test/CodeGen/pseudo-probe-emit.c
+++ b/clang/test/CodeGen/pseudo-probe-emit.c
@@ -10,9 +10,9 @@ void foo(int x) {
   // CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 1, i32 0, i64 -1)
   if (x == 0)
     // CHECK: call void @llvm.pseudoprobe(i64 [[#GUID]], i64 2, i32 0, i64 -1)
-    bar();
+    bar();  // probe id : 3
   else
-    // CHECK: call void @llvm.pseudoprobe(i64 [[#GUID]], i64 3, i32 0, i64 -1)
-    go();
-  // CHECK: call void @llvm.pseudoprobe(i64 [[#GUID]], i64 4, i32 0, i64 -1)
+    // CHECK: call void @llvm.pseudoprobe(i64 [[#GUID]], i64 4, i32 0, i64 -1)
+    go(); // probe id : 5
+  // CHECK: call void @llvm.pseudoprobe(i64 [[#GUID]], i64 6, i32 0, i64 -1)
 }
diff --git a/llvm/include/llvm/Transforms/IPO/SampleProfileProbe.h b/llvm/include/llvm/Transforms/IPO/SampleProfileProbe.h
index 0f2729a9462de..69b87adf105fd 100644
--- a/llvm/include/llvm/Transforms/IPO/SampleProfileProbe.h
+++ b/llvm/include/llvm/Transforms/IPO/SampleProfileProbe.h
@@ -82,8 +82,7 @@ class SampleProfileProber {
   uint32_t getBlockId(const BasicBlock *BB) const;
   uint32_t getCallsiteId(const Instruction *Call) const;
   void computeCFGHash();
-  void computeProbeIdForBlocks();
-  void computeProbeIdForCallsites();
+  void computeProbeId();
 
   Function *F;
 
diff --git a/llvm/lib/Transforms/IPO/SampleProfileProbe.cpp b/llvm/lib/Transforms/IPO/SampleProfileProbe.cpp
index 8f0b12d0cfedf..e2c5eaf261d29 100644
--- a/llvm/lib/Transforms/IPO/SampleProfileProbe.cpp
+++ b/llvm/lib/Transforms/IPO/SampleProfileProbe.cpp
@@ -173,8 +173,7 @@ SampleProfileProber::SampleProfileProber(Function &Func,
   BlockProbeIds.clear();
   CallProbeIds.clear();
   LastProbeId = (uint32_t)PseudoProbeReservedId::Last;
-  computeProbeIdForBlocks();
-  computeProbeIdForCallsites();
+  computeProbeId();
   computeCFGHash();
 }
 
@@ -209,7 +208,7 @@ void SampleProfileProber::computeCFGHash() {
                     << ", Hash = " << FunctionHash << "\n");
 }
 
-void SampleProfileProber::computeProbeIdForBlocks() {
+void SampleProfileProber::computeProbeId() {
   DenseSet<BasicBlock *> KnownColdBlocks;
   computeEHOnlyBlocks(*F, KnownColdBlocks);
   // Insert pseudo probe to non-cold blocks only. This will reduce IR size as
@@ -218,18 +217,9 @@ void SampleProfileProber::computeProbeIdForBlocks() {
     ++LastProbeId;
     if (!KnownColdBlocks.contains(&BB))
       BlockProbeIds[&BB] = LastProbeId;
-  }
-}
-
-void SampleProfileProber::computeProbeIdForCallsites() {
-  LLVMContext &Ctx = F->getContext();
-  Module *M = F->getParent();
 
-  for (auto &BB : *F) {
     for (auto &I : BB) {
-      if (!isa<CallBase>(I))
-        continue;
-      if (isa<IntrinsicInst>(&I))
+      if (!isa<CallBase>(I) || isa<IntrinsicInst>(&I))
         continue;
 
       // The current implementation uses the lower 16 bits of the discriminator
diff --git a/llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-profile.prof b/llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-profile.prof
index ba4c6117dc96a..d3847946b9403 100644
--- a/llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-profile.prof
+++ b/llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-profile.prof
@@ -1,8 +1,8 @@
 foo:3200:13
  1: 13
  2: 7
- 3: 6
- 4: 13
- 5: 7 _Z3barv:2 _Z3foov:5
- 6: 6 _Z3barv:4 _Z3foov:2
+ 4: 6
+ 6: 13
+ 3: 7 _Z3barv:2 _Z3foov:5
+ 5: 6 _Z3barv:4 _Z3foov:2
  !CFGChecksum: 563022570642068
diff --git a/llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-update.prof b/llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-update.prof
index 62f9bd5992e73..213bf0b6f81cc 100644
--- a/llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-update.prof
+++ b/llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-update.prof
@@ -1,8 +1,8 @@
 foo:3200:13
  1: 13
  2: 7
- 3: 6
- 4: 13
- 5: 7
- 6: 6
+ 4: 6
+ 6: 13
+ 7: 7
+ 9: 6
  !CFGChecksum: 844530426352218
diff --git a/llvm/test/Transforms/SampleProfile/pseudo-probe-dangle.ll b/llvm/test/Transforms/SampleProfile/pseudo-probe-dangle.ll
index 4647a34fc2f62..f0b6fdf62d969 100644
--- a/llvm/test/Transforms/SampleProfile/pseudo-probe-dangle.ll
+++ b/llvm/test/Transforms/SampleProfile/pseudo-probe-dangle.ll
@@ -23,21 +23,21 @@ Merge:
 ; JT-LABEL-NO: T
 ; JT-LABEL-NO: F
 ; JT-LABEL: Merge
+; JT-NOT: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4
 ; JT-NOT: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 3
-; JT-NOT: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 2
-; JT: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0, i64 -1)
+; JT: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 5, i32 0, i64 -1)
+; ASM-NOT: .pseudoprobe	6699318081062747564 4
 ; ASM-NOT: .pseudoprobe	6699318081062747564 3
-; ASM-NOT: .pseudoprobe	6699318081062747564 2
-; ASM: .pseudoprobe	6699318081062747564 4 0 0
+; ASM: .pseudoprobe	6699318081062747564 5 0 0
 	ret i32 %call
 }
 
 ;; Check block T and F are gone, and their probes (probe 2 and 3) are gone too.
 ; MIR-tail: bb.0
 ; MIR-tail: PSEUDO_PROBE [[#GUID:]], 1, 0, 0
-; MIR-tail-NOT: PSEUDO_PROBE [[#GUID:]], 2
 ; MIR-tail-NOT: PSEUDO_PROBE [[#GUID:]], 3
-; MIR-tail: PSEUDO_PROBE [[#GUID:]], 4, 0, 0
+; MIR-tail-NOT: PSEUDO_PROBE [[#GUID:]], 4
+; MIR-tail: PSEUDO_PROBE [[#GUID:]], 5, 0, 0
 
 
 define i32 @test(i32 %a, i32 %b, i32 %c) {
diff --git a/llvm/test/Transforms/SampleProfile/pseudo-probe-discriminator.ll b/llvm/test/Transforms/SampleProfile/pseudo-probe-discriminator.ll
index 62f0737875aec..97b0ed600ad10 100644
--- a/llvm/test/Transforms/SampleProfile/pseudo-probe-discriminator.ll
+++ b/llvm/test/Transforms/SampleProfile/pseudo-probe-discriminator.ll
@@ -62,10 +62,10 @@ attributes #1 = { "disable-tail-calls"="false" "less-precise-fpmad"="false" "fra
 ; DEBUG: ![[INST]] = !DILocation(line: 4, column: 15, scope: ![[INSTBLOCK:[0-9]+]])
 ; DEBUG: ![[INSTBLOCK]] = !DILexicalBlockFile({{.*}} discriminator: 4)
 
-           
+
 ; PROBE: ![[CALL1]] = !DILocation(line: 4, column: 3, scope: ![[CALL1BLOCK:[0-9]+]])
-; PROBE: ![[CALL1BLOCK]] = !DILexicalBlockFile({{.*}} discriminator: 186646575)
+; PROBE: ![[CALL1BLOCK]] = !DILexicalBlockFile({{.*}} discriminator: 186646559)
 ; PROBE: ![[CALL2]] = !DILocation(line: 4, column: 9, scope: ![[CALL2BLOCK:[0-9]+]])
-; PROBE: ![[CALL2BLOCK]] = !DILexicalBlockFile({{.*}} discriminator: 186646583)
+; PROBE: ![[CALL2BLOCK]] = !DILexicalBlockFile({{.*}} discriminator: 186646567)
 ; PROBE: ![[INST]] = !DILocation(line: 4, column: 15, scope: ![[INSTBLOCK:[0-9]+]])
 ; PROBE: ![[INSTBLOCK]] = !DILexicalBlockFile({{.*}} discriminator: 4)
diff --git a/llvm/test/Transforms/SampleProfile/pseudo-probe-profile-metadata-2.ll b/llvm/test/Transforms/SampleProfile/pseudo-probe-profile-metadata-2.ll
index 148f3ede5ab48..379dcfcab338d 100644
--- a/llvm/test/Transforms/SampleProfile/pseudo-probe-profile-metadata-2.ll
+++ b/llvm/test/Transforms/SampleProfile/pseudo-probe-profile-metadata-2.ll
@@ -29,7 +29,7 @@ if.else:
   br label %return
 
 return:
-  call void @llvm.pseudoprobe(i64 6699318081062747564, i64 4, i32 0, i64 -1)
+  call void @llvm.pseudoprobe(i64 6699318081062747564, i64 6, i32 0, i64 -1)
   %1 = load i32, ptr %retval, align 4
   ret i32 %1
 }
@@ -55,13 +55,12 @@ attributes #0 = {"use-sample-profile"}
 !9 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !5, isOptimized: false, runtimeVersion: 0, emissionKind: NoDebug)
 !10 = !{!"function_entry_count", i64 14}
 !11 = !{!"branch_weights", i32 100, i32 0}
-;; A discriminator of 186646575 which is 0x6f80057 in hexdecimal, stands for an indirect call probe
-;; with an index of 5 and probe factor of 1.0.
-!12 = !DILexicalBlockFile(scope: !4, file: !5, discriminator: 186646575)
+;; A discriminator of 186646559 which is 0xB20001F in hexdecimal, stands for an indirect call probe
+;; with an index of 3 and probe factor of 1.0.
+!12 = !DILexicalBlockFile(scope: !4, file: !5, discriminator: 186646559)
 !13 = distinct !DILocation(line: 10, column: 11, scope: !12)
-;; A discriminator of 134217775 which is 0x6f80057 in hexdecimal, stands for an indirect call probe
-;; with an index of 5 and probe factor of 0.
-!14 = !DILexicalBlockFile(scope: !4, file: !5, discriminator: 134217775)
+;; A discriminator of 134217759 which is 0x800001F in hexdecimal, stands for an indirect call probe
+;; with an index of 3 and probe factor of 0.
+!14 = !DILexicalBlockFile(scope: !4, file: !5, discriminator: 134217759)
 !15 = distinct !DILocation(line: 10, column: 11, scope: !14)
 !16 = !{!"VP", i32 0, i64 7, i64 9191153033785521275, i64 5, i64 -1069303473483922844, i64 2}
-
diff --git a/llvm/test/Transforms/SampleProfile/pseudo-probe-profile.ll b/llvm/test/Transforms/SampleProfile/pseudo-probe-profile.ll
index 474b6668b0a7a..867a49dbaed2e 100644
--- a/llvm/test/Transforms/SampleProfile/pseudo-probe-profile.ll
+++ b/llvm/test/Transforms/SampleProfile/pseudo-probe-profile.ll
@@ -22,12 +22,12 @@ if.then:
 if.else:
   ; CHECK: call {{.*}}, !dbg ![[#PROBE2:]], !prof ![[PROF2:[0-9]+]]
   call void %f(i32 2)
-  ; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 3, i32 0, i64 -1)
+  ; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0, i64 -1)
   store i32 2, ptr %retval, align 4
   br label %return
 
 return:
-  ; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0, i64 -1)
+  ; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 6, i32 0, i64 -1)
   %1 = load i32, ptr %retval, align 4
   ret i32 %1
 }
@@ -36,14 +36,14 @@ attributes #0 = {"use-sample-profile"}
 
 ; CHECK: ![[PD1]] = !{!"branch_weights", i32 8, i32 7}
 ; CHECK: ![[#PROBE1]] = !DILocation(line: 0, scope: ![[#SCOPE1:]])
+;; A discriminator of 119537695 which is 0x720001f in hexdecimal, stands for an indirect call probe
+;; with an index of 3.
+; CHECK: ![[#SCOPE1]] = !DILexicalBlockFile(scope: ![[#]], file: ![[#]], discriminator: 119537695)
+; CHECK: ![[PROF1]] = !{!"VP", i32 0, i64 7, i64 9191153033785521275, i64 5, i64 -1069303473483922844, i64 2}
 ;; A discriminator of 119537711 which is 0x720002f in hexdecimal, stands for an indirect call probe
 ;; with an index of 5.
-; CHECK: ![[#SCOPE1]] = !DILexicalBlockFile(scope: ![[#]], file: ![[#]], discriminator: 119537711)
-; CHECK: ![[PROF1]] = !{!"VP", i32 0, i64 7, i64 9191153033785521275, i64 5, i64 -1069303473483922844, i64 2}
-;; A discriminator of 119537719 which is 0x7200037 in hexdecimal, stands for an indirect call probe
-;; with an index of 6.
 ; CHECK: ![[#PROBE2]] = !DILocation(line: 0, scope: ![[#SCOPE2:]])
-; CHECK: ![[#SCOPE2]] = !DILexicalBlockFile(scope: ![[#]], file: ![[#]], discriminator: 119537719)
+; CHECK: ![[#SCOPE2]] = !DILexicalBlockFile(scope: ![[#]], file: ![[#]], discriminator: 119537711)
 ; CHECK: ![[PROF2]] = !{!"VP", i32 0, i64 6, i64 -1069303473483922844, i64 4, i64 9191153033785521275, i64 2}
 
 !llvm.module.flags = !{!9, !10}
@@ -83,7 +83,7 @@ attributes #0 = {"use-sample-profile"}
 ;YAML-NEXT:    - String:          'Applied '
 ;YAML-NEXT:    - NumSamples:      '7'
 ;YAML-NEXT:    - String:          ' samples from profile (ProbeId='
-;YAML-NEXT:    - ProbeId:         '5'
+;YAML-NEXT:    - ProbeId:         '3'
 ;YAML-NEXT:    - String:          ', Factor='
 ;YAML-NEXT:    - Factor:          '1.000000e+00'
 ;YAML-NEXT:    - String:          ', OriginalSamples='
@@ -113,7 +113,7 @@ attributes #0 = {"use-sample-profile"}
 ;YAML-NEXT:    - String:          'Applied '
 ;YAML-NEXT:    - NumSamples:      '6'
 ;YAML-NEXT:    - String:          ' samples from profile (ProbeId='
-;YAML-NEXT:    - ProbeId:         '6'
+;YAML-NEXT:    - ProbeId:         '5'
 ;YAML-NEXT:    - String:          ', Factor='
 ;YAML-NEXT:    - Factor:          '1.000000e+00'
 ;YAML-NEXT:    - String:          ', OriginalSamples='
@@ -128,7 +128,7 @@ attributes #0 = {"use-sample-profile"}
 ;YAML-NEXT:    - String:          'Applied '
 ;YAML-NEXT:    - NumSamples:      '6'
 ;YAML-NEXT:    - String:          ' samples from profile (ProbeId='
-;YAML-NEXT:    - ProbeId:         '3'
+;YAML-NEXT:    - ProbeId:         '4'
 ;YAML-NEXT:    - String:          ', Factor='
 ;YAML-NEXT:    - Factor:          '1.000000e+00'
 ;YAML-NEXT:    - String:          ', OriginalSamples='
@@ -143,7 +143,7 @@ attributes #0 = {"use-sample-profile"}
 ;YAML-NEXT:    - String:          'Applied '
 ;YAML-NEXT:    - NumSamples:      '13'
 ;YAML-NEXT:    - String:          ' samples from profile (ProbeId='
-;YAML-NEXT:    - ProbeId:         '4'
+;YAML-NEXT:    - ProbeId:         '6'
 ;YAML-NEXT:    - String:          ', Factor='
 ;YAML-NEXT:    - Factor:          '1.000000e+00'
 ;YAML-NEXT:    - String:          ', OriginalSamples='
diff --git a/llvm/test/Transforms/SampleProfile/pseudo-probe-update.ll b/llvm/test/Transforms/SampleProfile/pseudo-probe-update.ll
index 992afedd14f75..217b61970933d 100644
--- a/llvm/test/Transforms/SampleProfile/pseudo-probe-update.ll
+++ b/llvm/test/Transforms/SampleProfile/pseudo-probe-update.ll
@@ -14,15 +14,15 @@ T1:
 	%v1 = call i32 @f1()
 ; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 2, i32 0, i64 -1)
 ;; The distribution factor -8513881372706734080 stands for 53.85%, whic is from 7/6+7.
-; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0, i64 -8513881372706734080)
+; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 6, i32 0, i64 -8513881372706734080)
     %cond3 = icmp eq i32 %v1, 412
 	br label %Merge
 F1:
 ; CHECK: %v2 = call i32 @f2(), !prof ![[#PROF2:]]
 	%v2 = call i32 @f2()
-; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 3, i32 0, i64 -1)
+; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0, i64 -1)
 ;; The distribution factor 8513881922462547968 stands for 46.25%, which is from 6/6+7.
-; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0, i64 8513881922462547968)
+; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 6, i32 0, i64 8513881922462547968)
 	br label %Merge
 Merge:
 
@@ -30,11 +30,11 @@ Merge:
 	%B = phi i32 [%v1, %T1], [%v2, %F1]
 	br i1 %A, label %T2, label %F2
 T2:
-; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 5, i32 0, i64 -1)
+; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 7, i32 0, i64 -1)
 	call void @f3()
 	ret i32 %B
 F2:
-; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 6, i32 0, i64 -1)
+; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 9, i32 0, i64 -1)
 	ret i32 %B
 }
 
@@ -42,4 +42,3 @@ F2:
 ; CHECK: ![[#PROF2]] = !{!"branch_weights", i32 6}
 
 attributes #0 = {"use-sample-profile"}
-
diff --git a/llvm/test/Transforms/SampleProfile/pseudo-probe-verify.ll b/llvm/test/Transforms/SampleProfile/pseudo-probe-verify.ll
index f70e5189ab129..b622cfbd6634e 100644
--- a/llvm/test/Transforms/SampleProfile/pseudo-probe-verify.ll
+++ b/llvm/test/Transforms/SampleProfile/pseudo-probe-verify.ll
@@ -4,7 +4,7 @@
 
 ; VERIFY: *** Pseudo Probe Verification After LoopFullUnrollPass ***
 ; VERIFY: Function foo:
-; VERIFY-DAG: Probe 6	previous factor 1.00	current factor 5.00
+; VERIFY-DAG: Probe 5	previous factor 1.00	current factor 5.00
 ; VERIFY-DAG: Probe 4	previous factor 1.00	current factor 5.00
 
 declare void @foo2() nounwind
@@ -27,15 +27,15 @@ bb7.preheader:
 
 bb10:
 ; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0, i64 -1)
-; CHECK: call void @foo2(), !dbg ![[#PROBE6:]] 
+; CHECK: call void @foo2(), !dbg ![[#PROBE6:]]
 ; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0, i64 -1)
-; CHECK: call void @foo2(), !dbg ![[#PROBE6:]] 
+; CHECK: call void @foo2(), !dbg ![[#PROBE6:]]
 ; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0, i64 -1)
-; CHECK: call void @foo2(), !dbg ![[#PROBE6:]] 
+; CHECK: call void @foo2(), !dbg ![[#PROBE6:]]
 ; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0, i64 -1)
-; CHECK: call void @foo2(), !dbg ![[#PROBE6:]] 
+; CHECK: call void @foo2(), !dbg ![[#PROBE6:]]
 ; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0, i64 -1)
-; CHECK: call void @foo2(), !dbg ![[#PROBE6:]] 
+; CHECK: call void @foo2(), !dbg ![[#PROBE6:]]
 ; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 2, i32 0, i64 -1)
   %indvars.iv = phi i64 [ 0, %bb7.preheader ], [ %indvars.iv.next, %bb10 ]
   %tmp1.14 = phi i32 [ %tmp1.06, %bb7.preheader ], [ %spec.select, %bb10 ]
@@ -50,14 +50,14 @@ bb10:
   br i1 %exitcond.not, label %bb3.loopexit, label %bb10, !llvm.loop !13
 
 bb24:
-; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 5, i32 0, i64 -1)
+; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 6, i32 0, i64 -1)
   ret void
 }
 
 ;; A discriminator of 186646583 which is 0xb200037 in hexdecimal, stands for a direct call probe
 ;; with an index of 6 and a scale of -1%.
 ; CHECK: ![[#PROBE6]] = !DILocation(line: 2, column: 20, scope: ![[#SCOPE:]])
-; CHECK: ![[#SCOPE]] = !DILexicalBlockFile(scope: ![[#]], file: ![[#]], discriminator: 186646583)
+; CHECK: ![[#SCOPE]] = !DILexicalBlockFile(scope: ![[#]], file: ![[#]], discriminator: 186646575)
 
 !llvm.dbg.cu = !{!0}
 !llvm.module.flags = !{!9, !10}

@WenleiHe
Copy link
Member

WenleiHe commented Jan 3, 2024

We will need some protection against accidentally consuming old profile with new toolchain and vice versa. The cost of investigating mysterious perf regression can be high and we'd rather simply error out in those cases. Maybe consider some flag for SecProfSummaryFlags (also wondering if this needs to be per-function flag since technically some function/module can be compiled with old order while others with new order).

Can we gauge perf benefit of mixed order encoding? i.e. profile an old source rev (several weeks ago?) with both probe order, and build with new source rev using that old profile, then see how old vs new order compare in terms of perf and efficacy with incremental PGO.

@wlei-llvm
Copy link
Contributor Author

We will need some protection against accidentally consuming old profile with new toolchain and vice versa. The cost of investigating mysterious perf regression can be high and we'd rather simply error out in those cases. Maybe consider some flag for SecProfSummaryFlags

Agreed with this concern. To do this, we probably also need a flag in the binary, because otherwise if we use the new toolchain for prof-gen but the binary built on the old toolchain, we then would generate a profile with this flag on but the order is the old one. My understanding is the (probe) version of profile should line up with the version of the binary. Maybe make this more general, we can introduce a "pseudo_probe_version" thing, similar to instrPGO https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ProfileData/InstrProf.h#L996 . It can be useful if we want to update other pseudo probe formats in future.

(also wondering if this needs to be per-function flag since technically some function/module can be compiled with old order while others with new order).

This seems very complicated, supposing you meant it's mixed compiled case(some objects files are built on old toolchain and some are on new toolchain), then in profile generation time, it's hard to know which order the function uses. That requires some function level version thing.. maybe add version to pseudo_probe_desc which is function-level and encode the version.

Can we gauge perf benefit of mixed order encoding? i.e. profile an old source rev (several weeks ago?) with both probe order, and build with new source rev using that old profile, then see how old vs new order compare in terms of perf and efficacy with incremental PGO.

Yes, I saw some perf wins using the new order, it's not as easily measurable as the call anchor, as you said, I had to use a old rev to run profile collection using the new order. I will show you the results offline.

@WenleiHe
Copy link
Member

WenleiHe commented Jan 5, 2024

Agreed with this concern. To do this, we probably also need a flag in the binary, because otherwise if we use the new toolchain for prof-gen but the binary built on the old toolchain, we then would generate a profile with this flag on but the order is the old one. My understanding is the (probe) version of profile should line up with the version of the binary. Maybe make this more general, we can introduce a "pseudo_probe_version" thing, similar to instrPGO https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ProfileData/InstrProf.h#L996 . It can be useful if we want to update other pseudo probe formats in future.

Versioning is one way to do this. Alternatively, we should be able to detect the case during llvm-profgen time when all call site probe and block probe ids are separated, which can then be used to set flags?

@llvmbot llvmbot added the PGO Profile Guided Optimizations label Jan 10, 2024
@wlei-llvm
Copy link
Contributor Author

Agreed with this concern. To do this, we probably also need a flag in the binary, because otherwise if we use the new toolchain for prof-gen but the binary built on the old toolchain, we then would generate a profile with this flag on but the order is the old one. My understanding is the (probe) version of profile should line up with the version of the binary. Maybe make this more general, we can introduce a "pseudo_probe_version" thing, similar to instrPGO https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ProfileData/InstrProf.h#L996 . It can be useful if we want to update other pseudo probe formats in future.

Versioning is one way to do this. Alternatively, we should be able to detect the case during llvm-profgen time when all call site probe and block probe ids are separated, which can then be used to set flags?

Sounds good.

Add a new function(checkProbeIDIsInMixedOrder) to check the old order in profile generation time.
Always write a SecProfSummaryFlags flag, if the function return true.

const char *Msg =
"Pseudo-probe-based profile is on an old version ID order which "
"could cause profile mismatch(performance regression)";
Ctx.diagnose(DiagnosticInfoSampleProfile(M.getModuleIdentifier(), Msg,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this an error so issues like this won't go unnoticed?

also:
cause profile mismatch(performance regression
->
cause performance regression due to profile mismatch

return PossibleNewOrderNum >= PossibleOldOrderNum;
}

void warnOrWriteProbeOrderFlag(const SampleProfileMap &Profiles) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we both warn and set the flag. The Or is not accurate.

how about moving the setting of ProfileIsMixedProbeOrder into checkProbeIDIsInMixedOrder, and then makes thing warning only.

PossibleBBProbeIDs.upper_bound(ID));
uint32_t LengthSum = PossibleBBProbeIDs.size() * CallsiteIDs.size();

// Note that PossibleBBProbeIDs could contains some callsite probe id, the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is an LBR range (target to source of next LBR) covering a location/probe, and that location has a call, we should be able to see that call in LBR too (because that next LBR will be a call) hence having a call target profiled?

I'm curious what lead to empty call targets for a call location covered by LBR..

This heuristic seems more complicated and less reliable than I thought..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious what lead to empty call targets for a call location covered by LBR..

yeah, I need to investigate where the empty call targets comes from.(I'm sure I saw a lot of empty callsite names, the count are all zero)

Comment on lines 514 to 519
if ((float)DistanceToBeginSum / LengthSum > 0.8)
PossibleOldOrderNum++;
else
PossibleNewOrderNum++;
}
return PossibleNewOrderNum >= PossibleOldOrderNum;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems quite fuzzy.. If we can't decide whether a probe is originally a call probe or a block probe, that makes the checking less reliable.. I wasn't anticipating that, but if that's case we can't even confidently issue error.

In reality, do we see cases where there are many PossibleNewOrderNum and PossibleOldOrderNum for one profile?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, it was also far complicated than I expected, try to explain more about this, we can discuss offline.

The fact is that, in theory there can be cases that all the calls are after BB probes for the new order ID(like all hot profiled calls are in the last BB), so even we can decide the probe is call or not, we still can't just use one function or one probe as a line to decide the order, that's why here I use this complex statistical way.(It appears it works well)

If we can't decide whether a probe is originally a call probe or a block probe, that makes the checking less reliable

So far sadly no from profile. We can query the type of probe from the original profiled binary, like parsing the binary asm, but that seems a lot of infra work.(Also as mentioned above, even the case all calls are after all BB also can't decide the order)

In reality, do we see cases where there are many PossibleNewOrderNum and PossibleOldOrderNum for one profile?

Yep, both are actually a lot in a large production-used profile, that's why I use a 0.8 not a 0.9 things, I spent some time tuning this threshold, right now it works very well on my large profile test.

but if that's case we can't even confidently issue error.

To some degree, this heuristic is reliable(as long as there are enough big functions). From
my statistics, using 0.8 as threshold showed it works really good to detect the order, the two num are distinguishable. For both side, the PossibleOrderNum is much larger than the other num, the ratio(PossibleNewOrderNum/PossibleOldOrderNum) is almost ~0.9 for new order, a large room to swing, and vice versa. (like PossibleNewOrderNum is 9X more than PossibleOldOrderNum for a new order. Or PossibleOldOrderNum is 9X more than PossibleNewOrderNum for an old order).
But TBH, I'm sure how it works for a small profile, what i tested is all larger than 100M profile, I know that for a small function, it's quite random, it just has little calls, I have to filter out those small function as showed in the code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, if this is fuzzy heuristic that is mostly right but not guaranteed, setting a flag like SecFlagIsMixedProbeOrder may also be arguable. Is is more like Maybe.

We can query the type of probe from the original profiled binary, like parsing the binary asm, but that seems a lot of infra work

If we only traverse the probe sections, would that be enough to verify?

Another idea is like you said, just move the checks to compile time. But maybe we don't need to detect order specifically, but just detect huge staleness and issue error. Say if a profile is >=90% (or maybe 100%?) stale at compile time for a large file or a file with hot functions, issue error. That would be not specific to this order issue, but also generally useful?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we only traverse the probe sections, would that be enough to verify?

yes, that should be enough, we can check some large functions if there are calls before BBs to decide old order.

Another idea is like you said, just move the checks to compile time. But maybe we don't need to detect order specifically, but just detect huge staleness and issue error. Say if a profile is >=90% (or maybe 100%?) stale at compile time for a large file or a file with hot functions, issue error. That would be not specific to this order issue, but also generally useful?

Yes, that should also work, it should be useful for catching other mismatch cases, like compiler upgrade.

Copy link
Contributor Author

@wlei-llvm wlei-llvm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback, it's indeed complicated (even than I expected..)

Comment on lines 514 to 519
if ((float)DistanceToBeginSum / LengthSum > 0.8)
PossibleOldOrderNum++;
else
PossibleNewOrderNum++;
}
return PossibleNewOrderNum >= PossibleOldOrderNum;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, it was also far complicated than I expected, try to explain more about this, we can discuss offline.

The fact is that, in theory there can be cases that all the calls are after BB probes for the new order ID(like all hot profiled calls are in the last BB), so even we can decide the probe is call or not, we still can't just use one function or one probe as a line to decide the order, that's why here I use this complex statistical way.(It appears it works well)

If we can't decide whether a probe is originally a call probe or a block probe, that makes the checking less reliable

So far sadly no from profile. We can query the type of probe from the original profiled binary, like parsing the binary asm, but that seems a lot of infra work.(Also as mentioned above, even the case all calls are after all BB also can't decide the order)

In reality, do we see cases where there are many PossibleNewOrderNum and PossibleOldOrderNum for one profile?

Yep, both are actually a lot in a large production-used profile, that's why I use a 0.8 not a 0.9 things, I spent some time tuning this threshold, right now it works very well on my large profile test.

but if that's case we can't even confidently issue error.

To some degree, this heuristic is reliable(as long as there are enough big functions). From
my statistics, using 0.8 as threshold showed it works really good to detect the order, the two num are distinguishable. For both side, the PossibleOrderNum is much larger than the other num, the ratio(PossibleNewOrderNum/PossibleOldOrderNum) is almost ~0.9 for new order, a large room to swing, and vice versa. (like PossibleNewOrderNum is 9X more than PossibleOldOrderNum for a new order. Or PossibleOldOrderNum is 9X more than PossibleNewOrderNum for an old order).
But TBH, I'm sure how it works for a small profile, what i tested is all larger than 100M profile, I know that for a small function, it's quite random, it just has little calls, I have to filter out those small function as showed in the code.

PossibleBBProbeIDs.upper_bound(ID));
uint32_t LengthSum = PossibleBBProbeIDs.size() * CallsiteIDs.size();

// Note that PossibleBBProbeIDs could contains some callsite probe id, the
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious what lead to empty call targets for a call location covered by LBR..

yeah, I need to investigate where the empty call targets comes from.(I'm sure I saw a lot of empty callsite names, the count are all zero)

@wlei-llvm
Copy link
Contributor Author

As discussed offline, reverted the complicated order detection change.

Copy link
Member

@WenleiHe WenleiHe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I'd also have a change that errors out on huge staleness go in together with this one..

@wlei-llvm
Copy link
Contributor Author

LGTM. I'd also have a change that errors out on huge staleness go in together with this one..

Was thinking to use a separate diff for that, feels non-trivial to do it, but I'm also good for doing in this.

Just pushed one version for this, it use heuristics to check this, note that the current threshold flags is just based my "gut feeling", so I'm going to tune those flags on our internal services, hope that we can achieve the goal:

  1. Use the current prod profile, so all build should pass without any false positives break.

  2. Use the incompatible profile, so all build should error out.

Any early feedback is appreciated.

@WenleiHe
Copy link
Member

WenleiHe commented Mar 1, 2024

LGTM. I'd also have a change that errors out on huge staleness go in together with this one..

Was thinking to use a separate diff for that, feels non-trivial to do it, but I'm also good for doing in this.

Just pushed one version for this, it use heuristics to check this, note that the current threshold flags is just based my "gut feeling", so I'm going to tune those flags on our internal services, hope that we can achieve the goal:

  1. Use the current prod profile, so all build should pass without any false positives break.
  2. Use the incompatible profile, so all build should error out.

Any early feedback is appreciated.

Sorry I wasn't clear. The error out change should be a separate patch as logically it's independent of the original mix order change. By go in together, I was mostly thinking about two patches that go in together..

@wlei-llvm
Copy link
Contributor Author

LGTM. I'd also have a change that errors out on huge staleness go in together with this one..

Was thinking to use a separate diff for that, feels non-trivial to do it, but I'm also good for doing in this.
Just pushed one version for this, it use heuristics to check this, note that the current threshold flags is just based my "gut feeling", so I'm going to tune those flags on our internal services, hope that we can achieve the goal:

  1. Use the current prod profile, so all build should pass without any false positives break.
  2. Use the incompatible profile, so all build should error out.

Any early feedback is appreciated.

Sorry I wasn't clear. The error out change should be a separate patch as logically it's independent of the original mix order change. By go in together, I was mostly thinking about two patches that go in together..

Oh, that's my initial thought too, Ok, I will create a new PR, and hold this until we can go in together.

Copy link

github-actions bot commented Mar 1, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@wlei-llvm wlei-llvm changed the title [PseudoProbe] Mix and reorder block and call probe ID in lexical order [PseudoProbe] Mix block and call probe ID in lexical order Apr 2, 2024
@wlei-llvm wlei-llvm merged commit 5bbce06 into llvm:main Apr 3, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang Clang issues not falling into any other category llvm:transforms PGO Profile Guided Optimizations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants