[PseudoProbe] Mix block and call probe ID in lexical order #75092

wlei-llvm · 2023-12-11T20:04:58Z

Before all the call probe ids are after block ids, in this change, it mixed the call probe and block probe by reordering them in lexical(line-number) order. For example:

main():
BB1
if(...) 
  BB2 foo(..);   
else 
  BB3 bar(...);
BB4

Before the profile is

main
 1: ..
 2: ..
 3: ...
 4: ...
 5: foo ...
 6: bar ...

Now the new order is

 main
 1: ..
 2: ..
 3: foo ...
 4: ...
 5: bar ...
 6: ...

This can potentially make it more tolerant of profile mismatch, either from stale profile or frontend change. e.g. before if we add one block, even the block is the last one, all the call probes are shifted and mismatched. Moreover, this makes better use of call-anchor based stale profile matching. Blocks are matched based on the closest anchor, there would be more anchors used for the matching, reduce the mismatch scope.

llvmbot · 2023-12-11T20:05:26Z

@llvm/pr-subscribers-pgo
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-llvm-transforms

Author: Lei Wang (wlei-llvm)

Changes

Before all the call probe ids are after block ids, in this change, it mixed the call probe and block probe by reordering them in lexical(line-number) order. For example:

main():
BB1
if(...) 
  BB2 foo(..);   
else 
  BB3 bar(...);
BB4

Before the profile is

main
 1: ..
 2: ..
 3: ...
 4: ...
 5: foo ...
 6: bar ...

Now the new order is

 main
 1: ..
 2: ..
 3: foo ...
 4: ...
 5: bar ...
 6: ...

This can potentially make it more tolerant of profile mismatch, either from stale profile or frontend change. e.g. before if we add one block, even the block is the last one, all the call probes are shifted and mismatched. Moreover, this makes better use of call-anchor based stale profile matching. Blocks are matched based on the closest anchor, there would be more anchors used for the matching, reduce the mismatch scope.

Full diff: https://github.com/llvm/llvm-project/pull/75092.diff

11 Files Affected:

(modified) clang/test/CodeGen/pseudo-probe-emit.c (+4-4)
(modified) llvm/include/llvm/Transforms/IPO/SampleProfileProbe.h (+1-2)
(modified) llvm/lib/Transforms/IPO/SampleProfileProbe.cpp (+3-13)
(modified) llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-profile.prof (+4-4)
(modified) llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-update.prof (+4-4)
(modified) llvm/test/Transforms/SampleProfile/pseudo-probe-dangle.ll (+6-6)
(modified) llvm/test/Transforms/SampleProfile/pseudo-probe-discriminator.ll (+3-3)
(modified) llvm/test/Transforms/SampleProfile/pseudo-probe-profile-metadata-2.ll (+7-8)
(modified) llvm/test/Transforms/SampleProfile/pseudo-probe-profile.ll (+11-11)
(modified) llvm/test/Transforms/SampleProfile/pseudo-probe-update.ll (+5-6)
(modified) llvm/test/Transforms/SampleProfile/pseudo-probe-verify.ll (+8-8)

diff --git a/clang/test/CodeGen/pseudo-probe-emit.c b/clang/test/CodeGen/pseudo-probe-emit.c
index c7a3f7e6d5b02..360f831e84294 100644
--- a/clang/test/CodeGen/pseudo-probe-emit.c
+++ b/clang/test/CodeGen/pseudo-probe-emit.c
@@ -10,9 +10,9 @@ void foo(int x) {
   // CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 1, i32 0, i64 -1)
   if (x == 0)
     // CHECK: call void @llvm.pseudoprobe(i64 [[#GUID]], i64 2, i32 0, i64 -1)
-    bar();
+    bar();  // probe id : 3
   else
-    // CHECK: call void @llvm.pseudoprobe(i64 [[#GUID]], i64 3, i32 0, i64 -1)
-    go();
-  // CHECK: call void @llvm.pseudoprobe(i64 [[#GUID]], i64 4, i32 0, i64 -1)
+    // CHECK: call void @llvm.pseudoprobe(i64 [[#GUID]], i64 4, i32 0, i64 -1)
+    go(); // probe id : 5
+  // CHECK: call void @llvm.pseudoprobe(i64 [[#GUID]], i64 6, i32 0, i64 -1)
 }
diff --git a/llvm/include/llvm/Transforms/IPO/SampleProfileProbe.h b/llvm/include/llvm/Transforms/IPO/SampleProfileProbe.h
index 0f2729a9462de..69b87adf105fd 100644
--- a/llvm/include/llvm/Transforms/IPO/SampleProfileProbe.h
+++ b/llvm/include/llvm/Transforms/IPO/SampleProfileProbe.h
@@ -82,8 +82,7 @@ class SampleProfileProber {
   uint32_t getBlockId(const BasicBlock *BB) const;
   uint32_t getCallsiteId(const Instruction *Call) const;
   void computeCFGHash();
-  void computeProbeIdForBlocks();
-  void computeProbeIdForCallsites();
+  void computeProbeId();
 
   Function *F;
 
diff --git a/llvm/lib/Transforms/IPO/SampleProfileProbe.cpp b/llvm/lib/Transforms/IPO/SampleProfileProbe.cpp
index 8f0b12d0cfedf..e2c5eaf261d29 100644
--- a/llvm/lib/Transforms/IPO/SampleProfileProbe.cpp
+++ b/llvm/lib/Transforms/IPO/SampleProfileProbe.cpp
@@ -173,8 +173,7 @@ SampleProfileProber::SampleProfileProber(Function &Func,
   BlockProbeIds.clear();
   CallProbeIds.clear();
   LastProbeId = (uint32_t)PseudoProbeReservedId::Last;
-  computeProbeIdForBlocks();
-  computeProbeIdForCallsites();
+  computeProbeId();
   computeCFGHash();
 }
 
@@ -209,7 +208,7 @@ void SampleProfileProber::computeCFGHash() {
                     << ", Hash = " << FunctionHash << "\n");
 }
 
-void SampleProfileProber::computeProbeIdForBlocks() {
+void SampleProfileProber::computeProbeId() {
   DenseSet<BasicBlock *> KnownColdBlocks;
   computeEHOnlyBlocks(*F, KnownColdBlocks);
   // Insert pseudo probe to non-cold blocks only. This will reduce IR size as
@@ -218,18 +217,9 @@ void SampleProfileProber::computeProbeIdForBlocks() {
     ++LastProbeId;
     if (!KnownColdBlocks.contains(&BB))
       BlockProbeIds[&BB] = LastProbeId;
-  }
-}
-
-void SampleProfileProber::computeProbeIdForCallsites() {
-  LLVMContext &Ctx = F->getContext();
-  Module *M = F->getParent();
 
-  for (auto &BB : *F) {
     for (auto &I : BB) {
-      if (!isa<CallBase>(I))
-        continue;
-      if (isa<IntrinsicInst>(&I))
+      if (!isa<CallBase>(I) || isa<IntrinsicInst>(&I))
         continue;
 
       // The current implementation uses the lower 16 bits of the discriminator
diff --git a/llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-profile.prof b/llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-profile.prof
index ba4c6117dc96a..d3847946b9403 100644
--- a/llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-profile.prof
+++ b/llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-profile.prof
@@ -1,8 +1,8 @@
 foo:3200:13
  1: 13
  2: 7
- 3: 6
- 4: 13
- 5: 7 _Z3barv:2 _Z3foov:5
- 6: 6 _Z3barv:4 _Z3foov:2
+ 4: 6
+ 6: 13
+ 3: 7 _Z3barv:2 _Z3foov:5
+ 5: 6 _Z3barv:4 _Z3foov:2
  !CFGChecksum: 563022570642068
diff --git a/llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-update.prof b/llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-update.prof
index 62f9bd5992e73..213bf0b6f81cc 100644
--- a/llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-update.prof
+++ b/llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-update.prof
@@ -1,8 +1,8 @@
 foo:3200:13
  1: 13
  2: 7
- 3: 6
- 4: 13
- 5: 7
- 6: 6
+ 4: 6
+ 6: 13
+ 7: 7
+ 9: 6
  !CFGChecksum: 844530426352218
diff --git a/llvm/test/Transforms/SampleProfile/pseudo-probe-dangle.ll b/llvm/test/Transforms/SampleProfile/pseudo-probe-dangle.ll
index 4647a34fc2f62..f0b6fdf62d969 100644
--- a/llvm/test/Transforms/SampleProfile/pseudo-probe-dangle.ll
+++ b/llvm/test/Transforms/SampleProfile/pseudo-probe-dangle.ll
@@ -23,21 +23,21 @@ Merge:
 ; JT-LABEL-NO: T
 ; JT-LABEL-NO: F
 ; JT-LABEL: Merge
+; JT-NOT: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4
 ; JT-NOT: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 3
-; JT-NOT: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 2
-; JT: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0, i64 -1)
+; JT: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 5, i32 0, i64 -1)
+; ASM-NOT: .pseudoprobe	6699318081062747564 4
 ; ASM-NOT: .pseudoprobe	6699318081062747564 3
-; ASM-NOT: .pseudoprobe	6699318081062747564 2
-; ASM: .pseudoprobe	6699318081062747564 4 0 0
+; ASM: .pseudoprobe	6699318081062747564 5 0 0
 	ret i32 %call
 }
 
 ;; Check block T and F are gone, and their probes (probe 2 and 3) are gone too.
 ; MIR-tail: bb.0
 ; MIR-tail: PSEUDO_PROBE [[#GUID:]], 1, 0, 0
-; MIR-tail-NOT: PSEUDO_PROBE [[#GUID:]], 2
 ; MIR-tail-NOT: PSEUDO_PROBE [[#GUID:]], 3
-; MIR-tail: PSEUDO_PROBE [[#GUID:]], 4, 0, 0
+; MIR-tail-NOT: PSEUDO_PROBE [[#GUID:]], 4
+; MIR-tail: PSEUDO_PROBE [[#GUID:]], 5, 0, 0
 
 
 define i32 @test(i32 %a, i32 %b, i32 %c) {
diff --git a/llvm/test/Transforms/SampleProfile/pseudo-probe-discriminator.ll b/llvm/test/Transforms/SampleProfile/pseudo-probe-discriminator.ll
index 62f0737875aec..97b0ed600ad10 100644
--- a/llvm/test/Transforms/SampleProfile/pseudo-probe-discriminator.ll
+++ b/llvm/test/Transforms/SampleProfile/pseudo-probe-discriminator.ll
@@ -62,10 +62,10 @@ attributes #1 = { "disable-tail-calls"="false" "less-precise-fpmad"="false" "fra
 ; DEBUG: ![[INST]] = !DILocation(line: 4, column: 15, scope: ![[INSTBLOCK:[0-9]+]])
 ; DEBUG: ![[INSTBLOCK]] = !DILexicalBlockFile({{.*}} discriminator: 4)
 
-           
+
 ; PROBE: ![[CALL1]] = !DILocation(line: 4, column: 3, scope: ![[CALL1BLOCK:[0-9]+]])
-; PROBE: ![[CALL1BLOCK]] = !DILexicalBlockFile({{.*}} discriminator: 186646575)
+; PROBE: ![[CALL1BLOCK]] = !DILexicalBlockFile({{.*}} discriminator: 186646559)
 ; PROBE: ![[CALL2]] = !DILocation(line: 4, column: 9, scope: ![[CALL2BLOCK:[0-9]+]])
-; PROBE: ![[CALL2BLOCK]] = !DILexicalBlockFile({{.*}} discriminator: 186646583)
+; PROBE: ![[CALL2BLOCK]] = !DILexicalBlockFile({{.*}} discriminator: 186646567)
 ; PROBE: ![[INST]] = !DILocation(line: 4, column: 15, scope: ![[INSTBLOCK:[0-9]+]])
 ; PROBE: ![[INSTBLOCK]] = !DILexicalBlockFile({{.*}} discriminator: 4)
diff --git a/llvm/test/Transforms/SampleProfile/pseudo-probe-profile-metadata-2.ll b/llvm/test/Transforms/SampleProfile/pseudo-probe-profile-metadata-2.ll
index 148f3ede5ab48..379dcfcab338d 100644
--- a/llvm/test/Transforms/SampleProfile/pseudo-probe-profile-metadata-2.ll
+++ b/llvm/test/Transforms/SampleProfile/pseudo-probe-profile-metadata-2.ll
@@ -29,7 +29,7 @@ if.else:
   br label %return
 
 return:
-  call void @llvm.pseudoprobe(i64 6699318081062747564, i64 4, i32 0, i64 -1)
+  call void @llvm.pseudoprobe(i64 6699318081062747564, i64 6, i32 0, i64 -1)
   %1 = load i32, ptr %retval, align 4
   ret i32 %1
 }
@@ -55,13 +55,12 @@ attributes #0 = {"use-sample-profile"}
 !9 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !5, isOptimized: false, runtimeVersion: 0, emissionKind: NoDebug)
 !10 = !{!"function_entry_count", i64 14}
 !11 = !{!"branch_weights", i32 100, i32 0}
-;; A discriminator of 186646575 which is 0x6f80057 in hexdecimal, stands for an indirect call probe
-;; with an index of 5 and probe factor of 1.0.
-!12 = !DILexicalBlockFile(scope: !4, file: !5, discriminator: 186646575)
+;; A discriminator of 186646559 which is 0xB20001F in hexdecimal, stands for an indirect call probe
+;; with an index of 3 and probe factor of 1.0.
+!12 = !DILexicalBlockFile(scope: !4, file: !5, discriminator: 186646559)
 !13 = distinct !DILocation(line: 10, column: 11, scope: !12)
-;; A discriminator of 134217775 which is 0x6f80057 in hexdecimal, stands for an indirect call probe
-;; with an index of 5 and probe factor of 0.
-!14 = !DILexicalBlockFile(scope: !4, file: !5, discriminator: 134217775)
+;; A discriminator of 134217759 which is 0x800001F in hexdecimal, stands for an indirect call probe
+;; with an index of 3 and probe factor of 0.
+!14 = !DILexicalBlockFile(scope: !4, file: !5, discriminator: 134217759)
 !15 = distinct !DILocation(line: 10, column: 11, scope: !14)
 !16 = !{!"VP", i32 0, i64 7, i64 9191153033785521275, i64 5, i64 -1069303473483922844, i64 2}
-
diff --git a/llvm/test/Transforms/SampleProfile/pseudo-probe-profile.ll b/llvm/test/Transforms/SampleProfile/pseudo-probe-profile.ll
index 474b6668b0a7a..867a49dbaed2e 100644
--- a/llvm/test/Transforms/SampleProfile/pseudo-probe-profile.ll
+++ b/llvm/test/Transforms/SampleProfile/pseudo-probe-profile.ll
@@ -22,12 +22,12 @@ if.then:
 if.else:
   ; CHECK: call {{.*}}, !dbg ![[#PROBE2:]], !prof ![[PROF2:[0-9]+]]
   call void %f(i32 2)
-  ; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 3, i32 0, i64 -1)
+  ; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0, i64 -1)
   store i32 2, ptr %retval, align 4
   br label %return
 
 return:
-  ; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0, i64 -1)
+  ; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 6, i32 0, i64 -1)
   %1 = load i32, ptr %retval, align 4
   ret i32 %1
 }
@@ -36,14 +36,14 @@ attributes #0 = {"use-sample-profile"}
 
 ; CHECK: ![[PD1]] = !{!"branch_weights", i32 8, i32 7}
 ; CHECK: ![[#PROBE1]] = !DILocation(line: 0, scope: ![[#SCOPE1:]])
+;; A discriminator of 119537695 which is 0x720001f in hexdecimal, stands for an indirect call probe
+;; with an index of 3.
+; CHECK: ![[#SCOPE1]] = !DILexicalBlockFile(scope: ![[#]], file: ![[#]], discriminator: 119537695)
+; CHECK: ![[PROF1]] = !{!"VP", i32 0, i64 7, i64 9191153033785521275, i64 5, i64 -1069303473483922844, i64 2}
 ;; A discriminator of 119537711 which is 0x720002f in hexdecimal, stands for an indirect call probe
 ;; with an index of 5.
-; CHECK: ![[#SCOPE1]] = !DILexicalBlockFile(scope: ![[#]], file: ![[#]], discriminator: 119537711)
-; CHECK: ![[PROF1]] = !{!"VP", i32 0, i64 7, i64 9191153033785521275, i64 5, i64 -1069303473483922844, i64 2}
-;; A discriminator of 119537719 which is 0x7200037 in hexdecimal, stands for an indirect call probe
-;; with an index of 6.
 ; CHECK: ![[#PROBE2]] = !DILocation(line: 0, scope: ![[#SCOPE2:]])
-; CHECK: ![[#SCOPE2]] = !DILexicalBlockFile(scope: ![[#]], file: ![[#]], discriminator: 119537719)
+; CHECK: ![[#SCOPE2]] = !DILexicalBlockFile(scope: ![[#]], file: ![[#]], discriminator: 119537711)
 ; CHECK: ![[PROF2]] = !{!"VP", i32 0, i64 6, i64 -1069303473483922844, i64 4, i64 9191153033785521275, i64 2}
 
 !llvm.module.flags = !{!9, !10}
@@ -83,7 +83,7 @@ attributes #0 = {"use-sample-profile"}
 ;YAML-NEXT:    - String:          'Applied '
 ;YAML-NEXT:    - NumSamples:      '7'
 ;YAML-NEXT:    - String:          ' samples from profile (ProbeId='
-;YAML-NEXT:    - ProbeId:         '5'
+;YAML-NEXT:    - ProbeId:         '3'
 ;YAML-NEXT:    - String:          ', Factor='
 ;YAML-NEXT:    - Factor:          '1.000000e+00'
 ;YAML-NEXT:    - String:          ', OriginalSamples='
@@ -113,7 +113,7 @@ attributes #0 = {"use-sample-profile"}
 ;YAML-NEXT:    - String:          'Applied '
 ;YAML-NEXT:    - NumSamples:      '6'
 ;YAML-NEXT:    - String:          ' samples from profile (ProbeId='
-;YAML-NEXT:    - ProbeId:         '6'
+;YAML-NEXT:    - ProbeId:         '5'
 ;YAML-NEXT:    - String:          ', Factor='
 ;YAML-NEXT:    - Factor:          '1.000000e+00'
 ;YAML-NEXT:    - String:          ', OriginalSamples='
@@ -128,7 +128,7 @@ attributes #0 = {"use-sample-profile"}
 ;YAML-NEXT:    - String:          'Applied '
 ;YAML-NEXT:    - NumSamples:      '6'
 ;YAML-NEXT:    - String:          ' samples from profile (ProbeId='
-;YAML-NEXT:    - ProbeId:         '3'
+;YAML-NEXT:    - ProbeId:         '4'
 ;YAML-NEXT:    - String:          ', Factor='
 ;YAML-NEXT:    - Factor:          '1.000000e+00'
 ;YAML-NEXT:    - String:          ', OriginalSamples='
@@ -143,7 +143,7 @@ attributes #0 = {"use-sample-profile"}
 ;YAML-NEXT:    - String:          'Applied '
 ;YAML-NEXT:    - NumSamples:      '13'
 ;YAML-NEXT:    - String:          ' samples from profile (ProbeId='
-;YAML-NEXT:    - ProbeId:         '4'
+;YAML-NEXT:    - ProbeId:         '6'
 ;YAML-NEXT:    - String:          ', Factor='
 ;YAML-NEXT:    - Factor:          '1.000000e+00'
 ;YAML-NEXT:    - String:          ', OriginalSamples='
diff --git a/llvm/test/Transforms/SampleProfile/pseudo-probe-update.ll b/llvm/test/Transforms/SampleProfile/pseudo-probe-update.ll
index 992afedd14f75..217b61970933d 100644
--- a/llvm/test/Transforms/SampleProfile/pseudo-probe-update.ll
+++ b/llvm/test/Transforms/SampleProfile/pseudo-probe-update.ll
@@ -14,15 +14,15 @@ T1:
 	%v1 = call i32 @f1()
 ; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 2, i32 0, i64 -1)
 ;; The distribution factor -8513881372706734080 stands for 53.85%, whic is from 7/6+7.
-; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0, i64 -8513881372706734080)
+; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 6, i32 0, i64 -8513881372706734080)
     %cond3 = icmp eq i32 %v1, 412
 	br label %Merge
 F1:
 ; CHECK: %v2 = call i32 @f2(), !prof ![[#PROF2:]]
 	%v2 = call i32 @f2()
-; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 3, i32 0, i64 -1)
+; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0, i64 -1)
 ;; The distribution factor 8513881922462547968 stands for 46.25%, which is from 6/6+7.
-; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0, i64 8513881922462547968)
+; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 6, i32 0, i64 8513881922462547968)
 	br label %Merge
 Merge:
 
@@ -30,11 +30,11 @@ Merge:
 	%B = phi i32 [%v1, %T1], [%v2, %F1]
 	br i1 %A, label %T2, label %F2
 T2:
-; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 5, i32 0, i64 -1)
+; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 7, i32 0, i64 -1)
 	call void @f3()
 	ret i32 %B
 F2:
-; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 6, i32 0, i64 -1)
+; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 9, i32 0, i64 -1)
 	ret i32 %B
 }
 
@@ -42,4 +42,3 @@ F2:
 ; CHECK: ![[#PROF2]] = !{!"branch_weights", i32 6}
 
 attributes #0 = {"use-sample-profile"}
-
diff --git a/llvm/test/Transforms/SampleProfile/pseudo-probe-verify.ll b/llvm/test/Transforms/SampleProfile/pseudo-probe-verify.ll
index f70e5189ab129..b622cfbd6634e 100644
--- a/llvm/test/Transforms/SampleProfile/pseudo-probe-verify.ll
+++ b/llvm/test/Transforms/SampleProfile/pseudo-probe-verify.ll
@@ -4,7 +4,7 @@
 
 ; VERIFY: *** Pseudo Probe Verification After LoopFullUnrollPass ***
 ; VERIFY: Function foo:
-; VERIFY-DAG: Probe 6	previous factor 1.00	current factor 5.00
+; VERIFY-DAG: Probe 5	previous factor 1.00	current factor 5.00
 ; VERIFY-DAG: Probe 4	previous factor 1.00	current factor 5.00
 
 declare void @foo2() nounwind
@@ -27,15 +27,15 @@ bb7.preheader:
 
 bb10:
 ; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0, i64 -1)
-; CHECK: call void @foo2(), !dbg ![[#PROBE6:]] 
+; CHECK: call void @foo2(), !dbg ![[#PROBE6:]]
 ; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0, i64 -1)
-; CHECK: call void @foo2(), !dbg ![[#PROBE6:]] 
+; CHECK: call void @foo2(), !dbg ![[#PROBE6:]]
 ; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0, i64 -1)
-; CHECK: call void @foo2(), !dbg ![[#PROBE6:]] 
+; CHECK: call void @foo2(), !dbg ![[#PROBE6:]]
 ; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0, i64 -1)
-; CHECK: call void @foo2(), !dbg ![[#PROBE6:]] 
+; CHECK: call void @foo2(), !dbg ![[#PROBE6:]]
 ; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0, i64 -1)
-; CHECK: call void @foo2(), !dbg ![[#PROBE6:]] 
+; CHECK: call void @foo2(), !dbg ![[#PROBE6:]]
 ; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 2, i32 0, i64 -1)
   %indvars.iv = phi i64 [ 0, %bb7.preheader ], [ %indvars.iv.next, %bb10 ]
   %tmp1.14 = phi i32 [ %tmp1.06, %bb7.preheader ], [ %spec.select, %bb10 ]
@@ -50,14 +50,14 @@ bb10:
   br i1 %exitcond.not, label %bb3.loopexit, label %bb10, !llvm.loop !13
 
 bb24:
-; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 5, i32 0, i64 -1)
+; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 6, i32 0, i64 -1)
   ret void
 }
 
 ;; A discriminator of 186646583 which is 0xb200037 in hexdecimal, stands for a direct call probe
 ;; with an index of 6 and a scale of -1%.
 ; CHECK: ![[#PROBE6]] = !DILocation(line: 2, column: 20, scope: ![[#SCOPE:]])
-; CHECK: ![[#SCOPE]] = !DILexicalBlockFile(scope: ![[#]], file: ![[#]], discriminator: 186646583)
+; CHECK: ![[#SCOPE]] = !DILexicalBlockFile(scope: ![[#]], file: ![[#]], discriminator: 186646575)
 
 !llvm.dbg.cu = !{!0}
 !llvm.module.flags = !{!9, !10}

WenleiHe · 2024-01-03T03:59:43Z

We will need some protection against accidentally consuming old profile with new toolchain and vice versa. The cost of investigating mysterious perf regression can be high and we'd rather simply error out in those cases. Maybe consider some flag for SecProfSummaryFlags (also wondering if this needs to be per-function flag since technically some function/module can be compiled with old order while others with new order).

Can we gauge perf benefit of mixed order encoding? i.e. profile an old source rev (several weeks ago?) with both probe order, and build with new source rev using that old profile, then see how old vs new order compare in terms of perf and efficacy with incremental PGO.

wlei-llvm · 2024-01-04T17:55:00Z

We will need some protection against accidentally consuming old profile with new toolchain and vice versa. The cost of investigating mysterious perf regression can be high and we'd rather simply error out in those cases. Maybe consider some flag for SecProfSummaryFlags

Agreed with this concern. To do this, we probably also need a flag in the binary, because otherwise if we use the new toolchain for prof-gen but the binary built on the old toolchain, we then would generate a profile with this flag on but the order is the old one. My understanding is the (probe) version of profile should line up with the version of the binary. Maybe make this more general, we can introduce a "pseudo_probe_version" thing, similar to instrPGO https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ProfileData/InstrProf.h#L996 . It can be useful if we want to update other pseudo probe formats in future.

(also wondering if this needs to be per-function flag since technically some function/module can be compiled with old order while others with new order).

This seems very complicated, supposing you meant it's mixed compiled case(some objects files are built on old toolchain and some are on new toolchain), then in profile generation time, it's hard to know which order the function uses. That requires some function level version thing.. maybe add version to pseudo_probe_desc which is function-level and encode the version.

Can we gauge perf benefit of mixed order encoding? i.e. profile an old source rev (several weeks ago?) with both probe order, and build with new source rev using that old profile, then see how old vs new order compare in terms of perf and efficacy with incremental PGO.

Yes, I saw some perf wins using the new order, it's not as easily measurable as the call anchor, as you said, I had to use a old rev to run profile collection using the new order. I will show you the results offline.

WenleiHe · 2024-01-05T00:09:01Z

Agreed with this concern. To do this, we probably also need a flag in the binary, because otherwise if we use the new toolchain for prof-gen but the binary built on the old toolchain, we then would generate a profile with this flag on but the order is the old one. My understanding is the (probe) version of profile should line up with the version of the binary. Maybe make this more general, we can introduce a "pseudo_probe_version" thing, similar to instrPGO https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ProfileData/InstrProf.h#L996 . It can be useful if we want to update other pseudo probe formats in future.

Versioning is one way to do this. Alternatively, we should be able to detect the case during llvm-profgen time when all call site probe and block probe ids are separated, which can then be used to set flags?

wlei-llvm · 2024-01-10T19:34:22Z

Agreed with this concern. To do this, we probably also need a flag in the binary, because otherwise if we use the new toolchain for prof-gen but the binary built on the old toolchain, we then would generate a profile with this flag on but the order is the old one. My understanding is the (probe) version of profile should line up with the version of the binary. Maybe make this more general, we can introduce a "pseudo_probe_version" thing, similar to instrPGO https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ProfileData/InstrProf.h#L996 . It can be useful if we want to update other pseudo probe formats in future.

Versioning is one way to do this. Alternatively, we should be able to detect the case during llvm-profgen time when all call site probe and block probe ids are separated, which can then be used to set flags?

Sounds good.

Add a new function(checkProbeIDIsInMixedOrder) to check the old order in profile generation time.
Always write a SecProfSummaryFlags flag, if the function return true.

WenleiHe · 2024-01-24T22:34:02Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

+      const char *Msg =
+          "Pseudo-probe-based profile is on an old version ID order which "
+          "could cause profile mismatch(performance regression)";
+      Ctx.diagnose(DiagnosticInfoSampleProfile(M.getModuleIdentifier(), Msg,


Make this an error so issues like this won't go unnoticed?

also:
cause profile mismatch(performance regression
->
cause performance regression due to profile mismatch

WenleiHe · 2024-01-25T00:09:39Z

llvm/tools/llvm-profgen/ProfileGenerator.cpp

+  return PossibleNewOrderNum >= PossibleOldOrderNum;
+}
+
+void warnOrWriteProbeOrderFlag(const SampleProfileMap &Profiles) {


It seems we both warn and set the flag. The Or is not accurate.

how about moving the setting of ProfileIsMixedProbeOrder into checkProbeIDIsInMixedOrder, and then makes thing warning only.

WenleiHe · 2024-01-25T00:14:05Z

llvm/tools/llvm-profgen/ProfileGenerator.cpp

+                                          PossibleBBProbeIDs.upper_bound(ID));
+    uint32_t LengthSum = PossibleBBProbeIDs.size() * CallsiteIDs.size();
+
+    // Note that PossibleBBProbeIDs could contains some callsite probe id, the


If there is an LBR range (target to source of next LBR) covering a location/probe, and that location has a call, we should be able to see that call in LBR too (because that next LBR will be a call) hence having a call target profiled?

I'm curious what lead to empty call targets for a call location covered by LBR..

This heuristic seems more complicated and less reliable than I thought..

I'm curious what lead to empty call targets for a call location covered by LBR..

yeah, I need to investigate where the empty call targets comes from.(I'm sure I saw a lot of empty callsite names, the count are all zero)

WenleiHe · 2024-01-25T00:30:33Z

llvm/tools/llvm-profgen/ProfileGenerator.cpp

+    if ((float)DistanceToBeginSum / LengthSum > 0.8)
+      PossibleOldOrderNum++;
+    else
+      PossibleNewOrderNum++;
+  }
+  return PossibleNewOrderNum >= PossibleOldOrderNum;


This seems quite fuzzy.. If we can't decide whether a probe is originally a call probe or a block probe, that makes the checking less reliable.. I wasn't anticipating that, but if that's case we can't even confidently issue error.

In reality, do we see cases where there are many PossibleNewOrderNum and PossibleOldOrderNum for one profile?

Sorry, it was also far complicated than I expected, try to explain more about this, we can discuss offline.

The fact is that, in theory there can be cases that all the calls are after BB probes for the new order ID(like all hot profiled calls are in the last BB), so even we can decide the probe is call or not, we still can't just use one function or one probe as a line to decide the order, that's why here I use this complex statistical way.(It appears it works well)

If we can't decide whether a probe is originally a call probe or a block probe, that makes the checking less reliable

So far sadly no from profile. We can query the type of probe from the original profiled binary, like parsing the binary asm, but that seems a lot of infra work.(Also as mentioned above, even the case all calls are after all BB also can't decide the order)

In reality, do we see cases where there are many PossibleNewOrderNum and PossibleOldOrderNum for one profile?

Yep, both are actually a lot in a large production-used profile, that's why I use a 0.8 not a 0.9 things, I spent some time tuning this threshold, right now it works very well on my large profile test.

but if that's case we can't even confidently issue error.

To some degree, this heuristic is reliable(as long as there are enough big functions). From
my statistics, using 0.8 as threshold showed it works really good to detect the order, the two num are distinguishable. For both side, the PossibleOrderNum is much larger than the other num, the ratio(PossibleNewOrderNum/PossibleOldOrderNum) is almost ~0.9 for new order, a large room to swing, and vice versa. (like PossibleNewOrderNum is 9X more than PossibleOldOrderNum for a new order. Or PossibleOldOrderNum is 9X more than PossibleNewOrderNum for an old order).
But TBH, I'm sure how it works for a small profile, what i tested is all larger than 100M profile, I know that for a small function, it's quite random, it just has little calls, I have to filter out those small function as showed in the code.

Ok, if this is fuzzy heuristic that is mostly right but not guaranteed, setting a flag like SecFlagIsMixedProbeOrder may also be arguable. Is is more like Maybe.

We can query the type of probe from the original profiled binary, like parsing the binary asm, but that seems a lot of infra work

If we only traverse the probe sections, would that be enough to verify?

Another idea is like you said, just move the checks to compile time. But maybe we don't need to detect order specifically, but just detect huge staleness and issue error. Say if a profile is >=90% (or maybe 100%?) stale at compile time for a large file or a file with hot functions, issue error. That would be not specific to this order issue, but also generally useful?

If we only traverse the probe sections, would that be enough to verify?

yes, that should be enough, we can check some large functions if there are calls before BBs to decide old order.

Another idea is like you said, just move the checks to compile time. But maybe we don't need to detect order specifically, but just detect huge staleness and issue error. Say if a profile is >=90% (or maybe 100%?) stale at compile time for a large file or a file with hot functions, issue error. That would be not specific to this order issue, but also generally useful?

Yes, that should also work, it should be useful for catching other mismatch cases, like compiler upgrade.

wlei-llvm

Thanks for the feedback, it's indeed complicated (even than I expected..)

wlei-llvm · 2024-01-25T02:58:59Z

llvm/tools/llvm-profgen/ProfileGenerator.cpp

+    if ((float)DistanceToBeginSum / LengthSum > 0.8)
+      PossibleOldOrderNum++;
+    else
+      PossibleNewOrderNum++;
+  }
+  return PossibleNewOrderNum >= PossibleOldOrderNum;


Sorry, it was also far complicated than I expected, try to explain more about this, we can discuss offline.

The fact is that, in theory there can be cases that all the calls are after BB probes for the new order ID(like all hot profiled calls are in the last BB), so even we can decide the probe is call or not, we still can't just use one function or one probe as a line to decide the order, that's why here I use this complex statistical way.(It appears it works well)

If we can't decide whether a probe is originally a call probe or a block probe, that makes the checking less reliable

So far sadly no from profile. We can query the type of probe from the original profiled binary, like parsing the binary asm, but that seems a lot of infra work.(Also as mentioned above, even the case all calls are after all BB also can't decide the order)

In reality, do we see cases where there are many PossibleNewOrderNum and PossibleOldOrderNum for one profile?

Yep, both are actually a lot in a large production-used profile, that's why I use a 0.8 not a 0.9 things, I spent some time tuning this threshold, right now it works very well on my large profile test.

but if that's case we can't even confidently issue error.

To some degree, this heuristic is reliable(as long as there are enough big functions). From
my statistics, using 0.8 as threshold showed it works really good to detect the order, the two num are distinguishable. For both side, the PossibleOrderNum is much larger than the other num, the ratio(PossibleNewOrderNum/PossibleOldOrderNum) is almost ~0.9 for new order, a large room to swing, and vice versa. (like PossibleNewOrderNum is 9X more than PossibleOldOrderNum for a new order. Or PossibleOldOrderNum is 9X more than PossibleNewOrderNum for an old order).
But TBH, I'm sure how it works for a small profile, what i tested is all larger than 100M profile, I know that for a small function, it's quite random, it just has little calls, I have to filter out those small function as showed in the code.

wlei-llvm · 2024-01-25T02:59:19Z

llvm/tools/llvm-profgen/ProfileGenerator.cpp

+                                          PossibleBBProbeIDs.upper_bound(ID));
+    uint32_t LengthSum = PossibleBBProbeIDs.size() * CallsiteIDs.size();
+
+    // Note that PossibleBBProbeIDs could contains some callsite probe id, the


I'm curious what lead to empty call targets for a call location covered by LBR..

yeah, I need to investigate where the empty call targets comes from.(I'm sure I saw a lot of empty callsite names, the count are all zero)

wlei-llvm · 2024-02-29T05:28:25Z

As discussed offline, reverted the complicated order detection change.

WenleiHe

LGTM. I'd also have a change that errors out on huge staleness go in together with this one..

wlei-llvm · 2024-03-01T03:26:52Z

LGTM. I'd also have a change that errors out on huge staleness go in together with this one..

Was thinking to use a separate diff for that, feels non-trivial to do it, but I'm also good for doing in this.

Just pushed one version for this, it use heuristics to check this, note that the current threshold flags is just based my "gut feeling", so I'm going to tune those flags on our internal services, hope that we can achieve the goal:

Use the current prod profile, so all build should pass without any false positives break.
Use the incompatible profile, so all build should error out.

Any early feedback is appreciated.

WenleiHe · 2024-03-01T03:29:45Z

LGTM. I'd also have a change that errors out on huge staleness go in together with this one..

Was thinking to use a separate diff for that, feels non-trivial to do it, but I'm also good for doing in this.

Just pushed one version for this, it use heuristics to check this, note that the current threshold flags is just based my "gut feeling", so I'm going to tune those flags on our internal services, hope that we can achieve the goal:

Use the current prod profile, so all build should pass without any false positives break.

Use the incompatible profile, so all build should error out.

Any early feedback is appreciated.

Sorry I wasn't clear. The error out change should be a separate patch as logically it's independent of the original mix order change. By go in together, I was mostly thinking about two patches that go in together..

wlei-llvm · 2024-03-01T03:32:27Z

LGTM. I'd also have a change that errors out on huge staleness go in together with this one..

Was thinking to use a separate diff for that, feels non-trivial to do it, but I'm also good for doing in this.
Just pushed one version for this, it use heuristics to check this, note that the current threshold flags is just based my "gut feeling", so I'm going to tune those flags on our internal services, hope that we can achieve the goal:

Use the current prod profile, so all build should pass without any false positives break.

Use the incompatible profile, so all build should error out.

Any early feedback is appreciated.

Sorry I wasn't clear. The error out change should be a separate patch as logically it's independent of the original mix order change. By go in together, I was mostly thinking about two patches that go in together..

Oh, that's my initial thought too, Ok, I will create a new PR, and hold this until we can go in together.

github-actions · 2024-03-01T03:39:18Z

✅ With the latest revision this PR passed the C/C++ code formatter.

This reverts commit 28b0374.

wlei-llvm requested review from htyu and WenleiHe December 11, 2023 20:04

llvmbot added clang Clang issues not falling into any other category llvm:transforms labels Dec 11, 2023

wlei-llvm force-pushed the mixed-pseudo-probe branch from 90c3c82 to fe05f64 Compare December 11, 2023 20:37

wlei-llvm force-pushed the mixed-pseudo-probe branch from fe05f64 to 6ff8847 Compare January 10, 2024 19:30

llvmbot added the PGO Profile Guided Optimizations label Jan 10, 2024

wlei-llvm force-pushed the mixed-pseudo-probe branch from 6ff8847 to 28b0374 Compare January 23, 2024 03:19

WenleiHe reviewed Jan 25, 2024

View reviewed changes

wlei-llvm commented Jan 25, 2024

View reviewed changes

wlei-llvm requested a review from MatzeB February 23, 2024 01:42

WenleiHe approved these changes Feb 29, 2024

View reviewed changes

wlei-llvm force-pushed the mixed-pseudo-probe branch from 2ad6a39 to f25b13f Compare March 1, 2024 03:19

wlei-llvm mentioned this pull request Mar 21, 2024

[CSSPGO] Reject high checksum mismatched profile #84097

Merged

wlei-llvm added 3 commits April 2, 2024 15:22

[PseudoProbe] Mix and reorder block and call probe ID in lexical order

811eb3d

Detect the order and add a summary flag to the profile

3589875

Revert "Detect the order and add a summary flag to the profile"

28964e7

This reverts commit 28b0374.

wlei-llvm force-pushed the mixed-pseudo-probe branch from 6641409 to 28964e7 Compare April 2, 2024 22:23

wlei-llvm changed the title ~~[PseudoProbe] Mix and reorder block and call probe ID in lexical order~~ [PseudoProbe] Mix block and call probe ID in lexical order Apr 2, 2024

rebase fix lint

7e62f5b

wlei-llvm merged commit 5bbce06 into llvm:main Apr 3, 2024
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[PseudoProbe] Mix block and call probe ID in lexical order #75092

[PseudoProbe] Mix block and call probe ID in lexical order #75092

wlei-llvm commented Dec 11, 2023

llvmbot commented Dec 11, 2023 •

edited

WenleiHe commented Jan 3, 2024

wlei-llvm commented Jan 4, 2024

WenleiHe commented Jan 5, 2024

wlei-llvm commented Jan 10, 2024

WenleiHe Jan 24, 2024

WenleiHe Jan 25, 2024

WenleiHe Jan 25, 2024

wlei-llvm Jan 25, 2024

WenleiHe Jan 25, 2024

wlei-llvm Jan 25, 2024

WenleiHe Jan 25, 2024

wlei-llvm Jan 25, 2024

wlei-llvm left a comment

wlei-llvm Jan 25, 2024

wlei-llvm Jan 25, 2024

wlei-llvm commented Feb 29, 2024

WenleiHe left a comment

wlei-llvm commented Mar 1, 2024

WenleiHe commented Mar 1, 2024

wlei-llvm commented Mar 1, 2024

github-actions bot commented Mar 1, 2024 •

edited

[PseudoProbe] Mix block and call probe ID in lexical order #75092

[PseudoProbe] Mix block and call probe ID in lexical order #75092

Conversation

wlei-llvm commented Dec 11, 2023

llvmbot commented Dec 11, 2023 • edited

WenleiHe commented Jan 3, 2024

wlei-llvm commented Jan 4, 2024

WenleiHe commented Jan 5, 2024

wlei-llvm commented Jan 10, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wlei-llvm left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wlei-llvm commented Feb 29, 2024

WenleiHe left a comment

Choose a reason for hiding this comment

wlei-llvm commented Mar 1, 2024

WenleiHe commented Mar 1, 2024

wlei-llvm commented Mar 1, 2024

github-actions bot commented Mar 1, 2024 • edited

llvmbot commented Dec 11, 2023 •

edited

github-actions bot commented Mar 1, 2024 •

edited