[CSSPGO] Compute and report profile matching recovered callsites and samples #79090

wlei-llvm · 2024-01-23T03:55:36Z

This change adds the support to compute and report the staleness metrics after stale profile matching so that we can know how effective the fuzzy matching is, i. e. how many callsites and samples are recovered by the matching.
Some implementation notes:

The function checksum mismatch metrics are not applicable here as it's function-level metrics, checksum mismatch remains the same before and after matching, so we need to compute based on the callsite samples.
We add two new counters NumRecoveredCallsites, RecoveredCallsiteSamples for this and removed TotalCallsiteSamples as now the we can use the TotalFuncHashSamples as base, and renamed some counters.
In profile matching, we changed to use a state machine to represent the callsite's matching state changes. See the MatchState for the state, and used a new function recordCallsiteMatchStates to compute and record the callsite's match states changes before and after the matching, , the result is compressed and saved into a FuncCallsiteMatchStates map for later counting use.
Changed the counting function to run on module-level and moved it to the end of the whole process(computeAndReportProfileStaleness). The reason is before the callsite is only counted on top-level function, this change extends it to count(recursively) on the inlined functions and samples, which is more accurate.

llvmbot · 2024-01-23T03:56:05Z

@llvm/pr-subscribers-llvm-transforms

Author: Lei Wang (wlei-llvm)

Changes

Compute and report the staleness metrics after stale profile matching so that we can know how effective the matching is, i. e. how many callsites and samples are recovered by the matching.

Note that the function hash samples are not applicable here as it's function-level metrics, checksum mismatch remains the same before and after matching, so we need to compute based on the callsite samples.
We add two new counters PostMatchNumMismatchedCallsites, PostMatchMismatchedCallsiteSamples for this. Also before the callsite is only counted on top-level function, this change extends it to count(recursively) on the inlined functions and samples, then we removed TotalCallsiteSamples as now the we can use the TotalFuncHashSamples as base.
To make it simple, the post-match reuse the countProfileMismatches with a parameter and we leverage the matching results in IRToProfileLocationMap to update the IR anchors(see UpdateIRAnchors) so that we can reuse most of the counting functions.

Patch is 30.63 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/79090.diff

5 Files Affected:

(modified) llvm/lib/Transforms/IPO/SampleProfile.cpp (+270-170)
(modified) llvm/test/Transforms/SampleProfile/Inputs/profile-mismatch.prof (+4-3)
(modified) llvm/test/Transforms/SampleProfile/profile-mismatch.ll (+6-6)
(modified) llvm/test/Transforms/SampleProfile/pseudo-probe-profile-mismatch-thinlto.ll (+3-3)
(modified) llvm/test/Transforms/SampleProfile/pseudo-probe-profile-mismatch.ll (+41-35)

diff --git a/llvm/lib/Transforms/IPO/SampleProfile.cpp b/llvm/lib/Transforms/IPO/SampleProfile.cpp
index 2fd8668d15e200f..a7170faa65dc07c 100644
--- a/llvm/lib/Transforms/IPO/SampleProfile.cpp
+++ b/llvm/lib/Transforms/IPO/SampleProfile.cpp
@@ -433,12 +433,19 @@ using CandidateQueue =
     PriorityQueue<InlineCandidate, std::vector<InlineCandidate>,
                   CandidateComparer>;
 
+using IRAnchorMap = std::map<LineLocation, StringRef>;
+using ProfileAnchorMap = std::map<LineLocation, std::unordered_set<FunctionId>>;
+
 // Sample profile matching - fuzzy match.
 class SampleProfileMatcher {
   Module &M;
   SampleProfileReader &Reader;
   const PseudoProbeManager *ProbeManager;
   SampleProfileMap FlattenedProfiles;
+
+  std::unordered_map<const Function *, IRAnchorMap> FuncIRAnchors;
+  std::unordered_map<const Function *, ProfileAnchorMap> FuncProfileAnchors;
+
   // For each function, the matcher generates a map, of which each entry is a
   // mapping from the source location of current build to the source location in
   // the profile.
@@ -448,6 +455,8 @@ class SampleProfileMatcher {
   uint64_t TotalProfiledCallsites = 0;
   uint64_t NumMismatchedCallsites = 0;
   uint64_t MismatchedCallsiteSamples = 0;
+  uint64_t PostMatchNumMismatchedCallsites = 0;
+  uint64_t PostMatchMismatchedCallsiteSamples = 0;
   uint64_t TotalCallsiteSamples = 0;
   uint64_t TotalProfiledFunc = 0;
   uint64_t NumMismatchedFuncHash = 0;
@@ -474,24 +483,22 @@ class SampleProfileMatcher {
     return nullptr;
   }
   void runOnFunction(const Function &F);
-  void findIRAnchors(const Function &F,
-                     std::map<LineLocation, StringRef> &IRAnchors);
-  void findProfileAnchors(
+  void findFuncAnchors();
+  void UpdateIRAnchors();
+  void findIRAnchors(const Function &F, IRAnchorMap &IRAnchors);
+  void findProfileAnchors(const FunctionSamples &FS,
+                          ProfileAnchorMap &ProfileAnchors);
+  void countMismatchedHashSamples(const FunctionSamples &FS);
+  void countProfileMismatches(bool IsPreMatch);
+  void countMismatchedHashes(const Function &F, const FunctionSamples &FS);
+  void countMismatchedCallsites(
+      const Function &F,
+      StringMap<std::set<LineLocation>> &FuncToMismatchCallsites,
+      uint64_t &FuncProfiledCallsites, uint64_t &FuncMismatchedCallsites) const;
+  void countMismatchedCallsiteSamples(
       const FunctionSamples &FS,
-      std::map<LineLocation, std::unordered_set<FunctionId>>
-          &ProfileAnchors);
-  void countMismatchedSamples(const FunctionSamples &FS);
-  void countProfileMismatches(
-      const Function &F, const FunctionSamples &FS,
-      const std::map<LineLocation, StringRef> &IRAnchors,
-      const std::map<LineLocation, std::unordered_set<FunctionId>>
-          &ProfileAnchors);
-  void countProfileCallsiteMismatches(
-      const FunctionSamples &FS,
-      const std::map<LineLocation, StringRef> &IRAnchors,
-      const std::map<LineLocation, std::unordered_set<FunctionId>>
-          &ProfileAnchors,
-      uint64_t &FuncMismatchedCallsites, uint64_t &FuncProfiledCallsites);
+      StringMap<std::set<LineLocation>> &FuncToMismatchCallsites,
+      uint64_t &FuncMismatchedCallsiteSamples) const;
   LocToLocMap &getIRToProfileLocationMap(const Function &F) {
     auto Ret = FuncMappings.try_emplace(
         FunctionSamples::getCanonicalFnName(F.getName()), LocToLocMap());
@@ -499,11 +506,10 @@ class SampleProfileMatcher {
   }
   void distributeIRToProfileLocationMap();
   void distributeIRToProfileLocationMap(FunctionSamples &FS);
-  void runStaleProfileMatching(
-      const Function &F, const std::map<LineLocation, StringRef> &IRAnchors,
-      const std::map<LineLocation, std::unordered_set<FunctionId>>
-          &ProfileAnchors,
-      LocToLocMap &IRToProfileLocationMap);
+  void runStaleProfileMatching();
+  void runStaleProfileMatching(const Function &F, const IRAnchorMap &IRAnchors,
+                               const ProfileAnchorMap &ProfileAnchors,
+                               LocToLocMap &IRToProfileLocationMap);
 };
 
 /// Sample profile pass.
@@ -1129,7 +1135,7 @@ void SampleProfileLoader::findExternalInlineCandidate(
         CalleeSample->getContext().hasAttribute(ContextShouldBeInlined);
     if (!PreInline && CalleeSample->getHeadSamplesEstimate() < Threshold)
       continue;
-    
+
     Function *Func = SymbolMap.lookup(CalleeSample->getFunction());
     // Add to the import list only when it's defined out of module.
     if (!Func || Func->isDeclaration())
@@ -2123,8 +2129,8 @@ bool SampleProfileLoader::doInitialization(Module &M,
   return true;
 }
 
-void SampleProfileMatcher::findIRAnchors(
-    const Function &F, std::map<LineLocation, StringRef> &IRAnchors) {
+void SampleProfileMatcher::findIRAnchors(const Function &F,
+                                         IRAnchorMap &IRAnchors) {
   // For inlined code, recover the original callsite and callee by finding the
   // top-level inline frame. e.g. For frame stack "main:1 @ foo:2 @ bar:3", the
   // top-level frame is "main:1", the callsite is "1" and the callee is "foo".
@@ -2190,7 +2196,8 @@ void SampleProfileMatcher::findIRAnchors(
   }
 }
 
-void SampleProfileMatcher::countMismatchedSamples(const FunctionSamples &FS) {
+void SampleProfileMatcher::countMismatchedHashSamples(
+    const FunctionSamples &FS) {
   const auto *FuncDesc = ProbeManager->getDesc(FS.getGUID());
   // Skip the function that is external or renamed.
   if (!FuncDesc)
@@ -2202,96 +2209,11 @@ void SampleProfileMatcher::countMismatchedSamples(const FunctionSamples &FS) {
   }
   for (const auto &I : FS.getCallsiteSamples())
     for (const auto &CS : I.second)
-      countMismatchedSamples(CS.second);
-}
-
-void SampleProfileMatcher::countProfileMismatches(
-    const Function &F, const FunctionSamples &FS,
-    const std::map<LineLocation, StringRef> &IRAnchors,
-    const std::map<LineLocation, std::unordered_set<FunctionId>>
-        &ProfileAnchors) {
-  [[maybe_unused]] bool IsFuncHashMismatch = false;
-  if (FunctionSamples::ProfileIsProbeBased) {
-    TotalFuncHashSamples += FS.getTotalSamples();
-    TotalProfiledFunc++;
-    const auto *FuncDesc = ProbeManager->getDesc(F);
-    if (FuncDesc) {
-      if (ProbeManager->profileIsHashMismatched(*FuncDesc, FS)) {
-        NumMismatchedFuncHash++;
-        IsFuncHashMismatch = true;
-      }
-      countMismatchedSamples(FS);
-    }
-  }
-
-  uint64_t FuncMismatchedCallsites = 0;
-  uint64_t FuncProfiledCallsites = 0;
-  countProfileCallsiteMismatches(FS, IRAnchors, ProfileAnchors,
-                                 FuncMismatchedCallsites,
-                                 FuncProfiledCallsites);
-  TotalProfiledCallsites += FuncProfiledCallsites;
-  NumMismatchedCallsites += FuncMismatchedCallsites;
-  LLVM_DEBUG({
-    if (FunctionSamples::ProfileIsProbeBased && !IsFuncHashMismatch &&
-        FuncMismatchedCallsites)
-      dbgs() << "Function checksum is matched but there are "
-             << FuncMismatchedCallsites << "/" << FuncProfiledCallsites
-             << " mismatched callsites.\n";
-  });
+      countMismatchedHashSamples(CS.second);
 }
 
-void SampleProfileMatcher::countProfileCallsiteMismatches(
-    const FunctionSamples &FS,
-    const std::map<LineLocation, StringRef> &IRAnchors,
-    const std::map<LineLocation, std::unordered_set<FunctionId>>
-        &ProfileAnchors,
-    uint64_t &FuncMismatchedCallsites, uint64_t &FuncProfiledCallsites) {
-
-  // Check if there are any callsites in the profile that does not match to any
-  // IR callsites, those callsite samples will be discarded.
-  for (const auto &I : ProfileAnchors) {
-    const auto &Loc = I.first;
-    const auto &Callees = I.second;
-    assert(!Callees.empty() && "Callees should not be empty");
-
-    StringRef IRCalleeName;
-    const auto &IR = IRAnchors.find(Loc);
-    if (IR != IRAnchors.end())
-      IRCalleeName = IR->second;
-
-    // Compute number of samples in the original profile.
-    uint64_t CallsiteSamples = 0;
-    if (auto CTM = FS.findCallTargetMapAt(Loc)) {
-      for (const auto &I : *CTM)
-        CallsiteSamples += I.second;
-    }
-    const auto *FSMap = FS.findFunctionSamplesMapAt(Loc);
-    if (FSMap) {
-      for (const auto &I : *FSMap)
-        CallsiteSamples += I.second.getTotalSamples();
-    }
-
-    bool CallsiteIsMatched = false;
-    // Since indirect call does not have CalleeName, check conservatively if
-    // callsite in the profile is a callsite location. This is to reduce num of
-    // false positive since otherwise all the indirect call samples will be
-    // reported as mismatching.
-    if (IRCalleeName == UnknownIndirectCallee)
-      CallsiteIsMatched = true;
-    else if (Callees.size() == 1 && Callees.count(getRepInFormat(IRCalleeName)))
-      CallsiteIsMatched = true;
-
-    FuncProfiledCallsites++;
-    TotalCallsiteSamples += CallsiteSamples;
-    if (!CallsiteIsMatched) {
-      FuncMismatchedCallsites++;
-      MismatchedCallsiteSamples += CallsiteSamples;
-    }
-  }
-}
-
-void SampleProfileMatcher::findProfileAnchors(const FunctionSamples &FS,
-                                              std::map<LineLocation, std::unordered_set<FunctionId>> &ProfileAnchors) {
+void SampleProfileMatcher::findProfileAnchors(
+    const FunctionSamples &FS, ProfileAnchorMap &ProfileAnchors) {
   auto isInvalidLineOffset = [](uint32_t LineOffset) {
     return LineOffset & 0x8000;
   };
@@ -2338,10 +2260,8 @@ void SampleProfileMatcher::findProfileAnchors(const FunctionSamples &FS,
 //   [1, 2, 3(foo), 4,  7,  8(bar), 9]
 // The output mapping: [2->3, 3->4, 5->7, 6->8, 7->9].
 void SampleProfileMatcher::runStaleProfileMatching(
-    const Function &F,
-    const std::map<LineLocation, StringRef> &IRAnchors,
-    const std::map<LineLocation, std::unordered_set<FunctionId>>
-        &ProfileAnchors,
+    const Function &F, const IRAnchorMap &IRAnchors,
+    const ProfileAnchorMap &ProfileAnchors,
     LocToLocMap &IRToProfileLocationMap) {
   LLVM_DEBUG(dbgs() << "Run stale profile matching for " << F.getName()
                     << "\n");
@@ -2422,59 +2342,226 @@ void SampleProfileMatcher::runStaleProfileMatching(
   }
 }
 
-void SampleProfileMatcher::runOnFunction(const Function &F) {
-  // We need to use flattened function samples for matching.
-  // Unlike IR, which includes all callsites from the source code, the callsites
-  // in profile only show up when they are hit by samples, i,e. the profile
-  // callsites in one context may differ from those in another context. To get
-  // the maximum number of callsites, we merge the function profiles from all
-  // contexts, aka, the flattened profile to find profile anchors.
-  const auto *FSFlattened = getFlattenedSamplesFor(F);
-  if (!FSFlattened)
-    return;
+void SampleProfileMatcher::runStaleProfileMatching() {
+  for (const auto &F : M) {
+    if (F.isDeclaration() || !F.hasFnAttribute("use-sample-profile"))
+      continue;
+    const auto *FSFlattened = getFlattenedSamplesFor(F);
+    if (!FSFlattened)
+      continue;
+    auto IR = FuncIRAnchors.find(&F);
+    auto P = FuncProfileAnchors.find(&F);
+    if (IR == FuncIRAnchors.end() || P == FuncProfileAnchors.end())
+      continue;
 
-  // Anchors for IR. It's a map from IR location to callee name, callee name is
-  // empty for non-call instruction and use a dummy name(UnknownIndirectCallee)
-  // for unknown indrect callee name.
-  std::map<LineLocation, StringRef> IRAnchors;
-  findIRAnchors(F, IRAnchors);
-  // Anchors for profile. It's a map from callsite location to a set of callee
-  // name.
-  std::map<LineLocation, std::unordered_set<FunctionId>> ProfileAnchors;
-  findProfileAnchors(*FSFlattened, ProfileAnchors);
-
-  // Detect profile mismatch for profile staleness metrics report.
-  // Skip reporting the metrics for imported functions.
-  if (!GlobalValue::isAvailableExternallyLinkage(F.getLinkage()) &&
-      (ReportProfileStaleness || PersistProfileStaleness)) {
-    // Use top-level nested FS for counting profile mismatch metrics since
-    // currently once a callsite is mismatched, all its children profiles are
-    // dropped.
-    if (const auto *FS = Reader.getSamplesFor(F))
-      countProfileMismatches(F, *FS, IRAnchors, ProfileAnchors);
+    // Run profile matching for checksum mismatched profile, currently only
+    // support for pseudo-probe.
+    if (FunctionSamples::ProfileIsProbeBased &&
+        !ProbeManager->profileIsValid(F, *FSFlattened)) {
+      runStaleProfileMatching(F, IR->second, P->second,
+                              getIRToProfileLocationMap(F));
+    }
   }
 
-  // Run profile matching for checksum mismatched profile, currently only
-  // support for pseudo-probe.
-  if (SalvageStaleProfile && FunctionSamples::ProfileIsProbeBased &&
-      !ProbeManager->profileIsValid(F, *FSFlattened)) {
-    // The matching result will be saved to IRToProfileLocationMap, create a new
-    // map for each function.
-    runStaleProfileMatching(F, IRAnchors, ProfileAnchors,
-                            getIRToProfileLocationMap(F));
-  }
+  distributeIRToProfileLocationMap();
 }
 
-void SampleProfileMatcher::runOnModule() {
+void SampleProfileMatcher::findFuncAnchors() {
   ProfileConverter::flattenProfile(Reader.getProfiles(), FlattenedProfiles,
                                    FunctionSamples::ProfileIsCS);
-  for (auto &F : M) {
+  for (const auto &F : M) {
     if (F.isDeclaration() || !F.hasFnAttribute("use-sample-profile"))
       continue;
-    runOnFunction(F);
+    // We need to use flattened function samples for matching.
+    // Unlike IR, which includes all callsites from the source code, the
+    // callsites in profile only show up when they are hit by samples, i,e. the
+    // profile callsites in one context may differ from those in another
+    // context. To get the maximum number of callsites, we merge the function
+    // profiles from all contexts, aka, the flattened profile to find profile
+    // anchors.
+    const auto *FSFlattened = getFlattenedSamplesFor(F);
+    if (!FSFlattened)
+      continue;
+
+    // Anchors for IR. It's a map from IR location to callee name, callee name
+    // is empty for non-call instruction and use a dummy
+    // name(UnknownIndirectCallee) for unknown indrect callee name.
+    auto IR = FuncIRAnchors.emplace(&F, IRAnchorMap());
+    findIRAnchors(F, IR.first->second);
+
+    // Anchors for profile. It's a map from callsite location to a set of callee
+    // name.
+    auto P = FuncProfileAnchors.emplace(&F, ProfileAnchorMap());
+    findProfileAnchors(*FSFlattened, P.first->second);
+  }
+}
+
+void SampleProfileMatcher::countMismatchedCallsiteSamples(
+    const FunctionSamples &FS,
+    StringMap<std::set<LineLocation>> &FuncToMismatchCallsites,
+    uint64_t &FuncMismatchedCallsiteSamples) const {
+  auto It = FuncToMismatchCallsites.find(FS.getFuncName());
+  // Skip it if no mismatched callsite or this is an external function.
+  if (It == FuncToMismatchCallsites.end() || It->second.empty())
+    return;
+  const auto &MismatchCallsites = It->second;
+  for (const auto &I : FS.getBodySamples()) {
+    if (MismatchCallsites.count(I.first))
+      FuncMismatchedCallsiteSamples += I.second.getSamples();
+  }
+
+  for (const auto &I : FS.getCallsiteSamples()) {
+    const auto &Loc = I.first;
+    if (MismatchCallsites.count(Loc)) {
+      for (const auto &CS : I.second)
+        FuncMismatchedCallsiteSamples += CS.second.getTotalSamples();
+      continue;
+    }
+
+    // count mismatched samples for inlined samples.
+    for (const auto &CS : I.second)
+      countMismatchedCallsiteSamples(CS.second, FuncToMismatchCallsites,
+                                     FuncMismatchedCallsiteSamples);
+  }
+}
+
+void SampleProfileMatcher::countMismatchedCallsites(
+    const Function &F,
+    StringMap<std::set<LineLocation>> &FuncToMismatchCallsites,
+    uint64_t &FuncProfiledCallsites, uint64_t &FuncMismatchedCallsites) const {
+  auto IR = FuncIRAnchors.find(&F);
+  auto P = FuncProfileAnchors.find(&F);
+  if (IR == FuncIRAnchors.end() || P == FuncProfileAnchors.end())
+    return;
+  const auto &IRAnchors = IR->second;
+  const auto &ProfileAnchors = P->second;
+
+  auto &MismatchCallsites =
+      FuncToMismatchCallsites[FunctionSamples::getCanonicalFnName(F.getName())];
+
+  // Check if there are any callsites in the profile that does not match to any
+  // IR callsites, those callsite samples will be discarded.
+  for (const auto &I : ProfileAnchors) {
+    const auto &Loc = I.first;
+    const auto &Callees = I.second;
+    assert(!Callees.empty() && "Callees should not be empty");
+
+    StringRef IRCalleeName;
+    const auto &IR = IRAnchors.find(Loc);
+    if (IR != IRAnchors.end())
+      IRCalleeName = IR->second;
+    bool CallsiteIsMatched = false;
+    // Since indirect call does not have CalleeName, check conservatively if
+    // callsite in the profile is a callsite location. This is to reduce num of
+    // false positive since otherwise all the indirect call samples will be
+    // reported as mismatching.
+    if (IRCalleeName == UnknownIndirectCallee)
+      CallsiteIsMatched = true;
+    else if (Callees.count(FunctionId(IRCalleeName)))
+      CallsiteIsMatched = true;
+
+    FuncProfiledCallsites++;
+    if (!CallsiteIsMatched) {
+      FuncMismatchedCallsites++;
+      MismatchCallsites.insert(Loc);
+    }
+  }
+}
+
+void SampleProfileMatcher::countMismatchedHashes(const Function &F,
+                                                 const FunctionSamples &FS) {
+  if (!FunctionSamples::ProfileIsProbeBased)
+    return;
+  const auto *FuncDesc = ProbeManager->getDesc(F);
+  if (FuncDesc) {
+    if (ProbeManager->profileIsHashMismatched(*FuncDesc, FS)) {
+      NumMismatchedFuncHash++;
+    }
+    countMismatchedHashSamples(FS);
+  }
+}
+
+void SampleProfileMatcher::UpdateIRAnchors() {
+  for (auto &I : FuncIRAnchors) {
+    const auto *F = I.first;
+    auto &IRAnchors = I.second;
+    const auto Mapping =
+        FuncMappings.find(FunctionSamples::getCanonicalFnName(F->getName()));
+    if (Mapping == FuncMappings.end())
+      continue;
+    IRAnchorMap UpdatedIRAnchors;
+    const auto &LocToLocMapping = Mapping->second;
+    for (const auto L : LocToLocMapping) {
+      UpdatedIRAnchors[L.second] = IRAnchors[L.first];
+      IRAnchors.erase(L.first);
+    }
+
+    for (const auto &IR : UpdatedIRAnchors) {
+      IRAnchors[IR.first] = IR.second;
+    }
+  }
+}
+
+void SampleProfileMatcher::countProfileMismatches(bool IsPreMatch) {
+  if (!ReportProfileStaleness && !PersistProfileStaleness)
+    return;
+
+  if (!IsPreMatch) {
+    // Use the profile matching results to update to the IR anchors.
+    UpdateIRAnchors();
+  }
+
+  uint64_t UnusedCounter = 0;
+  uint64_t *TotalProfiledCallsitesPtr =
+      IsPreMatch ? &TotalProfiledCallsites : &UnusedCounter;
+  uint64_t *NumMismatchedCallsitesPtr =
+      IsPreMatch ? &NumMismatchedCallsites : &PostMatchNumMismatchedCallsites;
+  uint64_t *MismatchedCallsiteSamplesPtr =
+      IsPreMatch ? &MismatchedCallsiteSamples
+                 : &PostMatchMismatchedCallsiteSamples;
+
+  auto SkipFunctionForReport = [](const Function &F) {
+    if (F.isDeclaration() || !F.hasFnAttribute("use-sample-profile"))
+      return true;
+    // Skip reporting the metrics for imported functions.
+    if (GlobalValue::isAvailableExternallyLinkage(F.getLinkage()))
+      return true;
+    return false;
+  };
+
+  StringMap<std::set<LineLocation>> FuncToMismatchCallsites;
+  for (const auto &F : M) {
+    if (SkipFunctionForReport(F))
+      continue;
+    const auto *FS = Reader.getSamplesFor(F);
+    if (FS && IsPreMatch) {
+      // Only count the total function metrics once in pre-match time.
+      TotalFuncHashSamples += FS->getTotalSamples();
+      TotalProfiledFunc++;
+      countMismatchedHashes(F, *FS);
+    }
+    countMismatchedCallsites(F, FuncToMismatchCallsites,
+                             *TotalProfiledCallsitesPtr,
+                             *NumMismatchedCallsitesPtr);
+  }
+
+  for (const auto &F : M) {
+    if (SkipFunctionForReport(F))
+      continue;
+    if (const auto *FS = Reader.getSamplesFor(F))
+      countMismatchedCallsiteSamples(*FS, FuncToMismatchCallsites,
+                                     *MismatchedCallsiteSamplesPtr);
+  }
+}
+
+void SampleProfileMatcher::runOnModule() {
+  ...
[truncated]

WenleiHe · 2024-01-24T06:01:56Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

+using IRAnchorMap = std::map<LineLocation, StringRef>;
+using ProfileAnchorMap = std::map<LineLocation, std::unordered_set<FunctionId>>;


As standalone types, add a comment explaining the key/value of the maps? Maybe just can move the original comments for these types here.

Also move this inside class SampleProfileMatcher since these types are internal to the matcher.

WenleiHe · 2024-01-24T21:04:17Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

+    // Anchors for IR. It's a map from IR location to callee name, callee name
+    // is empty for non-call instruction and use a dummy
+    // name(UnknownIndirectCallee) for unknown indrect callee name.
+    auto IR = FuncIRAnchors.emplace(&F, IRAnchorMap());


nit IR naming can be confusing, this is an iterator, maybe It? Same for other places.

WenleiHe · 2024-01-24T21:19:34Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

+
+  if (SalvageStaleProfile) {
+    runStaleProfileMatching();
+    countProfileMismatches(false);


Remove the boolean param for countProfileMismatches, move updateIRAnchors here instead of within countProfileMismatches to make it slightly more readable and simpler. countProfileMismatches should just do counting as the name suggests.

It's a little tricky, the flag is also used to differentiate the counters to compute, basically we have two sets of counters, like PrematchXXX and PostMatchXXX. So here the flag is used like XXX = IsPreMatch? PrematchXXX: PostMatchXXX
See the code around line 2514:

... uint64_t *NumMismatchedCallsitesPtr = IsPreMatch ? &NumMismatchedCallsites : &PostMatchNumMismatchedCallsites; ... ```

Yes I noticed that. Was thinking these can be passed in as parameters.

Yes I noticed that. Was thinking these can be passed in as parameters.

Maybe we can put them into a class ProfileMatchStats, then pass PreMatchStats and PostMatchStats instances respectively.

Yeah, that's a good idea.

WenleiHe · 2024-01-24T21:42:09Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

@@ -2190,7 +2196,8 @@ void SampleProfileMatcher::findIRAnchors(
  }
 }

-void SampleProfileMatcher::countMismatchedSamples(const FunctionSamples &FS) {
+void SampleProfileMatcher::countMismatchedHashSamples(


Why is the renaming? I can find the name HashSamples confusing... Including the original variable names

(there seems to be quite some renaming going on in the patch that makes reviewing a bit uneasy...)

WenleiHe · 2024-01-24T21:44:21Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

+  }
+}
+
+void SampleProfileMatcher::countMismatchedHashes(const Function &F,


This is also confusing name, we are not counting number of hashes, right?

WenleiHe · 2024-01-24T21:50:35Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

-    return;
+void SampleProfileMatcher::runStaleProfileMatching() {
+  for (const auto &F : M) {
+    if (F.isDeclaration() || !F.hasFnAttribute("use-sample-profile"))


This particular check is duplicated now because we turned runOnFunction into runOnModule. Maybe make a small helper to centralize this check and keep them always in sync for the different function loops.

WenleiHe · 2024-01-24T21:52:01Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

+  }
+}
+
+void SampleProfileMatcher::runOnModule() {


Could you explain why we need turn runOnFunctions into runOnModule? I understand that we now need to update anchor for reporting after matching, but what prevents us from doing that functions by function? Keep all anchors for all functions around also has a memory cost.

(Without that change and the renaming, or doing those separately, it'd be easier to review)

Here it's because we need to count the inlineed/nested callsite samples. (before we only count the top-level callsite samples)
e.g.

main : 1001:1 1:1 2: foo:1000:1 1: 1 bar: 999

Supposing the processing order is main --> foo --> bar
Before it only count for function main, and it will miss the bar's samples, because at the time, function bar is not processed, there is no anchors or any mismatch info.
Now I changed to find all the anchors and doing a mismatch computation, the results is saved into a map, then count the samples recursively(see countMismatchedCallsiteSamples)

that's why each process it needs to "runOnModule", this indeed need a lot of new function name which is confusing.

WenleiHe · 2024-01-24T21:56:33Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

@@ -2487,29 +2574,42 @@ void SampleProfileMatcher::runOnModule() {
    errs() << "(" << NumMismatchedCallsites << "/" << TotalProfiledCallsites
           << ")"
           << " of callsites' profile are invalid and "
-           << "(" << MismatchedCallsiteSamples << "/" << TotalCallsiteSamples
+           << "(" << MismatchedCallsiteSamples << "/" << TotalFuncHashSamples


Are TotalCallsiteSamples and TotalFuncHashSamples all just refer to the total samples of all functions in the profile?

The name TotalFuncHashSamples is confusing -- when we talk about samples, it should not have hash as entity, instead the entities should be function or call sites?

WenleiHe · 2024-01-24T22:09:40Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

+    // reported as mismatching.
+    if (IRCalleeName == UnknownIndirectCallee)
+      CallsiteIsMatched = true;
+    else if (Callees.count(FunctionId(IRCalleeName)))


This is not equivalent to the old code where we check there is only one callee. Is the change intentional?

Also removing getRepInFormat changes the behavior if UseMD5 is true.

It's sort of intentional, I found that there are several false positives related with this if we use the old condition(Callees.size() == 1). From my cases, I observes like profile callsite anchors contains the IR call name but the size is not zero.

profile anchors 1:[foo, bar] IR anchor 1:[foo]

It looks like a bug. Then again I was thinking maybe we don't need ensure the size, as long as it line up with with the sample loader's lookup which just query the name even there are multiple entires, it should be good.

Regarding this weird "bug", I dug some cases, some are from the instable EH IR issue, I'm working on to fix it. We can then see if we need a strict condition for this.

WenleiHe · 2024-01-24T22:23:27Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

+  }
+}
+
+void SampleProfileMatcher::UpdateIRAnchors() {


This actually makes those anchors incorrect, though these anchors are no longer useful anyways. It works but feels slightly hacky..

I would just slight tweak mismatch counting to go through LocToLocMap and avoid doing this. Is the tweak going to look invasive?

WenleiHe · 2024-01-24T22:28:34Z

Compute and report the staleness metrics after stale profile matching

Staleness is characteristic associated with a given profile and IR. There is no post-matching staleness. We can say the success rate of matching or renaming unmatched callsites/samples, but we don't want to use overloaded term post-matching staleness to describe it.

wlei-llvm · 2024-01-24T23:26:20Z

Thanks so much for the feedback. Answering some high-level questions, will address others later.

WenleiHe · 2024-01-27T17:39:50Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

+  uint64_t MismatchedCallsiteSamples = 0;
+  uint64_t NumMismatchedFuncHash = 0;
+  uint64_t TotalProfiledFunc = 0;
+  uint64_t MismatchedFuncHashSamples = 0;


This is really just MismatchedFunctionSamples, in contrast to TotalFunctionSamples.
I know this isn't changed in this patch, but why did we use the term FuncHashSamples?

Add some comment for those stats, particularly what is different between MismatchedCallsiteSamples and MismatchedFuncHashSamples.

Sounds good to remove the "Hash"

WenleiHe · 2024-01-27T17:51:04Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

+  void countMismatchedSamples(const FunctionSamples &FS);
+  void countProfileMismatches(
+      const Function &F, const std::map<LineLocation, StringRef> &IRAnchors,
+      const std::map<LineLocation, std::unordered_set<FunctionId>>
+          &ProfileAnchors);
+  void countMismatchedCallsites(
+      const Function &F, const std::map<LineLocation, StringRef> &IRAnchors,
+      const std::map<LineLocation, std::unordered_set<FunctionId>>
+          &ProfileAnchors,
+      const LocToLocMap &IRToProfileLocationMap);
+  void countMismatchedCallsiteSamples(const FunctionSamples &FS);
+  void countMismatchedCallsiteSamples();
+  void copyUnchangedCallsiteMismatches(
+      const StringMap<std::set<LineLocation>> &InputMismatchedCallsites);
+};


These functions needs some comments as they are somewhat similar but with subtle differences.

WenleiHe · 2024-01-27T17:57:14Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

-void SampleProfileMatcher::countProfileMismatches(
-    const Function &F, const FunctionSamples &FS,
-    const std::map<LineLocation, StringRef> &IRAnchors,
+void ProfileMatchStats::countMismatchedCallsites(


naming: count is a stateless and immutable action, but it's not the case here as you are tracking and recording FuncMismatchedCallsites that later stats depends on.

WenleiHe · 2024-01-27T18:10:56Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

+    // reported as mismatching.
+    if (IRCalleeName == SampleProfileMatcher::UnknownIndirectCallee)
+      MatchedCallsites.insert(Loc);
+    else if (Callees.count(getRepInFormat(IRCalleeName)))


Add comment explain why we don't expect exactly one callee in profile when there is only one callee in IR.

If there is potentially a bug and this is workaround, at least leave a TODO comment. Preferably we should not need a workaround.

WenleiHe · 2024-01-27T18:12:31Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

+  auto &MismatchedCallsites =
+      FuncMismatchedCallsites[FunctionSamples::getCanonicalFnName(F.getName())];
+
+  auto MapIRLocToProfileLoc = [&](const LineLocation &IRLoc) {


This is the place where pre-match and post-match is automatically handled in unified way, right?

Yep, the benefit here is pre-match and post-match using the same functioncountMismatchedCallsites, no special process to parsing the matching results. For pre-match, we just pass a empty IRToProfileLocationMap and no changes to RunStaleProfileMatching.

WenleiHe · 2024-01-27T18:21:10Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

+  uint64_t TotalFunctionSamples = 0;
+
+  // A map from function name to a set of mismatched callsite locations.
+  StringMap<std::set<LineLocation>> FuncMismatchedCallsites;


I don't see this container being order sensitive, unordered_set would perform better for lookups.

WenleiHe · 2024-01-27T18:35:43Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

@@ -433,6 +433,44 @@ using CandidateQueue =
    PriorityQueue<InlineCandidate, std::vector<InlineCandidate>,
                  CandidateComparer>;

+// Profile matching statstics.
+class ProfileMatchStats {


I was mostly thinking about a simple struct to encapsulate different stats, while keeping most of the logic in the Matcher. But don't have strong opinion.

I was thinking that too until I worked with this StringMap<std::set<LineLocation>> FuncMismatchedCallsites; which introduce a lot of actions making me think to put them into a new class. But as you suggested below, we can move FuncMismatchedCallsites out of the stats, or even we don't need a new class.

WenleiHe · 2024-01-27T18:39:20Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

+    PostMatchStats.copyUnchangedCallsiteMismatches(
+        PreMatchStats.FuncMismatchedCallsites);
+    PostMatchStats.countMismatchedCallsiteSamples();


I see two problems:

We could end up just copying the same mismatched call sites and creating duplicates.

The counting is somewhat spread out and tangled with matching -- there is countProfileMismatches and countMismatchedCallsites in function level runStaleProfileMatching; then there is counting again here at module level.

How about we make these changes:

Move FuncMismatchedCallsites out of stat class

Make FuncMismatchedCallsites StringMap<std::unordered_map<LineLocation,bool>> with the boolean indicating whether the input mismatch was recovered by fuzzy matching.

During function level runStaleProfileMatching, we only populate and update FuncMismatchedCallsites, but not doing any stats. We do stats all at the end on module level.

This is try to create clearer separation between matching and stats, and also be consistent as to at which level stats are done (function or module).

Try to understand this on high-level, I feel this is to compress as much as we can into a global structure(FuncMismatchedCallistes) during function-level matching.
Comparing to my first version(using a global IR/Prof anchor map), this use a more compact structure. Comparing to my second(this) version, even move all the TotalProfiledFunction/ total Function Samples / total callsites stats out of the matching.

That sounds good idea, just some minor thing I can think:

For StringMap<std::unordered_map<LineLocation,bool>>, a bool might not be enough, the matching could work badly and introduce more mismatching. may need another bool to indicate it's a new mismatch.

We need to save TotalProfiledCallsite, as we don't have the ProfAnchors for the counting at the end.

For StringMap<std::unordered_map<LineLocation,bool>>, a bool might not be enough, the matching could work badly and introduce more mismatching. may need another bool to indicate it's a new mismatch.

Could you explain this? what is a new mismatch? My understanding is there is a set of mismatches from input, after fuzzy matching, we recovered some of the originally mismatched, though we don't know if the recovery is correct or not.

We need to save TotalProfiledCallsite, as we don't have the ProfAnchors for the counting at the end.

Incrementing counter along with tracking FuncMismatchedCallsites seems fine?

Could you explain this? what is a new mismatch? My understanding is there is a set of mismatches from input, after fuzzy matching, we recovered some of the originally mismatched, though we don't know if the recovery is correct or not.

Yeah, It depends on which heuristic we use. Some anchors are aligned based on the previous matched anchors, there could be the original function is by luck matched but we run matching to "fix" it to a mismatched location. for example:

Before matching, supposing 5:E is matched by luck.

IR : 1:A, 2: B, 3:C, 4: D, 5:E | | | Profile : 3:A, 4: B, 5:E, 6:D, 7:E

After matching:

IR : 1:A, 2: B, 3:C, 4: D, 5:E | | | | | Profile : 3:A, 4: B, 5:E, 6:D, 8:E

However, after matching, the 5:E is synced to 3:c based on the offset (1:A --> 3:A), this is the new mismatch. So, that's why I was thinking to recompute/reuse the "counting" function from scratch, without relying on the matching heuristic.

We need to save TotalProfiledCallsite, as we don't have the ProfAnchors for the counting at the end.

Incrementing counter along with tracking FuncMismatchedCallsites seems fine?

OK, that seems not obey the rule to move "all" the stats out of function-level. No strong option on this.

I see.. Now I understand what new mismatch means -- thanks for the clarification.

There are two scenarios I think:

The original match by luck is actually incorrect too. In this case there is no new mismatch, but we are under-reporting input mismatches as there is no easy way to tell whether it's correct match or not. I seems the example you have is in this category. I tend to think we just need to live with it?

The original match by luck is correct. In this case should we try to make sure the matching algorithm doesn't introduce new mismatches. Or at least make sure such case is very very unlikely to happen? Does this actually happen and would Meyers diffing have the same issue of introducing new mismatches?

Regardless if we really want consider such cases, maybe StringMap<std::unordered_map<LineLocation, MismatchState>> with MismatchState containing one bit for input mismatch, and one bit for final mismatch is all we need?

If we really want to shoot for best/most accurate metric, lucky but incorrect original match (like 1) should be counted as mismatch. But it's probably not worth the complexity to do that?

OK, that seems not obey the rule to move "all" the stats out of function-level. No strong option on this.

I guess strictly speaking you are right. But it still moves most of the non-trivial counting out of matching logic.

Pushed a new version. The changes are:

Most of the counting part are moved out of the function-level matching, and moved into a new function computeAndReportProfileStaleness, which is a module-level function. It includes all the counts about function checksum mismatch and callsite mismatch, and both weighted(samples) and unweighted mismatches.

Extended the (the old FuncMismatchedCallsites) to StringMap<std::unordered_map<LineLocation, MatchState, LineLocationHash>> FuncCallsiteMatchStates;, use a enum MatchState to present the match state. one reason is now we can compress the struture as we discussed before. Another reason is we have to move total profiled function metrics or total profiled callsites out into computeAndReportProfileStaleness. it's not easy /clean to compute the two metrics in the old way because it's only under non-imported function, however, for match states, we need to count the imported function too. So to achieve this, we need a "Matched" callsite state used to count the original profiled function.

… and samples

WenleiHe · 2024-02-01T04:18:55Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

@@ -690,6 +715,10 @@ void SampleProfileLoaderBaseImpl<Function>::computeDominanceAndLoopInfo(
 }
 } // namespace llvm

+bool ShouldSkipProfileLoading(const Function &F) {


make this static -- other free functions specific to sample loader are static.

if make this into a helper function, all similar code should be replaced. there are 5+ instance but only one is replaced in this change.

function name starts with lower case

nit: name it skipProfileForFunction or useProfileForFunction

WenleiHe · 2024-02-01T04:28:10Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

+  /// Profile mismatch statstics:
+  uint64_t TotalProfiledFunc = 0;
+  // Num of function whose checksum is mismatched.
+  uint64_t NumMismatchedFunc = 0;


Should we categorize them as NumStaleProfileFunc (has profile with same name but CFG checksum mismatches) and NumNoProfileFunc (no profile for the function at all, but we may recover a profile after we handle function renaming)?

Should we categorize them as NumStaleProfileFunc (has profile with same name but CFG checksum mismatches) and NumNoProfileFunc (no profile for the function at all, but we may recover a profile after we handle function renaming)?

Sounds good (IIUC, this will add a new stats NumNoProfileFunc, this way we will have 3 function number related stats: TotalProfiledFunc, NumStaleProfileFunc, NumNoProfileFunc. )

Or maybe we can defer that until when we handle function renaming and do call graph level matching -- that's where we need to know how many of the no profile functions we are able to recover profile for them.

WenleiHe · 2024-02-01T04:30:06Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

  uint64_t TotalProfiledCallsites = 0;
  uint64_t NumMismatchedCallsites = 0;
+  uint64_t NumRecoveredCallsites = 0;
+
+  /// Weigted profile samples mismatch statstics:


nit: why /// which is inconsistent with others. The comment is also not clear - was expecting something explaining TotalFunctionSamples.

nit: why /// which is inconsistent with others. The comment is also not clear - was expecting something explaining TotalFunctionSamples.

Ah, ok.. I want to comment for a group of counters.. hmm, seems unnecessary, will remove the comments

WenleiHe · 2024-02-01T04:33:17Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

+
+  /// Weigted profile samples mismatch statstics:
+  uint64_t TotalFunctionSamples = 0;
+  // Samples for the mismatched checksum functions;


Grammatical error: Samples for the mismatched checksum functions: -> Total samples for all checksum-mismatched functions

WenleiHe · 2024-02-01T05:31:21Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

+    // TODO : Ideally, we should ensure it's a direct callsite location(Callees
+    // size is 1). However, there may be a bug for profile merge(like ODR
+    // violation) that causes the callees size to be more than 1. After we fix
+    // the bug, we can remove this check.


Have you found out why? This particular change/workaround seems unrelated to the main purpose of this patch. How about let's either 1) root cause and fix this or 2) not change it, not adding the workaround here.

I observed there are many cases and could be multiple reasons, I haven't root caused them. For some cases, I can guess it's likely due to unstable IR or ODR violation.

The linker deduplicate the different versions for the outlined function but not the inlined function. e.g.

bar:100:100 1: foo.1 1:A 2:B baz:110:110 2: foo.2 2:A 3:B

When the two foos are merged/flattened, for callsite 2 , the profile anchors are 2:[A, B] but we can only see one function(either A or B) in the IR
As we could have multiple reasons for ODR violation, it may not be quick to fix it. moreover, this is no a problem for sample loader, it will load the profile the look-up matched. Anyway, going to remove the workaround, it should not a very big false positive.

Anyway, going to remove the workaround, it should not a very big false positive.

Ok, if we end up needing this workaround for driving fall positive staleness % to zero, we can do a separate patch for it. It just feel like doesn't belong to this patch. :)

WenleiHe · 2024-02-01T06:33:55Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

+  };
+
+  for (const auto &I : FS.getBodySamples())
+    CountSamples(I.first, I.second.getSamples());


This is for counting not inlined callsites? I thought callsite samples kinda implies inlined ones..

This is for counting not inlined callsites? I thought callsite samples kinda implies inlined ones..

Yep, from my observation, inlined and non-inlined callsites samples are seperated.

What I meant is callsite samples in the context of sample PGO in LLVM particularly means samples for inlinees. :) (FunctionSamples::CallsiteSamples)

But here we are counting both inlined and not inlined samples as call site samples.. That is probably fine, but maybe worth a comment?

WenleiHe · 2024-02-01T06:34:18Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

+    return It->second == MatchState::Mismatched;
+  };
+
+  auto CountSamples = [&](const LineLocation &Loc, uint64_t Samples) {


nit: CountSamples -> AttributeMismatchedSamples this is not counting call site samples, but rather distribute/attribute already counted samples into different category.

WenleiHe · 2024-02-01T06:37:06Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

+    CountSamples(I.first, I.second.getSamples());
+
+  for (const auto &I : FS.getCallsiteSamples()) {
+    uint64_t Samples = 0;


nit: Samples -> CallsiteSamples

WenleiHe · 2024-02-01T06:49:17Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

+    CountSamples(I.first, Samples);
+
+    if (IsCallsiteMismatched(I.first))
+      continue;
+
+    // Count mismatched samples for matched inlines.
+    for (const auto &CS : I.second)
+      countMismatchedCallsiteSamples(CS.second);
+  }


Suggest an alternative structure to make the flow more intuitive:

if (IsCallsiteMismatched(I.first)) { // comment for (const auto &CS : I.second) Samples += CS.second.getTotalSamples(); AttributeMismatchedSamples(I.first, Samples); } else { // When the current level of inlined call site matches the profiled call site, we need to go deeper along the inline tree to count mismatches from lower level inlinees. for (const auto &CS : I.second) countMismatchedCallsiteSamples(CS.second); }

This is a bit tricky.
If a mismtched callsite is recovered and marked as recovered, it still goes through( AttributeMismatchedSamples to count the recovered samples but the IsCallsiteMismatched return false(needs to go counting it's inlinees(for (const auto &CS : I.second) countMismatchedCallsiteSamples(CS.second);) )

Ok, I think we need a clearer flow. Currently CountSamples checks match state and do something, later IsCallsiteMismatched checks match state again and do something else. It's all tangled together.

It'd be clearer if we can hoist the flow on checking match state, and then group together what needs to be happen to each match state. I think it's still doable even with what you mentioned above. Make sense?

WenleiHe · 2024-02-01T06:51:34Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

+      const auto *FuncDesc = ProbeManager->getDesc(F);
+      if (FuncDesc && ProbeManager->profileIsHashMismatched(*FuncDesc, *FS))
+        NumMismatchedFunc++;


These are not needed: 1) they are handled inside countMismatchedFuncSamples, 2) for consistency, countMismatchCallsites/countMismatchedCallsiteSamples don't have mismatch check at top level caller.

WenleiHe

The latest version indeed looks a lot cleaner. Thanks for working through the changes.

llvm/lib/Transforms/IPO/SampleProfile.cpp

WenleiHe · 2024-02-13T04:06:03Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

@@ -2498,25 +2607,41 @@ void SampleProfileMatcher::runOnModule() {

    SmallVector<std::pair<StringRef, uint64_t>> ProfStatsVec;
    if (FunctionSamples::ProfileIsProbeBased) {
-      ProfStatsVec.emplace_back("NumMismatchedFuncHash", NumMismatchedFuncHash);
+      ProfStatsVec.emplace_back("NumStaleProfileFunc", NumStaleProfileFunc);


Does changing those strings require changing how we aggregates stats internally for dashboard etc?

Yep, the logging is fine, we aggregate by the key without checking a specific stats, so it shouldn't break our internal build. Just for some offline tools, the last step to show the percentage things would be error, and hopefully caught by the exception. I will make some updates to those usages.

Yep, the logging is fine, we aggregate by the key without checking a specific stats, so it shouldn't break our internal build. Just for some offline tools, the last step to show the percentage things would be error, and hopefully caught by the exception. I will make some updates to those usages.

We want to make sure there is continuity in staleness data, and we should not lose data before the name changes.

llvm/lib/Transforms/IPO/SampleProfile.cpp

WenleiHe

LGTM with some nits.

llvm/lib/Transforms/IPO/SampleProfile.cpp

WenleiHe · 2024-02-13T15:59:38Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

+    Matched = 0,
+    Mismatched = 1,
+    // Stay Matched after profile matching.
+    StayMatched = 2,


A minor inconsistency here as to what we want to cover in the state enum -- Recovered is a state change, StayMatched is a non-change. Do we describe state change or just state in the enum?

If we just describe state:
InitialMatch,
InitialMismatch,
FinalMatch,
FinalMismatch

If we describe state change
InitialMatch,
InitialMismatch,
RecoveredMismatch, (From initial mismatch to final match)
RemovedMatch (from initial match to final mismatch)

It'd be great if this can be improved, but not a deal breaker if the current state values are very convenient for implementation.

Sure, I lean towards the second one(state change), thinking it can align with the name of the stats counter such as NumRecoveredCallsites, RecoveredCallsiteSamples

Updated in the new version, thanks for the suggestion!

WenleiHe · 2024-02-13T17:20:58Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

+  // For each function, store every callsite and its matching state into this
+  // map, of which each entry is a pair of callsite location and MatchState.
+  // This is used for profile staleness computation and report.
+  StringMap<std::unordered_map<LineLocation, MatchState, LineLocationHash>>


note that StringMap keeps a copy of every string key.

I see. If we have the memory pressure, another option is to consolidate the two StringMap

StringMap<LocToLocMap> FuncMappings; StringMap<std::unordered_map<LineLocation, MatchState, LineLocationHash>> FuncCallsiteMatchStates;

Into one map, like

StringMap<std::pair<LocToLocMap, std::unordered_map<LineLocation, MatchState, LineLocationHash>>> FuncXXX...

the Key of LocToLocMap is also a LineLocation, I was thinking to further merge them

StringMap<std::unordered_map<LineLocation, std::pair<LineLocation, MatchState>, LineLocationHash>>> FuncXXX...

but unfortunately the key of LocToLocMap is IRlocation but the latter's key is the ProfileLocation, so merging them might be too hacky and confusing.

note that StringMap keeps a copy of every string key.

I think one optimization is we can release this data after the matching is done, we don't use those data for the later sample inlining or loading stuffs, similar to the StringMap<LocToLocMap> FuncMappings;.

WenleiHe · 2024-02-14T06:07:16Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

+    RecoveredMismatch = 2,
+    // From initial match to final mismatch.
+    RemovedMatch = 3,
+    Unknown = 32,


Usually we use 0 as uninitialized/unknown value. This is because for zero initialized memory, this naturally sets everything to unknown.

WenleiHe · 2024-02-14T06:12:00Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

+
+    if (IsCallsiteMatched) {
+      auto R =
+          CallsiteMatchStates.emplace(ProfileLoc, MatchState::InitialMatch);


If CallsiteMatchStates[ProfileLoc] already exist, setting it to InitialMatch seems conceptually wrong as this is probably post-match (FinalMatch)? With state like InitialMatch, I thought we will not set such state in post-match stage.

Maybe for better readability, do a find first and then decide what value to set like how this is done in the for loop below.

WenleiHe · 2024-02-14T06:30:21Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

+    if (It == CallsiteMatchStates.end())
+      CallsiteMatchStates.emplace(Loc, MatchState::InitialMismatch);
+    // The inital match is removed in post-match.
+    else if (IsPostMatch && It->second == MatchState::InitialMatch)


A bit confused here, if there was an entry and it was matched in pre-match, it won't be changed in post-match if it still matches, why do we change it to RemovedMatch?

Oops, Good catch, this is wrong here, it's update all InitialMatch to RemovedMatch regardless it's matched or not.

The fact is that we don't have a FinalMatch state in current implementation, so we are not able to determine whether it's matched or not in post-match in this loop.

But we also can't only use the FinalMatch and FinalMismatch, because the counting is at the end of the function, then we don't know the recovered numbers(which is this diff's goal)

It think it's clear that in pre-match we can have 'InitialMatch' and 'IntialMismatch'

but in post match, I think we need 4 states, but that make it more complicated..

'InitialMatch' stay match, to UnchangedMatch

'IntialMismatch' stay mismatch, to UnchangedMismatch

'IntialMismatch' is recovered. to RecoveredMismatch

'InitialMatch' is removed. to RemovedMatch

So in total there would be 6 states, the InitialMatch and InitialMismatch is only used for transition. In the counting function, we only do counting on the 4 post-match states. UnchangedMismatch + RecoveredMismatch is the original mismatch,RemovedMatch + UnchangedMismatch is the final mismatch, RecoveredMismatch is the recovered mismatch.

Any thoughts?

6 explicit state sounds ok with me, that should make things really clear?

I'd add comment for each of the state.

Also the fact that this was wrong and tests all passed means that the tests are probably not strong enough.

6 explicit state sounds ok with me, that should make things really clear?

yeah, I feel it's clearer, please take another look.

Also the fact that this was wrong and tests all passed means that the tests are probably not strong enough.

Uh, I see, I checked that it's due to the tests are ether all matched cases or all mismatched cases. Updated the test.

llvm/lib/Transforms/IPO/SampleProfile.cpp

WenleiHe · 2024-02-15T07:33:29Z

llvm/lib/Transforms/IPO/SampleProfile.cpp

-      uint64_t &FuncMismatchedCallsites, uint64_t &FuncProfiledCallsites);
+      const LocToLocMap *IRToProfileLocationMap);
+
+  bool isMismatchState(const enum MatchState &State) {


This is only called during counting (post-match), so we should not expect any Initial* state.

Can we assert somewhere during counting that we do not have any initial* state? If we have that means recordCallsiteMatchStates may have a bug.

I found that we still can get the Initial* state; because currently the fuzzy matching is only on when there is checksum mismatch, but the counting and reporting is for all non-import function, so for those checksum matched function, their state are on the Initial* state.

If checksum actually matches, do we expect any InitialMismatch from such functions?

Or in other words, if checksum matches, there is no state change to track, do we actually still need to run recordCallsiteMatchStates? Can we just count how many samples, callsites and callsite samples in those functions and assume they all match?

If checksum actually matches, do we expect any InitialMismatch from such functions?

Or in other words, if checksum matches, there is no state change to track, do we actually still need to run recordCallsiteMatchStates? Can we just count how many samples, callsites and callsite samples in those functions and assume they all match?

It could happen, I saw in our dashboard, even the checksum mismatched samples is zero, there are a few numbers of callsite mismatches(e.g. Stale function samples: 0.00%(773327/65978370982), Stale functions: 0.01%(6/42787) , Stale callsite samples: 0.29%(194149265/65978370982), Stale callsite samples after matching: 0.29%(194318114/65978370982), Stale callsites: 5.95%(20780/349307) , Stale callsites after matching: 5.95%(20781/349307))

I checked that it's mostly from the ctor alias optimization, it's one kind of "function-rename"(in this case it's not from user change), on the other hand, the function-renaming from user change is also in this category. Recording those mismatches on checksum matched function can help detect this issue.

Though this ctor alias functions are all very cold functions, it's maybe fine to ignore this if this "false positives" (this is more like a future support) is noisy.

Any thoughts?

Just to make sure I understand this, so this is because we compute hash on CFG (ignoring callee names). If a callee is renamed, even if checksum matches for the caller, we can have call site mismatch due to callee renaming (without caller CFG change). Is that the issue?

I was hoping that we don't need to pay the cost of recording and computing stats if there is no mismatch. But if that's not possible, current change is fine.

It begs another questions though... should checksum actually capture callee renaming? like if callee renamed, should caller's checksum change?

Just to make sure I understand this, so this is because we compute hash on CFG (ignoring callee names). If a callee is renamed, even if checksum matches for the caller, we can have call site mismatch due to callee renaming (without caller CFG change). Is that the issue?

Exactly.

I was hoping that we don't need to pay the cost of recording and computing stats if there is no mismatch. But if that's not possible, current change is fine.
It begs another questions though... should checksum actually capture callee renaming? like if callee renamed, should caller's checksum change?

This is a good question. We do use some callsite info for computing checksum: the total number of callsites. If we add/remove one more callsite, it's still captured as checksum changes. so it's not strict enough.

If we really want to capture the name changes, we can encode the name of callsite into the checksum. Then again, similar to the "unstable IR" issue, there could be mismatches from internal compiler but not from user code. This ctor alias is one case. We will run stale matching for this kind of changes. That said, it's doable, just we needs more changes to recognize : "this is not from user renaming, let's only run renaming opt but not fuzzy matching opt". the current fuzzy matching may not work well on this ctor alias cases. We need more check.

Another issue is then we will have the false positives from this ctor alias top if we make a strict checksum.

Yes I was thinking about capturing callee names in checksum. But we don't have to decide now. Let's deal with that when we work on handling renames.

There should be no other unresolved comments, I'm going to push this change. I really appreciate all the discussions and suggestions!

…llsiteMatchStates

wlei-llvm · 2024-02-16T01:10:45Z

Pushed a new version:
As we discussed offline, added an assertion to ensure the state are either all on initial state or on final state.
add a clear function to release the memory of StringMap<...> FuncCallsiteMatchStates;

wlei-llvm requested a review from WenleiHe January 23, 2024 03:55

llvmbot added the llvm:transforms label Jan 23, 2024

WenleiHe reviewed Jan 24, 2024

View reviewed changes

WenleiHe reviewed Jan 27, 2024

View reviewed changes

wlei-llvm added 5 commits January 31, 2024 15:53

[CSSPGO] Support post-match profile staleness metrics

b398fee

[CSSPGO] Support post-match profile staleness metrics

c76af25

Encapsulate mismatch counting into a new class ProfileMatchStats

4b46e1e

tmp

f32a98a

[PseudoProbe] Compute and report profile matching recovered callsites…

6c9f4fc

… and samples

wlei-llvm force-pushed the post-match-metrics branch from 380b662 to 6c9f4fc Compare February 1, 2024 00:07

wlei-llvm changed the title ~~[CSSPGO] Compute and report post-match profile staleness~~ [CSSPGO] Compute and report profile matching recovered callsites and samples Feb 1, 2024

WenleiHe reviewed Feb 1, 2024

View reviewed changes

wlei-llvm added 2 commits February 1, 2024 16:49

Addressing feedback

854eaf4

Update comments

15ca43a

WenleiHe reviewed Feb 13, 2024

View reviewed changes

addressing comments

0dc2c79

WenleiHe approved these changes Feb 13, 2024

View reviewed changes

WenleiHe reviewed Feb 13, 2024

View reviewed changes

Rename the MatchState value

353f0ac

WenleiHe reviewed Feb 14, 2024

View reviewed changes

update match state

7cdcff6

wlei-llvm force-pushed the post-match-metrics branch from be08c85 to 7cdcff6 Compare February 15, 2024 05:42

WenleiHe reviewed Feb 15, 2024

View reviewed changes

wlei-llvm added 2 commits February 15, 2024 10:46

update comments

ddec088

Add assert for inconsistent state and add a clear function for FuncCa…

7fc3e5a

…llsiteMatchStates

wlei-llvm merged commit c98da37 into llvm:main Feb 19, 2024
4 checks passed

		using IRAnchorMap = std::map<LineLocation, StringRef>;
		using ProfileAnchorMap = std::map<LineLocation, std::unordered_set<FunctionId>>;

[CSSPGO] Compute and report profile matching recovered callsites and samples #79090

[CSSPGO] Compute and report profile matching recovered callsites and samples #79090

Conversation

wlei-llvm commented Jan 23, 2024 • edited

llvmbot commented Jan 23, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wlei-llvm Jan 24, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

WenleiHe commented Jan 24, 2024

wlei-llvm commented Jan 24, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

WenleiHe left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

WenleiHe left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wlei-llvm commented Jan 23, 2024 •

edited

wlei-llvm Jan 24, 2024 •

edited