Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CSSPGO] Compute and report profile matching recovered callsites and samples #79090

Merged
merged 12 commits into from
Feb 19, 2024

Conversation

wlei-llvm
Copy link
Contributor

@wlei-llvm wlei-llvm commented Jan 23, 2024

This change adds the support to compute and report the staleness metrics after stale profile matching so that we can know how effective the fuzzy matching is, i. e. how many callsites and samples are recovered by the matching.
Some implementation notes:

  • The function checksum mismatch metrics are not applicable here as it's function-level metrics, checksum mismatch remains the same before and after matching, so we need to compute based on the callsite samples.
  • We add two new counters NumRecoveredCallsites, RecoveredCallsiteSamples for this and removed TotalCallsiteSamples as now the we can use the TotalFuncHashSamples as base, and renamed some counters.
  • In profile matching, we changed to use a state machine to represent the callsite's matching state changes. See the MatchState for the state, and used a new function recordCallsiteMatchStates to compute and record the callsite's match states changes before and after the matching, , the result is compressed and saved into a FuncCallsiteMatchStates map for later counting use.
  • Changed the counting function to run on module-level and moved it to the end of the whole process(computeAndReportProfileStaleness). The reason is before the callsite is only counted on top-level function, this change extends it to count(recursively) on the inlined functions and samples, which is more accurate.

@llvmbot
Copy link
Collaborator

llvmbot commented Jan 23, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Lei Wang (wlei-llvm)

Changes

Compute and report the staleness metrics after stale profile matching so that we can know how effective the matching is, i. e. how many callsites and samples are recovered by the matching.

  • Note that the function hash samples are not applicable here as it's function-level metrics, checksum mismatch remains the same before and after matching, so we need to compute based on the callsite samples.
  • We add two new counters PostMatchNumMismatchedCallsites, PostMatchMismatchedCallsiteSamples for this. Also before the callsite is only counted on top-level function, this change extends it to count(recursively) on the inlined functions and samples, then we removed TotalCallsiteSamples as now the we can use the TotalFuncHashSamples as base.
  • To make it simple, the post-match reuse the countProfileMismatches with a parameter and we leverage the matching results in IRToProfileLocationMap to update the IR anchors(see UpdateIRAnchors) so that we can reuse most of the counting functions.

Patch is 30.63 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/79090.diff

5 Files Affected:

  • (modified) llvm/lib/Transforms/IPO/SampleProfile.cpp (+270-170)
  • (modified) llvm/test/Transforms/SampleProfile/Inputs/profile-mismatch.prof (+4-3)
  • (modified) llvm/test/Transforms/SampleProfile/profile-mismatch.ll (+6-6)
  • (modified) llvm/test/Transforms/SampleProfile/pseudo-probe-profile-mismatch-thinlto.ll (+3-3)
  • (modified) llvm/test/Transforms/SampleProfile/pseudo-probe-profile-mismatch.ll (+41-35)
diff --git a/llvm/lib/Transforms/IPO/SampleProfile.cpp b/llvm/lib/Transforms/IPO/SampleProfile.cpp
index 2fd8668d15e200f..a7170faa65dc07c 100644
--- a/llvm/lib/Transforms/IPO/SampleProfile.cpp
+++ b/llvm/lib/Transforms/IPO/SampleProfile.cpp
@@ -433,12 +433,19 @@ using CandidateQueue =
     PriorityQueue<InlineCandidate, std::vector<InlineCandidate>,
                   CandidateComparer>;
 
+using IRAnchorMap = std::map<LineLocation, StringRef>;
+using ProfileAnchorMap = std::map<LineLocation, std::unordered_set<FunctionId>>;
+
 // Sample profile matching - fuzzy match.
 class SampleProfileMatcher {
   Module &M;
   SampleProfileReader &Reader;
   const PseudoProbeManager *ProbeManager;
   SampleProfileMap FlattenedProfiles;
+
+  std::unordered_map<const Function *, IRAnchorMap> FuncIRAnchors;
+  std::unordered_map<const Function *, ProfileAnchorMap> FuncProfileAnchors;
+
   // For each function, the matcher generates a map, of which each entry is a
   // mapping from the source location of current build to the source location in
   // the profile.
@@ -448,6 +455,8 @@ class SampleProfileMatcher {
   uint64_t TotalProfiledCallsites = 0;
   uint64_t NumMismatchedCallsites = 0;
   uint64_t MismatchedCallsiteSamples = 0;
+  uint64_t PostMatchNumMismatchedCallsites = 0;
+  uint64_t PostMatchMismatchedCallsiteSamples = 0;
   uint64_t TotalCallsiteSamples = 0;
   uint64_t TotalProfiledFunc = 0;
   uint64_t NumMismatchedFuncHash = 0;
@@ -474,24 +483,22 @@ class SampleProfileMatcher {
     return nullptr;
   }
   void runOnFunction(const Function &F);
-  void findIRAnchors(const Function &F,
-                     std::map<LineLocation, StringRef> &IRAnchors);
-  void findProfileAnchors(
+  void findFuncAnchors();
+  void UpdateIRAnchors();
+  void findIRAnchors(const Function &F, IRAnchorMap &IRAnchors);
+  void findProfileAnchors(const FunctionSamples &FS,
+                          ProfileAnchorMap &ProfileAnchors);
+  void countMismatchedHashSamples(const FunctionSamples &FS);
+  void countProfileMismatches(bool IsPreMatch);
+  void countMismatchedHashes(const Function &F, const FunctionSamples &FS);
+  void countMismatchedCallsites(
+      const Function &F,
+      StringMap<std::set<LineLocation>> &FuncToMismatchCallsites,
+      uint64_t &FuncProfiledCallsites, uint64_t &FuncMismatchedCallsites) const;
+  void countMismatchedCallsiteSamples(
       const FunctionSamples &FS,
-      std::map<LineLocation, std::unordered_set<FunctionId>>
-          &ProfileAnchors);
-  void countMismatchedSamples(const FunctionSamples &FS);
-  void countProfileMismatches(
-      const Function &F, const FunctionSamples &FS,
-      const std::map<LineLocation, StringRef> &IRAnchors,
-      const std::map<LineLocation, std::unordered_set<FunctionId>>
-          &ProfileAnchors);
-  void countProfileCallsiteMismatches(
-      const FunctionSamples &FS,
-      const std::map<LineLocation, StringRef> &IRAnchors,
-      const std::map<LineLocation, std::unordered_set<FunctionId>>
-          &ProfileAnchors,
-      uint64_t &FuncMismatchedCallsites, uint64_t &FuncProfiledCallsites);
+      StringMap<std::set<LineLocation>> &FuncToMismatchCallsites,
+      uint64_t &FuncMismatchedCallsiteSamples) const;
   LocToLocMap &getIRToProfileLocationMap(const Function &F) {
     auto Ret = FuncMappings.try_emplace(
         FunctionSamples::getCanonicalFnName(F.getName()), LocToLocMap());
@@ -499,11 +506,10 @@ class SampleProfileMatcher {
   }
   void distributeIRToProfileLocationMap();
   void distributeIRToProfileLocationMap(FunctionSamples &FS);
-  void runStaleProfileMatching(
-      const Function &F, const std::map<LineLocation, StringRef> &IRAnchors,
-      const std::map<LineLocation, std::unordered_set<FunctionId>>
-          &ProfileAnchors,
-      LocToLocMap &IRToProfileLocationMap);
+  void runStaleProfileMatching();
+  void runStaleProfileMatching(const Function &F, const IRAnchorMap &IRAnchors,
+                               const ProfileAnchorMap &ProfileAnchors,
+                               LocToLocMap &IRToProfileLocationMap);
 };
 
 /// Sample profile pass.
@@ -1129,7 +1135,7 @@ void SampleProfileLoader::findExternalInlineCandidate(
         CalleeSample->getContext().hasAttribute(ContextShouldBeInlined);
     if (!PreInline && CalleeSample->getHeadSamplesEstimate() < Threshold)
       continue;
-    
+
     Function *Func = SymbolMap.lookup(CalleeSample->getFunction());
     // Add to the import list only when it's defined out of module.
     if (!Func || Func->isDeclaration())
@@ -2123,8 +2129,8 @@ bool SampleProfileLoader::doInitialization(Module &M,
   return true;
 }
 
-void SampleProfileMatcher::findIRAnchors(
-    const Function &F, std::map<LineLocation, StringRef> &IRAnchors) {
+void SampleProfileMatcher::findIRAnchors(const Function &F,
+                                         IRAnchorMap &IRAnchors) {
   // For inlined code, recover the original callsite and callee by finding the
   // top-level inline frame. e.g. For frame stack "main:1 @ foo:2 @ bar:3", the
   // top-level frame is "main:1", the callsite is "1" and the callee is "foo".
@@ -2190,7 +2196,8 @@ void SampleProfileMatcher::findIRAnchors(
   }
 }
 
-void SampleProfileMatcher::countMismatchedSamples(const FunctionSamples &FS) {
+void SampleProfileMatcher::countMismatchedHashSamples(
+    const FunctionSamples &FS) {
   const auto *FuncDesc = ProbeManager->getDesc(FS.getGUID());
   // Skip the function that is external or renamed.
   if (!FuncDesc)
@@ -2202,96 +2209,11 @@ void SampleProfileMatcher::countMismatchedSamples(const FunctionSamples &FS) {
   }
   for (const auto &I : FS.getCallsiteSamples())
     for (const auto &CS : I.second)
-      countMismatchedSamples(CS.second);
-}
-
-void SampleProfileMatcher::countProfileMismatches(
-    const Function &F, const FunctionSamples &FS,
-    const std::map<LineLocation, StringRef> &IRAnchors,
-    const std::map<LineLocation, std::unordered_set<FunctionId>>
-        &ProfileAnchors) {
-  [[maybe_unused]] bool IsFuncHashMismatch = false;
-  if (FunctionSamples::ProfileIsProbeBased) {
-    TotalFuncHashSamples += FS.getTotalSamples();
-    TotalProfiledFunc++;
-    const auto *FuncDesc = ProbeManager->getDesc(F);
-    if (FuncDesc) {
-      if (ProbeManager->profileIsHashMismatched(*FuncDesc, FS)) {
-        NumMismatchedFuncHash++;
-        IsFuncHashMismatch = true;
-      }
-      countMismatchedSamples(FS);
-    }
-  }
-
-  uint64_t FuncMismatchedCallsites = 0;
-  uint64_t FuncProfiledCallsites = 0;
-  countProfileCallsiteMismatches(FS, IRAnchors, ProfileAnchors,
-                                 FuncMismatchedCallsites,
-                                 FuncProfiledCallsites);
-  TotalProfiledCallsites += FuncProfiledCallsites;
-  NumMismatchedCallsites += FuncMismatchedCallsites;
-  LLVM_DEBUG({
-    if (FunctionSamples::ProfileIsProbeBased && !IsFuncHashMismatch &&
-        FuncMismatchedCallsites)
-      dbgs() << "Function checksum is matched but there are "
-             << FuncMismatchedCallsites << "/" << FuncProfiledCallsites
-             << " mismatched callsites.\n";
-  });
+      countMismatchedHashSamples(CS.second);
 }
 
-void SampleProfileMatcher::countProfileCallsiteMismatches(
-    const FunctionSamples &FS,
-    const std::map<LineLocation, StringRef> &IRAnchors,
-    const std::map<LineLocation, std::unordered_set<FunctionId>>
-        &ProfileAnchors,
-    uint64_t &FuncMismatchedCallsites, uint64_t &FuncProfiledCallsites) {
-
-  // Check if there are any callsites in the profile that does not match to any
-  // IR callsites, those callsite samples will be discarded.
-  for (const auto &I : ProfileAnchors) {
-    const auto &Loc = I.first;
-    const auto &Callees = I.second;
-    assert(!Callees.empty() && "Callees should not be empty");
-
-    StringRef IRCalleeName;
-    const auto &IR = IRAnchors.find(Loc);
-    if (IR != IRAnchors.end())
-      IRCalleeName = IR->second;
-
-    // Compute number of samples in the original profile.
-    uint64_t CallsiteSamples = 0;
-    if (auto CTM = FS.findCallTargetMapAt(Loc)) {
-      for (const auto &I : *CTM)
-        CallsiteSamples += I.second;
-    }
-    const auto *FSMap = FS.findFunctionSamplesMapAt(Loc);
-    if (FSMap) {
-      for (const auto &I : *FSMap)
-        CallsiteSamples += I.second.getTotalSamples();
-    }
-
-    bool CallsiteIsMatched = false;
-    // Since indirect call does not have CalleeName, check conservatively if
-    // callsite in the profile is a callsite location. This is to reduce num of
-    // false positive since otherwise all the indirect call samples will be
-    // reported as mismatching.
-    if (IRCalleeName == UnknownIndirectCallee)
-      CallsiteIsMatched = true;
-    else if (Callees.size() == 1 && Callees.count(getRepInFormat(IRCalleeName)))
-      CallsiteIsMatched = true;
-
-    FuncProfiledCallsites++;
-    TotalCallsiteSamples += CallsiteSamples;
-    if (!CallsiteIsMatched) {
-      FuncMismatchedCallsites++;
-      MismatchedCallsiteSamples += CallsiteSamples;
-    }
-  }
-}
-
-void SampleProfileMatcher::findProfileAnchors(const FunctionSamples &FS,
-                                              std::map<LineLocation, std::unordered_set<FunctionId>> &ProfileAnchors) {
+void SampleProfileMatcher::findProfileAnchors(
+    const FunctionSamples &FS, ProfileAnchorMap &ProfileAnchors) {
   auto isInvalidLineOffset = [](uint32_t LineOffset) {
     return LineOffset & 0x8000;
   };
@@ -2338,10 +2260,8 @@ void SampleProfileMatcher::findProfileAnchors(const FunctionSamples &FS,
 //   [1, 2, 3(foo), 4,  7,  8(bar), 9]
 // The output mapping: [2->3, 3->4, 5->7, 6->8, 7->9].
 void SampleProfileMatcher::runStaleProfileMatching(
-    const Function &F,
-    const std::map<LineLocation, StringRef> &IRAnchors,
-    const std::map<LineLocation, std::unordered_set<FunctionId>>
-        &ProfileAnchors,
+    const Function &F, const IRAnchorMap &IRAnchors,
+    const ProfileAnchorMap &ProfileAnchors,
     LocToLocMap &IRToProfileLocationMap) {
   LLVM_DEBUG(dbgs() << "Run stale profile matching for " << F.getName()
                     << "\n");
@@ -2422,59 +2342,226 @@ void SampleProfileMatcher::runStaleProfileMatching(
   }
 }
 
-void SampleProfileMatcher::runOnFunction(const Function &F) {
-  // We need to use flattened function samples for matching.
-  // Unlike IR, which includes all callsites from the source code, the callsites
-  // in profile only show up when they are hit by samples, i,e. the profile
-  // callsites in one context may differ from those in another context. To get
-  // the maximum number of callsites, we merge the function profiles from all
-  // contexts, aka, the flattened profile to find profile anchors.
-  const auto *FSFlattened = getFlattenedSamplesFor(F);
-  if (!FSFlattened)
-    return;
+void SampleProfileMatcher::runStaleProfileMatching() {
+  for (const auto &F : M) {
+    if (F.isDeclaration() || !F.hasFnAttribute("use-sample-profile"))
+      continue;
+    const auto *FSFlattened = getFlattenedSamplesFor(F);
+    if (!FSFlattened)
+      continue;
+    auto IR = FuncIRAnchors.find(&F);
+    auto P = FuncProfileAnchors.find(&F);
+    if (IR == FuncIRAnchors.end() || P == FuncProfileAnchors.end())
+      continue;
 
-  // Anchors for IR. It's a map from IR location to callee name, callee name is
-  // empty for non-call instruction and use a dummy name(UnknownIndirectCallee)
-  // for unknown indrect callee name.
-  std::map<LineLocation, StringRef> IRAnchors;
-  findIRAnchors(F, IRAnchors);
-  // Anchors for profile. It's a map from callsite location to a set of callee
-  // name.
-  std::map<LineLocation, std::unordered_set<FunctionId>> ProfileAnchors;
-  findProfileAnchors(*FSFlattened, ProfileAnchors);
-
-  // Detect profile mismatch for profile staleness metrics report.
-  // Skip reporting the metrics for imported functions.
-  if (!GlobalValue::isAvailableExternallyLinkage(F.getLinkage()) &&
-      (ReportProfileStaleness || PersistProfileStaleness)) {
-    // Use top-level nested FS for counting profile mismatch metrics since
-    // currently once a callsite is mismatched, all its children profiles are
-    // dropped.
-    if (const auto *FS = Reader.getSamplesFor(F))
-      countProfileMismatches(F, *FS, IRAnchors, ProfileAnchors);
+    // Run profile matching for checksum mismatched profile, currently only
+    // support for pseudo-probe.
+    if (FunctionSamples::ProfileIsProbeBased &&
+        !ProbeManager->profileIsValid(F, *FSFlattened)) {
+      runStaleProfileMatching(F, IR->second, P->second,
+                              getIRToProfileLocationMap(F));
+    }
   }
 
-  // Run profile matching for checksum mismatched profile, currently only
-  // support for pseudo-probe.
-  if (SalvageStaleProfile && FunctionSamples::ProfileIsProbeBased &&
-      !ProbeManager->profileIsValid(F, *FSFlattened)) {
-    // The matching result will be saved to IRToProfileLocationMap, create a new
-    // map for each function.
-    runStaleProfileMatching(F, IRAnchors, ProfileAnchors,
-                            getIRToProfileLocationMap(F));
-  }
+  distributeIRToProfileLocationMap();
 }
 
-void SampleProfileMatcher::runOnModule() {
+void SampleProfileMatcher::findFuncAnchors() {
   ProfileConverter::flattenProfile(Reader.getProfiles(), FlattenedProfiles,
                                    FunctionSamples::ProfileIsCS);
-  for (auto &F : M) {
+  for (const auto &F : M) {
     if (F.isDeclaration() || !F.hasFnAttribute("use-sample-profile"))
       continue;
-    runOnFunction(F);
+    // We need to use flattened function samples for matching.
+    // Unlike IR, which includes all callsites from the source code, the
+    // callsites in profile only show up when they are hit by samples, i,e. the
+    // profile callsites in one context may differ from those in another
+    // context. To get the maximum number of callsites, we merge the function
+    // profiles from all contexts, aka, the flattened profile to find profile
+    // anchors.
+    const auto *FSFlattened = getFlattenedSamplesFor(F);
+    if (!FSFlattened)
+      continue;
+
+    // Anchors for IR. It's a map from IR location to callee name, callee name
+    // is empty for non-call instruction and use a dummy
+    // name(UnknownIndirectCallee) for unknown indrect callee name.
+    auto IR = FuncIRAnchors.emplace(&F, IRAnchorMap());
+    findIRAnchors(F, IR.first->second);
+
+    // Anchors for profile. It's a map from callsite location to a set of callee
+    // name.
+    auto P = FuncProfileAnchors.emplace(&F, ProfileAnchorMap());
+    findProfileAnchors(*FSFlattened, P.first->second);
+  }
+}
+
+void SampleProfileMatcher::countMismatchedCallsiteSamples(
+    const FunctionSamples &FS,
+    StringMap<std::set<LineLocation>> &FuncToMismatchCallsites,
+    uint64_t &FuncMismatchedCallsiteSamples) const {
+  auto It = FuncToMismatchCallsites.find(FS.getFuncName());
+  // Skip it if no mismatched callsite or this is an external function.
+  if (It == FuncToMismatchCallsites.end() || It->second.empty())
+    return;
+  const auto &MismatchCallsites = It->second;
+  for (const auto &I : FS.getBodySamples()) {
+    if (MismatchCallsites.count(I.first))
+      FuncMismatchedCallsiteSamples += I.second.getSamples();
+  }
+
+  for (const auto &I : FS.getCallsiteSamples()) {
+    const auto &Loc = I.first;
+    if (MismatchCallsites.count(Loc)) {
+      for (const auto &CS : I.second)
+        FuncMismatchedCallsiteSamples += CS.second.getTotalSamples();
+      continue;
+    }
+
+    // count mismatched samples for inlined samples.
+    for (const auto &CS : I.second)
+      countMismatchedCallsiteSamples(CS.second, FuncToMismatchCallsites,
+                                     FuncMismatchedCallsiteSamples);
+  }
+}
+
+void SampleProfileMatcher::countMismatchedCallsites(
+    const Function &F,
+    StringMap<std::set<LineLocation>> &FuncToMismatchCallsites,
+    uint64_t &FuncProfiledCallsites, uint64_t &FuncMismatchedCallsites) const {
+  auto IR = FuncIRAnchors.find(&F);
+  auto P = FuncProfileAnchors.find(&F);
+  if (IR == FuncIRAnchors.end() || P == FuncProfileAnchors.end())
+    return;
+  const auto &IRAnchors = IR->second;
+  const auto &ProfileAnchors = P->second;
+
+  auto &MismatchCallsites =
+      FuncToMismatchCallsites[FunctionSamples::getCanonicalFnName(F.getName())];
+
+  // Check if there are any callsites in the profile that does not match to any
+  // IR callsites, those callsite samples will be discarded.
+  for (const auto &I : ProfileAnchors) {
+    const auto &Loc = I.first;
+    const auto &Callees = I.second;
+    assert(!Callees.empty() && "Callees should not be empty");
+
+    StringRef IRCalleeName;
+    const auto &IR = IRAnchors.find(Loc);
+    if (IR != IRAnchors.end())
+      IRCalleeName = IR->second;
+    bool CallsiteIsMatched = false;
+    // Since indirect call does not have CalleeName, check conservatively if
+    // callsite in the profile is a callsite location. This is to reduce num of
+    // false positive since otherwise all the indirect call samples will be
+    // reported as mismatching.
+    if (IRCalleeName == UnknownIndirectCallee)
+      CallsiteIsMatched = true;
+    else if (Callees.count(FunctionId(IRCalleeName)))
+      CallsiteIsMatched = true;
+
+    FuncProfiledCallsites++;
+    if (!CallsiteIsMatched) {
+      FuncMismatchedCallsites++;
+      MismatchCallsites.insert(Loc);
+    }
+  }
+}
+
+void SampleProfileMatcher::countMismatchedHashes(const Function &F,
+                                                 const FunctionSamples &FS) {
+  if (!FunctionSamples::ProfileIsProbeBased)
+    return;
+  const auto *FuncDesc = ProbeManager->getDesc(F);
+  if (FuncDesc) {
+    if (ProbeManager->profileIsHashMismatched(*FuncDesc, FS)) {
+      NumMismatchedFuncHash++;
+    }
+    countMismatchedHashSamples(FS);
+  }
+}
+
+void SampleProfileMatcher::UpdateIRAnchors() {
+  for (auto &I : FuncIRAnchors) {
+    const auto *F = I.first;
+    auto &IRAnchors = I.second;
+    const auto Mapping =
+        FuncMappings.find(FunctionSamples::getCanonicalFnName(F->getName()));
+    if (Mapping == FuncMappings.end())
+      continue;
+    IRAnchorMap UpdatedIRAnchors;
+    const auto &LocToLocMapping = Mapping->second;
+    for (const auto L : LocToLocMapping) {
+      UpdatedIRAnchors[L.second] = IRAnchors[L.first];
+      IRAnchors.erase(L.first);
+    }
+
+    for (const auto &IR : UpdatedIRAnchors) {
+      IRAnchors[IR.first] = IR.second;
+    }
+  }
+}
+
+void SampleProfileMatcher::countProfileMismatches(bool IsPreMatch) {
+  if (!ReportProfileStaleness && !PersistProfileStaleness)
+    return;
+
+  if (!IsPreMatch) {
+    // Use the profile matching results to update to the IR anchors.
+    UpdateIRAnchors();
+  }
+
+  uint64_t UnusedCounter = 0;
+  uint64_t *TotalProfiledCallsitesPtr =
+      IsPreMatch ? &TotalProfiledCallsites : &UnusedCounter;
+  uint64_t *NumMismatchedCallsitesPtr =
+      IsPreMatch ? &NumMismatchedCallsites : &PostMatchNumMismatchedCallsites;
+  uint64_t *MismatchedCallsiteSamplesPtr =
+      IsPreMatch ? &MismatchedCallsiteSamples
+                 : &PostMatchMismatchedCallsiteSamples;
+
+  auto SkipFunctionForReport = [](const Function &F) {
+    if (F.isDeclaration() || !F.hasFnAttribute("use-sample-profile"))
+      return true;
+    // Skip reporting the metrics for imported functions.
+    if (GlobalValue::isAvailableExternallyLinkage(F.getLinkage()))
+      return true;
+    return false;
+  };
+
+  StringMap<std::set<LineLocation>> FuncToMismatchCallsites;
+  for (const auto &F : M) {
+    if (SkipFunctionForReport(F))
+      continue;
+    const auto *FS = Reader.getSamplesFor(F);
+    if (FS && IsPreMatch) {
+      // Only count the total function metrics once in pre-match time.
+      TotalFuncHashSamples += FS->getTotalSamples();
+      TotalProfiledFunc++;
+      countMismatchedHashes(F, *FS);
+    }
+    countMismatchedCallsites(F, FuncToMismatchCallsites,
+                             *TotalProfiledCallsitesPtr,
+                             *NumMismatchedCallsitesPtr);
+  }
+
+  for (const auto &F : M) {
+    if (SkipFunctionForReport(F))
+      continue;
+    if (const auto *FS = Reader.getSamplesFor(F))
+      countMismatchedCallsiteSamples(*FS, FuncToMismatchCallsites,
+                                     *MismatchedCallsiteSamplesPtr);
+  }
+}
+
+void SampleProfileMatcher::runOnModule() {
+  ...
[truncated]

Comment on lines 436 to 437
using IRAnchorMap = std::map<LineLocation, StringRef>;
using ProfileAnchorMap = std::map<LineLocation, std::unordered_set<FunctionId>>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As standalone types, add a comment explaining the key/value of the maps? Maybe just can move the original comments for these types here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also move this inside class SampleProfileMatcher since these types are internal to the matcher.

// Anchors for IR. It's a map from IR location to callee name, callee name
// is empty for non-call instruction and use a dummy
// name(UnknownIndirectCallee) for unknown indrect callee name.
auto IR = FuncIRAnchors.emplace(&F, IRAnchorMap());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit IR naming can be confusing, this is an iterator, maybe It? Same for other places.


if (SalvageStaleProfile) {
runStaleProfileMatching();
countProfileMismatches(false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the boolean param for countProfileMismatches, move updateIRAnchors here instead of within countProfileMismatches to make it slightly more readable and simpler. countProfileMismatches should just do counting as the name suggests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little tricky, the flag is also used to differentiate the counters to compute, basically we have two sets of counters, like PrematchXXX and PostMatchXXX. So here the flag is used like XXX = IsPreMatch? PrematchXXX: PostMatchXXX
See the code around line 2514:

...
uint64_t *NumMismatchedCallsitesPtr =
      IsPreMatch ? &NumMismatchedCallsites : &PostMatchNumMismatchedCallsites;
...
      
     ```

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I noticed that. Was thinking these can be passed in as parameters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I noticed that. Was thinking these can be passed in as parameters.

Maybe we can put them into a class ProfileMatchStats, then pass PreMatchStats and PostMatchStats instances respectively.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a good idea.

@@ -2190,7 +2196,8 @@ void SampleProfileMatcher::findIRAnchors(
}
}

void SampleProfileMatcher::countMismatchedSamples(const FunctionSamples &FS) {
void SampleProfileMatcher::countMismatchedHashSamples(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the renaming? I can find the name HashSamples confusing... Including the original variable names

(there seems to be quite some renaming going on in the patch that makes reviewing a bit uneasy...)

}
}

void SampleProfileMatcher::countMismatchedHashes(const Function &F,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also confusing name, we are not counting number of hashes, right?

return;
void SampleProfileMatcher::runStaleProfileMatching() {
for (const auto &F : M) {
if (F.isDeclaration() || !F.hasFnAttribute("use-sample-profile"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This particular check is duplicated now because we turned runOnFunction into runOnModule. Maybe make a small helper to centralize this check and keep them always in sync for the different function loops.

}
}

void SampleProfileMatcher::runOnModule() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain why we need turn runOnFunctions into runOnModule? I understand that we now need to update anchor for reporting after matching, but what prevents us from doing that functions by function? Keep all anchors for all functions around also has a memory cost.

(Without that change and the renaming, or doing those separately, it'd be easier to review)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it's because we need to count the inlineed/nested callsite samples. (before we only count the top-level callsite samples)
e.g.

main : 1001:1
  1:1
  2: foo:1000:1
    1: 1
    bar: 999

Supposing the processing order is main --> foo --> bar
Before it only count for function main, and it will miss the bar's samples, because at the time, function bar is not processed, there is no anchors or any mismatch info.
Now I changed to find all the anchors and doing a mismatch computation, the results is saved into a map, then count the samples recursively(see countMismatchedCallsiteSamples)

that's why each process it needs to "runOnModule", this indeed need a lot of new function name which is confusing.

@@ -2487,29 +2574,42 @@ void SampleProfileMatcher::runOnModule() {
errs() << "(" << NumMismatchedCallsites << "/" << TotalProfiledCallsites
<< ")"
<< " of callsites' profile are invalid and "
<< "(" << MismatchedCallsiteSamples << "/" << TotalCallsiteSamples
<< "(" << MismatchedCallsiteSamples << "/" << TotalFuncHashSamples
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are TotalCallsiteSamples and TotalFuncHashSamples all just refer to the total samples of all functions in the profile?

The name TotalFuncHashSamples is confusing -- when we talk about samples, it should not have hash as entity, instead the entities should be function or call sites?

// reported as mismatching.
if (IRCalleeName == UnknownIndirectCallee)
CallsiteIsMatched = true;
else if (Callees.count(FunctionId(IRCalleeName)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not equivalent to the old code where we check there is only one callee. Is the change intentional?

Also removing getRepInFormat changes the behavior if UseMD5 is true.

Copy link
Contributor Author

@wlei-llvm wlei-llvm Jan 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's sort of intentional, I found that there are several false positives related with this if we use the old condition(Callees.size() == 1). From my cases, I observes like profile callsite anchors contains the IR call name but the size is not zero.

profile anchors  1:[foo, bar]
IR anchor 1:[foo]

It looks like a bug. Then again I was thinking maybe we don't need ensure the size, as long as it line up with with the sample loader's lookup which just query the name even there are multiple entires, it should be good.

Regarding this weird "bug", I dug some cases, some are from the instable EH IR issue, I'm working on to fix it. We can then see if we need a strict condition for this.

}
}

void SampleProfileMatcher::UpdateIRAnchors() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually makes those anchors incorrect, though these anchors are no longer useful anyways. It works but feels slightly hacky..

I would just slight tweak mismatch counting to go through LocToLocMap and avoid doing this. Is the tweak going to look invasive?

@WenleiHe
Copy link
Member

Compute and report the staleness metrics after stale profile matching

Staleness is characteristic associated with a given profile and IR. There is no post-matching staleness. We can say the success rate of matching or renaming unmatched callsites/samples, but we don't want to use overloaded term post-matching staleness to describe it.

@wlei-llvm
Copy link
Contributor Author

Thanks so much for the feedback. Answering some high-level questions, will address others later.

uint64_t MismatchedCallsiteSamples = 0;
uint64_t NumMismatchedFuncHash = 0;
uint64_t TotalProfiledFunc = 0;
uint64_t MismatchedFuncHashSamples = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really just MismatchedFunctionSamples, in contrast to TotalFunctionSamples.
I know this isn't changed in this patch, but why did we use the term FuncHashSamples?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add some comment for those stats, particularly what is different between MismatchedCallsiteSamples and MismatchedFuncHashSamples.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to remove the "Hash"

Comment on lines 458 to 472
void countMismatchedSamples(const FunctionSamples &FS);
void countProfileMismatches(
const Function &F, const std::map<LineLocation, StringRef> &IRAnchors,
const std::map<LineLocation, std::unordered_set<FunctionId>>
&ProfileAnchors);
void countMismatchedCallsites(
const Function &F, const std::map<LineLocation, StringRef> &IRAnchors,
const std::map<LineLocation, std::unordered_set<FunctionId>>
&ProfileAnchors,
const LocToLocMap &IRToProfileLocationMap);
void countMismatchedCallsiteSamples(const FunctionSamples &FS);
void countMismatchedCallsiteSamples();
void copyUnchangedCallsiteMismatches(
const StringMap<std::set<LineLocation>> &InputMismatchedCallsites);
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These functions needs some comments as they are somewhat similar but with subtle differences.

void SampleProfileMatcher::countProfileMismatches(
const Function &F, const FunctionSamples &FS,
const std::map<LineLocation, StringRef> &IRAnchors,
void ProfileMatchStats::countMismatchedCallsites(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming: count is a stateless and immutable action, but it's not the case here as you are tracking and recording FuncMismatchedCallsites that later stats depends on.

// reported as mismatching.
if (IRCalleeName == SampleProfileMatcher::UnknownIndirectCallee)
MatchedCallsites.insert(Loc);
else if (Callees.count(getRepInFormat(IRCalleeName)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment explain why we don't expect exactly one callee in profile when there is only one callee in IR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is potentially a bug and this is workaround, at least leave a TODO comment. Preferably we should not need a workaround.

auto &MismatchedCallsites =
FuncMismatchedCallsites[FunctionSamples::getCanonicalFnName(F.getName())];

auto MapIRLocToProfileLoc = [&](const LineLocation &IRLoc) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the place where pre-match and post-match is automatically handled in unified way, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, the benefit here is pre-match and post-match using the same functioncountMismatchedCallsites, no special process to parsing the matching results. For pre-match, we just pass a empty IRToProfileLocationMap and no changes to RunStaleProfileMatching.

uint64_t TotalFunctionSamples = 0;

// A map from function name to a set of mismatched callsite locations.
StringMap<std::set<LineLocation>> FuncMismatchedCallsites;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this container being order sensitive, unordered_set would perform better for lookups.

@@ -433,6 +433,44 @@ using CandidateQueue =
PriorityQueue<InlineCandidate, std::vector<InlineCandidate>,
CandidateComparer>;

// Profile matching statstics.
class ProfileMatchStats {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was mostly thinking about a simple struct to encapsulate different stats, while keeping most of the logic in the Matcher. But don't have strong opinion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that too until I worked with this StringMap<std::set<LineLocation>> FuncMismatchedCallsites; which introduce a lot of actions making me think to put them into a new class. But as you suggested below, we can move FuncMismatchedCallsites out of the stats, or even we don't need a new class.

Comment on lines 2626 to 2628
PostMatchStats.copyUnchangedCallsiteMismatches(
PreMatchStats.FuncMismatchedCallsites);
PostMatchStats.countMismatchedCallsiteSamples();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see two problems:

  1. We could end up just copying the same mismatched call sites and creating duplicates.
  2. The counting is somewhat spread out and tangled with matching -- there is countProfileMismatches and countMismatchedCallsites in function level runStaleProfileMatching; then there is counting again here at module level.

How about we make these changes:

  1. Move FuncMismatchedCallsites out of stat class
  2. Make FuncMismatchedCallsites StringMap<std::unordered_map<LineLocation,bool>> with the boolean indicating whether the input mismatch was recovered by fuzzy matching.
  3. During function level runStaleProfileMatching, we only populate and update FuncMismatchedCallsites, but not doing any stats. We do stats all at the end on module level.

This is try to create clearer separation between matching and stats, and also be consistent as to at which level stats are done (function or module).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to understand this on high-level, I feel this is to compress as much as we can into a global structure(FuncMismatchedCallistes) during function-level matching.
Comparing to my first version(using a global IR/Prof anchor map), this use a more compact structure. Comparing to my second(this) version, even move all the TotalProfiledFunction/ total Function Samples / total callsites stats out of the matching.

That sounds good idea, just some minor thing I can think:

  1. For StringMap<std::unordered_map<LineLocation,bool>>, a bool might not be enough, the matching could work badly and introduce more mismatching. may need another bool to indicate it's a new mismatch.
  2. We need to save TotalProfiledCallsite, as we don't have the ProfAnchors for the counting at the end.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For StringMap<std::unordered_map<LineLocation,bool>>, a bool might not be enough, the matching could work badly and introduce more mismatching. may need another bool to indicate it's a new mismatch.

Could you explain this? what is a new mismatch? My understanding is there is a set of mismatches from input, after fuzzy matching, we recovered some of the originally mismatched, though we don't know if the recovery is correct or not.

We need to save TotalProfiledCallsite, as we don't have the ProfAnchors for the counting at the end.

Incrementing counter along with tracking FuncMismatchedCallsites seems fine?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain this? what is a new mismatch? My understanding is there is a set of mismatches from input, after fuzzy matching, we recovered some of the originally mismatched, though we don't know if the recovery is correct or not.

Yeah, It depends on which heuristic we use. Some anchors are aligned based on the previous matched anchors, there could be the original function is by luck matched but we run matching to "fix" it to a mismatched location. for example:

Before matching, supposing 5:E is matched by luck.

IR :       1:A, 2: B,   3:C,   4: D,   5:E 
                           |       |       |  
Profile :                 3:A,   4: B,    5:E,  6:D,  7:E

After matching:

IR :          1:A,  2: B,   3:C,     4: D,    5:E 
              |        |       |       |        |
Profile :    3:A,   4: B,   5:E,    6:D,     8:E

However, after matching, the 5:E is synced to 3:c based on the offset (1:A --> 3:A), this is the new mismatch. So, that's why I was thinking to recompute/reuse the "counting" function from scratch, without relying on the matching heuristic.

We need to save TotalProfiledCallsite, as we don't have the ProfAnchors for the counting at the end.

Incrementing counter along with tracking FuncMismatchedCallsites seems fine?

OK, that seems not obey the rule to move "all" the stats out of function-level. No strong option on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.. Now I understand what new mismatch means -- thanks for the clarification.

There are two scenarios I think:

  1. The original match by luck is actually incorrect too. In this case there is no new mismatch, but we are under-reporting input mismatches as there is no easy way to tell whether it's correct match or not. I seems the example you have is in this category. I tend to think we just need to live with it?

  2. The original match by luck is correct. In this case should we try to make sure the matching algorithm doesn't introduce new mismatches. Or at least make sure such case is very very unlikely to happen? Does this actually happen and would Meyers diffing have the same issue of introducing new mismatches?

Regardless if we really want consider such cases, maybe StringMap<std::unordered_map<LineLocation, MismatchState>> with MismatchState containing one bit for input mismatch, and one bit for final mismatch is all we need?

If we really want to shoot for best/most accurate metric, lucky but incorrect original match (like 1) should be counted as mismatch. But it's probably not worth the complexity to do that?

OK, that seems not obey the rule to move "all" the stats out of function-level. No strong option on this.

I guess strictly speaking you are right. But it still moves most of the non-trivial counting out of matching logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a new version. The changes are:

Most of the counting part are moved out of the function-level matching, and moved into a new function computeAndReportProfileStaleness, which is a module-level function. It includes all the counts about function checksum mismatch and callsite mismatch, and both weighted(samples) and unweighted mismatches.

Extended the (the old FuncMismatchedCallsites) to StringMap<std::unordered_map<LineLocation, MatchState, LineLocationHash>> FuncCallsiteMatchStates;, use a enum MatchState to present the match state. one reason is now we can compress the struture as we discussed before. Another reason is we have to move total profiled function metrics or total profiled callsites out into computeAndReportProfileStaleness. it's not easy /clean to compute the two metrics in the old way because it's only under non-imported function, however, for match states, we need to count the imported function too. So to achieve this, we need a "Matched" callsite state used to count the original profiled function.

@wlei-llvm wlei-llvm changed the title [CSSPGO] Compute and report post-match profile staleness [CSSPGO] Compute and report profile matching recovered callsites and samples Feb 1, 2024
@@ -690,6 +715,10 @@ void SampleProfileLoaderBaseImpl<Function>::computeDominanceAndLoopInfo(
}
} // namespace llvm

bool ShouldSkipProfileLoading(const Function &F) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. make this static -- other free functions specific to sample loader are static.
  2. if make this into a helper function, all similar code should be replaced. there are 5+ instance but only one is replaced in this change.
  3. function name starts with lower case
  4. nit: name it skipProfileForFunction or useProfileForFunction

/// Profile mismatch statstics:
uint64_t TotalProfiledFunc = 0;
// Num of function whose checksum is mismatched.
uint64_t NumMismatchedFunc = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we categorize them as NumStaleProfileFunc (has profile with same name but CFG checksum mismatches) and NumNoProfileFunc (no profile for the function at all, but we may recover a profile after we handle function renaming)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we categorize them as NumStaleProfileFunc (has profile with same name but CFG checksum mismatches) and NumNoProfileFunc (no profile for the function at all, but we may recover a profile after we handle function renaming)?

Sounds good (IIUC, this will add a new stats NumNoProfileFunc, this way we will have 3 function number related stats: TotalProfiledFunc, NumStaleProfileFunc, NumNoProfileFunc. )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe we can defer that until when we handle function renaming and do call graph level matching -- that's where we need to know how many of the no profile functions we are able to recover profile for them.

uint64_t TotalProfiledCallsites = 0;
uint64_t NumMismatchedCallsites = 0;
uint64_t NumRecoveredCallsites = 0;

/// Weigted profile samples mismatch statstics:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why /// which is inconsistent with others. The comment is also not clear - was expecting something explaining TotalFunctionSamples.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why /// which is inconsistent with others. The comment is also not clear - was expecting something explaining TotalFunctionSamples.

Ah, ok.. I want to comment for a group of counters.. hmm, seems unnecessary, will remove the comments


/// Weigted profile samples mismatch statstics:
uint64_t TotalFunctionSamples = 0;
// Samples for the mismatched checksum functions;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grammatical error: Samples for the mismatched checksum functions: -> Total samples for all checksum-mismatched functions

Comment on lines 2430 to 2433
// TODO : Ideally, we should ensure it's a direct callsite location(Callees
// size is 1). However, there may be a bug for profile merge(like ODR
// violation) that causes the callees size to be more than 1. After we fix
// the bug, we can remove this check.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you found out why? This particular change/workaround seems unrelated to the main purpose of this patch. How about let's either 1) root cause and fix this or 2) not change it, not adding the workaround here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I observed there are many cases and could be multiple reasons, I haven't root caused them. For some cases, I can guess it's likely due to unstable IR or ODR violation.

The linker deduplicate the different versions for the outlined function but not the inlined function. e.g.

bar:100:100 
  1: foo.1  
      1:A
      2:B
baz:110:110   
  2: foo.2
      2:A
      3:B

When the two foos are merged/flattened, for callsite 2 , the profile anchors are 2:[A, B] but we can only see one function(either A or B) in the IR
As we could have multiple reasons for ODR violation, it may not be quick to fix it. moreover, this is no a problem for sample loader, it will load the profile the look-up matched. Anyway, going to remove the workaround, it should not a very big false positive.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, going to remove the workaround, it should not a very big false positive.

Ok, if we end up needing this workaround for driving fall positive staleness % to zero, we can do a separate patch for it. It just feel like doesn't belong to this patch. :)

};

for (const auto &I : FS.getBodySamples())
CountSamples(I.first, I.second.getSamples());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for counting not inlined callsites? I thought callsite samples kinda implies inlined ones..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for counting not inlined callsites? I thought callsite samples kinda implies inlined ones..

Yep, from my observation, inlined and non-inlined callsites samples are seperated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant is callsite samples in the context of sample PGO in LLVM particularly means samples for inlinees. :) (FunctionSamples::CallsiteSamples)

But here we are counting both inlined and not inlined samples as call site samples.. That is probably fine, but maybe worth a comment?

return It->second == MatchState::Mismatched;
};

auto CountSamples = [&](const LineLocation &Loc, uint64_t Samples) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: CountSamples -> AttributeMismatchedSamples this is not counting call site samples, but rather distribute/attribute already counted samples into different category.

CountSamples(I.first, I.second.getSamples());

for (const auto &I : FS.getCallsiteSamples()) {
uint64_t Samples = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Samples -> CallsiteSamples

Comment on lines 2510 to 2518
CountSamples(I.first, Samples);

if (IsCallsiteMismatched(I.first))
continue;

// Count mismatched samples for matched inlines.
for (const auto &CS : I.second)
countMismatchedCallsiteSamples(CS.second);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest an alternative structure to make the flow more intuitive:

if (IsCallsiteMismatched(I.first)) {
    // comment 
    for (const auto &CS : I.second)
      Samples += CS.second.getTotalSamples();
    AttributeMismatchedSamples(I.first, Samples);
} else {
   // When the current level of inlined call site matches the profiled call site, we need to go deeper along the inline tree to count mismatches from lower level inlinees.
   for (const auto &CS : I.second)
      countMismatchedCallsiteSamples(CS.second);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit tricky.
If a mismtched callsite is recovered and marked as recovered, it still goes through( AttributeMismatchedSamples to count the recovered samples but the IsCallsiteMismatched return false(needs to go counting it's inlinees(for (const auto &CS : I.second) countMismatchedCallsiteSamples(CS.second);) )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think we need a clearer flow. Currently CountSamples checks match state and do something, later IsCallsiteMismatched checks match state again and do something else. It's all tangled together.

It'd be clearer if we can hoist the flow on checking match state, and then group together what needs to be happen to each match state. I think it's still doable even with what you mentioned above. Make sense?

Comment on lines 2559 to 2561
const auto *FuncDesc = ProbeManager->getDesc(F);
if (FuncDesc && ProbeManager->profileIsHashMismatched(*FuncDesc, *FS))
NumMismatchedFunc++;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not needed: 1) they are handled inside countMismatchedFuncSamples, 2) for consistency, countMismatchCallsites/countMismatchedCallsiteSamples don't have mismatch check at top level caller.

Copy link
Member

@WenleiHe WenleiHe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest version indeed looks a lot cleaner. Thanks for working through the changes.

llvm/lib/Transforms/IPO/SampleProfile.cpp Outdated Show resolved Hide resolved
@@ -2498,25 +2607,41 @@ void SampleProfileMatcher::runOnModule() {

SmallVector<std::pair<StringRef, uint64_t>> ProfStatsVec;
if (FunctionSamples::ProfileIsProbeBased) {
ProfStatsVec.emplace_back("NumMismatchedFuncHash", NumMismatchedFuncHash);
ProfStatsVec.emplace_back("NumStaleProfileFunc", NumStaleProfileFunc);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does changing those strings require changing how we aggregates stats internally for dashboard etc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, the logging is fine, we aggregate by the key without checking a specific stats, so it shouldn't break our internal build. Just for some offline tools, the last step to show the percentage things would be error, and hopefully caught by the exception. I will make some updates to those usages.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, the logging is fine, we aggregate by the key without checking a specific stats, so it shouldn't break our internal build. Just for some offline tools, the last step to show the percentage things would be error, and hopefully caught by the exception. I will make some updates to those usages.

We want to make sure there is continuity in staleness data, and we should not lose data before the name changes.

llvm/lib/Transforms/IPO/SampleProfile.cpp Outdated Show resolved Hide resolved
llvm/lib/Transforms/IPO/SampleProfile.cpp Outdated Show resolved Hide resolved
Copy link
Member

@WenleiHe WenleiHe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some nits.

llvm/lib/Transforms/IPO/SampleProfile.cpp Outdated Show resolved Hide resolved
Matched = 0,
Mismatched = 1,
// Stay Matched after profile matching.
StayMatched = 2,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A minor inconsistency here as to what we want to cover in the state enum -- Recovered is a state change, StayMatched is a non-change. Do we describe state change or just state in the enum?

If we just describe state:
InitialMatch,
InitialMismatch,
FinalMatch,
FinalMismatch

If we describe state change
InitialMatch,
InitialMismatch,
RecoveredMismatch, (From initial mismatch to final match)
RemovedMatch (from initial match to final mismatch)

It'd be great if this can be improved, but not a deal breaker if the current state values are very convenient for implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I lean towards the second one(state change), thinking it can align with the name of the stats counter such as NumRecoveredCallsites, RecoveredCallsiteSamples

Updated in the new version, thanks for the suggestion!

// For each function, store every callsite and its matching state into this
// map, of which each entry is a pair of callsite location and MatchState.
// This is used for profile staleness computation and report.
StringMap<std::unordered_map<LineLocation, MatchState, LineLocationHash>>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that StringMap keeps a copy of every string key.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. If we have the memory pressure, another option is to consolidate the two StringMap

StringMap<LocToLocMap> FuncMappings;
StringMap<std::unordered_map<LineLocation, MatchState, LineLocationHash>> FuncCallsiteMatchStates;

Into one map, like

StringMap<std::pair<LocToLocMap, std::unordered_map<LineLocation, MatchState, LineLocationHash>>>    FuncXXX... 

the Key of LocToLocMap is also a LineLocation, I was thinking to further merge them

StringMap<std::unordered_map<LineLocation, std::pair<LineLocation, MatchState>, LineLocationHash>>>    FuncXXX... 

but unfortunately the key of LocToLocMap is IRlocation but the latter's key is the ProfileLocation, so merging them might be too hacky and confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that StringMap keeps a copy of every string key.

I think one optimization is we can release this data after the matching is done, we don't use those data for the later sample inlining or loading stuffs, similar to the StringMap<LocToLocMap> FuncMappings;.

RecoveredMismatch = 2,
// From initial match to final mismatch.
RemovedMatch = 3,
Unknown = 32,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually we use 0 as uninitialized/unknown value. This is because for zero initialized memory, this naturally sets everything to unknown.


if (IsCallsiteMatched) {
auto R =
CallsiteMatchStates.emplace(ProfileLoc, MatchState::InitialMatch);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If CallsiteMatchStates[ProfileLoc] already exist, setting it to InitialMatch seems conceptually wrong as this is probably post-match (FinalMatch)? With state like InitialMatch, I thought we will not set such state in post-match stage.

Maybe for better readability, do a find first and then decide what value to set like how this is done in the for loop below.

if (It == CallsiteMatchStates.end())
CallsiteMatchStates.emplace(Loc, MatchState::InitialMismatch);
// The inital match is removed in post-match.
else if (IsPostMatch && It->second == MatchState::InitialMatch)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit confused here, if there was an entry and it was matched in pre-match, it won't be changed in post-match if it still matches, why do we change it to RemovedMatch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, Good catch, this is wrong here, it's update all InitialMatch to RemovedMatch regardless it's matched or not.

The fact is that we don't have a FinalMatch state in current implementation, so we are not able to determine whether it's matched or not in post-match in this loop.

But we also can't only use the FinalMatch and FinalMismatch, because the counting is at the end of the function, then we don't know the recovered numbers(which is this diff's goal)

It think it's clear that in pre-match we can have 'InitialMatch' and 'IntialMismatch'

but in post match, I think we need 4 states, but that make it more complicated..

  1. 'InitialMatch' stay match, to UnchangedMatch
  2. 'IntialMismatch' stay mismatch, to UnchangedMismatch
  3. 'IntialMismatch' is recovered. to RecoveredMismatch
  4. 'InitialMatch' is removed. to RemovedMatch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in total there would be 6 states, the InitialMatch and InitialMismatch is only used for transition. In the counting function, we only do counting on the 4 post-match states. UnchangedMismatch + RecoveredMismatch is the original mismatch,RemovedMatch + UnchangedMismatch is the final mismatch, RecoveredMismatch is the recovered mismatch.

Any thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 explicit state sounds ok with me, that should make things really clear?

I'd add comment for each of the state.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the fact that this was wrong and tests all passed means that the tests are probably not strong enough.

Copy link
Contributor Author

@wlei-llvm wlei-llvm Feb 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 explicit state sounds ok with me, that should make things really clear?

yeah, I feel it's clearer, please take another look.

Also the fact that this was wrong and tests all passed means that the tests are probably not strong enough.

Uh, I see, I checked that it's due to the tests are ether all matched cases or all mismatched cases. Updated the test.

llvm/lib/Transforms/IPO/SampleProfile.cpp Outdated Show resolved Hide resolved
uint64_t &FuncMismatchedCallsites, uint64_t &FuncProfiledCallsites);
const LocToLocMap *IRToProfileLocationMap);

bool isMismatchState(const enum MatchState &State) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only called during counting (post-match), so we should not expect any Initial* state.

Can we assert somewhere during counting that we do not have any initial* state? If we have that means recordCallsiteMatchStates may have a bug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that we still can get the Initial* state; because currently the fuzzy matching is only on when there is checksum mismatch, but the counting and reporting is for all non-import function, so for those checksum matched function, their state are on the Initial* state.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If checksum actually matches, do we expect any InitialMismatch from such functions?

Or in other words, if checksum matches, there is no state change to track, do we actually still need to run recordCallsiteMatchStates? Can we just count how many samples, callsites and callsite samples in those functions and assume they all match?

Copy link
Contributor Author

@wlei-llvm wlei-llvm Feb 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If checksum actually matches, do we expect any InitialMismatch from such functions?

Or in other words, if checksum matches, there is no state change to track, do we actually still need to run recordCallsiteMatchStates? Can we just count how many samples, callsites and callsite samples in those functions and assume they all match?

It could happen, I saw in our dashboard, even the checksum mismatched samples is zero, there are a few numbers of callsite mismatches(e.g. Stale function samples: 0.00%(773327/65978370982), Stale functions: 0.01%(6/42787) , Stale callsite samples: 0.29%(194149265/65978370982), Stale callsite samples after matching: 0.29%(194318114/65978370982), Stale callsites: 5.95%(20780/349307) , Stale callsites after matching: 5.95%(20781/349307))

I checked that it's mostly from the ctor alias optimization, it's one kind of "function-rename"(in this case it's not from user change), on the other hand, the function-renaming from user change is also in this category. Recording those mismatches on checksum matched function can help detect this issue.

Though this ctor alias functions are all very cold functions, it's maybe fine to ignore this if this "false positives" (this is more like a future support) is noisy.

Any thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure I understand this, so this is because we compute hash on CFG (ignoring callee names). If a callee is renamed, even if checksum matches for the caller, we can have call site mismatch due to callee renaming (without caller CFG change). Is that the issue?

I was hoping that we don't need to pay the cost of recording and computing stats if there is no mismatch. But if that's not possible, current change is fine.

It begs another questions though... should checksum actually capture callee renaming? like if callee renamed, should caller's checksum change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure I understand this, so this is because we compute hash on CFG (ignoring callee names). If a callee is renamed, even if checksum matches for the caller, we can have call site mismatch due to callee renaming (without caller CFG change). Is that the issue?

Exactly.

I was hoping that we don't need to pay the cost of recording and computing stats if there is no mismatch. But if that's not possible, current change is fine.
It begs another questions though... should checksum actually capture callee renaming? like if callee renamed, should caller's checksum change?

This is a good question. We do use some callsite info for computing checksum: the total number of callsites. If we add/remove one more callsite, it's still captured as checksum changes. so it's not strict enough.

If we really want to capture the name changes, we can encode the name of callsite into the checksum. Then again, similar to the "unstable IR" issue, there could be mismatches from internal compiler but not from user code. This ctor alias is one case. We will run stale matching for this kind of changes. That said, it's doable, just we needs more changes to recognize : "this is not from user renaming, let's only run renaming opt but not fuzzy matching opt". the current fuzzy matching may not work well on this ctor alias cases. We need more check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another issue is then we will have the false positives from this ctor alias top if we make a strict checksum.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I was thinking about capturing callee names in checksum. But we don't have to decide now. Let's deal with that when we work on handling renames.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be no other unresolved comments, I'm going to push this change. I really appreciate all the discussions and suggestions!

@wlei-llvm
Copy link
Contributor Author

Pushed a new version:
As we discussed offline, added an assertion to ensure the state are either all on initial state or on final state.
add a clear function to release the memory of StringMap<...> FuncCallsiteMatchStates;

@wlei-llvm wlei-llvm merged commit c98da37 into llvm:main Feb 19, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants