
[CSSPGO] Reject high checksum mismatched profile #84097

Merged · 7 commits into llvm:main · Mar 27, 2024

Conversation

wlei-llvm (Contributor) commented on Mar 6, 2024

Error out the build if the checksum mismatch is extremely high; it is better to drop the profile than to apply a bad one.
Note that the check is at the module level: a user could make big changes to the functions in a single module, yet those changes might not be performance-significant for the whole binary. We therefore want to be conservative and only try to catch big performance regressions. To do this, we select a set of "hot" functions for the check, using two parameters (hot-func-cutoff-for-staleness-error and min-functions-for-staleness-error) to control the selection so that the selected functions are hot enough and their number is not too small.
We tuned the parameters on our internal services; the check catches big performance regressions caused by high mismatch.
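
A minimal sketch of the decision logic described above (parameter names here are illustrative; the actual flag names and implementation are in the diff below):

    #include <cstdint>

    // Select hot functions elsewhere, then decide whether to reject the
    // profile: require a minimum number of selected functions and compare
    // the mismatch ratio against a percentage threshold.
    bool shouldRejectStaleProfile(uint64_t NumSelectedFuncs,
                                  uint64_t NumMismatchedFuncs,
                                  uint64_t MinSelectedFuncs,
                                  uint64_t ErrorThresholdPct) {
      // Too few hot functions selected: a benign local change could dominate
      // the ratio, so skip the check entirely.
      if (NumSelectedFuncs < MinSelectedFuncs)
        return false;
      return NumMismatchedFuncs * 100 >= NumSelectedFuncs * ErrorThresholdPct;
    }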

llvmbot (Collaborator) commented on Mar 6, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Lei Wang (wlei-llvm)

Changes

Error out the build if the checksum mismatch is extremely high; it is better to drop the profile than to apply a bad one.
Note that the check is at the module level: a user could make big changes to the functions in a single module, yet those changes might not be performance-significant for the whole binary. We therefore want to be conservative and only try to catch big performance regressions. To do this, we select a set of "hot" functions for the check, using two parameters (checksum-mismatch-func-hot-block-skip and checksum-mismatch-num-func-skip) to control the selection so that the selected functions are hot enough and their number is not too small.
We tuned the parameters on our internal services; the check catches big performance regressions caused by high mismatch.


Full diff: https://github.com/llvm/llvm-project/pull/84097.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/IPO/SampleProfile.cpp (+81)
  • (added) llvm/test/Transforms/SampleProfile/pseudo-probe-profile-mismatch-error.ll (+5)
diff --git a/llvm/lib/Transforms/IPO/SampleProfile.cpp b/llvm/lib/Transforms/IPO/SampleProfile.cpp
index ffc26f1a0a972d..9ec224bb32a551 100644
--- a/llvm/lib/Transforms/IPO/SampleProfile.cpp
+++ b/llvm/lib/Transforms/IPO/SampleProfile.cpp
@@ -234,6 +234,24 @@ static cl::opt<unsigned> ProfileICPRelativeHotnessSkip(
     cl::desc(
         "Skip relative hotness check for ICP up to given number of targets."));
 
+static cl::opt<unsigned> ChecksumMismatchFuncHotBlockSkip(
+    "checksum-mismatch-func-hot-block-skip", cl::Hidden, cl::init(100),
+    cl::desc("For checksum-mismatch error check, skip checking the function "
+             "whose num of hot(on average) blocks is smaller than the "
+             "given number."));
+
+static cl::opt<unsigned> ChecksumMismatchNumFuncSkip(
+    "checksum-mismatch-num-func-skip", cl::Hidden, cl::init(50),
+    cl::desc("For checksum-mismatch error check, skip the check if the total "
+             "number of selected function is smaller than the given number."));
+
+static cl::opt<unsigned> ChecksumMismatchErrorThreshold(
+    "checksum-mismatch-error-threshold", cl::Hidden, cl::init(80),
+    cl::desc(
+        "For checksum-mismatch error check, error out if the percentage of "
+        "function mismatched-checksum is higher than the given percentage "
+        "threshold"));
+
 static cl::opt<bool> CallsitePrioritizedInline(
     "sample-profile-prioritized-inline", cl::Hidden,
 
@@ -630,6 +648,8 @@ class SampleProfileLoader final : public SampleProfileLoaderBaseImpl<Function> {
   std::vector<Function *> buildFunctionOrder(Module &M, LazyCallGraph &CG);
   std::unique_ptr<ProfiledCallGraph> buildProfiledCallGraph(Module &M);
   void generateMDProfMetadata(Function &F);
+  bool errorIfHighChecksumMismatch(Module &M, ProfileSummaryInfo *PSI,
+                                   const SampleProfileMap &Profiles);
 
   /// Map from function name to Function *. Used to find the function from
   /// the function name. If the function name contains suffix, additional
@@ -2184,6 +2204,61 @@ bool SampleProfileLoader::doInitialization(Module &M,
   return true;
 }
 
+// Note that this is a module-level check. Even if one module is errored out,
+// the entire build will be errored out. However, the user could make big
+// changes to functions in single module but those changes might not be
+// performance significant to the whole binary. Therefore, we use a conservative
+// approach to make sure we only error out if it globally impacts the binary
+// performance. To achieve this, we use heuristics to select a reasonable
+// big set of functions that are supposed to be globally performance
+// significant, only compute and check the mismatch within those functions. The
+// function selection is based on two criteria: 1) The function is "hot" enough,
+// which is tuned by a hotness-based flag(ChecksumMismatchFuncHotBlockSkip). 2)
+// The num of function is large enough which is tuned by the
+// ChecksumMismatchNumFuncSkip flag.
+bool SampleProfileLoader::errorIfHighChecksumMismatch(
+    Module &M, ProfileSummaryInfo *PSI, const SampleProfileMap &Profiles) {
+  assert(FunctionSamples::ProfileIsProbeBased &&
+         "Only support for probe-based profile");
+  uint64_t TotalSelectedFunc = 0;
+  uint64_t NumMismatchedFunc = 0;
+  for (const auto &I : Profiles) {
+    const auto &FS = I.second;
+    const auto *FuncDesc = ProbeManager->getDesc(FS.getGUID());
+    if (!FuncDesc)
+      continue;
+
+    // We want to select a set of functions that are globally performance
+    // significant, in other words, if those functions profiles are
+    // checksum-mismatched and dropped, the whole binary will likely be
+    // impacted, so here we use a hotness-based threshold to control the
+    // selection.
+    if (FS.getTotalSamples() <
+        ChecksumMismatchFuncHotBlockSkip * PSI->getOrCompHotCountThreshold())
+      continue;
+
+    TotalSelectedFunc++;
+    if (ProbeManager->profileIsHashMismatched(*FuncDesc, FS))
+      NumMismatchedFunc++;
+  }
+  // Make sure that the num of selected function is not too small to distinguish
+  // from the user's benign changes.
+  if (TotalSelectedFunc < ChecksumMismatchNumFuncSkip)
+    return false;
+
+  // Finally check the mismatch percentage against the threshold.
+  if (NumMismatchedFunc * 100 >=
+      TotalSelectedFunc * ChecksumMismatchErrorThreshold) {
+    auto &Ctx = M.getContext();
+    const char *Msg =
+        "The FDO profile is too old and will cause big performance regression, "
+        "please drop the profile and collect a new one.";
+    Ctx.diagnose(DiagnosticInfoSampleProfile(M.getModuleIdentifier(), Msg));
+    return true;
+  }
+  return false;
+}
+
 void SampleProfileMatcher::findIRAnchors(
     const Function &F, std::map<LineLocation, StringRef> &IRAnchors) {
   // For inlined code, recover the original callsite and callee by finding the
@@ -2715,6 +2790,12 @@ bool SampleProfileLoader::runOnModule(Module &M, ModuleAnalysisManager *AM,
                         ProfileSummary::PSK_Sample);
     PSI->refresh();
   }
+
+  // Error out if the profile checksum mismatch is too high.
+  if (FunctionSamples::ProfileIsProbeBased &&
+      errorIfHighChecksumMismatch(M, PSI, Reader->getProfiles()))
+    return false;
+
   // Compute the total number of samples collected in this profile.
   for (const auto &I : Reader->getProfiles())
     TotalCollectedSamples += I.second.getTotalSamples();
diff --git a/llvm/test/Transforms/SampleProfile/pseudo-probe-profile-mismatch-error.ll b/llvm/test/Transforms/SampleProfile/pseudo-probe-profile-mismatch-error.ll
new file mode 100644
index 00000000000000..84a2a0ff2eb6ad
--- /dev/null
+++ b/llvm/test/Transforms/SampleProfile/pseudo-probe-profile-mismatch-error.ll
@@ -0,0 +1,5 @@
+; REQUIRES: x86_64-linux
+; RUN: opt < %S/pseudo-probe-profile-mismatch.ll -passes=sample-profile -sample-profile-file=%S/Inputs/pseudo-probe-profile-mismatch.prof -checksum-mismatch-func-hot-block-skip=0 -checksum-mismatch-num-func-skip=1 -checksum-mismatch-error-threshold=1 -S 2>%t -o %t.ll
+; RUN: FileCheck %s --input-file %t
+
+; CHECK: error: [[*]]: The FDO profile is too old and will cause big performance regression, please drop the profile and collect a new one.

Review thread on llvm/lib/Transforms/IPO/SampleProfile.cpp:

    // impacted, so here we use a hotness-based threshold to control the
    // selection.
    if (FS.getTotalSamples() <
        ChecksumMismatchFuncHotBlockSkip * PSI->getOrCompHotCountThreshold())
Member:

  1. The multiplier doesn't really have anything to do with "blocks" of the function, so the name can be confusing?
  2. What do you think of using isFunctionHotInCallGraphNthPercentile or isHotCountNthPercentile with a percentile cutoff? We can tune the percentile to get what we need.

Contributor Author (wlei-llvm):

> 1. The multiplier doesn't really have anything to do with "blocks" of the function, so the name can be confusing?
> 2. What do you think of using isFunctionHotInCallGraphNthPercentile or isHotCountNthPercentile with a percentile cutoff? We can tune the percentile to get what we need.

Let me try to clarify my initial thought: IIUC, all the "HotCount" machinery in PSI is a block-level (or body-sample-level) concept. The isFunctionHotXXX functions check the entry count of the function, which is also block-level (the first block). What I use here is getTotalSamples(), which I intend to include all the callee/inlinee samples (the entry count could be cold while the function still does a lot of hot inlining). Since total_samples is roughly the sum of all the block counts of the function, I used a multiplier meant to express how many blocks, on average, are hot in this function.

So if we pass a sum of block counts (getTotalSamples) to isFunctionHotInCallGraphNthPercentile or isHotCountNthPercentile, it might still be confusing?

I feel we don't actually have a function-level notion for this hotness check, so I was also thinking of adding a new helper to PSI, something like the following (shown here as a free helper taking PSI; not sure if it's generally useful):

int getNumHotBlocks(const FunctionSamples &FS, ProfileSummaryInfo &PSI) {
  int Num = 0;
  // Count the body samples (roughly, blocks) whose count clears PSI's
  // hot-count threshold.
  for (const auto &I : FS.getBodySamples())
    if (PSI.isHotCount(I.second.getSamples()))
      Num++;
  return Num;
}

Any thoughts? Thanks!
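
For illustration only, such a helper could then replace the multiplier-based condition inside the selection loop of the diff above (hypothetical usage, not what the PR ends up doing):

    // Hypothetical: require at least N hot blocks instead of N times the
    // hot-count threshold worth of total samples.
    if (getNumHotBlocks(FS, *PSI) < ChecksumMismatchFuncHotBlockSkip)
      continue;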

Member:

> The isFunctionHotXXX functions check the entry count of the function, which is also block-level (the first block). What I use here is getTotalSamples(), which I intend to include all the callee/inlinee samples (the entry count could be cold while the function still does a lot of hot inlining).

So there are different ways to decide whether a function is hot. Here we want to select a few hot functions and see how many of them have a checksum mismatch. Checking total samples is one way, and it differs from the existing isFunctionHotXXX. But do we really need to add another way of deciding whether a function is hot? I don't know the answer, but maybe using isFunctionHotInCallGraphNthPercentile to decide whether a function is hot is OK for our purpose too?

I agree that passing total samples into isHotCountNthPercentile may not be a good idea, since the latter expects block counts.

Contributor Author (wlei-llvm):

I see, sounds good to use total_samples to decide if the function is hot.

I took a look at isFunctionHotInCallGraphNthPercentile; it looks like its parameter is a Function (isFunctionHotOrColdInCallGraphNthPercentile(int PercentileCutoff, const FuncT *F, BFIT &FI), https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Analysis/ProfileSummaryInfo.h#L285), and it looks for the annotations on the IR to check hotness. However, our check here runs before the sample annotation, so we can't use that function directly.
Actually, I found that isFunctionHotOrColdInCallGraphNthPercentile also uses isHotCountNthPercentile on TotalCallCount, which doesn't seem to be a block-level count either. We probably don't have an existing function for total_samples right now, so to keep it simple, maybe we can just use isHotCountNthPercentile on total_samples? (Or add an overload like isFunctionHotInCallGraphNthPercentile(..., total_samples) that accepts total_samples, but I think that would really just be a wrapper around isHotCountNthPercentile.)
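
A minimal sketch of that proposal (assuming ProfileSummaryInfo's existing isHotCountNthPercentile(PercentileCutoff, Count) overload; the HotFuncCutoff value is illustrative, and such cutoffs are expressed in parts per million, e.g. 990000 for the 99th percentile):

    #include "llvm/Analysis/ProfileSummaryInfo.h"
    #include "llvm/ProfileData/SampleProf.h"

    using namespace llvm;
    using namespace llvm::sampleprof;

    // Treat a function as hot for the staleness check if its total samples
    // clear the Nth-percentile count threshold.
    static bool isHotForStalenessCheck(const FunctionSamples &FS,
                                       ProfileSummaryInfo *PSI,
                                       int HotFuncCutoff) {
      return PSI->isHotCountNthPercentile(HotFuncCutoff, FS.getTotalSamples());
    }

This appears to match the direction suggested by the hot-func-cutoff-for-staleness-error flag mentioned in the PR description.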

Contributor Author (wlei-llvm) commented on Mar 19, 2024:

Digging more, I found that callsiteIsHot (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/SampleProfileLoaderBaseUtil.cpp#L64) also uses total_samples to decide hotness.

Maybe a different topic, but I'm surprised it uses total_samples to make inlining decisions.

Member:

Sounds good to just use isHotCountNthPercentile against total samples for now.

> Maybe a different topic, but I'm surprised it uses total_samples to make inlining decisions.

Yes, this is not ideal. We intentionally switched away from that for CSSPGO. So if you check, the path that uses callsiteIsHot should be AutoFDO-only.

Contributor Author (wlei-llvm):

> Sounds good to just use isHotCountNthPercentile against total samples for now.
>
> > Maybe a different topic, but I'm surprised it uses total_samples to make inlining decisions.
>
> Yes, this is not ideal. We intentionally switched away from that for CSSPGO. So if you check, the path that uses callsiteIsHot should be AutoFDO-only.

Good to know. I checked that the usages of callsiteIsHot are indeed on the AutoFDO path.

github-actions bot commented on Mar 15, 2024:

✅ With the latest revision this PR passed the C/C++ code formatter.

WenleiHe (Member):

The current version looks good. For testing, have you tried to build with some workloads to make sure we don't have false positives?


wlei-llvm (Contributor Author):

> The current version looks good. For testing, have you tried to build with some workloads to make sure we don't have false positives?

Yeah, I'm going to do another round of tuning for the "HotFuncCutoffForStalenessError" flag value.

WenleiHe (Member) left a comment:

LGTM with nits, thanks.

wlei-llvm (Contributor Author):

> The current version looks good. For testing, have you tried to build with some workloads to make sure we don't have false positives?

Update on the flag tuning: the signal looks quite distinguishable. On one ads service, with a super-bad profile (applying both #79919 and #75092), surprisingly even a 300000 (30%) hot cutoff still triggers the error, while 20% doesn't... On the other side, with a regular profile (1-2 weeks old), even a 99.9% cutoff doesn't trigger any false positives... so there is a pretty large margin for this error check. For now I set it to 80%, leaning towards a lower bar to avoid false negatives.

@wlei-llvm changed the title from "[CSSPGO] Error out if the checksum mismatch is high" to "[CSSPGO] Add support to reject high checksum mismatched profile" on Mar 26, 2024
@wlei-llvm changed the title from "[CSSPGO] Add support to reject high checksum mismatched profile" to "[CSSPGO] Reject high checksum mismatched profile" on Mar 27, 2024
@wlei-llvm merged commit 2598aa6 into llvm:main on Mar 27, 2024
4 checks passed
@wlei-llvm deleted the error-hash-mismatch branch on Mar 28, 2024