[CSSPGO] Reject high checksum mismatched profile #84097
Conversation
@llvm/pr-subscribers-llvm-transforms

Author: Lei Wang (wlei-llvm)

Changes: Error out the build if the checksum mismatch is extremely high; it's better to drop the profile rather than apply the bad profile.

Full diff: https://github.com/llvm/llvm-project/pull/84097.diff

2 Files Affected:
diff --git a/llvm/lib/Transforms/IPO/SampleProfile.cpp b/llvm/lib/Transforms/IPO/SampleProfile.cpp
index ffc26f1a0a972d..9ec224bb32a551 100644
--- a/llvm/lib/Transforms/IPO/SampleProfile.cpp
+++ b/llvm/lib/Transforms/IPO/SampleProfile.cpp
@@ -234,6 +234,24 @@ static cl::opt<unsigned> ProfileICPRelativeHotnessSkip(
cl::desc(
"Skip relative hotness check for ICP up to given number of targets."));
+static cl::opt<unsigned> ChecksumMismatchFuncHotBlockSkip(
+ "checksum-mismatch-func-hot-block-skip", cl::Hidden, cl::init(100),
+ cl::desc("For checksum-mismatch error check, skip checking the function "
+ "whose num of hot(on average) blocks is smaller than the "
+ "given number."));
+
+static cl::opt<unsigned> ChecksumMismatchNumFuncSkip(
+ "checksum-mismatch-num-func-skip", cl::Hidden, cl::init(50),
+ cl::desc("For checksum-mismatch error check, skip the check if the total "
+ "number of selected function is smaller than the given number."));
+
+static cl::opt<unsigned> ChecksumMismatchErrorThreshold(
+ "checksum-mismatch-error-threshold", cl::Hidden, cl::init(80),
+ cl::desc(
+ "For checksum-mismatch error check, error out if the percentage of "
+ "function mismatched-checksum is higher than the given percentage "
+ "threshold"));
+
static cl::opt<bool> CallsitePrioritizedInline(
"sample-profile-prioritized-inline", cl::Hidden,
@@ -630,6 +648,8 @@ class SampleProfileLoader final : public SampleProfileLoaderBaseImpl<Function> {
std::vector<Function *> buildFunctionOrder(Module &M, LazyCallGraph &CG);
std::unique_ptr<ProfiledCallGraph> buildProfiledCallGraph(Module &M);
void generateMDProfMetadata(Function &F);
+ bool errorIfHighChecksumMismatch(Module &M, ProfileSummaryInfo *PSI,
+ const SampleProfileMap &Profiles);
/// Map from function name to Function *. Used to find the function from
/// the function name. If the function name contains suffix, additional
@@ -2184,6 +2204,61 @@ bool SampleProfileLoader::doInitialization(Module &M,
return true;
}
+// Note that this is a module-level check: if one module errors out, the
+// entire build errors out. However, the user could make big changes to the
+// functions in a single module while those changes are not performance
+// significant to the whole binary, so we use a conservative approach to make
+// sure we only error out if the mismatch globally impacts the binary's
+// performance. To achieve this, we use heuristics to select a reasonably
+// large set of functions that are expected to be globally performance
+// significant, and we compute and check the mismatch only within those
+// functions. The function selection is based on two criteria: 1) the function
+// is "hot" enough, which is tuned by a hotness-based flag
+// (ChecksumMismatchFuncHotBlockSkip); 2) the number of selected functions is
+// large enough, which is tuned by the ChecksumMismatchNumFuncSkip flag.
+bool SampleProfileLoader::errorIfHighChecksumMismatch(
+ Module &M, ProfileSummaryInfo *PSI, const SampleProfileMap &Profiles) {
+ assert(FunctionSamples::ProfileIsProbeBased &&
+ "Only support for probe-based profile");
+ uint64_t TotalSelectedFunc = 0;
+ uint64_t NumMismatchedFunc = 0;
+ for (const auto &I : Profiles) {
+ const auto &FS = I.second;
+ const auto *FuncDesc = ProbeManager->getDesc(FS.getGUID());
+ if (!FuncDesc)
+ continue;
+
+ // We want to select a set of functions that are globally performance
+ // significant, in other words, if those functions profiles are
+ // checksum-mismatched and dropped, the whole binary will likely be
+ // impacted, so here we use a hotness-based threshold to control the
+ // selection.
+ if (FS.getTotalSamples() <
+ ChecksumMismatchFuncHotBlockSkip * PSI->getOrCompHotCountThreshold())
+ continue;
+
+ TotalSelectedFunc++;
+ if (ProbeManager->profileIsHashMismatched(*FuncDesc, FS))
+ NumMismatchedFunc++;
+ }
+  // Make sure the number of selected functions is not too small; otherwise the
+  // check cannot distinguish a stale profile from the user's benign changes.
+ if (TotalSelectedFunc < ChecksumMismatchNumFuncSkip)
+ return false;
+
+ // Finally check the mismatch percentage against the threshold.
+ if (NumMismatchedFunc * 100 >=
+ TotalSelectedFunc * ChecksumMismatchErrorThreshold) {
+ auto &Ctx = M.getContext();
+ const char *Msg =
+ "The FDO profile is too old and will cause big performance regression, "
+ "please drop the profile and collect a new one.";
+ Ctx.diagnose(DiagnosticInfoSampleProfile(M.getModuleIdentifier(), Msg));
+ return true;
+ }
+ return false;
+}
+
void SampleProfileMatcher::findIRAnchors(
const Function &F, std::map<LineLocation, StringRef> &IRAnchors) {
// For inlined code, recover the original callsite and callee by finding the
@@ -2715,6 +2790,12 @@ bool SampleProfileLoader::runOnModule(Module &M, ModuleAnalysisManager *AM,
ProfileSummary::PSK_Sample);
PSI->refresh();
}
+
+ // Error out if the profile checksum mismatch is too high.
+ if (FunctionSamples::ProfileIsProbeBased &&
+ errorIfHighChecksumMismatch(M, PSI, Reader->getProfiles()))
+ return false;
+
// Compute the total number of samples collected in this profile.
for (const auto &I : Reader->getProfiles())
TotalCollectedSamples += I.second.getTotalSamples();
diff --git a/llvm/test/Transforms/SampleProfile/pseudo-probe-profile-mismatch-error.ll b/llvm/test/Transforms/SampleProfile/pseudo-probe-profile-mismatch-error.ll
new file mode 100644
index 00000000000000..84a2a0ff2eb6ad
--- /dev/null
+++ b/llvm/test/Transforms/SampleProfile/pseudo-probe-profile-mismatch-error.ll
@@ -0,0 +1,5 @@
+; REQUIRES: x86_64-linux
+; RUN: opt < %S/pseudo-probe-profile-mismatch.ll -passes=sample-profile -sample-profile-file=%S/Inputs/pseudo-probe-profile-mismatch.prof -checksum-mismatch-func-hot-block-skip=0 -checksum-mismatch-num-func-skip=1 -checksum-mismatch-error-threshold=1 -S 2>%t -o %t.ll
+; RUN: FileCheck %s --input-file %t
+
+; CHECK: error: {{.*}}: The FDO profile is too old and will cause big performance regression, please drop the profile and collect a new one.
// impacted, so here we use a hotness-based threshold to control the
// selection.
if (FS.getTotalSamples() <
    ChecksumMismatchFuncHotBlockSkip * PSI->getOrCompHotCountThreshold())
- The multiplier doesn't really have anything to do with the "blocks" of the function, so the name can be confusing?
- What do you think of using `isFunctionHotInCallGraphNthPercentile` or `isHotCountNthPercentile` with a percentile cutoff? We can tune the percentile to get what we need.
Let me try to clarify my initial thought: IIUC, all the "HotCount" stuff in PSI is a "block-level" (or body-sample-level) concept. The `isFunctionHotXXX` functions check the entry count of the function, which is also "block-level" (the first block). However, what I used here is `getTotalSamples()`, which I intend to include all the callee/inlinee samples (the entry count could be cold while the function does a lot of hot inlining). Since total_samples is roughly the sum of all the block counts of this function, I used a multiplier meant to express how many blocks, on average, are hot in this function.
So if we pass a sum of blocks (`getTotalSamples`) to `isFunctionHotInCallGraphNthPercentile` or `isHotCountNthPercentile`, it might still be confusing?
I feel we don't actually have a function-level concept for this hotness check, so I was also thinking of adding a new function to PSI, like (not sure if it's generally useful):

int getNumHotBlocks(const FunctionSamples &FS) {
  int Num = 0;
  for (const auto &Sample : FS.getBodySamples())
    if (PSI->isHotCount(Sample.second.getSamples()))
      Num++;
  return Num;
}

Any thoughts? Thanks!
> the isFunctionHotXXX functions are used to check the entry_count of the function, which is also "block-level" (the first block); however, what I used here is getTotalSamples(), which I intend to include all the callee/inlinee samples (the entry count could be cold while the function does a lot of hot inlining)

So there are different ways to decide whether a function is hot. Here we want to select a few hot functions and see how many of them have a checksum mismatch. Checking total samples is one way, and it differs from the existing `isFunctionHotXXX` helpers. But do we really need to add another way of deciding whether a function is hot? I don't know the answer, but maybe using `isFunctionHotInCallGraphNthPercentile` to decide whether a function is hot for our purpose is OK too?
I agree that passing total samples into `isHotCountNthPercentile` may not be a good idea, since the latter expects block counts.
I see, sounds good to use total_samples to decide if the function is hot.
I took a look at `isFunctionHotInCallGraphNthPercentile`: its parameter is a `Function` (`isFunctionHotOrColdInCallGraphNthPercentile(int PercentileCutoff, const FuncT *F, BFIT &FI)`, see https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Analysis/ProfileSummaryInfo.h#L285), and it looks for annotations on the IR to check hotness. However, our check runs before the sample annotation, so we can't use this function directly.
Actually, I found that `isFunctionHotOrColdInCallGraphNthPercentile` also uses `isHotCountNthPercentile` for `TotalCallCount`, which doesn't seem to be a block-level count either. We probably don't have an existing function for total_samples right now, so to keep it simple, maybe we can just use `isHotCountNthPercentile` on total_samples? (Or add an overload like `isFunctionHotInCallGraphNthPercentile(..., total_samples)` to accept total_samples, but I think that's really just a wrapper around `isHotCountNthPercentile`.)
Digging more, I found that `callsiteIsHot` (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/SampleProfileLoaderBaseUtil.cpp#L64) also uses total_samples to decide whether a function is hot.
Maybe a different topic, but I'm surprised it uses `total_samples` to make inlining decisions..
Sounds good to just use `isHotCountNthPercentile` against total samples for now.

> Maybe a different topic, but I'm surprised why it uses total_samples to make inlining decision..

Yes, this is not ideal. We intentionally switched away from that for CSSPGO. So if you check, the path that uses `callsiteIsHot` should be AutoFDO-only.
Good to know; I checked that the usages of `callsiteIsHot` are indeed on the AutoFDO path.
✅ With the latest revision this PR passed the C/C++ code formatter.
The current version looks good. For testing, have you tried building with some workloads to make sure we don't have false positives?
Yeah, I'm going to do another round of tuning for this "HotFuncCutoffForStalenessError" flag value.
LGTM with nits, thanks.
Update on the flag tuning: it looks quite distinguishable. On one ads service, when using a super bad profile (applying both #79919 + #75092), surprisingly it can still throw the error even with 300000 (30%) as the hot cutoff; 20% doesn't work. On the other side, when using a regular profile (1~2 weeks old), even 99.9% doesn't trigger false positives. So there is pretty large room for those upgrade errors. For now I set it to 80%, leaning towards a low bar to avoid false negatives.
Error out the build if the checksum mismatch is extremely high; it's better to drop the profile rather than apply the bad profile.
Note that the check is at module level: the user could make big changes to functions in one single module, but those changes might not be performance significant to the whole binary, so we want to be conservative and only expect to catch big perf regressions. To do this, we select a set of "hot" functions for the check. We use two parameters (`hot-func-cutoff-for-staleness-error` and `min-functions-for-staleness-error`) to control the function selection, making sure the selected functions are hot enough and the number of functions is not too small.
We tuned the parameters on our internal services; the check works to catch big perf regressions due to high mismatch.