-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CSSPGO] Compute and report profile matching recovered callsites and samples #79090
Conversation
@llvm/pr-subscribers-llvm-transforms Author: Lei Wang (wlei-llvm) ChangesCompute and report the staleness metrics after stale profile matching so that we can know how effective the matching is, i. e. how many callsites and samples are recovered by the matching.
Patch is 30.63 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/79090.diff 5 Files Affected:
diff --git a/llvm/lib/Transforms/IPO/SampleProfile.cpp b/llvm/lib/Transforms/IPO/SampleProfile.cpp
index 2fd8668d15e200f..a7170faa65dc07c 100644
--- a/llvm/lib/Transforms/IPO/SampleProfile.cpp
+++ b/llvm/lib/Transforms/IPO/SampleProfile.cpp
@@ -433,12 +433,19 @@ using CandidateQueue =
PriorityQueue<InlineCandidate, std::vector<InlineCandidate>,
CandidateComparer>;
+using IRAnchorMap = std::map<LineLocation, StringRef>;
+using ProfileAnchorMap = std::map<LineLocation, std::unordered_set<FunctionId>>;
+
// Sample profile matching - fuzzy match.
class SampleProfileMatcher {
Module &M;
SampleProfileReader &Reader;
const PseudoProbeManager *ProbeManager;
SampleProfileMap FlattenedProfiles;
+
+ std::unordered_map<const Function *, IRAnchorMap> FuncIRAnchors;
+ std::unordered_map<const Function *, ProfileAnchorMap> FuncProfileAnchors;
+
// For each function, the matcher generates a map, of which each entry is a
// mapping from the source location of current build to the source location in
// the profile.
@@ -448,6 +455,8 @@ class SampleProfileMatcher {
uint64_t TotalProfiledCallsites = 0;
uint64_t NumMismatchedCallsites = 0;
uint64_t MismatchedCallsiteSamples = 0;
+ uint64_t PostMatchNumMismatchedCallsites = 0;
+ uint64_t PostMatchMismatchedCallsiteSamples = 0;
uint64_t TotalCallsiteSamples = 0;
uint64_t TotalProfiledFunc = 0;
uint64_t NumMismatchedFuncHash = 0;
@@ -474,24 +483,22 @@ class SampleProfileMatcher {
return nullptr;
}
void runOnFunction(const Function &F);
- void findIRAnchors(const Function &F,
- std::map<LineLocation, StringRef> &IRAnchors);
- void findProfileAnchors(
+ void findFuncAnchors();
+ void UpdateIRAnchors();
+ void findIRAnchors(const Function &F, IRAnchorMap &IRAnchors);
+ void findProfileAnchors(const FunctionSamples &FS,
+ ProfileAnchorMap &ProfileAnchors);
+ void countMismatchedHashSamples(const FunctionSamples &FS);
+ void countProfileMismatches(bool IsPreMatch);
+ void countMismatchedHashes(const Function &F, const FunctionSamples &FS);
+ void countMismatchedCallsites(
+ const Function &F,
+ StringMap<std::set<LineLocation>> &FuncToMismatchCallsites,
+ uint64_t &FuncProfiledCallsites, uint64_t &FuncMismatchedCallsites) const;
+ void countMismatchedCallsiteSamples(
const FunctionSamples &FS,
- std::map<LineLocation, std::unordered_set<FunctionId>>
- &ProfileAnchors);
- void countMismatchedSamples(const FunctionSamples &FS);
- void countProfileMismatches(
- const Function &F, const FunctionSamples &FS,
- const std::map<LineLocation, StringRef> &IRAnchors,
- const std::map<LineLocation, std::unordered_set<FunctionId>>
- &ProfileAnchors);
- void countProfileCallsiteMismatches(
- const FunctionSamples &FS,
- const std::map<LineLocation, StringRef> &IRAnchors,
- const std::map<LineLocation, std::unordered_set<FunctionId>>
- &ProfileAnchors,
- uint64_t &FuncMismatchedCallsites, uint64_t &FuncProfiledCallsites);
+ StringMap<std::set<LineLocation>> &FuncToMismatchCallsites,
+ uint64_t &FuncMismatchedCallsiteSamples) const;
LocToLocMap &getIRToProfileLocationMap(const Function &F) {
auto Ret = FuncMappings.try_emplace(
FunctionSamples::getCanonicalFnName(F.getName()), LocToLocMap());
@@ -499,11 +506,10 @@ class SampleProfileMatcher {
}
void distributeIRToProfileLocationMap();
void distributeIRToProfileLocationMap(FunctionSamples &FS);
- void runStaleProfileMatching(
- const Function &F, const std::map<LineLocation, StringRef> &IRAnchors,
- const std::map<LineLocation, std::unordered_set<FunctionId>>
- &ProfileAnchors,
- LocToLocMap &IRToProfileLocationMap);
+ void runStaleProfileMatching();
+ void runStaleProfileMatching(const Function &F, const IRAnchorMap &IRAnchors,
+ const ProfileAnchorMap &ProfileAnchors,
+ LocToLocMap &IRToProfileLocationMap);
};
/// Sample profile pass.
@@ -1129,7 +1135,7 @@ void SampleProfileLoader::findExternalInlineCandidate(
CalleeSample->getContext().hasAttribute(ContextShouldBeInlined);
if (!PreInline && CalleeSample->getHeadSamplesEstimate() < Threshold)
continue;
-
+
Function *Func = SymbolMap.lookup(CalleeSample->getFunction());
// Add to the import list only when it's defined out of module.
if (!Func || Func->isDeclaration())
@@ -2123,8 +2129,8 @@ bool SampleProfileLoader::doInitialization(Module &M,
return true;
}
-void SampleProfileMatcher::findIRAnchors(
- const Function &F, std::map<LineLocation, StringRef> &IRAnchors) {
+void SampleProfileMatcher::findIRAnchors(const Function &F,
+ IRAnchorMap &IRAnchors) {
// For inlined code, recover the original callsite and callee by finding the
// top-level inline frame. e.g. For frame stack "main:1 @ foo:2 @ bar:3", the
// top-level frame is "main:1", the callsite is "1" and the callee is "foo".
@@ -2190,7 +2196,8 @@ void SampleProfileMatcher::findIRAnchors(
}
}
-void SampleProfileMatcher::countMismatchedSamples(const FunctionSamples &FS) {
+void SampleProfileMatcher::countMismatchedHashSamples(
+ const FunctionSamples &FS) {
const auto *FuncDesc = ProbeManager->getDesc(FS.getGUID());
// Skip the function that is external or renamed.
if (!FuncDesc)
@@ -2202,96 +2209,11 @@ void SampleProfileMatcher::countMismatchedSamples(const FunctionSamples &FS) {
}
for (const auto &I : FS.getCallsiteSamples())
for (const auto &CS : I.second)
- countMismatchedSamples(CS.second);
-}
-
-void SampleProfileMatcher::countProfileMismatches(
- const Function &F, const FunctionSamples &FS,
- const std::map<LineLocation, StringRef> &IRAnchors,
- const std::map<LineLocation, std::unordered_set<FunctionId>>
- &ProfileAnchors) {
- [[maybe_unused]] bool IsFuncHashMismatch = false;
- if (FunctionSamples::ProfileIsProbeBased) {
- TotalFuncHashSamples += FS.getTotalSamples();
- TotalProfiledFunc++;
- const auto *FuncDesc = ProbeManager->getDesc(F);
- if (FuncDesc) {
- if (ProbeManager->profileIsHashMismatched(*FuncDesc, FS)) {
- NumMismatchedFuncHash++;
- IsFuncHashMismatch = true;
- }
- countMismatchedSamples(FS);
- }
- }
-
- uint64_t FuncMismatchedCallsites = 0;
- uint64_t FuncProfiledCallsites = 0;
- countProfileCallsiteMismatches(FS, IRAnchors, ProfileAnchors,
- FuncMismatchedCallsites,
- FuncProfiledCallsites);
- TotalProfiledCallsites += FuncProfiledCallsites;
- NumMismatchedCallsites += FuncMismatchedCallsites;
- LLVM_DEBUG({
- if (FunctionSamples::ProfileIsProbeBased && !IsFuncHashMismatch &&
- FuncMismatchedCallsites)
- dbgs() << "Function checksum is matched but there are "
- << FuncMismatchedCallsites << "/" << FuncProfiledCallsites
- << " mismatched callsites.\n";
- });
+ countMismatchedHashSamples(CS.second);
}
-void SampleProfileMatcher::countProfileCallsiteMismatches(
- const FunctionSamples &FS,
- const std::map<LineLocation, StringRef> &IRAnchors,
- const std::map<LineLocation, std::unordered_set<FunctionId>>
- &ProfileAnchors,
- uint64_t &FuncMismatchedCallsites, uint64_t &FuncProfiledCallsites) {
-
- // Check if there are any callsites in the profile that does not match to any
- // IR callsites, those callsite samples will be discarded.
- for (const auto &I : ProfileAnchors) {
- const auto &Loc = I.first;
- const auto &Callees = I.second;
- assert(!Callees.empty() && "Callees should not be empty");
-
- StringRef IRCalleeName;
- const auto &IR = IRAnchors.find(Loc);
- if (IR != IRAnchors.end())
- IRCalleeName = IR->second;
-
- // Compute number of samples in the original profile.
- uint64_t CallsiteSamples = 0;
- if (auto CTM = FS.findCallTargetMapAt(Loc)) {
- for (const auto &I : *CTM)
- CallsiteSamples += I.second;
- }
- const auto *FSMap = FS.findFunctionSamplesMapAt(Loc);
- if (FSMap) {
- for (const auto &I : *FSMap)
- CallsiteSamples += I.second.getTotalSamples();
- }
-
- bool CallsiteIsMatched = false;
- // Since indirect call does not have CalleeName, check conservatively if
- // callsite in the profile is a callsite location. This is to reduce num of
- // false positive since otherwise all the indirect call samples will be
- // reported as mismatching.
- if (IRCalleeName == UnknownIndirectCallee)
- CallsiteIsMatched = true;
- else if (Callees.size() == 1 && Callees.count(getRepInFormat(IRCalleeName)))
- CallsiteIsMatched = true;
-
- FuncProfiledCallsites++;
- TotalCallsiteSamples += CallsiteSamples;
- if (!CallsiteIsMatched) {
- FuncMismatchedCallsites++;
- MismatchedCallsiteSamples += CallsiteSamples;
- }
- }
-}
-
-void SampleProfileMatcher::findProfileAnchors(const FunctionSamples &FS,
- std::map<LineLocation, std::unordered_set<FunctionId>> &ProfileAnchors) {
+void SampleProfileMatcher::findProfileAnchors(
+ const FunctionSamples &FS, ProfileAnchorMap &ProfileAnchors) {
auto isInvalidLineOffset = [](uint32_t LineOffset) {
return LineOffset & 0x8000;
};
@@ -2338,10 +2260,8 @@ void SampleProfileMatcher::findProfileAnchors(const FunctionSamples &FS,
// [1, 2, 3(foo), 4, 7, 8(bar), 9]
// The output mapping: [2->3, 3->4, 5->7, 6->8, 7->9].
void SampleProfileMatcher::runStaleProfileMatching(
- const Function &F,
- const std::map<LineLocation, StringRef> &IRAnchors,
- const std::map<LineLocation, std::unordered_set<FunctionId>>
- &ProfileAnchors,
+ const Function &F, const IRAnchorMap &IRAnchors,
+ const ProfileAnchorMap &ProfileAnchors,
LocToLocMap &IRToProfileLocationMap) {
LLVM_DEBUG(dbgs() << "Run stale profile matching for " << F.getName()
<< "\n");
@@ -2422,59 +2342,226 @@ void SampleProfileMatcher::runStaleProfileMatching(
}
}
-void SampleProfileMatcher::runOnFunction(const Function &F) {
- // We need to use flattened function samples for matching.
- // Unlike IR, which includes all callsites from the source code, the callsites
- // in profile only show up when they are hit by samples, i,e. the profile
- // callsites in one context may differ from those in another context. To get
- // the maximum number of callsites, we merge the function profiles from all
- // contexts, aka, the flattened profile to find profile anchors.
- const auto *FSFlattened = getFlattenedSamplesFor(F);
- if (!FSFlattened)
- return;
+void SampleProfileMatcher::runStaleProfileMatching() {
+ for (const auto &F : M) {
+ if (F.isDeclaration() || !F.hasFnAttribute("use-sample-profile"))
+ continue;
+ const auto *FSFlattened = getFlattenedSamplesFor(F);
+ if (!FSFlattened)
+ continue;
+ auto IR = FuncIRAnchors.find(&F);
+ auto P = FuncProfileAnchors.find(&F);
+ if (IR == FuncIRAnchors.end() || P == FuncProfileAnchors.end())
+ continue;
- // Anchors for IR. It's a map from IR location to callee name, callee name is
- // empty for non-call instruction and use a dummy name(UnknownIndirectCallee)
- // for unknown indrect callee name.
- std::map<LineLocation, StringRef> IRAnchors;
- findIRAnchors(F, IRAnchors);
- // Anchors for profile. It's a map from callsite location to a set of callee
- // name.
- std::map<LineLocation, std::unordered_set<FunctionId>> ProfileAnchors;
- findProfileAnchors(*FSFlattened, ProfileAnchors);
-
- // Detect profile mismatch for profile staleness metrics report.
- // Skip reporting the metrics for imported functions.
- if (!GlobalValue::isAvailableExternallyLinkage(F.getLinkage()) &&
- (ReportProfileStaleness || PersistProfileStaleness)) {
- // Use top-level nested FS for counting profile mismatch metrics since
- // currently once a callsite is mismatched, all its children profiles are
- // dropped.
- if (const auto *FS = Reader.getSamplesFor(F))
- countProfileMismatches(F, *FS, IRAnchors, ProfileAnchors);
+ // Run profile matching for checksum mismatched profile, currently only
+ // support for pseudo-probe.
+ if (FunctionSamples::ProfileIsProbeBased &&
+ !ProbeManager->profileIsValid(F, *FSFlattened)) {
+ runStaleProfileMatching(F, IR->second, P->second,
+ getIRToProfileLocationMap(F));
+ }
}
- // Run profile matching for checksum mismatched profile, currently only
- // support for pseudo-probe.
- if (SalvageStaleProfile && FunctionSamples::ProfileIsProbeBased &&
- !ProbeManager->profileIsValid(F, *FSFlattened)) {
- // The matching result will be saved to IRToProfileLocationMap, create a new
- // map for each function.
- runStaleProfileMatching(F, IRAnchors, ProfileAnchors,
- getIRToProfileLocationMap(F));
- }
+ distributeIRToProfileLocationMap();
}
-void SampleProfileMatcher::runOnModule() {
+void SampleProfileMatcher::findFuncAnchors() {
ProfileConverter::flattenProfile(Reader.getProfiles(), FlattenedProfiles,
FunctionSamples::ProfileIsCS);
- for (auto &F : M) {
+ for (const auto &F : M) {
if (F.isDeclaration() || !F.hasFnAttribute("use-sample-profile"))
continue;
- runOnFunction(F);
+ // We need to use flattened function samples for matching.
+ // Unlike IR, which includes all callsites from the source code, the
+ // callsites in profile only show up when they are hit by samples, i,e. the
+ // profile callsites in one context may differ from those in another
+ // context. To get the maximum number of callsites, we merge the function
+ // profiles from all contexts, aka, the flattened profile to find profile
+ // anchors.
+ const auto *FSFlattened = getFlattenedSamplesFor(F);
+ if (!FSFlattened)
+ continue;
+
+ // Anchors for IR. It's a map from IR location to callee name, callee name
+ // is empty for non-call instruction and use a dummy
+ // name(UnknownIndirectCallee) for unknown indrect callee name.
+ auto IR = FuncIRAnchors.emplace(&F, IRAnchorMap());
+ findIRAnchors(F, IR.first->second);
+
+ // Anchors for profile. It's a map from callsite location to a set of callee
+ // name.
+ auto P = FuncProfileAnchors.emplace(&F, ProfileAnchorMap());
+ findProfileAnchors(*FSFlattened, P.first->second);
+ }
+}
+
+void SampleProfileMatcher::countMismatchedCallsiteSamples(
+ const FunctionSamples &FS,
+ StringMap<std::set<LineLocation>> &FuncToMismatchCallsites,
+ uint64_t &FuncMismatchedCallsiteSamples) const {
+ auto It = FuncToMismatchCallsites.find(FS.getFuncName());
+ // Skip it if no mismatched callsite or this is an external function.
+ if (It == FuncToMismatchCallsites.end() || It->second.empty())
+ return;
+ const auto &MismatchCallsites = It->second;
+ for (const auto &I : FS.getBodySamples()) {
+ if (MismatchCallsites.count(I.first))
+ FuncMismatchedCallsiteSamples += I.second.getSamples();
+ }
+
+ for (const auto &I : FS.getCallsiteSamples()) {
+ const auto &Loc = I.first;
+ if (MismatchCallsites.count(Loc)) {
+ for (const auto &CS : I.second)
+ FuncMismatchedCallsiteSamples += CS.second.getTotalSamples();
+ continue;
+ }
+
+ // count mismatched samples for inlined samples.
+ for (const auto &CS : I.second)
+ countMismatchedCallsiteSamples(CS.second, FuncToMismatchCallsites,
+ FuncMismatchedCallsiteSamples);
+ }
+}
+
+void SampleProfileMatcher::countMismatchedCallsites(
+ const Function &F,
+ StringMap<std::set<LineLocation>> &FuncToMismatchCallsites,
+ uint64_t &FuncProfiledCallsites, uint64_t &FuncMismatchedCallsites) const {
+ auto IR = FuncIRAnchors.find(&F);
+ auto P = FuncProfileAnchors.find(&F);
+ if (IR == FuncIRAnchors.end() || P == FuncProfileAnchors.end())
+ return;
+ const auto &IRAnchors = IR->second;
+ const auto &ProfileAnchors = P->second;
+
+ auto &MismatchCallsites =
+ FuncToMismatchCallsites[FunctionSamples::getCanonicalFnName(F.getName())];
+
+ // Check if there are any callsites in the profile that does not match to any
+ // IR callsites, those callsite samples will be discarded.
+ for (const auto &I : ProfileAnchors) {
+ const auto &Loc = I.first;
+ const auto &Callees = I.second;
+ assert(!Callees.empty() && "Callees should not be empty");
+
+ StringRef IRCalleeName;
+ const auto &IR = IRAnchors.find(Loc);
+ if (IR != IRAnchors.end())
+ IRCalleeName = IR->second;
+ bool CallsiteIsMatched = false;
+ // Since indirect call does not have CalleeName, check conservatively if
+ // callsite in the profile is a callsite location. This is to reduce num of
+ // false positive since otherwise all the indirect call samples will be
+ // reported as mismatching.
+ if (IRCalleeName == UnknownIndirectCallee)
+ CallsiteIsMatched = true;
+ else if (Callees.count(FunctionId(IRCalleeName)))
+ CallsiteIsMatched = true;
+
+ FuncProfiledCallsites++;
+ if (!CallsiteIsMatched) {
+ FuncMismatchedCallsites++;
+ MismatchCallsites.insert(Loc);
+ }
+ }
+}
+
+void SampleProfileMatcher::countMismatchedHashes(const Function &F,
+ const FunctionSamples &FS) {
+ if (!FunctionSamples::ProfileIsProbeBased)
+ return;
+ const auto *FuncDesc = ProbeManager->getDesc(F);
+ if (FuncDesc) {
+ if (ProbeManager->profileIsHashMismatched(*FuncDesc, FS)) {
+ NumMismatchedFuncHash++;
+ }
+ countMismatchedHashSamples(FS);
+ }
+}
+
+void SampleProfileMatcher::UpdateIRAnchors() {
+ for (auto &I : FuncIRAnchors) {
+ const auto *F = I.first;
+ auto &IRAnchors = I.second;
+ const auto Mapping =
+ FuncMappings.find(FunctionSamples::getCanonicalFnName(F->getName()));
+ if (Mapping == FuncMappings.end())
+ continue;
+ IRAnchorMap UpdatedIRAnchors;
+ const auto &LocToLocMapping = Mapping->second;
+ for (const auto L : LocToLocMapping) {
+ UpdatedIRAnchors[L.second] = IRAnchors[L.first];
+ IRAnchors.erase(L.first);
+ }
+
+ for (const auto &IR : UpdatedIRAnchors) {
+ IRAnchors[IR.first] = IR.second;
+ }
+ }
+}
+
+void SampleProfileMatcher::countProfileMismatches(bool IsPreMatch) {
+ if (!ReportProfileStaleness && !PersistProfileStaleness)
+ return;
+
+ if (!IsPreMatch) {
+ // Use the profile matching results to update to the IR anchors.
+ UpdateIRAnchors();
+ }
+
+ uint64_t UnusedCounter = 0;
+ uint64_t *TotalProfiledCallsitesPtr =
+ IsPreMatch ? &TotalProfiledCallsites : &UnusedCounter;
+ uint64_t *NumMismatchedCallsitesPtr =
+ IsPreMatch ? &NumMismatchedCallsites : &PostMatchNumMismatchedCallsites;
+ uint64_t *MismatchedCallsiteSamplesPtr =
+ IsPreMatch ? &MismatchedCallsiteSamples
+ : &PostMatchMismatchedCallsiteSamples;
+
+ auto SkipFunctionForReport = [](const Function &F) {
+ if (F.isDeclaration() || !F.hasFnAttribute("use-sample-profile"))
+ return true;
+ // Skip reporting the metrics for imported functions.
+ if (GlobalValue::isAvailableExternallyLinkage(F.getLinkage()))
+ return true;
+ return false;
+ };
+
+ StringMap<std::set<LineLocation>> FuncToMismatchCallsites;
+ for (const auto &F : M) {
+ if (SkipFunctionForReport(F))
+ continue;
+ const auto *FS = Reader.getSamplesFor(F);
+ if (FS && IsPreMatch) {
+ // Only count the total function metrics once in pre-match time.
+ TotalFuncHashSamples += FS->getTotalSamples();
+ TotalProfiledFunc++;
+ countMismatchedHashes(F, *FS);
+ }
+ countMismatchedCallsites(F, FuncToMismatchCallsites,
+ *TotalProfiledCallsitesPtr,
+ *NumMismatchedCallsitesPtr);
+ }
+
+ for (const auto &F : M) {
+ if (SkipFunctionForReport(F))
+ continue;
+ if (const auto *FS = Reader.getSamplesFor(F))
+ countMismatchedCallsiteSamples(*FS, FuncToMismatchCallsites,
+ *MismatchedCallsiteSamplesPtr);
+ }
+}
+
+void SampleProfileMatcher::runOnModule() {
+ ...
[truncated]
|
using IRAnchorMap = std::map<LineLocation, StringRef>; | ||
using ProfileAnchorMap = std::map<LineLocation, std::unordered_set<FunctionId>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As standalone types, add a comment explaining the key/value of the maps? Maybe just can move the original comments for these types here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also move this inside class SampleProfileMatcher
since these types are internal to the matcher.
// Anchors for IR. It's a map from IR location to callee name, callee name | ||
// is empty for non-call instruction and use a dummy | ||
// name(UnknownIndirectCallee) for unknown indrect callee name. | ||
auto IR = FuncIRAnchors.emplace(&F, IRAnchorMap()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit IR
naming can be confusing, this is an iterator, maybe It
? Same for other places.
|
||
if (SalvageStaleProfile) { | ||
runStaleProfileMatching(); | ||
countProfileMismatches(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the boolean param for countProfileMismatches
, move updateIRAnchors
here instead of within countProfileMismatches
to make it slightly more readable and simpler. countProfileMismatches
should just do counting as the name suggests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little tricky, the flag is also used to differentiate the counters to compute, basically we have two sets of counters, like PrematchXXX and PostMatchXXX. So here the flag is used like XXX = IsPreMatch? PrematchXXX: PostMatchXXX
See the code around line 2514:
...
uint64_t *NumMismatchedCallsitesPtr =
IsPreMatch ? &NumMismatchedCallsites : &PostMatchNumMismatchedCallsites;
...
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I noticed that. Was thinking these can be passed in as parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I noticed that. Was thinking these can be passed in as parameters.
Maybe we can put them into a class ProfileMatchStats, then pass PreMatchStats and PostMatchStats instances respectively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's a good idea.
@@ -2190,7 +2196,8 @@ void SampleProfileMatcher::findIRAnchors( | |||
} | |||
} | |||
|
|||
void SampleProfileMatcher::countMismatchedSamples(const FunctionSamples &FS) { | |||
void SampleProfileMatcher::countMismatchedHashSamples( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the renaming? I can find the name HashSamples
confusing... Including the original variable names
(there seems to be quite some renaming going on in the patch that makes reviewing a bit uneasy...)
} | ||
} | ||
|
||
void SampleProfileMatcher::countMismatchedHashes(const Function &F, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also confusing name, we are not counting number of hashes, right?
return; | ||
void SampleProfileMatcher::runStaleProfileMatching() { | ||
for (const auto &F : M) { | ||
if (F.isDeclaration() || !F.hasFnAttribute("use-sample-profile")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This particular check is duplicated now because we turned runOnFunction
into runOnModule
. Maybe make a small helper to centralize this check and keep them always in sync for the different function loops.
} | ||
} | ||
|
||
void SampleProfileMatcher::runOnModule() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain why we need turn runOnFunctions
into runOnModule
? I understand that we now need to update anchor for reporting after matching, but what prevents us from doing that functions by function? Keep all anchors for all functions around also has a memory cost.
(Without that change and the renaming, or doing those separately, it'd be easier to review)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here it's because we need to count the inlineed/nested callsite samples. (before we only count the top-level callsite samples)
e.g.
main : 1001:1
1:1
2: foo:1000:1
1: 1
bar: 999
Supposing the processing order is main --> foo --> bar
Before it only count for function main, and it will miss the bar's samples, because at the time, function bar is not processed, there is no anchors or any mismatch info.
Now I changed to find all the anchors and doing a mismatch computation, the results is saved into a map, then count the samples recursively(see countMismatchedCallsiteSamples
)
that's why each process it needs to "runOnModule", this indeed need a lot of new function name which is confusing.
@@ -2487,29 +2574,42 @@ void SampleProfileMatcher::runOnModule() { | |||
errs() << "(" << NumMismatchedCallsites << "/" << TotalProfiledCallsites | |||
<< ")" | |||
<< " of callsites' profile are invalid and " | |||
<< "(" << MismatchedCallsiteSamples << "/" << TotalCallsiteSamples | |||
<< "(" << MismatchedCallsiteSamples << "/" << TotalFuncHashSamples |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are TotalCallsiteSamples and TotalFuncHashSamples all just refer to the total samples of all functions in the profile?
The name TotalFuncHashSamples
is confusing -- when we talk about samples, it should not have hash as entity, instead the entities should be function or call sites?
// reported as mismatching. | ||
if (IRCalleeName == UnknownIndirectCallee) | ||
CallsiteIsMatched = true; | ||
else if (Callees.count(FunctionId(IRCalleeName))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not equivalent to the old code where we check there is only one callee. Is the change intentional?
Also removing getRepInFormat
changes the behavior if UseMD5
is true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's sort of intentional, I found that there are several false positives related with this if we use the old condition(Callees.size() == 1
). From my cases, I observes like profile callsite anchors contains the IR call name but the size is not zero.
profile anchors 1:[foo, bar]
IR anchor 1:[foo]
It looks like a bug. Then again I was thinking maybe we don't need ensure the size, as long as it line up with with the sample loader
's lookup which just query the name even there are multiple entires, it should be good.
Regarding this weird "bug", I dug some cases, some are from the instable EH IR issue, I'm working on to fix it. We can then see if we need a strict condition for this.
} | ||
} | ||
|
||
void SampleProfileMatcher::UpdateIRAnchors() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This actually makes those anchors incorrect, though these anchors are no longer useful anyways. It works but feels slightly hacky..
I would just slight tweak mismatch counting to go through LocToLocMap
and avoid doing this. Is the tweak going to look invasive?
Staleness is characteristic associated with a given profile and IR. There is no post-matching staleness. We can say the success rate of matching or renaming unmatched callsites/samples, but we don't want to use overloaded term post-matching staleness to describe it. |
Thanks so much for the feedback. Answering some high-level questions, will address others later. |
uint64_t MismatchedCallsiteSamples = 0; | ||
uint64_t NumMismatchedFuncHash = 0; | ||
uint64_t TotalProfiledFunc = 0; | ||
uint64_t MismatchedFuncHashSamples = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really just MismatchedFunctionSamples
, in contrast to TotalFunctionSamples
.
I know this isn't changed in this patch, but why did we use the term FuncHashSamples
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add some comment for those stats, particularly what is different between MismatchedCallsiteSamples
and MismatchedFuncHashSamples
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to remove the "Hash"
void countMismatchedSamples(const FunctionSamples &FS); | ||
void countProfileMismatches( | ||
const Function &F, const std::map<LineLocation, StringRef> &IRAnchors, | ||
const std::map<LineLocation, std::unordered_set<FunctionId>> | ||
&ProfileAnchors); | ||
void countMismatchedCallsites( | ||
const Function &F, const std::map<LineLocation, StringRef> &IRAnchors, | ||
const std::map<LineLocation, std::unordered_set<FunctionId>> | ||
&ProfileAnchors, | ||
const LocToLocMap &IRToProfileLocationMap); | ||
void countMismatchedCallsiteSamples(const FunctionSamples &FS); | ||
void countMismatchedCallsiteSamples(); | ||
void copyUnchangedCallsiteMismatches( | ||
const StringMap<std::set<LineLocation>> &InputMismatchedCallsites); | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These functions needs some comments as they are somewhat similar but with subtle differences.
void SampleProfileMatcher::countProfileMismatches( | ||
const Function &F, const FunctionSamples &FS, | ||
const std::map<LineLocation, StringRef> &IRAnchors, | ||
void ProfileMatchStats::countMismatchedCallsites( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming: count
is a stateless and immutable action, but it's not the case here as you are tracking and recording FuncMismatchedCallsites
that later stats depends on.
// reported as mismatching. | ||
if (IRCalleeName == SampleProfileMatcher::UnknownIndirectCallee) | ||
MatchedCallsites.insert(Loc); | ||
else if (Callees.count(getRepInFormat(IRCalleeName))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comment explain why we don't expect exactly one callee in profile when there is only one callee in IR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is potentially a bug and this is workaround, at least leave a TODO comment. Preferably we should not need a workaround.
auto &MismatchedCallsites = | ||
FuncMismatchedCallsites[FunctionSamples::getCanonicalFnName(F.getName())]; | ||
|
||
auto MapIRLocToProfileLoc = [&](const LineLocation &IRLoc) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the place where pre-match and post-match is automatically handled in unified way, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, the benefit here is pre-match and post-match using the same functioncountMismatchedCallsites
, no special process to parsing the matching results. For pre-match, we just pass a empty IRToProfileLocationMap
and no changes to RunStaleProfileMatching
.
uint64_t TotalFunctionSamples = 0; | ||
|
||
// A map from function name to a set of mismatched callsite locations. | ||
StringMap<std::set<LineLocation>> FuncMismatchedCallsites; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see this container being order sensitive, unordered_set would perform better for lookups.
@@ -433,6 +433,44 @@ using CandidateQueue = | |||
PriorityQueue<InlineCandidate, std::vector<InlineCandidate>, | |||
CandidateComparer>; | |||
|
|||
// Profile matching statstics. | |||
class ProfileMatchStats { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was mostly thinking about a simple struct to encapsulate different stats, while keeping most of the logic in the Matcher. But don't have strong opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that too until I worked with this StringMap<std::set<LineLocation>> FuncMismatchedCallsites;
which introduce a lot of actions making me think to put them into a new class. But as you suggested below, we can move FuncMismatchedCallsites
out of the stats, or even we don't need a new class.
PostMatchStats.copyUnchangedCallsiteMismatches( | ||
PreMatchStats.FuncMismatchedCallsites); | ||
PostMatchStats.countMismatchedCallsiteSamples(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see two problems:
- We could end up just copying the same mismatched call sites and creating duplicates.
- The counting is somewhat spread out and tangled with matching -- there is
countProfileMismatches
andcountMismatchedCallsites
in function levelrunStaleProfileMatching
; then there is counting again here at module level.
How about we make these changes:
- Move FuncMismatchedCallsites out of stat class
- Make FuncMismatchedCallsites
StringMap<std::unordered_map<LineLocation,bool>>
with the boolean indicating whether the input mismatch was recovered by fuzzy matching. - During function level
runStaleProfileMatching
, we only populate and updateFuncMismatchedCallsites
, but not doing any stats. We do stats all at the end on module level.
This is try to create clearer separation between matching and stats, and also be consistent as to at which level stats are done (function or module).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to understand this on high-level, I feel this is to compress as much as we can into a global structure(FuncMismatchedCallistes
) during function-level matching.
Comparing to my first version(using a global IR/Prof anchor map), this use a more compact structure. Comparing to my second(this) version, even move all the TotalProfiledFunction/ total Function Samples / total callsites
stats out of the matching.
That sounds good idea, just some minor thing I can think:
- For
StringMap<std::unordered_map<LineLocation,bool>>
, abool
might not be enough, the matching could work badly and introduce more mismatching. may need another bool to indicate it's a new mismatch. - We need to save
TotalProfiledCallsite
, as we don't have the ProfAnchors for the counting at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For StringMap<std::unordered_map<LineLocation,bool>>, a bool might not be enough, the matching could work badly and introduce more mismatching. may need another bool to indicate it's a new mismatch.
Could you explain this? what is a new mismatch? My understanding is there is a set of mismatches from input, after fuzzy matching, we recovered some of the originally mismatched, though we don't know if the recovery is correct or not.
We need to save TotalProfiledCallsite, as we don't have the ProfAnchors for the counting at the end.
Incrementing counter along with tracking FuncMismatchedCallsites
seems fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain this? what is a new mismatch? My understanding is there is a set of mismatches from input, after fuzzy matching, we recovered some of the originally mismatched, though we don't know if the recovery is correct or not.
Yeah, It depends on which heuristic we use. Some anchors are aligned based on the previous matched anchors, there could be the original function is by luck matched but we run matching to "fix" it to a mismatched location. for example:
Before matching, supposing 5:E is matched by luck.
IR : 1:A, 2: B, 3:C, 4: D, 5:E
| | |
Profile : 3:A, 4: B, 5:E, 6:D, 7:E
After matching:
IR : 1:A, 2: B, 3:C, 4: D, 5:E
| | | | |
Profile : 3:A, 4: B, 5:E, 6:D, 8:E
However, after matching, the 5:E is synced to 3:c based on the offset (1:A --> 3:A), this is the new mismatch. So, that's why I was thinking to recompute/reuse the "counting" function from scratch, without relying on the matching heuristic.
We need to save TotalProfiledCallsite, as we don't have the ProfAnchors for the counting at the end.
Incrementing counter along with tracking
FuncMismatchedCallsites
seems fine?
OK, that seems not obey the rule to move "all" the stats out of function-level. No strong option on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see.. Now I understand what new mismatch means -- thanks for the clarification.
There are two scenarios I think:
-
The original match by luck is actually incorrect too. In this case there is no new mismatch, but we are under-reporting input mismatches as there is no easy way to tell whether it's correct match or not. I seems the example you have is in this category. I tend to think we just need to live with it?
-
The original match by luck is correct. In this case should we try to make sure the matching algorithm doesn't introduce new mismatches. Or at least make sure such case is very very unlikely to happen? Does this actually happen and would Meyers diffing have the same issue of introducing new mismatches?
Regardless if we really want consider such cases, maybe StringMap<std::unordered_map<LineLocation, MismatchState>>
with MismatchState
containing one bit for input mismatch, and one bit for final mismatch is all we need?
If we really want to shoot for best/most accurate metric, lucky but incorrect original match (like 1) should be counted as mismatch. But it's probably not worth the complexity to do that?
OK, that seems not obey the rule to move "all" the stats out of function-level. No strong option on this.
I guess strictly speaking you are right. But it still moves most of the non-trivial counting out of matching logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed a new version. The changes are:
Most of the counting part are moved out of the function-level matching, and moved into a new function computeAndReportProfileStaleness
, which is a module-level function. It includes all the counts about function checksum mismatch and callsite mismatch, and both weighted(samples) and unweighted mismatches.
Extended the (the old FuncMismatchedCallsites) to StringMap<std::unordered_map<LineLocation, MatchState, LineLocationHash>> FuncCallsiteMatchStates;
, use a enum MatchState to present the match state. one reason is now we can compress the struture as we discussed before. Another reason is we have to move total profiled function metrics
or total profiled callsites
out into computeAndReportProfileStaleness
. it's not easy /clean to compute the two metrics in the old way because it's only under non-imported function, however, for match states, we need to count the imported function too. So to achieve this, we need a "Matched" callsite state used to count the original profiled function.
380b662
to
6c9f4fc
Compare
@@ -690,6 +715,10 @@ void SampleProfileLoaderBaseImpl<Function>::computeDominanceAndLoopInfo( | |||
} | |||
} // namespace llvm | |||
|
|||
bool ShouldSkipProfileLoading(const Function &F) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- make this static -- other free functions specific to sample loader are static.
- if make this into a helper function, all similar code should be replaced. there are 5+ instance but only one is replaced in this change.
- function name starts with lower case
- nit: name it
skipProfileForFunction
oruseProfileForFunction
/// Profile mismatch statstics: | ||
uint64_t TotalProfiledFunc = 0; | ||
// Num of function whose checksum is mismatched. | ||
uint64_t NumMismatchedFunc = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we categorize them as NumStaleProfileFunc
(has profile with same name but CFG checksum mismatches) and NumNoProfileFunc
(no profile for the function at all, but we may recover a profile after we handle function renaming)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we categorize them as
NumStaleProfileFunc
(has profile with same name but CFG checksum mismatches) andNumNoProfileFunc
(no profile for the function at all, but we may recover a profile after we handle function renaming)?
Sounds good (IIUC, this will add a new stats NumNoProfileFunc
, this way we will have 3 function number related stats: TotalProfiledFunc
, NumStaleProfileFunc
, NumNoProfileFunc
. )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe we can defer that until when we handle function renaming and do call graph level matching -- that's where we need to know how many of the no profile functions we are able to recover profile for them.
uint64_t TotalProfiledCallsites = 0; | ||
uint64_t NumMismatchedCallsites = 0; | ||
uint64_t NumRecoveredCallsites = 0; | ||
|
||
/// Weigted profile samples mismatch statstics: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why ///
which is inconsistent with others. The comment is also not clear - was expecting something explaining TotalFunctionSamples
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why
///
which is inconsistent with others. The comment is also not clear - was expecting something explainingTotalFunctionSamples
.
Ah, ok.. I want to comment for a group of counters.. hmm, seems unnecessary, will remove the comments
|
||
/// Weigted profile samples mismatch statstics: | ||
uint64_t TotalFunctionSamples = 0; | ||
// Samples for the mismatched checksum functions; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grammatical error: Samples for the mismatched checksum functions:
-> Total samples for all checksum-mismatched functions
// TODO : Ideally, we should ensure it's a direct callsite location(Callees | ||
// size is 1). However, there may be a bug for profile merge(like ODR | ||
// violation) that causes the callees size to be more than 1. After we fix | ||
// the bug, we can remove this check. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you found out why? This particular change/workaround seems unrelated to the main purpose of this patch. How about let's either 1) root cause and fix this or 2) not change it, not adding the workaround here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I observed there are many cases and could be multiple reasons, I haven't root caused them. For some cases, I can guess it's likely due to unstable IR or ODR violation.
The linker deduplicate the different versions for the outlined function but not the inlined function. e.g.
bar:100:100
1: foo.1
1:A
2:B
baz:110:110
2: foo.2
2:A
3:B
When the two foo
s are merged/flattened, for callsite 2 , the profile anchors are 2:[A, B] but we can only see one function(either A or B) in the IR
As we could have multiple reasons for ODR violation, it may not be quick to fix it. moreover, this is no a problem for sample loader, it will load the profile the look-up matched. Anyway, going to remove the workaround, it should not a very big false positive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, going to remove the workaround, it should not a very big false positive.
Ok, if we end up needing this workaround for driving fall positive staleness % to zero, we can do a separate patch for it. It just feel like doesn't belong to this patch. :)
}; | ||
|
||
for (const auto &I : FS.getBodySamples()) | ||
CountSamples(I.first, I.second.getSamples()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for counting not inlined callsites? I thought callsite samples kinda implies inlined ones..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for counting not inlined callsites? I thought callsite samples kinda implies inlined ones..
Yep, from my observation, inlined and non-inlined callsites samples are seperated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant is callsite samples in the context of sample PGO in LLVM particularly means samples for inlinees. :) (FunctionSamples::CallsiteSamples
)
But here we are counting both inlined and not inlined samples as call site samples.. That is probably fine, but maybe worth a comment?
return It->second == MatchState::Mismatched; | ||
}; | ||
|
||
auto CountSamples = [&](const LineLocation &Loc, uint64_t Samples) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: CountSamples
-> AttributeMismatchedSamples
this is not counting call site samples, but rather distribute/attribute already counted samples into different category.
CountSamples(I.first, I.second.getSamples()); | ||
|
||
for (const auto &I : FS.getCallsiteSamples()) { | ||
uint64_t Samples = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Samples
-> CallsiteSamples
CountSamples(I.first, Samples); | ||
|
||
if (IsCallsiteMismatched(I.first)) | ||
continue; | ||
|
||
// Count mismatched samples for matched inlines. | ||
for (const auto &CS : I.second) | ||
countMismatchedCallsiteSamples(CS.second); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest an alternative structure to make the flow more intuitive:
if (IsCallsiteMismatched(I.first)) {
// comment
for (const auto &CS : I.second)
Samples += CS.second.getTotalSamples();
AttributeMismatchedSamples(I.first, Samples);
} else {
// When the current level of inlined call site matches the profiled call site, we need to go deeper along the inline tree to count mismatches from lower level inlinees.
for (const auto &CS : I.second)
countMismatchedCallsiteSamples(CS.second);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit tricky.
If a mismtched callsite is recovered and marked as recovered, it still goes through( AttributeMismatchedSamples
to count the recovered samples but the IsCallsiteMismatched
return false(needs to go counting it's inlinees(for (const auto &CS : I.second) countMismatchedCallsiteSamples(CS.second);
) )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think we need a clearer flow. Currently CountSamples
checks match state and do something, later IsCallsiteMismatched
checks match state again and do something else. It's all tangled together.
It'd be clearer if we can hoist the flow on checking match state, and then group together what needs to be happen to each match state. I think it's still doable even with what you mentioned above. Make sense?
const auto *FuncDesc = ProbeManager->getDesc(F); | ||
if (FuncDesc && ProbeManager->profileIsHashMismatched(*FuncDesc, *FS)) | ||
NumMismatchedFunc++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are not needed: 1) they are handled inside countMismatchedFuncSamples
, 2) for consistency, countMismatchCallsites
/countMismatchedCallsiteSamples
don't have mismatch check at top level caller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latest version indeed looks a lot cleaner. Thanks for working through the changes.
@@ -2498,25 +2607,41 @@ void SampleProfileMatcher::runOnModule() { | |||
|
|||
SmallVector<std::pair<StringRef, uint64_t>> ProfStatsVec; | |||
if (FunctionSamples::ProfileIsProbeBased) { | |||
ProfStatsVec.emplace_back("NumMismatchedFuncHash", NumMismatchedFuncHash); | |||
ProfStatsVec.emplace_back("NumStaleProfileFunc", NumStaleProfileFunc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does changing those strings require changing how we aggregates stats internally for dashboard etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, the logging is fine, we aggregate by the key without checking a specific stats, so it shouldn't break our internal build. Just for some offline tools, the last step to show the percentage things would be error, and hopefully caught by the exception. I will make some updates to those usages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, the logging is fine, we aggregate by the key without checking a specific stats, so it shouldn't break our internal build. Just for some offline tools, the last step to show the percentage things would be error, and hopefully caught by the exception. I will make some updates to those usages.
We want to make sure there is continuity in staleness data, and we should not lose data before the name changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with some nits.
Matched = 0, | ||
Mismatched = 1, | ||
// Stay Matched after profile matching. | ||
StayMatched = 2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A minor inconsistency here as to what we want to cover in the state enum -- Recovered is a state change, StayMatched is a non-change. Do we describe state change or just state in the enum?
If we just describe state:
InitialMatch,
InitialMismatch,
FinalMatch,
FinalMismatch
If we describe state change
InitialMatch,
InitialMismatch,
RecoveredMismatch, (From initial mismatch to final match)
RemovedMatch (from initial match to final mismatch)
It'd be great if this can be improved, but not a deal breaker if the current state values are very convenient for implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I lean towards the second one(state change), thinking it can align with the name of the stats counter such as NumRecoveredCallsites
, RecoveredCallsiteSamples
Updated in the new version, thanks for the suggestion!
// For each function, store every callsite and its matching state into this | ||
// map, of which each entry is a pair of callsite location and MatchState. | ||
// This is used for profile staleness computation and report. | ||
StringMap<std::unordered_map<LineLocation, MatchState, LineLocationHash>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note that StringMap keeps a copy of every string key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. If we have the memory pressure, another option is to consolidate the two StringMap
StringMap<LocToLocMap> FuncMappings;
StringMap<std::unordered_map<LineLocation, MatchState, LineLocationHash>> FuncCallsiteMatchStates;
Into one map, like
StringMap<std::pair<LocToLocMap, std::unordered_map<LineLocation, MatchState, LineLocationHash>>> FuncXXX...
the Key of LocToLocMap
is also a LineLocation, I was thinking to further merge them
StringMap<std::unordered_map<LineLocation, std::pair<LineLocation, MatchState>, LineLocationHash>>> FuncXXX...
but unfortunately the key of LocToLocMap
is IRlocation
but the latter's key is the ProfileLocation
, so merging them might be too hacky and confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note that StringMap keeps a copy of every string key.
I think one optimization is we can release this data after the matching is done, we don't use those data for the later sample inlining or loading stuffs, similar to the StringMap<LocToLocMap> FuncMappings;
.
RecoveredMismatch = 2, | ||
// From initial match to final mismatch. | ||
RemovedMatch = 3, | ||
Unknown = 32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually we use 0 as uninitialized/unknown value. This is because for zero initialized memory, this naturally sets everything to unknown.
|
||
if (IsCallsiteMatched) { | ||
auto R = | ||
CallsiteMatchStates.emplace(ProfileLoc, MatchState::InitialMatch); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If CallsiteMatchStates[ProfileLoc]
already exist, setting it to InitialMatch
seems conceptually wrong as this is probably post-match (FinalMatch)? With state like InitialMatch
, I thought we will not set such state in post-match stage.
Maybe for better readability, do a find first and then decide what value to set like how this is done in the for loop below.
if (It == CallsiteMatchStates.end()) | ||
CallsiteMatchStates.emplace(Loc, MatchState::InitialMismatch); | ||
// The inital match is removed in post-match. | ||
else if (IsPostMatch && It->second == MatchState::InitialMatch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit confused here, if there was an entry and it was matched in pre-match, it won't be changed in post-match if it still matches, why do we change it to RemovedMatch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, Good catch, this is wrong here, it's update all InitialMatch to RemovedMatch regardless it's matched or not.
The fact is that we don't have a FinalMatch
state in current implementation, so we are not able to determine whether it's matched or not in post-match in this loop.
But we also can't only use the FinalMatch
and FinalMismatch
, because the counting is at the end of the function, then we don't know the recovered numbers(which is this diff's goal)
It think it's clear that in pre-match we can have 'InitialMatch' and 'IntialMismatch'
but in post match, I think we need 4 states, but that make it more complicated..
- 'InitialMatch' stay match, to
UnchangedMatch
- 'IntialMismatch' stay mismatch, to
UnchangedMismatch
- 'IntialMismatch' is recovered. to
RecoveredMismatch
- 'InitialMatch' is removed. to
RemovedMatch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in total there would be 6 states, the InitialMatch
and InitialMismatch
is only used for transition. In the counting
function, we only do counting on the 4 post-match states. UnchangedMismatch
+ RecoveredMismatch
is the original mismatch,RemovedMatch
+ UnchangedMismatch
is the final mismatch, RecoveredMismatch
is the recovered mismatch.
Any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
6 explicit state sounds ok with me, that should make things really clear?
I'd add comment for each of the state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the fact that this was wrong and tests all passed means that the tests are probably not strong enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
6 explicit state sounds ok with me, that should make things really clear?
yeah, I feel it's clearer, please take another look.
Also the fact that this was wrong and tests all passed means that the tests are probably not strong enough.
Uh, I see, I checked that it's due to the tests are ether all matched cases or all mismatched cases. Updated the test.
be08c85
to
7cdcff6
Compare
uint64_t &FuncMismatchedCallsites, uint64_t &FuncProfiledCallsites); | ||
const LocToLocMap *IRToProfileLocationMap); | ||
|
||
bool isMismatchState(const enum MatchState &State) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only called during counting (post-match), so we should not expect any Initial* state.
Can we assert somewhere during counting that we do not have any initial* state? If we have that means recordCallsiteMatchStates
may have a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that we still can get the Initial* state; because currently the fuzzy matching is only on when there is checksum mismatch, but the counting and reporting is for all non-import function, so for those checksum matched function, their state are on the Initial* state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If checksum actually matches, do we expect any InitialMismatch from such functions?
Or in other words, if checksum matches, there is no state change to track, do we actually still need to run recordCallsiteMatchStates
? Can we just count how many samples, callsites and callsite samples in those functions and assume they all match?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If checksum actually matches, do we expect any InitialMismatch from such functions?
Or in other words, if checksum matches, there is no state change to track, do we actually still need to run
recordCallsiteMatchStates
? Can we just count how many samples, callsites and callsite samples in those functions and assume they all match?
It could happen, I saw in our dashboard, even the checksum mismatched samples is zero, there are a few numbers of callsite mismatches(e.g. Stale function samples: 0.00%(773327/65978370982), Stale functions: 0.01%(6/42787) , Stale callsite samples: 0.29%(194149265/65978370982), Stale callsite samples after matching: 0.29%(194318114/65978370982), Stale callsites: 5.95%(20780/349307) , Stale callsites after matching: 5.95%(20781/349307))
I checked that it's mostly from the ctor alias optimization, it's one kind of "function-rename"(in this case it's not from user change), on the other hand, the function-renaming from user change is also in this category. Recording those mismatches on checksum matched function can help detect this issue.
Though this ctor alias functions are all very cold functions, it's maybe fine to ignore this if this "false positives" (this is more like a future support) is noisy.
Any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure I understand this, so this is because we compute hash on CFG (ignoring callee names). If a callee is renamed, even if checksum matches for the caller, we can have call site mismatch due to callee renaming (without caller CFG change). Is that the issue?
I was hoping that we don't need to pay the cost of recording and computing stats if there is no mismatch. But if that's not possible, current change is fine.
It begs another questions though... should checksum actually capture callee renaming? like if callee renamed, should caller's checksum change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure I understand this, so this is because we compute hash on CFG (ignoring callee names). If a callee is renamed, even if checksum matches for the caller, we can have call site mismatch due to callee renaming (without caller CFG change). Is that the issue?
Exactly.
I was hoping that we don't need to pay the cost of recording and computing stats if there is no mismatch. But if that's not possible, current change is fine.
It begs another questions though... should checksum actually capture callee renaming? like if callee renamed, should caller's checksum change?
This is a good question. We do use some callsite info for computing checksum: the total number of callsites. If we add/remove one more callsite, it's still captured as checksum changes. so it's not strict enough.
If we really want to capture the name changes, we can encode the name of callsite into the checksum. Then again, similar to the "unstable IR" issue, there could be mismatches from internal compiler but not from user code. This ctor alias is one case. We will run stale matching for this kind of changes. That said, it's doable, just we needs more changes to recognize : "this is not from user renaming, let's only run renaming opt but not fuzzy matching opt". the current fuzzy matching may not work well on this ctor alias cases. We need more check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another issue is then we will have the false positives from this ctor alias top if we make a strict checksum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I was thinking about capturing callee names in checksum. But we don't have to decide now. Let's deal with that when we work on handling renames.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be no other unresolved comments, I'm going to push this change. I really appreciate all the discussions and suggestions!
Pushed a new version: |
This change adds the support to compute and report the staleness metrics after stale profile matching so that we can know how effective the fuzzy matching is, i. e. how many callsites and samples are recovered by the matching.
Some implementation notes:
NumRecoveredCallsites
,RecoveredCallsiteSamples
for this and removedTotalCallsiteSamples
as now the we can use theTotalFuncHashSamples
as base, and renamed some counters.MatchState
for the state, and used a new functionrecordCallsiteMatchStates
to compute and record the callsite's match states changes before and after the matching, , the result is compressed and saved into aFuncCallsiteMatchStates
map for later counting use.computeAndReportProfileStaleness
). The reason is before the callsite is only counted on top-level function, this change extends it to count(recursively) on the inlined functions and samples, which is more accurate.