-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BOLT][NFCI] Use FileSymbols for local symbol disambiguation #89088
base: main
Are you sure you want to change the base?
Conversation
62c18d7
to
8987eb9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the tradeoff here? Did we try on large binaries? It looks like you're trading O(1)
'ish access for potentially O(N)
. Can this ever be a problem?
In discoverFileObjects, replace mapping from every local symbol to associated file name with a vector of symbol data for FILE symbols only. This cuts down on memory needed to resolve local file names. Test Plan: NFC
The tradeoff is:
For reference, in one of our medium-sized test binaries we have:
So space savings are quite significant, we gain storage locality, and an extra per-symbol cost is reasonable. If your concern is about Symbol->getName cost, we can memorize FileName and store pairs of symbol data (8b) and StringRef (16b) instead, reducing the per-symbol cost to just O(log FileSyms). The switch to storing FILE symbols is a prerequisite for using the symbol table information for split fragment matching. |
Can we try on large binaries and try to have an upper bound on the size? I am not against this I just want to make sure we don't regress time dramatically.
Thanks for the context. Feel free to add more motivation to your NFC patches/commits, in particular if they're in preparation for something bigger. |
For one large prod binary, we have the following:
The space tradeoff becomes:
Done. |
What's the BOLT processing time before/after the change? |
It's a drop in the bucket, processing time- and peak memory- wise (~1/1000). Treat this diff as a data structure change to facilitate new use case (which will bring memory consumption back up). |
Would be useful, before merging, to see the whole set of patches. The difficulty with tiny NFC patches is that they result in larger picture to be lost, sometimes. |
@llvm/pr-subscribers-bolt Author: Amir Ayupov (aaupov) ChangesIn discoverFileObjects, replace mapping from every local symbol to Test Plan: NFC Full diff: https://github.com/llvm/llvm-project/pull/89088.diff 1 Files Affected:
diff --git a/bolt/lib/Rewrite/RewriteInstance.cpp b/bolt/lib/Rewrite/RewriteInstance.cpp
index 329649c1ca67c5b..8578c2c6c8edcc6 100644
--- a/bolt/lib/Rewrite/RewriteInstance.cpp
+++ b/bolt/lib/Rewrite/RewriteInstance.cpp
@@ -813,14 +813,8 @@ void RewriteInstance::discoverFileObjects() {
// For local symbols we want to keep track of associated FILE symbol name for
// disambiguation by combined name.
- StringRef FileSymbolName;
bool SeenFileName = false;
- struct SymbolRefHash {
- size_t operator()(SymbolRef const &S) const {
- return std::hash<decltype(DataRefImpl::p)>{}(S.getRawDataRefImpl().p);
- }
- };
- std::unordered_map<SymbolRef, StringRef, SymbolRefHash> SymbolToFileName;
+ std::vector<std::pair<DataRefImpl, StringRef>> FileSymbols;
for (const ELFSymbolRef &Symbol : InputFile->symbols()) {
Expected<StringRef> NameOrError = Symbol.getName();
if (NameOrError && NameOrError->starts_with("__asan_init")) {
@@ -847,13 +841,10 @@ void RewriteInstance::discoverFileObjects() {
// and this uncertainty is causing havoc in function name matching.
if (Name == "ld-temp.o")
continue;
- FileSymbolName = Name;
+ FileSymbols.emplace_back(Symbol.getRawDataRefImpl(), Name);
SeenFileName = true;
continue;
}
- if (!FileSymbolName.empty() &&
- !(cantFail(Symbol.getFlags()) & SymbolRef::SF_Global))
- SymbolToFileName[Symbol] = FileSymbolName;
}
// Sort symbols in the file by value. Ignore symbols from non-allocatable
@@ -1028,14 +1019,18 @@ void RewriteInstance::discoverFileObjects() {
// The <id> field is used for disambiguation of local symbols since there
// could be identical function names coming from identical file names
// (e.g. from different directories).
- std::string AltPrefix;
- auto SFI = SymbolToFileName.find(Symbol);
- if (SymbolType == SymbolRef::ST_Function && SFI != SymbolToFileName.end())
- AltPrefix = Name + "/" + std::string(SFI->second);
+ auto CompareSymsByIdx = [](const std::pair<DataRefImpl, StringRef> &A,
+ const std::pair<DataRefImpl, StringRef> &B) {
+ return A.first.d.b < B.first.d.b;
+ };
+ DataRefImpl SymDataRef = Symbol.getRawDataRefImpl();
+ auto SFI = llvm::upper_bound(FileSymbols,
+ std::make_pair(SymDataRef, StringRef()),
+ CompareSymsByIdx);
+ if (SymbolType == SymbolRef::ST_Function && SFI != FileSymbols.begin())
+ AlternativeName = NR.uniquify(Name + "/" + SFI[-1].second.str());
UniqueName = NR.uniquify(Name);
- if (!AltPrefix.empty())
- AlternativeName = NR.uniquify(AltPrefix);
}
uint64_t SymbolSize = ELFSymbolRef(Symbol).getSize();
|
For chromium 5735 binary with ~1.2M symbols, discover file objects wall time:
|
Remove SymbolToFileName mapping from every local symbol to its
containing FILE symbol name, and reuse FileSymbols to disambiguate
local symbols instead.
Also removes the check for
ld-temp.o
file symbol which was added toprevent LTO build mode from affecting the disambiguated name. This may
cause incompatibility when using the profile collected on a binary built
in a different mode than the input binary.
Addresses #90661.