Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BOLT][NFC] Unify two symbol table iterations #90724

Closed
wants to merge 2 commits into from

Conversation

corona10
Copy link
Contributor

@corona10 corona10 commented May 1, 2024

Simple refactoring which was suggested from #90661 (comment)

@llvmbot
Copy link
Collaborator

llvmbot commented May 1, 2024

@llvm/pr-subscribers-bolt

Author: Donghee Na (corona10)

Changes

Simple refactoring which was suggested from #90661 (comment)


Full diff: https://github.com/llvm/llvm-project/pull/90724.diff

1 Files Affected:

  • (modified) bolt/lib/Rewrite/RewriteInstance.cpp (+21-21)
diff --git a/bolt/lib/Rewrite/RewriteInstance.cpp b/bolt/lib/Rewrite/RewriteInstance.cpp
index 23f79e3c135a78..2e91856c0ebfe6 100644
--- a/bolt/lib/Rewrite/RewriteInstance.cpp
+++ b/bolt/lib/Rewrite/RewriteInstance.cpp
@@ -820,7 +820,26 @@ void RewriteInstance::discoverFileObjects() {
       return std::hash<decltype(DataRefImpl::p)>{}(S.getRawDataRefImpl().p);
     }
   };
+
+  // Sort symbols in the file by value. Ignore symbols from non-allocatable
+  // sections. We memoize getAddress(), as it has rather high overhead.
+  struct SymbolInfo {
+    uint64_t Address;
+    SymbolRef Symbol;
+  };
+  auto isSymbolInMemory = [this](const SymbolRef &Sym) {
+    if (cantFail(Sym.getType()) == SymbolRef::ST_File)
+      return false;
+    if (cantFail(Sym.getFlags()) & SymbolRef::SF_Absolute)
+      return true;
+    if (cantFail(Sym.getFlags()) & SymbolRef::SF_Undefined)
+      return false;
+    BinarySection Section(*BC, *cantFail(Sym.getSection()));
+    return Section.isAllocatable();
+  };
+  std::vector<SymbolInfo> SortedSymbols;
   std::unordered_map<SymbolRef, StringRef, SymbolRefHash> SymbolToFileName;
+
   for (const ELFSymbolRef &Symbol : InputFile->symbols()) {
     Expected<StringRef> NameOrError = Symbol.getName();
     if (NameOrError && NameOrError->starts_with("__asan_init")) {
@@ -835,6 +854,8 @@ void RewriteInstance::discoverFileObjects() {
              "support. Cannot optimize.\n";
       exit(1);
     }
+    if (isSymbolInMemory(Symbol))
+      SortedSymbols.push_back({cantFail(Symbol.getAddress()), Symbol});
 
     if (cantFail(Symbol.getFlags()) & SymbolRef::SF_Undefined)
       continue;
@@ -856,27 +877,6 @@ void RewriteInstance::discoverFileObjects() {
       SymbolToFileName[Symbol] = FileSymbolName;
   }
 
-  // Sort symbols in the file by value. Ignore symbols from non-allocatable
-  // sections. We memoize getAddress(), as it has rather high overhead.
-  struct SymbolInfo {
-    uint64_t Address;
-    SymbolRef Symbol;
-  };
-  std::vector<SymbolInfo> SortedSymbols;
-  auto isSymbolInMemory = [this](const SymbolRef &Sym) {
-    if (cantFail(Sym.getType()) == SymbolRef::ST_File)
-      return false;
-    if (cantFail(Sym.getFlags()) & SymbolRef::SF_Absolute)
-      return true;
-    if (cantFail(Sym.getFlags()) & SymbolRef::SF_Undefined)
-      return false;
-    BinarySection Section(*BC, *cantFail(Sym.getSection()));
-    return Section.isAllocatable();
-  };
-  for (const SymbolRef &Symbol : InputFile->symbols())
-    if (isSymbolInMemory(Symbol))
-      SortedSymbols.push_back({cantFail(Symbol.getAddress()), Symbol});
-
   auto CompareSymbols = [this](const SymbolInfo &A, const SymbolInfo &B) {
     if (A.Address != B.Address)
       return A.Address < B.Address;

@corona10
Copy link
Contributor Author

corona10 commented May 1, 2024

I checked the build success on my local machine but not familiar to run the BOLT only test, is there any guide exists?

@corona10
Copy link
Contributor Author

corona10 commented May 1, 2024

I checked the build success on my local machine but not familiar to run the BOLT only test, is there any guide exists?

Okay ninja check-bolt works.

@aaupov
Copy link
Contributor

aaupov commented May 1, 2024

Thanks Donghee! Looks good, but please retitle to "[BOLT][NFC] Unify two symbol table iterations".
One way to estimate the effect (in seconds) would be to exit(0) just above auto CompareSymbols and measure processing time before and after this fix. Can you please check with a large binary and add the information to the PR message?

@corona10 corona10 changed the title [BOLT] Unify two symbol lookup interation into the one [BOLT][NFC] Unify two symbol table iterations May 2, 2024
@corona10
Copy link
Contributor Author

corona10 commented May 2, 2024

Can you please check with a large binary and add the information to the PR message?

Yeah, I will share the info soon :)

@corona10
Copy link
Contributor Author

corona10 commented May 2, 2024

@aaupov
For the CPython build(11083 symbols): 4693µs -> 4194µs (Constantly enhanced) from my environment with handy chrono-based measurement.
But when considering the possibility of noise, I am setting up the clang build.

@corona10
Copy link
Contributor Author

corona10 commented May 2, 2024

@aaupov
Copy link
Contributor

aaupov commented May 2, 2024

https://github.com/llvm/llvm-project/blob/main/bolt/docs/OptimizingClang.md

Is this still recommened setup?

The guide covers more than what we need in this case. You can just run on BOLT on any Clang binary, for instance, you can try BOLT instrumentation:

llvm-bolt clang-18 -instrument -o clang-18.bolt.inst

@corona10
Copy link
Contributor Author

corona10 commented May 2, 2024

Hmm, my system clang-18 and pre-built clang binary from llvm-release page is not able to be processed because of the compile option (in my guess)

BOLT-ERROR: instrumentation runtime libraries require relocation

@aaupov
Copy link
Contributor

aaupov commented May 2, 2024

Hmm, my system clang-18 and pre-built clang binary from llvm-release page is not able to be processed because of the compile option (in my guess)

BOLT-ERROR: instrumentation runtime libraries require relocation

Sorry, yes, instrumentation requires relocations in the input binary. I've checked that rewriting in -lite=0 mode works:

llvm-bolt clang-18 -lite=0 -o /dev/null -time-rewrite

@corona10
Copy link
Contributor Author

corona10 commented May 3, 2024

There was a binary issue from Fedora 39 and I tested with release pre-built binary from GitHub that given more proper information
But anyway, for me, it's noisy level changes.

CPU: AMD Ryzen™ 7 8845HS
Memory: 32gb

AS-IS
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.0789 ( 21.2%) 0.0063 ( 2.1%) 0.0851 ( 12.6%) 0.0854 ( 12.6%) discover file objects

TO-BE
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.0804 ( 20.3%) 0.0046 ( 1.6%) 0.0850 ( 12.3%) 0.0853 ( 12.3%) discover file objects

@aaupov
Copy link
Contributor

aaupov commented May 3, 2024

Let me test it with a large binary.

@aaupov
Copy link
Contributor

aaupov commented May 5, 2024

To isolate the effect of the change, I added exit(0) here just after the second loop:

for (const SymbolRef &Symbol : InputFile->symbols())
if (isSymbolInMemory(Symbol))
SortedSymbols.push_back({cantFail(Symbol.getAddress()), Symbol});
auto CompareSymbols = [this](const SymbolInfo &A, const SymbolInfo &B) {

For chromium 5735 with 1.2M symbols on Intel ADL i7-12700K the effect is the following (warmup=3, 10 runs):
before:

  Time (mean ± σ):     400.6 ms ±   1.0 ms    [User: 354.3 ms, System: 46.2 ms]
  Range (min … max):   398.8 ms … 402.3 ms    10 runs

after:

  Time (mean ± σ):     415.5 ms ±   0.7 ms    [User: 362.4 ms, System: 52.8 ms]
  Range (min … max):   414.7 ms … 416.6 ms    10 runs

So merging loops has an opposite effect to what I had hoped. Thanks for putting the effort nonetheless!

@corona10
Copy link
Contributor Author

corona10 commented May 5, 2024

For chromium 5735 with 1.2M symbols

Next time I will consider this binary for the benchmark.
Yeah it's slowing down sadly :)

@corona10 corona10 closed this May 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants