[BOLT] [NFC] Remove special DWARF expressions handling from LLVM #196

rafaelauler · 2021-08-05T19:45:49Z

Summary:
This is aimed at reducing our changes to LLVM. We originally
added support to LLVM MC layer to be able to encode DWARF expressions
out of an intermediate representation in memory that is based on
DWARFDebugFrame parsing of these expressions. However, this is an
overkill and BOLT does not need the ability to re-encode DWARF
expressions. So, for now, let's remove this support and make BOLT read
DWARF expressions as a blob of data, and write it back to the binary
using cfi_escape, the same method that some LLVM backends (AArch64)
use to encode DWARF expressions into CFIs when they need them.

We are still left with a one line change to LLVM libraries, to allow
us to read the raw bytes of the DWARF expression out of
DWARFDebugFrame parsing structures, so we can write them back with
cfi_escape.

Summary: The interface is no longer in use. (cherry picked from commit 460bdaaf5c48910d228d17211548bdc410f89d44)

(cherry picked from commit b392da1fbd055a555ef0fab3f943bbf214ace0ee)

(cherry picked from commit 0290dbaa15546fafa27a9a17240f0da4218299ce)

Summary: Shrink wrapping has a mode where it will directly move push pop pairs, instead of replacing them with stores/loads. This is an ambitious mode that is triggered sometimes, but whenever matching with a push, it would operate with the assumption that the restoring instruction was a pop, not a load, otherwise it would assert. Fix this assertion to bail nicely back to non-pushpop mode (use regular store and load instructions). (cherry picked from commit 2ff448dee7461b1b31ae06837e8d859b77985e6b)

(cherry picked from commit d4667d4e3eb21f481d2fd8df30d1be160fa0f646)

Summary: Fix begin decrementing. (cherry picked from commit 664a2dbdae4741bb4a6048c9bb5de6e262708845)

Summary: The parameter is no longer used. (cherry picked from commit 7ac00974b07895aeb9ad53f5f417119c8b15c7e5)

Summary: The option is not used. Remove all related code. (cherry picked from commit ad05832b1e039215512df7a5e652931210963d6a)

Summary: Temporarily mark functions containing data as non-simple. (cherry picked from commit 0e099b021028f71c447adc3d531b992313d2ae76)

Summary: 1. Uniquify names of local symbols. 2. Handle aliases. (cherry picked from commit ced7da5591a5e688f4725d872355feae4800bc4d)

Summary: There is no need to treat the emission of the original `.eh_frame` section as a special case. (cherry picked from commit 79adc11ef27b9a15aa417e0d9ae8581a664dd2ab)

Summary: This is a prerequisite for larger emitter refactoring. Since .dynamic is read unconditionally, add an error message if the section is missing, or the size of the section is zero. (cherry picked from commit 70c0e23504ef8463ef15950690f458affc6ba511)

Summary: Consolidate code and data emission code in ELF-independent BinaryEmitter. The high-level interface includes only two functions emitBinaryContext() and emitFunctionBody() used by RewriteInstance and BinaryContext respectively. (cherry picked from commit 55c0e69dfd6ecf65c9d93340347f52e4751210bc)

(cherry picked from commit bd7ce3bc754556a7d4a31bc626f57148ffb89be0)

Summary: Make ELF symbol table rewriting code more structured. While at it, remove symbols from non-allocatable sections. (cherry picked from commit 751a3b565878d16cd124dfddfb07baa3fddb03a4)

Summary: The version of LLVM that we are based on lacks the support for base address in DWARF location lists. Add the missing pieces. (cherry picked from commit 39a52458e17adcdc8e718efbc690f469a5fdc44a)

Summary: Some functions may have exactly the same code and exception handlers. However, their action tables could be different leading to mismatching semantics. We should verify their equivalence while running ICF. (cherry picked from commit 2449406494a66fbe4995f92747795fc130ae7907)

Summary: ICF may fold functions in arbitrary order when running multi-threaded. This is fine in relocation mode as we end up with just one function holding all function symbols. However, in non-relocation mode we keep all function bodies, and if we keep merging profiles in non-deterministic order, we end up with functions with non deterministic profiles. The fix for non-relocation mode is to not merge profiles as the factual new profile could be different from the merged one since both function instances are potentially callable. Additionally, emit extra symbols for ICF functions in non-relocation mode to make it possible to track the folding. (cherry picked from commit 201427172392b8fe17ac91d09a3b1f9db5a38e26)

Summary: Too many hash collisions may cause ICF to run slowly. We used to hash BinaryFunction only looking at instruction opcodes, ignoring instruction operands. With many almost identical functions, such approach may lead to long ICF processing time. By including operands into the hash, we reduce the number of collisions and improve the runtime often by a factor of 2 or more. (cherry picked from commit d425f3c272fdf5f15e68d62e6f4880b976ed58e6)

Summary: Further speedup ICF by applying stricter rules for congruent functions. While checking symbolic operands in congruent functions, consider operands congruent only if they are equal or reference functions with identical hashes, i.e. potentially foldable functions. Note that jump table operands are handled as a special case. (cherry picked from commit 03bca75462804d9ab12b64530e55ca376e0fef24)

Summary: Indirect calls that use RSP to compute the target address would break in instrumentation mode because we were adding instructions that changed the stack pointer. Fix this. (cherry picked from commit 7815cc30c420107cfeee77d42b2acdeb0bca1035)

Summary: RuntimeDyldImpl::resolveExternalSymbols() some time ago used to call getSymbolAddress() while in the second loop. That call could have modified the contents of ExternalSymbolRelocations that the loop was iterating over. Thus the code was written in a way that erased the processed entry on every loop iteration and reset the map iterator. With large number of entries in ExternalSymbolRelocations the loop code becomes a performance bottleneck. Since getSymbolAddress() is no longer used, the ExternalSymbolRelocations could be iterated in a straightforward way and the map cleared before the function exit. (cherry picked from commit 55ed8de2b3fe59ac84eef9fbb341185b6b1763dc)

Summary: In a rare case, we may fold a function and fail to emit it in non-relocation mode due to a function size increase. At the same time, the function that the original function was folded into could have been successfully emitted, e.g. because it was split in the presence of a profile information. Later, because the function was not emitted, we have to use its original .eh_frame entry in the preserved .eh_frame section. However, that entry is no longer referencing the original function, but the function that the original was folded into. This happens since the original symbol gets emitted at the other function location. As a result, .eh_frame entry for the folded function is missing. To prevent incorrect update of the original .eh_frame, create relocations against absolute values. This guarantees preservation of the section contents while updating pc-relative references. (cherry picked from commit 8f7b7c86e2acc79cb553705d2431fb68002701d7)

Summary: In non-relocation mode, make sure we emit extra symbols for a folded function even if the function was not overwritten due to its large size. (cherry picked from commit 7c395040de56a6554910144070c70e217ebaa512)

Summary: Add option `-align-text=<n>` to control .text alignment within a segment. Set to page size by default. (cherry picked from commit ec8e4998bb726a188d01293cfc686a7ba5e58e5a)

Summary: In relocation mode, there is no use for old .eh_frame section. Moreover, it can interfere with new EH frames via .eh_frame_hdr when the original .text is reused. (cherry picked from commit 2eaba65a3b22e629e2622566516c1336a1d431fa)

Summary: Add an option to fail processing of the input binary if the profile is not accurate: -stale-threshold=<uint> - maximum percentage of stale functions to tolerate (default: 100) Default (100) means never to fail. A function profile is considered stale if any branch in its profile has invalid source or destination. Use `-stale-threshold=0` to fail if any staleness is detected in the profile. (cherry picked from commit 402735094214eabe994f7b3c58a29524db3e5acd)

Summary: With larger PLT sizes, linear PLT symbol name lookup becomes a bottleneck. (cherry picked from commit 8f13e6a001a2da6127e2724687644efdb49eff24)

Summary: (cherry picked from commit 47a9c165d27c030516c08416a94030a107cdb096)

Summary: Some functions could be called at an address inside their function body. Typically, these functions are written in assembly as C/C++ does not have a multi-entry function concept. The addresses inside a function body that could be referenced from outside are called secondary entry points. In BOLT we support processing functions with secondary/multiple entry points. We used to mark basic blocks representing those entry points with a special flag. There was only one problem - each basic block has exactly one MCSymbol associated with it, and for the most efficient processing we prefer that symbol to be local/temporary. However, in certain scenarios, e.g. when running in non-relocation mode, we need the entry symbol to be global/non-temporary. We could create global symbols for secondary points ahead of time when the entry point is marked in the symbol table. But not all such entries are properly marked. This means that potentially we could discover an entry point only after disassembling the code that references it, and it could happen after a local label was already created at the same location together with all its references. Replacing the local symbol and updating the references turned out to be an error-prone process. This diff takes a different approach. All basic blocks are created with permanently local symbols. Whenever there's a need to add a secondary entry point, we create an extra global symbol or use an existing one at that location. Containing BinaryFunction maps a local symbol of a basic block to the global symbol representing a secondary entry point. This way we can tell if the basic block is a secondary entry point, and we emit both symbols for all secondary entry points. Since secondary entry points are quite rare, the overhead of this approach is minimal. Note that the same location could be referenced via local symbol from inside a function and via global entry point symbol from outside. This is true for both primary and secondary entry points. (cherry picked from commit 7596bdaff991b631d13b9546fbf083f35cd99faf)

Summary: .stab and .stabstr are special sections containing debugging information and strings associated with the debugging information. This commit adds them to the list of debugging sections, so these sections can be removed for output binary. Vasily Leonenko, Advanced Software Technology Lab, Huawei

Summary: This shouldn't be upstreamed, it's just for our github presentation page.

Summary: Replaced `rebased` with `main`

Summary: Amended the Tail Duplication analysis pass to do the tail duplication in question

Summary: Remove llvm.patch from build instructions.

Summary: The linker can generate 8- or 16-byte entries in .plt.got and .plt.sec sections. On X86, the main differentiator is the presence of endbr64 instruction at the beginning of the entry. Detect the instruction and adjust the size accordingly.

…il call as annotation Summary: Remove `const` qualifier for methods that would add/remove `TC` annotation. This diff needs to be squashed into the Merge commit along with the fixup.

Summary: NFC. Remove `TAILJMP*` encoding changes from LLVM X86InstrControl.td. Specifically, revert TAILJMPs to a `PseudoI` insn type. With this removal, there are two ways to represent X86 tail calls: 1) Continue to use `TAILJMP*` opcodes, or 2) Use `TC` annotation, same as AArch64. Option 1) is a dead end, as TAILJMPs are now pseudo instructions, which exempts them from various analyses: e.g. `X86II::getMemoryOperandNo` stops working. Thus, the solution is to represent tail calls via annotation, for both X86 and AArch64, and to switch to JMP opcodes for X86 (function hashes will change as a result). Note: this change has ~6% time overhead, and neutral RSS-wise.

Summary: Move the common code into MCPlusBuilder.h. Use group 1 `kTailCall` MCAnnotation instead of dynamically allocated annotation. This diff reduces the processing time overhead to 1.5% vs using TAILJMP opcode.

Summary: This commit implements new method for _start & _fini functions hooking which allows to use relative jumps for future PIE & .so library support. Instead of using absolute address of _start & _fini functions known on linking stage - we'll use dynamically created trampoline functions and use corresponding symbols in instrumentation runtime library. As we would like to use instrumentation for dynamically loaded binaries (with PIE & .so), thus we need to compile instrumentation library with "-fPIC" flag to support relative address resolution for functions and data. For shared libraries we need to handle initialization of instrumentation library case by using DT_INIT section entry point. Also this commit adds detection if the binary is executable or shared library based on existence of PT_INTERP header. In case of shared library we save information about real library init function address for further usage for instrumentation library init trampoline function creation and also update DT_INIT to point instrumentation library init function. Functions called from init/fini functions should be called with forced stack alignment to avoid issues with instructions which relies on it. E.g. optimized string operations. Vasily Leonenko, Advanced Software Technology Lab, Huawei

Summary: This commit adds support for getting directory entries and reading value of a symbolic link in instrumentation runtime library Elvina Yakubova, Advanced Software Technology Lab, Huawei

…oc/self/map_files Summary: This commit adds support for opening libs based on links /proc/self/map_files. For this we're getting current virtual address and searching the lib in the directory with such address range. After that, we're getting full path to the binary by using readlink function. Direct read from link in /proc/self/map_files entries is not possible because of lack of permissions. Elvina Yakubova, Advanced Software Technology Lab, Huawei

Summary: This commit introduces static binaries instrumentation support. Note that current implementation does not support profile output on the instrumented binary finalization. So it requires to use -instrumentation-sleep-time=N (N>0) option usage. Note: There is unhandled case with static PIE executable which might have dynamic header. Vasily Leonenko, Advanced Software Technology Lab, Huawei

Summary: This commit fixes runtime instrumentation handlers for PIE binaries case. Vladislav Khmelevsky, Advanced Software Technology Lab, Huawei

Summary: Vasily Leonenko, Advanced Software Technology Lab, Huawei

Summary: This commit introduces -instrumentation-binpath argument used to point instuqmented binary in runtime in case if /proc/self/map_files path is not accessible due to access restriction issues. Vasily Leonenko Advanced Software Technology Lab, Huawei

Summary: The trampolines are no loger pointers to the functions. For propper name resolving by bolt use extern "C" for all external symbols in instr.cpp Vladislav Khmelevsky, Advanced Software Technology Lab, Huawei

…ibrary Summary: To avoid RELATIVE relocations avoid using of GOT table by using hidden visibility for all symbols in library. Vladislav Khmelevsky, Advanced Software Technology Lab, Huawei

Summary: This commit adds dummy tests for checking instrumentation support for PIE executables and shared libraries. Vasily Leonenko, Advanced Software Technology Lab, Huawei

Summary: This is a step1, mechanical refactor, of moving the bulk of llvm-dwp functionality in to a library. This should allow other tools, like BOLT, to re-use some of the llvm-dwp functionality.

Summary: This is follow up to https://reviews.llvm.org/D106198 where llvm-dwp was refactored in to multiple files. In this patch moving them in to lib/include directories.

Summary: This is aimed at reducing our changes to LLVM. We originally added support to LLVM MC layer to be able to encode DWARF expressions out of an intermediate representation in memory that is based on DWARFDebugFrame parsing of these expressions. However, this is an overkill and BOLT does not need the ability to re-encode DWARF expressions. So, for now, let's remove this support and make BOLT read DWARF expressions as a blob of data, and write it back to the binary using cfi_escape, the same method that some LLVM backends (AArch64) use to encode DWARF expressions into CFIs when they need them. We are still left with a one line change to LLVM libraries, to allow us to read the raw bytes of the DWARF expression out of DWARFDebugFrame parsing structures, so we can write them back with cfi_escape.

maksfb and others added 30 commits August 5, 2021 11:57

[BOLT] Delete ExecutableFileMemoryManager::registerNoteSection()

ef9bc41

Summary: The interface is no longer in use. (cherry picked from commit 460bdaaf5c48910d228d17211548bdc410f89d44)

[BOLT][NFC] Remove unused BinarySection member functions

1bd8813

(cherry picked from commit b392da1fbd055a555ef0fab3f943bbf214ace0ee)

[BOLT][NFC] Minor refactoring of RewriteInstance

d409c68

(cherry picked from commit 0290dbaa15546fafa27a9a17240f0da4218299ce)

[BOLT][NFC] Factor out relocation processing

13093cc

(cherry picked from commit d4667d4e3eb21f481d2fd8df30d1be160fa0f646)

[BOLT] Fix begin decrementing

2c54b7f

Summary: Fix begin decrementing. (cherry picked from commit 664a2dbdae4741bb4a6048c9bb5de6e262708845)

[BOLT][NFC] Get rid of BestFit parameter

9864776

Summary: The parameter is no longer used. (cherry picked from commit 7ac00974b07895aeb9ad53f5f417119c8b15c7e5)

[BOLT] Remove allow-section-relocations option

876a732

Summary: The option is not used. Remove all related code. (cherry picked from commit ad05832b1e039215512df7a5e652931210963d6a)

[BOLT] Mark functions containing data as non-simple

78e4214

Summary: Temporarily mark functions containing data as non-simple. (cherry picked from commit 0e099b021028f71c447adc3d531b992313d2ae76)

[BOLT] Uniquify names of local symbols

7e83a0c

Summary: 1. Uniquify names of local symbols. 2. Handle aliases. (cherry picked from commit ced7da5591a5e688f4725d872355feae4800bc4d)

[BOLT] Refactor emission of original .eh_frame

72471e7

Summary: There is no need to treat the emission of the original `.eh_frame` section as a special case. (cherry picked from commit 79adc11ef27b9a15aa417e0d9ae8581a664dd2ab)

[BOLT] Refactor section prefixes

d963884

(cherry picked from commit bd7ce3bc754556a7d4a31bc626f57148ffb89be0)

[BOLT] Refactor ELF symbol table rewriting code

69d1e88

Summary: Make ELF symbol table rewriting code more structured. While at it, remove symbols from non-allocatable sections. (cherry picked from commit 751a3b565878d16cd124dfddfb07baa3fddb03a4)

[BOLT][DWARF] Add support for base address in DWARF location lists

55c46c0

Summary: The version of LLVM that we are based on lacks the support for base address in DWARF location lists. Add the missing pieces. (cherry picked from commit 39a52458e17adcdc8e718efbc690f469a5fdc44a)

[BOLT] Emit ICF symbols for large functions

ce288a1

Summary: In non-relocation mode, make sure we emit extra symbols for a folded function even if the function was not overwritten due to its large size. (cherry picked from commit 7c395040de56a6554910144070c70e217ebaa512)

[BOLT] Option to control .text alignment

ce92b98

Summary: Add option `-align-text=<n>` to control .text alignment within a segment. Set to page size by default. (cherry picked from commit ec8e4998bb726a188d01293cfc686a7ba5e58e5a)

[BOLT] Speedup PLT processing

ed19f13

Summary: With larger PLT sizes, linear PLT symbol name lookup becomes a bottleneck. (cherry picked from commit 8f13e6a001a2da6127e2724687644efdb49eff24)

[BOLT][NFC] Change wording while reporting functions stats

c2c2d8b

Summary: (cherry picked from commit 47a9c165d27c030516c08416a94030a107cdb096)

Vasily Leonenko and others added 23 commits August 5, 2021 11:59

Replace LLVM's README with our own

0e443f8

Summary: This shouldn't be upstreamed, it's just for our github presentation page.

[BOLT][GitHub] Updated the branch that triggers docker-image Action

c9f0287

Summary: Replaced `rebased` with `main`

[BOLT] Tail Duplication active pass

fa8b0a4

Summary: Amended the Tail Duplication analysis pass to do the tail duplication in question

[BOLT] Update build instructions in README

fb12b4e

Summary: Remove llvm.patch from build instructions.

[BOLT][NFC] Un-const some MCPlusBuilder methods in preparation for ta…

c770c11

…il call as annotation Summary: Remove `const` qualifier for methods that would add/remove `TC` annotation. This diff needs to be squashed into the Merge commit along with the fixup.

[BOLT][NFC] Unify isTailCall interface across X86 and AArch64

08c9c99

Summary: Move the common code into MCPlusBuilder.h. Use group 1 `kTailCall` MCAnnotation instead of dynamically allocated annotation. This diff reduces the processing time overhead to 1.5% vs using TAILJMP opcode.

[PR] Instrumentation: Add readlink and getdents support

e264ce8

Summary: This commit adds support for getting directory entries and reading value of a symbolic link in instrumentation runtime library Elvina Yakubova, Advanced Software Technology Lab, Huawei

[PR] Instrumentation: Fix runtime handlers for PIE files

03700b7

Summary: This commit fixes runtime instrumentation handlers for PIE binaries case. Vladislav Khmelevsky, Advanced Software Technology Lab, Huawei

[PR] README: remove note about experimental status of instrumentation

1437d30

Summary: Vasily Leonenko, Advanced Software Technology Lab, Huawei

[PR] Instrumentation: Fix start and fini trampoline pointers

083a705

Summary: The trampolines are no loger pointers to the functions. For propper name resolving by bolt use extern "C" for all external symbols in instr.cpp Vladislav Khmelevsky, Advanced Software Technology Lab, Huawei

[PR] Instrumentation: Avoid generating GOT table in instrumentation l…

db9a6a0

…ibrary Summary: To avoid RELATIVE relocations avoid using of GOT table by using hidden visibility for all symbols in library. Vladislav Khmelevsky, Advanced Software Technology Lab, Huawei

[PR] Tests: add instrumentation tests for PIE exec & shared libs

440f71f

Summary: This commit adds dummy tests for checking instrumentation support for PIE executables and shared libraries. Vasily Leonenko, Advanced Software Technology Lab, Huawei

[DWP] Refactoring llvm-dwp in to a library

a1a27f3

Summary: This is a step1, mechanical refactor, of moving the bulk of llvm-dwp functionality in to a library. This should allow other tools, like BOLT, to re-use some of the llvm-dwp functionality.

[DWP] Fix for Refactoring llvm-dwp in to a library

ddb78af

[DWP] Refactoring llvm-dwp in to a library part 2

dafed12

Summary: This is follow up to https://reviews.llvm.org/D106198 where llvm-dwp was refactored in to multiple files. In this patch moving them in to lib/include directories.

facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Aug 5, 2021

rafaelauler force-pushed the main branch from dafed12 to 37e51a3 Compare August 6, 2021 23:07

rafaelauler force-pushed the main branch from 37e51a3 to c888af0 Compare August 17, 2021 23:44

rafaelauler force-pushed the main branch from c888af0 to 3c7f836 Compare August 25, 2021 20:21

rafaelauler force-pushed the main branch from 3c7f836 to f5840b0 Compare September 2, 2021 01:02

rafaelauler force-pushed the main branch from f5840b0 to 09adc38 Compare September 9, 2021 20:24

rafaelauler closed this Sep 9, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BOLT] [NFC] Remove special DWARF expressions handling from LLVM #196

[BOLT] [NFC] Remove special DWARF expressions handling from LLVM #196

rafaelauler commented Aug 5, 2021

[BOLT] [NFC] Remove special DWARF expressions handling from LLVM #196

[BOLT] [NFC] Remove special DWARF expressions handling from LLVM #196

Conversation

rafaelauler commented Aug 5, 2021