Skip to content
This repository has been archived by the owner on Jul 1, 2023. It is now read-only.

[BOLT] Handle multiple pred->succ edges in removeAllSuccessors #201

Closed
wants to merge 924 commits into from

Conversation

aaupov
Copy link
Contributor

@aaupov aaupov commented Aug 16, 2021

For basic blocks with multiple edges to the same successor, the default
behavior of removePredecessor is to remove all occurrences of the
predecessor block in its predecessor list (Multiple=true).

Example:
A -> B (two edges)

A->removeAllSuccessors()
  for each successor of block A: // B twice
  // this removes both occurrences of A in B's predecessors list
  B->removePredecessor(A);
  // this invocation triggers an assert as A is no longer in B's
  // predecessor list
  B->removePredecessor(A);

If removePredecessor removes one block at a time (Multiple=false),
this removes blocks from A's successor list and B's predecessor list in
sync.

Resolves #187.

Test plan: ninja check-bolt
Tested on inputs in #187 (Pyston binary)

rafaelauler and others added 30 commits August 6, 2021 15:44
Summary:
Shrink wrapping has a mode where it will directly move push
pop pairs, instead of replacing them with stores/loads. This is an
ambitious mode that is triggered sometimes, but whenever matching with
a push, it would operate with the assumption that the restoring
instruction was a pop, not a load, otherwise it would assert. Fix this
assertion to bail nicely back to non-pushpop mode (use regular store and
load instructions).

(cherry picked from commit 2ff448dee7461b1b31ae06837e8d859b77985e6b)
(cherry picked from commit d4667d4e3eb21f481d2fd8df30d1be160fa0f646)
Summary: Fix begin decrementing.

(cherry picked from commit 664a2dbdae4741bb4a6048c9bb5de6e262708845)
Summary: The parameter is no longer used.

(cherry picked from commit 7ac00974b07895aeb9ad53f5f417119c8b15c7e5)
Summary: The option is not used. Remove all related code.

(cherry picked from commit ad05832b1e039215512df7a5e652931210963d6a)
Summary: Temporarily mark functions containing data as non-simple.

(cherry picked from commit 0e099b021028f71c447adc3d531b992313d2ae76)
Summary:
1. Uniquify names of local symbols.
2. Handle aliases.

(cherry picked from commit ced7da5591a5e688f4725d872355feae4800bc4d)
Summary:
There is no need to treat the emission of the original `.eh_frame`
section as a special case.

(cherry picked from commit 79adc11ef27b9a15aa417e0d9ae8581a664dd2ab)
Summary:
This is a prerequisite for larger emitter refactoring.

Since .dynamic is read unconditionally, add an error message if the
section is missing, or the size of the section is zero.

(cherry picked from commit 70c0e23504ef8463ef15950690f458affc6ba511)
Summary:
Consolidate code and data emission code in ELF-independent
BinaryEmitter. The high-level interface includes only two
functions emitBinaryContext() and emitFunctionBody() used
by RewriteInstance and BinaryContext respectively.

(cherry picked from commit 55c0e69dfd6ecf65c9d93340347f52e4751210bc)
(cherry picked from commit bd7ce3bc754556a7d4a31bc626f57148ffb89be0)
Summary:
Make ELF symbol table rewriting code more structured. While at it,
remove symbols from non-allocatable sections.

(cherry picked from commit 751a3b565878d16cd124dfddfb07baa3fddb03a4)
Summary:
The version of LLVM that we are based on lacks the support for base
address in DWARF location lists. Add the missing pieces.

(cherry picked from commit 39a52458e17adcdc8e718efbc690f469a5fdc44a)
Summary:
Some functions may have exactly the same code and exception handlers.
However, their action tables could be different leading to mismatching
semantics. We should verify their equivalence while running ICF.

(cherry picked from commit 2449406494a66fbe4995f92747795fc130ae7907)
Summary:
ICF may fold functions in arbitrary order when running multi-threaded.
This is fine in relocation mode as we end up with just one function
holding all function symbols.

However, in non-relocation mode we keep all function bodies, and if we
keep merging profiles in non-deterministic order, we end up with
functions with non deterministic profiles. The fix for non-relocation
mode is to not merge profiles as the factual new profile could be
different from the merged one since both function instances are
potentially callable.

Additionally, emit extra symbols for ICF functions in non-relocation
mode to make it possible to track the folding.

(cherry picked from commit 201427172392b8fe17ac91d09a3b1f9db5a38e26)
Summary:
Too many hash collisions may cause ICF to run slowly.

We used to hash BinaryFunction only looking at instruction opcodes,
ignoring instruction operands. With many almost identical functions,
such approach may lead to long ICF processing time. By including
operands into the hash, we reduce the number of collisions and
improve the runtime often by a factor of 2 or more.

(cherry picked from commit d425f3c272fdf5f15e68d62e6f4880b976ed58e6)
Summary:
Further speedup ICF by applying stricter rules for congruent functions.
While checking symbolic operands in congruent functions, consider
operands congruent only if they are equal or reference functions
with identical hashes, i.e. potentially foldable functions.
Note that jump table operands are handled as a special case.

(cherry picked from commit 03bca75462804d9ab12b64530e55ca376e0fef24)
Summary:
Indirect calls that use RSP to compute the target address would
break in instrumentation mode because we were adding instructions that
changed the stack pointer. Fix this.

(cherry picked from commit 7815cc30c420107cfeee77d42b2acdeb0bca1035)
Summary:
RuntimeDyldImpl::resolveExternalSymbols() some time ago used to call
getSymbolAddress() while in the second loop. That call could have
modified the contents of ExternalSymbolRelocations that the loop was
iterating over. Thus the code was written in a way that erased the
processed entry on every loop iteration and reset the map
iterator. With large number of entries in ExternalSymbolRelocations the
loop code becomes a performance bottleneck.

Since getSymbolAddress() is no longer used, the
ExternalSymbolRelocations could be iterated in a straightforward way and
the map cleared before the function exit.

(cherry picked from commit 55ed8de2b3fe59ac84eef9fbb341185b6b1763dc)
Summary:
In a rare case, we may fold a function and fail to emit it in
non-relocation mode due to a function size increase. At the same time,
the function that the original function was folded into could have been
successfully emitted, e.g. because it was split in the presence of a
profile information.

Later, because the function was not emitted, we have to use its original
.eh_frame entry in the preserved .eh_frame section. However, that entry
is no longer referencing the original function, but the function that
the original was folded into. This happens since the original symbol gets
emitted at the other function location. As a result, .eh_frame entry for
the folded function is missing.

To prevent incorrect update of the original .eh_frame, create
relocations against absolute values. This guarantees preservation of the
section contents while updating pc-relative references.

(cherry picked from commit 8f7b7c86e2acc79cb553705d2431fb68002701d7)
Summary:
In non-relocation mode, make sure we emit extra symbols for a folded
function even if the function was not overwritten due to its large
size.

(cherry picked from commit 7c395040de56a6554910144070c70e217ebaa512)
Summary:
Add option `-align-text=<n>` to control .text alignment within a
segment. Set to page size by default.

(cherry picked from commit ec8e4998bb726a188d01293cfc686a7ba5e58e5a)
Summary:
In relocation mode, there is no use for old .eh_frame section. Moreover,
it can interfere with new EH frames via .eh_frame_hdr when the original
.text is reused.

(cherry picked from commit 2eaba65a3b22e629e2622566516c1336a1d431fa)
Summary:
Add an option to fail processing of the input binary if the profile
is not accurate:

  -stale-threshold=<uint>
    - maximum percentage of stale functions to tolerate (default: 100)

Default (100) means never to fail.

A function profile is considered stale if any branch in its profile
has invalid source or destination.

Use `-stale-threshold=0` to fail if any staleness is detected in the
profile.

(cherry picked from commit 402735094214eabe994f7b3c58a29524db3e5acd)
Summary:
With larger PLT sizes, linear PLT symbol name lookup becomes a
bottleneck.

(cherry picked from commit 8f13e6a001a2da6127e2724687644efdb49eff24)
Summary:
(cherry picked from commit 47a9c165d27c030516c08416a94030a107cdb096)
Summary:
Some functions could be called at an address inside their function body.
Typically, these functions are written in assembly as C/C++ does not
have a multi-entry function concept. The addresses inside a function
body that could be referenced from outside are called secondary entry
points.

In BOLT we support processing functions with secondary/multiple entry
points. We used to mark basic blocks representing those entry points
with a special flag. There was only one problem - each basic block has
exactly one MCSymbol associated with it, and for the most efficient
processing we prefer that symbol to be local/temporary. However, in
certain scenarios, e.g. when running in non-relocation mode, we need
the entry symbol to be global/non-temporary.

We could create global symbols for secondary points ahead of time when
the entry point is marked in the symbol table. But not all such entries
are properly marked. This means that potentially we could discover an
entry point only after disassembling the code that references it, and
it could happen after a local label was already created at the same
location together with all its references. Replacing the local symbol
and updating the references turned out to be an error-prone process.

This diff takes a different approach. All basic blocks are created with
permanently local symbols. Whenever there's a need to add a secondary
entry point, we create an extra global symbol or use an existing one
at that location. Containing BinaryFunction maps a local symbol of a
basic block to the global symbol representing a secondary entry point.
This way we can tell if the basic block is a secondary entry point,
and we emit both symbols for all secondary entry points. Since secondary
entry points are quite rare, the overhead of this approach is minimal.

Note that the same location could be referenced via local symbol from
inside a function and via global entry point symbol from outside.
This is true for both primary and secondary entry points.

(cherry picked from commit 7596bdaff991b631d13b9546fbf083f35cd99faf)
Summary:
In non-relocation mode, the code for marking a function non-simple was
decoupled from the code that added new entry points.  Fix that.

(cherry picked from commit 3a1a070a6cd90f1fb78dc07c523b56caeddfa319)
Summary:
In non-strict relocation mode it was possible to miss a jump table
reference leading to incorrect code.

(cherry picked from commit e6fa6fcb2f90d28347a08fdcdd55536d04490533)
Summary:
The commit that fixed ICF determinism in non-relocation mode disabled
profile merging for functions. Dyno stats output needs to be updated to
reflect the lack of merge.

(cherry picked from commit 080cf7758203622bd8d0a1c8c627515527164a6b)
yota9 and others added 13 commits August 6, 2021 15:46
Summary:
This commit fixes runtime instrumentation handlers for PIE
binaries case.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Summary:
Vasily Leonenko,
Advanced Software Technology Lab, Huawei
Summary:
This commit introduces -instrumentation-binpath argument used
to point instuqmented binary in runtime in case if /proc/self/map_files
path is not accessible due to access restriction issues.

Vasily Leonenko
Advanced Software Technology Lab, Huawei
Summary:
The trampolines are no loger pointers to the functions.  For
propper name resolving by bolt use extern "C" for all external symbols
in instr.cpp

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
…ibrary

Summary:
To avoid RELATIVE relocations avoid using of GOT table
by using hidden visibility for all symbols in library.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Summary:
This commit adds dummy tests for checking instrumentation
support for PIE executables and shared libraries.

Vasily Leonenko,
Advanced Software Technology Lab, Huawei
Summary: This is a step1, mechanical refactor, of moving the bulk of llvm-dwp functionality in to a library. This should allow other tools, like BOLT, to re-use some of the llvm-dwp functionality.
Summary:
This is follow up to https://reviews.llvm.org/D106198 where llvm-dwp was refactored in to multiple files.
In this patch moving them in to lib/include directories.
Summary: Move TrapFillValue from LLVM MCAsmInfo to BOLT MCPlusBuilder. NFC.
Summary:
Remove the extra README in BOLT. Keep only the root level
one. Sync changes to instrumentation documentation.
Summary:
Remove NeverAlign fragment support to reduce our changes to LLVM.
The support is reintroduced in https://reviews.llvm.org/D97982
@facebook-github-bot facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Aug 16, 2021
@rafaelauler
Copy link
Contributor

LGTM

@aaupov
Copy link
Contributor Author

aaupov commented Dec 21, 2021

This should be made redundant by the new normalize-cfg pass introduced recently: d3b007d

@aaupov aaupov closed this Dec 21, 2021
@aaupov aaupov deleted the remove_all_succs branch March 20, 2022 12:29
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
CLA Signed Do not delete this pull request or issue due to inactivity.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

perf2bolt crashes