[BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening #120064
Merged
32 commits:
- 860fc0c [BOLT] Recognize paciasp and autiasp instructions (bgergely0)
- 8fe3244 [BOLT] Support for OpNegateRAState (bgergely0)
- e81445c [BOLT][AArch64] Fix which PSign and PAuth variants are used (#120064) (bgergely0)
- de4b989 [BOLT] Add unit tests for negate_ra_state cfi handling (bgergely0)
- 1ddea7b [BOLT] Basic exception unwinding test (bgergely0)
- 0cec69a [BOLT] Add OpNegateRAState to printCFI (bgergely0)
- 9e19f0a [BOLT] Improve function splitting at OpNegateRAState handling (bgergely0)
- 8182c9b [BOLT] Bugfix: CFIs can be placed before the first Instruction (bgergely0)
- 0bfae2c [BOLT] Add function-splitting test (bgergely0)
- 2a5fd80 [BOLT] Improve warnings in MarkRAStates (bgergely0)
- 0c28761 [BOLT] Improve negate-ra-state-incorrect test (bgergely0)
- 054f614 [BOLT] Do run negate-ra-state rewriting passes on all functions (bgergely0)
- 855c74d [BOLT] Introduce --disallow-pacret flag (bgergely0)
- 4a976a4 [BOLT][NFC] Review nits (bgergely0)
- 7df2811 [BOLT] Refactor and improve InsertNegateRAStatePass (bgergely0)
- c75645f [BOLT] Add function splitting lit test without runtime dependency (bgergely0)
- ad2f029 [BOLT] Remove unnecessary script, and rewrite unit test using it (bgergely0)
- 8b5b732 [BOLT][NFC] Simplify RAState tracking (bgergely0)
- 910c091 [BOLT] Add negate-ra-state-reorder test (bgergely0)
- 3b73dda [BOLT] Print stats in MarkRAStates, InsertNegateRAState (bgergely0)
- d2141bc [BOLT] Added docs/PacRetDesign.md (bgergely0)
- 250f060 [BOLT] Review nits (bgergely0)
- 5cbd4ce [BOLT] Fix multithreading in #120064 (bgergely0)
- 8e03527 [BOLT] Address review (bgergely0)
- fd3642a [BOLT] Change scheduling policy to SP_INT_LINEAR (bgergely0)
- e12cb5b [BOLT] Rename pac-ret feature flag (bgergely0)
- 532c150 Update InsertNegateRAStatePass (bgergely0)
- 35cf13d [BOLT] Review changes (bgergely0)
- c0b4df4 [BOLT] Remove workaround (bgergely0)
- 11e897d [BOLT] Update bolt/docs/PacRetDesign.md (bgergely0)
- ea9120e Merge branch 'main' into negate-ra-state-support-v2 (bgergely0)
- b375f75 Update bolt/docs/PacRetDesign.md (bgergely0)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,228 @@ | ||
# Optimizing binaries with pac-ret hardening | ||
|
||
This is a design document about processing the `DW_CFA_AARCH64_negate_ra_state` | ||
DWARF instruction in BOLT. As it describes internal design decisions, the | ||
intended audience is BOLT developers. The document is an updated version of the | ||
[RFC posted on the LLVM Discourse](https://discourse.llvm.org/t/rfc-bolt-aarch64-handle-opnegaterastate-to-enable-optimizing-binaries-with-pac-ret-hardening/86594). | ||
|
||
|
||
`DW_CFA_AARCH64_negate_ra_state` is also referred to as `.cfi_negate_ra_state` | ||
in assembly, or `OpNegateRAState` in BOLT sources. In this document, I will use | ||
**negate-ra-state** as a shorthand. | ||
|
||
## Introduction | ||
|
||
### Pointer Authentication | ||
|
||
For more information, see the [pac-ret section of the BOLT-binary-analysis document](BinaryAnalysis.md#pac-ret-analysis). | ||
|
||
### DW_CFA_AARCH64_negate_ra_state | ||
|
||
The negate-ra-state CFI is a vendor-specific Call Frame Instruction defined in | ||
the [Arm ABI](https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst#id1). | ||
|
||
``` | ||
The DW_CFA_AARCH64_negate_ra_state operation negates bit[0] of the RA_SIGN_STATE pseudo-register. | ||
``` | ||
|
||
This bit indicates to the unwinder whether the current return address is signed | ||
or not (hence the name). The unwinder uses this information to authenticate the | ||
pointer, and remove the Pointer Authentication Code (PAC) bits. | ||
Incorrect placement of negate-ra-state CFIs causes the unwinder to either attempt | ||
to authenticate an unsigned pointer (resulting in a segmentation fault), or skip | ||
authentication on a signed pointer, which can also cause a fault. | ||
|
||
Note: some unwinders use the `xpac` instruction to strip the PAC bits without | ||
authenticating the pointer. This is an incorrect (incomplete) implementation, | ||
as it allows control-flow modification in the case of unwinding. | ||
|
||
There are no DWARF instructions to directly set or clear the RA State. However, | ||
two other CFIs can also affect the RA state: | ||
- `DW_CFA_remember_state`: this CFI stores register rules onto an implicit stack. | ||
- `DW_CFA_restore_state`: this CFI pops rules from this stack. | ||
|
||
Example: | ||
|
||
| CFI | Effect on RA state | | ||
| ------------------------------ | ------------------------------ | | ||
| (default) | 0 | | ||
| DW_CFA_AARCH64_negate_ra_state | 0 -> 1 | | ||
| DW_CFA_remember_state | 1 pushed to the stack | | ||
| DW_CFA_AARCH64_negate_ra_state | 1 -> 0 | | ||
| DW_CFA_restore_state | 0 -> 1 (popped from the stack) | | ||
|
||
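The table above can be replayed with a small model. This is only a sketch for illustration: the function name `ra_states` and the string names of the CFIs are made up here, and the model tracks only bit[0] of RA_SIGN_STATE, whereas a real unwinder saves and restores all register rules on remember/restore.

```python
def ra_states(cfis, initial=0):
    """Replay a CFI sequence and return the RA state after each CFI.

    Simplified model: only bit[0] of RA_SIGN_STATE is tracked.
    """
    state = initial
    stack = []  # implicit stack used by remember_state / restore_state
    trace = []
    for cfi in cfis:
        if cfi == "negate_ra_state":
            state ^= 1                 # flip bit[0]
        elif cfi == "remember_state":
            stack.append(state)        # save the current rule
        elif cfi == "restore_state":
            state = stack.pop()        # bring back the saved rule
        else:
            raise ValueError(f"unhandled CFI: {cfi}")
        trace.append(state)
    return trace


# Same sequence as in the table: 0 -> 1, push 1, 1 -> 0, pop 1.
print(ra_states(["negate_ra_state", "remember_state",
                 "negate_ra_state", "restore_state"]))
```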
The Arm ABI also defines the DW_CFA_AARCH64_negate_ra_state_with_pc CFI, but it | ||
is not widely used, and is [likely to become deprecated](https://github.com/ARM-software/abi-aa/issues/327). | ||
|
||
### Where are these CFIs needed? | ||
|
||
Whenever two consecutive instructions have different RA states, the unwinder must | ||
be informed of the change. This typically occurs during pointer signing or | ||
authentication. If adjacent instructions differ in RA state but neither signs | ||
nor authenticates the return address, they must belong to different control flow | ||
paths. One is part of an execution path with signed RA, the other is part of a | ||
path with an unsigned RA. | ||
|
||
In the example below, the first BasicBlock ends in a conditional branch, and | ||
jumps to two different BasicBlocks, each with their own authentication, and | ||
return. The instructions on the border of the second and third BasicBlock have | ||
different RA states. The `ret` at the end of the second BasicBlock is in unsigned | ||
state. The start of the third BasicBlock is after the `paciasp` in the control | ||
flow, but before the authentication. In this case, a negate-ra-state is needed | ||
at the end of the second BasicBlock. | ||
|
||
``` | ||
+----------------+ | ||
| paciasp | | ||
| | | ||
| b.cc | | ||
+--------+-------+ | ||
| | ||
+----------------+ | ||
| | | ||
| +--------v-------+ | ||
| | | | ||
| | autiasp | | ||
| | ret | // RA: unsigned | ||
| +----------------+ | ||
+----------------+ | ||
| | ||
+--------v-------+ // RA: signed | ||
| | | ||
| autiasp | | ||
| ret | | ||
+----------------+ | ||
``` | ||
|
||
> [!important] | ||
> The unwinder does not follow the control flow graph. It reads unwind | ||
> information in the layout order. | ||
Because these locations are dependent on how the function layout looks, | ||
negate-ra-state CFIs will become invalid during BasicBlock reordering. | ||
|
||
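To illustrate why reordering invalidates the CFIs, the sketch below computes the positions where the RA state changes for two layouts of the same blocks. It is a toy model, not BOLT code: the block contents, the `bl handler` instruction, and the helper `state_change_points` are hypothetical, and each instruction is assumed to already carry its RA state.

```python
SIGNED, UNSIGNED = "signed", "unsigned"


def state_change_points(layout):
    """Return layout-order indices of instructions whose RA state differs
    from the previous instruction. The function is assumed to start in
    the unsigned state."""
    flat = [insn for block in layout for insn in block]
    points = []
    prev = UNSIGNED
    for i, (_, state) in enumerate(flat):
        if state != prev:
            points.append(i)
        prev = state
    return points


entry = [("paciasp", UNSIGNED), ("b.cc", SIGNED)]
cold = [("bl handler", SIGNED)]            # hypothetical cold block
exit_ = [("autiasp", SIGNED), ("ret", UNSIGNED)]

original = state_change_points([entry, cold, exit_])
reordered = state_change_points([entry, exit_, cold])
print(original, reordered)
```

In this model, moving the cold block behind the `ret` creates one more state change in layout order, so the set of CFIs valid for the first layout no longer describes the second.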
## Solution design | ||
|
||
The implementation introduces two new passes: | ||
1. `MarkRAStatesPass`: assigns the RA state to each instruction based on the CFIs | ||
in the input binary | ||
2. `InsertNegateRAStatePass`: reads those assigned instruction RA states after | ||
optimizations, and emits `DW_CFA_AARCH64_negate_ra_state` CFIs at the correct | ||
places: wherever there is a state change between two consecutive instructions | ||
in the layout order. | ||
|
||
To track metadata on individual instructions, the `MCAnnotation` class was | ||
extended. These also have helper functions in `MCPlusBuilder`. | ||
|
||
### Saving annotations at CFI reading | ||
|
||
CFIs are read and added to BinaryFunctions in `CFIReaderWriter::FillCFIInfoFor`. | ||
At this point, we add MCAnnotations about negate-ra-state, remember-state and | ||
restore-state CFIs to the instructions they refer to. This avoids interfering | ||
with the CFI processing that already happens in BOLT (e.g. remember-state and | ||
restore-state CFIs are removed in `normalizeCFIState` for reasons unrelated to PAC). | ||
|
||
As we add the MCAnnotations *to instructions*, we have to account for the case | ||
where the function starts with a CFI altering the RA state. A CFI changes the RA | ||
state of the instructions that follow it, so its annotation is attached to the | ||
instruction preceding it. A CFI at the very start of the function has no | ||
preceding instruction to annotate. | ||
This special case is handled by adding an `initialRAState` bool to each BinaryFunction. | ||
If the `Offset` the CFI refers to is zero, we don't store an annotation, but set | ||
`initialRAState` in `FillCFIInfoFor`. This information is then used in | ||
`MarkRAStates`. | ||
|
||
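The offset-zero special case can be sketched as follows. This is a simplified model under stated assumptions, not the code in `FillCFIInfoFor`: the function `attach_ra_annotations` and its data layout are hypothetical, the annotation is assumed to go on the instruction immediately before the CFI's offset, and only negate-ra-state is handled at offset zero.

```python
def attach_ra_annotations(insn_offsets, cfis):
    """insn_offsets: sorted start offsets of the function's instructions.
    cfis: (offset, name) pairs for the RA-related CFIs.

    Returns (annotations per instruction offset, initial RA state).
    """
    annotations = {off: [] for off in insn_offsets}
    initial_ra_state = False  # False: unsigned at function entry
    for offset, name in cfis:
        if offset == 0:
            # No preceding instruction exists: record the effect on the
            # function instead of on an instruction.
            if name == "negate_ra_state":
                initial_ra_state = not initial_ra_state
            continue
        # Attach to the last instruction that starts before the CFI.
        prev = max(o for o in insn_offsets if o < offset)
        annotations[prev].append(name)
    return annotations, initial_ra_state


offsets = [0, 4, 8, 12]
print(attach_ra_annotations(offsets, [(4, "negate_ra_state"),
                                      (12, "negate_ra_state")]))
print(attach_ra_annotations(offsets, [(0, "negate_ra_state")]))
```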
### Binaries without DWARF info | ||
|
||
In some cases, the DWARF tables are stripped from the binary. These programs | ||
usually have some other unwind mechanism. | ||
The new passes only run on functions that include at least one negate-ra-state CFI. | ||
This avoids processing functions that do not use Pointer Authentication, as well as | ||
functions that use Pointer Authentication but do not have DWARF info. | ||
|
||
In summary: | ||
- pointer auth is not used: no change, the new passes do not run. | ||
- pointer auth is used, but DWARF info is stripped: no change, the new passes | ||
do not run. | ||
- pointer auth is used, and we have DWARF CFIs: passes run, and rewrite the | ||
negate-ra-state CFI. | ||
|
||
### MarkRAStates pass | ||
|
||
This pass runs before optimizations reorder anything. | ||
|
||
It processes MCAnnotations generated during the CFI reading stage to check if | ||
instructions have any of the three CFIs that can modify RA state: | ||
- negate-ra-state, | ||
- remember-state, | ||
- restore-state. | ||
|
||
Then it adds new MCAnnotations to each instruction, indicating their RA state. | ||
Those annotations are: | ||
- Signed, | ||
- Unsigned. | ||
|
||
Below is a simple example that shows the two different types of annotations: | ||
what we have before the pass, and after it. | ||
|
||
| Instruction | Before | After | | ||
| ----------------------------- | --------------- | -------- | | ||
| paciasp | negate-ra-state | unsigned | | ||
| stp x29, x30, [sp, #-0x10]! | | signed | | ||
| mov x29, sp | | signed | | ||
| ldp x29, x30, [sp], #0x10 | | signed | | ||
| autiasp | negate-ra-state | signed | | ||
| ret | | unsigned | | ||
|
||
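The forward walk behind this table can be modelled in a few lines. The sketch below is not the pass itself: `mark_ra_states` and the list-of-tuples input are hypothetical stand-ins for BOLT's instructions and MCAnnotations, and it assumes that a CFI recorded on an instruction only affects the instructions after it.

```python
def mark_ra_states(insns, initial_signed=False):
    """insns: list of (mnemonic, [CFI annotations]) in original order.
    Returns the RA state assigned to each instruction."""
    signed = initial_signed
    stack = []   # models remember-state / restore-state
    states = []
    for _, cfis in insns:
        # The instruction itself is still in the current state ...
        states.append("signed" if signed else "unsigned")
        # ... and the CFIs recorded on it apply to what follows.
        for cfi in cfis:
            if cfi == "negate-ra-state":
                signed = not signed
            elif cfi == "remember-state":
                stack.append(signed)
            elif cfi == "restore-state":
                signed = stack.pop()
    return states


function = [
    ("paciasp", ["negate-ra-state"]),
    ("stp x29, x30, [sp, #-0x10]!", []),
    ("mov x29, sp", []),
    ("ldp x29, x30, [sp], #0x10", []),
    ("autiasp", ["negate-ra-state"]),
    ("ret", []),
]
print(mark_ra_states(function))
```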
##### Error handling in MarkRAStates pass: | ||
|
||
Whenever the MarkRAStates pass finds inconsistencies in the current | ||
BinaryFunction, it marks the function as ignored using `BF.setIgnored()`. BOLT | ||
will not optimize this function but will emit it unchanged in the original section | ||
(`.bolt.org.text`). | ||
|
||
The inconsistencies are as follows: | ||
- finding a `pac*` instruction when already in signed state | ||
- finding an `aut*` instruction when already in unsigned state | ||
- finding `pac*` and `aut*` instructions without `.cfi_negate_ra_state`. | ||
|
||
Users will be informed about the number of ignored functions in the pass, the | ||
exact functions ignored, and the found inconsistency. | ||
|
||
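The three inconsistencies listed above can be expressed as a small checker. This is an illustrative sketch only: `find_inconsistency` is a hypothetical helper, and matching on the `pac`/`aut` mnemonic prefix is a simplification, since only the variants that sign or authenticate the return address are relevant in practice.

```python
def find_inconsistency(insns, initial_signed=False):
    """insns: list of (mnemonic, has_negate_ra_state) in original order.
    Returns a description of the first inconsistency, or None."""
    signed = initial_signed
    for mnemonic, has_negate in insns:
        is_sign = mnemonic.startswith("pac")   # simplification, see above
        is_auth = mnemonic.startswith("aut")
        if is_sign and signed:
            return f"{mnemonic}: RA is already signed"
        if is_auth and not signed:
            return f"{mnemonic}: RA is already unsigned"
        if (is_sign or is_auth) and not has_negate:
            return f"{mnemonic}: missing .cfi_negate_ra_state"
        if has_negate:
            signed = not signed
    return None


ok = [("paciasp", True), ("stp", False), ("ldp", False),
      ("autiasp", True), ("ret", False)]
print(find_inconsistency(ok))
print(find_inconsistency([("paciasp", True), ("paciasp", True)]))
```

A function for which such a check reports a problem would be the kind of function the pass marks as ignored.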
### InsertNegateRAStatePass | ||
|
||
This pass runs after optimizations. It performs the _inverse_ of the MarkRAStates pass: | ||
1. it reads the RA state annotations attached to the instructions, and | ||
2. whenever the state changes, it adds a PseudoInstruction that holds an | ||
OpNegateRAState CFI. | ||
|
||
##### Covering newly generated instructions: | ||
|
||
Some BOLT passes can add new Instructions. In InsertNegateRAStatePass, we have | ||
to know what RA state these have. | ||
|
||
The current solution has the `inferUnknownStates` function to cover these, using | ||
a fairly simple strategy: unknown states inherit the last known state. | ||
|
||
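The "inherit the last known state" strategy can be sketched as below. This models the idea only and is not the body of `inferUnknownStates`: `None` stands for an instruction without an RA state annotation, and falling back to an `initial` state for leading unknowns is an assumption of this sketch.

```python
def infer_unknown_states(states, initial="unsigned"):
    """Replace unknown (None) entries with the last known state,
    walking the instructions in layout order."""
    inferred = []
    last = initial
    for state in states:
        if state is None:
            state = last   # newly added instruction: inherit
        inferred.append(state)
        last = state
    return inferred


print(infer_unknown_states(["unsigned", "signed", None, None, "unsigned"]))
```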
This will be updated to a more robust solution. | ||
|
||
> [!important] | ||
> As issue #160989 describes, unwind info is incorrect in stubs with multiple callers. | ||
> For this same reason, we cannot generate correct pac-specific unwind info: the signedness | ||
> of the _incorrect_ return address is meaningless. | ||
### Optimizations requiring special attention | ||
|
||
Marking states before optimizations ensures that instructions can be moved around | ||
freely. The only special case is function splitting. When a function is split, | ||
the split part becomes a new function in the emitted binary. For unwinding to | ||
work, it needs to "replay" all CFIs that lead up to the split point. BOLT does | ||
this for other CFIs. As negate-ra-state is not read (only stored as an Annotation), | ||
we have to do this manually in InsertNegateRAStatePass. Here, if the split part | ||
starts with an instruction that has Signed RA state, we add a negate-ra-state CFI | ||
to indicate this. | ||
|
||
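The insertion rule, including the function-splitting case, can be modelled as follows. This is a sketch and not the pass's implementation: `insert_negate_ra_state`, the string marker, and the example fragments (with a hypothetical `bl handler` cold block) are made up for illustration, and every fragment is assumed to start in the default unsigned state.

```python
def insert_negate_ra_state(fragments):
    """fragments: list of fragments, each a list of (mnemonic, state)
    in final layout order. Returns the fragments with
    "negate-ra-state" markers inserted at every state change."""
    result = []
    for fragment in fragments:
        out = []
        prev = "unsigned"  # each emitted fragment starts with RA state 0
        for mnemonic, state in fragment:
            if state != prev:
                out.append("negate-ra-state")
            out.append(mnemonic)
            prev = state
        result.append(out)
    return result


hot = [("paciasp", "unsigned"), ("stp", "signed"), ("b.cc", "signed"),
       ("ldp", "signed"), ("autiasp", "signed"), ("ret", "unsigned")]
cold = [("bl handler", "signed")]   # split part, reached with signed RA

for fragment in insert_negate_ra_state([hot, cold]):
    print(fragment)
```

Resetting `prev` per fragment is what yields the extra CFI at the start of a split part whose first instruction is in the signed state.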
## Option to disallow the feature | ||
|
||
The feature can be guarded with the `--update-branch-protection` flag, which is | ||
on by default. If the flag is set to false and any function reports | ||
`containedNegateRAState()` after `FillCFIInfoFor()`, BOLT exits with an error. |