Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BOLT] Improve handling of relocations targeting specific instructions #66395

Merged
merged 2 commits into from
Oct 6, 2023

Conversation

mtvec
Copy link
Contributor

@mtvec mtvec commented Sep 14, 2023

On RISC-V, there are certain relocations that target a specific instruction instead of a more abstract location like a function or basic block. Take the following example that loads a value from symbol foo:

nop
1: auipc t0, %pcrel_hi(foo)
ld t0, %pcrel_lo(1b)(t0)

This results in two relocation:

  • auipc: R_RISCV_PCREL_HI20 referencing foo;
  • ld: R_RISCV_PCREL_LO12_I referencing to local label 1 which points to the auipc instruction.

It is of utmost importance that the R_RISCV_PCREL_LO12_I keeps referring to the auipc instruction; if not, the program will fail to assemble. However, BOLT currently does not guarantee this.

BOLT currently assumes that all local symbols are jump targets and always starts a new basic block at symbol locations. The example above results in a CFG the looks like this:

.BB0:
    nop
.BB1:
    auipc t0, %pcrel_hi(foo)
    ld t0, %pcrel_lo(.BB1)(t0)

While this currently works (i.e., the R_RISCV_PCREL_LO12_I relocation points to the correct instruction), it has two downsides:

  • Too many basic blocks are created (the example above is logically only one yet two are created);
  • If instructions are inserted in .BB1 (e.g., by instrumentation), things will break since the label will not point to the auipc anymore.

This patch proposes to fix this issue by teaching BOLT to track labels that should always point to a specific instruction. This is implemented as follows:

  • Add a new annotation type (kLabel) that allows us to annotate instructions with an MCSymbol *;
  • Whenever we encounter a relocation type that is used to refer to a specific instruction (Relocation::isInstructionReference), we register a new type of label (InstructionLabels) with the corresponding BinaryFunction;
  • During disassembly, add these instruction labels to the correct instructions;
  • During emission, emit these labels right before the instruction.

I believe the use of annotations works quite well for this use case as it allows us to reliably track instruction labels. If we were to store them as offsets in basic blocks, it would be error prone to keep them updated whenever instructions are inserted or removed.

I have chosen to add labels as first-class annotations (as opposed to a generic one) because the documentation of MCAnnotation suggests that generic annotations are to be used for optional metadata that can be discarded without affecting correctness. As this is not the case for labels, a first-class annotation seemed more appropriate.

@yota9
Copy link
Member

yota9 commented Sep 14, 2023

I'm not familiar yet with RISC asm, could you please describe why loading value from symbol results in text relocation (e.g. reference to the instruction)? Just interesting :)
Overall it seems to me reasonable and the implementation is nice, LGTM. Let's wait Meta team suggestions though.

On RISC-V, there are certain relocations that target a specific
instruction instead of a more abstract location like a function or basic
block. Take the following example that loads a value from symbol `foo`:

```
nop
1: auipc t0, %pcrel_hi(foo)
ld t0, %pcrel_lo(1b)(t0)
```

This results in two relocation:
- auipc: `R_RISCV_PCREL_HI20` referencing `foo`;
- ld: `R_RISCV_PCREL_LO12_I` referencing to local label `1` which points
  to the auipc instruction.

It is of utmost importance that the `R_RISCV_PCREL_LO12_I` keeps
referring to the auipc instruction; if not, the program will fail to
assemble. However, BOLT currently does not guarantee this.

BOLT currently assumes that all local symbols are jump targets and
always starts a new basic block at symbol locations. The example above
results in a CFG the looks like this:

```
.BB0:
    nop
.BB1:
    auipc t0, %pcrel_hi(foo)
    ld t0, %pcrel_lo(.BB1)(t0)
```

While this currently works (i.e., the `R_RISCV_PCREL_LO12_I` relocation
points to the correct instruction), it has two downsides:
- Too many basic blocks are created (the example above is logically only
  one yet two are created);
- If instructions are inserted in `.BB1` (e.g., by instrumentation),
  things will break since the label will not point to the auipc anymore.

This patch proposes to fix this issue by teaching BOLT to track labels
that should always point to a specific instruction. This is implemented
as follows:
- Add a new annotation type (`kLabel`) that allows us to annotate
  instructions with an `MCSymbol *`;
- Whenever we encounter a relocation type that is used to refer to a
  specific instruction (`Relocation::isInstructionReference`), we
  register a new type of label (`InstructionLabels`) with the
  corresponding `BinaryFunction`;
- During disassembly, add these instruction labels to the correct
  instructions;
- During emission, emit these labels right before the instruction.

I believe the use of annotations works quite well for this use case as
it allows us to reliably track instruction labels. If we were to store
them as offsets in basic blocks, it would be error prone to keep them
updated whenever instructions are inserted or removed.

I have chosen to add labels as first-class annotations (as opposed to a
generic one) because the documentation of `MCAnnotation` suggests that
generic annotations are to be used for optional metadata that can be
discarded without affecting correctness. As this is not the case for
labels, a first-class annotation seemed more appropriate.
@mtvec
Copy link
Contributor Author

mtvec commented Sep 14, 2023

I'm not familiar yet with RISC asm, could you please describe why loading value from symbol results in text relocation (e.g. reference to the instruction)? Just interesting :)

To materialize a 32-bit pc-relative address on RISC-V, you need two instruction: one auipc which adds a 20-bit immediate to the upper 20 bits of pc, and another I- or S-type instruction (e.g., ld, addi) to add the remaining 12 bits. The auipc gets a relocation pointing to the symbol you want the address of. The second instruction gets a relocation pointing to the auipc. The reason it has to point there (as opposed to the target symbol) is that the two instructions are not necessarily consecutive so the location of the auipc must be known in order to calculate the correct immediate for the second instruction.

See the psABI for more info.

@yota9
Copy link
Member

yota9 commented Sep 14, 2023

Thanks! Am I understand it right that this is from linker standpoint? In the final binary ld/addi would have just an imm operand.
But since we need lo12 offset between 1 instruction and the symbol we're addressing the 1st instruction (the second might have other HI address) and the linker knowing where we're addressing would find the 1st instruction, it's relocation and it's symbol to complete the lower part of the address.
If I'm right his schematic is soo weird to me. I mean at aarch64 we're doing:
adrp x0, symb // PC-relative high 20 bit of address. I think it is ~ to auipc
// Any count of instructions
add x0, x0, symb // lo-12 part of address is offset in 4k page. Since we're addressing inside the 4k page this offset would never be changed.
So really only the adrp is PC-relative, the second one is page-relative offset. But RISC-V for some reason seems doesn't want to use such schematics and at linker time it "truly" calculates the lower part of the offset using the 1st instruction as a base and it's relocation to find the symbol. I really had to strain my brain to understood this schematic).

@mtvec
Copy link
Contributor Author

mtvec commented Sep 15, 2023

Thanks! Am I understand it right that this is from linker standpoint? In the final binary ld/addi would have just an imm operand. But since we need lo12 offset between 1 instruction and the symbol we're addressing the 1st instruction (the second might have other HI address) and the linker knowing where we're addressing would find the 1st instruction, it's relocation and it's symbol to complete the lower part of the address.

That all sounds correct.

If I'm right his schematic is soo weird to me. I mean at aarch64 we're doing:
adrp x0, symb // PC-relative high 20 bit of address. I think it is ~ to auipc

It's similar to auipc but not the same. adrp clear the lower 12 bits while auipc doesn't so the latter doesn't result in a 4KiB-aligned address. This also means that auipc can, for example, be used to calculate the value of PC.

// Any count of instructions add x0, x0, symb // lo-12 part of address is offset in 4k page.

Since we're addressing inside the 4k page this offset would never be changed. So really only the adrp is PC-relative, the second one is page-relative offset. But RISC-V for some reason seems doesn't want to use such schematics and at linker time it "truly" calculates the lower part of the offset using the 1st instruction as a base and it's relocation to find the symbol. I really had to strain my brain to understood this schematic).

As far as I can tell, using something like adrp wouldn't work on RISC-V since the I- and S-type instructions it would need to pair-up with take a signed 12-bit immediate so cannot represent arbitrary page offsets. On AArch64 this can work since these instructions have multiple addressing modes including one with unsigned 12-bit offsets.

@yota9
Copy link
Member

yota9 commented Sep 15, 2023

Thanks for clarifications, Job! Now it seems to be completely clear to me :)

@mtvec
Copy link
Contributor Author

mtvec commented Sep 19, 2023

I've noticed one downside/annoyance with this approach. When emitting loads/stores for instrumentation (e.g., in createInstrIncMemory), I need to attach labels to instructions (same as the example given in the description of this patch). However, createInstrIncMemory is const while setLabel cannot be. Moreover, I really should add an AllocatorIdTy argument to setLabel and this means that createInstrIncMemory also needs one (it's called in the context of runOnEachFunctionWithUniqueAllocId).

None of this is a blocker (I have a working instrumentation implementation using the approach above) but it feels slightly annoying so I'd be interested in thoughts on this.

@yota9
Copy link
Member

yota9 commented Sep 19, 2023

Is it possible to directly address needed symbol using -4 addend to symbol from the second instruction, since in this case you can be sure they would be emitted one by one?

@mtvec
Copy link
Contributor Author

mtvec commented Sep 20, 2023

No, that isn't possible. For one, the linker needs a R_RISCV_PCREL_LO12_I reloc to do its job and the only way to make the backend generate one is to add a MCSymbolRefExpr that refers to the first instruction.

Another issue is that the offset isn't always -4. I'm actually generating code like this:

1: auipc t0, %pcrel_hi(counter)
ld t1, %pcrel_lo(1b)(t0)
addi t1, t1, 1
sd t1, %pcrel_lo(1b)(t0)

So the sd refers to the same auipc as the earlier ld.

@mtvec
Copy link
Contributor Author

mtvec commented Sep 28, 2023

Ping

@rafaelauler
Copy link
Contributor

Adding data structures to BinaryFunction class is a bit expensive for us because it increases our memory footprint -- an uninitialized std::map<> will cost 48 bytes, which means an additional 5MB for 100k BinaryFunction objects loaded at runtime. The x86 target primarily deals with very large binaries, so it will be good if this solution could be mindful of that and perhaps reduce the memory footprint for x86 that will likely not use this additional InstructionLabels data structure. Perhaps convert it to an std::optionalstd::map? Or better yet, maybe try to reuse the existing Relocations map that we have in BF and register your relocs there, then fetch that in ::disassemble() to try to fill up your new annotations?

@mtvec
Copy link
Contributor Author

mtvec commented Oct 5, 2023

Adding data structures to BinaryFunction class is a bit expensive for us because it increases our memory footprint

Alright, that makes sense. I got rid of BinaryFunction::InstructionLabels as follows:

  • While analyzing relocations, simply add instruction references to the Relocations map. This is slightly awkward for two reasons (but still workable imo):
    • We cannot add a symbol to the relocation since we have nowhere to store it. We cannot store in the reloc itself because multiple relocs might want to point to the same symbol. Relocations without symbols didn't happen before (afaict) so I had to add a null-check to an unrelated debug output.
    • In order to be able to reconstruct the symbol reference later, we need to know the address the reloc points to. This is typically calculated by extracting the value. However, in case of instruction-referencing relocs like PCREL_LO* on RISC-V, the extracted value is unrelated to the referenced symbol. I simply set the value of the reloc to the address of the symbol it points to to solve this.
  • While disassembling, keep track of instruction labels in a map:
    • Whenever we encounter and instruction with an instruction-referencing reloc, look up the label it points in the map (or create a new one) and replace its immediate with a symbol ref as usual.
    • After having disassembled all instructions, iterate the map to add the labels to the corresponding instructions.

So basically, the InstructionLabels map is moved to the stack of disassemble() at the cost of a slightly awkward representation of some relocs.

Copy link
Contributor

@rafaelauler rafaelauler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM

@mtvec mtvec merged commit ff5e2ba into llvm:main Oct 6, 2023
2 of 3 checks passed
@mtvec mtvec deleted the bolt-instruction-labels branch October 6, 2023 06:46
mtvec added a commit to mtvec/llvm-project that referenced this pull request Oct 10, 2023
On RISC-V, this function will have to insert `auipc`+`lw`/`sw` pairs
where the `auipc` needs a label annotation (see llvm#66395). In order to do
this correctly within the context of `ParallelUtilities`, we need an
allocator id when setting the label. This patch adds this allocator id
argument and wires it through to all call sites.

Note that this function is also made non-const since `setLabel` is
non-const.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants