[BOLT] Improve handling of relocations targeting specific instructions #66395

mtvec · 2023-09-14T16:16:22Z

On RISC-V, there are certain relocations that target a specific instruction instead of a more abstract location like a function or basic block. Take the following example that loads a value from symbol foo:

nop
1: auipc t0, %pcrel_hi(foo)
ld t0, %pcrel_lo(1b)(t0)

This results in two relocation:

auipc: R_RISCV_PCREL_HI20 referencing foo;
ld: R_RISCV_PCREL_LO12_I referencing to local label 1 which points to the auipc instruction.

It is of utmost importance that the R_RISCV_PCREL_LO12_I keeps referring to the auipc instruction; if not, the program will fail to assemble. However, BOLT currently does not guarantee this.

BOLT currently assumes that all local symbols are jump targets and always starts a new basic block at symbol locations. The example above results in a CFG the looks like this:

.BB0:
    nop
.BB1:
    auipc t0, %pcrel_hi(foo)
    ld t0, %pcrel_lo(.BB1)(t0)

While this currently works (i.e., the R_RISCV_PCREL_LO12_I relocation points to the correct instruction), it has two downsides:

Too many basic blocks are created (the example above is logically only one yet two are created);
If instructions are inserted in .BB1 (e.g., by instrumentation), things will break since the label will not point to the auipc anymore.

This patch proposes to fix this issue by teaching BOLT to track labels that should always point to a specific instruction. This is implemented as follows:

Add a new annotation type (kLabel) that allows us to annotate instructions with an MCSymbol *;
Whenever we encounter a relocation type that is used to refer to a specific instruction (Relocation::isInstructionReference), we register a new type of label (InstructionLabels) with the corresponding BinaryFunction;
During disassembly, add these instruction labels to the correct instructions;
During emission, emit these labels right before the instruction.

I believe the use of annotations works quite well for this use case as it allows us to reliably track instruction labels. If we were to store them as offsets in basic blocks, it would be error prone to keep them updated whenever instructions are inserted or removed.

I have chosen to add labels as first-class annotations (as opposed to a generic one) because the documentation of MCAnnotation suggests that generic annotations are to be used for optional metadata that can be discarded without affecting correctness. As this is not the case for labels, a first-class annotation seemed more appropriate.

yota9 · 2023-09-14T16:29:17Z

I'm not familiar yet with RISC asm, could you please describe why loading value from symbol results in text relocation (e.g. reference to the instruction)? Just interesting :)
Overall it seems to me reasonable and the implementation is nice, LGTM. Let's wait Meta team suggestions though.

On RISC-V, there are certain relocations that target a specific instruction instead of a more abstract location like a function or basic block. Take the following example that loads a value from symbol `foo`: ``` nop 1: auipc t0, %pcrel_hi(foo) ld t0, %pcrel_lo(1b)(t0) ``` This results in two relocation: - auipc: `R_RISCV_PCREL_HI20` referencing `foo`; - ld: `R_RISCV_PCREL_LO12_I` referencing to local label `1` which points to the auipc instruction. It is of utmost importance that the `R_RISCV_PCREL_LO12_I` keeps referring to the auipc instruction; if not, the program will fail to assemble. However, BOLT currently does not guarantee this. BOLT currently assumes that all local symbols are jump targets and always starts a new basic block at symbol locations. The example above results in a CFG the looks like this: ``` .BB0: nop .BB1: auipc t0, %pcrel_hi(foo) ld t0, %pcrel_lo(.BB1)(t0) ``` While this currently works (i.e., the `R_RISCV_PCREL_LO12_I` relocation points to the correct instruction), it has two downsides: - Too many basic blocks are created (the example above is logically only one yet two are created); - If instructions are inserted in `.BB1` (e.g., by instrumentation), things will break since the label will not point to the auipc anymore. This patch proposes to fix this issue by teaching BOLT to track labels that should always point to a specific instruction. This is implemented as follows: - Add a new annotation type (`kLabel`) that allows us to annotate instructions with an `MCSymbol *`; - Whenever we encounter a relocation type that is used to refer to a specific instruction (`Relocation::isInstructionReference`), we register a new type of label (`InstructionLabels`) with the corresponding `BinaryFunction`; - During disassembly, add these instruction labels to the correct instructions; - During emission, emit these labels right before the instruction. I believe the use of annotations works quite well for this use case as it allows us to reliably track instruction labels. If we were to store them as offsets in basic blocks, it would be error prone to keep them updated whenever instructions are inserted or removed. I have chosen to add labels as first-class annotations (as opposed to a generic one) because the documentation of `MCAnnotation` suggests that generic annotations are to be used for optional metadata that can be discarded without affecting correctness. As this is not the case for labels, a first-class annotation seemed more appropriate.

mtvec · 2023-09-14T16:45:52Z

I'm not familiar yet with RISC asm, could you please describe why loading value from symbol results in text relocation (e.g. reference to the instruction)? Just interesting :)

To materialize a 32-bit pc-relative address on RISC-V, you need two instruction: one auipc which adds a 20-bit immediate to the upper 20 bits of pc, and another I- or S-type instruction (e.g., ld, addi) to add the remaining 12 bits. The auipc gets a relocation pointing to the symbol you want the address of. The second instruction gets a relocation pointing to the auipc. The reason it has to point there (as opposed to the target symbol) is that the two instructions are not necessarily consecutive so the location of the auipc must be known in order to calculate the correct immediate for the second instruction.

See the psABI for more info.

yota9 · 2023-09-14T18:57:26Z

Thanks! Am I understand it right that this is from linker standpoint? In the final binary ld/addi would have just an imm operand.
But since we need lo12 offset between 1 instruction and the symbol we're addressing the 1st instruction (the second might have other HI address) and the linker knowing where we're addressing would find the 1st instruction, it's relocation and it's symbol to complete the lower part of the address.
If I'm right his schematic is soo weird to me. I mean at aarch64 we're doing:
adrp x0, symb // PC-relative high 20 bit of address. I think it is ~ to auipc
// Any count of instructions
add x0, x0, symb // lo-12 part of address is offset in 4k page. Since we're addressing inside the 4k page this offset would never be changed.
So really only the adrp is PC-relative, the second one is page-relative offset. But RISC-V for some reason seems doesn't want to use such schematics and at linker time it "truly" calculates the lower part of the offset using the 1st instruction as a base and it's relocation to find the symbol. I really had to strain my brain to understood this schematic).

mtvec · 2023-09-15T08:16:15Z

Thanks! Am I understand it right that this is from linker standpoint? In the final binary ld/addi would have just an imm operand. But since we need lo12 offset between 1 instruction and the symbol we're addressing the 1st instruction (the second might have other HI address) and the linker knowing where we're addressing would find the 1st instruction, it's relocation and it's symbol to complete the lower part of the address.

That all sounds correct.

If I'm right his schematic is soo weird to me. I mean at aarch64 we're doing:
adrp x0, symb // PC-relative high 20 bit of address. I think it is ~ to auipc

It's similar to auipc but not the same. adrp clear the lower 12 bits while auipc doesn't so the latter doesn't result in a 4KiB-aligned address. This also means that auipc can, for example, be used to calculate the value of PC.

// Any count of instructions add x0, x0, symb // lo-12 part of address is offset in 4k page.

Since we're addressing inside the 4k page this offset would never be changed. So really only the adrp is PC-relative, the second one is page-relative offset. But RISC-V for some reason seems doesn't want to use such schematics and at linker time it "truly" calculates the lower part of the offset using the 1st instruction as a base and it's relocation to find the symbol. I really had to strain my brain to understood this schematic).

As far as I can tell, using something like adrp wouldn't work on RISC-V since the I- and S-type instructions it would need to pair-up with take a signed 12-bit immediate so cannot represent arbitrary page offsets. On AArch64 this can work since these instructions have multiple addressing modes including one with unsigned 12-bit offsets.

yota9 · 2023-09-15T09:24:12Z

Thanks for clarifications, Job! Now it seems to be completely clear to me :)

mtvec · 2023-09-19T16:16:11Z

I've noticed one downside/annoyance with this approach. When emitting loads/stores for instrumentation (e.g., in createInstrIncMemory), I need to attach labels to instructions (same as the example given in the description of this patch). However, createInstrIncMemory is const while setLabel cannot be. Moreover, I really should add an AllocatorIdTy argument to setLabel and this means that createInstrIncMemory also needs one (it's called in the context of runOnEachFunctionWithUniqueAllocId).

None of this is a blocker (I have a working instrumentation implementation using the approach above) but it feels slightly annoying so I'd be interested in thoughts on this.

yota9 · 2023-09-19T16:47:37Z

Is it possible to directly address needed symbol using -4 addend to symbol from the second instruction, since in this case you can be sure they would be emitted one by one?

mtvec · 2023-09-20T08:01:36Z

No, that isn't possible. For one, the linker needs a R_RISCV_PCREL_LO12_I reloc to do its job and the only way to make the backend generate one is to add a MCSymbolRefExpr that refers to the first instruction.

Another issue is that the offset isn't always -4. I'm actually generating code like this:

1: auipc t0, %pcrel_hi(counter)
ld t1, %pcrel_lo(1b)(t0)
addi t1, t1, 1
sd t1, %pcrel_lo(1b)(t0)

So the sd refers to the same auipc as the earlier ld.

mtvec · 2023-09-28T20:27:53Z

Ping

rafaelauler · 2023-10-04T22:10:00Z

Adding data structures to BinaryFunction class is a bit expensive for us because it increases our memory footprint -- an uninitialized std::map<> will cost 48 bytes, which means an additional 5MB for 100k BinaryFunction objects loaded at runtime. The x86 target primarily deals with very large binaries, so it will be good if this solution could be mindful of that and perhaps reduce the memory footprint for x86 that will likely not use this additional InstructionLabels data structure. Perhaps convert it to an std::optionalstd::map? Or better yet, maybe try to reuse the existing Relocations map that we have in BF and register your relocs there, then fetch that in ::disassemble() to try to fill up your new annotations?

mtvec · 2023-10-05T12:09:03Z

Adding data structures to BinaryFunction class is a bit expensive for us because it increases our memory footprint

Alright, that makes sense. I got rid of BinaryFunction::InstructionLabels as follows:

While analyzing relocations, simply add instruction references to the Relocations map. This is slightly awkward for two reasons (but still workable imo):
- We cannot add a symbol to the relocation since we have nowhere to store it. We cannot store in the reloc itself because multiple relocs might want to point to the same symbol. Relocations without symbols didn't happen before (afaict) so I had to add a null-check to an unrelated debug output.
- In order to be able to reconstruct the symbol reference later, we need to know the address the reloc points to. This is typically calculated by extracting the value. However, in case of instruction-referencing relocs like PCREL_LO* on RISC-V, the extracted value is unrelated to the referenced symbol. I simply set the value of the reloc to the address of the symbol it points to to solve this.
While disassembling, keep track of instruction labels in a map:
- Whenever we encounter and instruction with an instruction-referencing reloc, look up the label it points in the map (or create a new one) and replace its immediate with a symbol ref as usual.
- After having disassembled all instructions, iterate the map to add the labels to the corresponding instructions.

So basically, the InstructionLabels map is moved to the stack of disassemble() at the cost of a slightly awkward representation of some relocs.

rafaelauler

Thanks! LGTM

On RISC-V, this function will have to insert `auipc`+`lw`/`sw` pairs where the `auipc` needs a label annotation (see llvm#66395). In order to do this correctly within the context of `ParallelUtilities`, we need an allocator id when setting the label. This patch adds this allocator id argument and wires it through to all call sites. Note that this function is also made non-const since `setLabel` is non-const.

mtvec added the BOLT label Sep 14, 2023

mtvec requested review from aaupov, maksfb, rafaelauler and yota9 September 14, 2023 16:16

mtvec force-pushed the bolt-instruction-labels branch from e714030 to 8233eec Compare September 14, 2023 16:36

Remove BinaryFunction::InstructionLabels

6e96ad0

rafaelauler approved these changes Oct 5, 2023

View reviewed changes

mtvec merged commit ff5e2ba into llvm:main Oct 6, 2023
2 of 3 checks passed

mtvec deleted the bolt-instruction-labels branch October 6, 2023 06:46

stepthomas mentioned this pull request Oct 10, 2023

AMDGPU stepthomas atomic csub no rtn forms ver2 stepthomas/llvm-project#1

Closed

mtvec mentioned this pull request Oct 10, 2023

[BOLT][NFC] Add allocator id to createInstrIncMemory #68709

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BOLT] Improve handling of relocations targeting specific instructions #66395

[BOLT] Improve handling of relocations targeting specific instructions #66395

mtvec commented Sep 14, 2023

yota9 commented Sep 14, 2023

mtvec commented Sep 14, 2023

yota9 commented Sep 14, 2023

mtvec commented Sep 15, 2023

yota9 commented Sep 15, 2023

mtvec commented Sep 19, 2023

yota9 commented Sep 19, 2023 •

edited

mtvec commented Sep 20, 2023

mtvec commented Sep 28, 2023

rafaelauler commented Oct 4, 2023

mtvec commented Oct 5, 2023

rafaelauler left a comment

[BOLT] Improve handling of relocations targeting specific instructions #66395

[BOLT] Improve handling of relocations targeting specific instructions #66395

Conversation

mtvec commented Sep 14, 2023

yota9 commented Sep 14, 2023

mtvec commented Sep 14, 2023

yota9 commented Sep 14, 2023

mtvec commented Sep 15, 2023

yota9 commented Sep 15, 2023

mtvec commented Sep 19, 2023

yota9 commented Sep 19, 2023 • edited

mtvec commented Sep 20, 2023

mtvec commented Sep 28, 2023

rafaelauler commented Oct 4, 2023

mtvec commented Oct 5, 2023

rafaelauler left a comment

Choose a reason for hiding this comment

yota9 commented Sep 19, 2023 •

edited