-
Notifications
You must be signed in to change notification settings - Fork 15.1k
[lld][ARM] Don't emit veneers for wraparound branches. #165263
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
If an instruction at the high end of the 32-bit address space branches to one at the low end, then the branch can be within range for a B or BL instruction, and doesn't need a veneer. `ARM::inBranchRange` was failing to detect this because it calculated the offset as an int64_t, so that the offset was a small value ± 2^32 instead of just the smalle value. Fixes llvm#165211.
|
@llvm/pr-subscribers-lld @llvm/pr-subscribers-lld-elf Author: Simon Tatham (statham-arm) ChangesIf an instruction at the high end of the 32-bit address space branches to one at the low end, then the branch can be within range for a B or BL instruction, and doesn't need a veneer. Fixes #165211. Full diff: https://github.com/llvm/llvm-project/pull/165263.diff 2 Files Affected:
diff --git a/lld/ELF/Arch/ARM.cpp b/lld/ELF/Arch/ARM.cpp
index 91a673f13d68e..6c4290ff1e448 100644
--- a/lld/ELF/Arch/ARM.cpp
+++ b/lld/ELF/Arch/ARM.cpp
@@ -472,7 +472,7 @@ bool ARM::inBranchRange(RelType type, uint64_t src, uint64_t dst) const {
// Bit 0 == 1 denotes Thumb state, it is not part of the range.
dst &= ~0x1;
- int64_t offset = dst - src;
+ int64_t offset = llvm::SignExtend64<32>(dst - src);
switch (type) {
case R_ARM_PC24:
case R_ARM_PLT32:
diff --git a/lld/test/ELF/arm-wraparound-veneer.s b/lld/test/ELF/arm-wraparound-veneer.s
new file mode 100644
index 0000000000000..6fe211edc5e10
--- /dev/null
+++ b/lld/test/ELF/arm-wraparound-veneer.s
@@ -0,0 +1,118 @@
+// REQUIRES: arm
+// RUN: rm -rf %t && split-file %s %t && cd %t
+// RUN: llvm-mc -filetype=obj -triple=armv7-none-eabi code.s -o code.o
+// RUN: ld.lld -T unsigned1.ld code.o -o unsigned1.elf
+// RUN: llvm-objdump --triple=armv7 -d unsigned1.elf | FileCheck %s --check-prefix=UNSIGNED1
+// RUN: ld.lld -T unsigned2.ld code.o -o unsigned2.elf
+// RUN: llvm-objdump --triple=armv7 -d unsigned2.elf | FileCheck %s --check-prefix=UNSIGNED2
+// RUN: ld.lld -T signed1.ld code.o -o signed1.elf
+// RUN: llvm-objdump --triple=armv7 -d signed1.elf | FileCheck %s --check-prefix=SIGNED1
+// RUN: ld.lld -T signed2.ld code.o -o signed2.elf
+// RUN: llvm-objdump --triple=armv7 -d signed2.elf | FileCheck %s --check-prefix=SIGNED2
+
+// The aim of this test is to ensure that a BL instruction near one end of the
+// address space can reach a function at the extreme other end, directly, using
+// a branch offset that makes the address wrap round. We check this at both the
+// unsigned wraparound point (one address near 0 and the other near 0xFFFFFFFF)
+// and the signed wraparound point (addresses either side of 0x80000000),
+// crossing the boundary in both directions. In all four cases we expect a
+// direct branch with no veneer.
+
+// UNSIGNED1: Disassembly of section .text.lowaddr:
+// UNSIGNED1: 00010000 <func>:
+// UNSIGNED1: 10000: e12fff1e bx lr
+//
+// UNSIGNED1: Disassembly of section .text.highaddr:
+// UNSIGNED1: ffff0000 <_start>:
+// UNSIGNED1: ffff0000: eb007ffe bl 0x10000
+// UNSIGNED1: ffff0004: e12fff1e bx lr
+
+// UNSIGNED2: Disassembly of section .text.lowaddr:
+// UNSIGNED2: 00010000 <_start>:
+// UNSIGNED2: 10000: ebff7ffe bl 0xffff0000
+// UNSIGNED2: 10004: e12fff1e bx lr
+//
+// UNSIGNED2: Disassembly of section .text.highaddr:
+// UNSIGNED2: ffff0000 <func>:
+// UNSIGNED2: ffff0000: e12fff1e bx lr
+
+// SIGNED1: Disassembly of section .text.posaddr:
+// SIGNED1: 7fff0000 <_start>:
+// SIGNED1: 7fff0000: eb007ffe bl 0x80010000
+// SIGNED1: 7fff0004: e12fff1e bx lr
+//
+// SIGNED1: Disassembly of section .text.negaddr:
+// SIGNED1: 80010000 <func>:
+// SIGNED1: 80010000: e12fff1e bx lr
+
+// SIGNED2: Disassembly of section .text.posaddr:
+// SIGNED2: 7fff0000 <func>:
+// SIGNED2: 7fff0000: e12fff1e bx lr
+//
+// SIGNED2: Disassembly of section .text.negaddr:
+// SIGNED2: 80010000 <_start>:
+// SIGNED2: 80010000: ebff7ffe bl 0x7fff0000
+// SIGNED2: 80010004: e12fff1e bx lr
+
+//--- code.s
+
+ .section .text.callee, "ax", %progbits
+ .global func
+ .type func, %function
+func:
+ bx lr
+
+ .section .text.caller, "ax", %progbits
+ .global _start
+ .type _start, %function
+_start:
+ bl func
+ bx lr
+
+//--- unsigned1.ld
+
+ENTRY(_start)
+PHDRS {
+ lowaddr PT_LOAD FLAGS(0x1 | 0x4);
+ highaddr PT_LOAD FLAGS(0x1 | 0x4);
+}
+SECTIONS {
+ .text.lowaddr 0x00010000 : { *(.text.callee) } :lowaddr
+ .text.highaddr 0xffff0000 : { *(.text.caller) } :highaddr
+}
+
+//--- unsigned2.ld
+
+ENTRY(_start)
+PHDRS {
+ lowaddr PT_LOAD FLAGS(0x1 | 0x4);
+ highaddr PT_LOAD FLAGS(0x1 | 0x4);
+}
+SECTIONS {
+ .text.lowaddr 0x00010000 : { *(.text.caller) } :lowaddr
+ .text.highaddr 0xffff0000 : { *(.text.callee) } :highaddr
+}
+
+//--- signed1.ld
+
+ENTRY(_start)
+PHDRS {
+ posaddr PT_LOAD FLAGS(0x1 | 0x4);
+ negaddr PT_LOAD FLAGS(0x1 | 0x4);
+}
+SECTIONS {
+ .text.posaddr 0x7fff0000 : { *(.text.caller) } :posaddr
+ .text.negaddr 0x80010000 : { *(.text.callee) } :negaddr
+}
+
+//--- signed2.ld
+
+ENTRY(_start)
+PHDRS {
+ posaddr PT_LOAD FLAGS(0x1 | 0x4);
+ negaddr PT_LOAD FLAGS(0x1 | 0x4);
+}
+SECTIONS {
+ .text.posaddr 0x7fff0000 : { *(.text.callee) } :posaddr
+ .text.negaddr 0x80010000 : { *(.text.caller) } :negaddr
+}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't try and support the branch around the address space as I wasn't sure that this was entirely legal according to the Arm ARM. It does happen to work on a lot of Arm CPUs but I wasn't sure if that was something we could rely on architecturally.
I've put what I've found in the various Arm ARMs. From what I've been able to find it seems fine, apart from v8-A and above, where I think it is UNKNOWN (and hence can't be relied on). This could just be my reading of the Arm ARM though as the UNKNOWN parts may not be related to branch instructions.
Would you be able to check the Arm ARM to see whether you can find out if this is architecturally legal on v8-A? If it isn't then we could use the build attributes to see if at least one object is v8-a or above and suppress the wrap-around there.
v8-m Arm ARM suggests that this is supported
B7.4 Address space
The address space is a single, flat address space of 232 bytes.
Applies to an implementation of the architecture Armv8.0-M onward.
In the address space, byte addresses are unsigned numbers in the range 0-(232-1).
Applies to an implementation of the architecture Armv8.0-M onward.
If an address calculation overflows or underflows the address space, it wraps around. Address calculations are modulo 232.
As does the v7-ar Arm ARM
Address space overflow or underflow
Address space overflow occurs when the memory address increments beyond the top byte of the address space at 0xFFFFFFFF. When this happens, the address wraps round, so that, for example, incrementing 0xFFFFFFFF by 2 gives a result of 0x00000001.
Address space underflow occurs when the memory address decrements below the first byte of the address space at 0x00000000. When this happens, the address wraps round, so that, for example, decrementing 0x00000002 by 4 gives a result of 0xFFFFFFFE.
Arm v8-A for the 32-bit section is not as helpful. The pseudo code for branch instruction just has PC + imm32 which is passed as a 32-bit type.
E2.1.1 Address space
...
Address calculations are performed modulo 232.
The result of an address calculation is UNKNOWN if it overflows or underflows the 32-bit address range A[31:0].
Where UNKNOWN has this clause
An UNKNOWN value must not be documented or promoted as having a defined value or effect.
There is a section on address space underflow/overflow which unlike v7-A is explicitly UNKNOWN, however it only seems to apply to an instruction that after execution causes the PC to wrap round
Address space overflow or underflow
When a PE performs a normal, sequential execution of instructions, it calculates:
(address_of_current_instruction) + (size_of_executed_instruction)
This calculation is performed after each instruction to determine which instruction to execute next. If the address calculation performed after executing an A32 or T32 instruction overflows 0xFFFF FFFF, the Program Counter becomes UNKNOWN.
lld/test/ELF/arm-wraparound-veneer.s
Outdated
| // RUN: ld.lld -T signed2.ld code.o -o signed2.elf | ||
| // RUN: llvm-objdump --triple=armv7 -d signed2.elf | FileCheck %s --check-prefix=SIGNED2 | ||
|
|
||
| // The aim of this test is to ensure that a BL instruction near one end of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LLD convention is to use /// for non FileCheck comments.
lld/test/ELF/arm-wraparound-veneer.s
Outdated
| // RUN: ld.lld -T unsigned2.ld code.o -o unsigned2.elf | ||
| // RUN: llvm-objdump --triple=armv7 -d unsigned2.elf | FileCheck %s --check-prefix=UNSIGNED2 | ||
| // RUN: ld.lld -T signed1.ld code.o -o signed1.elf | ||
| // RUN: llvm-objdump --triple=armv7 -d signed1.elf | FileCheck %s --check-prefix=SIGNED1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lld convention is to use --no-show-raw-insn in llvm-objdump unless it is a vital part of the test.
lld/test/ELF/arm-wraparound-veneer.s
Outdated
| // UNSIGNED1: 10000: e12fff1e bx lr | ||
| // | ||
| // UNSIGNED1: Disassembly of section .text.highaddr: | ||
| // UNSIGNED1: ffff0000 <_start>: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LLD convention is to only put the first address in a sequence of instructions. For example:
// UNSIGNED1: ffff0000: eb007ffe bl 0x10000
// UNSIGNED1-NEXT: e12fff1e bx lr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically, omit address for the function label <func>, and use -CHECK: with address in the next line.
This improves FileCheck diagnostics when the address changes.
lld/test/ELF/arm-wraparound-veneer.s
Outdated
| //--- unsigned1.ld | ||
|
|
||
| ENTRY(_start) | ||
| PHDRS { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need the PHDRS here. To get LLD to create a separate program header for each output section you can use AT.
.text.lowaddr 0x00010000 : AT(0x00010000) { *(.text.callee) }
Hmmm, you had a completely different methodology from me in checking the Arm ARM. I didn't try to find any high-level text about address space wraparound in general. I made a beeline for the architecture pseudocode of the BL instruction and checked the semantics specified in that. Armv8-A Arm ARM (ARM DDI 0487 version L.a): the pseudocode for the BL instruction in §F.5.1.25 creates the offset The semantics of the The So I think that a formal interpretation of the pseudocode says that wraparound is supposed to work, and the same is true for the |
I had a memory that there was something written about this in the ABI or the Arm ARM so I tried to find it. Unfortunately I couldn't. The pseudo code looked a bit ambiguous with just PC + imm32, but thanks for following through on the rules for bitstrings to show that this is permitted. The UNKNOWN parts did seem to be related to a non-branch instruction at 0xfffffffc wrapping round back to 0x0. Assuming the test comments can be addressed I can approve. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the test updates. LGTM on my side. Please leave some time for MaskRay to make any further suggestions.
If an instruction at the high end of the 32-bit address space branches to one at the low end, then the branch can be within range for a B or BL instruction, and doesn't need a veneer. `ARM::inBranchRange` was failing to detect this because it calculated the offset as an int64_t, so that the offset was a small value ± 2^32 instead of just the small value. Fixes llvm#165211.
If an instruction at the high end of the 32-bit address space branches to one at the low end, then the branch can be within range for a B or BL instruction, and doesn't need a veneer.
ARM::inBranchRangewas failing to detect this because it calculated the offset as an int64_t, so that the offset was a small value ± 2^32 instead of just the small value.Fixes #165211.