Skip to content

Conversation

@statham-arm
Copy link
Collaborator

If an instruction at the high end of the 32-bit address space branches to one at the low end, then the branch can be within range for a B or BL instruction, and doesn't need a veneer. ARM::inBranchRange was failing to detect this because it calculated the offset as an int64_t, so that the offset was a small value ± 2^32 instead of just the small value.

Fixes #165211.

If an instruction at the high end of the 32-bit address space branches
to one at the low end, then the branch can be within range for a B or
BL instruction, and doesn't need a veneer. `ARM::inBranchRange` was
failing to detect this because it calculated the offset as an int64_t,
so that the offset was a small value ± 2^32 instead of just the smalle
value.

Fixes llvm#165211.
@llvmbot
Copy link
Member

llvmbot commented Oct 27, 2025

@llvm/pr-subscribers-lld

@llvm/pr-subscribers-lld-elf

Author: Simon Tatham (statham-arm)

Changes

If an instruction at the high end of the 32-bit address space branches to one at the low end, then the branch can be within range for a B or BL instruction, and doesn't need a veneer. ARM::inBranchRange was failing to detect this because it calculated the offset as an int64_t, so that the offset was a small value ± 2^32 instead of just the small value.

Fixes #165211.


Full diff: https://github.com/llvm/llvm-project/pull/165263.diff

2 Files Affected:

  • (modified) lld/ELF/Arch/ARM.cpp (+1-1)
  • (added) lld/test/ELF/arm-wraparound-veneer.s (+118)
diff --git a/lld/ELF/Arch/ARM.cpp b/lld/ELF/Arch/ARM.cpp
index 91a673f13d68e..6c4290ff1e448 100644
--- a/lld/ELF/Arch/ARM.cpp
+++ b/lld/ELF/Arch/ARM.cpp
@@ -472,7 +472,7 @@ bool ARM::inBranchRange(RelType type, uint64_t src, uint64_t dst) const {
     // Bit 0 == 1 denotes Thumb state, it is not part of the range.
     dst &= ~0x1;
 
-  int64_t offset = dst - src;
+  int64_t offset = llvm::SignExtend64<32>(dst - src);
   switch (type) {
   case R_ARM_PC24:
   case R_ARM_PLT32:
diff --git a/lld/test/ELF/arm-wraparound-veneer.s b/lld/test/ELF/arm-wraparound-veneer.s
new file mode 100644
index 0000000000000..6fe211edc5e10
--- /dev/null
+++ b/lld/test/ELF/arm-wraparound-veneer.s
@@ -0,0 +1,118 @@
+// REQUIRES: arm
+// RUN: rm -rf %t && split-file %s %t && cd %t
+// RUN: llvm-mc -filetype=obj -triple=armv7-none-eabi code.s -o code.o
+// RUN: ld.lld -T unsigned1.ld code.o -o unsigned1.elf
+// RUN: llvm-objdump --triple=armv7 -d unsigned1.elf | FileCheck %s --check-prefix=UNSIGNED1
+// RUN: ld.lld -T unsigned2.ld code.o -o unsigned2.elf
+// RUN: llvm-objdump --triple=armv7 -d unsigned2.elf | FileCheck %s --check-prefix=UNSIGNED2
+// RUN: ld.lld -T signed1.ld code.o -o signed1.elf
+// RUN: llvm-objdump --triple=armv7 -d signed1.elf | FileCheck %s --check-prefix=SIGNED1
+// RUN: ld.lld -T signed2.ld code.o -o signed2.elf
+// RUN: llvm-objdump --triple=armv7 -d signed2.elf | FileCheck %s --check-prefix=SIGNED2
+
+// The aim of this test is to ensure that a BL instruction near one end of the
+// address space can reach a function at the extreme other end, directly, using
+// a branch offset that makes the address wrap round. We check this at both the
+// unsigned wraparound point (one address near 0 and the other near 0xFFFFFFFF)
+// and the signed wraparound point (addresses either side of 0x80000000),
+// crossing the boundary in both directions. In all four cases we expect a
+// direct branch with no veneer.
+
+// UNSIGNED1: Disassembly of section .text.lowaddr:
+// UNSIGNED1: 00010000 <func>:
+// UNSIGNED1:    10000: e12fff1e      bx      lr
+//
+// UNSIGNED1: Disassembly of section .text.highaddr:
+// UNSIGNED1: ffff0000 <_start>:
+// UNSIGNED1: ffff0000: eb007ffe      bl      0x10000
+// UNSIGNED1: ffff0004: e12fff1e      bx      lr
+
+// UNSIGNED2: Disassembly of section .text.lowaddr:
+// UNSIGNED2: 00010000 <_start>:
+// UNSIGNED2:    10000: ebff7ffe      bl      0xffff0000
+// UNSIGNED2:    10004: e12fff1e      bx      lr
+//
+// UNSIGNED2: Disassembly of section .text.highaddr:
+// UNSIGNED2: ffff0000 <func>:
+// UNSIGNED2: ffff0000: e12fff1e      bx      lr
+
+// SIGNED1:   Disassembly of section .text.posaddr:
+// SIGNED1:   7fff0000 <_start>:
+// SIGNED1:   7fff0000: eb007ffe      bl      0x80010000
+// SIGNED1:   7fff0004: e12fff1e      bx      lr
+//
+// SIGNED1:   Disassembly of section .text.negaddr:
+// SIGNED1:   80010000 <func>:
+// SIGNED1:   80010000: e12fff1e      bx      lr
+
+// SIGNED2:   Disassembly of section .text.posaddr:
+// SIGNED2:   7fff0000 <func>:
+// SIGNED2:   7fff0000: e12fff1e      bx      lr
+//
+// SIGNED2:   Disassembly of section .text.negaddr:
+// SIGNED2:   80010000 <_start>:
+// SIGNED2:   80010000: ebff7ffe      bl      0x7fff0000
+// SIGNED2:   80010004: e12fff1e      bx      lr
+
+//--- code.s
+
+  .section .text.callee, "ax", %progbits
+  .global func
+  .type func, %function
+func:
+  bx lr
+
+  .section .text.caller, "ax", %progbits
+  .global _start
+  .type _start, %function
+_start:
+  bl func
+  bx lr
+
+//--- unsigned1.ld
+
+ENTRY(_start)
+PHDRS {
+  lowaddr  PT_LOAD FLAGS(0x1 | 0x4);
+  highaddr PT_LOAD FLAGS(0x1 | 0x4);
+}
+SECTIONS {
+  .text.lowaddr  0x00010000 : { *(.text.callee) } :lowaddr
+  .text.highaddr 0xffff0000 : { *(.text.caller) } :highaddr
+}
+
+//--- unsigned2.ld
+
+ENTRY(_start)
+PHDRS {
+  lowaddr  PT_LOAD FLAGS(0x1 | 0x4);
+  highaddr PT_LOAD FLAGS(0x1 | 0x4);
+}
+SECTIONS {
+  .text.lowaddr  0x00010000 : { *(.text.caller) } :lowaddr
+  .text.highaddr 0xffff0000 : { *(.text.callee) } :highaddr
+}
+
+//--- signed1.ld
+
+ENTRY(_start)
+PHDRS {
+  posaddr PT_LOAD FLAGS(0x1 | 0x4);
+  negaddr PT_LOAD FLAGS(0x1 | 0x4);
+}
+SECTIONS {
+  .text.posaddr  0x7fff0000 : { *(.text.caller) } :posaddr
+  .text.negaddr  0x80010000 : { *(.text.callee) } :negaddr
+}
+
+//--- signed2.ld
+
+ENTRY(_start)
+PHDRS {
+  posaddr PT_LOAD FLAGS(0x1 | 0x4);
+  negaddr PT_LOAD FLAGS(0x1 | 0x4);
+}
+SECTIONS {
+  .text.posaddr  0x7fff0000 : { *(.text.callee) } :posaddr
+  .text.negaddr  0x80010000 : { *(.text.caller) } :negaddr
+}

Copy link
Collaborator

@smithp35 smithp35 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't try and support the branch around the address space as I wasn't sure that this was entirely legal according to the Arm ARM. It does happen to work on a lot of Arm CPUs but I wasn't sure if that was something we could rely on architecturally.

I've put what I've found in the various Arm ARMs. From what I've been able to find it seems fine, apart from v8-A and above, where I think it is UNKNOWN (and hence can't be relied on). This could just be my reading of the Arm ARM though as the UNKNOWN parts may not be related to branch instructions.

Would you be able to check the Arm ARM to see whether you can find out if this is architecturally legal on v8-A? If it isn't then we could use the build attributes to see if at least one object is v8-a or above and suppress the wrap-around there.

v8-m Arm ARM suggests that this is supported

B7.4 Address space
The address space is a single, flat address space of 232 bytes.
  Applies to an implementation of the architecture Armv8.0-M onward.

In the address space, byte addresses are unsigned numbers in the range 0-(232-1).
  Applies to an implementation of the architecture Armv8.0-M onward.
If an address calculation overflows or underflows the address space, it wraps around. Address calculations are modulo 232.

As does the v7-ar Arm ARM

Address space overflow or underflow

Address space overflow occurs when the memory address increments beyond the top byte of the address space at 0xFFFFFFFF. When this happens, the address wraps round, so that, for example, incrementing 0xFFFFFFFF by 2 gives a result of 0x00000001.

Address space underflow occurs when the memory address decrements below the first byte of the address space at 0x00000000. When this happens, the address wraps round, so that, for example, decrementing 0x00000002 by 4 gives a result of 0xFFFFFFFE.

Arm v8-A for the 32-bit section is not as helpful. The pseudo code for branch instruction just has PC + imm32 which is passed as a 32-bit type.

E2.1.1 Address space
...
Address calculations are performed modulo 232.

The result of an address calculation is UNKNOWN if it overflows or underflows the 32-bit address range A[31:0].

Where UNKNOWN has this clause

An UNKNOWN value must not be documented or promoted as having a defined value or effect.

There is a section on address space underflow/overflow which unlike v7-A is explicitly UNKNOWN, however it only seems to apply to an instruction that after execution causes the PC to wrap round

Address space overflow or underflow

When a PE performs a normal, sequential execution of instructions, it calculates:
(address_of_current_instruction) + (size_of_executed_instruction)
This calculation is performed after each instruction to determine which instruction to execute next. If the address calculation performed after executing an A32 or T32 instruction overflows 0xFFFF FFFF, the Program Counter becomes UNKNOWN.

// RUN: ld.lld -T signed2.ld code.o -o signed2.elf
// RUN: llvm-objdump --triple=armv7 -d signed2.elf | FileCheck %s --check-prefix=SIGNED2

// The aim of this test is to ensure that a BL instruction near one end of the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLD convention is to use /// for non FileCheck comments.

// RUN: ld.lld -T unsigned2.ld code.o -o unsigned2.elf
// RUN: llvm-objdump --triple=armv7 -d unsigned2.elf | FileCheck %s --check-prefix=UNSIGNED2
// RUN: ld.lld -T signed1.ld code.o -o signed1.elf
// RUN: llvm-objdump --triple=armv7 -d signed1.elf | FileCheck %s --check-prefix=SIGNED1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lld convention is to use --no-show-raw-insn in llvm-objdump unless it is a vital part of the test.

// UNSIGNED1: 10000: e12fff1e bx lr
//
// UNSIGNED1: Disassembly of section .text.highaddr:
// UNSIGNED1: ffff0000 <_start>:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLD convention is to only put the first address in a sequence of instructions. For example:

// UNSIGNED1: ffff0000: eb007ffe      bl      0x10000
// UNSIGNED1-NEXT:      e12fff1e      bx      lr

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically, omit address for the function label <func>, and use -CHECK: with address in the next line.
This improves FileCheck diagnostics when the address changes.

//--- unsigned1.ld

ENTRY(_start)
PHDRS {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need the PHDRS here. To get LLD to create a separate program header for each output section you can use AT.

.text.lowaddr 0x00010000 : AT(0x00010000) { *(.text.callee) }

@statham-arm
Copy link
Collaborator Author

I've put what I've found in the various Arm ARMs. From what I've been able to find it seems fine, apart from v8-A and above, where I think it is UNKNOWN (and hence can't be relied on). This could just be my reading of the Arm ARM though as the UNKNOWN parts may not be related to branch instructions.

Hmmm, you had a completely different methodology from me in checking the Arm ARM. I didn't try to find any high-level text about address space wraparound in general. I made a beeline for the architecture pseudocode of the BL instruction and checked the semantics specified in that.

Armv8-A Arm ARM (ARM DDI 0487 version L.a): the pseudocode for the BL instruction in §F.5.1.25 creates the offset imm32 by calling SignExtend(..., 32) with the first parameter being a pile of bits from the instruction encoding, and then makes the target address as either PC32 + imm32 or Align(PC32, 4) + imm32.

The semantics of the + operator are defined in §K.16.4.4.2, and vary with the data type – there's provision made for bits(N) bitstring types, and actual integers, and a combination of both. In the case where both inputs are bits(N) for the same N, the definition is the obvious one: addition is mod 2^N, so that wraparound has the obvious behaviour. So we only need to check that the two operands to + in the target address calculation are bits(32) and not some other type.

The SignExtend pseudocode function, creating the offset imm32, returns the type bits(N) where N is the second parameter. So that's bits(32). The PC32 pseudo-variable is also defined as having type bits(32). And the Align pseudocode function can take an integer or a bitstring, and returns the same type.

So I think that a formal interpretation of the pseudocode says that wraparound is supposed to work, and the same is true for the B instruction (§F5.1.18) as well as BL.

@smithp35
Copy link
Collaborator

I've put what I've found in the various Arm ARMs. From what I've been able to find it seems fine, apart from v8-A and above, where I think it is UNKNOWN (and hence can't be relied on). This could just be my reading of the Arm ARM though as the UNKNOWN parts may not be related to branch instructions.

Hmmm, you had a completely different methodology from me in checking the Arm ARM. I didn't try to find any high-level text about address space wraparound in general. I made a beeline for the architecture pseudocode of the BL instruction and checked the semantics specified in that.

I had a memory that there was something written about this in the ABI or the Arm ARM so I tried to find it. Unfortunately I couldn't. The pseudo code looked a bit ambiguous with just PC + imm32, but thanks for following through on the rules for bitstrings to show that this is permitted. The UNKNOWN parts did seem to be related to a non-branch instruction at 0xfffffffc wrapping round back to 0x0.

Assuming the test comments can be addressed I can approve.

Copy link
Collaborator

@smithp35 smithp35 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the test updates. LGTM on my side. Please leave some time for MaskRay to make any further suggestions.

@statham-arm statham-arm merged commit f244386 into llvm:main Oct 31, 2025
10 checks passed
@statham-arm statham-arm deleted the lld-branch-wraparound branch October 31, 2025 09:31
DEBADRIBASAK pushed a commit to DEBADRIBASAK/llvm-project that referenced this pull request Nov 3, 2025
If an instruction at the high end of the 32-bit address space branches
to one at the low end, then the branch can be within range for a B or BL
instruction, and doesn't need a veneer. `ARM::inBranchRange` was failing
to detect this because it calculated the offset as an int64_t, so that
the offset was a small value ± 2^32 instead of just the small value.

Fixes llvm#165211.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LLD calculates relocation offsets incorrectly and creates useless veneers

4 participants