Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[llvm][MC][ARM][Assembly] Emit relocations for LDRs #72873

Merged
merged 5 commits into from
Nov 25, 2023

Conversation

eleanor-arm
Copy link
Contributor

It's possible (though inadvisable) to use LDR and refer to labels in different
sections. In the Arm state, the assembler resolves the LDR instruction without
emitting a relocation. That's incorrect because the assembler cannot make any
assumptions about the relative position of the sections and the compiler output
is therefore wrong.

This patch ensures relocations are generated for all LDR <Rt...>, label
instructions in the Arm state (little endian). This is not necessary when the
label is in the same section but the relocation is now generated regardless.
Instructions that now generate relocations have been removed from the
pcrel-global.s test.

Fortunately, LLD already implements the generated relocations and can fix LDR
instructions when the symbol is in a different section, or report an error if
the offset is too large for the immediate field in the particular LDR's
encoding.

The patch to address this problem for big endian targets will follow, as well
as a fix for ADR that exhibits a similar behavior.

@llvmbot llvmbot added backend:ARM mc Machine (object) code labels Nov 20, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Nov 20, 2023

@llvm/pr-subscribers-mc

Author: Eleanor Bonnnici (eleanor-arm)

Changes

It's possible (though inadvisable) to use LDR and refer to labels in different
sections. In the Arm state, the assembler resolves the LDR instruction without
emitting a relocation. That's incorrect because the assembler cannot make any
assumptions about the relative position of the sections and the compiler output
is therefore wrong.

This patch ensures relocations are generated for all LDR &lt;Rt...&gt;, label
instructions in the Arm state (little endian). This is not necessary when the
label is in the same section but the relocation is now generated regardless.
Instructions that now generate relocations have been removed from the
pcrel-global.s test.

Fortunately, LLD already implements the generated relocations and can fix LDR
instructions when the symbol is in a different section, or report an error if
the offset is too large for the immediate field in the particular LDR's
encoding.

The patch to address this problem for big endian targets will follow, as well
as a fix for ADR that exhibits a similar behavior.


Full diff: https://github.com/llvm/llvm-project/pull/72873.diff

7 Files Affected:

  • (modified) llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackend.cpp (+3-3)
  • (modified) llvm/lib/Target/ARM/MCTargetDesc/ARMELFObjectWriter.cpp (+6)
  • (added) llvm/test/MC/ARM/pcrel-arm-ldr-imm8-relocs.s (+39)
  • (modified) llvm/test/MC/ARM/pcrel-global.s (+2-6)
  • (added) llvm/test/MC/ARM/pcrel-ldr-relocs.s (+45)
  • (added) llvm/test/MC/ARM/pcrel-thumb-ldr2-relocs.s (+36)
  • (modified) llvm/test/MC/ARM/thumb1-relax-ldrlit.s (-1)
diff --git a/llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackend.cpp b/llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackend.cpp
index 9230ff7baedadaf..7a307bfdd1e61b8 100644
--- a/llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackend.cpp
+++ b/llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackend.cpp
@@ -74,10 +74,10 @@ const MCFixupKindInfo &ARMAsmBackend::getFixupKindInfo(MCFixupKind Kind) const {
       // ARMFixupKinds.h.
       //
       // Name                      Offset (bits) Size (bits)     Flags
-      {"fixup_arm_ldst_pcrel_12", 0, 32, IsPCRelConstant},
+      {"fixup_arm_ldst_pcrel_12", 0, 32, MCFixupKindInfo::FKF_IsPCRel},
       {"fixup_t2_ldst_pcrel_12", 0, 32,
-       IsPCRelConstant | MCFixupKindInfo::FKF_IsAlignedDownTo32Bits},
-      {"fixup_arm_pcrel_10_unscaled", 0, 32, IsPCRelConstant},
+       MCFixupKindInfo::FKF_IsPCRel | MCFixupKindInfo::FKF_IsAlignedDownTo32Bits},
+      {"fixup_arm_pcrel_10_unscaled", 0, 32, MCFixupKindInfo::FKF_IsPCRel},
       {"fixup_arm_pcrel_10", 0, 32, IsPCRelConstant},
       {"fixup_t2_pcrel_10", 0, 32,
        MCFixupKindInfo::FKF_IsPCRel |
diff --git a/llvm/lib/Target/ARM/MCTargetDesc/ARMELFObjectWriter.cpp b/llvm/lib/Target/ARM/MCTargetDesc/ARMELFObjectWriter.cpp
index f2ca6f91477f44d..985097fc3281051 100644
--- a/llvm/lib/Target/ARM/MCTargetDesc/ARMELFObjectWriter.cpp
+++ b/llvm/lib/Target/ARM/MCTargetDesc/ARMELFObjectWriter.cpp
@@ -158,6 +158,12 @@ unsigned ARMELFObjectWriter::GetRelocTypeInner(const MCValue &Target,
       default:
         return ELF::R_ARM_THM_CALL;
       }
+    case ARM::fixup_arm_ldst_pcrel_12:
+      return ELF::R_ARM_LDR_PC_G0;
+    case ARM::fixup_arm_pcrel_10_unscaled:
+      return ELF::R_ARM_LDRS_PC_G0;
+    case ARM::fixup_t2_ldst_pcrel_12:
+      return ELF::R_ARM_THM_PC12;
     case ARM::fixup_bf_target:
       return ELF::R_ARM_THM_BF16;
     case ARM::fixup_bfc_target:
diff --git a/llvm/test/MC/ARM/pcrel-arm-ldr-imm8-relocs.s b/llvm/test/MC/ARM/pcrel-arm-ldr-imm8-relocs.s
new file mode 100644
index 000000000000000..e67992ab811588d
--- /dev/null
+++ b/llvm/test/MC/ARM/pcrel-arm-ldr-imm8-relocs.s
@@ -0,0 +1,39 @@
+@ RUN: llvm-mc -filetype=obj -triple=armv7 %s -o %t
+@ RUN: llvm-readelf -r %t | FileCheck %s --check-prefix=ARM
+@ RUN: llvm-objdump -d --triple=armv7 %t | FileCheck %s --check-prefix=ARM_ADDEND
+
+@ ARM: R_ARM_LDRS_PC_G0
+@ ARM: foo1
+@ ARM: R_ARM_LDRS_PC_G0
+@ ARM: foo2
+@ ARM: R_ARM_LDRS_PC_G0
+@ ARM: foo3
+
+// Value is decimal at the moment but hex in other cases (things could change)
+@ ARM_ADDEND: r0, [pc, #-
+@ ARM_ADDEND 8]
+@ ARM_ADDEND: r0, [pc, #-
+@ ARM_ADDEND 8]
+@ ARM_ADDEND: r0, [pc, #-
+@ ARM_ADDEND 8]
+
+    .arm
+    .section .text.bar, "ax"
+    .balign 4
+    .global bar
+    .type bar, %function
+bar:
+    ldrh r0, foo1
+    ldrsb r0, foo2
+    ldrsh r0, foo3
+    bx lr
+
+    .section .data.foo, "a", %progbits
+    .balign 4
+    .global foo1
+    .global foo2
+    .global foo3
+foo1:
+foo2:
+foo3:
+    .word 0x11223344, 0x55667788
diff --git a/llvm/test/MC/ARM/pcrel-global.s b/llvm/test/MC/ARM/pcrel-global.s
index 91ef3b6ca7b15ac..15d46cf2063ecff 100644
--- a/llvm/test/MC/ARM/pcrel-global.s
+++ b/llvm/test/MC/ARM/pcrel-global.s
@@ -9,10 +9,9 @@
 @ DISASM-LABEL: <bar>:
 @ DISASM-NEXT:    adr.w   r0, #-4
 @ DISASM-NEXT:    adr.w   r0, #-8
-@ DISASM-NEXT:    ldr.w   pc, [pc, #-0xc]         @ 0x10 <bar>
-@ DISASM-NEXT:    ldr     r0, [pc, #0x0]          @ 0x20 <bar+0x10>
+@ DISASM-NEXT:    ldr     r0, [pc, #0x0]          @ 0x14 <bar+0xc>
 @ DISASM-NEXT:    add     r0, pc
-@ DISASM-NEXT:   .word   0xffffffef
+@ DISASM-NEXT:   .word   0xfffffff3
 @@ GNU assembler creates an R_ARM_REL32 referencing bar.
 @ DISASM-NOT:    {{.}}
 
@@ -20,10 +19,8 @@
 
 .globl foo
 foo:
-ldrd r0, r1, foo @ arm_pcrel_10_unscaled
 vldr d0, foo     @ arm_pcrel_10
 adr r2, foo      @ arm_adr_pcrel_12
-ldr r0, foo      @ arm_ldst_pcrel_12
 
 .thumb
 .thumb_func
@@ -32,7 +29,6 @@ ldr r0, foo      @ arm_ldst_pcrel_12
 bar:
 adr r0, bar      @ thumb_adr_pcrel_10
 adr.w r0, bar    @ t2_adr_pcrel_12
-ldr.w pc, bar    @ t2_ldst_pcrel_12
 
   ldr r0, .LCPI
 .LPC0_1:
diff --git a/llvm/test/MC/ARM/pcrel-ldr-relocs.s b/llvm/test/MC/ARM/pcrel-ldr-relocs.s
new file mode 100644
index 000000000000000..e73378f2f990a9e
--- /dev/null
+++ b/llvm/test/MC/ARM/pcrel-ldr-relocs.s
@@ -0,0 +1,45 @@
+@ RUN: llvm-mc -filetype=obj -triple=armv7 %s -o %t
+@ RUN: llvm-readelf -r %t | FileCheck %s --check-prefix=ARM
+@ RUN: llvm-objdump -d --triple=armv7 %t | FileCheck %s --check-prefix=ARM_ADDEND
+@ RUN: llvm-mc -filetype=obj -triple=thumbv7 %s -o %t
+@ RUN: llvm-readelf -r %t | FileCheck %s --check-prefix=THUMB
+@ RUN: llvm-objdump -d --triple=thumbv7 %t | FileCheck %s --check-prefix=THUMB_ADDEND
+
+@ ARM: R_ARM_LDR_PC_G0
+@ ARM: foo1
+@ ARM: R_ARM_LDR_PC_G0
+@ ARM: foo2
+
+@ ARM_ADDEND: r0, [pc, #-0x8]
+@ ARM_ADDEND: r0, [pc, #-0x8]
+@ ARM_ADDEND: r0, [pc, #-0x10]
+
+@ THUMB: R_ARM_THM_PC12
+@ THUMB: foo1
+@ THUMB: R_ARM_THM_PC12
+@ THUMB: foo2
+
+@ THUMB_ADDEND: r0, [pc, #-0x4]
+@ THUMB_ADDEND: r0, [pc, #-0x4]
+@ THUMB_ADDEND: r0, [pc, #-0xc]
+
+    .section .text.bar, "ax"
+    .balign 4
+    .global bar
+    .type bar, %function
+bar:
+    ldr r0, foo1
+    ldrb r0, foo2
+    ldr r0, foo3-8
+    bx lr
+
+    .section .data.foo, "a", %progbits
+    .balign 4
+    .global foo1
+    .global foo2
+    .global foo3
+foo1:
+foo2:
+    .word 0x11223344, 0x55667788
+foo3:
+    .word 0x99aabbcc, 0xddeeff00
diff --git a/llvm/test/MC/ARM/pcrel-thumb-ldr2-relocs.s b/llvm/test/MC/ARM/pcrel-thumb-ldr2-relocs.s
new file mode 100644
index 000000000000000..dcc260fe5f4b6bd
--- /dev/null
+++ b/llvm/test/MC/ARM/pcrel-thumb-ldr2-relocs.s
@@ -0,0 +1,36 @@
+@ RUN: llvm-mc -filetype=obj -triple=thumbv7 %s -o %t
+@ RUN: llvm-readelf -r %t | FileCheck %s --check-prefix=THUMB
+@ RUN: llvm-objdump -d --triple=thumbv7 %t | FileCheck %s --check-prefix=THUMB_ADDEND
+
+@ All the ldr variants produce a relocation
+@ THUMB: R_ARM_THM_PC12
+@ THUMB: foo3
+@ THUMB: R_ARM_THM_PC12
+@ THUMB: foo4
+@ THUMB: R_ARM_THM_PC12
+@ THUMB: foo5
+
+@ THUMB_ADDEND: r0, [pc, #-0x4]
+@ THUMB_ADDEND: r0, [pc, #-0x4]
+@ THUMB_ADDEND: r0, [pc, #-0x4]
+
+    .thumb
+    .section .text.bar, "ax"
+    .balign 4
+    .global bar
+    .type bar, %function
+bar:
+    ldrh r0, foo3
+    ldrsb r0, foo4
+    ldrsh r0, foo5
+    bx lr
+
+    .section .data.foo, "a", %progbits
+    .balign 4
+    .global foo3
+    .global foo4
+    .global foo5
+foo3:
+foo4:
+foo5:
+    .word 0x11223344, 0x55667788
diff --git a/llvm/test/MC/ARM/thumb1-relax-ldrlit.s b/llvm/test/MC/ARM/thumb1-relax-ldrlit.s
index 5cba3690f1feb82..8f455c89d41eb06 100644
--- a/llvm/test/MC/ARM/thumb1-relax-ldrlit.s
+++ b/llvm/test/MC/ARM/thumb1-relax-ldrlit.s
@@ -1,6 +1,5 @@
 @ RUN: not llvm-mc -triple thumbv6m-none-macho -filetype=obj -o /dev/null %s 2>&1 | FileCheck --check-prefix=CHECK-ERROR %s
 @ RUN: not llvm-mc -triple thumbv7m-none-macho -filetype=obj -o /dev/null %s 2>&1 | FileCheck --check-prefix=CHECK-ERROR %s
-@ RUN: not llvm-mc -triple thumbv7m-none-eabi -filetype=obj -o /dev/null %s 2>&1 | FileCheck --check-prefix=CHECK-ERROR %s
 
         .global func1
 _func1:

@llvmbot
Copy link
Collaborator

llvmbot commented Nov 20, 2023

@llvm/pr-subscribers-backend-arm

Author: Eleanor Bonnnici (eleanor-arm)

Changes

It's possible (though inadvisable) to use LDR and refer to labels in different
sections. In the Arm state, the assembler resolves the LDR instruction without
emitting a relocation. That's incorrect because the assembler cannot make any
assumptions about the relative position of the sections and the compiler output
is therefore wrong.

This patch ensures relocations are generated for all LDR &lt;Rt...&gt;, label
instructions in the Arm state (little endian). This is not necessary when the
label is in the same section but the relocation is now generated regardless.
Instructions that now generate relocations have been removed from the
pcrel-global.s test.

Fortunately, LLD already implements the generated relocations and can fix LDR
instructions when the symbol is in a different section, or report an error if
the offset is too large for the immediate field in the particular LDR's
encoding.

The patch to address this problem for big endian targets will follow, as well
as a fix for ADR that exhibits a similar behavior.


Full diff: https://github.com/llvm/llvm-project/pull/72873.diff

7 Files Affected:

  • (modified) llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackend.cpp (+3-3)
  • (modified) llvm/lib/Target/ARM/MCTargetDesc/ARMELFObjectWriter.cpp (+6)
  • (added) llvm/test/MC/ARM/pcrel-arm-ldr-imm8-relocs.s (+39)
  • (modified) llvm/test/MC/ARM/pcrel-global.s (+2-6)
  • (added) llvm/test/MC/ARM/pcrel-ldr-relocs.s (+45)
  • (added) llvm/test/MC/ARM/pcrel-thumb-ldr2-relocs.s (+36)
  • (modified) llvm/test/MC/ARM/thumb1-relax-ldrlit.s (-1)
diff --git a/llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackend.cpp b/llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackend.cpp
index 9230ff7baedadaf..7a307bfdd1e61b8 100644
--- a/llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackend.cpp
+++ b/llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackend.cpp
@@ -74,10 +74,10 @@ const MCFixupKindInfo &ARMAsmBackend::getFixupKindInfo(MCFixupKind Kind) const {
       // ARMFixupKinds.h.
       //
       // Name                      Offset (bits) Size (bits)     Flags
-      {"fixup_arm_ldst_pcrel_12", 0, 32, IsPCRelConstant},
+      {"fixup_arm_ldst_pcrel_12", 0, 32, MCFixupKindInfo::FKF_IsPCRel},
       {"fixup_t2_ldst_pcrel_12", 0, 32,
-       IsPCRelConstant | MCFixupKindInfo::FKF_IsAlignedDownTo32Bits},
-      {"fixup_arm_pcrel_10_unscaled", 0, 32, IsPCRelConstant},
+       MCFixupKindInfo::FKF_IsPCRel | MCFixupKindInfo::FKF_IsAlignedDownTo32Bits},
+      {"fixup_arm_pcrel_10_unscaled", 0, 32, MCFixupKindInfo::FKF_IsPCRel},
       {"fixup_arm_pcrel_10", 0, 32, IsPCRelConstant},
       {"fixup_t2_pcrel_10", 0, 32,
        MCFixupKindInfo::FKF_IsPCRel |
diff --git a/llvm/lib/Target/ARM/MCTargetDesc/ARMELFObjectWriter.cpp b/llvm/lib/Target/ARM/MCTargetDesc/ARMELFObjectWriter.cpp
index f2ca6f91477f44d..985097fc3281051 100644
--- a/llvm/lib/Target/ARM/MCTargetDesc/ARMELFObjectWriter.cpp
+++ b/llvm/lib/Target/ARM/MCTargetDesc/ARMELFObjectWriter.cpp
@@ -158,6 +158,12 @@ unsigned ARMELFObjectWriter::GetRelocTypeInner(const MCValue &Target,
       default:
         return ELF::R_ARM_THM_CALL;
       }
+    case ARM::fixup_arm_ldst_pcrel_12:
+      return ELF::R_ARM_LDR_PC_G0;
+    case ARM::fixup_arm_pcrel_10_unscaled:
+      return ELF::R_ARM_LDRS_PC_G0;
+    case ARM::fixup_t2_ldst_pcrel_12:
+      return ELF::R_ARM_THM_PC12;
     case ARM::fixup_bf_target:
       return ELF::R_ARM_THM_BF16;
     case ARM::fixup_bfc_target:
diff --git a/llvm/test/MC/ARM/pcrel-arm-ldr-imm8-relocs.s b/llvm/test/MC/ARM/pcrel-arm-ldr-imm8-relocs.s
new file mode 100644
index 000000000000000..e67992ab811588d
--- /dev/null
+++ b/llvm/test/MC/ARM/pcrel-arm-ldr-imm8-relocs.s
@@ -0,0 +1,39 @@
+@ RUN: llvm-mc -filetype=obj -triple=armv7 %s -o %t
+@ RUN: llvm-readelf -r %t | FileCheck %s --check-prefix=ARM
+@ RUN: llvm-objdump -d --triple=armv7 %t | FileCheck %s --check-prefix=ARM_ADDEND
+
+@ ARM: R_ARM_LDRS_PC_G0
+@ ARM: foo1
+@ ARM: R_ARM_LDRS_PC_G0
+@ ARM: foo2
+@ ARM: R_ARM_LDRS_PC_G0
+@ ARM: foo3
+
+// Value is decimal at the moment but hex in other cases (things could change)
+@ ARM_ADDEND: r0, [pc, #-
+@ ARM_ADDEND 8]
+@ ARM_ADDEND: r0, [pc, #-
+@ ARM_ADDEND 8]
+@ ARM_ADDEND: r0, [pc, #-
+@ ARM_ADDEND 8]
+
+    .arm
+    .section .text.bar, "ax"
+    .balign 4
+    .global bar
+    .type bar, %function
+bar:
+    ldrh r0, foo1
+    ldrsb r0, foo2
+    ldrsh r0, foo3
+    bx lr
+
+    .section .data.foo, "a", %progbits
+    .balign 4
+    .global foo1
+    .global foo2
+    .global foo3
+foo1:
+foo2:
+foo3:
+    .word 0x11223344, 0x55667788
diff --git a/llvm/test/MC/ARM/pcrel-global.s b/llvm/test/MC/ARM/pcrel-global.s
index 91ef3b6ca7b15ac..15d46cf2063ecff 100644
--- a/llvm/test/MC/ARM/pcrel-global.s
+++ b/llvm/test/MC/ARM/pcrel-global.s
@@ -9,10 +9,9 @@
 @ DISASM-LABEL: <bar>:
 @ DISASM-NEXT:    adr.w   r0, #-4
 @ DISASM-NEXT:    adr.w   r0, #-8
-@ DISASM-NEXT:    ldr.w   pc, [pc, #-0xc]         @ 0x10 <bar>
-@ DISASM-NEXT:    ldr     r0, [pc, #0x0]          @ 0x20 <bar+0x10>
+@ DISASM-NEXT:    ldr     r0, [pc, #0x0]          @ 0x14 <bar+0xc>
 @ DISASM-NEXT:    add     r0, pc
-@ DISASM-NEXT:   .word   0xffffffef
+@ DISASM-NEXT:   .word   0xfffffff3
 @@ GNU assembler creates an R_ARM_REL32 referencing bar.
 @ DISASM-NOT:    {{.}}
 
@@ -20,10 +19,8 @@
 
 .globl foo
 foo:
-ldrd r0, r1, foo @ arm_pcrel_10_unscaled
 vldr d0, foo     @ arm_pcrel_10
 adr r2, foo      @ arm_adr_pcrel_12
-ldr r0, foo      @ arm_ldst_pcrel_12
 
 .thumb
 .thumb_func
@@ -32,7 +29,6 @@ ldr r0, foo      @ arm_ldst_pcrel_12
 bar:
 adr r0, bar      @ thumb_adr_pcrel_10
 adr.w r0, bar    @ t2_adr_pcrel_12
-ldr.w pc, bar    @ t2_ldst_pcrel_12
 
   ldr r0, .LCPI
 .LPC0_1:
diff --git a/llvm/test/MC/ARM/pcrel-ldr-relocs.s b/llvm/test/MC/ARM/pcrel-ldr-relocs.s
new file mode 100644
index 000000000000000..e73378f2f990a9e
--- /dev/null
+++ b/llvm/test/MC/ARM/pcrel-ldr-relocs.s
@@ -0,0 +1,45 @@
+@ RUN: llvm-mc -filetype=obj -triple=armv7 %s -o %t
+@ RUN: llvm-readelf -r %t | FileCheck %s --check-prefix=ARM
+@ RUN: llvm-objdump -d --triple=armv7 %t | FileCheck %s --check-prefix=ARM_ADDEND
+@ RUN: llvm-mc -filetype=obj -triple=thumbv7 %s -o %t
+@ RUN: llvm-readelf -r %t | FileCheck %s --check-prefix=THUMB
+@ RUN: llvm-objdump -d --triple=thumbv7 %t | FileCheck %s --check-prefix=THUMB_ADDEND
+
+@ ARM: R_ARM_LDR_PC_G0
+@ ARM: foo1
+@ ARM: R_ARM_LDR_PC_G0
+@ ARM: foo2
+
+@ ARM_ADDEND: r0, [pc, #-0x8]
+@ ARM_ADDEND: r0, [pc, #-0x8]
+@ ARM_ADDEND: r0, [pc, #-0x10]
+
+@ THUMB: R_ARM_THM_PC12
+@ THUMB: foo1
+@ THUMB: R_ARM_THM_PC12
+@ THUMB: foo2
+
+@ THUMB_ADDEND: r0, [pc, #-0x4]
+@ THUMB_ADDEND: r0, [pc, #-0x4]
+@ THUMB_ADDEND: r0, [pc, #-0xc]
+
+    .section .text.bar, "ax"
+    .balign 4
+    .global bar
+    .type bar, %function
+bar:
+    ldr r0, foo1
+    ldrb r0, foo2
+    ldr r0, foo3-8
+    bx lr
+
+    .section .data.foo, "a", %progbits
+    .balign 4
+    .global foo1
+    .global foo2
+    .global foo3
+foo1:
+foo2:
+    .word 0x11223344, 0x55667788
+foo3:
+    .word 0x99aabbcc, 0xddeeff00
diff --git a/llvm/test/MC/ARM/pcrel-thumb-ldr2-relocs.s b/llvm/test/MC/ARM/pcrel-thumb-ldr2-relocs.s
new file mode 100644
index 000000000000000..dcc260fe5f4b6bd
--- /dev/null
+++ b/llvm/test/MC/ARM/pcrel-thumb-ldr2-relocs.s
@@ -0,0 +1,36 @@
+@ RUN: llvm-mc -filetype=obj -triple=thumbv7 %s -o %t
+@ RUN: llvm-readelf -r %t | FileCheck %s --check-prefix=THUMB
+@ RUN: llvm-objdump -d --triple=thumbv7 %t | FileCheck %s --check-prefix=THUMB_ADDEND
+
+@ All the ldr variants produce a relocation
+@ THUMB: R_ARM_THM_PC12
+@ THUMB: foo3
+@ THUMB: R_ARM_THM_PC12
+@ THUMB: foo4
+@ THUMB: R_ARM_THM_PC12
+@ THUMB: foo5
+
+@ THUMB_ADDEND: r0, [pc, #-0x4]
+@ THUMB_ADDEND: r0, [pc, #-0x4]
+@ THUMB_ADDEND: r0, [pc, #-0x4]
+
+    .thumb
+    .section .text.bar, "ax"
+    .balign 4
+    .global bar
+    .type bar, %function
+bar:
+    ldrh r0, foo3
+    ldrsb r0, foo4
+    ldrsh r0, foo5
+    bx lr
+
+    .section .data.foo, "a", %progbits
+    .balign 4
+    .global foo3
+    .global foo4
+    .global foo5
+foo3:
+foo4:
+foo5:
+    .word 0x11223344, 0x55667788
diff --git a/llvm/test/MC/ARM/thumb1-relax-ldrlit.s b/llvm/test/MC/ARM/thumb1-relax-ldrlit.s
index 5cba3690f1feb82..8f455c89d41eb06 100644
--- a/llvm/test/MC/ARM/thumb1-relax-ldrlit.s
+++ b/llvm/test/MC/ARM/thumb1-relax-ldrlit.s
@@ -1,6 +1,5 @@
 @ RUN: not llvm-mc -triple thumbv6m-none-macho -filetype=obj -o /dev/null %s 2>&1 | FileCheck --check-prefix=CHECK-ERROR %s
 @ RUN: not llvm-mc -triple thumbv7m-none-macho -filetype=obj -o /dev/null %s 2>&1 | FileCheck --check-prefix=CHECK-ERROR %s
-@ RUN: not llvm-mc -triple thumbv7m-none-eabi -filetype=obj -o /dev/null %s 2>&1 | FileCheck --check-prefix=CHECK-ERROR %s
 
         .global func1
 _func1:

Copy link

github-actions bot commented Nov 20, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@DavidSpickett
Copy link
Collaborator

Thank you for including tags in the title (always useful for us folks on buildbot duty), but might I suggest they be this instead:

[llvm][MC][ARM] ...

Since llvm is a subproject of the monorepo (albeit the key one) and ARM is the equivalent name for AArch32 in llvm.

It's possible (though inadvisable) to use LDR and refer to labels in different
sections. In the Arm state, the assembler resolves the LDR instruction without
emitting a relocation. That's incorrect because the assembler cannot make any
assumptions about the relative position of the sections and the compiler output
is therefore wrong.

This patch ensures relocations are generated for all `LDR <Rt...>, label`
instructions in the Arm state (little endian). This is not necessary when the
label is in the same section but the relocation is now generated regardless.
Instructions that now generate relocations have been removed from the
pcrel-global.s test.

Fortunately, LLD already implements the generated relocations and can fix LDR
instructions when the symbol is in a different section, or report an error if
the offset is too large for the immediate field in the particular LDR's
encoding.

The patch to address this problem for big endian targets will follow, as well
as a fix for ADR that exhibits a similar behavior.
@DavidSpickett
Copy link
Collaborator

Tags are fine now, but you'll need to edit the title here on the PR page too. The way llvm is setup, the final commit title/message defaults to the content from the PR not any of the commits.

(you could paste it in later, but I know I'd forget)

@eleanor-arm eleanor-arm changed the title [MC][Aarch32][Assembly] Emit relocs for LDRs [llvm][MC][ARM][Assembly] Emit relocs for LDRs Nov 20, 2023
@ ARM_ADDEND: r0, [pc, #-
@ ARM_ADDEND 8]
@ ARM_ADDEND: r0, [pc, #-
@ ARM_ADDEND 8]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally we'd use a regex here to make the 0x optional, something like {{(0x)?}}, at least that's what's done in other types of test file. That also means you don't get any weird stuff in the middle, , #-stuffgoeshere8, not that that's very likely here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Picking a file at random for example: llvm/test/Feature/load_plugin_error.ll

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I don't know how you make it optional exactly, so I chose another route. The regex way is better ofc, I'll try to find out how to do it...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per usual there's a few ways and they are all indirectly documented. This is the closest I found for regex https://llvm.org/docs/TestingGuide.html

And not for this, but in general you'll see these https://llvm.org/docs/CommandGuide/lit.html#substitutions around.

I wouldn't commit them to memory though, just be aware they are out there and only a few "find in files" away if you need to know what they are.

Copy link
Collaborator

@DavidSpickett DavidSpickett Nov 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking at lit not filecheck, here's the documentation: https://llvm.org/docs/CommandGuide/FileCheck.html#filecheck-regex-matching-syntax .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. It's been improved now :)

@ RUN: llvm-objdump -d --triple=thumbv7 %t | FileCheck %s --check-prefix=THUMB_ADDEND

@ ARM: R_ARM_LDR_PC_G0
@ ARM: foo1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the text you're checking for here? Again, just being paranoid that:

R_ARM_LDR_PC_G0
random_symbol
R_SOMETHING_ELSE
foo1

Would match this too.

If they are on the next line you can do ARM-NEXT: and lit will make sure there's no extra stuff there, or regex is an option if you've got stuff in between that varies e.g. offsets and the like.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://llvm.org/docs/CommandGuide/FileCheck.html#the-check-next-directive

And there's one for same line, I think. They should all be listed in that doc somewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The checking of symbol name is unnecessary now because a relocation is generated for every instruction in the test. It was a remainder from the initial version of the test where I had a mix of instructions, some were generating relocations, some were not... I've removed these checks from all the tests now.

@DavidSpickett
Copy link
Collaborator

I'm just nit picking tests now, other than that it looks good to me.

I'm not familiar with what motivated this fix so nominate @smithp35 to give it a final check.

I am curious though...

The patch to address this problem for big endian targets will follow

Is the little endian specific bit here what offset into the symbol is chosen when you load a doubleword/word/halfword, etc. ?

Copy link
Collaborator

@DavidSpickett DavidSpickett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you've talked to @smithp35 offline, so this LGTM.

@DavidSpickett
Copy link
Collaborator

Emit relocs for LDRs

Though I'd write "relocations" instead just because it makes it a bit more search friendly.

@eleanor-arm eleanor-arm changed the title [llvm][MC][ARM][Assembly] Emit relocs for LDRs [llvm][MC][ARM][Assembly] Emit relocations for LDRs Nov 24, 2023
@eleanor-arm
Copy link
Contributor Author

Is the little endian specific bit here what offset into the symbol is chosen when you load a doubleword/word/halfword, etc. ?

I missed this comment, sorry. I couldn’t get all the relocations generated for big endian targets, so I’m assuming that somewhere in the assembler code, there’s logic that separates the output generation for different endianness, before the relocations are decided… It should be clearer in the next patch :)

@eleanor-arm eleanor-arm merged commit bbc5d9f into llvm:main Nov 25, 2023
3 checks passed
@eleanor-arm eleanor-arm deleted the for_upstream branch November 25, 2023 23:20
eleanor-arm added a commit to eleanor-arm/llvm-project that referenced this pull request Nov 30, 2023
…rgets

Follow-up on llvm#72873

When ADR/LDR instructions reference a label in a different section, the
offset is not known until link time, however, the assembler assumes it
can resolve them in some cases.

The previous patch addressed the issue for most LDR instructions,
focusing on little-endian targets.

This patch addresses the remaining work for ADRs and big-endian targets.
eleanor-arm added a commit that referenced this pull request Dec 1, 2023
…rgets (#73834)

Follow-up on #72873
    
When ADR/LDR instructions reference a label in a different section, the
offset is not known until link time, however, the assembler assumes it
    can resolve them in some cases.
    
    The previous patch addressed the issue for most LDR instructions,
    focusing on little-endian targets.
    
This patch addresses the remaining work for ADRs and big-endian targets.
@bd1976bris
Copy link
Collaborator

bd1976bris commented Dec 21, 2023

@eleanor-arm / @smithp35 / @DavidSpickett - I don't think this patch or the subsequent one is valid. Please address the comment that I have left on #73834.

Edit - Thanks. There is a discussion is on #73834. Conclusion is that these patches are valid but there is a problem with processing the relocations that are being emitted in LLD.

eleanor-arm added a commit to eleanor-arm/llvm-project that referenced this pull request Dec 29, 2023
Removes logic that caused some fixups to be marked as resolved in the
assembler without actually resolving them. Sometimes to resolve these
fixups a relocation was needed, which was not generated, resulting in
invalid code. Now in situations when the fixup cannot be resolved by the
assembler, either a relocation is generated, or an error is produced.

This was partially addressed previously
(llvm#72873,
llvm#73834) specifically for LDRx
and ADR instructions, This patch expands it LDRD and VLDR, which should
cover all instructions affected.
eleanor-arm added a commit to eleanor-arm/llvm-project that referenced this pull request Jan 2, 2024
[llvm][MC][ARM] Don't autoresolve fixups

Removes logic that caused some fixups to be marked as resolved in the
assembler without actually resolving them. Assembler must either resolve
the fixup, reject the code (error out) or defer the resolution to the
linker. In general assembler can resolve offsets in pcrel instructions
if the symbol referred to is in the same section and it cannot make
assumptions about relative position of sections.  For example, when LDRD
instruction in arm mode is encountered,  fixup_arm_pcrel_10_unscaled is
raised.  Prior to llvm#72873 the
assembler would always mark it as resolved without generating a
relocation. The resulting code would likely be incorrect whenever the
label referred to is in a different section.

This patch finishes the series to prevent incorrect code being generated
for pcrel instructions referring to labels in different sections. This
kind of assembly code is very rare and most likely a user error, so both
options (relocation or error) are acceptable.  Previously this was
resolved by adding relocations. Here, for VLDR instructions an error is
generated as relocation does not exist for Thumb mode and  we wanted the
tool's behaviour to be consistent across modes. In the LDRD case, Thumb
mode  does not have a relocation and errors out, but LDRD in Arm mode
generates relocation because its fixup kind is shared with other
instructions.

Patch series:
llvm#72873 - LDRx
llvm#73834 - ADR
this PR -  LDRD and VLDR
eleanor-arm added a commit that referenced this pull request Jan 15, 2024
Removes logic that caused some fixups to be marked as resolved in the
assembler without actually resolving them. Assembler must either resolve
the fixup, reject the code (error out) or defer the resolution to the
linker. In general assembler can resolve offsets in pcrel instructions
if the symbol referred to is in the same section and it cannot make
assumptions about relative position of sections. For example, when LDRD
instruction in arm mode is encountered, fixup_arm_pcrel_10_unscaled is
raised. Prior to #72873 the
assembler would always mark it as resolved without generating a
relocation. The resulting code would likely be incorrect whenever the
label referred to is in a different section.

This patch finishes the series to prevent incorrect code being generated
for pcrel instructions referring to labels in different sections. This
kind of assembly code is very rare and most likely a user error, so both
options (relocation or error) are acceptable. In previous patches this
was resolved by adding relocations. Here, for VLDR instructions an error
is generated as relocation does not exist for Thumb mode and we wanted
the tool's behaviour to be consistent across modes. In the LDRD case,
Thumb mode does not have a relocation and errors out, but LDRD in Arm
mode generates R_ARM_LDRS_PC_G0 relocation because its fixup kind is
shared with other instructions.

It also fixed the case when ADR is used in the big-endian mode, which is
not covered by the ADR patch.

Patch series:
#72873 - LDRx
#73834 - ADR 
this PR - LDRD and VLDR
justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
Removes logic that caused some fixups to be marked as resolved in the
assembler without actually resolving them. Assembler must either resolve
the fixup, reject the code (error out) or defer the resolution to the
linker. In general assembler can resolve offsets in pcrel instructions
if the symbol referred to is in the same section and it cannot make
assumptions about relative position of sections. For example, when LDRD
instruction in arm mode is encountered, fixup_arm_pcrel_10_unscaled is
raised. Prior to llvm#72873 the
assembler would always mark it as resolved without generating a
relocation. The resulting code would likely be incorrect whenever the
label referred to is in a different section.

This patch finishes the series to prevent incorrect code being generated
for pcrel instructions referring to labels in different sections. This
kind of assembly code is very rare and most likely a user error, so both
options (relocation or error) are acceptable. In previous patches this
was resolved by adding relocations. Here, for VLDR instructions an error
is generated as relocation does not exist for Thumb mode and we wanted
the tool's behaviour to be consistent across modes. In the LDRD case,
Thumb mode does not have a relocation and errors out, but LDRD in Arm
mode generates R_ARM_LDRS_PC_G0 relocation because its fixup kind is
shared with other instructions.

It also fixed the case when ADR is used in the big-endian mode, which is
not covered by the ADR patch.

Patch series:
llvm#72873 - LDRx
llvm#73834 - ADR 
this PR - LDRD and VLDR
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:ARM mc Machine (object) code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants