[lld] Support thumb PLTs #86223

PiJoules · 2024-03-22T00:32:24Z

We are using PLTs for cortex-m33 which only supports thumb. More specifically, this is for a very restricted use case. There's no MMU so there's no sharing of virtual addresses between two processes, but this is fine. The MCU is used for running chre nanoapps for android. Each nanoapp is a shared library (but effectively acts as an executable containing a test suite) that is loaded and run on the MCU one binary at a time and there's only one process running at a time, so we ensure that the same text segment cannot be shared by two different running executables. GNU LD supports thumb PLTs but we want to migrate to a clang toolchain and use LLD, so thumb PLTs are needed.

github-actions · 2024-03-22T00:35:20Z

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:

git-clang-format --diff 73eb9b33147ba5157cbf5d8276ee718629dfbbda a12e0eace68bcee6def5af4868053cdb739554c3 -- lld/ELF/Arch/ARM.cpp lld/ELF/Config.h lld/ELF/InputFiles.cpp

View the diff from clang-format here.

diff --git a/lld/ELF/Arch/ARM.cpp b/lld/ELF/Arch/ARM.cpp
index 3e0efe540e..46c1b29148 100644
--- a/lld/ELF/Arch/ARM.cpp
+++ b/lld/ELF/Arch/ARM.cpp
@@ -244,7 +244,8 @@ void ARM::writePltHeader(uint8_t *buf) const {
     // `pc` in the add instruction and 8 bytes for the `lr` adjustment.
     //
     uint64_t offset = in.gotPlt->getVA() - in.plt->getVA() - 16;
-    assert(llvm::isUInt<32>(offset) && "This should always fit into a 32-bit offset");
+    assert(llvm::isUInt<32>(offset) &&
+           "This should always fit into a 32-bit offset");
     write16(buf + 0, 0xb500);
     // Split into two halves to support endianness correctly.
     write16(buf + 2, 0xf8df);
@@ -255,7 +256,7 @@ void ARM::writePltHeader(uint8_t *buf) const {
     write16(buf + 10, 0xff08);
     write32(buf + 12, offset);
 
-    memcpy(buf + 16, trapInstr.data(), 4);  // Pad to 32-byte boundary
+    memcpy(buf + 16, trapInstr.data(), 4); // Pad to 32-byte boundary
     memcpy(buf + 20, trapInstr.data(), 4);
     memcpy(buf + 24, trapInstr.data(), 4);
     memcpy(buf + 28, trapInstr.data(), 4);
@@ -339,7 +340,8 @@ void ARM::writePlt(uint8_t *buf, const Symbol &sym,
     memcpy(buf + 12, trapInstr.data(), 4); // Pad to 16-byte boundary
   } else {
     uint64_t offset = sym.getGotPltVA() - pltEntryAddr - 12;
-    assert(llvm::isUInt<32>(offset) && "This should always fit into a 32-bit offset");
+    assert(llvm::isUInt<32>(offset) &&
+           "This should always fit into a 32-bit offset");
 
     // A PLT entry will be:
     //
@@ -359,10 +361,10 @@ void ARM::writePlt(uint8_t *buf, const Symbol &sym,
     write16(buf + 6, 0x0c00); // use `ip`
     relocateNoSym(buf + 4, R_ARM_THM_MOVT_ABS, offset);
 
-    write16(buf + 8, 0x44fc);       // add ip, pc
-    write16(buf + 10, 0xf8dc);      // ldr.w   pc, [ip] (bottom half)
-    write16(buf + 12, 0xf000);      // ldr.w   pc, [ip] (upper half)
-    write16(buf + 14, 0xe7fc);      // Branch to previous instruction
+    write16(buf + 8, 0x44fc);  // add ip, pc
+    write16(buf + 10, 0xf8dc); // ldr.w   pc, [ip] (bottom half)
+    write16(buf + 12, 0xf000); // ldr.w   pc, [ip] (upper half)
+    write16(buf + 14, 0xe7fc); // Branch to previous instruction
   }
 }
 
@@ -407,7 +409,8 @@ bool ARM::needsThunk(RelExpr expr, RelType type, const InputFile *file,
   case R_ARM_THM_JUMP24:
     // Source is Thumb, when all PLT entries are ARM interworking is required.
     // Otherwise we need to interwork if STT_FUNC Symbol has bit 0 clear (ARM).
-    if ((expr == R_PLT_PC && !config->armThumbPLTs) || (s.isFunc() && (s.getVA() & 1) == 0))
+    if ((expr == R_PLT_PC && !config->armThumbPLTs) ||
+        (s.isFunc() && (s.getVA() & 1) == 0))
       return true;
     [[fallthrough]];
   case R_ARM_THM_CALL: {

smithp35

I know this is marked WIP so please ignore my comments if you don't want them.

For Cortex-m PLTs presumably for shared-libraries? Aren't a great fit due to the lack of an MMU. Although much more work to implement the fdpic ABI https://github.com/mickael-guene/fdpic_doc/blob/master/abi.txt may be a better fit.

lld/ELF/Arch/ARM.cpp

smithp35 · 2024-03-22T11:23:06Z

lld/ELF/Arch/ARM.cpp

+    write16(buf + 14, 0xe7fc);  // Branch to the previous instruction.
+    //memcpy(buf + 14, trapInstr.data(), 2); // Pad to 16-byte boundary
+
+    // The PLT size for ARM is 16 bytes and the above sequence is 14 bytes so we could potentially fit one more instruction.


I suspect that aligning to 16-bytes may give better cache performance for A-profile at least. I'm not sure how the cached M-profile would respond though.

lld/ELF/Arch/ARM.cpp

lld/ELF/InputFiles.cpp

PiJoules · 2024-03-23T00:07:00Z

For Cortex-m PLTs presumably for shared-libraries?

Correct.

github-actions · 2024-03-23T00:07:07Z

✅ With the latest revision this PR passed the Python code formatter.

llvmbot · 2024-03-23T00:08:15Z

@llvm/pr-subscribers-lld-elf

@llvm/pr-subscribers-lld

Author: None (PiJoules)

Changes

We are using PLTs for cortex-m33 which only supports thumb.

Full diff: https://github.com/llvm/llvm-project/pull/86223.diff

4 Files Affected:

(modified) lld/ELF/Arch/ARM.cpp (+148-52)
(modified) lld/ELF/Config.h (+1)
(modified) lld/ELF/InputFiles.cpp (+12)
(added) lld/test/ELF/armv8-thumb-plt-reloc.s (+121)

diff --git a/lld/ELF/Arch/ARM.cpp b/lld/ELF/Arch/ARM.cpp
index 687f9499009d5e..9e6cbc87645aba 100644
--- a/lld/ELF/Arch/ARM.cpp
+++ b/lld/ELF/Arch/ARM.cpp
@@ -231,36 +231,67 @@ static void writePltHeaderLong(uint8_t *buf) {
 // The default PLT header requires the .got.plt to be within 128 Mb of the
 // .plt in the positive direction.
 void ARM::writePltHeader(uint8_t *buf) const {
-  // Use a similar sequence to that in writePlt(), the difference is the calling
-  // conventions mean we use lr instead of ip. The PLT entry is responsible for
-  // saving lr on the stack, the dynamic loader is responsible for reloading
-  // it.
-  const uint32_t pltData[] = {
-      0xe52de004, // L1: str lr, [sp,#-4]!
-      0xe28fe600, //     add lr, pc,  #0x0NN00000 &(.got.plt - L1 - 4)
-      0xe28eea00, //     add lr, lr,  #0x000NN000 &(.got.plt - L1 - 4)
-      0xe5bef000, //     ldr pc, [lr, #0x00000NNN] &(.got.plt -L1 - 4)
-  };
-
-  uint64_t offset = in.gotPlt->getVA() - in.plt->getVA() - 4;
-  if (!llvm::isUInt<27>(offset)) {
-    // We cannot encode the Offset, use the long form.
-    writePltHeaderLong(buf);
-    return;
+  if (!config->armThumbPLTs) {
+    // Use a similar sequence to that in writePlt(), the difference is the calling
+    // conventions mean we use lr instead of ip. The PLT entry is responsible for
+    // saving lr on the stack, the dynamic loader is responsible for reloading
+    // it.
+    const uint32_t pltData[] = {
+        0xe52de004, // L1: str lr, [sp,#-4]!
+        0xe28fe600, //     add lr, pc,  #0x0NN00000 &(.got.plt - L1 - 4)
+        0xe28eea00, //     add lr, lr,  #0x000NN000 &(.got.plt - L1 - 4)
+        0xe5bef000, //     ldr pc, [lr, #0x00000NNN] &(.got.plt -L1 - 4)
+    };
+
+    uint64_t offset = in.gotPlt->getVA() - in.plt->getVA() - 4;
+    if (!llvm::isUInt<27>(offset)) {
+      // We cannot encode the Offset, use the long form.
+      writePltHeaderLong(buf);
+      return;
+    }
+    write32(buf + 0, pltData[0]);
+    write32(buf + 4, pltData[1] | ((offset >> 20) & 0xff));
+    write32(buf + 8, pltData[2] | ((offset >> 12) & 0xff));
+    write32(buf + 12, pltData[3] | (offset & 0xfff));
+    memcpy(buf + 16, trapInstr.data(), 4); // Pad to 32-byte boundary
+    memcpy(buf + 20, trapInstr.data(), 4);
+    memcpy(buf + 24, trapInstr.data(), 4);
+    memcpy(buf + 28, trapInstr.data(), 4);
+  } else {
+    // The instruction sequence for thumb:
+    //
+    // 0: b500          push    {lr}
+    // 2: f8df e008     ldr.w   lr, [pc, #0x8]          @ 0xe <func+0xe>
+    // 6: 44fe          add     lr, pc
+    // 8: f85e ff08     ldr     pc, [lr, #8]!
+    // e:               .word   .got.plt - .plt - 16
+    //
+    // At 0x8, we want to jump to .got.plt, the -16 accounts for 8 bytes from
+    // `pc` in the add instruction and 8 bytes for the `lr` adjustment.
+    //
+    uint64_t offset = in.gotPlt->getVA() - in.plt->getVA() - 16;
+    assert(llvm::isUInt<32>(offset) && "This should always fit into a 32-bit offset");
+    write16(buf + 0, 0xb500);
+    write32(buf + 2, 0xe008f8df);
+    write16(buf + 6, 0x44fe);
+    write32(buf + 8, 0xff08f85e);
+    write32(buf + 12, offset);
+
+    memcpy(buf + 16, trapInstr.data(), 4);  // Pad to 32-byte boundary
+    memcpy(buf + 20, trapInstr.data(), 4);
+    memcpy(buf + 24, trapInstr.data(), 4);
+    memcpy(buf + 28, trapInstr.data(), 4);
   }
-  write32(buf + 0, pltData[0]);
-  write32(buf + 4, pltData[1] | ((offset >> 20) & 0xff));
-  write32(buf + 8, pltData[2] | ((offset >> 12) & 0xff));
-  write32(buf + 12, pltData[3] | (offset & 0xfff));
-  memcpy(buf + 16, trapInstr.data(), 4); // Pad to 32-byte boundary
-  memcpy(buf + 20, trapInstr.data(), 4);
-  memcpy(buf + 24, trapInstr.data(), 4);
-  memcpy(buf + 28, trapInstr.data(), 4);
 }
 
 void ARM::addPltHeaderSymbols(InputSection &isec) const {
-  addSyntheticLocal("$a", STT_NOTYPE, 0, 0, isec);
-  addSyntheticLocal("$d", STT_NOTYPE, 16, 0, isec);
+  if (!config->armThumbPLTs) {
+    addSyntheticLocal("$a", STT_NOTYPE, 0, 0, isec);
+    addSyntheticLocal("$d", STT_NOTYPE, 16, 0, isec);
+  } else {
+    addSyntheticLocal("$t", STT_NOTYPE, 0, 0, isec);
+    addSyntheticLocal("$d", STT_NOTYPE, 12, 0, isec);
+  }
 }
 
 // Long form PLT entries that do not have any restrictions on the displacement
@@ -279,32 +310,97 @@ static void writePltLong(uint8_t *buf, uint64_t gotPltEntryAddr,
 // .plt in the positive direction.
 void ARM::writePlt(uint8_t *buf, const Symbol &sym,
                    uint64_t pltEntryAddr) const {
-  // The PLT entry is similar to the example given in Appendix A of ELF for
-  // the Arm Architecture. Instead of using the Group Relocations to find the
-  // optimal rotation for the 8-bit immediate used in the add instructions we
-  // hard code the most compact rotations for simplicity. This saves a load
-  // instruction over the long plt sequences.
-  const uint32_t pltData[] = {
-      0xe28fc600, // L1: add ip, pc,  #0x0NN00000  Offset(&(.got.plt) - L1 - 8
-      0xe28cca00, //     add ip, ip,  #0x000NN000  Offset(&(.got.plt) - L1 - 8
-      0xe5bcf000, //     ldr pc, [ip, #0x00000NNN] Offset(&(.got.plt) - L1 - 8
-  };
 
-  uint64_t offset = sym.getGotPltVA() - pltEntryAddr - 8;
-  if (!llvm::isUInt<27>(offset)) {
-    // We cannot encode the Offset, use the long form.
-    writePltLong(buf, sym.getGotPltVA(), pltEntryAddr);
-    return;
+  if (!config->armThumbPLTs) {
+    uint64_t offset = sym.getGotPltVA() - pltEntryAddr - 8;
+
+    // The PLT entry is similar to the example given in Appendix A of ELF for
+    // the Arm Architecture. Instead of using the Group Relocations to find the
+    // optimal rotation for the 8-bit immediate used in the add instructions we
+    // hard code the most compact rotations for simplicity. This saves a load
+    // instruction over the long plt sequences.
+    const uint32_t pltData[] = {
+        0xe28fc600, // L1: add ip, pc,  #0x0NN00000  Offset(&(.got.plt) - L1 - 8
+        0xe28cca00, //     add ip, ip,  #0x000NN000  Offset(&(.got.plt) - L1 - 8
+        0xe5bcf000, //     ldr pc, [ip, #0x00000NNN] Offset(&(.got.plt) - L1 - 8
+    };
+    if (!llvm::isUInt<27>(offset)) {
+      // We cannot encode the Offset, use the long form.
+      writePltLong(buf, sym.getGotPltVA(), pltEntryAddr);
+      return;
+    }
+    write32(buf + 0, pltData[0] | ((offset >> 20) & 0xff));
+    write32(buf + 4, pltData[1] | ((offset >> 12) & 0xff));
+    write32(buf + 8, pltData[2] | (offset & 0xfff));
+    memcpy(buf + 12, trapInstr.data(), 4); // Pad to 16-byte boundary
+  } else {
+    uint64_t offset = sym.getGotPltVA() - pltEntryAddr - 12;
+    assert(llvm::isUInt<32>(offset) && "This should always fit into a 32-bit offset");
+
+    // A PLT entry will be:
+    //
+    //       movw ip, #<lower 16 bits>
+    //       movt ip, #<upper 16 bits>
+    //       add ip, pc
+    //   L1: ldr.w pc, [ip]
+    //       b L1
+    //
+    // where ip = r12 = 0xc
+    //
+    constexpr uint32_t pltData[] = {
+        0x0c00f240, //     movw ip, <offset lower 16>
+        0x0c00f2c0, //     movt ip, <offset higher 16>
+    };
+
+    // movw encoding:
+    //
+    //   15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
+    //   1  1  1  1  0  i  1 0 0 1 0 0 imm4
+    //
+    //   15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
+    //   0  imm3     Rd0       imm8
+    //
+    //   imm16 = imm4:i:imm3:imm8, i = bit 11
+    //
+    uint16_t offset_lower = offset & 0xffff;
+    uint32_t movwImm8 = offset_lower & 0xff;
+    uint32_t movwImm3 = (offset_lower >> 8) & 0x7;
+    uint32_t movwI = (offset_lower >> 11) & 0x1;
+    uint32_t movwImm4 = (offset_lower >> 12) & 0xf;
+    uint32_t movwBits = (movwI << 10) | (movwImm4 << 0) | (movwImm3 << 28) | (movwImm8 << 16);
+    write32(buf + 0, pltData[0] | movwBits);
+
+    // movt encoding:
+    //
+    //   15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
+    //   1  1  1  1  0  i  1 0 1 1 0 0 imm4
+    //
+    //   15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
+    //   0  imm3     Rd0       imm8
+    //
+    //   imm16 = imm4:i:imm3:imm8, i = bit 11
+    //
+    uint16_t offset_upper = static_cast<uint16_t>(offset >> 16);
+    uint32_t movtImm8 = offset_upper & 0xff;
+    uint32_t movtImm3 = (offset_upper >> 8) & 0x7;
+    uint32_t movtI = (offset_upper >> 11) & 0x1;
+    uint32_t movtImm4 = (offset_upper >> 12) & 0xf;
+    uint32_t movtBits = (movtI << 10) | (movtImm4 << 0) | (movtImm3 << 28) | (movtImm8 << 16);
+    write32(buf + 4, pltData[1] | movtBits);
+
+    write16(buf + 8, 0x44fc);       // add ip, pc
+    write32(buf + 10, 0xf000f8dc);  // ldr.w   pc, [ip]
+    write16(buf + 14, 0xe7fc);      // Branch to previous instruction
   }
-  write32(buf + 0, pltData[0] | ((offset >> 20) & 0xff));
-  write32(buf + 4, pltData[1] | ((offset >> 12) & 0xff));
-  write32(buf + 8, pltData[2] | (offset & 0xfff));
-  memcpy(buf + 12, trapInstr.data(), 4); // Pad to 16-byte boundary
 }
 
 void ARM::addPltSymbols(InputSection &isec, uint64_t off) const {
-  addSyntheticLocal("$a", STT_NOTYPE, off, 0, isec);
-  addSyntheticLocal("$d", STT_NOTYPE, off + 12, 0, isec);
+  if (!config->armThumbPLTs) {
+    addSyntheticLocal("$a", STT_NOTYPE, off, 0, isec);
+    addSyntheticLocal("$d", STT_NOTYPE, off + 12, 0, isec);
+  } else {
+    addSyntheticLocal("$t", STT_NOTYPE, off, 0, isec);
+  }
 }
 
 bool ARM::needsThunk(RelExpr expr, RelType type, const InputFile *file,
@@ -337,7 +433,7 @@ bool ARM::needsThunk(RelExpr expr, RelType type, const InputFile *file,
   case R_ARM_THM_JUMP24:
     // Source is Thumb, all PLT entries are ARM so interworking is required.
     // Otherwise we need to interwork if STT_FUNC Symbol has bit 0 clear (ARM).
-    if (expr == R_PLT_PC || (s.isFunc() && (s.getVA() & 1) == 0))
+    if ((expr == R_PLT_PC && !config->armThumbPLTs) || (s.isFunc() && (s.getVA() & 1) == 0))
       return true;
     [[fallthrough]];
   case R_ARM_THM_CALL: {
@@ -547,7 +643,6 @@ void ARM::relocate(uint8_t *loc, const Relocation &rel, uint64_t val) const {
     // STT_FUNC we choose whether to write a BL or BLX depending on the
     // value of bit 0 of Val. With bit 0 == 1 denoting Thumb. If the symbol is
     // not of type STT_FUNC then we must preserve the original instruction.
-    // PLT entries are always ARM state so we know we don't need to interwork.
     assert(rel.sym); // R_ARM_CALL is always reached via relocate().
     bool bit0Thumb = val & 1;
     bool isBlx = (read32(loc) & 0xfe000000) == 0xfa000000;
@@ -606,12 +701,13 @@ void ARM::relocate(uint8_t *loc, const Relocation &rel, uint64_t val) const {
     // PLT entries are always ARM state so we know we need to interwork.
     assert(rel.sym); // R_ARM_THM_CALL is always reached via relocate().
     bool bit0Thumb = val & 1;
+    bool useThumb = bit0Thumb || config->armThumbPLTs;
     bool isBlx = (read16(loc + 2) & 0x1000) == 0;
     // lld 10.0 and before always used bit0Thumb when deciding to write a BLX
-    // even when type not STT_FUNC. PLT entries generated by LLD are always ARM.
-    if (!rel.sym->isFunc() && !rel.sym->isInPlt() && isBlx == bit0Thumb)
+    // even when type not STT_FUNC.
+    if (!rel.sym->isFunc() && !rel.sym->isInPlt() && isBlx == useThumb)
       stateChangeWarning(loc, rel.type, *rel.sym);
-    if (rel.sym->isFunc() || rel.sym->isInPlt() ? !bit0Thumb : isBlx) {
+    if ((rel.sym->isFunc() || rel.sym->isInPlt()) ? !useThumb : isBlx) {
       // We are writing a BLX. Ensure BLX destination is 4-byte aligned. As
       // the BLX instruction may only be two byte aligned. This must be done
       // before overflow check.
diff --git a/lld/ELF/Config.h b/lld/ELF/Config.h
index bb3608da80b21f..89121c625e302a 100644
--- a/lld/ELF/Config.h
+++ b/lld/ELF/Config.h
@@ -212,6 +212,7 @@ struct Config {
   bool allowMultipleDefinition;
   bool fatLTOObjects;
   bool androidPackDynRelocs = false;
+  bool armThumbPLTs = false;
   bool armHasBlx = false;
   bool armHasMovtMovw = false;
   bool armJ1J2BranchEncoding = false;
diff --git a/lld/ELF/InputFiles.cpp b/lld/ELF/InputFiles.cpp
index 4c614c865d24a9..9d7690688bfd61 100644
--- a/lld/ELF/InputFiles.cpp
+++ b/lld/ELF/InputFiles.cpp
@@ -194,6 +194,18 @@ static void updateSupportedARMFeatures(const ARMAttributeParser &attributes) {
   if (arch >= ARMBuildAttrs::CPUArch::v8_M_Base &&
       profile == ARMBuildAttrs::MicroControllerProfile)
     config->armCMSESupport = true;
+
+  // The thumb PLT entries require Thumb2 which can be used on multiple archs.
+  // For now, let's limit it to ones where ARM isn't available and we know have
+  // Thumb2.
+  std::optional<unsigned> armISA =
+      attributes.getAttributeValue(ARMBuildAttrs::ARM_ISA_use);
+  std::optional<unsigned> thumb =
+      attributes.getAttributeValue(ARMBuildAttrs::THUMB_ISA_use);
+  bool noArmISA = !armISA || *armISA == ARMBuildAttrs::Not_Allowed;
+  bool hasThumb2 = thumb && *thumb >= ARMBuildAttrs::AllowThumb32;
+  if (noArmISA && hasThumb2)
+    config->armThumbPLTs = true;
 }
 
 InputFile::InputFile(Kind k, MemoryBufferRef m)
diff --git a/lld/test/ELF/armv8-thumb-plt-reloc.s b/lld/test/ELF/armv8-thumb-plt-reloc.s
new file mode 100644
index 00000000000000..608362b658bdd6
--- /dev/null
+++ b/lld/test/ELF/armv8-thumb-plt-reloc.s
@@ -0,0 +1,121 @@
+// REQUIRES: arm
+// RUN: llvm-mc -filetype=obj -arm-add-build-attributes --arch=thumb --mcpu=cortex-m33 %p/Inputs/arm-plt-reloc.s -o %t1
+// RUN: llvm-mc -filetype=obj -arm-add-build-attributes --arch=thumb --mcpu=cortex-m33 %s -o %t2
+// RUN: ld.lld %t1 %t2 -o %t
+// RUN: llvm-objdump --no-print-imm-hex -d %t | FileCheck %s
+// RUN: ld.lld -shared %t1 %t2 -o %t.so
+// RUN: llvm-objdump --no-print-imm-hex -d %t.so | FileCheck --check-prefix=DSO %s
+// RUN: llvm-readobj -S -r %t.so | FileCheck -check-prefix=DSOREL %s
+
+// RUN: llvm-mc -filetype=obj -arm-add-build-attributes --arch=thumb --mcpu=cortex-m33 %p/Inputs/arm-plt-reloc.s -o %t1
+// RUN: llvm-mc -filetype=obj -arm-add-build-attributes --arch=thumb --mcpu=cortex-m33 %s -o %t2
+// RUN: ld.lld %t1 %t2 -o %t
+// RUN: llvm-objdump --no-print-imm-hex -d %t | FileCheck %s
+// RUN: ld.lld -shared %t1 %t2 -o %t.so
+// RUN: llvm-objdump --no-print-imm-hex -d %t.so | FileCheck --check-prefix=DSO %s
+// RUN: llvm-readobj -S -r %t.so | FileCheck -check-prefix=DSOREL %s
+
+// RUN: ld.lld --be8 %t1 %t2 -o %t
+// RUN: llvm-objdump --no-print-imm-hex -d %t | FileCheck %s
+// RUN: ld.lld --be8 -shared %t1 %t2 -o %t.so
+// RUN: llvm-objdump --no-print-imm-hex -d %t.so | FileCheck --check-prefix=DSO %s
+// RUN: llvm-readobj -S -r %t.so | FileCheck -check-prefix=DSOREL %s
+
+// Test PLT entry generation
+ .text
+ .align 2
+ .globl _start
+ .type  _start,%function
+_start:
+// FIXME, interworking is only supported for BL via BLX at the moment, when
+// interworking thunks are available for b.w and b<cond>.w this can be altered
+// to test the different forms of interworking.
+ bl func1
+ bl func2
+ bl func3
+
+// Executable, expect no PLT
+// CHECK: Disassembly of section .text:
+// CHECK-EMPTY:
+// CHECK-NEXT: <func1>:
+// CHECK-NEXT:   200b4: 4770    bx      lr
+// CHECK: <func2>:
+// CHECK-NEXT:   200b6: 4770    bx      lr
+// CHECK: <func3>:
+// CHECK-NEXT:   200b8: 4770    bx      lr
+// CHECK-NEXT:   200ba: d4d4 
+// CHECK: <_start>:
+// CHECK-NEXT:   200bc: f7ff fffa       bl      0x200b4 <func1>
+// CHECK-NEXT:   200c0: f7ff fff9       bl      0x200b6 <func2>
+// CHECK-NEXT:   200c4: f7ff fff8       bl      0x200b8 <func3>
+
+// Expect PLT entries as symbols can be preempted
+// .text is Thumb and .plt is ARM, llvm-objdump can currently only disassemble
+// as ARM or Thumb. Work around by disassembling twice.
+// DSO: Disassembly of section .text:
+// DSO-EMPTY:
+// DSO-NEXT: <func1>:
+// DSO-NEXT:     10214:     4770    bx      lr
+// DSO: <func2>:
+// DSO-NEXT:     10216:     4770    bx      lr
+// DSO: <func3>:
+// DSO-NEXT:     10218:     4770    bx      lr
+// DSO-NEXT:     1021a:     d4d4 
+// DSO: <_start>:
+// 0x10250 = PLT func1
+// DSO-NEXT:     1021c:     f000 f818       bl     0x10250
+// 0x10260 = PLT func2
+// DSO-NEXT:     10220:     f000 f81e       bl     0x10260
+// 0x10270 = PLT func3
+// DSO-NEXT:     10224:     f000 f824       bl     0x10270
+// DSO: Disassembly of section .plt:
+// DSO-EMPTY:
+// DSO-NEXT: <.plt>:
+// DSO-NEXT:     10230: b500          push    {lr}
+// DSO-NEXT:     10232: f8df e008     ldr.w   lr, [pc, #8]
+// DSO-NEXT:     10236: 44fe          add     lr, pc
+// DSO-NEXT:     10238: f85e ff08     ldr     pc, [lr, #8]!
+// 0x20098 = .got.plt (0x302D8) - pc (0x10238 = .plt + 8) - 8
+// DSO-NEXT:     1023c: 98 00 02 00   .word   0x00020098
+// DSO-NEXT:     10240: d4 d4 d4 d4   .word   0xd4d4d4d4
+// DSO-NEXT:     10244: d4 d4 d4 d4   .word   0xd4d4d4d4
+// DSO-NEXT:     10248: d4 d4 d4 d4   .word   0xd4d4d4d4
+// DSO-NEXT:     1024c: d4 d4 d4 d4   .word   0xd4d4d4d4
+
+// 136 + 2 << 16 + 0x1025c = 0x302e4 = got entry 1
+// DSO-NEXT:     10250:       f240 0c88     movw    r12, #136
+// DSO-NEXT:     10254:       f2c0 0c02     movt    r12, #2
+// DSO-NEXT:     10258:       44fc          add     r12, pc
+// DSO-NEXT:     1025a:       f8dc f000     ldr.w   pc, [r12]
+// DSO-NEXT:     1025e:       e7fc          b       0x1025a
+// 124 + 2 << 16 + 0x1026c = 0x302e8 = got entry 2
+// DSO-NEXT:     10260:       f240 0c7c     movw    r12, #124
+// DSO-NEXT:     10264:       f2c0 0c02     movt    r12, #2
+// DSO-NEXT:     10268:       44fc          add     r12, pc
+// DSO-NEXT:     1026a:       f8dc f000     ldr.w   pc, [r12]
+// DSO-NEXT:     1026e:       e7fc          b       0x1026a
+// 112 + 2 << 16 + 0x1027c = 0x302ec = got entry 3
+// DSO-NEXT:     10270:       f240 0c70     movw    r12, #112
+// DSO-NEXT:     10274:       f2c0 0c02     movt    r12, #2
+// DSO-NEXT:     10278:       44fc          add     r12, pc
+// DSO-NEXT:     1027a:       f8dc f000     ldr.w   pc, [r12]
+// DSO-NEXT:     1027e:       e7fc          b       0x1027a
+
+// DSOREL:    Name: .got.plt
+// DSOREL-NEXT:    Type: SHT_PROGBITS
+// DSOREL-NEXT:    Flags [
+// DSOREL-NEXT:      SHF_ALLOC
+// DSOREL-NEXT:      SHF_WRITE
+// DSOREL-NEXT:    ]
+// DSOREL-NEXT:    Address: 0x302D8
+// DSOREL-NEXT:    Offset:
+// DSOREL-NEXT:    Size: 24
+// DSOREL-NEXT:    Link:
+// DSOREL-NEXT:    Info:
+// DSOREL-NEXT:    AddressAlignment: 4
+// DSOREL-NEXT:    EntrySize:
+// DSOREL:  Relocations [
+// DSOREL-NEXT:  Section (5) .rel.plt {
+// DSOREL-NEXT:    0x302E4 R_ARM_JUMP_SLOT func1
+// DSOREL-NEXT:    0x302E8 R_ARM_JUMP_SLOT func2
+// DSOREL-NEXT:    0x302EC R_ARM_JUMP_SLOT func3

PiJoules · 2024-03-23T00:08:36Z

Ok, formally requesting reviews now

MaskRay · 2024-03-23T21:39:24Z

For Cortex-m PLTs presumably for shared-libraries? Aren't a great fit due to the lack of an MMU. Although much more work to implement the fdpic ABI mickael-guene/fdpic_doc@master/abi.txt may be a better fit.

I am curious about the use case as well. I have a fdpic WIP implementation for my FDPIC study that hard codes the FDPIC register r9, which does not yield the best performance. A better implementation is to make r9 like a hidden argument that is call-clobbered.

(BTW: gcc's arm fdpic does not handle r9 very well. gcc's sh4 port handles r12 in a better way.)

smithp35

Thanks for the update.

I think the current code won't work for big-endian systems. Other than that it looks good to me.

I'll be away on vacation till the 2nd April. I'm happy for MaskRay to approve if he is happy with the changes.

lld/ELF/Arch/ARM.cpp

lld/test/ELF/armv8-thumb-plt-reloc.s

smithp35

Apologies for the delay. Have just come back from holiday. Some of the comments in the test don't look right, perhaps cut-and-paste from a different PLT test. Other than that, looks good to me.

lld/test/ELF/armv8-thumb-plt-reloc.s

MaskRay · 2024-05-27T00:44:09Z

This Thumb PLT support is for very restricted use cases.
The lack of a MMU implies no sharing for the text segment used by two processes:

Two processes running the same executable must have different data segments. The System V dynamic linking model requires a constant distance between the text segment and the data segment. Corollary: the text segment cannot be shared.
Similarly, a shared object's text segment cannot be shared by two processes running different executables.

Perhaps the use case has a stronger requirement: no two executables (nanoapps?) are running at the same time.

It seems useful to update the description to clarify the case. The code generally looks good.

PiJoules · 2024-05-28T18:51:14Z

This Thumb PLT support is for very restricted use cases. The lack of a MMU implies no sharing for the text segment used by two processes:

Two processes running the same executable must have different data segments. The System V dynamic linking model requires a constant distance between the text segment and the data segment. Corollary: the text segment cannot be shared.

Similarly, a shared object's text segment cannot be shared by two processes running different executables.

Perhaps the use case has a stronger requirement: no two executables (nanoapps?) are running at the same time.

It seems useful to update the description to clarify the case. The code generally looks good.

Updated

MaskRay

(Realized that I forgot to push the comments)

lld/ELF/Arch/ARM.cpp

lld/test/ELF/armv8-thumb-plt-reloc.s

We are using PLTs for cortex-m33 which only supports thumb.

joker-eph · 2024-05-29T01:46:01Z

This broke windows CI, and it was already visible before you most recent force-push: https://buildkite.com/llvm-project/github-pull-requests/builds/67798#018fc158-f8b8-4a4f-aafa-5bd92a0a8d8a

This reverts commit 760c2aa.

Reverts #86223 windows pre-merge is broken.

We are using PLTs for cortex-m33 which only supports thumb. More specifically, this is for a very restricted use case. There's no MMU so there's no sharing of virtual addresses between two processes, but this is fine. The MCU is used for running [chre nanoapps](https://android.googlesource.com/platform/system/chre/+/HEAD/doc/nanoapp_overview.md) for android. Each nanoapp is a shared library (but effectively acts as an executable containing a test suite) that is loaded and run on the MCU one binary at a time and there's only one process running at a time, so we ensure that the same text segment cannot be shared by two different running executables. GNU LD supports thumb PLTs but we want to migrate to a clang toolchain and use LLD, so thumb PLTs are needed.

Reverts llvm#86223 windows pre-merge is broken.

PiJoules requested review from MaskRay and smithp35 March 22, 2024 00:32

smithp35 reviewed Mar 22, 2024

View reviewed changes

PiJoules force-pushed the thumb-plts branch from 4f19a83 to d6c9b78 Compare March 23, 2024 00:04

PiJoules changed the title ~~[WIP][lld] Support thumb PLTs for cortex-M~~ [lld] Support thumb PLTs Mar 23, 2024

PiJoules requested a review from smithp35 March 23, 2024 00:07

PiJoules marked this pull request as ready for review March 23, 2024 00:07

llvmbot added lld lld:ELF labels Mar 23, 2024

PiJoules requested a review from frobtech March 23, 2024 00:08

smithp35 reviewed Mar 24, 2024

View reviewed changes

PiJoules force-pushed the thumb-plts branch from d6c9b78 to cc8b901 Compare March 25, 2024 20:28

PiJoules requested a review from smithp35 March 25, 2024 20:30

smithp35 reviewed Apr 2, 2024

View reviewed changes

lld/test/ELF/armv8-thumb-plt-reloc.s Outdated Show resolved Hide resolved

lld/test/ELF/armv8-thumb-plt-reloc.s Outdated Show resolved Hide resolved

PiJoules force-pushed the thumb-plts branch from cc8b901 to ba2f022 Compare May 1, 2024 23:41

PiJoules requested a review from smithp35 May 1, 2024 23:41

MaskRay reviewed May 28, 2024

View reviewed changes

PiJoules force-pushed the thumb-plts branch from ba2f022 to a2a8649 Compare May 28, 2024 21:33

PiJoules requested a review from MaskRay May 28, 2024 21:34

PiJoules force-pushed the thumb-plts branch from a2a8649 to a12e0ea Compare May 28, 2024 21:35

MaskRay approved these changes May 28, 2024

View reviewed changes

lld/test/ELF/armv8-thumb-plt-reloc.s Outdated Show resolved Hide resolved

[lld] Support thumb PLTs

04a566a

We are using PLTs for cortex-m33 which only supports thumb.

PiJoules force-pushed the thumb-plts branch from a12e0ea to 04a566a Compare May 28, 2024 22:36

PiJoules merged commit 760c2aa into llvm:main May 28, 2024
3 of 6 checks passed

PiJoules deleted the thumb-plts branch May 28, 2024 22:37

joker-eph added a commit that referenced this pull request May 29, 2024

Revert "[lld] Support thumb PLTs (#86223)"

ad28187

This reverts commit 760c2aa.

joker-eph mentioned this pull request May 29, 2024

Revert "[lld] Support thumb PLTs" #93631

Merged

joker-eph added a commit that referenced this pull request May 29, 2024

Revert "[lld] Support thumb PLTs" (#93631)

7832769

Reverts #86223 windows pre-merge is broken.

vg0204 pushed a commit to vg0204/llvm-project that referenced this pull request May 29, 2024

Revert "[lld] Support thumb PLTs" (llvm#93631)

6e49d6b

Reverts llvm#86223 windows pre-merge is broken.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[lld] Support thumb PLTs #86223

[lld] Support thumb PLTs #86223

PiJoules commented Mar 22, 2024 •

edited

github-actions bot commented Mar 22, 2024 •

edited

smithp35 left a comment

smithp35 Mar 22, 2024

PiJoules commented Mar 23, 2024

github-actions bot commented Mar 23, 2024

llvmbot commented Mar 23, 2024 •

edited

PiJoules commented Mar 23, 2024

MaskRay commented Mar 23, 2024

smithp35 left a comment

smithp35 left a comment

MaskRay commented May 27, 2024

PiJoules commented May 28, 2024

MaskRay left a comment

joker-eph commented May 29, 2024

[lld] Support thumb PLTs #86223

[lld] Support thumb PLTs #86223

Conversation

PiJoules commented Mar 22, 2024 • edited

github-actions bot commented Mar 22, 2024 • edited

smithp35 left a comment

Choose a reason for hiding this comment

smithp35 Mar 22, 2024

Choose a reason for hiding this comment

PiJoules commented Mar 23, 2024

github-actions bot commented Mar 23, 2024

llvmbot commented Mar 23, 2024 • edited

PiJoules commented Mar 23, 2024

MaskRay commented Mar 23, 2024

smithp35 left a comment

Choose a reason for hiding this comment

smithp35 left a comment

Choose a reason for hiding this comment

MaskRay commented May 27, 2024

PiJoules commented May 28, 2024

MaskRay left a comment

Choose a reason for hiding this comment

joker-eph commented May 29, 2024

PiJoules commented Mar 22, 2024 •

edited

github-actions bot commented Mar 22, 2024 •

edited

llvmbot commented Mar 23, 2024 •

edited