Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JITLink][AArch32] Implement Armv5 ldr-pc stubs and use them for all pre-v7 targets #79082

Merged
merged 4 commits into from
Jan 23, 2024

Conversation

weliveindetail
Copy link
Contributor

@weliveindetail weliveindetail commented Jan 23, 2024

This stub type loads an absolute address directly into the PC register. It's the simplest and most compatible way to implement a branch indirection across the entire address space (and probably the slowest as well). It's the ideal fallback for all targets for which we did not (yet) implement a more performant solution.

Copy link

github-actions bot commented Jan 23, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@weliveindetail
Copy link
Contributor Author

Copy link
Collaborator

@smithp35 smithp35 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I can help too much as I don't think I'm familiar enough with JITLink to know what cases are possible or not. I've left what I can.

llvm/lib/ExecutionEngine/JITLink/aarch32.cpp Outdated Show resolved Hide resolved
llvm/lib/ExecutionEngine/JITLink/aarch32.cpp Outdated Show resolved Hide resolved
llvm/lib/ExecutionEngine/JITLink/aarch32.cpp Outdated Show resolved Hide resolved
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Jan 23, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 23, 2024

@llvm/pr-subscribers-clang

Author: Stefan Gränitz (weliveindetail)

Changes

This stub type loads an absolute address directly into the PC register. It's the simplest and most compatible way to implement a branch indirection across the entire address space (and probably the slowest as well). It's the ideal fallback for all targets for which we did not (yet) implement a more performant solution.


Patch is 32.32 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/79082.diff

13 Files Affected:

  • (modified) clang/tools/clang-repl/CMakeLists.txt (+1-1)
  • (modified) llvm/include/llvm/ExecutionEngine/JITLink/aarch32.h (+44-12)
  • (modified) llvm/lib/ExecutionEngine/JITLink/ELF_aarch32.cpp (+7-16)
  • (modified) llvm/lib/ExecutionEngine/JITLink/aarch32.cpp (+67)
  • (modified) llvm/test/ExecutionEngine/JITLink/AArch32/ELF_relocations_arm.s (+18-36)
  • (added) llvm/test/ExecutionEngine/JITLink/AArch32/ELF_relocations_armv7plus.s (+49)
  • (modified) llvm/test/ExecutionEngine/JITLink/AArch32/ELF_relocations_data.s (+6-1)
  • (removed) llvm/test/ExecutionEngine/JITLink/AArch32/ELF_relocations_thumb.s (-145)
  • (added) llvm/test/ExecutionEngine/JITLink/AArch32/ELF_relocations_thumbv6m.s (+60)
  • (added) llvm/test/ExecutionEngine/JITLink/AArch32/ELF_relocations_thumbv7a.s (+45)
  • (added) llvm/test/ExecutionEngine/JITLink/AArch32/ELF_relocations_thumbv7m.s (+107)
  • (modified) llvm/test/ExecutionEngine/JITLink/AArch32/ELF_stubs_arm.s (+21-9)
  • (modified) llvm/test/ExecutionEngine/JITLink/AArch32/ELF_stubs_thumb.s (+15-8)
diff --git a/clang/tools/clang-repl/CMakeLists.txt b/clang/tools/clang-repl/CMakeLists.txt
index 2a0f617a2c0ff6b..031dcaba5e4468f 100644
--- a/clang/tools/clang-repl/CMakeLists.txt
+++ b/clang/tools/clang-repl/CMakeLists.txt
@@ -23,7 +23,7 @@ if(CLANG_PLUGIN_SUPPORT)
   export_executable_symbols_for_plugins(clang-repl)
 endif()
 
-string(TOUPPER ${CMAKE_SYSTEM_PROCESSOR} system_processor)
+string(TOUPPER "${CMAKE_SYSTEM_PROCESSOR}" system_processor)
 if(system_processor MATCHES "ARM")
   set(FLAG_LONG_PLT "-Wl,--long-plt")
   llvm_check_linker_flag(CXX ${FLAG_LONG_PLT} LINKER_HAS_FLAG_LONG_PLT)
diff --git a/llvm/include/llvm/ExecutionEngine/JITLink/aarch32.h b/llvm/include/llvm/ExecutionEngine/JITLink/aarch32.h
index ed53fa409ade895..8f3b5b2e48c011d 100644
--- a/llvm/include/llvm/ExecutionEngine/JITLink/aarch32.h
+++ b/llvm/include/llvm/ExecutionEngine/JITLink/aarch32.h
@@ -131,14 +131,15 @@ const char *getEdgeKindName(Edge::Kind K);
 /// Stubs are often called "veneers" in the official docs and online.
 ///
 enum class StubsFlavor {
-  Unsupported = 0,
+  Undefined = 0,
+  pre_v7,
   v7,
 };
 
 /// JITLink sub-arch configuration for Arm CPU models
 struct ArmConfig {
   bool J1J2BranchEncoding = false;
-  StubsFlavor Stubs = StubsFlavor::Unsupported;
+  StubsFlavor Stubs = StubsFlavor::Undefined;
   // In the long term, we might want a linker switch like --target1-rel
   bool Target1Rel = false;
 };
@@ -146,18 +147,12 @@ struct ArmConfig {
 /// Obtain the sub-arch configuration for a given Arm CPU model.
 inline ArmConfig getArmConfigForCPUArch(ARMBuildAttrs::CPUArch CPUArch) {
   ArmConfig ArmCfg;
-  switch (CPUArch) {
-  case ARMBuildAttrs::v7:
-  case ARMBuildAttrs::v8_A:
+  if (CPUArch == ARMBuildAttrs::v7 || CPUArch >= ARMBuildAttrs::v7E_M) {
     ArmCfg.J1J2BranchEncoding = true;
     ArmCfg.Stubs = StubsFlavor::v7;
-    break;
-  default:
-    DEBUG_WITH_TYPE("jitlink", {
-      dbgs() << "  Warning: ARM config not defined for CPU architecture "
-             << getCPUArchName(CPUArch) << " (" << CPUArch << ")\n";
-    });
-    break;
+  } else {
+    ArmCfg.J1J2BranchEncoding = false;
+    ArmCfg.Stubs = StubsFlavor::pre_v7;
   }
   return ArmCfg;
 }
@@ -341,6 +336,43 @@ class GOTBuilder : public TableManager<GOTBuilder> {
   Section *GOTSection = nullptr;
 };
 
+/// Stubs builder emits non-position-independent Arm stubs for pre-v7 CPUs.
+/// These architectures have no MovT/MovW instructions and don't support Thumb2.
+/// BL is the only Thumb instruction that can generate stubs and they can always
+/// be transformed into BLX.
+class StubsManager_prev7 {
+public:
+  StubsManager_prev7() = default;
+
+  /// Name of the object file section that will contain all our stubs.
+  static StringRef getSectionName() {
+    return "__llvm_jitlink_aarch32_STUBS_prev7";
+  }
+
+  /// Implements link-graph traversal via visitExistingEdges()
+  bool visitEdge(LinkGraph &G, Block *B, Edge &E);
+
+private:
+  // Each stub uses a single block that can have 2 entryponts, one for Arm and
+  // one for Thumb
+  struct StubMapEntry {
+    Block *B = nullptr;
+    Symbol *ArmEntry = nullptr;
+    Symbol *ThumbEntry = nullptr;
+  };
+
+  std::pair<StubMapEntry *, bool> getStubMapSlot(StringRef Name) {
+    auto &&[Stubs, NewStub] = StubMap.try_emplace(Name);
+    return std::make_pair(&Stubs->second, NewStub);
+  }
+
+  Symbol *getOrCreateSlotEntrypoint(LinkGraph &G, StubMapEntry &Slot,
+                                    bool Thumb);
+
+  DenseMap<StringRef, StubMapEntry> StubMap;
+  Section *StubsSection = nullptr;
+};
+
 /// Stubs builder for v7 emits non-position-independent Arm and Thumb stubs.
 class StubsManager_v7 {
 public:
diff --git a/llvm/lib/ExecutionEngine/JITLink/ELF_aarch32.cpp b/llvm/lib/ExecutionEngine/JITLink/ELF_aarch32.cpp
index 15c209e1ebe5bf6..c1f923d69c52d76 100644
--- a/llvm/lib/ExecutionEngine/JITLink/ELF_aarch32.cpp
+++ b/llvm/lib/ExecutionEngine/JITLink/ELF_aarch32.cpp
@@ -265,21 +265,8 @@ createLinkGraphFromELFObject_aarch32(MemoryBufferRef ObjectBuffer) {
   // Resolve our internal configuration for the target. If at some point the
   // CPUArch alone becomes too unprecise, we can find more details in the
   // Tag_CPU_arch_profile.
-  aarch32::ArmConfig ArmCfg;
-  using namespace ARMBuildAttrs;
-  auto Arch = static_cast<CPUArch>(ARM::getArchAttr(AK));
-  switch (Arch) {
-  case v7:
-  case v8_A:
-    ArmCfg = aarch32::getArmConfigForCPUArch(Arch);
-    assert(ArmCfg.Stubs != aarch32::StubsFlavor::Unsupported &&
-           "Provide a config for each supported CPU");
-    break;
-  default:
-    return make_error<JITLinkError>(
-        "Failed to build ELF link graph: Unsupported CPU arch " +
-        StringRef(aarch32::getCPUArchName(Arch)));
-  }
+  auto Arch = static_cast<ARMBuildAttrs::CPUArch>(ARM::getArchAttr(AK));
+  aarch32::ArmConfig ArmCfg = aarch32::getArmConfigForCPUArch(Arch);
 
   // Populate the link-graph.
   switch (TT.getArch()) {
@@ -324,11 +311,15 @@ void link_ELF_aarch32(std::unique_ptr<LinkGraph> G,
       PassCfg.PrePrunePasses.push_back(markAllSymbolsLive);
 
     switch (ArmCfg.Stubs) {
+    case aarch32::StubsFlavor::pre_v7:
+      PassCfg.PostPrunePasses.push_back(
+          buildTables_ELF_aarch32<aarch32::StubsManager_prev7>);
+      break;
     case aarch32::StubsFlavor::v7:
       PassCfg.PostPrunePasses.push_back(
           buildTables_ELF_aarch32<aarch32::StubsManager_v7>);
       break;
-    case aarch32::StubsFlavor::Unsupported:
+    case aarch32::StubsFlavor::Undefined:
       llvm_unreachable("Check before building graph");
     }
   }
diff --git a/llvm/lib/ExecutionEngine/JITLink/aarch32.cpp b/llvm/lib/ExecutionEngine/JITLink/aarch32.cpp
index 9508cde07b42a65..f143a79f1eeb1a3 100644
--- a/llvm/lib/ExecutionEngine/JITLink/aarch32.cpp
+++ b/llvm/lib/ExecutionEngine/JITLink/aarch32.cpp
@@ -725,6 +725,13 @@ bool GOTBuilder::visitEdge(LinkGraph &G, Block *B, Edge &E) {
   return true;
 }
 
+const uint8_t ArmThumbv5LdrPc[] = {
+    0x78, 0x47,             // bx pc
+    0xfd, 0xe7,             // b #-6 ; Arm recommended sequence to follow bx pc
+    0x04, 0xf0, 0x1f, 0xe5, // ldr pc, [pc,#-4] ; L1
+    0x00, 0x00, 0x00, 0x00, // L1: .word S
+};
+
 const uint8_t Armv7ABS[] = {
     0x00, 0xc0, 0x00, 0xe3, // movw r12, #0x0000     ; lower 16-bit
     0x00, 0xc0, 0x40, 0xe3, // movt r12, #0x0000     ; upper 16-bit
@@ -745,6 +752,12 @@ static Block &allocStub(LinkGraph &G, Section &S, const uint8_t (&Code)[Size]) {
   return G.createContentBlock(S, Template, orc::ExecutorAddr(), Alignment, 0);
 }
 
+static Block &createStubPrev7(LinkGraph &G, Section &S, Symbol &Target) {
+  Block &B = allocStub(G, S, ArmThumbv5LdrPc);
+  B.addEdge(Data_Pointer32, 8, Target, 0);
+  return B;
+}
+
 static Block &createStubThumbv7(LinkGraph &G, Section &S, Symbol &Target) {
   Block &B = allocStub(G, S, Thumbv7ABS);
   B.addEdge(Thumb_MovwAbsNC, 0, Target, 0);
@@ -802,6 +815,60 @@ static bool needsStub(const Edge &E) {
   return false;
 }
 
+// The ArmThumbv5LdrPc stub has 2 entrypoints: Thumb at offset 0 is taken only
+// for Thumb B instructions. Thumb BL is rewritten to BLX and takes the Arm
+// entrypoint at offset 4. Arm branches always use that one.
+Symbol *StubsManager_prev7::getOrCreateSlotEntrypoint(LinkGraph &G,
+                                                      StubMapEntry &Slot,
+                                                      bool Thumb) {
+  constexpr orc::ExecutorAddrDiff ThumbEntrypointOffset = 0;
+  constexpr orc::ExecutorAddrDiff ArmEntrypointOffset = 4;
+  if (Thumb && !Slot.ThumbEntry) {
+    Slot.ThumbEntry =
+        &G.addAnonymousSymbol(*Slot.B, ThumbEntrypointOffset, 4, true, false);
+    Slot.ThumbEntry->setTargetFlags(ThumbSymbol);
+  }
+  if (!Thumb && !Slot.ArmEntry)
+    Slot.ArmEntry =
+        &G.addAnonymousSymbol(*Slot.B, ArmEntrypointOffset, 8, true, false);
+  return Thumb ? Slot.ThumbEntry : Slot.ArmEntry;
+}
+
+bool StubsManager_prev7::visitEdge(LinkGraph &G, Block *B, Edge &E) {
+  if (!needsStub(E))
+    return false;
+
+  Symbol &Target = E.getTarget();
+  assert(Target.hasName() && "Edge cannot point to anonymous target");
+  auto [Slot, NewStub] = getStubMapSlot(Target.getName());
+
+  if (NewStub) {
+    if (!StubsSection)
+      StubsSection = &G.createSection(getSectionName(),
+                                      orc::MemProt::Read | orc::MemProt::Exec);
+    LLVM_DEBUG({
+      dbgs() << "    Created stub entry for " << Target.getName() << " in "
+             << StubsSection->getName() << "\n";
+    });
+    Slot->B = &createStubPrev7(G, *StubsSection, Target);
+  }
+
+  // The ArmThumbv5LdrPc stub has 2 entrypoints: Thumb at offset 0 is taken only
+  // for Thumb B instructions. Thumb BL is rewritten to BLX and takes the Arm
+  // entrypoint at offset 4. Arm branches always use that one.
+  bool UseThumb = E.getKind() == Thumb_Jump24;
+  Symbol *StubEntrypoint = getOrCreateSlotEntrypoint(G, *Slot, UseThumb);
+
+  LLVM_DEBUG({
+    dbgs() << "    Using " << (UseThumb ? "Thumb" : "Arm") << " entrypoint "
+           << *StubEntrypoint << " in "
+           << StubEntrypoint->getBlock().getSection().getName() << "\n";
+  });
+
+  E.setTarget(*StubEntrypoint);
+  return true;
+}
+
 bool StubsManager_v7::visitEdge(LinkGraph &G, Block *B, Edge &E) {
   if (!needsStub(E))
     return false;
diff --git a/llvm/test/ExecutionEngine/JITLink/AArch32/ELF_relocations_arm.s b/llvm/test/ExecutionEngine/JITLink/AArch32/ELF_relocations_arm.s
index 6fd383e2cce5c9f..3dec8c96f5cd575 100644
--- a/llvm/test/ExecutionEngine/JITLink/AArch32/ELF_relocations_arm.s
+++ b/llvm/test/ExecutionEngine/JITLink/AArch32/ELF_relocations_arm.s
@@ -1,8 +1,22 @@
-# RUN: llvm-mc -triple=armv7-linux-gnueabi -arm-add-build-attributes -filetype=obj -o %t.o %s
-# RUN: llvm-objdump -r %t.o | FileCheck --check-prefix=CHECK-TYPE %s
-# RUN: llvm-objdump --disassemble %t.o | FileCheck --check-prefix=CHECK-INSTR %s
+# Test pre-v7 Arm features
+#
+# RUN: llvm-mc -triple=armv4t-linux-gnueabi -arm-add-build-attributes -filetype=obj -o %t_armv4t.o %s
+# RUN: llvm-objdump -r %t_armv4t.o | FileCheck --check-prefix=CHECK-TYPE %s
+# RUN: llvm-objdump --disassemble %t_armv4t.o | FileCheck --check-prefix=CHECK-INSTR %s
 # RUN: llvm-jitlink -noexec -slab-address 0x76ff0000 -slab-allocate 10Kb \
-# RUN:              -slab-page-size 4096 -show-entry-es -check %s %t.o
+# RUN:              -slab-page-size 4096 -check %s %t_armv4t.o
+#
+# RUN: llvm-mc -triple=armv7-linux-gnueabi -arm-add-build-attributes -filetype=obj -o %t_armv7.o %s
+# RUN: llvm-objdump -r %t_armv7.o | FileCheck --check-prefix=CHECK-TYPE %s
+# RUN: llvm-objdump --disassemble %t_armv7.o | FileCheck --check-prefix=CHECK-INSTR %s
+# RUN: llvm-jitlink -noexec -slab-address 0x76ff0000 -slab-allocate 10Kb \
+# RUN:              -slab-page-size 4096 -check %s %t_armv7.o
+#
+# RUN: llvm-mc -triple=armv9-linux-gnueabi -arm-add-build-attributes -filetype=obj -o %t_armv9.o %s
+# RUN: llvm-objdump -r %t_armv9.o | FileCheck --check-prefix=CHECK-TYPE %s
+# RUN: llvm-objdump --disassemble %t_armv9.o | FileCheck --check-prefix=CHECK-INSTR %s
+# RUN: llvm-jitlink -noexec -slab-address 0x76ff0000 -slab-allocate 10Kb \
+# RUN:              -slab-page-size 4096 -check %s %t_armv9.o
 
 
 	.text
@@ -63,38 +77,6 @@ jump24_target:
 	bx	lr
 	.size	jump24_target,	.-jump24_target
 
-
-# CHECK-TYPE: {{[0-9a-f]+}} R_ARM_MOVW_ABS_NC data_symbol
-# CHECK-INSTR: 	0000001c <movw>:
-# CHECK-INSTR: 	      1c: e3000000     movw      r0, #0x0
-# jitlink-check: decode_operand(movw, 1) = (data_symbol&0x0000ffff)
-	.globl	movw
-	.type	movw,%function
-	.p2align	2
-movw:
-	movw r0, :lower16:data_symbol
-	.size	movw,	.-movw
-
-# CHECK-TYPE: {{[0-9a-f]+}} R_ARM_MOVT_ABS data_symbol
-# CHECK-INSTR: 	00000020 <movt>:
-# CHECK-INSTR: 	      20: e3400000     movt      r0, #0x0
-# We decode the operand with index 2, because movt generates one leading implicit
-# predicate operand that we have to skip in order to decode the data_symbol operand
-# jitlink-check: decode_operand(movt, 2) = (data_symbol&0xffff0000>>16)
-	.globl	movt
-	.type	movt,%function
-	.p2align	2
-movt:
-	movt r0, :upper16:data_symbol
-	.size	movt,	.-movt
-
-	.data
-	.global data_symbol
-data_symbol:
-	.long 1073741822
-
-	.text
-
 # Empty main function for jitlink to be happy
 	.globl	main
 	.type	main,%function
diff --git a/llvm/test/ExecutionEngine/JITLink/AArch32/ELF_relocations_armv7plus.s b/llvm/test/ExecutionEngine/JITLink/AArch32/ELF_relocations_armv7plus.s
new file mode 100644
index 000000000000000..890b2136959ef12
--- /dev/null
+++ b/llvm/test/ExecutionEngine/JITLink/AArch32/ELF_relocations_armv7plus.s
@@ -0,0 +1,49 @@
+# Test v7 Arm features
+#
+# RUN: llvm-mc -triple=armv7-linux-gnueabi -arm-add-build-attributes -filetype=obj -o %t_armv7.o %s
+# RUN: llvm-objdump -r %t_armv7.o | FileCheck --check-prefix=CHECK-TYPE %s
+# RUN: llvm-objdump --disassemble %t_armv7.o | FileCheck --check-prefix=CHECK-INSTR %s
+# RUN: llvm-jitlink -noexec -slab-address 0x76ff0000 -slab-allocate 10Kb \
+# RUN:              -slab-page-size 4096 -abs data_symbol=0x00001234 -check %s %t_armv7.o
+#
+# RUN: llvm-mc -triple=armv9-linux-gnueabi -arm-add-build-attributes -filetype=obj -o %t_armv9.o %s
+# RUN: llvm-objdump -r %t_armv9.o | FileCheck --check-prefix=CHECK-TYPE %s
+# RUN: llvm-objdump --disassemble %t_armv9.o | FileCheck --check-prefix=CHECK-INSTR %s
+# RUN: llvm-jitlink -noexec -slab-address 0x76ff0000 -slab-allocate 10Kb \
+# RUN:              -slab-page-size 4096 -abs data_symbol=0x00001234 -check %s %t_armv9.o
+
+
+	.text
+	.syntax unified
+
+# CHECK-TYPE: {{[0-9a-f]+}} R_ARM_MOVW_ABS_NC data_symbol
+# CHECK-INSTR: <movw>:
+# CHECK-INSTR: e3000000 movw r0, #0x0
+# jitlink-check: decode_operand(movw, 1) = data_symbol[15:0]
+	.globl	movw
+	.type	movw,%function
+	.p2align	2
+movw:
+	movw r0, :lower16:data_symbol
+	.size	movw,	.-movw
+
+# CHECK-TYPE: {{[0-9a-f]+}} R_ARM_MOVT_ABS data_symbol
+# CHECK-INSTR: <movt>:
+# CHECK-INSTR: e3400000 movt r0, #0x0
+# We decode the operand with index 2, because movt generates one leading implicit
+# predicate operand that we have to skip in order to decode the data_symbol operand
+# jitlink-check: decode_operand(movt, 2) = data_symbol[31:16]
+	.globl	movt
+	.type	movt,%function
+	.p2align	2
+movt:
+	movt r0, :upper16:data_symbol
+	.size	movt,	.-movt
+
+# Empty main function for jitlink to be happy
+	.globl	main
+	.type	main,%function
+	.p2align	2
+main:
+	bx	lr
+	.size	main,	.-main
diff --git a/llvm/test/ExecutionEngine/JITLink/AArch32/ELF_relocations_data.s b/llvm/test/ExecutionEngine/JITLink/AArch32/ELF_relocations_data.s
index 7bd59f8a52de6d8..590ca816ecb9eb2 100644
--- a/llvm/test/ExecutionEngine/JITLink/AArch32/ELF_relocations_data.s
+++ b/llvm/test/ExecutionEngine/JITLink/AArch32/ELF_relocations_data.s
@@ -1,4 +1,9 @@
-# RUN: rm -rf %t && mkdir -p %t/armv7 && mkdir -p %t/thumbv7
+# RUN: rm -rf %t && mkdir -p %t/armv6 && mkdir -p %t/armv7 && mkdir -p %t/thumbv7
+# RUN: llvm-mc -triple=armv6-none-linux-gnueabi -arm-add-build-attributes -filetype=obj -o %t/armv6/out.o %s
+# RUN: llvm-objdump -r %t/armv6/out.o | FileCheck --check-prefix=CHECK-TYPE %s
+# RUN: llvm-jitlink -noexec -slab-address 0x76ff0000 -slab-allocate 10Kb -slab-page-size 4096 \
+# RUN:              -abs target=0x76bbe88f -check %s %t/armv6/out.o
+
 # RUN: llvm-mc -triple=armv7-none-linux-gnueabi -arm-add-build-attributes -filetype=obj -o %t/armv7/out.o %s
 # RUN: llvm-objdump -r %t/armv7/out.o | FileCheck --check-prefix=CHECK-TYPE %s
 # RUN: llvm-jitlink -noexec -slab-address 0x76ff0000 -slab-allocate 10Kb -slab-page-size 4096 \
diff --git a/llvm/test/ExecutionEngine/JITLink/AArch32/ELF_relocations_thumb.s b/llvm/test/ExecutionEngine/JITLink/AArch32/ELF_relocations_thumb.s
deleted file mode 100644
index 86f011834baae9f..000000000000000
--- a/llvm/test/ExecutionEngine/JITLink/AArch32/ELF_relocations_thumb.s
+++ /dev/null
@@ -1,145 +0,0 @@
-# RUN: llvm-mc -triple=thumbv7-linux-gnueabi -arm-add-build-attributes -filetype=obj -o %t.o %s
-# RUN: llvm-objdump -r %t.o | FileCheck --check-prefix=CHECK-TYPE %s
-# RUN: llvm-objdump --disassemble %t.o | FileCheck --check-prefix=CHECK-INSTR %s
-# RUN: llvm-jitlink -noexec -slab-address 0x76ff0000 -slab-allocate 10Kb \
-# RUN:              -slab-page-size 4096 -abs external_func=0x76bbe880 \
-# RUN:              -check %s %t.o
-
-
-	.text
-	.syntax unified
-
-# CHECK-TYPE: {{[0-9a-f]+}} R_ARM_THM_CALL call_target_thumb
-# CHECK-INSTR: 	00000000 <call_site>:
-# CHECK-INSTR: 	       0: f7ff fffe     bl
-# CHECK-INSTR: 	       4: f7ff fffe     bl
-# CHECK-INSTR: 	00000008 <call_target_thumb>
-# CHECK-INSTR: 	0000000c <call_target_arm>
-# We decode the operand with index 2, because bl generates two leading implicit
-# predicate operands that we have to skip in order to decode the call_target operand
-# jitlink-check: decode_operand(call_site + 0, 2) = call_target_thumb - (call_site + 4)
-# jitlink-check: decode_operand(call_site + 4, 2) = call_target_arm   - (call_site + 8)
-	.globl	call_site
-	.type	call_site,%function
-	.p2align	1
-	.code	16
-	.thumb_func
-call_site:
-	bl	call_target_thumb
-	bl	call_target_arm
-	.size	call_site, .-call_site
-
-	.globl	call_target_thumb
-	.type	call_target_thumb,%function
-	.p2align	1
-	.code	16
-	.thumb_func
-call_target_thumb:
-	bx	lr
-	.size	call_target_thumb, .-call_target_thumb
-
-	.globl	call_target_arm
-	.type	call_target_arm,%function
-	.p2align	2
-	.code	32
-call_target_arm:
-	bx	lr
-	.size	call_target_arm, .-call_target_arm
-
-# CHECK-TYPE: {{[0-9a-f]+}} R_ARM_THM_JUMP24 jump24_target
-# CHECK-INSTR: 	00000010 <jump24_site>:
-# CHECK-INSTR: 	      10: f7ff bffe     b.w
-# CHECK-INSTR: 	00000014 <jump24_target>
-# b.w generates two implicit predicate operands as well, but they are trailing
-# operands, so there is no need to adjust the operand index.
-# jitlink-check: decode_operand(jump24_site, 0) = jump24_target - next_pc(jump24_site)
-	.globl	jump24_site
-	.type	jump24_site,%function
-	.p2align	1
-	.code	16
-	.thumb_func
-jump24_site:
-	b.w	jump24_target
-	.size	jump24_site,	.-jump24_site
-
-	.globl	jump24_target
-	.type	jump24_target,%function
-	.p2align	1
-	.code	16
-	.thumb_func
-jump24_target:
-	bx	lr
-	.size	jump24_target,	.-jump24_target
-
-# CHECK-TYPE: {{[0-9a-f]+}} R_ARM_THM_MOVW_ABS_NC data_symbol
-# CHECK-INSTR: 	00000016 <movw>:
-# CHECK-INSTR: 	      16: f240 0000     movw    r0, #0x0
-# jitlink-check: decode_operand(movw, 1) = (data_symbol&0x0000ffff)
-	.globl	movw
-	.type	movw,%function
-	.p2align	1
-	.code	16
-	.thumb_func
-movw:
-	movw r0, :lower16:data_symbol
-	.size	movw,	.-movw
-
-# CHECK-TYPE: {{[0-9a-f]+}} R_ARM_THM_MOVT_ABS data_symbol
-# CHECK-INSTR: 	0000001a <movt>:
-# CHECK-INSTR: 	      1a: f2c0 0000     movt    r0, #0x0
-# We decode the operand with index 2, because movt generates one leading implicit
-# predicate operand that we have to skip in order to decode the data_symbol operand
-# jitlink-check: decode_operand(movt, 2) = (data_symbol&0xffff0000>>16)
-	.globl	movt
-	.type	movt,%function
-	.p2align	1
-	.code	16
-	.thumb_func
-movt:
-	movt r0, :upper16:data_symbol
-	.size	movt,	.-movt
-
-	.data
-	.global data_symbol
-data_symbol:
-	.long 1073741822
-
-	.text
-
-# CHECK-TYPE: {{[0-9a-f]+}} R_ARM_THM_MOVW_PREL_NC external_func
-# CHECK-INSTR: 	0000001e <movw_prel>:
-# CHECK-INSTR: 	      1e: f240 0000     movw    r0, #0x0
-# jitlink-check: decode_operand(movw_prel, 1) = \
-# jitlink-check:              ((external_func - movw_prel)&0x0000ffff)
-.globl	movw_prel
-.type	movw_prel,%function
-.p2align	1
-.code	16
-.thumb_func
-movw_prel:
-	movw r0, :lower16:external_func - .
-	.size	movw_prel,	.-movw_prel
-
-# CHECK-TYPE: {{[0-9a-f]+}} R_ARM_THM_MOVT_PREL external_func 
-# CHECK-INSTR: 	00000022 <movt_prel>:
-# CHECK-INSTR: 	      22: f2c0 0000    movt    r0, #0x0
-# jitlink-check: decode_operand(movt_prel, 2) = \
-# jitlink-check:               ((external_func - movt_prel)&0xffff0000>>16)
-.globl	movt_prel
-.type	movt_prel,%function
-.p2align	1
-.code	16
-.thumb_func
-movt_prel:
-	movt r0, :upper16:external_func - .
-	.size	movt_prel,	.-movt_prel
-
-# Empty main function for jitlink to be happy
-	.globl	main
-	.type	main,%function
-	.p2align	1
-	.code	16
-	.thumb_func
-main:
-	bx	lr
-	.size	main,	.-main
diff --git a/llvm/test/ExecutionEngine/JITLink/AArch32/ELF_relocations_thumbv6m.s b/llvm/test/ExecutionEngine/JITLink/AArch32/ELF_relocations_thumbv6m.s
new file mode...
[truncated]

Copy link
Contributor Author

@weliveindetail weliveindetail left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your notes @smithp35. This worked out nicely! A test for Thumb B to Arm interworking is todo, because we need support for R_ARM_THM_JUMP11 first. I will work on it now. I think it's quite rare though and anyway, this is more than RuntimeDyld ever supported. So, I think we could land it already.

@@ -23,7 +23,7 @@ if(CLANG_PLUGIN_SUPPORT)
export_executable_symbols_for_plugins(clang-repl)
endif()

string(TOUPPER ${CMAKE_SYSTEM_PROCESSOR} system_processor)
string(TOUPPER "${CMAKE_SYSTEM_PROCESSOR}" system_processor)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Unrelated change that is on mainline already)

0xfd, 0xe7, // b #-6 ; Arm recommended sequence to follow bx pc
0x04, 0xf0, 0x1f, 0xe5, // ldr pc, [pc,#-4] ; L1
0x00, 0x00, 0x00, 0x00, // L1: .word S
};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extended the v5LdrPc stub with the Thumb prologue. Thumb B instructions will branch to offset 0 of this block. Thumb BL are be rewritten to BLX and branch to offset 4. Same for Arm branches.

@smithp35
Copy link
Collaborator

Thanks for your notes @smithp35. This worked out nicely! A test for Thumb B to Arm interworking is todo, because we need support for R_ARM_THM_JUMP11 first. I will work on it now. I think it's quite rare though and anyway, this is more than RuntimeDyld ever supported. So, I think we could land it already.

Thanks I'll try and take a look later today, but as I mentioned I'm probably not familiar enough with JITLink to add much value to the code-changes so feel free to land this as an improvement on what is already there.

Just in case you weren't aware. The ABI does not require a stub for R_ARM_THM_JUMP11 for either range extension or interworking, the range is so small it would be quite a challenge to place a stub close enough to it. In practice this makes the 2-byte Thumb B instruction restricted to targets within the same section (or even same function), which a code-generator won't change-state. An assembly programmer might, but if they do then they have to do the interworking manually.

It is possible in Thumb-1 to have a 4-byte B instruction that uses R_ARM_THM_JUMP24 that will need a stub from Thumb to Arm.

This is in Call and Jump Relocations https://github.com/ARM-software/abi-aa/blob/main/aaelf32/aaelf32.rst

A linker may use a veneer (a sequence of instructions) to implement the
relocated branch if the relocation is one of R_ARM_PC24, R_ARM_CALL,
R_ARM_JUMP24, (or, in Thumb state, R_ARM_THM_CALL, R_ARM_THM_JUMP24, or
R_ARM_THM_JUMP19) and:

@weliveindetail weliveindetail merged commit 55929cd into llvm:main Jan 23, 2024
4 of 5 checks passed
Copy link
Contributor

@lhames lhames left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I love the detail in the testing here -- thanks @weliveindetail!

@weliveindetail weliveindetail deleted the jitlink-aarch32-pre-v7 branch October 24, 2024 12:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants