[AArch64,ELF] Restrict MOVZ/MOVK to non-PIC large code model #70178

Merged
merged 1 commit into llvm:main from MaskRay:a64-pic-large-code-model on Nov 1, 2023

Conversation

MaskRay (Member) commented Oct 25, 2023

There is no PIC support for -mcmodel=large
(https://github.com/ARM-software/abi-aa/blob/main/sysvabi64/sysvabi64.rst),
and Clang recently started rejecting -mcmodel= with PIC (#70262).

The current backend code assumes that the large code model is non-PIC.
This patch adds !getTargetMachine().isPositionIndependent() conditions
to clarify that the support is non-PIC only. In addition, it adds some
tests as change detectors in case the PIC large code model is supported
in the future.

If other front-ends/JITs use the large code model with PIC, they will
now get the small code model sequence instead of a potentially incorrect
MOVZ/MOVK sequence, which is suitable only for non-PIC code and causes
text relocations with ELF linkers.

(The small code model sequence is usually sufficient: ADRP+ADD or
ADRP+LDR targets [-2**32, 2**32), double the range of x86-64's
R_X86_64_REX_GOTPCRELX/R_X86_64_PC32 at [-2**31, 2**31).)
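For reference, here are the two sequences side by side, as they appear in this
patch's code-model-large.ll test (var8 is a global defined in that test):

    // Non-PIC large code model: materialize the 64-bit absolute address.
    // Each :abs_gN: relocation is absolute, so a PIC link would require
    // text relocations.
    movz x0, #:abs_g0_nc:var8   // bits [15:0]
    movk x0, #:abs_g1_nc:var8   // bits [31:16]
    movk x0, #:abs_g2_nc:var8   // bits [47:32]
    movk x0, #:abs_g3:var8      // bits [63:48]

    // Small code model: PC-relative and PIC-compatible.
    adrp x0, var8               // address of the 4 KiB page containing var8
    add  x0, x0, :lo12:var8     // add var8's offset within the page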

llvmbot (Collaborator) commented Oct 25, 2023

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-llvm-globalisel

Author: Fangrui Song (MaskRay)

Changes

Getting the address of a symbol with a MOVZ/MOVK sequence is suitable only for
non-PIC code; the sequence causes text relocations with ELF linkers. The PIC
large code model is largely undefined in toolchains, but Clang doesn't reject
-fpic -mcmodel=large as GCC does. Nevertheless, it seems sensible to restrict
MOVZ/MOVK to the non-PIC large code model.

The small code model sequence is usually fine: ADRP+ADD or ADRP+LDR targets
[-2**32, 2**32), double the range of x86-64's
R_X86_64_REX_GOTPCRELX/R_X86_64_PC32 at [-2**31, 2**31). Just use the small
code model sequence for now.


Patch is 23.88 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/70178.diff

9 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+17-14)
  • (modified) llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp (+2-2)
  • (modified) llvm/test/CodeGen/AArch64/GlobalISel/select-blockaddress.mir (+12-1)
  • (modified) llvm/test/CodeGen/AArch64/GlobalISel/select-gv-cmodel-large.mir (+31-18)
  • (modified) llvm/test/CodeGen/AArch64/GlobalISel/select-gv-with-offset.mir (+25-12)
  • (modified) llvm/test/CodeGen/AArch64/blockaddress.ll (+1)
  • (removed) llvm/test/CodeGen/AArch64/code-model-large-abs.ll (-72)
  • (added) llvm/test/CodeGen/AArch64/code-model-large.ll (+144)
  • (modified) llvm/test/CodeGen/AArch64/jump-table.ll (+1)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 7211607fee528a6..723a0e38ce5e1e5 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -8192,7 +8192,8 @@ SDValue AArch64TargetLowering::LowerGlobalAddress(SDValue Op,
   }
 
   SDValue Result;
-  if (getTargetMachine().getCodeModel() == CodeModel::Large) {
+  if (getTargetMachine().getCodeModel() == CodeModel::Large &&
+      !getTargetMachine().isPositionIndependent()) {
     Result = getAddrLarge(GN, DAG, OpFlags);
   } else if (getTargetMachine().getCodeModel() == CodeModel::Tiny) {
     Result = getAddrTiny(GN, DAG, OpFlags);
@@ -9557,12 +9558,12 @@ SDValue AArch64TargetLowering::LowerJumpTable(SDValue Op,
   // is necessary here. Just get the address of the jump table.
   JumpTableSDNode *JT = cast<JumpTableSDNode>(Op);
 
-  if (getTargetMachine().getCodeModel() == CodeModel::Large &&
-      !Subtarget->isTargetMachO()) {
+  CodeModel::Model CM = getTargetMachine().getCodeModel();
+  if (CM == CodeModel::Large && !getTargetMachine().isPositionIndependent() &&
+      !Subtarget->isTargetMachO())
     return getAddrLarge(JT, DAG);
-  } else if (getTargetMachine().getCodeModel() == CodeModel::Tiny) {
+  if (CM == CodeModel::Tiny)
     return getAddrTiny(JT, DAG);
-  }
   return getAddr(JT, DAG);
 }
 
@@ -9588,27 +9589,29 @@ SDValue AArch64TargetLowering::LowerBR_JT(SDValue Op,
 SDValue AArch64TargetLowering::LowerConstantPool(SDValue Op,
                                                  SelectionDAG &DAG) const {
   ConstantPoolSDNode *CP = cast<ConstantPoolSDNode>(Op);
-
-  if (getTargetMachine().getCodeModel() == CodeModel::Large) {
+  CodeModel::Model CM = getTargetMachine().getCodeModel();
+  if (CM == CodeModel::Large) {
     // Use the GOT for the large code model on iOS.
     if (Subtarget->isTargetMachO()) {
       return getGOT(CP, DAG);
     }
-    return getAddrLarge(CP, DAG);
-  } else if (getTargetMachine().getCodeModel() == CodeModel::Tiny) {
+    if (!getTargetMachine().isPositionIndependent())
+      return getAddrLarge(CP, DAG);
+  } else if (CM == CodeModel::Tiny) {
     return getAddrTiny(CP, DAG);
-  } else {
-    return getAddr(CP, DAG);
   }
+  return getAddr(CP, DAG);
 }
 
 SDValue AArch64TargetLowering::LowerBlockAddress(SDValue Op,
                                                SelectionDAG &DAG) const {
   BlockAddressSDNode *BA = cast<BlockAddressSDNode>(Op);
-  if (getTargetMachine().getCodeModel() == CodeModel::Large &&
+  CodeModel::Model CM = getTargetMachine().getCodeModel();
+  if (CM == CodeModel::Large &&
       !Subtarget->isTargetMachO()) {
-    return getAddrLarge(BA, DAG);
-  } else if (getTargetMachine().getCodeModel() == CodeModel::Tiny) {
+    if (!getTargetMachine().isPositionIndependent())
+      return getAddrLarge(BA, DAG);
+  } else if (CM == CodeModel::Tiny) {
     return getAddrTiny(BA, DAG);
   }
   return getAddr(BA, DAG);
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
index 941607dae29bb90..d37c577bfbe29c8 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
@@ -2848,7 +2848,7 @@ bool AArch64InstructionSelector::select(MachineInstr &I) {
     if (OpFlags & AArch64II::MO_GOT) {
       I.setDesc(TII.get(AArch64::LOADgot));
       I.getOperand(1).setTargetFlags(OpFlags);
-    } else if (TM.getCodeModel() == CodeModel::Large) {
+    } else if (TM.getCodeModel() == CodeModel::Large && !TM.isPositionIndependent()) {
       // Materialize the global using movz/movk instructions.
       materializeLargeCMVal(I, GV, OpFlags);
       I.eraseFromParent();
@@ -3498,7 +3498,7 @@ bool AArch64InstructionSelector::select(MachineInstr &I) {
     return true;
   }
   case TargetOpcode::G_BLOCK_ADDR: {
-    if (TM.getCodeModel() == CodeModel::Large) {
+    if (TM.getCodeModel() == CodeModel::Large && !TM.isPositionIndependent()) {
       materializeLargeCMVal(I, I.getOperand(1).getBlockAddress(), 0);
       I.eraseFromParent();
       return true;
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/select-blockaddress.mir b/llvm/test/CodeGen/AArch64/GlobalISel/select-blockaddress.mir
index ee54f388683696d..28d279d74216422 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/select-blockaddress.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/select-blockaddress.mir
@@ -1,8 +1,8 @@
 # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
 # RUN: llc -mtriple=aarch64-unknown-unknown -o - -verify-machineinstrs -run-pass=instruction-select %s | FileCheck %s
 # RUN: llc -mtriple=aarch64-unknown-unknown -o - -verify-machineinstrs -run-pass=instruction-select -code-model=large %s | FileCheck %s --check-prefix=LARGE
+# RUN: llc -mtriple=aarch64-unknown-unknown -o - -verify-machineinstrs -run-pass=instruction-select -code-model=large -relocation-model=pic %s | FileCheck %s --check-prefix=LARGE-PIC
 --- |
-  ; ModuleID = 'blockaddress.ll'
   source_filename = "blockaddress.ll"
   target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
   target triple = "aarch64-none-linux-gnu"
@@ -37,6 +37,7 @@ body:             |
   ; CHECK-NEXT:   BR [[MOVaddrBA]]
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT: bb.1.block (ir-block-address-taken %ir-block.block):
+  ;
   ; LARGE-LABEL: name: test_blockaddress
   ; LARGE: bb.0 (%ir-block.0):
   ; LARGE-NEXT:   [[MOVZXi:%[0-9]+]]:gpr64 = MOVZXi target-flags(aarch64-g0, aarch64-nc) blockaddress(@test_blockaddress, %ir-block.block), 0
@@ -51,6 +52,16 @@ body:             |
   ; LARGE-NEXT:   BR [[MOVKXi2]]
   ; LARGE-NEXT: {{  $}}
   ; LARGE-NEXT: bb.1.block (ir-block-address-taken %ir-block.block):
+  ;
+  ; LARGE-PIC-LABEL: name: test_blockaddress
+  ; LARGE-PIC: bb.0 (%ir-block.0):
+  ; LARGE-PIC-NEXT:   [[MOVaddrBA:%[0-9]+]]:gpr64common = MOVaddrBA target-flags(aarch64-page) blockaddress(@test_blockaddress, %ir-block.block), target-flags(aarch64-pageoff, aarch64-nc) blockaddress(@test_blockaddress, %ir-block.block)
+  ; LARGE-PIC-NEXT:   [[MOVaddr:%[0-9]+]]:gpr64common = MOVaddr target-flags(aarch64-page) @addr, target-flags(aarch64-pageoff, aarch64-nc) @addr
+  ; LARGE-PIC-NEXT:   [[COPY:%[0-9]+]]:gpr64 = COPY [[MOVaddrBA]]
+  ; LARGE-PIC-NEXT:   STRXui [[COPY]], [[MOVaddr]], 0 :: (store (p0) into @addr)
+  ; LARGE-PIC-NEXT:   BR [[MOVaddrBA]]
+  ; LARGE-PIC-NEXT: {{  $}}
+  ; LARGE-PIC-NEXT: bb.1.block (ir-block-address-taken %ir-block.block):
   bb.1 (%ir-block.0):
     %0:gpr(p0) = G_BLOCK_ADDR blockaddress(@test_blockaddress, %ir-block.block)
     %1:gpr(p0) = G_GLOBAL_VALUE @addr
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/select-gv-cmodel-large.mir b/llvm/test/CodeGen/AArch64/GlobalISel/select-gv-cmodel-large.mir
index b2aadd7ca5c2d43..d0ac97e6908aa53 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/select-gv-cmodel-large.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/select-gv-cmodel-large.mir
@@ -1,5 +1,6 @@
 # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
-# RUN: llc -mtriple=aarch64-linux-gnu -code-model=large -run-pass=instruction-select -verify-machineinstrs -O0 %s -o - | FileCheck %s
+# RUN: llc -mtriple=aarch64-linux-gnu -code-model=large -relocation-model=static -run-pass=instruction-select -verify-machineinstrs -O0 %s -o - | FileCheck %s --check-prefix=STATIC
+# RUN: llc -mtriple=aarch64-linux-gnu -code-model=large -relocation-model=pic -run-pass=instruction-select -verify-machineinstrs -O0 %s -o - | FileCheck %s --check-prefix=PIC
 --- |
   target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
 
@@ -29,23 +30,35 @@ stack:
 constants:
 body:             |
   bb.1:
-    ; CHECK-LABEL: name: gv_large
-    ; CHECK: [[MOVZXi:%[0-9]+]]:gpr64 = MOVZXi target-flags(aarch64-g0, aarch64-nc) @foo1, 0
-    ; CHECK: [[MOVKXi:%[0-9]+]]:gpr64 = MOVKXi [[MOVZXi]], target-flags(aarch64-g1, aarch64-nc) @foo1, 16
-    ; CHECK: [[MOVKXi1:%[0-9]+]]:gpr64 = MOVKXi [[MOVKXi]], target-flags(aarch64-g2, aarch64-nc) @foo1, 32
-    ; CHECK: [[MOVKXi2:%[0-9]+]]:gpr64 = MOVKXi [[MOVKXi1]], target-flags(aarch64-g3) @foo1, 48
-    ; CHECK: [[COPY:%[0-9]+]]:gpr64sp = COPY [[MOVKXi2]]
-    ; CHECK: [[MOVZXi1:%[0-9]+]]:gpr64 = MOVZXi target-flags(aarch64-g0, aarch64-nc) @foo2, 0
-    ; CHECK: [[MOVKXi3:%[0-9]+]]:gpr64 = MOVKXi [[MOVZXi1]], target-flags(aarch64-g1, aarch64-nc) @foo2, 16
-    ; CHECK: [[MOVKXi4:%[0-9]+]]:gpr64 = MOVKXi [[MOVKXi3]], target-flags(aarch64-g2, aarch64-nc) @foo2, 32
-    ; CHECK: [[MOVKXi5:%[0-9]+]]:gpr64 = MOVKXi [[MOVKXi4]], target-flags(aarch64-g3) @foo2, 48
-    ; CHECK: [[COPY1:%[0-9]+]]:gpr64sp = COPY [[MOVKXi5]]
-    ; CHECK: STRWui $wzr, %stack.0.retval, 0 :: (store (s32) into %ir.retval)
-    ; CHECK: [[LDRWui:%[0-9]+]]:gpr32 = LDRWui [[COPY]], 0 :: (load (s32) from @foo1)
-    ; CHECK: [[LDRWui1:%[0-9]+]]:gpr32 = LDRWui [[COPY1]], 0 :: (load (s32) from @foo2)
-    ; CHECK: [[ADDWrr:%[0-9]+]]:gpr32 = ADDWrr [[LDRWui]], [[LDRWui1]]
-    ; CHECK: $w0 = COPY [[ADDWrr]]
-    ; CHECK: RET_ReallyLR implicit $w0
+    ; STATIC-LABEL: name: gv_large
+    ; STATIC: [[MOVZXi:%[0-9]+]]:gpr64 = MOVZXi target-flags(aarch64-g0, aarch64-nc) @foo1, 0
+    ; STATIC-NEXT: [[MOVKXi:%[0-9]+]]:gpr64 = MOVKXi [[MOVZXi]], target-flags(aarch64-g1, aarch64-nc) @foo1, 16
+    ; STATIC-NEXT: [[MOVKXi1:%[0-9]+]]:gpr64 = MOVKXi [[MOVKXi]], target-flags(aarch64-g2, aarch64-nc) @foo1, 32
+    ; STATIC-NEXT: [[MOVKXi2:%[0-9]+]]:gpr64 = MOVKXi [[MOVKXi1]], target-flags(aarch64-g3) @foo1, 48
+    ; STATIC-NEXT: [[COPY:%[0-9]+]]:gpr64sp = COPY [[MOVKXi2]]
+    ; STATIC-NEXT: [[MOVZXi1:%[0-9]+]]:gpr64 = MOVZXi target-flags(aarch64-g0, aarch64-nc) @foo2, 0
+    ; STATIC-NEXT: [[MOVKXi3:%[0-9]+]]:gpr64 = MOVKXi [[MOVZXi1]], target-flags(aarch64-g1, aarch64-nc) @foo2, 16
+    ; STATIC-NEXT: [[MOVKXi4:%[0-9]+]]:gpr64 = MOVKXi [[MOVKXi3]], target-flags(aarch64-g2, aarch64-nc) @foo2, 32
+    ; STATIC-NEXT: [[MOVKXi5:%[0-9]+]]:gpr64 = MOVKXi [[MOVKXi4]], target-flags(aarch64-g3) @foo2, 48
+    ; STATIC-NEXT: [[COPY1:%[0-9]+]]:gpr64sp = COPY [[MOVKXi5]]
+    ; STATIC-NEXT: STRWui $wzr, %stack.0.retval, 0 :: (store (s32) into %ir.retval)
+    ; STATIC-NEXT: [[LDRWui:%[0-9]+]]:gpr32 = LDRWui [[COPY]], 0 :: (load (s32) from @foo1)
+    ; STATIC-NEXT: [[LDRWui1:%[0-9]+]]:gpr32 = LDRWui [[COPY1]], 0 :: (load (s32) from @foo2)
+    ; STATIC-NEXT: [[ADDWrr:%[0-9]+]]:gpr32 = ADDWrr [[LDRWui]], [[LDRWui1]]
+    ; STATIC-NEXT: $w0 = COPY [[ADDWrr]]
+    ; STATIC-NEXT: RET_ReallyLR implicit $w0
+    ;
+    ; PIC-LABEL: name: gv_large
+    ; PIC: [[MOVaddr:%[0-9]+]]:gpr64common = MOVaddr target-flags(aarch64-page) @foo1, target-flags(aarch64-pageoff, aarch64-nc) @foo1
+    ; PIC-NEXT: [[COPY:%[0-9]+]]:gpr64sp = COPY [[MOVaddr]]
+    ; PIC-NEXT: [[MOVaddr1:%[0-9]+]]:gpr64common = MOVaddr target-flags(aarch64-page) @foo2, target-flags(aarch64-pageoff, aarch64-nc) @foo2
+    ; PIC-NEXT: [[COPY1:%[0-9]+]]:gpr64sp = COPY [[MOVaddr1]]
+    ; PIC-NEXT: STRWui $wzr, %stack.0.retval, 0 :: (store (s32) into %ir.retval)
+    ; PIC-NEXT: [[LDRWui:%[0-9]+]]:gpr32 = LDRWui [[COPY]], 0 :: (load (s32) from @foo1)
+    ; PIC-NEXT: [[LDRWui1:%[0-9]+]]:gpr32 = LDRWui [[COPY1]], 0 :: (load (s32) from @foo2)
+    ; PIC-NEXT: [[ADDWrr:%[0-9]+]]:gpr32 = ADDWrr [[LDRWui]], [[LDRWui1]]
+    ; PIC-NEXT: $w0 = COPY [[ADDWrr]]
+    ; PIC-NEXT: RET_ReallyLR implicit $w0
     %1:gpr(s32) = G_CONSTANT i32 0
     %4:gpr(p0) = G_GLOBAL_VALUE @foo1
     %3:gpr(p0) = COPY %4(p0)
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/select-gv-with-offset.mir b/llvm/test/CodeGen/AArch64/GlobalISel/select-gv-with-offset.mir
index 8b4ae941eb4d70d..1713f36683ef93d 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/select-gv-with-offset.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/select-gv-with-offset.mir
@@ -1,5 +1,6 @@
 # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
 # RUN: llc -mtriple=aarch64 -code-model=large -run-pass=instruction-select -verify-machineinstrs -O0 %s -o - | FileCheck %s --check-prefix=LARGE
+# RUN: llc -mtriple=aarch64 -code-model=large -relocation-model=pic -run-pass=instruction-select -verify-machineinstrs -O0 %s -o - | FileCheck %s --check-prefix=LARGE-PIC
 # RUN: llc -mtriple=aarch64 -code-model=small -run-pass=instruction-select -verify-machineinstrs -O0 %s -o - | FileCheck %s --check-prefix=SMALL
 # RUN: llc -mtriple=aarch64 -code-model=tiny -run-pass=instruction-select -verify-machineinstrs -O0 %s -o - | FileCheck %s --check-prefix=TINY
 
@@ -17,22 +18,34 @@ body:             |
     liveins: $x0
     ; LARGE-LABEL: name: select_gv_with_offset
     ; LARGE: liveins: $x0
-    ; LARGE: [[MOVZXi:%[0-9]+]]:gpr64 = MOVZXi target-flags(aarch64-g0, aarch64-nc) @g + 1, 0
-    ; LARGE: [[MOVKXi:%[0-9]+]]:gpr64 = MOVKXi [[MOVZXi]], target-flags(aarch64-g1, aarch64-nc) @g + 1, 16
-    ; LARGE: [[MOVKXi1:%[0-9]+]]:gpr64 = MOVKXi [[MOVKXi]], target-flags(aarch64-g2, aarch64-nc) @g + 1, 32
-    ; LARGE: %g:gpr64 = MOVKXi [[MOVKXi1]], target-flags(aarch64-g3) @g + 1, 48
-    ; LARGE: $x0 = COPY %g
-    ; LARGE: RET_ReallyLR implicit $x0
+    ; LARGE-NEXT: {{  $}}
+    ; LARGE-NEXT: [[MOVZXi:%[0-9]+]]:gpr64 = MOVZXi target-flags(aarch64-g0, aarch64-nc) @g + 1, 0
+    ; LARGE-NEXT: [[MOVKXi:%[0-9]+]]:gpr64 = MOVKXi [[MOVZXi]], target-flags(aarch64-g1, aarch64-nc) @g + 1, 16
+    ; LARGE-NEXT: [[MOVKXi1:%[0-9]+]]:gpr64 = MOVKXi [[MOVKXi]], target-flags(aarch64-g2, aarch64-nc) @g + 1, 32
+    ; LARGE-NEXT: %g:gpr64 = MOVKXi [[MOVKXi1]], target-flags(aarch64-g3) @g + 1, 48
+    ; LARGE-NEXT: $x0 = COPY %g
+    ; LARGE-NEXT: RET_ReallyLR implicit $x0
+    ;
+    ; LARGE-PIC-LABEL: name: select_gv_with_offset
+    ; LARGE-PIC: liveins: $x0
+    ; LARGE-PIC-NEXT: {{  $}}
+    ; LARGE-PIC-NEXT: %g:gpr64common = MOVaddr target-flags(aarch64-page) @g + 1, target-flags(aarch64-pageoff, aarch64-nc) @g + 1
+    ; LARGE-PIC-NEXT: $x0 = COPY %g
+    ; LARGE-PIC-NEXT: RET_ReallyLR implicit $x0
+    ;
     ; SMALL-LABEL: name: select_gv_with_offset
     ; SMALL: liveins: $x0
-    ; SMALL: %g:gpr64common = MOVaddr target-flags(aarch64-page) @g + 1, target-flags(aarch64-pageoff, aarch64-nc) @g + 1
-    ; SMALL: $x0 = COPY %g
-    ; SMALL: RET_ReallyLR implicit $x0
+    ; SMALL-NEXT: {{  $}}
+    ; SMALL-NEXT: %g:gpr64common = MOVaddr target-flags(aarch64-page) @g + 1, target-flags(aarch64-pageoff, aarch64-nc) @g + 1
+    ; SMALL-NEXT: $x0 = COPY %g
+    ; SMALL-NEXT: RET_ReallyLR implicit $x0
+    ;
     ; TINY-LABEL: name: select_gv_with_offset
     ; TINY: liveins: $x0
-    ; TINY: %g:gpr64 = ADR @g + 1
-    ; TINY: $x0 = COPY %g
-    ; TINY: RET_ReallyLR implicit $x0
+    ; TINY-NEXT: {{  $}}
+    ; TINY-NEXT: %g:gpr64 = ADR @g + 1
+    ; TINY-NEXT: $x0 = COPY %g
+    ; TINY-NEXT: RET_ReallyLR implicit $x0
     %g:gpr(p0) = G_GLOBAL_VALUE @g + 1
     $x0 = COPY %g(p0)
     RET_ReallyLR implicit $x0
diff --git a/llvm/test/CodeGen/AArch64/blockaddress.ll b/llvm/test/CodeGen/AArch64/blockaddress.ll
index 732cfc23292d7ee..b222d2f373192b8 100644
--- a/llvm/test/CodeGen/AArch64/blockaddress.ll
+++ b/llvm/test/CodeGen/AArch64/blockaddress.ll
@@ -1,5 +1,6 @@
 ; RUN: llc -mtriple=aarch64-none-linux-gnu -aarch64-enable-atomic-cfg-tidy=0 -verify-machineinstrs < %s | FileCheck %s
 ; RUN: llc -code-model=large -mtriple=aarch64-none-linux-gnu -aarch64-enable-atomic-cfg-tidy=0 -verify-machineinstrs < %s | FileCheck --check-prefix=CHECK-LARGE %s
+; RUN: llc -code-model=large -relocation-model=pic -mtriple=aarch64-none-linux-gnu -aarch64-enable-atomic-cfg-tidy=0 < %s | FileCheck %s
 ; RUN: llc -code-model=tiny -mtriple=aarch64-none-elf -aarch64-enable-atomic-cfg-tidy=0 -verify-machineinstrs < %s | FileCheck --check-prefix=CHECK-TINY %s
 
 @addr = global ptr null
diff --git a/llvm/test/CodeGen/AArch64/code-model-large-abs.ll b/llvm/test/CodeGen/AArch64/code-model-large-abs.ll
deleted file mode 100644
index e6f1f28778df079..000000000000000
--- a/llvm/test/CodeGen/AArch64/code-model-large-abs.ll
+++ /dev/null
@@ -1,72 +0,0 @@
-; RUN: llc -mtriple=aarch64-linux-gnu -code-model=large -o - %s | FileCheck %s
-
-@var8 = dso_local global i8 0
-@var16 = dso_local global i16 0
-@var32 = dso_local global i32 0
-@var64 = dso_local global i64 0
-
-define dso_local ptr @global_addr() {
-; CHECK-LABEL: global_addr:
-  ret ptr @var8
-  ; The movz/movk calculation should end up returned directly in x0.
-; CHECK: movz x0, #:abs_g0_nc:var8
-; CHECK: movk x0, #:abs_g1_nc:var8
-; CHECK: movk x0, #:abs_g2_nc:var8
-; CHECK: movk x0, #:abs_g3:var8
-; CHECK-NEXT: ret
-}
-
-define dso_local i8 @global_i8() {
-; CHECK-LABEL: global_i8:
-  %val = load i8, ptr @var8
-  ret i8 %val
-; CHECK: movz x[[ADDR_REG:[0-9]+]], #:abs_g0_nc:var8
-; CHECK: movk x[[ADDR_REG]], #:abs_g1_nc:var8
-; CHECK: movk x[[ADDR_REG]], #:abs_g2_nc:var8
-; CHECK: movk x[[ADDR_REG]], #:abs_g3:var8
-; CHECK: ldrb w0, [x[[ADDR_REG]]]
-}
-
-define dso_local i16 @global_i16() {
-; CHECK-LABEL: global_i16:
-  %val = load i16, ptr @var16
-  ret i16 %val
-; CHECK: movz x[[ADDR_REG:[0-9]+]], #:abs_g0_nc:var16
-; CHECK: movk x[[ADDR_REG]], #:abs_g1_nc:var16
-; CHECK: movk x[[ADDR_REG]], #:abs_g2_nc:var16
-; CHECK: movk x[[ADDR_REG]], #:abs_g3:var16
-; CHECK: ldrh w0, [x[[ADDR_REG]]]
-}
-
-define dso_local i32 @global_i32() {
-; CHECK-LABEL: global_i32:
-  %val = load i32, ptr @var32
-  ret i32 %val
-; CHECK: movz x[[ADDR_REG:[0-9]+]], #:abs_g0_nc:var32
-; CHECK: movk x[[ADDR_REG]], #:abs_g1_nc:var32
-; CHECK: movk x[[ADDR_REG]], #:abs_g2_nc:var32
-; CHECK: movk x[[ADDR_REG]], #:abs_g3:var32
-; CHECK: ldr w0, [x[[ADDR_REG]]]
-}
-
-define dso_local i64 @global_i64() {
-; CHECK-LABEL: global_i64:
-  %val = load i64, ptr @var64
-  ret i64 %val
-; CHECK: movz x[[ADDR_REG:[0-9]+]], #:abs_g0_nc:var64
-; CHECK: movk x[[ADDR_REG]], #:abs_g1_nc:var64
-; CHECK: movk x[[ADDR_REG]], #:abs_g2_nc:var64
-; CHECK: movk x[[ADDR_REG]], #:abs_g3:var64
-; CHECK: ldr x0, [x[[ADDR_REG]]]
-}
-
-define dso_local <2 x i64> @constpool() {
-; CHECK-LABEL: constpool:
-  ret <2 x i64> <i64 123456789, i64 987654321100>
-
-; CHECK: movz x[[ADDR_REG:[0-9]+]], #:abs_g0_nc:[[CPADDR:.LCPI[0-9]+_[0-9]+]]
-; CHECK: movk x[[ADDR_REG]], #:abs_g1_nc:[[CPADDR]]
-; CHECK: movk x[[ADDR_REG]], #:abs_g2_nc:[[CPADDR]]
-; CHECK: movk x[[ADDR_REG]], #:abs_g3:[[CPADDR]]
-; CHECK: ldr q0, [x[[ADDR_REG]]]
-}
diff --git a/llvm/test/CodeGen/AArch64/code-model-large.ll b/llvm/test/CodeGen/AArch64/code-model-large.ll
new file mode 100644
index 000000000000000..11133abcb8be35a
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/code-model-large.ll
@@ -0,0 +1,144 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3
+; RUN: llc -mtriple=aarch64-linux-gnu -code-model=large -o - %s | FileCheck %s --check-prefix=STATIC
+; RUN: llc -mtriple=aarch64-linux-gnu -code-model=large -relocation-model=pic -o - %s | FileCheck %s --check-prefix=PIC
+
+@var8 = dso_local global i8 0
+@var16 = dso_local global i16 0
+@var32 = dso_local global i32 0
+@var64 = dso_local global i64 0
+
+define dso_local ptr @global_addr() {
+;
+; STATIC-LABEL: global_addr:
+; STATIC:       // %bb.0:
+; STATIC-NEXT:    movz x0, #:abs_g0_nc:var8
+; STATIC-NEXT:    movk x0, #:abs_g1_nc:var8
+; STATIC-NEXT:    movk x0, #:abs_g2_nc:var8
+; STATIC-NEXT:    movk x0, #:abs_g3:var8
+; STATIC-NEXT:    ret
+;
+; PIC-LABEL: global_addr:
+; PIC:       .Lglobal_addr$local:
+; PIC-NEXT:    .type .Lglobal_addr$local,@function
+; PIC-NEXT:    .cfi_startproc
+; PIC-NEXT:  // %bb.0:
+; PIC-NEXT:    adrp x0, .Lvar8$local
+; PIC-NEXT:    add x0, x0, :lo12:.Lvar8$local
+; PIC-NEXT:    ret
+  ret ptr @var8
+  ; The movz/movk calculation should end up returned directly in x0.
+}
+
+define dso_local i8 @global_i8() {
+;
+; STATIC-LABEL: global_i8:
+; STATIC:       // %bb.0:
+; STATIC-NEXT:    movz x8, #:abs_g0_nc:var8
+; STATIC-NEXT:    movk x8, #:abs_g1_nc:var8
+; STATIC-NEXT:    movk x8, #:abs_g2_nc:var8
+; STATIC-NEXT:    movk x8, #:abs_g3:var8
+; STATIC-NEXT:    ldrb w0, [x8]
+; STATIC-NEXT:    ret
+;
+; PIC-LABEL: global_i8:
+; PIC:       .Lglobal_i8$local:
+; PIC-NEXT:    .type .Lglobal_i8$local,@function
+; PIC-NEXT:    .cfi_startproc
+; PIC-NEXT:  // %bb.0:
+; PIC-NEXT:    adrp x8, .Lvar8$local
+; PIC-NEXT:    ldrb w0, [x8, :lo12:.Lvar8$local]
+; PIC-NEXT:    ret
+  %val = load i8, ptr @va...
[truncated]

github-actions bot commented Oct 25, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

smithp35 (Collaborator) commented

Doesn't this effectively make -mcmodel=large equivalent to -mcmodel=small when -fpie is used? I'm a bit nervous about this as it might lead people to believe that -mcmodel=large is working until their program breaks.

Personally I'd prefer an error message like GNU for now as it is marked as unsupported in the ABI https://github.com/ARM-software/abi-aa/blob/main/sysvabi64/sysvabi64.rst#implementation-of-code-models

One thought I had about code models and PIC/PIE was to force all global data accesses through the GOT. The GOT itself would have to be within 4 GiB of the code, but global data accessed via the GOT could then be anywhere in the address space. Whether that should be called the large code model or something else, I don't know. I'm not sure that forming a PC-relative offset to the GOT from a sequence of MOVW instructions (R_AARCH64_MOVW_PREL_G0 and friends) really helps for most programs.
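As a minimal sketch of that GOT-indirect pattern (this is the small code model's existing :got: sequence, with sym as a placeholder symbol; a large-model variant reaching a distant GOT would need different relocations):

    adrp x0, :got:sym             // page of sym's GOT slot; the GOT must be within +-4 GiB
    ldr  x0, [x0, :got_lo12:sym]  // load sym's absolute address from the GOT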

MaskRay (Member, Author) commented Oct 25, 2023

Doesn't this effectively make -mcmodel=large equivalent to -mcmodel=small when -fpie is used? I'm a bit nervous about this as it might lead people to believe that -mcmodel=large is working until their program breaks.

Yes. The change is primarily about clarification: the large code model support in lib/Target/AArch64 is non-PIC only. The isPositionIndependent conditions serve as documentation that the PIC large code model needs to do something different.

If there are JIT-style workloads using the large code model (when code is generated into already-existing memory mappings, the addresses are fixed and the relocation model is pretty much insignificant), this change will force them to use the correct static relocation model instead of relying on a possibly wrong PIC relocation model.

Personally I'd prefer an error message like GNU for now as it is marked as unsupported in the ABI ARM-software/abi-aa@main/sysvabi64/sysvabi64.rst#implementation-of-code-models

I was uncertain whether the Clang driver should reject it, and the description currently says
"The PIC large code model is largely undefined in toolchains, but Clang doesn't reject -fpic -mcmodel=large as GCC does."

Thanks for the opinion. I will send a patch to reject -fpic -mcmodel=large for AArch64 on top of #53402.

One thought I had about code models and PIC/PIE was to force all global data accesses through the GOT. The GOT itself would have to be within 4 GiB of the code, but global data accessed via the GOT could then be anywhere in the address space. Whether that should be called the large code model or something else, I don't know. I'm not sure that forming a PC-relative offset to the GOT from a sequence of MOVW instructions (R_AARCH64_MOVW_PREL_G0 and friends) really helps for most programs.

Using GOT is Power's choice: https://maskray.me/blog/2023-05-14-relocation-overflow-and-code-models

Personally, I think reusing the small code model should be fine for an extended period of time:

For data references from code, x86-64 uses R_X86_64_REX_GOTPCRELX/R_X86_64_PC32 relocations, which have a smaller range, [-2**31, 2**31). In contrast, AArch64 employs R_AARCH64_ADR_PREL_PG_HI21 relocations, which have double that range, [-2**32, 2**32) (ADRP's signed 21-bit page immediate times the 4 KiB page size gives +-2**32 bytes of reach). The larger range makes it unlikely for AArch64 to encounter relocation overflow before the binary becomes excessively oversized for x86-64.

MaskRay requested a review from smithp35 on October 31, 2023 00:38
smithp35 (Collaborator) left a comment

IIUC this isn't necessarily needed for Clang now that #70262 has landed, but there may be other front-ends/JITs that could in theory try to use position independence and the large code model.

Can you update the comment to mention #70262? You will also need to change "but Clang doesn't reject -fpic -mcmodel=large as GCC does".

Code changes look reasonable to me.

There is no PIC support for -mcmodel=large
(https://github.com/ARM-software/abi-aa/blob/main/sysvabi64/sysvabi64.rst),
and Clang recently started rejecting -mcmodel= with PIC (llvm#70262).

The current backend code assumes that the large code model is non-PIC.
This patch adds `!getTargetMachine().isPositionIndependent()` conditions
to clarify that the support is non-PIC only. In addition, it adds some
tests as change detectors in case the PIC large code model is supported
in the future.

If other front-ends/JITs use the large code model with PIC, they will
now get the small code model sequence instead of a potentially incorrect
MOVZ/MOVK sequence, which is suitable only for non-PIC code and causes
text relocations with ELF linkers.

(The small code model sequence is usually sufficient: ADRP+ADD or
ADRP+LDR targets [-2**32, 2**32), double the range of x86-64's
R_X86_64_REX_GOTPCRELX/R_X86_64_PC32 at [-2**31, 2**31).)

MaskRay (Member, Author) commented Oct 31, 2023

Rebase (no code change other than formatting).

Reword the description to mention #70262 and clarify that:

This patch adds !getTargetMachine().isPositionIndependent() conditions to clarify that the support is non-PIC only. In addition, it adds some tests as change detectors in case the PIC large code model is supported in the future.

smithp35 (Collaborator) left a comment

Thanks for the changes. LGTM.

MaskRay merged commit a62b86a into llvm:main on Nov 1, 2023
3 checks passed
MaskRay deleted the a64-pic-large-code-model branch on November 1, 2023 19:10