
[AArch64] Implement -fno-plt for SelectionDAG/GlobalISel #78890

Conversation

MaskRay (Member) commented Jan 21, 2024

Clang sets the nonlazybind attribute for certain ObjC features. The
AArch64 SelectionDAG implementation for non-intrinsic calls
(46e36f0) is behind a cl option.

GCC implements -fno-plt for a few ELF targets. In Clang, -fno-plt also
sets the nonlazybind attribute. For SelectionDAG, make the cl option not
affect ELF so that non-intrinsic calls to a dso_preemptable function use
GOT. Adjust AArch64TargetLowering::LowerCall to handle intrinsic calls.

For FastISel, change fastLowerCall to bail out when a call is due to
-fno-plt.

For GlobalISel, handle non-intrinsic calls in CallLowering::lowerCall
and intrinsic calls in AArch64CallLowering::lowerCall (where the
target-independent CallLowering::lowerCall is not called).
The GlobalISel test in call-rv-marker.ll is therefore updated.

Note: the current -fno-plt -fpic implementation does not use GOT for a
preemptable function.

Link: #78275

Created using spr 1.3.4
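
For illustration, a minimal IR sketch of the intended behavior (mirroring the nonlazybind.ll test in the diff below; the function names and module flag come from that test): with the "RtLibUseGOT" module flag set by -fno-plt, both the nonlazybind call and the memset libcall are expected to go through the GOT on ELF.

  declare void @external() nonlazybind
  declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg)

  define void @f(ptr %a) nounwind {
    ; expected ELF lowering after this patch:
    ;   adrp x8, :got:external / ldr x8, [x8, :got_lo12:external] / blr x8
    call void @external()
    ; the libcall behind this intrinsic is likewise expected to use :got:memset
    call void @llvm.memset.p0.i64(ptr align 1 %a, i8 1, i64 1000, i1 false)
    ret void
  }

  !llvm.module.flags = !{!0}
  !0 = !{i32 7, !"RtLibUseGOT", i32 1}  ; added by -fno-plt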
llvmbot (Collaborator) commented Jan 21, 2024

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-llvm-globalisel

Author: Fangrui Song (MaskRay)

Changes: same as the PR description above.

Full diff: https://github.com/llvm/llvm-project/pull/78890.diff

8 Files Affected:

  • (modified) llvm/lib/CodeGen/GlobalISel/CallLowering.cpp (+10-3)
  • (modified) llvm/lib/Target/AArch64/AArch64FastISel.cpp (+7)
  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+5-4)
  • (modified) llvm/lib/Target/AArch64/AArch64Subtarget.cpp (+6-5)
  • (modified) llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp (+12-1)
  • (modified) llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp (+12-4)
  • (modified) llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp (+3)
  • (modified) llvm/test/CodeGen/AArch64/nonlazybind.ll (+38-43)
diff --git a/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp b/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp
index ccd9b13d730b60..d3484e5229e704 100644
--- a/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp
@@ -144,9 +144,16 @@ bool CallLowering::lowerCall(MachineIRBuilder &MIRBuilder, const CallBase &CB,
   // Try looking through a bitcast from one function type to another.
   // Commonly happens with calls to objc_msgSend().
   const Value *CalleeV = CB.getCalledOperand()->stripPointerCasts();
-  if (const Function *F = dyn_cast<Function>(CalleeV))
-    Info.Callee = MachineOperand::CreateGA(F, 0);
-  else if (isa<GlobalIFunc>(CalleeV) || isa<GlobalAlias>(CalleeV)) {
+  if (const Function *F = dyn_cast<Function>(CalleeV)) {
+    if (F->hasFnAttribute(Attribute::NonLazyBind)) {
+      auto Reg =
+          MRI.createGenericVirtualRegister(getLLTForType(*F->getType(), DL));
+      MIRBuilder.buildGlobalValue(Reg, F);
+      Info.Callee = MachineOperand::CreateReg(Reg, false);
+    } else {
+      Info.Callee = MachineOperand::CreateGA(F, 0);
+    }
+  } else if (isa<GlobalIFunc>(CalleeV) || isa<GlobalAlias>(CalleeV)) {
     // IR IFuncs and Aliases can't be forward declared (only defined), so the
     // callee must be in the same TU and therefore we can direct-call it without
     // worrying about it being out of range.
diff --git a/llvm/lib/Target/AArch64/AArch64FastISel.cpp b/llvm/lib/Target/AArch64/AArch64FastISel.cpp
index e98f6c4984a752..93d6024f34c09c 100644
--- a/llvm/lib/Target/AArch64/AArch64FastISel.cpp
+++ b/llvm/lib/Target/AArch64/AArch64FastISel.cpp
@@ -3202,6 +3202,13 @@ bool AArch64FastISel::fastLowerCall(CallLoweringInfo &CLI) {
   if (Callee && !computeCallAddress(Callee, Addr))
     return false;
 
+  // MO_GOT is not handled. -fno-plt compiled intrinsic calls do not have the
+  // nonlazybind attribute. Check "RtLibUseGOT" instead.
+  if ((Subtarget->classifyGlobalFunctionReference(Addr.getGlobalValue(), TM) !=
+       AArch64II::MO_NO_FLAG) ||
+      MF->getFunction().getParent()->getRtLibUseGOT())
+    return false;
+
   // The weak function target may be zero; in that case we must use indirect
   // addressing via a stub on windows as it may be out of range for a
   // PC-relative jump.
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 96ea692d03f563..56de890c78deca 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -7969,13 +7969,14 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
       Callee = DAG.getTargetGlobalAddress(GV, DL, PtrVT, 0, 0);
     }
   } else if (auto *S = dyn_cast<ExternalSymbolSDNode>(Callee)) {
-    if (getTargetMachine().getCodeModel() == CodeModel::Large &&
-        Subtarget->isTargetMachO()) {
-      const char *Sym = S->getSymbol();
+    bool UseGot = (getTargetMachine().getCodeModel() == CodeModel::Large &&
+                   Subtarget->isTargetMachO()) ||
+                  MF.getFunction().getParent()->getRtLibUseGOT();
+    const char *Sym = S->getSymbol();
+    if (UseGot) {
       Callee = DAG.getTargetExternalSymbol(Sym, PtrVT, AArch64II::MO_GOT);
       Callee = DAG.getNode(AArch64ISD::LOADgot, DL, PtrVT, Callee);
     } else {
-      const char *Sym = S->getSymbol();
       Callee = DAG.getTargetExternalSymbol(Sym, PtrVT, 0);
     }
   }
diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
index cf57d950ae8d7f..c4c6827313b5e1 100644
--- a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
+++ b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
@@ -43,10 +43,10 @@ static cl::opt<bool>
 UseAddressTopByteIgnored("aarch64-use-tbi", cl::desc("Assume that top byte of "
                          "an address is ignored"), cl::init(false), cl::Hidden);
 
-static cl::opt<bool>
-    UseNonLazyBind("aarch64-enable-nonlazybind",
-                   cl::desc("Call nonlazybind functions via direct GOT load"),
-                   cl::init(false), cl::Hidden);
+static cl::opt<bool> MachOUseNonLazyBind(
+    "aarch64-macho-enable-nonlazybind",
+    cl::desc("Call nonlazybind functions via direct GOT load for Mach-O"),
+    cl::Hidden);
 
 static cl::opt<bool> UseAA("aarch64-use-aa", cl::init(true),
                            cl::desc("Enable the use of AA during codegen."));
@@ -434,7 +434,8 @@ unsigned AArch64Subtarget::classifyGlobalFunctionReference(
 
   // NonLazyBind goes via GOT unless we know it's available locally.
   auto *F = dyn_cast<Function>(GV);
-  if (UseNonLazyBind && F && F->hasFnAttribute(Attribute::NonLazyBind) &&
+  if ((!isTargetMachO() || MachOUseNonLazyBind) && F &&
+      F->hasFnAttribute(Attribute::NonLazyBind) &&
       !TM.shouldAssumeDSOLocal(*GV->getParent(), GV))
     return AArch64II::MO_GOT;
 
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp b/llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp
index 84057ea8d2214a..773eadbf34de37 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp
@@ -1273,8 +1273,19 @@ bool AArch64CallLowering::lowerCall(MachineIRBuilder &MIRBuilder,
            !Subtarget.noBTIAtReturnTwice() &&
            MF.getInfo<AArch64FunctionInfo>()->branchTargetEnforcement())
     Opc = AArch64::BLR_BTI;
-  else
+  else {
+    // For an intrinsic call (e.g. memset), use GOT if "RtLibUseGOT" (-fno-plt)
+    // is set.
+    if (Info.Callee.isSymbol() && F.getParent()->getRtLibUseGOT()) {
+      auto Reg =
+          MRI.createGenericVirtualRegister(getLLTForType(*F.getType(), DL));
+      auto MIB = MIRBuilder.buildInstr(TargetOpcode::G_GLOBAL_VALUE);
+      DstOp(Reg).addDefToMIB(MRI, MIB);
+      MIB.addExternalSymbol(Info.Callee.getSymbolName(), AArch64II::MO_GOT);
+      Info.Callee = MachineOperand::CreateReg(Reg, false);
+    }
     Opc = getCallOpcode(MF, Info.Callee.isReg(), false);
+  }
 
   auto MIB = MIRBuilder.buildInstrNoInsert(Opc);
   unsigned CalleeOpNo = 0;
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
index 8344e79f78e1eb..e60db260e3ef10 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
@@ -2841,11 +2841,19 @@ bool AArch64InstructionSelector::select(MachineInstr &I) {
   }
 
   case TargetOpcode::G_GLOBAL_VALUE: {
-    auto GV = I.getOperand(1).getGlobal();
-    if (GV->isThreadLocal())
-      return selectTLSGlobalValue(I, MRI);
+    const GlobalValue *GV = nullptr;
+    unsigned OpFlags;
+    if (I.getOperand(1).isSymbol()) {
+      OpFlags = I.getOperand(1).getTargetFlags();
+      // Currently only used by "RtLibUseGOT".
+      assert(OpFlags == AArch64II::MO_GOT);
+    } else {
+      GV = I.getOperand(1).getGlobal();
+      if (GV->isThreadLocal())
+        return selectTLSGlobalValue(I, MRI);
+      OpFlags = STI.ClassifyGlobalReference(GV, TM);
+    }
 
-    unsigned OpFlags = STI.ClassifyGlobalReference(GV, TM);
     if (OpFlags & AArch64II::MO_GOT) {
       I.setDesc(TII.get(AArch64::LOADgot));
       I.getOperand(1).setTargetFlags(OpFlags);
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index b561cb12c93a1c..83137949d0f244 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -1314,6 +1314,9 @@ bool AArch64LegalizerInfo::legalizeSmallCMGlobalValue(
   // By splitting this here, we can optimize accesses in the small code model by
   // folding in the G_ADD_LOW into the load/store offset.
   auto &GlobalOp = MI.getOperand(1);
+  // Don't modify an intrinsic call.
+  if (GlobalOp.isSymbol())
+    return true;
   const auto* GV = GlobalOp.getGlobal();
   if (GV->isThreadLocal())
     return true; // Don't want to modify TLS vars.
diff --git a/llvm/test/CodeGen/AArch64/nonlazybind.ll b/llvm/test/CodeGen/AArch64/nonlazybind.ll
index 669a8ee04b2492..f5bb3a4ecbc9a0 100644
--- a/llvm/test/CodeGen/AArch64/nonlazybind.ll
+++ b/llvm/test/CodeGen/AArch64/nonlazybind.ll
@@ -1,5 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
-; RUN: llc -mtriple=aarch64-apple-ios %s -o - -aarch64-enable-nonlazybind | FileCheck %s --check-prefix=MACHO
+; RUN: llc -mtriple=aarch64-apple-ios %s -o - -aarch64-macho-enable-nonlazybind | FileCheck %s --check-prefix=MACHO
 ; RUN: llc -mtriple=aarch64-apple-ios %s -o - | FileCheck %s --check-prefix=MACHO-NORMAL
 ; RUN: llc -mtriple=aarch64 -fast-isel %s -o - | FileCheck %s --check-prefixes=ELF,ELF-FI
 ; RUN: llc -mtriple=aarch64 -global-isel %s -o - | FileCheck %s --check-prefixes=ELF,ELF-GI
@@ -19,13 +19,18 @@ define void @test_laziness(ptr %a) nounwind {
 ; MACHO-NEXT:  Lloh1:
 ; MACHO-NEXT:    ldr x8, [x8, _external@GOTPAGEOFF]
 ; MACHO-NEXT:    blr x8
+; MACHO-NEXT:  Lloh2:
+; MACHO-NEXT:    adrp x8, _memset@GOTPAGE
 ; MACHO-NEXT:    mov x0, x19
 ; MACHO-NEXT:    mov w1, #1 ; =0x1
+; MACHO-NEXT:  Lloh3:
+; MACHO-NEXT:    ldr x8, [x8, _memset@GOTPAGEOFF]
 ; MACHO-NEXT:    mov w2, #1000 ; =0x3e8
-; MACHO-NEXT:    bl _memset
+; MACHO-NEXT:    blr x8
 ; MACHO-NEXT:    ldp x29, x30, [sp, #16] ; 16-byte Folded Reload
 ; MACHO-NEXT:    ldp x20, x19, [sp], #32 ; 16-byte Folded Reload
 ; MACHO-NEXT:    ret
+; MACHO-NEXT:    .loh AdrpLdrGot Lloh2, Lloh3
 ; MACHO-NEXT:    .loh AdrpLdrGot Lloh0, Lloh1
 ;
 ; MACHO-NORMAL-LABEL: test_laziness:
@@ -34,50 +39,34 @@ define void @test_laziness(ptr %a) nounwind {
 ; MACHO-NORMAL-NEXT:    stp x29, x30, [sp, #16] ; 16-byte Folded Spill
 ; MACHO-NORMAL-NEXT:    mov x19, x0
 ; MACHO-NORMAL-NEXT:    bl _external
+; MACHO-NORMAL-NEXT:  Lloh0:
+; MACHO-NORMAL-NEXT:    adrp x8, _memset@GOTPAGE
 ; MACHO-NORMAL-NEXT:    mov x0, x19
 ; MACHO-NORMAL-NEXT:    mov w1, #1 ; =0x1
+; MACHO-NORMAL-NEXT:  Lloh1:
+; MACHO-NORMAL-NEXT:    ldr x8, [x8, _memset@GOTPAGEOFF]
 ; MACHO-NORMAL-NEXT:    mov w2, #1000 ; =0x3e8
-; MACHO-NORMAL-NEXT:    bl _memset
+; MACHO-NORMAL-NEXT:    blr x8
 ; MACHO-NORMAL-NEXT:    ldp x29, x30, [sp, #16] ; 16-byte Folded Reload
 ; MACHO-NORMAL-NEXT:    ldp x20, x19, [sp], #32 ; 16-byte Folded Reload
 ; MACHO-NORMAL-NEXT:    ret
+; MACHO-NORMAL-NEXT:    .loh AdrpLdrGot Lloh0, Lloh1
 ;
-; ELF-FI-LABEL: test_laziness:
-; ELF-FI:       // %bb.0:
-; ELF-FI-NEXT:    stp x30, x19, [sp, #-16]! // 16-byte Folded Spill
-; ELF-FI-NEXT:    mov x19, x0
-; ELF-FI-NEXT:    bl external
-; ELF-FI-NEXT:    mov w8, #1 // =0x1
-; ELF-FI-NEXT:    mov x0, x19
-; ELF-FI-NEXT:    mov x2, #1000 // =0x3e8
-; ELF-FI-NEXT:    uxtb w1, w8
-; ELF-FI-NEXT:    bl memset
-; ELF-FI-NEXT:    ldp x30, x19, [sp], #16 // 16-byte Folded Reload
-; ELF-FI-NEXT:    ret
-;
-; ELF-GI-LABEL: test_laziness:
-; ELF-GI:       // %bb.0:
-; ELF-GI-NEXT:    stp x30, x19, [sp, #-16]! // 16-byte Folded Spill
-; ELF-GI-NEXT:    mov x19, x0
-; ELF-GI-NEXT:    bl external
-; ELF-GI-NEXT:    mov x0, x19
-; ELF-GI-NEXT:    mov w1, #1 // =0x1
-; ELF-GI-NEXT:    mov w2, #1000 // =0x3e8
-; ELF-GI-NEXT:    bl memset
-; ELF-GI-NEXT:    ldp x30, x19, [sp], #16 // 16-byte Folded Reload
-; ELF-GI-NEXT:    ret
-;
-; ELF-SDAG-LABEL: test_laziness:
-; ELF-SDAG:       // %bb.0:
-; ELF-SDAG-NEXT:    stp x30, x19, [sp, #-16]! // 16-byte Folded Spill
-; ELF-SDAG-NEXT:    mov x19, x0
-; ELF-SDAG-NEXT:    bl external
-; ELF-SDAG-NEXT:    mov x0, x19
-; ELF-SDAG-NEXT:    mov w1, #1 // =0x1
-; ELF-SDAG-NEXT:    mov w2, #1000 // =0x3e8
-; ELF-SDAG-NEXT:    bl memset
-; ELF-SDAG-NEXT:    ldp x30, x19, [sp], #16 // 16-byte Folded Reload
-; ELF-SDAG-NEXT:    ret
+; ELF-LABEL: test_laziness:
+; ELF:       // %bb.0:
+; ELF-NEXT:    stp x30, x19, [sp, #-16]! // 16-byte Folded Spill
+; ELF-NEXT:    adrp x8, :got:external
+; ELF-NEXT:    mov x19, x0
+; ELF-NEXT:    ldr x8, [x8, :got_lo12:external]
+; ELF-NEXT:    blr x8
+; ELF-NEXT:    adrp x8, :got:memset
+; ELF-NEXT:    mov x0, x19
+; ELF-NEXT:    mov w1, #1 // =0x1
+; ELF-NEXT:    ldr x8, [x8, :got_lo12:memset]
+; ELF-NEXT:    mov w2, #1000 // =0x3e8
+; ELF-NEXT:    blr x8
+; ELF-NEXT:    ldp x30, x19, [sp], #16 // 16-byte Folded Reload
+; ELF-NEXT:    ret
   call void @external()
   call void @llvm.memset.p0.i64(ptr align 1 %a, i8 1, i64 1000, i1 false)
   ret void
@@ -86,12 +75,12 @@ define void @test_laziness(ptr %a) nounwind {
 define void @test_laziness_tail() nounwind {
 ; MACHO-LABEL: test_laziness_tail:
 ; MACHO:       ; %bb.0:
-; MACHO-NEXT:  Lloh2:
+; MACHO-NEXT:  Lloh4:
 ; MACHO-NEXT:    adrp x0, _external@GOTPAGE
-; MACHO-NEXT:  Lloh3:
+; MACHO-NEXT:  Lloh5:
 ; MACHO-NEXT:    ldr x0, [x0, _external@GOTPAGEOFF]
 ; MACHO-NEXT:    br x0
-; MACHO-NEXT:    .loh AdrpLdrGot Lloh2, Lloh3
+; MACHO-NEXT:    .loh AdrpLdrGot Lloh4, Lloh5
 ;
 ; MACHO-NORMAL-LABEL: test_laziness_tail:
 ; MACHO-NORMAL:       ; %bb.0:
@@ -99,7 +88,9 @@ define void @test_laziness_tail() nounwind {
 ;
 ; ELF-LABEL: test_laziness_tail:
 ; ELF:       // %bb.0:
-; ELF-NEXT:    b external
+; ELF-NEXT:    adrp x0, :got:external
+; ELF-NEXT:    ldr x0, [x0, :got_lo12:external]
+; ELF-NEXT:    br x0
   tail call void @external()
   ret void
 }
@@ -108,3 +99,7 @@ declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg)
 
 !llvm.module.flags = !{!0}
 !0 = !{i32 7, !"RtLibUseGOT", i32 1}
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; ELF-FI: {{.*}}
+; ELF-GI: {{.*}}
+; ELF-SDAG: {{.*}}

hstk30-hw self-requested a review January 21, 2024 12:37
MaskRay (Member, Author) commented Jan 29, 2024

Ping:)

davemgreen (Collaborator) commented:

It looks like some of the tests might be failing? Or does it need a rebase?

MaskRay (Member, Author) commented Jan 30, 2024

> It looks like some of the tests might be failing? Or does it need a rebase?

Sorry about that. The last-minute FastISel change caused some failures: Addr.getGlobalValue() could be null. I have simplified AArch64FastISel.cpp to just check "RtLibUseGOT" (added by -fno-plt) and fixed the failures.

Comment on lines 1302 to 1303
auto MIB = MIRBuilder.buildInstr(TargetOpcode::G_GLOBAL_VALUE);
DstOp(Reg).addDefToMIB(MRI, MIB);
Contributor:

use buildGlobalValue

MaskRay (Member, Author):

This part handles library calls. buildGlobalValue builds a G_GLOBAL_VALUE with MIB.addGlobalAddress(GV). Here we use an ExternalSymbol, so we cannot use buildGlobalValue.

Contributor:

Missing an overload, then; we should try to avoid raw buildInstr calls when possible.

MaskRay (Member, Author):

Are you proposing a new MachineIRBuilder API? This is a special use of buildInstr(TargetOpcode::G_GLOBAL_VALUE) and there is only one such use. I think avoiding another API is good for now.

Contributor:

Yes. You lose CSE of these values by not going through the complete buildInstr overload.

MaskRay (Member, Author):

buildInstr(TargetOpcode::G_GLOBAL_VALUE) cannot be CSEed today, and I think this patch should not change it.

It seems that only the buildInstr overload with DstOps/SrcOps can perform CSE. MachineIRBuilder::buildGlobalValue does not call this overload.
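
Concretely, the two builder entry points being contrasted look like this (a fragment, assuming a MachineIRBuilder &MIRBuilder in scope as in CallLowering above; DstReg, LHS, and RHS are placeholder registers, and CSE only ever happens when the builder is a CSEMIRBuilder):

  // Raw form: fixes the opcode only; operands are appended by hand
  // afterwards. CSEMIRBuilder does not intercept this path, so the
  // resulting G_GLOBAL_VALUE is never deduplicated.
  auto MIB = MIRBuilder.buildInstr(TargetOpcode::G_GLOBAL_VALUE);

  // DstOps/SrcOps form: CSEMIRBuilder overrides this overload and may
  // return an existing identical instruction instead of building a new one.
  auto Sum = MIRBuilder.buildInstr(TargetOpcode::G_ADD, {DstReg}, {LHS, RHS});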

Contributor:

Using raw buildInstr is a worse API. Please just add the new overload.

MaskRay (Member, Author):

I disagree. I think adding a function similar to buildGlobalValue that either omits the const GlobalValue *GV argument or changes it to const char * would add confusion, as this is a very special use case.

I slightly simplified the code in the just-pushed commit.
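
For reference, the overload being requested might look roughly like the following — a sketch only; buildExternalSymbolValue is a hypothetical name, not an in-tree MachineIRBuilder API, and the body just mirrors the operand sequence used in the patch:

  // Hypothetical MachineIRBuilder member (not in tree): builds a
  // G_GLOBAL_VALUE whose source operand is an external symbol rather
  // than a GlobalValue.
  MachineInstrBuilder
  MachineIRBuilder::buildExternalSymbolValue(const DstOp &Res,
                                             const char *SymName,
                                             unsigned TargetFlags) {
    auto MIB = buildInstr(TargetOpcode::G_GLOBAL_VALUE);
    Res.addDefToMIB(*getMRI(), MIB);              // define the result register
    MIB.addExternalSymbol(SymName, TargetFlags);  // e.g. AArch64II::MO_GOT
    return MIB;
  }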

; GISEL-NEXT: ldp x20, x19, [sp], #32 ; 16-byte Folded Reload
; GISEL-NEXT: b _objc_release
; GISEL-NEXT: br x1
Collaborator:

@fhahn, @TNorthover do these sound OK to you?

MaskRay added a commit to MaskRay/llvm-project that referenced this pull request Feb 13, 2024
(Commit message identical to the PR description above; Link: llvm#78275, Pull Request: llvm#78890.)
MaskRay (Member, Author) commented Feb 28, 2024

Ping:)

Comment on lines 149 to 151
auto Reg =
MRI.createGenericVirtualRegister(getLLTForType(*F->getType(), DL));
MIRBuilder.buildGlobalValue(Reg, F);
Contributor:

Suggested change:
-auto Reg =
-    MRI.createGenericVirtualRegister(getLLTForType(*F->getType(), DL));
-MIRBuilder.buildGlobalValue(Reg, F);
+Register Reg = MIRBuilder.buildGlobalValue(getLLTForType(*F->getType(), DL), F).getReg(0);

MaskRay (Member, Author):

After clang-format, this becomes

-      auto Reg =
-          MRI.createGenericVirtualRegister(getLLTForType(*F->getType(), DL));
-      MIRBuilder.buildGlobalValue(Reg, F);
+      Register Reg =
+          MIRBuilder.buildGlobalValue(getLLTForType(*F->getType(), DL), F)
+              .getReg(0);

which is not shorter... In addition, the new code in AArch64CallLowering.cpp uses createGenericVirtualRegister, so sticking with createGenericVirtualRegister here keeps things consistent.

Contributor:

Move the getLLTForType call into a variable to make the line shorter.

Contributor:

Raw MRI.createGenericVirtualRegister calls should be purged whenever possible.

MaskRay (Member, Author):

Done in de93d86!

Comment on lines 1308 to 1314
auto Reg =
MRI.createGenericVirtualRegister(getLLTForType(*F.getType(), DL));
MIRBuilder.buildInstr(TargetOpcode::G_GLOBAL_VALUE)
.addDef(Reg)
.addExternalSymbol(Info.Callee.getSymbolName(), AArch64II::MO_GOT);
Info.Callee = MachineOperand::CreateReg(Reg, false);
}
Contributor:

Still should fix this API.

davemgreen (Collaborator) left a comment:

From an AArch64 perspective, I think this looks OK too. Thanks.

MaskRay merged commit 201572e into main Mar 5, 2024
3 of 4 checks passed
MaskRay deleted the users/MaskRay/spr/aarch64-implement-fno-plt-for-selectiondagglobalisel branch March 5, 2024 21:55