

Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models"

The large code model allows code and data segments to exceed 2GB, which
means that some symbol references may require a displacement that cannot
be encoded as a displacement from RIP. The large PIC model even relaxes
the assumption that the GOT itself is within 2GB of all code. Therefore,
we need a special code sequence to materialize it:
  .LtmpN:
    leaq .LtmpN(%rip), %rbx
    movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax # Scratch
    addq %rax, %rbx # GOT base reg

With that, non-local references go through the GOT base register instead
of using RIP-relative loads, and local references typically use GOTOFF
symbols, like this:
    movq extern_gv@GOT(%rbx), %rax
    movq local_gv@GOTOFF(%rbx), %rax

All calls end up being indirect:
    movabsq $local_fn@GOTOFF, %rax
    addq %rbx, %rax
    callq *%rax
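
For reference, a small IR example plus an llc invocation that exercises
these sequences (illustrative only; the file and symbol names below are
not part of this patch):
    ; example.ll
    @local_gv = internal global i32 0
    @extern_gv = external global i32

    define internal void @local_fn() {
      ret void
    }

    define i32 @foo() {
      call void @local_fn()
      %a = load i32, i32* @local_gv
      %b = load i32, i32* @extern_gv
      %r = add i32 %a, %b
      ret i32 %r
    }

    $ llc -mtriple=x86_64-linux-gnu -relocation-model=pic -code-model=large -o - example.ll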

The medium code model retains the assumption that the code segment is
less than 2GB, so calls are once again direct, and RIP-relative
loads can be used to access the GOT. Materializing the GOT is easy:
    leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx # GOT base reg

DSO-local data accesses use it:
    movq local_gv@GOTOFF(%rbx), %rax

Non-local data accesses will use RIP-relative addressing, which means we
may not always need to materialize the GOT base:
    movq extern_gv@GOTPCREL(%rip), %rax

Calls are essentially the same as in the small code model: they use
direct, PC-relative addressing, and calls to non-local functions go
through the PLT.
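
For illustration, with hypothetical function names, medium-model call
sequences look something like:
    callq local_fn             # direct, PC-relative call
    callq extern_fn@PLT        # non-local calls still go through the PLT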

This patch adds reasonably comprehensive tests for LEA, but many
interesting folding opportunities remain unimplemented.
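
As a rough sketch of the testing pattern (not the literal contents of the
added tests; register assignment and the exact CHECK lines may differ):
    ; RUN: llc -mtriple=x86_64-linux-gnu -relocation-model=pic -code-model=medium < %s | FileCheck %s
    @local_gv = internal global i32 0

    define i32* @lea_local_gv() {
      ; CHECK: leaq local_gv@GOTOFF({{%r[a-z0-9]+}}), %rax
      ret i32* @local_gv
    }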

I restricted the MCJIT/eh-lg-pic.ll test to Linux, since the large PIC
code model is not implemented for MachO yet.

Differential Revision: https://reviews.llvm.org/D47211

llvm-svn: 335508
rnk committed Jun 25, 2018
1 parent b812847 commit 88fee5f
Showing 14 changed files with 545 additions and 42 deletions.
10 changes: 6 additions & 4 deletions llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
@@ -940,11 +940,13 @@ bool X86DAGToDAGISel::matchWrapper(SDValue N, X86ISelAddressMode &AM) {

bool IsRIPRel = N.getOpcode() == X86ISD::WrapperRIP;

// Only do this address mode folding for 64-bit if we're in the small code
// model.
// FIXME: But we can do GOTPCREL addressing in the medium code model.
// We can't use an addressing mode in the 64-bit large code model. In the
// medium code model, we can use an addressing mode when RIP wrappers are present.
// That signifies access to globals that are known to be "near", such as the
// GOT itself.
CodeModel::Model M = TM.getCodeModel();
if (Subtarget->is64Bit() && M != CodeModel::Small && M != CodeModel::Kernel)
if (Subtarget->is64Bit() &&
(M == CodeModel::Large || (M == CodeModel::Medium && !IsRIPRel)))
return true;

// Base and index reg must be 0 in order to use %rip as base.
4 changes: 4 additions & 0 deletions llvm/lib/Target/X86/X86InstrCompiler.td
@@ -37,6 +37,10 @@ let hasSideEffects = 0, isNotDuplicable = 1, Uses = [ESP, SSP],
def MOVPC32r : Ii32<0xE8, Pseudo, (outs GR32:$reg), (ins i32imm:$label),
"", []>;

// 64-bit large code model PIC base construction.
let hasSideEffects = 0, mayLoad = 1, isNotDuplicable = 1, SchedRW = [WriteJump] in
def MOVGOT64r : PseudoI<(outs GR64:$reg),
(ins GR64:$scratch, i64i32imm_pcrel:$got), []>;

// ADJCALLSTACKDOWN/UP implicitly use/def ESP because they may be expanded into
// a stack adjustment and the codegen must know that they may modify the stack
60 changes: 44 additions & 16 deletions llvm/lib/Target/X86/X86InstrInfo.cpp
@@ -11239,7 +11239,9 @@ isSafeToMoveRegClassDefs(const TargetRegisterClass *RC) const {
/// TODO: Eliminate this and move the code to X86MachineFunctionInfo.
///
unsigned X86InstrInfo::getGlobalBaseReg(MachineFunction *MF) const {
assert(!Subtarget.is64Bit() &&
assert((!Subtarget.is64Bit() ||
MF->getTarget().getCodeModel() == CodeModel::Medium ||
MF->getTarget().getCodeModel() == CodeModel::Large) &&
"X86-64 PIC uses RIP relative addressing");

X86MachineFunctionInfo *X86FI = MF->getInfo<X86MachineFunctionInfo>();
@@ -11250,7 +11252,8 @@ unsigned X86InstrInfo::getGlobalBaseReg(MachineFunction *MF) const {
// Create the register. The code to initialize it is inserted
// later, by the CGBR pass (below).
MachineRegisterInfo &RegInfo = MF->getRegInfo();
GlobalBaseReg = RegInfo.createVirtualRegister(&X86::GR32_NOSPRegClass);
GlobalBaseReg = RegInfo.createVirtualRegister(
Subtarget.is64Bit() ? &X86::GR64_NOSPRegClass : &X86::GR32_NOSPRegClass);
X86FI->setGlobalBaseReg(GlobalBaseReg);
return GlobalBaseReg;
}
@@ -12624,9 +12627,10 @@ namespace {
static_cast<const X86TargetMachine *>(&MF.getTarget());
const X86Subtarget &STI = MF.getSubtarget<X86Subtarget>();

// Don't do anything if this is 64-bit as 64-bit PIC
// uses RIP relative addressing.
if (STI.is64Bit())
// Don't do anything in the 64-bit small and kernel code models. They use
// RIP-relative addressing for everything.
if (STI.is64Bit() && (TM->getCodeModel() == CodeModel::Small ||
TM->getCodeModel() == CodeModel::Kernel))
return false;

// Only emit a global base reg in PIC mode.
@@ -12653,17 +12657,41 @@ namespace {
else
PC = GlobalBaseReg;

// Operand of MovePCtoStack is completely ignored by asm printer. It's
// only used in JIT code emission as displacement to pc.
BuildMI(FirstMBB, MBBI, DL, TII->get(X86::MOVPC32r), PC).addImm(0);

// If we're using vanilla 'GOT' PIC style, we should use relative addressing
// not to pc, but to _GLOBAL_OFFSET_TABLE_ external.
if (STI.isPICStyleGOT()) {
// Generate addl $__GLOBAL_OFFSET_TABLE_ + [.-piclabel], %some_register
BuildMI(FirstMBB, MBBI, DL, TII->get(X86::ADD32ri), GlobalBaseReg)
.addReg(PC).addExternalSymbol("_GLOBAL_OFFSET_TABLE_",
X86II::MO_GOT_ABSOLUTE_ADDRESS);
if (STI.is64Bit()) {
if (TM->getCodeModel() == CodeModel::Medium) {
// In the medium code model, use a RIP-relative LEA to materialize the
// GOT.
BuildMI(FirstMBB, MBBI, DL, TII->get(X86::LEA64r), PC)
.addReg(X86::RIP)
.addImm(0)
.addReg(0)
.addExternalSymbol("_GLOBAL_OFFSET_TABLE_")
.addReg(0);
} else if (TM->getCodeModel() == CodeModel::Large) {
// Loading the GOT in the large code model requires math with labels,
// so we use a pseudo instruction and expand it during MC emission.
unsigned Scratch = RegInfo.createVirtualRegister(&X86::GR64RegClass);
BuildMI(FirstMBB, MBBI, DL, TII->get(X86::MOVGOT64r), PC)
.addReg(Scratch, RegState::Undef | RegState::Define)
.addExternalSymbol("_GLOBAL_OFFSET_TABLE_");
} else {
llvm_unreachable("unexpected code model");
}
} else {
// Operand of MovePCtoStack is completely ignored by asm printer. It's
// only used in JIT code emission as displacement to pc.
BuildMI(FirstMBB, MBBI, DL, TII->get(X86::MOVPC32r), PC).addImm(0);

// If we're using vanilla 'GOT' PIC style, we should use relative
// addressing not to pc, but to _GLOBAL_OFFSET_TABLE_ external.
if (STI.isPICStyleGOT()) {
// Generate addl $__GLOBAL_OFFSET_TABLE_ + [.-piclabel],
// %some_register
BuildMI(FirstMBB, MBBI, DL, TII->get(X86::ADD32ri), GlobalBaseReg)
.addReg(PC)
.addExternalSymbol("_GLOBAL_OFFSET_TABLE_",
X86II::MO_GOT_ABSOLUTE_ADDRESS);
}
}

return true;
35 changes: 35 additions & 0 deletions llvm/lib/Target/X86/X86MCInstLower.cpp
@@ -1982,6 +1982,41 @@ void X86AsmPrinter::EmitInstruction(const MachineInstr *MI) {
return;
}

case X86::MOVGOT64r: {
// Materializes the GOT for the 64-bit large code model.
MCSymbol *DotSym = OutContext.createTempSymbol();
OutStreamer->EmitLabel(DotSym);

unsigned DstReg = MI->getOperand(0).getReg();
unsigned ScratchReg = MI->getOperand(1).getReg();
MCSymbol *GOTSym = MCInstLowering.GetSymbolFromOperand(MI->getOperand(2));

// .LtmpN: leaq .LtmpN(%rip), %dst
const MCExpr *DotExpr = MCSymbolRefExpr::create(DotSym, OutContext);
EmitAndCountInstruction(MCInstBuilder(X86::LEA64r)
.addReg(DstReg) // dest
.addReg(X86::RIP) // base
.addImm(1) // scale
.addReg(0) // index
.addExpr(DotExpr) // disp
.addReg(0)); // seg

// movq $_GLOBAL_OFFSET_TABLE_ - .LtmpN, %scratch
const MCExpr *GOTSymExpr = MCSymbolRefExpr::create(GOTSym, OutContext);
const MCExpr *GOTDiffExpr =
MCBinaryExpr::createSub(GOTSymExpr, DotExpr, OutContext);
EmitAndCountInstruction(MCInstBuilder(X86::MOV64ri)
.addReg(ScratchReg) // dest
.addExpr(GOTDiffExpr)); // disp

// addq %scratch, %dst
EmitAndCountInstruction(MCInstBuilder(X86::ADD64rr)
.addReg(DstReg) // dest
.addReg(DstReg) // dest
.addReg(ScratchReg)); // src
return;
}

case X86::ADD32ri: {
// Lower the MO_GOT_ABSOLUTE_ADDRESS form of ADD32ri.
if (MI->getOperand(2).getTargetFlags() != X86II::MO_GOT_ABSOLUTE_ADDRESS)
34 changes: 25 additions & 9 deletions llvm/lib/Target/X86/X86Subtarget.cpp
@@ -68,15 +68,31 @@ X86Subtarget::classifyGlobalReference(const GlobalValue *GV) const {

unsigned char
X86Subtarget::classifyLocalReference(const GlobalValue *GV) const {
// 64 bits can use %rip addressing for anything local.
if (is64Bit())
return X86II::MO_NO_FLAG;

// If this is for a position dependent executable, the static linker can
// figure it out.
// If we're not PIC, it's not very interesting.
if (!isPositionIndependent())
return X86II::MO_NO_FLAG;

// For 64-bit, we need to consider the code model.
if (is64Bit()) {
switch (TM.getCodeModel()) {
// 64-bit small code model is simple: All rip-relative.
case CodeModel::Small:
case CodeModel::Kernel:
return X86II::MO_NO_FLAG;

// The large PIC code model uses GOTOFF.
case CodeModel::Large:
return X86II::MO_GOTOFF;

// Medium is a hybrid: RIP-rel for code, GOTOFF for DSO local data.
case CodeModel::Medium:
if (isa<Function>(GV))
return X86II::MO_NO_FLAG; // All code is RIP-relative
return X86II::MO_GOTOFF; // Local symbols use GOTOFF.
}
llvm_unreachable("invalid code model");
}

// The COFF dynamic linker just patches the executable sections.
if (isTargetCOFF())
return X86II::MO_NO_FLAG;
@@ -97,8 +113,8 @@ X86Subtarget::classifyLocalReference(const GlobalValue *GV) const {

unsigned char X86Subtarget::classifyGlobalReference(const GlobalValue *GV,
const Module &M) const {
// Large model never uses stubs.
if (TM.getCodeModel() == CodeModel::Large)
// The static large model never uses stubs.
if (TM.getCodeModel() == CodeModel::Large && !isPositionIndependent())
return X86II::MO_NO_FLAG;

// Absolute symbols can be referenced directly.
Expand All @@ -120,7 +136,7 @@ unsigned char X86Subtarget::classifyGlobalReference(const GlobalValue *GV,
if (isTargetCOFF())
return X86II::MO_DLLIMPORT;

if (is64Bit())
if (is64Bit() && TM.getCodeModel() != CodeModel::Large)
return X86II::MO_GOTPCREL;

if (isTargetDarwin()) {
8 changes: 7 additions & 1 deletion llvm/lib/Target/X86/X86TargetMachine.cpp
@@ -156,9 +156,15 @@ static std::string computeDataLayout(const Triple &TT) {
}

static Reloc::Model getEffectiveRelocModel(const Triple &TT,
bool JIT,
Optional<Reloc::Model> RM) {
bool is64Bit = TT.getArch() == Triple::x86_64;
if (!RM.hasValue()) {
// JIT codegen should use static relocations by default, since it's
// typically executed in process and not relocatable.
if (JIT)
return Reloc::Static;

// Darwin defaults to PIC in 64 bit mode and dynamic-no-pic in 32 bit mode.
// Win64 requires rip-rel addressing, thus we force it to PIC. Otherwise we
// use static relocation model by default.
@@ -210,7 +216,7 @@ X86TargetMachine::X86TargetMachine(const Target &T, const Triple &TT,
CodeGenOpt::Level OL, bool JIT)
: LLVMTargetMachine(
T, computeDataLayout(TT), TT, CPU, FS, Options,
getEffectiveRelocModel(TT, RM),
getEffectiveRelocModel(TT, JIT, RM),
getEffectiveCodeModel(CM, JIT, TT.getArch() == Triple::x86_64), OL),
TLOF(createTLOF(getTargetTriple())) {
// Windows stack unwinder gets confused when execution flow "falls through"
2 changes: 1 addition & 1 deletion llvm/test/CodeGen/X86/cleanuppad-large-codemodel.ll
@@ -1,4 +1,4 @@
; RUN: llc -mtriple=x86_64-pc-windows-msvc -code-model=large -o - < %s | FileCheck %s
; RUN: llc -mtriple=x86_64-pc-windows-msvc -code-model=large -relocation-model=static -o - < %s | FileCheck %s

declare i32 @__CxxFrameHandler3(...)


