hwasan: Move memory access checks into small outlined functions on aarch64.

Each hwasan check requires emitting a small piece of code like this:
https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html#memory-accesses
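
For a 4-byte load from a pointer in x0, that sequence looks roughly
like this (a sketch in the spirit of the linked design doc; the scratch
registers and label are illustrative, since the register allocator
picks them per site):

  ubfx  x1, x0, #4, #52    // compute the shadow offset
  ldrb  w1, [x9, x1]       // load the memory tag (x9 holds the shadow base)
  lsr   x2, x0, #56        // extract the pointer's address tag
  cmp   w1, w2             // compare the two tags
  b.ne  .Lmismatch         // mismatch: branch to the reporting slow path
  ldr   w0, [x0]           // the original load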

The problem with this is that these code blocks typically bloat code
size significantly.

An obvious solution is to outline these blocks of code. In fact, this
has already been implemented under the -hwasan-instrument-with-calls
flag. However, the current implementation has a number of problems:
- The functions use the same calling convention as regular C functions.
  This means that the backend must spill all temporary registers as
  required by the platform's C calling convention, even though the
  check only needs two registers on the hot path.
- The functions take the address to be checked in a fixed register,
  which increases register pressure.
Both of these factors can diminish the code size effect and increase
the performance hit of -hwasan-instrument-with-calls.

The solution that this patch implements is to involve the aarch64
backend in outlining the checks. An intrinsic and pseudo-instruction
are created to represent a hwasan check. The pseudo-instruction
is register allocated like any other instruction, and we allow the
register allocator to select almost any register for the address to
check. Each (register selection, type of check) combination encountered
triggers the creation, in the backend, of a function that handles the
check for that specific pair. The resulting functions are deduplicated by
the linker. The pseudo-instruction (really the function) is specified
to preserve all registers except for the registers that the AAPCS
specifies may be clobbered by a call.
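
With the outlined scheme, each instrumented access costs one
instruction at the call site; for example, checking a pointer held in
x1 with access info 2 (a 1 << 2 = 4-byte load) becomes, illustratively:

  bl  __hwasan_check_x1_2

The helper's name encodes the (register, access info) pair, so
identical helpers emitted by different translation units share a
symbol and fold together at link time.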

To measure the code size and performance effect of this change, I
took a number of measurements using Chromium for Android on aarch64,
comparing a browser with inlined checks (the baseline) against a
browser with outlined checks.

Code size: .text shrinks from 243897420 to 171619972 bytes, a 30%
reduction.

Performance: Using Chromium's blink_perf.layout microbenchmarks I
measured a median performance regression of 6.24%.

The fact that a perf/size tradeoff is evident here suggests that
we might want to make the new behaviour conditional on -Os/-Oz.
But for now I've enabled it unconditionally, my reasoning being that
hwasan users typically expect a relatively large perf hit, and ~6%
isn't really adding much. We may want to revisit this decision in
the future, though.

I also tried experimenting with varying the number of registers
selectable by the hwasan check pseudo-instruction (which would result
in fewer variants being created), on the hypothesis that creating
fewer variants of the function would expose another perf/size tradeoff
by reducing icache pressure from the check functions at the cost of
register pressure. Although I did observe a code size increase with
fewer registers, I did not observe a strong correlation between the
number of registers and the performance of the resulting browser on the
microbenchmarks, so I conclude that we might as well use ~all registers
to get the maximum code size improvement. My results are below:

Regs | .text size | Perf hit
-----+------------+---------
~all | 171619972  | 6.24%
  16 | 171765192  | 7.03%
   8 | 172917788  | 5.82%
   4 | 177054016  | 6.89%

Differential Revision: https://reviews.llvm.org/D56954

llvm-svn: 351920
pcc committed Jan 23, 2019
1 parent 7cf2720 commit 73078ec
Showing 13 changed files with 435 additions and 202 deletions.
34 changes: 26 additions & 8 deletions compiler-rt/lib/hwasan/hwasan_linux.cc
@@ -368,22 +368,27 @@ static AccessInfo GetAccessInfo(siginfo_t *info, ucontext_t *uc) {
   return AccessInfo{addr, size, is_store, !is_store, recover};
 }
 
-static bool HwasanOnSIGTRAP(int signo, siginfo_t *info, ucontext_t *uc) {
-  AccessInfo ai = GetAccessInfo(info, uc);
-  if (!ai.is_store && !ai.is_load)
-    return false;
-
+static void HandleTagMismatch(AccessInfo ai, uptr pc, uptr frame,
+                              ucontext_t *uc) {
   InternalMmapVector<BufferedStackTrace> stack_buffer(1);
   BufferedStackTrace *stack = stack_buffer.data();
   stack->Reset();
-  SignalContext sig{info, uc};
-  GetStackTrace(stack, kStackTraceMax, StackTrace::GetNextInstructionPc(sig.pc),
-                sig.bp, uc, common_flags()->fast_unwind_on_fatal);
+  GetStackTrace(stack, kStackTraceMax, pc, frame, uc,
+                common_flags()->fast_unwind_on_fatal);
 
   ++hwasan_report_count;
 
   bool fatal = flags()->halt_on_error || !ai.recover;
   ReportTagMismatch(stack, ai.addr, ai.size, ai.is_store, fatal);
+}
+
+static bool HwasanOnSIGTRAP(int signo, siginfo_t *info, ucontext_t *uc) {
+  AccessInfo ai = GetAccessInfo(info, uc);
+  if (!ai.is_store && !ai.is_load)
+    return false;
+
+  SignalContext sig{info, uc};
+  HandleTagMismatch(ai, StackTrace::GetNextInstructionPc(sig.pc), sig.bp, uc);
 
 #if defined(__aarch64__)
   uc->uc_mcontext.pc += 4;
@@ -394,6 +399,19 @@ static bool HwasanOnSIGTRAP(int signo, siginfo_t *info, ucontext_t *uc) {
   return true;
 }
 
+extern "C" SANITIZER_INTERFACE_ATTRIBUTE void __hwasan_tag_mismatch(
+    uptr addr, uptr access_info) {
+  AccessInfo ai;
+  ai.is_store = access_info & 0x10;
+  ai.recover = false;
+  ai.addr = addr;
+  ai.size = 1 << (access_info & 0xf);
+
+  HandleTagMismatch(ai, (uptr)__builtin_return_address(0),
+                    (uptr)__builtin_frame_address(0), nullptr);
+  __builtin_unreachable();
+}
+
 static void OnStackUnwind(const SignalContext &sig, const void *,
                           BufferedStackTrace *stack) {
   GetStackTrace(stack, kStackTraceMax, StackTrace::GetNextInstructionPc(sig.pc),
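
(In the access_info encoding decoded above, bit 4 distinguishes stores
from loads and the low four bits hold log2 of the access size: for
example, access_info = 2 describes a 1 << 2 = 4-byte load, and
access_info = 0x13 an 8-byte store.)
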
3 changes: 3 additions & 0 deletions llvm/include/llvm/IR/Intrinsics.td
@@ -1048,6 +1048,9 @@ def int_icall_branch_funnel : Intrinsic<[], [llvm_vararg_ty], []>;
 def int_load_relative: Intrinsic<[llvm_ptr_ty], [llvm_ptr_ty, llvm_anyint_ty],
                                  [IntrReadMem, IntrArgMemOnly]>;
 
+def int_hwasan_check_memaccess :
+    Intrinsic<[], [llvm_ptr_ty, llvm_ptr_ty, llvm_i32_ty], [IntrInaccessibleMemOnly]>;
+
 // Xray intrinsics
 //===----------------------------------------------------------------------===//
 // Custom event logging for x-ray.
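
With this definition the instrumentation pass can emit a call such as
"call void @llvm.hwasan.check.memaccess(i8* %shadow.base, i8* %ptr, i32 2)"
(an illustrative line, not taken from this patch; the first operand
carries the shadow base so the backend can pin it to x9).
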
112 changes: 112 additions & 0 deletions llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
@@ -28,6 +28,7 @@
 #include "llvm/ADT/Triple.h"
 #include "llvm/ADT/Twine.h"
 #include "llvm/BinaryFormat/COFF.h"
+#include "llvm/BinaryFormat/ELF.h"
 #include "llvm/CodeGen/AsmPrinter.h"
 #include "llvm/CodeGen/MachineBasicBlock.h"
 #include "llvm/CodeGen/MachineFunction.h"
@@ -43,6 +44,7 @@
 #include "llvm/MC/MCContext.h"
 #include "llvm/MC/MCInst.h"
 #include "llvm/MC/MCInstBuilder.h"
+#include "llvm/MC/MCSectionELF.h"
 #include "llvm/MC/MCStreamer.h"
 #include "llvm/MC/MCSymbol.h"
 #include "llvm/Support/Casting.h"
@@ -95,6 +97,10 @@ class AArch64AsmPrinter : public AsmPrinter {
   void LowerPATCHABLE_FUNCTION_EXIT(const MachineInstr &MI);
   void LowerPATCHABLE_TAIL_CALL(const MachineInstr &MI);
 
+  std::map<std::pair<unsigned, uint32_t>, MCSymbol *> HwasanMemaccessSymbols;
+  void LowerHWASAN_CHECK_MEMACCESS(const MachineInstr &MI);
+  void EmitHwasanMemaccessSymbols(Module &M);
+
   void EmitSled(const MachineInstr &MI, SledKind Kind);
 
   /// tblgen'erated driver function for lowering simple MI->MC
@@ -229,7 +235,109 @@ void AArch64AsmPrinter::EmitSled(const MachineInstr &MI, SledKind Kind)
   recordSled(CurSled, MI, Kind);
 }
 
+void AArch64AsmPrinter::LowerHWASAN_CHECK_MEMACCESS(const MachineInstr &MI) {
+  unsigned Reg = MI.getOperand(0).getReg();
+  uint32_t AccessInfo = MI.getOperand(1).getImm();
+  MCSymbol *&Sym = HwasanMemaccessSymbols[{Reg, AccessInfo}];
+  if (!Sym) {
+    // FIXME: Make this work on non-ELF.
+    if (!TM.getTargetTriple().isOSBinFormatELF())
+      report_fatal_error("llvm.hwasan.check.memaccess only supported on ELF");
+
+    std::string SymName = "__hwasan_check_x" + utostr(Reg - AArch64::X0) + "_" +
+                          utostr(AccessInfo);
+    Sym = OutContext.getOrCreateSymbol(SymName);
+  }
+
+  EmitToStreamer(*OutStreamer,
+                 MCInstBuilder(AArch64::BL)
+                     .addExpr(MCSymbolRefExpr::create(Sym, OutContext)));
+}
+
+void AArch64AsmPrinter::EmitHwasanMemaccessSymbols(Module &M) {
+  if (HwasanMemaccessSymbols.empty())
+    return;
+
+  const Triple &TT = TM.getTargetTriple();
+  assert(TT.isOSBinFormatELF());
+  std::unique_ptr<MCSubtargetInfo> STI(
+      TM.getTarget().createMCSubtargetInfo(TT.str(), "", ""));
+
+  MCSymbol *HwasanTagMismatchSym =
+      OutContext.getOrCreateSymbol("__hwasan_tag_mismatch");
+
+  for (auto &P : HwasanMemaccessSymbols) {
+    unsigned Reg = P.first.first;
+    uint32_t AccessInfo = P.first.second;
+    MCSymbol *Sym = P.second;
+
+    OutStreamer->SwitchSection(OutContext.getELFSection(
+        ".text.hot", ELF::SHT_PROGBITS,
+        ELF::SHF_EXECINSTR | ELF::SHF_ALLOC | ELF::SHF_GROUP, 0,
+        Sym->getName()));
+
+    OutStreamer->EmitSymbolAttribute(Sym, MCSA_ELF_TypeFunction);
+    OutStreamer->EmitSymbolAttribute(Sym, MCSA_Weak);
+    OutStreamer->EmitSymbolAttribute(Sym, MCSA_Hidden);
+    OutStreamer->EmitLabel(Sym);
+
+    OutStreamer->EmitInstruction(MCInstBuilder(AArch64::UBFMXri)
+                                     .addReg(AArch64::X16)
+                                     .addReg(Reg)
+                                     .addImm(4)
+                                     .addImm(55),
+                                 *STI);
+    OutStreamer->EmitInstruction(MCInstBuilder(AArch64::LDRBBroX)
+                                     .addReg(AArch64::W16)
+                                     .addReg(AArch64::X9)
+                                     .addReg(AArch64::X16)
+                                     .addImm(0)
+                                     .addImm(0),
+                                 *STI);
+    OutStreamer->EmitInstruction(MCInstBuilder(AArch64::UBFMXri)
+                                     .addReg(AArch64::X17)
+                                     .addReg(Reg)
+                                     .addImm(56)
+                                     .addImm(63),
+                                 *STI);
+    OutStreamer->EmitInstruction(MCInstBuilder(AArch64::SUBSWrs)
+                                     .addReg(AArch64::WZR)
+                                     .addReg(AArch64::W16)
+                                     .addReg(AArch64::W17)
+                                     .addImm(0),
+                                 *STI);
+    MCSymbol *HandleMismatchSym = OutContext.createTempSymbol();
+    OutStreamer->EmitInstruction(
+        MCInstBuilder(AArch64::Bcc)
+            .addImm(AArch64CC::NE)
+            .addExpr(MCSymbolRefExpr::create(HandleMismatchSym, OutContext)),
+        *STI);
+    OutStreamer->EmitInstruction(
+        MCInstBuilder(AArch64::RET).addReg(AArch64::LR), *STI);
+
+    OutStreamer->EmitLabel(HandleMismatchSym);
+    if (Reg != AArch64::X0)
+      OutStreamer->EmitInstruction(MCInstBuilder(AArch64::ORRXrs)
+                                       .addReg(AArch64::X0)
+                                       .addReg(AArch64::XZR)
+                                       .addReg(Reg)
+                                       .addImm(0),
+                                   *STI);
+    OutStreamer->EmitInstruction(MCInstBuilder(AArch64::MOVZXi)
+                                     .addReg(AArch64::X1)
+                                     .addImm(AccessInfo)
+                                     .addImm(0),
+                                 *STI);
+    OutStreamer->EmitInstruction(
+        MCInstBuilder(AArch64::B)
+            .addExpr(MCSymbolRefExpr::create(HwasanTagMismatchSym, OutContext)),
+        *STI);
+  }
+}
+
 void AArch64AsmPrinter::EmitEndOfAsmFile(Module &M) {
+  EmitHwasanMemaccessSymbols(M);
+
   const Triple &TT = TM.getTargetTriple();
   if (TT.isOSBinFormatMachO()) {
     // Funny Darwin hack: This flag tells the linker that no global symbols
@@ -883,6 +991,10 @@ void AArch64AsmPrinter::EmitInstruction(const MachineInstr *MI) {
     LowerPATCHABLE_TAIL_CALL(*MI);
     return;
 
+  case AArch64::HWASAN_CHECK_MEMACCESS:
+    LowerHWASAN_CHECK_MEMACCESS(*MI);
+    return;
+
   case AArch64::SEH_StackAlloc:
     TS->EmitARM64WinCFIAllocStack(MI->getOperand(0).getImm());
     return;
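
Transcribed by hand from the MCInst sequence above, the helper emitted
for the pair (x1, access info 2) would look roughly like this (label
name and comments are mine):

  __hwasan_check_x1_2:
    ubfx  x16, x1, #4, #52   // compute the shadow offset
    ldrb  w16, [x9, x16]     // load the memory tag (x9 holds the shadow base)
    lsr   x17, x1, #56       // extract the address tag
    cmp   w16, w17           // compare the two tags
    b.ne  .Lmismatch         // mismatch: fall through to the reporting path
    ret                      // tags match: return to the checked access
  .Lmismatch:
    mov   x0, x1             // pass the faulting address in x0
    mov   x1, #2             // pass the access info in x1
    b     __hwasan_tag_mismatch
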
7 changes: 7 additions & 0 deletions llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -763,6 +763,13 @@ def MSRpstateImm4 : MSRpstateImm0_15;
 def MOVbaseTLS : Pseudo<(outs GPR64:$dst), (ins),
                         [(set GPR64:$dst, AArch64threadpointer)]>, Sched<[WriteSys]>;
 
+let Uses = [ X9 ], Defs = [ X16, X17, LR, NZCV ] in {
+def HWASAN_CHECK_MEMACCESS : Pseudo<
+  (outs), (ins GPR64noip:$ptr, i32imm:$accessinfo),
+  [(int_hwasan_check_memaccess X9, GPR64noip:$ptr, (i32 imm:$accessinfo))]>,
+  Sched<[]>;
+}
+
 // The cycle counter PMC register is PMCCNTR_EL0.
 let Predicates = [HasPerfMon] in
 def : Pat<(readcyclecounter), (MRS 0xdce8)>;
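
The Uses/Defs lists mirror the outlined helper's contract: the shadow
base is expected in x9, x16 and x17 serve as scratch registers, the
tag comparison clobbers NZCV, and the BL that reaches the helper
clobbers LR.
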
3 changes: 3 additions & 0 deletions llvm/lib/Target/AArch64/AArch64RegisterBankInfo.cpp
@@ -248,6 +248,9 @@ const RegisterBank &AArch64RegisterBankInfo::getRegBankFromRegClass(
   case AArch64::GPR64spRegClassID:
   case AArch64::GPR64sponlyRegClassID:
   case AArch64::GPR64allRegClassID:
+  case AArch64::GPR64noipRegClassID:
+  case AArch64::GPR64common_and_GPR64noipRegClassID:
+  case AArch64::GPR64noip_and_tcGPR64RegClassID:
   case AArch64::tcGPR64RegClassID:
   case AArch64::WSeqPairsClassRegClassID:
   case AArch64::XSeqPairsClassRegClassID:
5 changes: 5 additions & 0 deletions llvm/lib/Target/AArch64/AArch64RegisterInfo.td
@@ -205,6 +205,11 @@ def tcGPR64 : RegisterClass<"AArch64", [i64], 64, (sub GPR64common, X19, X20, X2
 // BTI-protected function.
 def rtcGPR64 : RegisterClass<"AArch64", [i64], 64, (add X16, X17)>;
 
+// Register set that excludes registers that are reserved for procedure calls.
+// This is used for pseudo-instructions that are actually implemented using a
+// procedure call.
+def GPR64noip : RegisterClass<"AArch64", [i64], 64, (sub GPR64, X16, X17, LR)>;
+
 // GPR register classes for post increment amount of vector load/store that
 // has alternate printing when Rm=31 and prints a constant immediate value
 // equal to the total number of bytes transferred.
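
X16, X17 and LR are excluded here because the call itself can clobber
them: BL writes LR, and a linker-inserted veneer on the way to the
helper may use X16/X17 (the AAPCS intra-procedure-call scratch
registers), so the pointer being checked must not live in any of them.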
