
2.8 Hook: LTO Enabled Hook

DK edited this page Sep 24, 2023 · 3 revisions

LTO.what()

When the target program is built with interprocedural optimizations enabled, this usually means link time optimization (LTO). For mainstream MSVC, that is /LTCG and /GL, a.k.a. link time code generation and whole program optimization. This technique gives the compiler extra information that allows it to make interprocedural optimizations.

How does it affect our hooking methods and common approaches?

For MSVC x64 Windows targets, we hook by complying with the x64 calling convention. This assumes that the caller allocates the shadow space and that volatile registers are safe for the callee to use.

The x64 MSVC ABI considers rax, rcx, rdx, r8, r9, r10, r11 and xmm0 to xmm5 as volatile.
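
As a quick reference, the volatile set listed above can be written down as a small compile-time lookup. This is only an illustrative sketch: the names kVolatileGpr and is_volatile are made up for this page and are not part of DKUtil.

```cpp
#include <array>
#include <cassert>
#include <string_view>

// Integer registers the x64 MSVC ABI treats as volatile (caller-saved).
// Hypothetical helper table, not part of DKUtil.
constexpr std::array<std::string_view, 7> kVolatileGpr = {
    "rax", "rcx", "rdx", "r8", "r9", "r10", "r11"
};

// True if `reg` names a volatile register: one of the integer registers
// above, or xmm0 through xmm5.
constexpr bool is_volatile(std::string_view reg)
{
    for (auto v : kVolatileGpr) {
        if (reg == v) {
            return true;
        }
    }
    // "xmm" followed by exactly one digit in the range 0..5
    return reg.size() == 4 && reg.substr(0, 3) == "xmm" &&
           reg[3] >= '0' && reg[3] <= '5';
}

static_assert(is_volatile("r8"));
static_assert(is_volatile("xmm5"));
static_assert(!is_volatile("rbx"));
static_assert(!is_volatile("xmm6"));
```

Any register for which is_volatile returns true may be overwritten by a callee that follows the standard convention, which is exactly the freedom a regular hook function relies on.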

Example

Consider the following call example:

mov rdx, [rcx] // a2
mov rcx, r8 // use r8 value here as a1
call my_func(rcx, rdx)
mov r8, 0x100 // using r8 as a free register now

After returning from my_func, the compiler considers the value of every volatile register changed, so it will not reuse rcx or rdx on the assumption that their values were preserved, e.g. it will not emit mov r8, rcx.

However, with LTO enabled targets, it may look like this:

mov rdx, [rcx] // a2
mov rcx, r8 // use r8 value here as a1
call my_func(rcx, rdx)
test r8, r8 // keep using r8

With the whole program information available at link time, the compiler knows that my_func does not change r8, or that it can be optimized to not change r8, so it lets the caller keep using r8 across the call boundary. This reduces register saving and stack usage, hence the optimization.

When implementing hook functions, there is no way of knowing whether the compiled hook will change specific registers, so the register assumptions baked into an LTO target cannot be accounted for.

Auto Patch

Currently, dku::Hook offers a write_call variant, write_call_ex. This API is designed to automatically preserve regular/SSE registers across a hook call boundary, keeping the original LTO code running.
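
To illustrate why saving and restoring registers around the hook helps, here is a toy model in plain C++. It does not touch real machine registers and does not show how DKUtil generates its patch; Regs, hook_body, call_plain and call_preserving_r8 are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a few integer registers; not real machine state.
struct Regs
{
    std::uint64_t rcx = 0;
    std::uint64_t rdx = 0;
    std::uint64_t r8 = 0;
};

// Stand-in for a hook compiled under the normal calling convention:
// it is free to use volatile registers such as rcx and r8 as scratch.
inline void hook_body(Regs& regs)
{
    regs.rcx = 0;
    regs.r8 = 0xDEAD;
}

// Plain redirect: an LTO caller that assumed r8 survives the call
// now sees a clobbered value.
inline void call_plain(Regs& regs)
{
    hook_body(regs);
}

// Redirect that saves r8 before the hook and restores it afterwards,
// so the caller's assumption about r8 still holds.
inline void call_preserving_r8(Regs& regs)
{
    const std::uint64_t saved = regs.r8;
    hook_body(regs);
    regs.r8 = saved;
}
```

With r8 set to 0x100 before the call, call_plain leaves it clobbered while call_preserving_r8 hands back 0x100, which is what code like the `test r8, r8` example above depends on.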

API

Relocate a callsite to the target hook function while preserving the selected regular and SSE registers across the call boundary, as if they were non-volatile.

  • src : address of the target callsite
  • dst : hook function
  • regs : regular registers to preserve as non volatile
  • simd : sse registers to preserve as non volatile
inline auto write_call_ex(
    const dku_memory auto a_src,
    F                     a_dst,
    enumeration<Register> a_regs = { Register::NONE },
    enumeration<SIMD>     a_simd = { SIMD::NONE }
) noexcept
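
The regs and simd parameters take a braced list of registers. One plausible way to think about such a list is as a set of bit flags combined into a mask. The sketch below assumes that representation purely for illustration: DemoReg, make_mask and has are hypothetical and are not DKUtil's actual Register, SIMD or enumeration definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical flag enum; names and values are illustrative only.
enum class DemoReg : std::uint32_t
{
    NONE = 0,
    RAX = 1u << 0,
    RCX = 1u << 1,
    RDX = 1u << 2,
    R8 = 1u << 3,
    R9 = 1u << 4,
};

// Combine a braced list of flags into a single mask.
constexpr std::uint32_t make_mask(std::initializer_list<DemoReg> regs)
{
    std::uint32_t mask = 0;
    for (auto r : regs) {
        mask |= static_cast<std::uint32_t>(r);
    }
    return mask;
}

// True if the mask contains the given register flag.
constexpr bool has(std::uint32_t mask, DemoReg r)
{
    return (mask & static_cast<std::uint32_t>(r)) != 0;
}
```

Under this assumption, a list such as { DemoReg::RDX, DemoReg::R9 } selects exactly those two registers, and a list holding only NONE selects nothing.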

Example

using namespace DKUtil::Alias;

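// hook function, declared first so the original function pointer can use its type
bool Hook_123456(void* a_gameInstance);

// original function
static inline std::add_pointer_t<decltype(Hook_123456)> func;

// hook function
bool Hook_123456(void* a_gameInstance)
{
    return func(a_gameInstance);
}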

// callsite
auto addr = 0x7FF712345678;

// preserve rdx, r9
func = dku::Hook::write_call_ex<5>(addr, Hook_123456, { Reg::RDX, Reg::R9 });
// preserve xmm0, xmm2
func = dku::Hook::write_call_ex<5>(addr, Hook_123456, { Reg::NONE }, { Xmm::XMM0, Xmm::XMM2 });
// preserve rdx, r9, and xmm0, xmm2
func = dku::Hook::write_call_ex<5>(addr, Hook_123456, { Reg::RDX, Reg::R9 }, { Xmm::XMM0, Xmm::XMM2 });
// preserve all
func = dku::Hook::write_call_ex<5>(addr, Hook_123456, { Reg::ALL }, { Xmm::ALL });