core: Support heap-based trampolines.
1. Generate off-stack nested function trampolines

Add support for allocating nested function trampolines on an
executable heap rather than on the stack. This is motivated by targets
such as AArch64 Darwin, which globally prohibit executing code on the
stack.

The target-specific routines for allocating and writing trampolines are
to be provided in libgcc, and are by default _not_ compiled in unless
the target specifically requires them, or you manually provide
--enable-off-stack-trampolines when configuring gcc/libgcc.

The gcc flag -foff-stack-trampolines controls whether to generate code
that instantiates trampolines on the stack, or to emit calls to
__builtin_nested_func_ptr_created and
__builtin_nested_func_ptr_deleted. Note that this flag is completely
independent of libgcc: If libgcc is for any reason missing those
symbols, you will get a link failure.

This implementation imposes some implicit restrictions as compared to
stack trampolines. longjmp'ing back to a state before a trampoline was
created will cause us to skip over the corresponding
__builtin_nested_func_ptr_deleted, which will leak trampolines
starting from the beginning of the linked list of allocated
trampolines. There may be scope for instrumenting longjmp/setjmp to
trigger cleanups of trampolines.
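
The leak described above can be illustrated with a small model in plain C. This is only a sketch under stated assumptions: the `model_*` names, the fixed-size pool, and the fake code address are all hypothetical, and nothing here is the libgcc implementation. The only details taken from this commit are the argument shapes (chain, function, destination slot for creation; no arguments for deletion) and the fact that a longjmp can skip the deletion call.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_MAX 16

/* Hypothetical stand-in for one heap trampoline.  It records only the
   static chain and the nested function's address, not machine code.  */
struct model_tramp
{
  void *chain;
  void *func;
};

/* LIFO pool standing in for the list of allocated trampolines.  */
static struct model_tramp model_pool[MODEL_MAX];
static size_t model_live;

/* Argument order follows the builtin's type in gcc/tree.cc:
   chain, function, and the frame slot that receives the pointer.  */
void
model_created (void *chain, void *func, void **dst)
{
  if (model_live == MODEL_MAX)
    abort ();
  model_pool[model_live].chain = chain;
  model_pool[model_live].func = func;
  *dst = &model_pool[model_live];
  model_live++;
}

/* Like the builtin, this takes no arguments; in this model it
   releases the most recently created entry.  */
void
model_deleted (void)
{
  if (model_live > 0)
    model_live--;
}

static jmp_buf model_env;
static char model_code;   /* Fake address of a nested function.  */

/* Stands in for a function that contains a nested function.  */
static void
model_outer (int escape)
{
  int frame = 0;
  void *slot;

  model_created (&frame, &model_code, &slot);
  if (escape)
    longjmp (model_env, 1);   /* Skips the cleanup below.  */
  model_deleted ();
}

/* Runs model_outer once and returns how many entries are still live.  */
size_t
model_run (int escape)
{
  if (setjmp (model_env) == 0)
    model_outer (escape);
  return model_live;
}
```

In this model, `model_run (0)` leaves no live entries, `model_run (1)` leaves one behind, and a later `model_run (0)` does not reclaim it, because each deletion only undoes the most recent creation.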

Co-authored-by: Andrew Burgess <andrew.burgess@embecosm.com>

gcc/ChangeLog:

        * builtins.def (BUILT_IN_NESTED_PTR_CREATED): Define.
        (BUILT_IN_NESTED_PTR_DELETED): Ditto.
        * common.opt (foff-stack-trampolines): Add flag to control
        generation of heap-based trampoline instantiation.
        * tree-nested.cc (convert_tramp_reference_op): Don't bother calling
        __builtin_adjust_trampoline for the off-stack case.
        (finalize_nesting_tree_1): Emit calls to
        __builtin_nested_...{created,deleted} if we're generating with
        -foff-stack-trampolines.
        * tree.cc (build_common_builtin_nodes): Build
        __builtin_nested_...{created,deleted}.
        * doc/invoke.texi (-foff-stack-trampolines): Document.

libgcc/ChangeLog:

	* configure.ac: Add configure parameter
        --enable-off-stack-trampolines, and do error checking if we're
        trying to enable off-stack trampolines for a platform that doesn't
        provide any such implementation.
	* configure: Regenerate.
	* libgcc-std.ver.in: Ditto.
	* libgcc2.h (__builtin_nested_func_ptr_created): Declare.
        (__builtin_nested_func_ptr_deleted): Ditto.

2. Add x86_64-linux support for off-stack trampolines

Implement the __builtin_nested_func_ptr_{created,deleted} functions
for the x86_64-linux platform. This serves to exercise the
infrastructure added in libgcc (--enable-off-stack-trampolines) and
gcc (-foff-stack-trampolines) in supporting off-stack trampoline
generation, and is intended primarily for demonstration and debugging
purposes.

Co-authored-by: Andrew Burgess <andrew.burgess@embecosm.com>

libgcc/ChangeLog:

	* config/i386/heap-trampoline.c: New file: Implement off-stack
	trampolines for x86_64.
	* config/i386/t-heap-trampoline: Add rule to build
	config/i386/heap-trampoline.c
	* config.host (x86_64-*-linux*): Handle
	--enable-off-stack-trampolines.
	* configure.ac (--enable-off-stack-trampolines): Permit setting
	for target x86_64-*-linux*.
	* configure: Regenerate.

3. Add aarch64-linux support for off-stack trampolines

Implement the __builtin_nested_func_ptr_{created,deleted} functions
for the aarch64-linux platform. This serves to exercise the
infrastructure added in libgcc (--enable-off-stack-trampolines) and
gcc (-foff-stack-trampolines) in supporting off-stack trampoline
generation, and is intended primarily for demonstration and debugging
purposes.

Co-authored-by: Andrew Burgess <andrew.burgess@embecosm.com>

libgcc/ChangeLog:

        * config/aarch64/heap-trampoline.c: New file: Implement off-stack
	trampolines for aarch64.
        * config/aarch64/t-heap-trampoline: Add rule to build
        config/aarch64/heap-trampoline.c
        * config.host (aarch64-*-linux*): Handle
        --enable-off-stack-trampolines.
        * configure.ac (--enable-off-stack-trampolines): Permit setting
        for target aarch64-*-linux*.
        * configure: Regenerate.

4. Darwin, aarch64, x86_64: Support heap trampolines.

Implement the __builtin_nested_func_ptr_{created,deleted} functions for
x86_64 and aarch64 Darwin.

For aarch64, --enable-off-stack-trampolines is enabled by default, and
-foff-stack-trampolines is enabled by default when targeting macOS
11.x or later.

For x86_64 this is a configure-time opt-in (and can be applied from
macOS 10.10 onwards).
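
The gcc/config.gcc hunk in this commit selects the flag's default with the shell pattern `*-*-darwin2*`, which does not mention the architecture. The sketch below checks a few target triples against that pattern using POSIX `fnmatch`, which handles these simple `*` globs the same way a shell `case` does. The helper name `triple_defaults_off_stack` and the example triples are assumptions made for illustration; Darwin 20 is the kernel version that corresponds to macOS 11.

```c
#include <fnmatch.h>
#include <stdbool.h>

/* Hypothetical helper: returns true when TRIPLE matches the glob that
   the config.gcc hunk uses to set OFF_STACK_TRAMPOLINES_INIT=1.  */
bool
triple_defaults_off_stack (const char *triple)
{
  /* Flags are 0, so '*' matches any run of characters.  */
  return fnmatch ("*-*-darwin2*", triple, 0) == 0;
}
```

Under this pattern, `aarch64-apple-darwin20.1.0` and `x86_64-apple-darwin21` both match, while `x86_64-apple-darwin19.6.0` (macOS 10.15) and `x86_64-pc-linux-gnu` do not.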

Co-authored-by: Andrew Burgess <andrew.burgess@embecosm.com>
Co-authored-by: Iain Sandoe <iain@sandoe.co.uk>

libgcc/ChangeLog:

        * config/aarch64/heap-trampoline.c (allocate_trampoline_page):
	Request MAP_JIT in the case of __APPLE__.
	Provide __APPLE__ variant of aarch64_trampoline_insns that uses
	x16 as the chain pointer.
	(__builtin_nested_func_ptr_created): Call
	pthread_jit_write_protect_np() to toggle read/write permission on
	page.
        * config.host (aarch64*-*darwin* | arm64*-*darwin*): Handle
        --enable-off-stack-trampolines.
        * configure.ac (--enable-off-stack-trampolines): Permit setting
	for target aarch64*-*darwin* | arm64*-*darwin*, and set default to
	enabled.
        * configure: Regenerate.
mablinov authored and iains committed Jun 29, 2023
1 parent 7fd7466 commit 9779fb7
Showing 18 changed files with 650 additions and 23 deletions.
2 changes: 2 additions & 0 deletions gcc/builtins.def
@@ -1073,6 +1073,8 @@ DEF_BUILTIN_STUB (BUILT_IN_ADJUST_TRAMPOLINE, "__builtin_adjust_trampoline")
DEF_BUILTIN_STUB (BUILT_IN_INIT_DESCRIPTOR, "__builtin_init_descriptor")
DEF_BUILTIN_STUB (BUILT_IN_ADJUST_DESCRIPTOR, "__builtin_adjust_descriptor")
DEF_BUILTIN_STUB (BUILT_IN_NONLOCAL_GOTO, "__builtin_nonlocal_goto")
DEF_BUILTIN_STUB (BUILT_IN_NESTED_PTR_CREATED, "__builtin_nested_func_ptr_created")
DEF_BUILTIN_STUB (BUILT_IN_NESTED_PTR_DELETED, "__builtin_nested_func_ptr_deleted")

/* Implementing __builtin_setjmp. */
DEF_BUILTIN_STUB (BUILT_IN_SETJMP_SETUP, "__builtin_setjmp_setup")
6 changes: 5 additions & 1 deletion gcc/common.opt
@@ -2234,6 +2234,10 @@ foffload-abi=
Common Joined RejectNegative Enum(offload_abi)
-foffload-abi=[lp64|ilp32] Set the ABI to use in an offload compiler.

foff-stack-trampolines
Common RejectNegative Var(flag_off_stack_trampolines) Init(OFF_STACK_TRAMPOLINES_INIT)
Generate trampolines in executable memory rather than executable stack.

Enum
Name(offload_abi) Type(enum offload_abi) UnknownError(unknown offload ABI %qs)

@@ -2884,7 +2888,7 @@ Common Var(flag_tracer) Optimization
Perform superblock formation via tail duplication.

ftrampolines
-Common Var(flag_trampolines) Init(0)
+Common Var(flag_trampolines) Init(OFF_STACK_TRAMPOLINES_INIT)
For targets that normally need trampolines for nested functions, always
generate them instead of using descriptors.

11 changes: 11 additions & 0 deletions gcc/config.gcc
@@ -1125,6 +1125,17 @@ case ${target} in
;;
esac

# Figure out if we need to enable -foff-stack-trampolines by default
case ${target} in
*-*-darwin2*)
# Currently, we do this for macOS 11 and above.
tm_defines="$tm_defines OFF_STACK_TRAMPOLINES_INIT=1"
;;
*)
tm_defines="$tm_defines OFF_STACK_TRAMPOLINES_INIT=0"
;;
esac

case ${target} in
aarch64*-*-elf | aarch64*-*-fuchsia* | aarch64*-*-rtems*)
tm_file="${tm_file} elfos.h newlib-stdint.h"
7 changes: 7 additions & 0 deletions gcc/config/i386/darwin.h
@@ -308,3 +308,10 @@ along with GCC; see the file COPYING3. If not see
#define CLEAR_INSN_CACHE(beg, end) \
extern void sys_icache_invalidate(void *start, size_t len); \
sys_icache_invalidate ((beg), (size_t)((end)-(beg)))

/* Disable custom function descriptors for Darwin when we have off-stack
trampolines. */
#undef X86_CUSTOM_FUNCTION_TEST
#define X86_CUSTOM_FUNCTION_TEST \
(!flag_off_stack_trampolines && !flag_trampolines) ? 1 : 0

2 changes: 1 addition & 1 deletion gcc/config/i386/i386.cc
@@ -25500,7 +25500,7 @@ ix86_libgcc_floating_mode_supported_p
#define TARGET_HARD_REGNO_SCRATCH_OK ix86_hard_regno_scratch_ok

#undef TARGET_CUSTOM_FUNCTION_DESCRIPTORS
-#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 1
+#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS X86_CUSTOM_FUNCTION_TEST

#undef TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID
#define TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID ix86_addr_space_zero_address_valid
6 changes: 6 additions & 0 deletions gcc/config/i386/i386.h
@@ -755,6 +755,12 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
/* Minimum allocation boundary for the code of a function. */
#define FUNCTION_BOUNDARY 8

/* We will AND with this value to test if a custom function descriptor needs
a static chain. The function boundary must be adjusted so that the bit
this represents is no longer part of the address. 0 disables the custom
function descriptors. */
#define X86_CUSTOM_FUNCTION_TEST 1

/* C++ stores the virtual bit in the lowest bit of function pointers. */
#define TARGET_PTRMEMFUNC_VBIT_LOCATION ptrmemfunc_vbit_in_pfn

15 changes: 14 additions & 1 deletion gcc/doc/invoke.texi
@@ -710,7 +710,7 @@ Objective-C and Objective-C++ Dialects}.
-fverbose-asm -fpack-struct[=@var{n}]
-fleading-underscore -ftls-model=@var{model}
-fstack-reuse=@var{reuse_level}
--ftrampolines -ftrapv -fwrapv
+-ftrampolines -foff-stack-trampolines -ftrapv -fwrapv
-fvisibility=@r{[}default@r{|}internal@r{|}hidden@r{|}protected@r{]}
-fstrict-volatile-bitfields -fsync-libcalls}

@@ -18370,6 +18370,19 @@ instructions. It does not allow exceptions to be thrown from
arbitrary signal handlers such as @code{SIGALRM}. This enables
@option{-fexceptions}.

@opindex foff-stack-trampolines
@item -foff-stack-trampolines
Certain platforms (such as the Apple M1) do not permit an executable
stack. Generate calls to @code{__builtin_nested_func_ptr_created} and
@code{__builtin_nested_func_ptr_deleted} in order to allocate and
deallocate trampoline space on the executable heap. Please note that
these functions are implemented in libgcc, and will not be compiled in
unless you provide @option{--enable-off-stack-trampolines} when
building gcc. @emph{PLEASE NOTE}: The trampolines are @emph{not}
guaranteed to be correctly deallocated if you @code{setjmp},
instantiate nested functions, and then @code{longjmp} back to a state
prior to having allocated those nested functions.

@opindex fdelete-dead-exceptions
@item -fdelete-dead-exceptions
Consider that instructions that may throw exceptions but don't otherwise
121 changes: 104 additions & 17 deletions gcc/tree-nested.cc
@@ -611,6 +611,14 @@ get_trampoline_type (struct nesting_info *info)
if (trampoline_type)
return trampoline_type;

/* When trampolines are created off-stack then the only thing we need in the
local frame is a single pointer. */
if (flag_off_stack_trampolines)
{
trampoline_type = build_pointer_type (void_type_node);
return trampoline_type;
}

align = TRAMPOLINE_ALIGNMENT;
size = TRAMPOLINE_SIZE;

@@ -2788,17 +2796,27 @@ convert_tramp_reference_op (tree *tp, int *walk_subtrees, void *data)

/* Compute the address of the field holding the trampoline. */
x = get_frame_field (info, target_context, x, &wi->gsi);
x = build_addr (x);
x = gsi_gimplify_val (info, x, &wi->gsi);

/* Do machine-specific ugliness. Normally this will involve
computing extra alignment, but it can really be anything. */
if (descr)
builtin = builtin_decl_implicit (BUILT_IN_ADJUST_DESCRIPTOR);
/* APB: We don't need to do the adjustment calls when using off-stack
trampolines, any such adjustment will be done when the off-stack
trampoline is created. */
if (!descr && flag_off_stack_trampolines)
x = gsi_gimplify_val (info, x, &wi->gsi);
else
builtin = builtin_decl_implicit (BUILT_IN_ADJUST_TRAMPOLINE);
call = gimple_build_call (builtin, 1, x);
x = init_tmp_var_with_call (info, &wi->gsi, call);
{
x = build_addr (x);

x = gsi_gimplify_val (info, x, &wi->gsi);

/* Do machine-specific ugliness. Normally this will involve
computing extra alignment, but it can really be anything. */
if (descr)
builtin = builtin_decl_implicit (BUILT_IN_ADJUST_DESCRIPTOR);
else
builtin = builtin_decl_implicit (BUILT_IN_ADJUST_TRAMPOLINE);
call = gimple_build_call (builtin, 1, x);
x = init_tmp_var_with_call (info, &wi->gsi, call);
}

/* Cast back to the proper function type. */
x = build1 (NOP_EXPR, TREE_TYPE (t), x);
@@ -3377,6 +3395,7 @@ build_init_call_stmt (struct nesting_info *info, tree decl, tree field,
static void
finalize_nesting_tree_1 (struct nesting_info *root)
{
gimple_seq cleanup_list = NULL;
gimple_seq stmt_list = NULL;
gimple *stmt;
tree context = root->context;
@@ -3508,9 +3527,48 @@ finalize_nesting_tree_1 (struct nesting_info *root)
if (!field)
continue;

x = builtin_decl_implicit (BUILT_IN_INIT_TRAMPOLINE);
stmt = build_init_call_stmt (root, i->context, field, x);
gimple_seq_add_stmt (&stmt_list, stmt);
if (flag_off_stack_trampolines)
{
/* We pass three arguments to the builtin function that creates the
off-stack trampoline:
1. The nested function chain value (that must be passed to the
nested function so it can find the enclosing function's
arguments and locals).
2. A pointer to the nested function implementation.
3. The address in the local stack frame where we should write
the address of the trampoline. */
tree arg1, arg2, arg3;

gcc_assert (DECL_STATIC_CHAIN (i->context));
arg1 = build_addr (root->frame_decl);
arg2 = build_addr (i->context);

x = build3 (COMPONENT_REF, TREE_TYPE (field),
root->frame_decl, field, NULL_TREE);
arg3 = build_addr (x);

x = builtin_decl_implicit (BUILT_IN_NESTED_PTR_CREATED);
stmt = gimple_build_call (x, 3, arg1, arg2, arg3);
gimple_seq_add_stmt (&stmt_list, stmt);

/* This call to delete the nested function trampoline is added to
the cleanup list, and called when we exit the current scope. */
x = builtin_decl_implicit (BUILT_IN_NESTED_PTR_DELETED);
stmt = gimple_build_call (x, 0);
gimple_seq_add_stmt (&cleanup_list, stmt);
}
else
{
/* Original code to initialise the on stack trampoline. */
x = builtin_decl_implicit (BUILT_IN_INIT_TRAMPOLINE);
stmt = build_init_call_stmt (root, i->context, field, x);
gimple_seq_add_stmt (&stmt_list, stmt);
}
}
}

@@ -3535,11 +3593,40 @@ finalize_nesting_tree_1 (struct nesting_info *root)
/* If we created initialization statements, insert them. */
if (stmt_list)
{
gbind *bind;
annotate_all_with_location (stmt_list, DECL_SOURCE_LOCATION (context));
bind = gimple_seq_first_stmt_as_a_bind (gimple_body (context));
gimple_seq_add_seq (&stmt_list, gimple_bind_body (bind));
gimple_bind_set_body (bind, stmt_list);
if (flag_off_stack_trampolines)
{
/* Handle the new, off stack trampolines. */
gbind *bind;
annotate_all_with_location (stmt_list, DECL_SOURCE_LOCATION (context));
annotate_all_with_location (cleanup_list, DECL_SOURCE_LOCATION (context));
bind = gimple_seq_first_stmt_as_a_bind (gimple_body (context));
gimple_seq_add_seq (&stmt_list, gimple_bind_body (bind));

gimple_seq xxx_list = NULL;

if (cleanup_list != NULL)
{
/* We maybe shouldn't be creating this try/finally if -fno-exceptions is
in use. If this is the case, then maybe we should, instead, be
inserting the cleanup code onto every path out of this function? Not
yet figured out how we would do this. */
gtry *t = gimple_build_try (stmt_list, cleanup_list, GIMPLE_TRY_FINALLY);
gimple_seq_add_stmt (&xxx_list, t);
}
else
xxx_list = stmt_list;

gimple_bind_set_body (bind, xxx_list);
}
else
{
/* The traditional, on stack trampolines. */
gbind *bind;
annotate_all_with_location (stmt_list, DECL_SOURCE_LOCATION (context));
bind = gimple_seq_first_stmt_as_a_bind (gimple_body (context));
gimple_seq_add_seq (&stmt_list, gimple_bind_body (bind));
gimple_bind_set_body (bind, stmt_list);
}
}

/* If a chain_decl was created, then it needs to be registered with
17 changes: 17 additions & 0 deletions gcc/tree.cc
@@ -9870,6 +9870,23 @@ build_common_builtin_nodes (void)
"__builtin_nonlocal_goto",
ECF_NORETURN | ECF_NOTHROW);

tree ptr_ptr_type_node = build_pointer_type (ptr_type_node);

ftype = build_function_type_list (void_type_node,
ptr_type_node, // void *chain
ptr_type_node, // void *func
ptr_ptr_type_node, // void **dst
NULL_TREE);
local_define_builtin ("__builtin_nested_func_ptr_created", ftype,
BUILT_IN_NESTED_PTR_CREATED,
"__builtin_nested_func_ptr_created", ECF_NOTHROW);

ftype = build_function_type_list (void_type_node,
NULL_TREE);
local_define_builtin ("__builtin_nested_func_ptr_deleted", ftype,
BUILT_IN_NESTED_PTR_DELETED,
"__builtin_nested_func_ptr_deleted", ECF_NOTHROW);

ftype = build_function_type_list (void_type_node,
ptr_type_node, ptr_type_node, NULL_TREE);
local_define_builtin ("__builtin_setjmp_setup", ftype,
9 changes: 9 additions & 0 deletions libgcc/config.host
@@ -444,6 +444,9 @@ aarch64*-*-linux*)
tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
tmake_file="${tmake_file} t-dfprules"
if test x$off_stack_trampolines = xyes; then
tmake_file="${tmake_file} ${cpu_type}/t-heap-trampoline"
fi
;;
aarch64*-*-vxworks7*)
extra_parts="$extra_parts crtfastmath.o"
@@ -718,6 +721,9 @@ x86_64-*-darwin*)
tmake_file="$tmake_file i386/t-crtpc t-crtfm i386/t-msabi"
tm_file="$tm_file i386/darwin-lib.h"
extra_parts="$extra_parts crtprec32.o crtprec64.o crtprec80.o crtfastmath.o"
if test x$off_stack_trampolines = xyes; then
tmake_file="${tmake_file} i386/t-heap-trampoline"
fi
;;
i[34567]86-*-elfiamcu)
tmake_file="$tmake_file i386/t-crtstuff t-softfp-sfdftf i386/32/t-softfp i386/32/t-iamcu i386/t-softfp t-softfp t-dfprules"
@@ -784,6 +790,9 @@ x86_64-*-linux*)
tmake_file="${tmake_file} i386/t-crtpc t-crtfm i386/t-crtstuff t-dfprules"
tm_file="${tm_file} i386/elf-lib.h"
md_unwind_header=i386/linux-unwind.h
if test x$off_stack_trampolines = xyes; then
tmake_file="${tmake_file} i386/t-heap-trampoline"
fi
;;
x86_64-*-kfreebsd*-gnu)
extra_parts="$extra_parts crtprec32.o crtprec64.o crtprec80.o crtfastmath.o"