From 96ad8a056fa051cb2a2d5b7e5ab11672f8148d4d Mon Sep 17 00:00:00 2001
From: Dmitrii Kuvaiskii
Date: Tue, 11 Apr 2023 04:28:56 -0700
Subject: [PATCH] [LibOS] Keep main thread alive if it's not the last one

This change is to cover a corner case of a non-main thread performing
`execve()`. Linux does the following: the main (leader) thread is
terminated, and the non-main thread assumes its identity:
https://elixir.bootlin.com/linux/v6.0/source/fs/exec.c#L1078

Before this commit, Gramine simply terminated the main thread. This led
to the process being marked as a zombie (even though the non-main thread
runs correctly and responds to signals). This in turn confused tools
like `docker kill`, which couldn't find the Gramine process anymore.

This commit fixes this corner case. Gramine can't do as Linux because
there is no way to ask the host OS to "rewire" the identity of one
thread to another thread. Thus we introduce a workaround of keeping
alive (but in a "parked" state) the main thread -- it sleeps infinitely.
In this case, the main thread and its associated resources (like the
thread's LibOS stack) are not freed -- this leaks memory, but only once
per process, as there is only one main thread per process, even after
several execve invocations.

Signed-off-by: Dmitrii Kuvaiskii
---
 libos/src/sys/libos_exit.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/libos/src/sys/libos_exit.c b/libos/src/sys/libos_exit.c
index 039724f46a..5c3bb6cc5b 100644
--- a/libos/src/sys/libos_exit.c
+++ b/libos/src/sys/libos_exit.c
@@ -71,6 +71,39 @@ noreturn void thread_exit(int error_code, int term_signal) {

     /* Remove current thread from the threads list. */
     if (!check_last_thread(/*mark_self_dead=*/true)) {
+        if (cur_thread->pal_handle == g_pal_public_state->first_thread) {
+            /*
+             * Do not exit the main thread (and do not free its resources) if the main thread is not
+             * the last one in the process. This is added to correctly handle the case of a non-main
+             * thread performing `execve()`, even after the main thread is considered terminated.
+             *
+             * The main thread waits forever so that the host OS doesn't "lose track" of this
+             * Gramine process. E.g. on Linux, if the main thread (aka leader thread) terminates,
+             * then the process becomes a zombie, which may confuse some tools like `docker kill`.
+             * This "waiting forever" leaks memory, but only once per process (as there is only one
+             * main thread per process, even after several execve invocations).
+             *
+             * Linux solves this corner case differently: the leader thread is terminated, and the
+             * non-main thread assumes its identity (in particular, its PID):
+             *
+             *     https://elixir.bootlin.com/linux/v6.0/source/fs/exec.c#L1078
+             *
+             * Gramine can't do the same because there is no way to ask the host OS to "rewire" the
+             * identity of one thread to another thread. Thus this workaround of infinite wait.
+             * Note that because the main thread was removed from the list of threads (thanks to
+             * `mark_self_dead=true` above), the still-alive main thread will not prevent the
+             * Gramine process from terminating later on. Also note that because this thread never
+             * leaves LibOS/PAL context, it will not receive signals.
+             *
+             * TODO: "Rewire" the identity of the non-main thread inside Gramine, similarly to how
+             *       Linux does it.
+             */
+            thread_prepare_wait();
+            while (true)
+                thread_wait(/*timeout_us=*/NULL, /*ignore_pending_signals=*/true);
+            __builtin_unreachable();
+        }
+
         /* ask async worker thread to cleanup this thread */
         cur_thread->clear_child_tid_pal = 1; /* any non-zero value suffices */
         /* We pass this ownership to `cleanup_thread`. */