Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make shutdown sequence more robust under cooperative scheduling #8

Merged
merged 6 commits into from
Aug 25, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 14 additions & 6 deletions arch/lkl/kernel/setup.c
Original file line number Diff line number Diff line change
Expand Up @@ -163,14 +163,22 @@ long lkl_sys_halt(void)

is_running = false;

lkl_cpu_wait_shutdown();
/**
* TODO: We may have cloned host threads that did not terminate,
* and prevent us from shutting down the LKL CPU.
*
* As a workaround, we do not shutdown all threads.
prp marked this conversation as resolved.
Show resolved Hide resolved
*/

syscalls_cleanup();
threads_cleanup();
//lkl_cpu_wait_shutdown();

//syscalls_cleanup();

// threads_cleanup();
/* Shutdown the clockevents source. */
tick_suspend_local();
free_mem();
lkl_ops->thread_join(current_thread_info()->tid);
//tick_suspend_local();
//free_mem();
//lkl_ops->thread_join(current_thread_info()->tid);

return 0;
}
Expand Down
70 changes: 53 additions & 17 deletions arch/lkl/kernel/syscalls.c
Original file line number Diff line number Diff line change
Expand Up @@ -118,11 +118,18 @@ static void del_host_task(void *arg)

struct lkl_tls_key *task_key;

/* Record the exit status after shutdown */
_Atomic(int) lkl_exit_status;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exit status of what? Presumably the first process that runs? We should probably be tracking that outside of LKL by intercepting the exit thread group syscall.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but let's make this a separate PR.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, please can you then take ownership of factoring out our LKL history into things that are suitable for going upstream?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but this can only be done once we have the final design for the LKL shutdown sequence. I think that this PR improves stability quite substantially, but it is not yet the final design that would be ready for upstream. There may also be a dependency with the signal handling changes.


/* Record received signal after shutdown */
_Atomic(int) lkl_received_signal;

/* Use this to record an ongoing LKL shutdown */
_Atomic(bool) lkl_shutdown = false;

/* Returns the task_struct associated with the current lthread */
static struct task_struct *idle_host_task;

/* Returns the task_struct associated with the current lthread */
struct task_struct* lkl_get_current_task_struct(void)
{
return lkl_ops->tls_get(task_key);
Expand All @@ -139,6 +146,35 @@ long lkl_syscall(long no, long *params)
no, current->comm, test_tsk_thread_flag(task, TIF_HOST_THREAD),
test_tsk_thread_flag(task, TIF_SIGPENDING));

/**
* If there is an ongoing shutdown, we assume that the caller (the idle
* host task) already holds the CPU lock.
*/
if (lkl_shutdown) {

/**
* Only permit the idle host task to issue system calls now, because
* it is invoking the shutdown sequence.
*/
if (current != idle_host_task) {
LKL_TRACE(
"Ignoring userspace syscall (current->comm=%s allowed=%s)\n",
current->comm, idle_host_task->comm);
return -ENOTSUPP;
}

ret = run_syscall(no, params);
task_work_run();
do_signal(NULL);

if (no == __NR_reboot) {
thread_sched_jb();
}

LKL_TRACE("Shutdown syscall=%li done\n", no);
return ret;
}

ret = lkl_cpu_get();
if (ret < 0) {

Expand Down Expand Up @@ -251,13 +287,7 @@ long lkl_syscall(long no, long *params)
switch_to_host_task(task);
}

/*
* Stop signal handling when LKL is shutting down. We cannot deliver
* signals because we are shutting down the kernel.
*/
if (!lkl_shutdown) {
do_signal(NULL);
}
do_signal(NULL);

if (no == __NR_reboot) {
thread_sched_jb();
Expand All @@ -273,7 +303,9 @@ long lkl_syscall(long no, long *params)
return ret;
}

static struct task_struct *idle_host_task;
extern _Atomic(int) lkl_exit_status;
extern _Atomic(int) lkl_received_signal;
extern _Atomic(bool) lkl_shutdown;

/* called from idle, don't failed, don't block */
void wakeup_idle_host_task(void)
Expand All @@ -295,10 +327,21 @@ static int idle_host_task_loop(void *unused)
for (;;) {
lkl_cpu_put();
lkl_ops->sem_down(ti->sched_sem);
if (idle_host_task == NULL) {

if (lkl_shutdown) {

/**
* Notify the host of the shutdown.
*
* Note that we are not releasing the CPU lock here, which allows
* the termination code to do syscalls directly.
prp marked this conversation as resolved.
Show resolved Hide resolved
*/
lkl_ops->terminate(lkl_exit_status, lkl_received_signal);

lkl_ops->thread_exit();
return 0;
}

schedule_tail(ti->prev_sched);
}
}
Expand Down Expand Up @@ -368,13 +411,6 @@ int host0_init(void)
void syscalls_cleanup(void)
{
LKL_TRACE("enter\n");
if (idle_host_task) {
struct thread_info *ti = task_thread_info(idle_host_task);

idle_host_task = NULL;
lkl_ops->sem_up(ti->sched_sem);
lkl_ops->thread_join(ti->tid);
}

if (lkl_ops->tls_free)
lkl_ops->tls_free(task_key);
Expand Down
18 changes: 12 additions & 6 deletions arch/lkl/kernel/threads.c
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,8 @@
#include <asm/cpu.h>
#include <asm/sched.h>

extern _Atomic(int) lkl_exit_status;
extern _Atomic(int) lkl_received_signal;
extern _Atomic(bool) lkl_shutdown;

static int init_ti(struct thread_info *ti)
Expand Down Expand Up @@ -89,27 +91,31 @@ static void kill_thread(struct thread_info *ti)
* case, we need to notify the host to initiate an LKL shutdown.
*/
int exit_code = task->exit_code;
int exit_status = exit_code >> 8;
int received_signal = exit_code & 255;
lkl_exit_status = exit_code >> 8;
lkl_received_signal = exit_code & 255;
int exit_signal = task->exit_signal;

LKL_TRACE(
"terminating LKL (exit_state=%i exit_code=%i exit_signal=%i exit_status=%i "
"received_signal=%i ti->dead=%i task->pid=%i "
"task->tgid=%i ti->TIF_SCHED_JB=%i ti->TIF_SIGPENDING=%i)\n",
task->exit_state, exit_code, exit_signal,
exit_status, received_signal, ti->dead,
lkl_exit_status, lkl_received_signal, ti->dead,
task->pid, task->tgid,
test_ti_thread_flag(ti, TIF_SCHED_JB),
test_ti_thread_flag(ti, TIF_SIGPENDING));

/**
* Notify the idle host task to initiate the LKL shutdown
*/
lkl_shutdown = true;

/* Notify the LKL host to shut down */
lkl_ops->terminate(exit_status, received_signal);

ti->dead = true;
lkl_ops->sem_up(ti->sched_sem);
Copy link

@vtikoo vtikoo Aug 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this logic can now be moved out, as both the conditional paths are doing this? Looks like the only case the conditionals miss are host threads with TIF_NO_TERMINATION set.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we have this case when host threads are directly deallocated. (I didn't want this to look very different from what it was before.)

lkl_ops->thread_join(ti->tid);
ti->tid = NULL;
}

lkl_ops->sem_free(ti->sched_sem);
ti->sched_sem = NULL;
}
Expand Down