-
Notifications
You must be signed in to change notification settings - Fork 264
RFC: Signal delivery in user program via signal stack frame #347
RFC: Signal delivery in user program via signal stack frame #347
Conversation
Signed-off-by: Isaku Yamahata <isaku.yamahata@gmail.com>
DkCpuIdRetrieve is needed for shim to get size of xsave area. Signed-off-by: Isaku Yamahata <isaku.yamahata@gmail.com>
add atomic operations for later use. set_bit/clear_bit/test_and_set_bit/test_and_clear_bit Signed-off-by: Isaku Yamahata <isaku.yamahata@gmail.com>
this patch cleans up shim_signal.c without logic change. - remove unnecessary goto In this file, goto is abused unnecessarily. - remove unnecessary macro Two macros, is_internal, internal_fault can be inline function. Signed-off-by: Isaku Yamahata <isaku.yamahata@gmail.com>
This patch is still Work-In-Progress. The purpose of this patch is to get feedback before going further and trigger discussion. I believe the patch is enough to show the idea and how to implement it. Deliver signal when executing in user program via signal stack frame, not direct calling handler within LibOS. With this patches, (emulated) signal is delivered only when user program is running by setting up signal stack frame. If LibOS or PAL is running with asynchronously signal, the (async) signal is queued and the signal stack frame is created right before system call is return. This also opens up the capability to support sigaltstack. The current implementation of invoking registered signal handler directly within LibOS is problematic. It causes several issues below and hard to solve them. - race condition in LibOS because invoking registered signal handler directly in host signal handler even when cpu is executing within LibOS. If the blocking emulated system call is executing, it's very likely. It incurs much complexity. - nested signal can't be handled well. If async signal(e.g. SIGALARM) is delivered and then synchronous signal(e.g. SIGSEGV) within signal handler is delieved, The current implementation tries to queue SIGSEGV, and loop in SIGSEGV. - sigaltstack can't be implemented well Typically language runtime uses sigaltstack. It's desirable to support those features well. Signed-off-by: Isaku Yamahata <isaku.yamahata@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes below seem sensible, but largely unrelated to signal handling.
I think this proposal addresses some real problems, but I am not completely clear on the design. Perhaps this would be something to discuss at a contributor meeting?
Reviewed 5 of 5 files at r1.
Reviewable status: 2 of 18 files reviewed, all discussions resolved, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @donporter)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, had my review settings off. A few minor comments inline. I get the high-level idea now, but I have a few things you could clarify about the design that would help:
- I believe the model is that the shim defaults to allocating a separate stack for application-visible signal handling, correct? This can be replaced by sigaltstack.
- Upon signal delivery to the application, you jump on/off this stack.
- Incoming signals still go via the PAL, and are queued in the shim structures? Does the PAL register an alternate stack with the host as well?
Is this right?
Reviewed 1 of 5 files at r1, 4 of 4 files at r2, 1 of 1 files at r3, 11 of 11 files at r5.
Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @yamahata)
LibOS/shim/include/shim_internal.h, line 174 at r5 (raw file):
/* definition for syscall table */ extern unsigned long fpu_xstate_size; void handle_sysret_signal(void);
Can you add some comments explaining what this function is and its expected behavior, as well as explaining handle_signal(), __sigreturn, and their relationship?
LibOS/shim/include/shim_internal.h, line 473 at r5 (raw file):
PAL_CONTEXT context; ucontext_t * uc; } PAL_EVENT;
Can we import this definition from a PAL header (or move it to a public PAL header if needed)?
LibOS/shim/include/shim_signal.h, line 142 at r5 (raw file):
struct shim_signal * signal); void append_signal (struct shim_thread * thread, int sig, siginfo_t * info,
Please add comments explaining the behavior and proper use of append_signal and deliver_signal, while you are working on this.
LibOS/shim/src/shim_init.c, line 198 at r5 (raw file):
memcpy(&new_tcb->context, &old_tcb->context, sizeof(struct shim_context)); new_tcb->tid = old_tcb->tid; new_tcb->flags = new_tcb->flags;
huh? Should this be old_tcb? This seems like a no-op.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 11 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @yamahata)
LibOS/shim/include/shim_internal.h, line 744 at r5 (raw file):
extern void * __syscallas_signal_allowed_1_end; extern void * __syscallas_signal_allowed_2_begin; extern void * __syscallas_signal_allowed_2_end;
Why do we even need these three regions (in syscalldb()
and __sigreturn()
) where signal is allowed to be delivered? These regions are tiny, with each having only a couple of instructions. Is there some safety/correctness reason you introduced them? Or are they solely for performance? In the latter case, can we remove them to simplify code?
LibOS/shim/include/shim_tls.h, line 71 at r5 (raw file):
void * debug_buf; #define SHIM_FLAG_SIGPENDING 0 unsigned long flags;
Why do we need both thread.has_signal
and flags & SHIM_FLAG_SIGPENDING
? Cannot we use just one of them?
LibOS/shim/src/syscallas.S, line 185 at r5 (raw file):
cmp $0, %rax je .Lsignal_not_delivered jmp .Lret_signal
Bug 1? We don't update RAX to the return value of syscall. RAX = 1 because deliver_signal_on_sysret()
returns only false (0) or true (1). But syscalldb()
must actually return syscall value. I think before this jmp .Lret_signal
, we also need movq %fs:(SHIM_TCB_OFFSET + TCB_SYSCALL_NR), %rax
.
Bug 2? Why do we only deliver one signal? There may be several signals queued, why not check them in a loop similarly to signal_not_delivered case?
LibOS/shim/src/syscallas.S, line 195 at r5 (raw file):
.size syscalldb, .-syscalldb // void __sigreturn(mcontext_t * uc_mcontext)
TODO for myself: understand __sigreturn()
and deliver_signal_on_sysret()
and __setup_sig_frame()
LibOS/shim/src/bookkeep/shim_signal.c, line 185 at r5 (raw file):
event == NULL /* send to self */) { struct shim_signal ** signal_log = NULL; if ((signal = malloc_copy(signal,sizeof(struct shim_signal))) &&
This malloc_copy()
should be replaced with statically allocated object copy, right?
LibOS/shim/src/bookkeep/shim_signal.c, line 734 at r5 (raw file):
__handle_one_signal(tcb, sig, signal, event); free(signal); DkThreadYieldExecution();
Why would we need yielding execution? Shouldn't we stack all pending signals to execute after we return to user code?
LibOS/shim/src/bookkeep/shim_signal.c, line 979 at r5 (raw file):
regs->rdx = (unsigned long)&user_sigframe->uc; return true;
Bug? Before returning, we must add tcb->context.syscall_nr = syscall_ret
so that when we return from syscalldb()
, RAX reflects syscall return value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 13 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @yamahata)
LibOS/shim/src/bookkeep/shim_signal.c, line 888 at r5 (raw file):
void (*handler) (int, siginfo_t *, void *) = deliver.handler; void (*restorer) (void) = deliver.restorer;
NOTE: The following code constructs the signal stack frame as follows:
+--------------------+
| |
| user_fpstate |
+--------------------+
| |
| user_sigframe |
| .restorer |
| .uc (ucontext) |
| .info (siginfo) |
| |
+--------------------+
| rip |
+--------------------+
| |
| shim_regs regs |
| |
+--------------------+
LibOS/shim/src/bookkeep/shim_signal.c, line 915 at r5 (raw file):
user_uc->uc_stack.ss_size = 0; user_uc->uc_stack.ss_flags = 0;
NOTE: The following code requires understanding of ucontext_t
in glibc. Files: glibc-2.19/sysdeps/unix/sysv/linux/x86/bits/sigcontext.h
and glibc-2.19/sysdeps/unix/sysv/linux/x86/sys/ucontext.h
.
LibOS/shim/src/bookkeep/shim_signal.c, line 976 at r5 (raw file):
*rip = (long)handler; regs->rdi = (long)sig; regs->rsi = (unsigned long)&user_sigframe->info;
Bug? user_sigframe->info
is not initialized. Should be something like:
memcpy(&user_sigframe->info, &signal->info, sizeof(signal->info));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 17 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @donporter and @yamahata)
LibOS/shim/include/shim_internal.h, line 473 at r5 (raw file):
Previously, donporter (Don Porter) wrote…
Can we import this definition from a PAL header (or move it to a public PAL header if needed)?
Why do we even need PAL_EVENT in LibOS? In this PR, PAL_EVENT is used to peek into the saved PAL_CONTEXT and ucontext_t (user context). Cannot we achieve the same by examining on of the already used contexts?
It seems there is a mess with many different contexts. There are two contexts in PAL_EVENT, SHIM context stored in tcb.context
, and two contexts stored inshim_signal.context
and shim_signal.pal_context
. I got lost which is which.
LibOS/shim/src/bookkeep/shim_signal.c, line 556 at r5 (raw file):
struct _libc_fpstate * fpstate = uc->uc_mcontext.fpregs; unsigned int fpstate_size = fpstate_size_get(fpstate);
NOTE: The following code constructs the signal stack frame as follows:
... 128B redzone ...
+--------------------+
| |
| user_fp |
+--------------------+
| |
| user_sigframe |
| .restorer |
| .uc (ucontext) |
| .info (siginfo) |
| |
+--------------------+
Why is this frame different from the one constructed during deliver_signal_on_sysret()
?
LibOS/shim/src/bookkeep/shim_signal.c, line 577 at r5 (raw file):
memcpy(&user_sigframe->uc.uc_sigmask, &uc->uc_sigmask, sizeof(user_sigframe->uc.uc_sigmask));
The following code modifies three contexts: signal->pal_context
, event->uc->uc_mcontext
, and event->context
with the same reg values. This seems redundant -- do we really need to update all three of them? (We definitely need to update event->context
as explained below, but others?)
LibOS/shim/src/bookkeep/shim_signal.c, line 598 at r5 (raw file):
// _DkExceptionReturn overwrite uc.uc_mcontext.gregs // PAL_CONTEXT == greg_t memcpy(&event->context, gregs, sizeof(PAL_CONTEXT));
NOTE: Right before host's sigreturn (actual sigreturn), _DkExceptionReturn()
overwrites event->uc->uc_mcontext.gregs
(which is real reg values upon resuming normal user context) with event->context
(which is PAL_CONTEXT). Note that event->uc
is a pointer to host-provided ucontext
, so by modifying event->uc->...
we modify regs of resumed normal user context.
Thus, since we want to resume not from normal user context, but from our fresh signal stack frame, we need to replace contents of event->context
with our fresh frame contents. Later, _DkExceptionReturn()
will instruct the kernel to restore mcontext to this fresh frame.
LibOS/shim/src/bookkeep/shim_signal.c, line 743 at r5 (raw file):
} void handle_sysret_signal(void)
If we will not use SHIM_FLAG_SIGPENDING
but only thread.has_signal
, there is no need for this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I finished my first review of the code. The proposal is great and allows for nested signals and easy sigaltstack support (which will only require the change to altstack when the signal frame is allocated in syscalldb
and __setup_sig_frame
). There are a couple bugs/redundancies which I marked inline.
I have several major remarks:
- Saving/restoring the FP context looks messy (talking about
fpstate
). I would prefer to move it to a separate file. - There are way too many contexts stored in different objects (PAL_CONTEXT, SHIM_CONTEXT, ucontext). I got lost in them, and I have a feeling that at least some of them are unnecessary.
- I don't understand why we need a separate
SHIM_FLAG_SIGPENDING
flag in addition tothread.has_signal
. Cannot we re-usehas_signal
to check pending signals? syscallas_signal_allowed_XX_begin/end
seems over-complicated. Do we really need tiny bits ofsyscalldb/sigreturn
assembly marked with these regions?- I would prefer to remove
PAL_EVENT
everywhere. First, it's weird that LibOS peeks into PAL objects. Second, we basically need it to check the interrupted PAL context -- cannot we use other, already available options liketcb.context
?
Reviewable status: all files reviewed, 17 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @donporter and @yamahata)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 18 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @donporter and @yamahata)
LibOS/shim/src/sys/shim_sigaction.c, line 92 at r5 (raw file):
} int shim_do_sigreturn (int __unused)
Is this even necessary? If there are any pending signals, they are supposed to be delivered upon syscall completion. Since sigreturn()
is a syscall, LibOS will anyway check for pending signals immediately after shim_do_sigreturn()
. So, looks like we don't need this additional point of checking for pending syscalls.
#1218 supersedes this. |
This change is