Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread cancellation, asynchronous unwinding exceptions and their interaction with drops #46

Open
nagisa opened this Issue Mar 29, 2018 · 13 comments

Comments

Projects
None yet
4 participants
@nagisa
Copy link

nagisa commented Mar 29, 2018

So during the all-hands we identified a use-case that would be good to be documented in the unsafe code guidelines.

There are at least three sources of "cancellation" which might end up "removing" (in Taylor’s words) a value without "dropping" it, which in turn results in unsoundness for e.g. rayon, crossbeam, &pin stuff. These sources are:

  1. longjmp/setjmp (used in practice for error handling by rust lua and perhaps many other embedded languages/interpreters);
  2. pthread_cancel: which can run either an asynchronous unwinding exception (might occur at any program point) or raise an unwindingexception at well specified points such as sleep; exact behaviour is specified by pthread_setcancelstate.
  3. pthread_exit, pthread_kill: which will "stop" the thread, potentially executing some arbitrary code and cleaning the thread up (freeing thread’s stack).

There’s no question that these functions may be useful in one scenario or the other, so it would be good if we figured out scenarios in which these functions are sound to use (e.g. if the thread stack contains only Copy types) and encoded this information into our unsafe code guidelines book.

cc @nikomatsakis @cramertj

@nagisa nagisa changed the title Thread cancellation, unwinding and their interaction with drops Thread cancellation, asynchronous unwinding exceptions and their interaction with drops Mar 29, 2018

@Amanieu

This comment has been minimized.

Copy link

Amanieu commented Mar 29, 2018

To be clear, pthread_kill sends a signal to a specific thread. If that signal is not handled by a signal handler then the whole process is killed, not just that thread. I think this is outside the scope of this discussion.

Windows has a TerminateThread function which kills a thread without performing any cleanup. Use of this function is strongly discouraged because it can lead to memory leaks and/or deadlocks.

Forking a multi-threaded process also results in something which strongly resembles killing a thread: the child process after forking will only have a single thread, effectively making it look like all other threads have been killed. This has the same downsides as TerminateThread: thread stacks and thread-local storage are leaked, and any held locks are not unlocked.

Since both forking and TerminateThread do not run any cleanup code, there is no unsafety: they are equivalent to simply suspending the threads indefinitely. Memory leaks and deadlocks are not considered to be memory-unsafe.

@nagisa

This comment has been minimized.

Copy link
Author

nagisa commented Mar 29, 2018

@Amanieu

This comment has been minimized.

Copy link

Amanieu commented Mar 29, 2018

pthread_exit is more complicated since it frees the thread stack but does not unwind the stack. This could cause issues if, for example, you were to pass a reference to stack-local data to another thread, and then kill the current thread:

let mut var = 0;
crossbeam::scope(|scope| {
   scope.spawn(|| loop { var += 1; } );
   unsafe { pthread_exit(); }
});

Using pthread_exit can probably be used safely as long as you ensure that exist no pointers back into the stack or TLS of the killed thread (or if such pointers exist then you must make sure to never dereference them).

Note that this includes the standard library and any external crates that you might be using. The case for external crates is simple: you are responsible for auditing them (with no help from Rust's type system). However for the standard library this is a bit more complicated: do we want to stable-guarantee that all of the libstd code involved in spawning a thread (thread::spawn) will also be safe with regards to exiting the thread?

According to the documentation of TerminateThread "the system frees
thread's initial stack"

Ah, I seem to have missed that. In that case TerminateThread should be treated roughly the same way as pthread_exit.

@Amanieu

This comment has been minimized.

Copy link

Amanieu commented Mar 29, 2018

Asynchronous cancellation using pthread_cancel is a bit tricky since it may involve unwinding from places that may not expect unwinding to occur:

  • If PTHREAD_CANCEL_DEFERRED is used (this is the default type), then libc will attempt to unwind from syscalls such as sleep, which are currently assumed to be nounwind. Have we decided whether this is UB or a guaranteed abort?

  • If a thread is already in the process of unwinding due to a panic, I am not exactly sure what will happen but it's probably not good.

  • Asynchronous unwinding (from an arbitrary instruction in a Rust function) is probably fine from the language point of view. DWARF & SEH64 have unwinding tables that are precise for each instruction, while SEH32 and SJLJ unwinding push/pop frames atomically with regards to signal handlers. However actual Rust code may not be designed with arbitrary unwinding in mind. The same question as pthread_exit applies here: which parts of libstd are we will to guarantee are "async-unwind-safe"?

  • Have we decide what the behavior of foreign exceptions unwinding through Rust frames is? Is it allowed or UB? Is drop called or not?

@nagisa

This comment has been minimized.

Copy link
Author

nagisa commented Mar 29, 2018

Have we decide what the behavior of foreign exceptions unwinding through Rust frames is? Is it allowed or UB? Is drop called or not?

Plan is to have unwinding from FFI be fine if the functions are annotated with #[unwind]. It is still unclear whether the cleanups in the rust code would be run or not, though.

@nagisa

This comment has been minimized.

Copy link
Author

nagisa commented Mar 29, 2018

Also, thanks for writing down all these things. Most of those questions have been floated around during the All-hands, but I ended up not remembering any of them :)

@Amanieu

This comment has been minimized.

Copy link

Amanieu commented Mar 29, 2018

Regarding setjmp & longjmp. I think it's fine as long as setjmp is called from C code. setjmp isn't a normal function and there are hacks in C compilers to support it (__attribute__((returns_twice))). There are also special semantics for volatile local variables in a function that calls setjmp. Therefore I propose disallowing calling setjmp from Rust code.

However using longjmp to jump over Rust frames should be fine, as long as the user is aware that destructors will not be called, so they will be responsible for the safety implications at the library level.

@nagisa

This comment has been minimized.

Copy link
Author

nagisa commented Mar 29, 2018

@Amanieu longjmp is implemented by unwinding on Windows. See rust-lang/rust#48251. Elsewhere, it happens to do the same "remove but do not run destructors" thing, which is something that was declared to be unsound (at least in presence of Drop types within "removed" stuff).

@RalfJung

This comment has been minimized.

Copy link
Collaborator

RalfJung commented Mar 29, 2018

I don't think trying to make pthread_exit safe or anything else which kills a thread and its stack is reasonable. This makes e.g. Crossbeams' and Rayon's scoped threads unsound.

So, these will have to be unsafe operations. In terms of when they (and also e.g. setjmp/longjmp) are safe sound to use, I'd say they must not "skip over" any stack frame where any type has a Drop implementation. That means no code actually runs during "normal" unwinding. As a consequence, just deallocating these stackframes is behaviorally perfectly equivalent to a panic being raised (at the longjmp place) and then caught again (at the setjmp place) -- right?

@RalfJung

This comment has been minimized.

Copy link
Collaborator

RalfJung commented Mar 29, 2018

Asynchronous unwinding (from an arbitrary instruction in a Rust function) is probably fine from the language point of view. DWARF & SEH64 have unwinding tables that are precise for each instruction, while SEH32 and SJLJ unwinding push/pop frames atomically with regards to signal handlers. However actual Rust code may not be designed with arbitrary unwinding in mind. The same question as pthread_exit applies here: which parts of libstd are we will to guarantee are "async-unwind-safe"?

Oh no this cannot possibly work. If you unwind e.g. during the execution of x = y, you could leave behind total garbage in x that is half-way the old value and half-way the new one. Rust is only safe to unwind in places where panics could be raised. I think such a restriction is necessary to write any kind of exception-safe unsafe code.

@Amanieu

This comment has been minimized.

Copy link

Amanieu commented Mar 29, 2018

We know that none of these functions are safe. What we are trying to determine here is: under which conditions is using them (not) UB? Additionally, we are only look at this from the language point of view: everyone will agree that most Rust code assumes that panics don't come from arbitrary instructions and that destructors are executed while unwinding.

If a user wants to do "fancy" stuff with unwinding (or the lack thereof) then he is solely responsible for auditing any code that he may be unwinding through to ensure that safety is maintained.

@RalfJung

This comment has been minimized.

Copy link
Collaborator

RalfJung commented Mar 30, 2018

If a user wants to do "fancy" stuff with unwinding (or the lack thereof) then he is solely responsible for auditing any code that he may be unwinding through to ensure that safety is maintained.

Ack.

under which conditions is using them (not) UB?

Right, we'll have to decide that. I said my 2 cents above.

@jugglerchris

This comment has been minimized.

Copy link

jugglerchris commented Apr 5, 2018

If a user wants to do "fancy" stuff with unwinding (or the lack thereof) then he is solely responsible for auditing any code that he may be unwinding through to ensure that safety is maintained.

Could there could be some kind of annotation you could use to make this audit easier, at least locally? The compiler must know whether there are any Drop types on the stack at a particular point.

fn may_longjmp() {
    let foo = make_foo();

    // Fail to compile if any destructors would run on unwind (here for foo)
    #[no_drop_on_stack]
    ffi_function_which_may_longjmp();
}

You'd still need to annotate up your call stack to the point where setjmp or similar was called.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.