Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread cancellation, asynchronous unwinding exceptions and their interaction with drops #211

Open
nagisa opened this issue Mar 29, 2018 · 19 comments

Comments

@nagisa
Copy link

@nagisa nagisa commented Mar 29, 2018

So during the all-hands we identified a use-case that would be good to be documented in the unsafe code guidelines.

There are at least three sources of "cancellation" which might end up "removing" (in Taylor’s words) a value without "dropping" it, which in turn results in unsoundness for e.g. rayon, crossbeam, &pin stuff. These sources are:

  1. longjmp/setjmp (used in practice for error handling by rust lua and perhaps many other embedded languages/interpreters);
  2. pthread_cancel: which can run either an asynchronous unwinding exception (might occur at any program point) or raise an unwindingexception at well specified points such as sleep; exact behaviour is specified by pthread_setcancelstate.
  3. pthread_exit, pthread_kill: which will "stop" the thread, potentially executing some arbitrary code and cleaning the thread up (freeing thread’s stack).

There’s no question that these functions may be useful in one scenario or the other, so it would be good if we figured out scenarios in which these functions are sound to use (e.g. if the thread stack contains only Copy types) and encoded this information into our unsafe code guidelines book.

cc @nikomatsakis @cramertj

@nagisa nagisa changed the title Thread cancellation, unwinding and their interaction with drops Thread cancellation, asynchronous unwinding exceptions and their interaction with drops Mar 29, 2018
@Amanieu

This comment has been minimized.

Copy link

@Amanieu Amanieu commented Mar 29, 2018

To be clear, pthread_kill sends a signal to a specific thread. If that signal is not handled by a signal handler then the whole process is killed, not just that thread. I think this is outside the scope of this discussion.

Windows has a TerminateThread function which kills a thread without performing any cleanup. Use of this function is strongly discouraged because it can lead to memory leaks and/or deadlocks.

Forking a multi-threaded process also results in something which strongly resembles killing a thread: the child process after forking will only have a single thread, effectively making it look like all other threads have been killed. This has the same downsides as TerminateThread: thread stacks and thread-local storage are leaked, and any held locks are not unlocked.

Since both forking and TerminateThread do not run any cleanup code, there is no unsafety: they are equivalent to simply suspending the threads indefinitely. Memory leaks and deadlocks are not considered to be memory-unsafe.

@nagisa

This comment has been minimized.

Copy link
Author

@nagisa nagisa commented Mar 29, 2018

@Amanieu

This comment has been minimized.

Copy link

@Amanieu Amanieu commented Mar 29, 2018

pthread_exit is more complicated since it frees the thread stack but does not unwind the stack. This could cause issues if, for example, you were to pass a reference to stack-local data to another thread, and then kill the current thread:

let mut var = 0;
crossbeam::scope(|scope| {
   scope.spawn(|| loop { var += 1; } );
   unsafe { pthread_exit(); }
});

Using pthread_exit can probably be used safely as long as you ensure that exist no pointers back into the stack or TLS of the killed thread (or if such pointers exist then you must make sure to never dereference them).

Note that this includes the standard library and any external crates that you might be using. The case for external crates is simple: you are responsible for auditing them (with no help from Rust's type system). However for the standard library this is a bit more complicated: do we want to stable-guarantee that all of the libstd code involved in spawning a thread (thread::spawn) will also be safe with regards to exiting the thread?

According to the documentation of TerminateThread "the system frees
thread's initial stack"

Ah, I seem to have missed that. In that case TerminateThread should be treated roughly the same way as pthread_exit.

@Amanieu

This comment has been minimized.

Copy link

@Amanieu Amanieu commented Mar 29, 2018

Asynchronous cancellation using pthread_cancel is a bit tricky since it may involve unwinding from places that may not expect unwinding to occur:

  • If PTHREAD_CANCEL_DEFERRED is used (this is the default type), then libc will attempt to unwind from syscalls such as sleep, which are currently assumed to be nounwind. Have we decided whether this is UB or a guaranteed abort?

  • If a thread is already in the process of unwinding due to a panic, I am not exactly sure what will happen but it's probably not good.

  • Asynchronous unwinding (from an arbitrary instruction in a Rust function) is probably fine from the language point of view. DWARF & SEH64 have unwinding tables that are precise for each instruction, while SEH32 and SJLJ unwinding push/pop frames atomically with regards to signal handlers. However actual Rust code may not be designed with arbitrary unwinding in mind. The same question as pthread_exit applies here: which parts of libstd are we will to guarantee are "async-unwind-safe"?

  • Have we decide what the behavior of foreign exceptions unwinding through Rust frames is? Is it allowed or UB? Is drop called or not?

@nagisa

This comment has been minimized.

Copy link
Author

@nagisa nagisa commented Mar 29, 2018

Have we decide what the behavior of foreign exceptions unwinding through Rust frames is? Is it allowed or UB? Is drop called or not?

Plan is to have unwinding from FFI be fine if the functions are annotated with #[unwind]. It is still unclear whether the cleanups in the rust code would be run or not, though.

@nagisa

This comment has been minimized.

Copy link
Author

@nagisa nagisa commented Mar 29, 2018

Also, thanks for writing down all these things. Most of those questions have been floated around during the All-hands, but I ended up not remembering any of them :)

@Amanieu

This comment has been minimized.

Copy link

@Amanieu Amanieu commented Mar 29, 2018

Regarding setjmp & longjmp. I think it's fine as long as setjmp is called from C code. setjmp isn't a normal function and there are hacks in C compilers to support it (__attribute__((returns_twice))). There are also special semantics for volatile local variables in a function that calls setjmp. Therefore I propose disallowing calling setjmp from Rust code.

However using longjmp to jump over Rust frames should be fine, as long as the user is aware that destructors will not be called, so they will be responsible for the safety implications at the library level.

@nagisa

This comment has been minimized.

Copy link
Author

@nagisa nagisa commented Mar 29, 2018

@Amanieu longjmp is implemented by unwinding on Windows. See rust-lang/rust#48251. Elsewhere, it happens to do the same "remove but do not run destructors" thing, which is something that was declared to be unsound (at least in presence of Drop types within "removed" stuff).

@RalfJung

This comment has been minimized.

Copy link
Member

@RalfJung RalfJung commented Mar 29, 2018

I don't think trying to make pthread_exit safe or anything else which kills a thread and its stack is reasonable. This makes e.g. Crossbeams' and Rayon's scoped threads unsound.

So, these will have to be unsafe operations. In terms of when they (and also e.g. setjmp/longjmp) are safe sound to use, I'd say they must not "skip over" any stack frame where any type has a Drop implementation. That means no code actually runs during "normal" unwinding. As a consequence, just deallocating these stackframes is behaviorally perfectly equivalent to a panic being raised (at the longjmp place) and then caught again (at the setjmp place) -- right?

@RalfJung

This comment has been minimized.

Copy link
Member

@RalfJung RalfJung commented Mar 29, 2018

Asynchronous unwinding (from an arbitrary instruction in a Rust function) is probably fine from the language point of view. DWARF & SEH64 have unwinding tables that are precise for each instruction, while SEH32 and SJLJ unwinding push/pop frames atomically with regards to signal handlers. However actual Rust code may not be designed with arbitrary unwinding in mind. The same question as pthread_exit applies here: which parts of libstd are we will to guarantee are "async-unwind-safe"?

Oh no this cannot possibly work. If you unwind e.g. during the execution of x = y, you could leave behind total garbage in x that is half-way the old value and half-way the new one. Rust is only safe to unwind in places where panics could be raised. I think such a restriction is necessary to write any kind of exception-safe unsafe code.

@Amanieu

This comment has been minimized.

Copy link

@Amanieu Amanieu commented Mar 29, 2018

We know that none of these functions are safe. What we are trying to determine here is: under which conditions is using them (not) UB? Additionally, we are only look at this from the language point of view: everyone will agree that most Rust code assumes that panics don't come from arbitrary instructions and that destructors are executed while unwinding.

If a user wants to do "fancy" stuff with unwinding (or the lack thereof) then he is solely responsible for auditing any code that he may be unwinding through to ensure that safety is maintained.

@RalfJung

This comment has been minimized.

Copy link
Member

@RalfJung RalfJung commented Mar 30, 2018

If a user wants to do "fancy" stuff with unwinding (or the lack thereof) then he is solely responsible for auditing any code that he may be unwinding through to ensure that safety is maintained.

Ack.

under which conditions is using them (not) UB?

Right, we'll have to decide that. I said my 2 cents above.

@jugglerchris

This comment has been minimized.

Copy link

@jugglerchris jugglerchris commented Apr 5, 2018

If a user wants to do "fancy" stuff with unwinding (or the lack thereof) then he is solely responsible for auditing any code that he may be unwinding through to ensure that safety is maintained.

Could there could be some kind of annotation you could use to make this audit easier, at least locally? The compiler must know whether there are any Drop types on the stack at a particular point.

fn may_longjmp() {
    let foo = make_foo();

    // Fail to compile if any destructors would run on unwind (here for foo)
    #[no_drop_on_stack]
    ffi_function_which_may_longjmp();
}

You'd still need to annotate up your call stack to the point where setjmp or similar was called.

@RalfJung RalfJung transferred this issue from rust-lang/rust-memory-model Oct 16, 2019
@comex

This comment has been minimized.

Copy link

@comex comex commented Oct 16, 2019

What happens with -C panic=abort?

@RalfJung

This comment has been minimized.

Copy link
Member

@RalfJung RalfJung commented Oct 16, 2019

I don't think any of the examples in the OP are affected by that?

@gnzlbg

This comment has been minimized.

Copy link
Collaborator

@gnzlbg gnzlbg commented Oct 17, 2019

@comex

What happens with -C panic=abort?

I haven't tried, but IIUC on Itanium-C++ ABI thread cancelation uses Unwind_ForceUnwind which means that the stack frames would be unwound even if -C panic=abort (whether that does consistently run destructors or not on C++, I don't know, and on Rust, I don't know either).

On windows, where a longjmp is implemented using exceptions, it does unwind frames even when -C panic=abort. @kyren tested this for rlua, which works with -C panic=abort, but rlua does not have types with destructors in the frames that get unwound. @kyren added types with destructors to it, and if the type is in the same frame that calls an extern "C" function that unwinds, currently, its destructor is not run, but if it is in a different frame, then its destructor is run. This might have to do with how we currently implement things in Rust, and does not necessary match what C++ does. So maybe this is a current implementation "bug" (*), our current tests for this are here and they do not test that the destructor is actually run - the destructors there do nothing, e.g., as opposed to trying to count how many times they are invoked.

Notice that when -C panic=abort we stamp nounwind everywhere, but LLVM says that it is ok for nounwind functions to unwind in certain specific ways.

(*) Note that this isn't a language bug, because we maybe don't provide any guarantees for what happens here at the language level - it'd be only a bug at the implementation level, were we want this to work in certain consistent ways, even if longjmp or thread cancellation over frames with destructors is UB. However, if we say that -C panic=abort code never unwinds, then this might be a language bug, because I don't see how we could enforce this guarantee properly.

@RalfJung

This comment has been minimized.

Copy link
Member

@RalfJung RalfJung commented Oct 17, 2019

Also see rust-lang/reference#693 (comment) for an approximate model of stack frame and their dropping by @gnzlbg.

@comex

This comment has been minimized.

Copy link

@comex comex commented Oct 17, 2019

I checked:

On Darwin, pthread_cancel doesn't use unwinding at all; it uses a separate linked list to track pthread_cleanup handlers. As such, it will never run Rust destructors.

On Linux/glibc, pthread_cancel does indeed use _Unwind_ForcedUnwind.

On both OSes, pthread_exit uses the same code path as pthread_cancel, so the behavior is equivalent (unwinding on Linux, not on Darwin).

Here are some test results with both panic=unwind and panic=abort on Linux and macOS – various types of exotic crashes:

https://gist.github.com/comex/401f5c02f066fdea8e06c75a28267247

@gnzlbg

This comment has been minimized.

Copy link
Collaborator

@gnzlbg gnzlbg commented Oct 17, 2019

@comex thanks!

On Darwin, pthread_cancel doesn't use unwinding at all; it uses a separate linked list to track pthread_cleanup handlers. As such, it will never run Rust destructors.

This is interesting, because it doesn't work exactly like a longjmp since it runs some pthread_cleanup handlers :D

I tried to visualize all the info and came up with this table:

name kind runs destructors? stoppable? pausable? aborts on aborts-on-panic-shims
Rust panic native yes catch catch+resume yes
C++ throw foreign yes catch could catch+resume could yes
pthread_cancel (linux) foreign yes no catch+resume could no
pthread_cancel (osx) foreign no ? ? no
longjmp (linux) foreign no no (setjmp only) no no
longjmp (windows-SEH) foreign yes catch could catch+resume could no

Instead of specifying the behavior of each of these on each platform, it might be better to specify the behavior of Rust depending on the columns.

  • If the kind is native, the behavior is well-defined and does ...
  • If the kind is foreign, the behavior might be undefined, depending on other factors
  • If the unwind does not run destructors, the behavior is undefined if unwinding deallocates drop types with safety invariants that require running destructors like Pin
  • If the unwind is not stoppable or pausable, the behavior is undefined if the unwind reaches a thread boundary
  • If the unwind does not abort on abort-on-panic shims and runs destructors and there are types with destructors in the frames, the behavior is undefined (e.g. calling a nounwind function that ends up unwinding such that destructors run might lead to only some destructors being run, depending on optimizations)
  • Some guarantees for catch_unwind, for example:
    • if the unwind is foreign, the behavior is UB
    • or, if the unwind is foreign, catch unwind never catches the unwind

So if you have a platform and want to call some function that panics, you'd just fill in a row in the table, and answer the questions to see when the behavior is defined, and to what.

Something like that might end up reducing the amount of UB to few special cases, e.g.:

  • deallocating a Pin without calling its destructor
  • unwinding through a thread boundary with an unwind that cannot be stopped or paused (e.g. a longjmp that's not caught by its setjmp on the current thread)
  • unwind is not caught by abort-on-panic-shims but runs destructors and there are types with drop glue on frames (e.g. longjmp on windows-seh if the frames have types with destructors)

leaving a lot of useful cases as "well-defined".

@RalfJung RalfJung added T-unwind and removed T-drop labels Dec 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.