TLS destructors on the main thread are a little sketchy #28129

Open
alexcrichton opened this Issue Aug 31, 2015 · 6 comments

@alexcrichton
Member

alexcrichton commented Aug 31, 2015

There are some platforms where TLS destructors are run when the main thread exits, there are some platforms where this does not happen, and there are some platforms where things just go crazy. For example, testing this program:

struct Foo;

impl Drop for Foo {
    fn drop(&mut self) {
        println!("wut");
    }
}

thread_local!(static FOO: Foo = Foo);

fn main() {
    FOO.with(|_| {});
}
  • Linux ELF TLS - appears to work
  • Linux pthread destructors - appear to not work (#19776)
  • OSX - appears to call destructors, but the program above specifically causes some form of memory corruption, triggering an assert in malloc
  • Windows GNU/MSVC - appears to not work. We're listening for DLL_PROCESS_DETACH but for some reason we're not getting that notification.
@ranma42

ranma42 Sep 17, 2015

Contributor

The memory corruption on Mac OS X seems to be related to TLS usage during the Drop of a TLS value.
If the print! invocation is replaced with stdout().write, no crash occurs.
However, if stdout is also used within main, the process dies with the error

thread '<main>' panicked at 'cannot access stdout during shutdown', ../src/libcore/option.rs:333

both if drop calls print! and if it calls write.
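
For reference, the substitution described above can be sketched roughly as follows. This is only an illustration: the `announce` helper is a hypothetical name, and ignoring the write error is an assumption of this sketch, not something libstd does. Whether it avoids the crash on a given platform is what is being reported above, not something the sketch guarantees.

```rust
use std::io::{self, Write};

struct Foo;

/// Hypothetical helper: write the message to any sink and return the
/// error to the caller, instead of panicking the way `println!` does
/// when the write fails.
fn announce(out: &mut dyn Write) -> io::Result<()> {
    out.write_all(b"wut\n")
}

impl Drop for Foo {
    fn drop(&mut self) {
        // Assumption of this sketch: a destructor that may run during
        // shutdown should swallow I/O errors rather than panic.
        let _ = announce(&mut io::stdout());
    }
}

thread_local!(static FOO: Foo = Foo);

fn main() {
    // Touch the key so the value is initialized on the main thread.
    FOO.with(|_| {});
}
```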

bors added a commit that referenced this issue Sep 23, 2015

Auto merge of #28585 - ranma42:simpler-panic, r=alexcrichton
This is part of some cleanup I did while investigating #28129.
This also ensures that `on_panic` is run even if the user has registered too many callbacks.

bors added a commit that referenced this issue Sep 26, 2015

Auto merge of #28631 - ranma42:robust-panic, r=alexcrichton
This is mainly to avoid infinite recursion and make debugging more convenient in the anomalous case in which `on_panic` panics.
I encountered such issues while changing libstd to debug/fix part of #28129.

While writing this I was wondering about which functions belong to `panicking` and which to `unwind`.
I placed them in this way mostly because of convenience, but I would strongly appreciate guidance.
@ranma42

ranma42 Sep 29, 2015

Contributor

pthread destructors seem to behave the same way on Mac OS X as on Linux (i.e. they are not run when the main thread terminates). Googling around suggests that this is a known (expected?) fact. Specifically, TLS destructors are only run when pthread_exit is invoked, which for non-main threads happens implicitly.
The only authoritative source I was able to find is http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_exit.html
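
A small sketch of the non-main-thread case, assuming Rust's `thread::spawn` and `join` on the platforms discussed here. The `DROPS` counter and `drops_after_worker` are hypothetical names used only to observe the destructor; this is not how libstd tracks anything.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Hypothetical global counter, bumped every time a `Foo` is dropped.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Foo;

impl Drop for Foo {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

thread_local!(static FOO: Foo = Foo);

/// Initialize `FOO` on a freshly spawned thread, wait for that thread
/// to terminate, and return the total number of drops seen so far.
fn drops_after_worker() -> usize {
    thread::spawn(|| FOO.with(|_| {})).join().unwrap();
    DROPS.load(Ordering::SeqCst)
}

fn main() {
    println!("drops observed after join: {}", drops_after_worker());
}
```

The worker thread exits through the normal thread-exit path, so its destructor has a chance to run before `join` returns; the main thread has no equivalent step when it simply returns from `main`.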

@alexcrichton

alexcrichton Sep 29, 2015

Member

Yeah I think solving this in the case of pthreads will either require us to document "dtors may not run" or perhaps to add our own atexit handler or something like that to handle this (e.g. some custom code on our end). Not entirely sure what that would look like though!

@ranma42

ranma42 Sep 29, 2015

Contributor

In order to make the atexit handler practical we would need to explicitly keep track of the active destructors (at least on the main thread), i.e. basically keep the list of DTORS as in the pthread fallback implementation. This might not be as bad as it looks; in fact it might allow us to reuse some code (currently the Linux fallback dtor and the normal Windows dtor are basically doing the same thing in slightly different ways).
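
A very rough sketch of what such explicit tracking could look like, in safe Rust. Everything here is hypothetical (`DTORS`, `register_dtor`, `run_dtors`); it is not the fallback implementation in libstd, which works with raw pointers and C-ABI callbacks, and the newest-first order is a choice made for this sketch.

```rust
use std::cell::RefCell;

thread_local! {
    // Hypothetical per-thread list of pending destructors, in
    // registration order.
    static DTORS: RefCell<Vec<Box<dyn FnOnce()>>> = RefCell::new(Vec::new());
}

/// Record a destructor to be run later by `run_dtors`.
fn register_dtor(dtor: impl FnOnce() + 'static) {
    DTORS.with(|d| d.borrow_mut().push(Box::new(dtor)));
}

/// Run and clear all pending destructors on this thread, newest first.
/// Returns how many were run.
fn run_dtors() -> usize {
    let mut ran = 0;
    loop {
        // Take the list out before running anything, so a destructor
        // that registers another destructor does not hit a double borrow.
        let batch = DTORS.with(|d| std::mem::take(&mut *d.borrow_mut()));
        if batch.is_empty() {
            return ran;
        }
        for dtor in batch.into_iter().rev() {
            dtor();
            ran += 1;
        }
    }
}

fn main() {
    register_dtor(|| println!("first registered"));
    register_dtor(|| println!("second registered"));
    // An exit hook for the main thread would call this just before exit.
    let ran = run_dtors();
    println!("ran {} destructors", ran);
}
```

The loop matters because a destructor may itself touch TLS and register more work, which is the kind of re-entrancy that shows up in the OSX case above.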

Another option would be to start the main function in a new thread and immediately join it in lang_start. This would be about as effective as hooking the destructors in the Rust atexit, because for Rust applications either way would work just fine, and for Rust libraries used by non-Rust applications TLS would not be destroyed in either case.
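
In user code the idea would look roughly like the sketch below. `real_main` and `run_on_new_thread` are hypothetical names; the actual change would live inside lang_start rather than in the program's own `main`.

```rust
use std::thread;

/// Hypothetical stand-in for the user's `main`.
fn real_main() -> i32 {
    0
}

/// Run `f` on a new thread and wait for it, so that any TLS it touches
/// belongs to a thread that goes through the normal thread-exit path.
fn run_on_new_thread<T, F>(f: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    match thread::spawn(f).join() {
        Ok(value) => value,
        // Re-raise a panic from the worker on the calling thread.
        Err(payload) => std::panic::resume_unwind(payload),
    }
}

fn main() {
    let code = run_on_new_thread(real_main);
    println!("real_main returned {}", code);
}
```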

The C atexit function would suffer from the same limitations as the Rust atexit (need to track dtors), but it might help in non-Rust applications, assuming that they go through normal termination (exit, not _Exit nor quick_exit nor an abort of any kind) as expected by the C runtime.

Documenting "dtors may not run" looks like a reasonable compromise to me, especially if we can guarantee that this will at most happen for the main thread (and, ideally, only on non-Rust apps). I believe that this guarantee is especially important if threads are spawned and joined (instead of managed as a thread pool) to ensure that leaks are bounded.

@alexcrichton

alexcrichton Sep 29, 2015

Member

> we would need to explicitly keep track of the active destructors

Yeah this probably isn't so bad as we already do it in some cases, so it'd just be a matter of shuffling things around.

> Another option would be to start the main function in a new thread and immediately join it in lang_start

Unfortunately I think this won't work because there's a number of GUI frameworks (or something like that) which only work on the main thread (I think on Windows in particular), so running off-the-main-thread by default may be a bit of a heavy hammer to fix this!

> Documenting "dtors may not run" looks like a reasonable compromise to me

I agree it's probably not that bad, but if we could get OSX and Windows working reliably, it's more pressure for us to get pthreads working reliably :). I feel like Windows is pretty easy to fix, it seems like some small error is just being missed there. OSX also feels the same way to me in terms of weird things happening. The pthreads case also isn't super critical on Linux because it's only used on older linuxes.

Overall I think we may still have enough rope left to climb out and close this issue, so I wouldn't be quite willing just yet to close it out by documenting things may not run.

@retep998

retep998 Mar 1, 2017

Member

> there's a number of GUI frameworks (or something like that) which only work on the main thread (I think on Windows in particular)

Windows itself does not actually care which thread you do the UI on, as long as you're consistent. Once you create a window on a given thread, that thread now has a message queue which you're obligated to pump forever.
