TLS destructors on the main thread are a little sketchy #28129

Open
alexcrichton opened this Issue Aug 31, 2015 · 5 comments

@alexcrichton
Member

There are some platforms where TLS destructors are run when the main thread exits, some where they are not, and some where things just go crazy. For example, consider this program:

```rust
struct Foo;

impl Drop for Foo {
    fn drop(&mut self) {
        println!("wut");
    }
}

thread_local!(static FOO: Foo = Foo);

fn main() {
    FOO.with(|_| {});
}
```

  • Linux ELF TLS - appears to work
  • Linux pthread destructors - appear not to work (#19776)
  • OSX - appears to call destructors, but the program above specifically causes some form of memory corruption, triggering an assert in malloc
  • Windows GNU/MSVC - appears not to work. We're listening for DLL_PROCESS_DETACH, but for some reason we're not getting that notification.
@alexcrichton alexcrichton added the A-io label Aug 31, 2015
@ranma42
Contributor
ranma42 commented Sep 17, 2015

The memory corruption on Mac OS X seems to be related to using TLS while a TLS value is being dropped.
If the print! invocation is replaced with stdout().write, no crash occurs.
However, if stdout is also used within main, the process dies with the following error, whether drop calls print! or write:

thread '<main>' panicked at 'cannot access stdout during shutdown', ../src/libcore/option.rs:333
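
A sketch of the variant described above. Its exact shape is an assumption, since the thread only says that `print!` was swapped for `stdout().write`: here the destructor writes through the `stdout()` handle and `main` uses stdout as well. `say_wut` is an illustrative helper. The 2015 crash and panic were toolchain- and platform-specific, so this is for illustration only.

```rust
use std::io::{self, Write};

struct Foo;

// Illustrative helper: write straight to the stdout handle instead of
// going through print!.
fn say_wut() -> io::Result<()> {
    let mut out = io::stdout();
    out.write_all(b"wut\n")?;
    out.flush()
}

impl Drop for Foo {
    fn drop(&mut self) {
        // Ignore errors: stdout may be unusable during shutdown.
        let _ = say_wut();
    }
}

thread_local!(static FOO: Foo = Foo);

fn main() {
    // Using stdout from main as well is the combination that was reported
    // to panic with "cannot access stdout during shutdown".
    println!("main");
    FOO.with(|_| {});
}
```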

@bors bors added a commit that referenced this issue Sep 23, 2015
@bors bors Auto merge of #28585 - ranma42:simpler-panic, r=alexcrichton
This is part of some cleanup I did while investigating #28129.
This also ensures that `on_panic` is run even if the user has registered too many callbacks.
07ca1ab
@bors bors added a commit that referenced this issue Sep 26, 2015
@bors bors Auto merge of #28631 - ranma42:robust-panic, r=alexcrichton
This is mainly to avoid infinite recursion and make debugging more convenient in the anomalous case in which `on_panic` panics.
I encountered such issues while changing libstd to debug/fix part of #28129.

While writing this I was wondering which functions belong to `panicking` and which to `unwind`.
I placed them this way mostly for convenience, but I would strongly appreciate guidance.
6645ca1
@ranma42
Contributor
ranma42 commented Sep 29, 2015

pthread destructors seem to behave the same way on Mac OS X as on Linux (i.e. they are not run when the main thread terminates). A web search suggests this is known (expected?) behaviour. Specifically, TLS destructors are only run when pthread_exit is invoked, which for non-main threads happens implicitly when the thread's start routine returns, whereas returning from main calls exit(), which does not run them.
The only authoritative source I was able to find is http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_exit.html
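
The non-main-thread half of that rule can be seen from Rust with a small sketch (not from the thread): a thread-local value is touched on a spawned thread, and because that thread exits normally, its destructor has already run by the time `join` returns on the usual platforms. `Noisy`, `DROPS` and `touch_on_spawned_thread` are illustrative names, and the destructor bumps an atomic counter rather than printing, to stay clear of the stdout-during-shutdown problem.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Counts how many `Noisy` values have been dropped, on any thread.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Noisy;

impl Drop for Noisy {
    fn drop(&mut self) {
        // Record the drop without touching stdout or any other TLS.
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

thread_local!(static NOISY: Noisy = Noisy);

// Touches `NOISY` on a fresh thread, joins it, and returns how many
// destructor runs were observed while that thread exited.
fn touch_on_spawned_thread() -> usize {
    let before = DROPS.load(Ordering::SeqCst);
    thread::spawn(|| NOISY.with(|_| {})).join().unwrap();
    DROPS.load(Ordering::SeqCst) - before
}

fn main() {
    // The spawned thread exits normally, so its TLS destructor has run by
    // the time `join` returns.
    println!("spawned thread drops: {}", touch_on_spawned_thread());

    // On the main thread, whether this destructor ever runs depends on the
    // platform, which is what this issue is about.
    NOISY.with(|_| {});
}
```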

@alexcrichton
Member

Yeah, I think solving this for pthreads will require either documenting that "dtors may not run" or adding our own atexit handler (or something similar, i.e. some custom code on our end). Not entirely sure what that would look like though!

@ranma42
Contributor
ranma42 commented Sep 29, 2015

In order to make the atexit handler practical we would need to explicitly track the active destructors (at least on the main thread), i.e. basically keep the list of DTORS as in the pthread fallback implementation. This might not be as bad as it looks; in fact it might allow us to reuse some code (currently the Linux fallback dtor and the normal Windows dtor are basically doing the same thing in slightly different ways).
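
A minimal sketch of what explicitly tracking main-thread destructors might look like, assuming one global list that some exit hook drains. `MAIN_DTORS`, `register_dtor`, `run_dtors` and `demo` are hypothetical names, not the actual libstd fallback code. Destructors run newest-first, and draining continues until the list is empty, since a destructor may register another one while it runs.

```rust
use std::sync::{Arc, Mutex};

type Dtor = Box<dyn FnOnce() + Send>;

// Hypothetical registry of destructors still owed on the main thread.
static MAIN_DTORS: Mutex<Vec<Dtor>> = Mutex::new(Vec::new());

fn register_dtor(dtor: Dtor) {
    MAIN_DTORS.lock().unwrap().push(dtor);
}

// Runs registered destructors newest-first until none are left, including
// any that a destructor registered while it was running.
fn run_dtors() {
    loop {
        // The guard is dropped at the end of this statement, so the lock is
        // not held while the destructor runs.
        let next = MAIN_DTORS.lock().unwrap().pop();
        match next {
            Some(dtor) => dtor(),
            None => break,
        }
    }
}

// Registers two destructors, one of which registers a third, then drains
// the registry and returns the order in which they ran.
fn demo() -> Vec<u32> {
    let log: Arc<Mutex<Vec<u32>>> = Arc::new(Mutex::new(Vec::new()));
    let first = Arc::clone(&log);
    register_dtor(Box::new(move || {
        first.lock().unwrap().push(1);
    }));
    let second = Arc::clone(&log);
    register_dtor(Box::new(move || {
        second.lock().unwrap().push(2);
        let third = Arc::clone(&second);
        register_dtor(Box::new(move || {
            third.lock().unwrap().push(3);
        }));
    }));
    run_dtors();
    let order = log.lock().unwrap().clone();
    order
}

fn main() {
    println!("{:?}", demo()); // [2, 3, 1]
}
```

Popping one entry at a time and releasing the lock before calling it is what makes re-registration safe; holding the lock across the call would deadlock.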

Another option would be to start the main function in a new thread and immediately join it in lang_start. This would be about as effective as hooking the destructors into the Rust atexit: for Rust applications either approach would work just fine, and for Rust libraries used by non-Rust applications TLS would not be destroyed in either case.
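
The same idea can be tried by hand in user code rather than in `lang_start`, as a sketch: the real program body runs on a helper thread that is joined immediately, so its thread-locals are torn down by ordinary thread exit. `real_main` and `run_on_helper_thread` are hypothetical names. Note that everything in `real_main` then runs off the OS main thread.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Counts `Foo` drops so the effect is observable without printing.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Foo;

impl Drop for Foo {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

thread_local!(static FOO: Foo = Foo);

// Hypothetical: what would normally be the body of `main`.
fn real_main() {
    FOO.with(|_| {});
}

// Hypothetical wrapper: run the body on a helper thread and join it at once,
// so its TLS destructors run at thread exit instead of at process exit.
fn run_on_helper_thread(body: fn()) {
    thread::spawn(body).join().expect("program body panicked");
}

fn main() {
    let before = DROPS.load(Ordering::SeqCst);
    run_on_helper_thread(real_main);
    // The helper thread has exited, so FOO's destructor has already run.
    assert_eq!(DROPS.load(Ordering::SeqCst) - before, 1);
}
```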

The C atexit function would suffer from the same limitation as the Rust atexit (the need to track dtors), but it might help in non-Rust applications, assuming that they go through normal termination (exit, not _Exit, quick_exit, or an abort of any kind) as expected by the C runtime.
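
One way the C-atexit variant could be wired up from Rust, as a sketch: declare the C standard library's `atexit` and register an `extern "C"` callback that drains a list of pending destructors. It only fires on normal termination via `exit` (which includes returning from `main`), not on `_Exit`, `quick_exit` or an abort. `PENDING`, `register_pending`, `run_pending` and `install_exit_hook` are hypothetical names, and the `unsafe extern` block syntax assumes Rust 1.82 or newer.

```rust
use std::os::raw::c_int;
use std::sync::Mutex;

type Dtor = Box<dyn FnOnce() + Send>;

// Hypothetical list of main-thread destructors still waiting to run.
static PENDING: Mutex<Vec<Dtor>> = Mutex::new(Vec::new());

unsafe extern "C" {
    // `atexit` from the C standard library, which std already links.
    fn atexit(callback: extern "C" fn()) -> c_int;
}

fn register_pending(dtor: Dtor) {
    PENDING.lock().unwrap().push(dtor);
}

// Invoked by the C runtime during exit(). Unwinding out of an extern "C"
// function is not allowed, so a poisoned lock is recovered, not unwrapped.
extern "C" fn run_pending() {
    loop {
        // Take the whole list so the lock is released while destructors run.
        let batch = std::mem::take(&mut *PENDING.lock().unwrap_or_else(|e| e.into_inner()));
        if batch.is_empty() {
            break;
        }
        // Run newest-first; repeat in case a destructor registered more.
        for dtor in batch.into_iter().rev() {
            dtor();
        }
    }
}

fn install_exit_hook() -> bool {
    // atexit returns 0 on success.
    unsafe { atexit(run_pending) == 0 }
}

fn main() {
    assert!(install_exit_hook());
    register_pending(Box::new(|| eprintln!("pending dtor ran from atexit")));
}
```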

Documenting "dtors may not run" looks like a reasonable compromise to me, especially if we can guarantee that this will happen at most on the main thread (and, ideally, only in non-Rust apps). I believe this guarantee is especially important when threads are spawned and joined (instead of managed as a thread pool), since it ensures that leaks are bounded.

@alexcrichton
Member

> we would need to explicitly track the active destructors

Yeah, this probably isn't so bad, since we already do it in some cases; it'd just be a matter of shuffling things around.

> Another option would be to start the main function in a new thread and immediately join it in lang_start

Unfortunately, I don't think this will work: a number of GUI frameworks (or something like that) only work on the main thread (I think on Windows in particular), so running off the main thread by default may be too heavy a hammer for fixing this!

> Documenting "dtors may not run" looks like a reasonable compromise to me

I agree it's probably not that bad, but if we can get OSX and Windows working reliably, that puts more pressure on us to get pthreads working reliably too :). Windows feels pretty easy to fix; it seems like some small error is just being missed there. OSX feels the same way to me, given the weird things happening there. The pthreads case also isn't super critical on Linux, because the pthread fallback is only used on older Linux versions.

Overall, I think we may still have enough rope left to climb out and fix this properly, so I'm not quite willing yet to close the issue by documenting that things may not run.
