Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scoped thread implicit join doesn't wait for thread locals to be dropped #116237

Open
pvillela opened this issue Sep 28, 2023 · 25 comments
Open

Scoped thread implicit join doesn't wait for thread locals to be dropped #116237

pvillela opened this issue Sep 28, 2023 · 25 comments
Assignees
Labels
A-thread Area: std::thread A-thread-locals Area: Thread local storage (TLS) C-bug Category: This is a bug. disposition-close This PR / issue is in PFCP or FCP with a disposition to close it. final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@pvillela
Copy link

pvillela commented Sep 28, 2023

See also #116179:

When a scoped thread is implicitly joined, the destructors of thread-local variables are not guaranteed to have completed when the scope is exited. When a scoped thread is explicitly joined, however, the destructors of thread-local variables do complete before the scope is exited.

[..]


Scoped thread implicit join fails 'happens before' guarantee

As documented in the Rust Atomics and Locks book by Mara Bos, "joining a thread creates a happens-before relationship between the joined thread and what happens after the join() call". This is not true for implicit joins on scoped threads.

I tried this code:

use std::{hint::black_box, mem::replace, thread};

static mut CONTROL: Option<String> = None;

fn main() {
    // Explicit join of scoped thread.
    {
        for _ in 0..5000 {
            thread::scope(|s| {
                let h = s.spawn(|| {
                    FOO.with(|v| {
                        black_box(v);
                    });
                });
                h.join().unwrap();
            });

            // SAFETY: this happens after the thread join, which provides a `happens before` guarantee
            let v = unsafe { &CONTROL };
            black_box(format!("{:?}", v));
            print!(".");
        }
        println!("Completed EXPLICIT join loop.");
    }

    println!();

    // Implicit join of scoped thread.
    {
        for _ in 0..5000 {
            thread::scope(|s| {
                let _h = s.spawn(|| {
                    FOO.with(|v| {
                        black_box(v);
                    });
                });
            });

            // SAFETY: this happens after the implicit thread join, which should provide a `happens before` guarantee
            let v = unsafe { &CONTROL };
            black_box(format!("{:?}", v));
            print!(".");
        }
        println!("Completed IMPLICIT join loop.");
    }
}

struct Foo(());

impl Drop for Foo {
    fn drop(&mut self) {
        // SAFETY: this happens before the thread join, which provides a `happens before` guarantee
        let _: Option<String> = unsafe { replace(&mut CONTROL, Some("abcd".to_owned())) };
    }
}

thread_local! {
    static FOO: Foo = Foo(());
}

I expected to see this happen: Both the explicit join and implicit join portions of the code should terminate normally.

Instead, this happened: The explicit join portion terminates normally as expected, but the implicit join portion panics (the number of iterations on the loop before it panics varies) .

This issue is related to issue #116179.

Meta

The same behaviour is observed on the nightly version nightly-x86_64-unknown-linux-gnu unchanged - rustc 1.74.0-nightly (0288f2e 2023-09-25).

rustc --version --verbose:

rustc 1.72.1 (d5c2e9c34 2023-09-13)
binary: rustc
commit-hash: d5c2e9c342b358556da91d61ed4133f6f50fc0c3
commit-date: 2023-09-13
host: x86_64-unknown-linux-gnu
release: 1.72.1
LLVM version: 16.0.5
Backtrace

thread 'main' panicked at 'byte index 5 is out of bounds of `hP�^`', library/core/src/fmt/mod.rs:2324:30
stack backtrace:
   0: rust_begin_unwind
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/std/src/panicking.rs:593:5
   1: core::panicking::panic_fmt
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/panicking.rs:67:14
   2: core::str::slice_error_fail_rt
   3: core::str::slice_error_fail
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/str/mod.rs:87:9
   4: core::str::traits::<impl core::slice::index::SliceIndex<str> for core::ops::range::Range<usize>>::index
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/str/traits.rs:235:21
   5: core::str::traits::<impl core::ops::index::Index<I> for str>::index
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/str/traits.rs:61:15
   6: <str as core::fmt::Debug>::fmt
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/fmt/mod.rs:2324:30
   7: <alloc::string::String as core::fmt::Debug>::fmt
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/alloc/src/string.rs:2271:9
   8: <&T as core::fmt::Debug>::fmt
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/fmt/mod.rs:2268:62
   9: core::fmt::builders::DebugTuple::field::{{closure}}
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/fmt/builders.rs:322:17
  10: core::result::Result<T,E>::and_then
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/result.rs:1319:22
  11: core::fmt::builders::DebugTuple::field
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/fmt/builders.rs:309:35
  12: core::fmt::Formatter::debug_tuple_field1_finish
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/fmt/mod.rs:2035:9
  13: <core::option::Option<T> as core::fmt::Debug>::fmt
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/option.rs:559:37
  14: <&T as core::fmt::Debug>::fmt
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/fmt/mod.rs:2268:62
  15: core::fmt::rt::Argument::fmt
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/fmt/rt.rs:138:9
  16: core::fmt::write
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/fmt/mod.rs:1094:21
  17: core::fmt::Write::write_fmt
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/fmt/mod.rs:192:9
  18: alloc::fmt::format::format_inner
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/alloc/src/fmt.rs:610:16
  19: alloc::fmt::format::{{closure}}
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/alloc/src/fmt.rs:614:34
  20: core::option::Option<T>::map_or_else
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/option.rs:1180:21
  21: alloc::fmt::format
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/alloc/src/fmt.rs:614:5
  22: thread_local_destruction_in_scoped_thread2::main
             at ./general/src/bin/thread_local_destruction_in_scoped_thread2.rs:41:23
  23: core::ops::function::FnOnce::call_once
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

@pvillela pvillela added the C-bug Category: This is a bug. label Sep 28, 2023
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Sep 28, 2023
@WaffleLapkin WaffleLapkin added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Sep 28, 2023
@workingjubilee workingjubilee added the A-thread Area: std::thread label Sep 29, 2023
@fmease fmease added A-thread-locals Area: Thread local storage (TLS) and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Sep 29, 2023
@workingjubilee workingjubilee added I-libs-nominated The issue / PR has been nominated for discussion during a libs team meeting. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness labels Oct 5, 2023
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Oct 5, 2023
@workingjubilee
Copy link
Contributor

workingjubilee commented Oct 5, 2023

You know, technically, it only says it has the "happens before" on the API docs of ScopedJoinHandle::join. I'm guessing there being a difference is not intended behavior, however. cc @m-ou-se

@workingjubilee workingjubilee added the E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc label Oct 5, 2023
@workingjubilee
Copy link
Contributor

This error is spotty, sometimes it happens, sometimes it doesn't. It happens since at least 1.63, when this function was stabilized.

@m-ou-se m-ou-se self-assigned this Oct 5, 2023
@workingjubilee
Copy link
Contributor

nope, seems this has been happening since day -128, checked with 7737e0b 2022-04-04 using

slightly modified code for version changes
#![feature(bench_black_box)]
#![feature(scoped_threads)]
use std::{hint::black_box, mem::replace, thread};

static mut CONTROL: Option<String> = None;

fn main() {
    // Explicit join of scoped thread.
    {
        for _ in 0..5000 {
            thread::scope(|s| {
                let h = s.spawn(|_| {
                    FOO.with(|v| {
                        black_box(v);
                    });
                });
                h.join().unwrap();
            });

            // SAFETY: this happens after the thread join, which provides a `happens before` guarantee
            let v = unsafe { &CONTROL };
            black_box(format!("{:?}", v));
            print!(".");
        }
        println!("Completed EXPLICIT join loop.");
    }

    println!();

    // Implicit join of scoped thread.
    {
        for _ in 0..5000 {
            thread::scope(|s| {
                let _h = s.spawn(|_| {
                    FOO.with(|v| {
                        black_box(v);
                    });
                });
            });

            // SAFETY: this happens after the implicit thread join, which should provide a `happens before` guarantee
            let v = unsafe { &CONTROL };
            black_box(format!("{:?}", v));
            print!(".");
        }
        println!("Completed IMPLICIT join loop.");
    }
}

struct Foo(());

impl Drop for Foo {
    fn drop(&mut self) {
        // SAFETY: this happens before the thread join, which provides a `happens before` guarantee
        let _: Option<String> = unsafe { replace(&mut CONTROL, Some("abcd".to_owned())) };
    }
}

thread_local! {
    static FOO: Foo = Foo(());
}

@workingjubilee workingjubilee removed the E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc label Oct 5, 2023
@apiraino apiraino removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Oct 8, 2023
@joboet
Copy link
Contributor

joboet commented Oct 14, 2023

Explicit joins go through the platforms join operation, which waits until the spawned thread has actually, fully terminated. Therefore, operations performed in TLS destructors are seen by the joining thread.

Implicit joins on the other hand are implemented via an atomic counter that tracks the number of extant threads. The spawning thread parks at the end of the scope until the counter has been set to zero. Scoped threads decrement the counter when exiting their main function and unpark the spawning thread if necessary. The bug described here occurs because the TLS destructors are run after the main function completes, therefore there is no relationship formed between their operations and the counter load at the end of the scope.

As I see it, there are two ways to resolve this:

  1. Explicitly state that there are no ordering guarantees for TLS destructors. The bug described here can only occur in unsafe code, as safe code can neither store referenced data in TLS nor write to shared data without proper synchronization.
  2. Store all created threads in a linked list and use the platform join operation to join the remaining threads, providing the synchronization required for the code in this issue to be sound.

@m-ou-se
Copy link
Member

m-ou-se commented Oct 18, 2023

Thanks for the bug report!

It's a good question whether this behaviour is acceptable or not. In many ways, TLS destructors behave a bit weirdly.

As mentioned above, the implicit joining behaviour does properly wait for the thread's main function to finish, so this is never a problem in safe code.

So, I don't think I'd categorize this as a high priority bug, or perhaps not even as a bug at all. But I do agree it would be good to fix this, if we can find a good way to do that.

Keeping track of all the join/thread handles in some sort of (linked) list would be problematic, as that'd result in an ever growing list (~a memory leak) when using a long-lived scope. (E.g. a webserver that all runs in a single thread::scope, which spawns a thread for every request. It'd never exit the scope, so the list would just grow forever.)

I'll investigate to see if we can find a better solution. Perhaps the counter needs to be decremented by a TLS destructor itself, or we can trigger TLS destruction ourselves, or perhaps we can think of another solution.

@m-ou-se m-ou-se changed the title Scoped thread implicit join fails 'happens before' guarantee Scoped thread implicit join doesn't wait for thread locals to be dropped Oct 18, 2023
@Amanieu
Copy link
Member

Amanieu commented Oct 18, 2023

I'll investigate to see if we can find a better solution. Perhaps the counter needs to be decremented by a TLS destructor itself, or we can trigger TLS destruction ourselves, or perhaps we can think of another solution.

This might work. TLS destructors are executed in reverse order of registration, so if we register one right at the start of the scoped thread, it should execute last after any other TLS destructors in the thread.

@joboet
Copy link
Contributor

joboet commented Oct 18, 2023

That's not guaranteed and very unreliable. If a destructor registers another destructor, that one will run after all others. On platforms that use keyed TLS, there is no predefined order at all.

@workingjubilee
Copy link
Contributor

workingjubilee commented Oct 19, 2023

It could be acceptable to define the implicit join as inherently slightly racy if we're confident that it has no unsoundness per se in doing so (i.e. it is very obvious this is bad if combined with the aforementioned unsafe code), as we can shift the blame to the unsafe code.

However, that leaves the question:

            // SAFETY: this happens after the implicit thread join
            // which should provide a `happens before` guarantee
            let v = unsafe { &CONTROL };
            black_box(format!("{:?}", v));
            print!(".");
        }
        println!("Completed IMPLICIT join loop.");
    }
}

impl Drop for Foo {
    fn drop(&mut self) {
        // SAFETY: this happens before the thread join
        // which should provide a `happens before` guarantee
        let _: Option<String> = unsafe { replace(&mut CONTROL, Some("abcd".to_owned())) };
    }
}

Would either of these unsafe blocks, themselves, be okay, if the other one didn't exist? It seems the answer is that the unsafe read of the static mut is never okay, at least.

Also, in the previous example, it should be noted that executing with optimizations appears to make the first mem::drop and the implicit scoped thread's TLS Drop::drop significantly more likely to race. The thread local's Drop sometimes doesn't even get to its print before the mem::drop is over with.

@Amanieu
Copy link
Member

Amanieu commented Nov 15, 2023

We discussed this in the library team meeting and the decision was the keep the existing behavior. However we are open to any suggestions on how the documentation can be improved to be clearer. Feel free to open a doc PR.

@Amanieu Amanieu removed the I-libs-nominated The issue / PR has been nominated for discussion during a libs team meeting. label Nov 15, 2023
@workingjubilee
Copy link
Contributor

That may be more easily done if we know what the motivation was for deciding to preserve the existing behavior, contra the alternatives? And which unsafe block is the one that "actually" contains the UB?

@m-ou-se
Copy link
Member

m-ou-se commented Mar 14, 2024

That meeting happened on Zulip, here.

What we discussed is that the way to solve this involves keeping track of all the join handles in the main thread, which can pile up (behave like a memory leak) for a long living scope, so we see no other resolution than just accepting the current behaviour of scope().

@m-ou-se
Copy link
Member

m-ou-se commented Mar 14, 2024

Note that the thread_local documentation says:

Note that a “best effort” is made to ensure that destructors for types stored in thread local storage are run

Although it's easy to argue that this "best effort" is only about whether they are run or not, not whether they are run at the expected time.

@m-ou-se
Copy link
Member

m-ou-se commented Mar 14, 2024

I believe we don't guarantee anywhere that TLS destructors will run before .join(). (I wonder how much would break if that behaviour changed.)

But if we do want to guarantee that, we could document that scope() does not implicitly call .join(), but instead that it simply waits for the main function of all spawned threads to have returned.

@pvillela
Copy link
Author

pvillela commented Mar 14, 2024 via email

@talchas2
Copy link

talchas2 commented May 3, 2024

Keeping track of all the join/thread handles in some sort of (linked) list would be problematic, as that'd result in an ever growing list (~a memory leak) when using a long-lived scope

Note that this is already a memory leak for exactly the same sort of reason inside of the OS's thread library. Any joinable thread that has not been joined will require some memory to handle that state and pass through any return value supported by the API.

It would increase the size of the memory leak of course, likely by around 2x at most if the OS has optimized the joinable-but-exited thread state down to minimal amounts.

@RalfJung
Copy link
Member

Ouch, yeah that's quite the surprising footgun. Personally I'd rather have a bit of memory accumulation in thread::scope than risk UB due to missing synchronization. As @talchas2 said, this doesn't make any O(1) memory profile into O(n), at least not when taking into account pthread/kernel resource usage -- it's "just" a different constant in an O(n).

@bstrie
Copy link
Contributor

bstrie commented Jun 19, 2024

To clarify, is there a consensus on whether this is a soundness bug, or just a documentation bug?

@RalfJung RalfJung added the I-libs-api-nominated The issue / PR has been nominated for discussion during a libs-api team meeting. label Jun 19, 2024
@RalfJung
Copy link
Member

RalfJung commented Jun 19, 2024

Nominating that question for libs-api discussion.

There was a poll among the team and it looks very much like the conclusion is "just a documentation bug", but it'd be good to get an answer that explicitly accounts for @talchas2' comment above.

@m-ou-se
Copy link
Member

m-ou-se commented Jul 2, 2024

Note that this is already a memory leak for exactly the same sort of reason inside of the OS's thread library. Any joinable thread that has not been joined will require some memory to handle that state and pass through any return value supported by the API.

We explicitly detach those threads when the join handle is dropped, so those threads are no longer joinable, so are not leaking memory.

@talchas2
Copy link

talchas2 commented Jul 2, 2024

Yeah, that's true, I missed that.

@m-ou-se
Copy link
Member

m-ou-se commented Jul 9, 2024

It would be VERY helpful to have a guarantee that TLS destructors run before a call to join() returns.

That is already the behavior today. Explicitly calling .join() calls pthread_join or similar, which will wait until after TLS destructors have run. (Although we don't explicitly guarantee that. Perhaps we should!)

This issue is about what happens to threads that you have not explicitly .join()ed when exiting the thread::scope in which they were spawned. That currently only waits for the main function of those threads to return. They are not 'joined', because they have already been 'detached' (when the join handle was dropped).

@m-ou-se m-ou-se added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. and removed T-libs Relevant to the library team, which will review and decide on the PR/issue. I-libs-api-nominated The issue / PR has been nominated for discussion during a libs-api team meeting. labels Jul 9, 2024
@m-ou-se
Copy link
Member

m-ou-se commented Jul 9, 2024

Proposing to close this issue with resolution "not a bug, current behavior at scope exit needs to be properly documented".

@rfcbot close

@rfcbot
Copy link

rfcbot commented Jul 9, 2024

Team member @m-ou-se has proposed to close this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-close This PR / issue is in PFCP or FCP with a disposition to close it. labels Jul 9, 2024
@pvillela
Copy link
Author

@m-ou-se reply:on #116237 (comment):

It would be VERY helpful to have a guarantee that TLS destructors run before a call to join() returns.

That is already the behavior today. Explicitly calling .join() calls pthread_join or similar, which will wait until after TLS destructors have run. (Although we don't explicitly guarantee that. Perhaps we should!)

I have created a separate issue #127571 about this.

@rfcbot rfcbot added the final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. label Jul 16, 2024
@rfcbot
Copy link

rfcbot commented Jul 16, 2024

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot removed the proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. label Jul 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-thread Area: std::thread A-thread-locals Area: Thread local storage (TLS) C-bug Category: This is a bug. disposition-close This PR / issue is in PFCP or FCP with a disposition to close it. final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests