
Racy test failure: segfault in map::tree_bins::concurrent_tree_bin #84

Closed
jonhoo opened this issue Apr 10, 2020 · 5 comments · Fixed by #85
Labels: bug

Comments
jonhoo (Owner) commented Apr 10, 2020

This is the problem mentioned previously, now factored out into its own issue. Basically, the map::tree_bins::concurrent_tree_bin test occasionally segfaults for me without a backtrace. I can reproduce it on current nightly on Linux by running this command for a while:

$ while cargo test --lib map::tree_bins::concurrent_tree_bin -- --test-threads=1 --nocapture; do :; done

Using gdb, I managed to capture a stack trace:

Thread 21 "flurry-27205002" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff5632700 (LWP 349291)]
std::thread::Thread::unpark () at src/libstd/thread/mod.rs:1191
1191    src/libstd/thread/mod.rs: No such file or directory.
(gdb) bt
#0  std::thread::Thread::unpark () at src/libstd/thread/mod.rs:1191
#1  0x00005555555eadb2 in flurry::node::TreeBin<K,V>::find (bin=..., hash=0, key=0x7ffff56317f8, guard=0x7ffff56317b8) at src/node.rs:472
#2  0x00005555555e3594 in flurry::raw::Table<K,V>::find (self=0x5555557839d0, bin=0x555555784f70, hash=0, key=0x7ffff56317f8, guard=0x7ffff56317b8) at src/raw/mod.rs:174
#3  0x00005555555bc56c in flurry::map::HashMap<K,V,S>::get_node (self=0x555555780b40, key=0x7ffff56317f8, guard=0x7ffff56317b8) at src/map.rs:1314
#4  0x00005555555bcd1e in flurry::map::HashMap<K,V,S>::get (self=0x555555780b40, key=0x7ffff56317f8, guard=0x7ffff56317b8) at src/map.rs:1387
#5  0x000055555559540e in flurry::map::tree_bins::concurrent_tree_bin::{{closure}} () at src/map.rs:3406
#6  0x00005555555abe91 in std::sys_common::backtrace::__rust_begin_short_backtrace (f=...) at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/sys_common/backtrace.rs:130
#7  0x000055555558f4a1 in std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}} () at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/thread/mod.rs:475
#8  0x00005555555d6db1 in <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once (self=..., _args=()) at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panic.rs:318
#9  0x00005555555a353a in std::panicking::try::do_call (data=0x7ffff5631998 "0\vxUUU\000") at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:331
#10 0x00005555555a370d in __rust_try ()
#11 0x00005555555a3393 in std::panicking::try (f=...) at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:274
#12 0x00005555555d6e31 in std::panic::catch_unwind (f=...) at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panic.rs:394
#13 0x000055555558ec19 in std::thread::Builder::spawn_unchecked::{{closure}} () at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/thread/mod.rs:474
#14 0x00005555555875ae in core::ops::function::FnOnce::call_once{{vtable-shim}} () at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ops/function.rs:232
#15 0x00005555556cf61f in <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once () at /rustc/94d346360da50f159e0dc777dc9bc3c5b6b51a00/src/liballoc/boxed.rs:1008
#16 0x00005555556e25b3 in <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once () at /rustc/94d346360da50f159e0dc777dc9bc3c5b6b51a00/src/liballoc/boxed.rs:1008
#17 std::sys::unix::thread::Thread::new::thread_start () at src/libstd/sys/unix/thread.rs:87
#18 0x00007ffff7f7746f in start_thread () from /usr/lib/libpthread.so.0
#19 0x00007ffff7e8d3d3 in clone () from /usr/lib/libc.so.6
domenicquirl (Collaborator) commented:

Also unable to reproduce on Windows stable as of now. The location makes me think of the "race" discussed in #72 (review) as a likely candidate. We talked about how the tokens are required because they might have to be available before the waiting thread parks. However, your concern there may still be valid: the reading thread can see the stored WAITER bit in lock_state and load the stored waiter as non-null, but in the meantime the writing thread re-checks lock_state and not only swaps out the waiter, but also frees it immediately with into_owned. That cleanup might need to be a defer_destroy to handle this case.

domenicquirl (Collaborator) commented:

I need to leave now and will have to come back to this (and maybe set up a Linux nightly for testing). If you have time, maybe try this out in the meantime.

jonhoo (Owner, Author) commented Apr 10, 2020

Hmm, I wonder why the Java code does not have to deal with that...

domenicquirl (Collaborator) commented:

If this ends up being the cause, it would be because in the Java code the reading thread holds a reference to the Thread handle in question, so it cannot get GC'd out from under it (they don't [have to] use atomic pointers there).

jonhoo (Owner, Author) commented Apr 10, 2020

Ah, that's a good point. Let me try making that a deferred destroy.

jonhoo added a commit that referenced this issue Apr 10, 2020
It is possible for someone to try to wake a waiting thread while that
thread is spinning. In that case, the waiting thread may decide to free
its waker just before the waking thread tries to wake the waiting one,
resulting in a use-after-free.

This fixes that by deferring the freeing of the waker until the next
epoch.

Seems to fix #84.
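
Under the same assumptions as the sketch above (a crossbeam-epoch Atomic<Thread> waiter slot; not flurry's actual code), the deferred variant described in the commit message looks roughly like this:

use crossbeam_epoch::{self as epoch, Atomic, Shared};
use std::sync::atomic::Ordering;
use std::thread::Thread;

// Writer side with deferred reclamation: the swap still detaches the waker, but the
// Thread handle is only dropped once the epoch has advanced past all threads that
// were pinned when it was detached, so a concurrent unpark() still sees valid memory.
fn clear_waiter_deferred(waiter: &Atomic<Thread>, guard: &epoch::Guard) {
    let w = waiter.swap(Shared::null(), Ordering::SeqCst, guard);
    if !w.is_null() {
        unsafe { guard.defer_destroy(w) };
    }
}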