
Racy test failure: segfault in map::tree_bins::concurrent_tree_bin #84

Closed
jonhoo opened this issue Apr 10, 2020 · 5 comments · Fixed by #85
Labels: bug

Comments
jonhoo (Owner) commented Apr 10, 2020

This is the problem mentioned previously, now factored out into its own issue. Basically, the map::tree_bins::concurrent_tree_bin test occasionally segfaults for me without a backtrace. I can reproduce it on current nightly on Linux by running this command for a while:

$ while cargo test --lib map::tree_bins::concurrent_tree_bin -- --test-threads=1 --nocapture; do :; done

Using gdb, I managed to capture a stack trace:

Thread 21 "flurry-27205002" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff5632700 (LWP 349291)]
std::thread::Thread::unpark () at src/libstd/thread/mod.rs:1191
1191    src/libstd/thread/mod.rs: No such file or directory.
(gdb) bt
#0  std::thread::Thread::unpark () at src/libstd/thread/mod.rs:1191
#1  0x00005555555eadb2 in flurry::node::TreeBin<K,V>::find (bin=..., hash=0, key=0x7ffff56317f8, guard=0x7ffff56317b8) at src/node.rs:472
#2  0x00005555555e3594 in flurry::raw::Table<K,V>::find (self=0x5555557839d0, bin=0x555555784f70, hash=0, key=0x7ffff56317f8, guard=0x7ffff56317b8) at src/raw/mod.rs:174
#3  0x00005555555bc56c in flurry::map::HashMap<K,V,S>::get_node (self=0x555555780b40, key=0x7ffff56317f8, guard=0x7ffff56317b8) at src/map.rs:1314
#4  0x00005555555bcd1e in flurry::map::HashMap<K,V,S>::get (self=0x555555780b40, key=0x7ffff56317f8, guard=0x7ffff56317b8) at src/map.rs:1387
#5  0x000055555559540e in flurry::map::tree_bins::concurrent_tree_bin::{{closure}} () at src/map.rs:3406
#6  0x00005555555abe91 in std::sys_common::backtrace::__rust_begin_short_backtrace (f=...) at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/sys_common/backtrace.rs:130
#7  0x000055555558f4a1 in std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}} () at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/thread/mod.rs:475
#8  0x00005555555d6db1 in <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once (self=..., _args=()) at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panic.rs:318
#9  0x00005555555a353a in std::panicking::try::do_call (data=0x7ffff5631998 "0\vxUUU\000") at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:331
#10 0x00005555555a370d in __rust_try ()
#11 0x00005555555a3393 in std::panicking::try (f=...) at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:274
#12 0x00005555555d6e31 in std::panic::catch_unwind (f=...) at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panic.rs:394
#13 0x000055555558ec19 in std::thread::Builder::spawn_unchecked::{{closure}} () at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/thread/mod.rs:474
#14 0x00005555555875ae in core::ops::function::FnOnce::call_once{{vtable-shim}} () at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ops/function.rs:232
#15 0x00005555556cf61f in <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once () at /rustc/94d346360da50f159e0dc777dc9bc3c5b6b51a00/src/liballoc/boxed.rs:1008
#16 0x00005555556e25b3 in <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once () at /rustc/94d346360da50f159e0dc777dc9bc3c5b6b51a00/src/liballoc/boxed.rs:1008
#17 std::sys::unix::thread::Thread::new::thread_start () at src/libstd/sys/unix/thread.rs:87
#18 0x00007ffff7f7746f in start_thread () from /usr/lib/libpthread.so.0
#19 0x00007ffff7e8d3d3 in clone () from /usr/lib/libc.so.6
domenicquirl (Collaborator) commented:

Also unable to reproduce on Windows stable as of now. The location makes me think of the "race" discussed in #72 (review) as a likely candidate. We talked about how the tokens are required because they might have to be available before the waiting thread parks. However, your concern there may still be valid: the reading thread can see the stored WAITER bit in lock_state and load the stored waiter as non-null, but in the meantime the writing thread re-checks lock_state and not only swaps out the waiter, but also frees it immediately with into_owned. That cleanup might need to be a defer_destroy to handle this case.

domenicquirl (Collaborator) commented:

I need to leave now and will have to come back to this (and maybe set up a Linux nightly for testing). If you have time, maybe try this out in the meantime.

jonhoo (Owner, Author) commented Apr 10, 2020

Hmm, I wonder why the Java code does not have to deal with that...

domenicquirl (Collaborator) commented:

If this ends up being the cause, it would be because in the Java code the reading thread holds a reference to the Thread handle in question, so it cannot get GC'd out from under it (they don't [have to] use atomic pointers there).

jonhoo (Owner, Author) commented Apr 10, 2020

Ah, that's a good point. Let me try making that a deferred destroy.

jonhoo added a commit that referenced this issue Apr 10, 2020
It is possible for someone to try to wake a waiting thread while that
thread is spinning. In that case, the waiting thread may decide to free
its waker just before the waking thread tries to wake the waiting one,
resulting in a use-after-free.

This fixes that by deferring the freeing of the waker until the next
epoch.

Seems to fix #84.
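
Under the same assumptions as the sketch above (a crossbeam-epoch Atomic<Thread> waiter slot; not flurry's actual code), the deferred variant described in the commit message looks roughly like this:

use crossbeam_epoch::{self as epoch, Atomic, Shared};
use std::sync::atomic::Ordering;
use std::thread::Thread;

// Writer side with deferred reclamation: the swap still detaches the waker, but the
// Thread handle is only dropped once the epoch has advanced past all threads that
// were pinned when it was detached, so a concurrent unpark() still sees valid memory.
fn clear_waiter_deferred(waiter: &Atomic<Thread>, guard: &epoch::Guard) {
    let w = waiter.swap(Shared::null(), Ordering::SeqCst, guard);
    if !w.is_null() {
        unsafe { guard.defer_destroy(w) };
    }
}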