Panic in Receiver::recv() #39364
Comments
@jamespharaoh do you have a standalone example that could be used to reproduce this?
I've attempted to recreate this in a simple piece of code but haven't had any success. I will keep trying, but perhaps someone can give me some advice... I have a feeling this bug might be in some other synchronization that's going on, since I'm only creating one Sender and one Receiver; the sender is then passed around behind an `Arc<Mutex<_>>`. So it seems to me that if part of the functionality to do that is not correctly synchronizing the state as the sender is cloned, locked, unlocked, and sent between threads, then it could certainly produce incorrect behaviour in the channel. I am also using various C libraries etc., but it seems unlikely that a problem with those would so reliably cause an error in the same place. Does this also seem reasonable?
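A hypothetical sketch of the setup described above (the message type, thread count, and sharing pattern are invented for illustration; this is not the original code):

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel::<u32>();
    // One Sender, shared between threads behind an Arc<Mutex<_>>;
    // each thread locks the mutex, clones the sender, and sends.
    let shared_tx = Arc::new(Mutex::new(tx));

    let handles: Vec<_> = (0..4)
        .map(|i| {
            let shared_tx = Arc::clone(&shared_tx);
            thread::spawn(move || {
                let tx = shared_tx.lock().unwrap().clone();
                tx.send(i).unwrap();
            })
        })
        .collect();

    for _ in 0..4 {
        println!("got {}", rx.recv().unwrap());
    }
    for h in handles {
        h.join().unwrap();
    }
}
```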
Yeah, if there's other unsafe code, that may be something to take a look at as well, but otherwise we haven't seen a segfault in channels in a very long time, so nothing jumps to mind unfortunately :(
The other unsafe code is just compression libraries linked in, well trusted and heavily used, and accessed in a very thread-safe way. And despite lots of concurrency they work flawlessly; the only error I am seeing is always this same receiver issue. My next experiment to reproduce it will add in another element from the app where I see the problem, which is your ...
alexcrichton referenced this issue on Mar 2, 2017: panic "entered unreachable code" triggered when iterating over mpsc Receiver #40156 (Open)
Note that #40156 is similar to this, so that makes me pretty confident it's not the fault of your local unsafe code. @jamespharaoh did you get anywhere with more investigation?
I haven't had a chance but will try soon.
I have a similar problem. Backtrace:

```
    at /checkout/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
 1: 0x56287ebb9a54 - std::sys_common::backtrace::_print::hd8a1b72dcf3955ef
    at /checkout/src/libstd/sys_common/backtrace.rs:71
 2: 0x56287ebc149c - std::panicking::default_hook::{{closure}}::h5ff605bba7612658
    at /checkout/src/libstd/sys_common/backtrace.rs:60
    at /checkout/src/libstd/panicking.rs:355
 3: 0x56287ebc1064 - std::panicking::default_hook::h9bc4f6dfee57d6bd
    at /checkout/src/libstd/panicking.rs:371
 4: 0x56287ebc18eb - std::panicking::rust_panic_with_hook::hdc01585dc2bf7122
    at /checkout/src/libstd/panicking.rs:549
 5: 0x56287ebc17c4 - std::panicking::begin_panic::hf84f4975d9f9b642
    at /checkout/src/libstd/panicking.rs:511
 6: 0x56287ebc16f9 - std::panicking::begin_panic_fmt::hcc3f360b2ba80419
    at /checkout/src/libstd/panicking.rs:495
 7: 0x56287e3cbbc6 - <std::sync::mpsc::shared::Packet<T>>::decrement::he4fa9520181c5c85
    at /checkout/src/libstd/macros.rs:51
 8: 0x56287e3c806f - <std::sync::mpsc::shared::Packet<T>>::recv::h3c95f5bc336537aa
    at /checkout/src/libstd/sync/mpsc/shared.rs:232
 9: 0x56287e3ad379 - <std::sync::mpsc::Receiver<T>>::recv_max_until::h950909094e0767d9
    at /checkout/src/libstd/sync/mpsc/mod.rs:966
10: 0x56287e3acc85 - <std::sync::mpsc::Receiver<T>>::recv_timeout::hf72a64a0530efaa1
    at /checkout/src/libstd/sync/mpsc/mod.rs:940
11: 0x56287e466841 - <mould_extension::ConnectExtensionWorker as mould::worker::Worker<T>>::realize::hf8b7190433e70336
    at /home/denis/vxrevenue/cloud/sub/mould-extension/src/lib.rs:129
12: 0x56287e41cd77 - mould::server::process_session::{{closure}}::h0572e63ea7bd3be9
    at /home/denis/vxrevenue/cloud/sub/mould/src/server.rs:83
13: 0x56287e41b69a - mould::server::process_session::h54610d99cf99088f
    at /home/denis/vxrevenue/cloud/sub/mould/src/server.rs:44
14: 0x56287e41f76b - mould::server::wsmould::start::{{closure}}::hb36d2fb80ee5ded6
    at /home/denis/vxrevenue/cloud/sub/mould/src/server.rs:272
15: 0x56287e469345 - <std::panic::AssertUnwindSafe<F> as core::ops::FnOnce<()>>::call_once::h8ef904bc75108aeb
    at /checkout/src/libstd/panic.rs:296
16: 0x56287e3a2e7a - std::panicking::try::do_call::h0979d3031b45f486
    at /checkout/src/libstd/panicking.rs:454
17: 0x56287ebc897a - __rust_maybe_catch_panic
    at /checkout/src/libpanic_unwind/lib.rs:98
18: 0x56287e3a26de - std::panicking::try::h42b9334978084c46
    at /checkout/src/libstd/panicking.rs:433
19: 0x56287e39d4a3 - std::panic::catch_unwind::h5ea213ef0eb7edd1
    at /checkout/src/libstd/panic.rs:361
20: 0x56287e3a1a86 - std::thread::Builder::spawn::{{closure}}::h1288ffa1c4d83635
    at /checkout/src/libstd/thread/mod.rs:360
21: 0x56287e3fca66 - <F as alloc::boxed::FnBox<A>>::call_box::h1b125a486a246990
    at /checkout/src/liballoc/boxed.rs:640
22: 0x56287ebc0714 - std::sys::imp::thread::Thread::new::thread_start::h75b208405df6dcf1
    at /checkout/src/liballoc/boxed.rs:650
    at /checkout/src/libstd/sys_common/thread.rs:21
    at /checkout/src/libstd/sys/unix/thread.rs:84
23: 0x7f7d554096c9 - start_thread
24: 0x7f7d54f2cf7e - clone
25: 0x0 - <unknown>
```
Maybe it's important:

```
thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
(left: `140664964694432`, right: `0`)', /checkout/src/libstd/sync/mpsc/shared.rs:503
```

The failing assertion is here (https://doc.rust-lang.org/src/std/sync/mpsc/shared.rs.html#503):

```rust
impl<T> Drop for Packet<T> {
    fn drop(&mut self) {
        // Note that this load is not only an assert for correctness about
        // disconnection, but also a proper fence before the read of
        // `to_wake`, so this assert cannot be removed with also removing
        // the `to_wake` assert.
        assert_eq!(self.cnt.load(Ordering::SeqCst), DISCONNECTED);
        /*503*/ assert_eq!(self.to_wake.load(Ordering::SeqCst), 0);
        assert_eq!(self.channels.load(Ordering::SeqCst), 0);
    }
}
```
It's not only one thread that panics. Another thread panics here, in `decrement`:

```rust
fn decrement(&self, token: SignalToken) -> StartResult {
    unsafe {
        /*253*/ assert_eq!(self.to_wake.load(Ordering::SeqCst), 0);
        let ptr = token.cast_to_usize();
        self.to_wake.store(ptr, Ordering::SeqCst);
```
Hi @alexcrichton, I've made an example which often fails and sometimes panics on receive. To show that it panics, here is a trace I've taken:

```
thread '<unnamed>' panicked at 'assertion failed: `(left == right)` (left: `140671335870496`, right: `0`)', /checkout/src/libstd/sync/mpsc/shared.rs:253
note: Run with `RUST_BACKTRACE=1` for a backtrace.
thread '<unnamed>' panicked at 'sending request: "SendError(..)"', /checkout/src/libcore/result.rs:859
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Any', /checkout/src/libcore/result.rs:859
```

Sometimes it even loses messages and can block. I hope it helps...
I've improved logging to make sure that the messages are sent correctly. I noticed a regularity: it breaks when the sender instance is moved (i.e. when ownership of the only sender instance is transferred):

```
DEBUG:rfail: - - - - - - - - - - - - - - - - - - - -
DEBUG:rfail: >>>>> 716 >>>>>
DEBUG:rfail: 716 - found : 0
DEBUG:rfail: 716 - request -> 0
DEBUG:rfail: 716 - response <- 0
DEBUG:rfail: 716 - found : 1
DEBUG:rfail: 716 - request -> 1
DEBUG:rfail: 716 - response <- 1
DEBUG:rfail: 716 - found : 2
DEBUG:rfail: 716 - request -> 2
DEBUG:rfail: 716 - response <- 2
DEBUG:rfail: <<<<< 716 <<<<<
DEBUG:rfail: - - - - - - - - - - - - - - - - - - - -
DEBUG:rfail: >>>>> 717 >>>>>
DEBUG:rfail: 717 - found : 0
thread '<unnamed>' panicked at 'assertion failed: `(left == right)` (left: `139883544888960`, right: `0`)', /checkout/src/libstd/sync/mpsc/shared.rs:253
note: Run with `RUST_BACKTRACE=1` for a backtrace.
thread '<unnamed>' panicked at 'receive result: RecvError', /checkout/src/libcore/result.rs:859
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Any', /checkout/src/libcore/result.rs:859
```

Any ideas how I can detect, catch, and debug it?
@DenisKolodin awesome, thanks so much for the recent investigation into this! The failure here I believe is specifically happening with `recv_timeout`. Locally I was unable to reproduce it with plain `recv`. It'd be great if you could help out investigating libstd in this regard, but I'll try to get around to taking a look soon too.
@alexcrichton I've checked it again with your suggestions. You are absolutely right: this bug appears only with `recv_timeout`.
arielb1 added the T-libs and I-wrong labels on Apr 6, 2017
FWIW, I've reduced @DenisKolodin's example to a simpler one. In a debug build it usually hits one of the two assertions within 1000 iterations on my MacBook.
stepancheg added two commits to stepancheg/rust that referenced this issue on Jun 24, 2017
alexcrichton referenced this issue on Jun 25, 2017: Assertion failure in mpsc/shared.rs on stable #42852 (Closed)
Mark-Simulacrum added the C-bug label and removed the I-wrong label on Jul 27, 2017
daniel-e commented Aug 7, 2017:

I ran into the same problem. Here's an even easier example of how to trigger the bug:
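A hypothetical reconstruction of such a minimal trigger (not the original code; it exercises the `recv_timeout` + sender-clone race described throughout this thread):

```rust
use std::sync::mpsc::channel;
use std::thread;
use std::time::Duration;

fn main() {
    // Repeat many times to make the race likely to fire.
    for _ in 0..10_000 {
        let (tx, rx) = channel::<u32>();

        let sender = thread::spawn(move || {
            // Cloning the Sender upgrades the internal one-shot channel
            // to a shared channel. If the receiver is blocked inside
            // recv_timeout at that moment, the upgrade can race with the
            // receiver's wake token and trip the shared.rs assertions.
            let tx2 = tx.clone();
            let _ = tx2.send(1);
        });

        let _ = rx.recv_timeout(Duration::from_millis(1));
        let _ = sender.join();
    }
}
```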
lotabout added a commit to lotabout/skim that referenced this issue on Sep 21, 2017
rce added a commit to thomaxius-and-co/lemon_bot_discord that referenced this issue on Oct 14, 2017
@imnotbad ran into the same issue with a GTK project. I reduced the problem and investigated it. My reproduction case can be found in this playpen.

The problem is the following (rust/src/libstd/sync/mpsc/mod.rs, lines 871 to 876 in 7360d6d): the upgrade performed on the sender side inherits the receiver's wake_token, which results in the wake_token being removed from the receiver while it is still blocked, tripping the assertion in rust/src/libstd/sync/mpsc/shared.rs, lines 251 to 253 in 7360d6d.

My idea for a solution to the problem would be to remove the inheritance of the wake_token on the sender side. Instead, after the sender has performed the upgrade, it should wake up the receiver, which then cleans the signal_token, upgrades itself, and registers a new signal_token (rust/src/libstd/sync/mpsc/shared.rs, lines 100 to 128 in 7360d6d).
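To make the race concrete, here is a hypothetical interleaving consistent with the analysis above (the function and field names are those of the std internals quoted earlier in this thread):

```rust
// Thread R (receiver)                     Thread S (sender)
//
// rx.recv_timeout(..) blocks:
//   Packet::recv stores a SignalToken
//   into `to_wake`
//                                         tx.clone() upgrades the
//                                         one-shot channel to a shared
//                                         Packet, inheriting the
//                                         receiver's wake token
// the timeout fires; the receiver
// retries and calls Packet::decrement(),
// which asserts `to_wake == 0` -- but
// the inherited token is still
// registered, so the assertion at
// shared.rs:253 fails
```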
matthauck added a commit to tanium/octobot that referenced this issue on Jun 19, 2018
matthauck referenced this issue on Jun 19, 2018: Avoid uses of Receiver::recv_timeout to avoid panic in tests #185 (Merged)
I see, servo still has 3 uses of `recv_timeout`.
@stepancheg Out of interest: why is it hard to incorporate the selection mechanism / to fix the race condition with it in place? From my point of view it shouldn't be much more than an additional field containing a SignalToken on each selected channel, which is used instead of the single blocking receiver, with information about which channel has woken it. I can imagine the stealing logic being much more complicated.

Another general question: from what I've heard and seen about / in crossbeam-channel, it seems to do everything std::mpsc does and more, and parts of it are even more efficient. For example, crossbeam-channel was the only bounded spsc implementation I found which is lock-free when data is available. Would it be possible to replace the current mpsc implementation in std with crossbeam-channel?
I think I did a patch which fixes the race condition and preserves the selection mechanism. It's doable. But proving (and understanding) why it is correct is harder than without select.

Crossbeam may have its own bugs, may have slightly different semantics, and has parts not needed in the stdlib API. That said, I think it's possible. But generally, I don't think that ... Moreover, selecting from multiple sources may be less performant than using a single channel, so ... In my opinion, ...
bors added a commit that referenced this issue on Aug 14, 2018
ordovicia referenced this issue on Sep 16, 2018: mpsc::Receiver::recv_timeout may panic in weird edgecase #54267 (Closed)
kpcyrd commented Sep 16, 2018:

It seems #54267 is the same issue @DenisKolodin describes. I'm hitting this bug using the channel from ...

What's interesting: it seems I'm able to work around this bug by making sure the first call to `send` happens before the receiver ever blocks (a sketch of this follows).
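A minimal sketch of that kind of workaround, assuming (per the analysis earlier in this thread) that the panic requires the one-shot-to-shared upgrade to race with a blocked receiver; the dummy value and timings are invented for illustration:

```rust
use std::sync::mpsc::channel;
use std::thread;
use std::time::Duration;

fn main() {
    let (tx, rx) = channel::<u32>();

    // Force the one-shot -> shared upgrade while the receiver is NOT
    // blocked: clone the sender and push a dummy message through before
    // any recv_timeout call can register a wake token.
    let tx2 = tx.clone();
    tx.send(0).expect("dummy send");
    assert_eq!(rx.recv(), Ok(0)); // drain the dummy message

    // Only now hand the senders to worker threads and start waiting
    // with a timeout, as before.
    thread::spawn(move || {
        let _ = tx2.send(42);
    });
    drop(tx);
    match rx.recv_timeout(Duration::from_millis(100)) {
        Ok(msg) => println!("got {}", msg),
        Err(err) => println!("recv_timeout error: {}", err),
    }
}
```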
As you seem to have investigated the issue quite a lot, you might be interested in my technical summary here (#39364 (comment)) if you haven't seen it already. My guess why you can work around the issue by sending a message is that ...
kpcyrd commented Sep 21, 2018:

@oberien thanks for the hint. I still got occasional panics when sending 1 dummy message; I'm now sending 2 dummy messages and hope to work around the bug that way. I still hope somebody with the right skills is able to fix it; my code looks a bit awkward right now.
kpcyrd commented Sep 22, 2018:

Sending two messages was still causing issues, so I've refactored my code base to use crossbeam-channel instead: https://docs.rs/crossbeam-channel/0.2.6/crossbeam_channel/#how-to-try-sendingreceiving-a-message-with-a-timeout

So far I haven't had any issues. I would recommend this as a solution to everybody else who runs into this bug.
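For reference, a minimal sketch of such a replacement, written against the current (0.5-era) crossbeam-channel API, whose `unbounded`/`recv_timeout` mirror std's Result-based signatures; the message value is invented for illustration:

```rust
// Cargo.toml: crossbeam-channel = "0.5"
use std::thread;
use std::time::Duration;

fn main() {
    let (tx, rx) = crossbeam_channel::unbounded::<u32>();

    thread::spawn(move || {
        // Cloning a crossbeam Sender does not go through the
        // one-shot -> shared upgrade that std's mpsc performs.
        let tx2 = tx.clone();
        let _ = tx2.send(42);
    });

    // Same call shape as std::sync::mpsc::Receiver::recv_timeout.
    match rx.recv_timeout(Duration::from_millis(100)) {
        Ok(msg) => println!("got {}", msg),
        Err(err) => println!("timed out or disconnected: {}", err),
    }
}
```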
akshayknarayan added a commit to ccp-project/portus that referenced this issue on Sep 26, 2018
akshayknarayan referenced this issue on Sep 26, 2018: channel IPC sometimes segfaults when running integration tests #58 (Closed)
akshayknarayan added another commit to ccp-project/portus that referenced this issue on Sep 26, 2018
majecty added a commit to CodeChain-io/codechain-agent-client that referenced this issue on Oct 31, 2018
stearnsc commented Nov 14, 2018:

I'm not sure if this is the same issue or if I should open a new one, but I'm getting an extremely intermittent panic: ...

My basic structure is: the tx of a channel is cloned for each incoming request in an async hyper service, then moved along as part of a context for handling the request. A dedicated thread loops reading from the rx (...).

Googling around I found viperscape/oyashio#3, which appears to show the same symptoms, but it's harder for me to know for sure it's the same issue, given it's in a library that wraps mpsc. For what it's worth, the reproduction in that issue (https://github.com/rohitjoshi/oyashio_test/) panics consistently.

I'm happy to include the full stacktrace if that's helpful.
please do :)
stearnsc commented Nov 15, 2018 (edited):

@Centril here ya go :)
jamespharaoh commented Jan 28, 2017 (edited):

I'm having a problem using a channel to send notifications to a background thread... As I understand it, this shouldn't be possible? Receiver::recv() is not supposed to panic, is it? I'm not doing anything unsafe...

I mentioned this on Reddit and burntsushi confirmed that this looks like a bug. I will try to produce some simpler code to reproduce this but don't have time at the moment. (Edit: please nag me if I don't produce an example; I have a vague idea what will do it.)

I've tried this with stable (1.14) and nightly and get the same result.

A copy of the code which generates the error is available here:
https://github.com/jamespharaoh/rust-output/tree/channel_recv_panic