mpsc::Receiver::recv_timeout may panic in weird edgecase #54267

@kpcyrd

Description

I'm currently stuck on a very obscure edge case: the first call to recv_timeout panics the program with an assertion failure if no message is received before the timeout. If a message is sent before that timeout, the program doesn't crash, even if the second call to recv_timeout does hit the timeout:

This does work

  1. recv_timeout(100ms) // message is sent within that time
  2. message is processed
  3. recv_timeout(100ms) // no message within timeout
  4. program doesn't crash and tries again

This doesn't work

  1. recv_timeout(100ms) // no message within timeout
  2. assertion failure in stdlib panics the program

Backtrace

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `112055772029824`,
 right: `0`', libstd/sync/mpsc/shared.rs:253:13
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
             at libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1: std::sys_common::backtrace::print
             at libstd/sys_common/backtrace.rs:71
             at libstd/sys_common/backtrace.rs:59
   2: std::panicking::default_hook::{{closure}}
             at libstd/panicking.rs:211
   3: std::panicking::default_hook
             at libstd/panicking.rs:227
   4: std::panicking::rust_panic_with_hook
             at libstd/panicking.rs:475
   5: std::panicking::continue_panic_fmt
             at libstd/panicking.rs:390
   6: std::panicking::begin_panic_fmt
             at libstd/panicking.rs:345
   7: <std::sync::mpsc::shared::Packet<T>>::decrement
             at /checkout/src/libstd/macros.rs:78
   8: <std::sync::mpsc::shared::Packet<T>>::recv
             at /checkout/src/libstd/sync/mpsc/shared.rs:232
   9: <std::sync::mpsc::Receiver<T>>::recv_deadline
             at /checkout/src/libstd/sync/mpsc/mod.rs:1387
  10: <std::sync::mpsc::Receiver<T>>::recv_timeout
             at /checkout/src/libstd/sync/mpsc/mod.rs:1300
[my code starts here]

Code from stdlib

This is the code in question. The first assert fails for unknown reasons (not sure how to_wake is used):

    // Essentially the exact same thing as the stream decrement function.
    // Returns true if blocking should proceed.
    fn decrement(&self, token: SignalToken) -> StartResult {
        unsafe {
            assert_eq!(self.to_wake.load(Ordering::SeqCst), 0);
            let ptr = token.cast_to_usize();
            self.to_wake.store(ptr, Ordering::SeqCst);

            let steals = ptr::replace(self.steals.get(), 0);

            match self.cnt.fetch_sub(1 + steals, Ordering::SeqCst) {
                DISCONNECTED => { self.cnt.store(DISCONNECTED, Ordering::SeqCst); }
                // If we factor in our steals and notice that the channel has no
                // data, we successfully sleep
                n => {
                    assert!(n >= 0);
                    if n - steals <= 0 { return Installed }
                }
            }

            self.to_wake.store(0, Ordering::SeqCst);
            drop(SignalToken::cast_from_usize(ptr));
            Abort
        }
    }
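
For anyone else unsure how to_wake is used, here is a toy model of the invariant the first assert_eq! seems to check, based only on the snippet above: the slot must hold 0 before a blocking receiver stores its token, and it is reset to 0 on the abort path. `WakeSlot`, `install`, `clear`, and `install_twice_without_clear` are hypothetical names, and this is not the standard library's implementation. Under that reading, a nonzero left value would be a token left behind by an earlier call, but whether that is what happens here is an assumption.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Toy stand-in for the `to_wake` field (hypothetical, not std's type).
/// 0 means "no receiver is parked"; any other value is a token.
struct WakeSlot {
    to_wake: AtomicUsize,
}

impl WakeSlot {
    fn new() -> Self {
        WakeSlot { to_wake: AtomicUsize::new(0) }
    }

    /// Mirrors the start of `decrement`: the slot must be empty before a
    /// token is stored. Returns the leftover value where std would assert.
    fn install(&self, token: usize) -> Result<(), usize> {
        let current = self.to_wake.load(Ordering::SeqCst);
        if current != 0 {
            return Err(current);
        }
        self.to_wake.store(token, Ordering::SeqCst);
        Ok(())
    }

    /// Mirrors the abort path: reset the slot so the next call can install.
    fn clear(&self) {
        self.to_wake.store(0, Ordering::SeqCst);
    }
}

/// Installs twice without clearing and returns the second result.
fn install_twice_without_clear() -> Result<(), usize> {
    let slot = WakeSlot::new();
    slot.install(0xdead_beef).expect("slot starts empty");
    slot.install(0x1234)
}
```

In this model the second `install` reports the stale token, while an `install`, `clear`, `install` sequence succeeds. That matches the shape of the panic message, where left is a large pointer-like number and right is 0.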

Failed attempt to reproduce

The issue is 100% reliable in my codebase (the number changes, but the panic is always the same, even after full rebuilds). I've tried to build a test case that is structured the same way as my program, but it fails to reproduce the issue. The full code base isn't public yet.

use std::thread;
use std::sync::mpsc;
use std::time::Duration;

enum Event {
    Tick,
    Done,
}

fn main() {
    let (tx, rx) = mpsc::channel();

    thread::spawn(move || {
        thread::sleep(Duration::from_secs(3));
        tx.send(Event::Tick).unwrap();
        thread::sleep(Duration::from_secs(3));
        tx.send(Event::Done).unwrap();
    });

    loop {
        match rx.recv_timeout(Duration::from_secs(100)) {
            Ok(Event::Tick) => println!("tick"),
            Ok(Event::Done) => break,
            Err(mpsc::RecvTimeoutError::Timeout) => (),
            Err(mpsc::RecvTimeoutError::Disconnected) => break,
        }
    }
}
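
In the test case above, the first message arrives after 3 seconds while the timeout is 100 seconds, so the Timeout arm is probably never reached. A variant that follows the failing sequence more closely lets the first recv_timeout expire before anything is sent. This is only a sketch: the 100 ms timeout comes from the description, while the 500 ms sender delay and the helper name `timeouts_before_first_event` are assumptions, and it is not known whether this ordering alone triggers the panic.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum Event {
    Tick,
}

/// Calls recv_timeout until the first message arrives and returns how many
/// timeouts were seen before it, plus the message itself (if any).
fn timeouts_before_first_event() -> (u32, Option<Event>) {
    let (tx, rx) = mpsc::channel();

    // The sender waits well past the first timeout before sending.
    let sender = thread::spawn(move || {
        thread::sleep(Duration::from_millis(500));
        tx.send(Event::Tick).unwrap();
    });

    let mut timeouts = 0;
    let event = loop {
        match rx.recv_timeout(Duration::from_millis(100)) {
            Ok(event) => break Some(event),
            Err(RecvTimeoutError::Timeout) => timeouts += 1,
            Err(RecvTimeoutError::Disconnected) => break None,
        }
    };
    sender.join().unwrap();
    (timeouts, event)
}
```

Calling this from main and printing the result shows whether the timeout path ran before the first message was received.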

Random thoughts

I suspect that a dependency with unsound unsafe code might be at fault, but I'm running out of ideas for how to debug this (I had some issues with valgrind and am still working on getting it to run). Some pointers would be appreciated.

System info

Arch Linux with stable rustc from rustup:

rustc 1.29.0 (aa3ca1994 2018-09-11)
cargo 1.29.0 (524a578d7 2018-08-05)
