Fix reaching unreachable code or deadlocks when unwinding in MPSC #58042

Open
wants to merge 1 commit into
base: master
from

4 participants
@jethrogb
Contributor

jethrogb commented Feb 1, 2019

Currently, if the stack is unwound while an MPSC channel receiver is blocked in a receive, the result is either the execution of unreachable!() or a deadlock. Some platforms can trigger unwinding while a thread is blocked, for example in response to an error condition.

This PR changes the MPSC drop logic so that unwinding works properly. It also adds tests for all MPSC variants. It can be tricky to trigger unwinding while a thread is blocked. Here I've used pthread_cancel on Linux.
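
As background for the description above, here is a minimal sketch of why drop logic matters during unwinding: destructors run while the stack unwinds, so a value's `Drop` can observe state left behind by an operation that was cut short. The `Guard` type and the `drop_runs_during_unwind` function are hypothetical names made up for this illustration. The sketch does not reproduce the MPSC bug itself, and it assumes the default `panic = "unwind"` strategy.

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a channel endpoint: it only records
/// whether its destructor ran.
struct Guard {
    dropped: Arc<AtomicBool>,
}

impl Drop for Guard {
    fn drop(&mut self) {
        // Runs on normal scope exit and also while the stack unwinds.
        self.dropped.store(true, Ordering::SeqCst);
    }
}

/// Panics while a `Guard` is alive and reports whether its `Drop` ran.
fn drop_runs_during_unwind() -> bool {
    let dropped = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&dropped);
    let result = panic::catch_unwind(move || {
        let _guard = Guard { dropped: flag };
        panic!("simulated unwind in the middle of an operation");
    });
    // The closure must have panicked, and the destructor must have run.
    result.is_err() && dropped.load(Ordering::SeqCst)
}

fn main() {
    assert!(drop_runs_during_unwind());
}
```

In the situation this PR targets, the value being dropped during unwinding is the blocked receiver, which is why the fix is in the drop logic.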

Fixes fortanix/rust-sgx#86

@rust-highfive

Collaborator

rust-highfive commented Feb 1, 2019

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

@jethrogb

Contributor Author

jethrogb commented Feb 1, 2019

I haven't fixed the stream and shared implementations yet; I wanted to get some feedback on this PR first.

@rust-highfive

Collaborator

rust-highfive commented Feb 1, 2019

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

travis_time:end:01643448:start=1549008525142296323,finish=1549008597642748465,duration=72500452142
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
Setting environment variables from .travis.yml
$ export IMAGE=x86_64-gnu-llvm-6.0
---
[01:10:27] FF..............................................
[01:10:27] thread 'main' panicked at 'Some tests failed', src/tools/compiletest/src/main.rs:502:22
[01:10:27] failures:
[01:10:27] 
[01:10:27] ---- [run-fail] run-fail/mpsc-recv-unwind/stream.rs stdout ----
[01:10:27] 
[01:10:27] error: Error: expected failure status (Signal(6)) but received status ExitStatus(ExitStatus(256)).
[01:10:27] status: exit code: 1
[01:10:27] command: "/checkout/obj/build/x86_64-unknown-linux-gnu/test/run-fail/mpsc-recv-unwind/stream/a"
[01:10:27] ------------------------------------------
[01:10:27] Deadlock detected
[01:10:27] 
[01:10:27] ------------------------------------------
[01:10:27] ------------------------------------------
[01:10:27] stderr:
[01:10:27] ------------------------------------------
[01:10:27] 
[01:10:27] ------------------------------------------
[01:10:27] 
[01:10:27] thread '[run-fail] run-fail/mpsc-recv-unwind/stream.rs' panicked at 'explicit panic', src/tools/compiletest/src/runtest.rs:3311:9
[01:10:27] 
[01:10:27] ---- [run-fail] run-fail/mpsc-recv-unwind/shared.rs stdout ----
[01:10:27] 
[01:10:27] error: Error: expected failure status (Signal(6)) but received status ExitStatus(ExitStatus(256)).
[01:10:27] status: exit code: 1
[01:10:27] command: "/checkout/obj/build/x86_64-unknown-linux-gnu/test/run-fail/mpsc-recv-unwind/shared/a"
[01:10:27] ------------------------------------------
[01:10:27] Deadlock detected
[01:10:27] 
[01:10:27] ------------------------------------------
[01:10:27] ------------------------------------------
[01:10:27] stderr:
[01:10:27] ------------------------------------------
[01:10:27] 
[01:10:27] ------------------------------------------
[01:10:27] 
[01:10:27] thread '[run-fail] run-fail/mpsc-recv-unwind/shared.rs' panicked at 'explicit panic', src/tools/compiletest/src/runtest.rs:3311:9
[01:10:27] 
[01:10:27] failures:
[01:10:27] failures:
[01:10:27]     [run-fail] run-fail/mpsc-recv-unwind/shared.rs
[01:10:27]     [run-fail] run-fail/mpsc-recv-unwind/stream.rs
[01:10:27] test result: FAILED. 145 passed; 2 failed; 1 ignored; 0 measured; 0 filtered out
[01:10:27] 
[01:10:27] 
[01:10:27] 
[01:10:27] 
[01:10:27] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/compiletest" "--compile-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib" "--run-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib" "--rustc-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "--src-base" "/checkout/src/test/run-fail" "--build-base" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/run-fail" "--stage-id" "stage2-x86_64-unknown-linux-gnu" "--mode" "run-fail" "--target" "x86_64-unknown-linux-gnu" "--host" "x86_64-unknown-linux-gnu" "--llvm-filecheck" "/usr/lib/llvm-6.0/bin/FileCheck" "--host-rustcflags" "-Crpath -O -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--target-rustcflags" "-Crpath -O -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--docck-python" "/usr/bin/python2.7" "--lldb-python" "/usr/bin/python2.7" "--gdb" "/usr/bin/gdb" "--quiet" "--llvm-version" "6.0.0\n" "--system-llvm" "--cc" "" "--cxx" "" "--cflags" "" "--llvm-components" "" "--llvm-cxxflags" "" "--adb-path" "adb" "--adb-test-dir" "/data/tmp/work" "--android-cross-path" "" "--color" "always"
[01:10:27] 
[01:10:27] 
[01:10:27] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test
[01:10:27] Build completed unsuccessfully in 0:11:04
[01:10:27] Build completed unsuccessfully in 0:11:04
[01:10:27] make: *** [check] Error 1
[01:10:27] Makefile:48: recipe for target 'check' failed
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:0103c0bb
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Fri Feb  1 09:20:35 UTC 2019
---
travis_time:end:1ba81ae2:start=1549012837125012095,finish=1549012837130425587,duration=5413492
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:1b5084f8
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|oBRT, Aborted.
#0  0x00007fce80c35428 in ?? ()
[Current thread is 1 (LWP 24071)]
#0  0x00007fce80c35428 in ?? ()
#1  0x00007fce80c3702a in ?? ()
#2  0x0000000000000020 in ?? ()
#3  0x0000000000000000 in ?? ()

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@alexcrichton

Member

alexcrichton commented Feb 1, 2019

Sorry I don't really understand what's motivating this PR. This seems like something that surely would have been caught in the half a decade these types have been in use. Is this something like the platform is causing a panic at a location that wasn't expected? If so, where?

@jethrogb

Contributor Author

jethrogb commented Feb 4, 2019

> Sorry I don't really understand what's motivating this PR

The four test cases I've added are what motivates this. Run them today and you'll hit unreachable! or a deadlock. Ideally, the tests would be run-pass tests, but I couldn't figure out how to write them that way, so instead I'm checking for a specific failure condition that indicates everything went as expected.

> this seems like something that surely would have been caught in the half a decade these types have been in use.

I guess not. Like I said, it can be tricky to trigger unwinding while a thread is blocked.

> Is this something like the platform is causing a panic at a location that wasn't expected? If so, where?

Yes, inside a call to mpsc::Receiver::recv

@bors

Contributor

bors commented Feb 9, 2019

☔️ The latest upstream changes (presumably #58316) made this pull request unmergeable. Please resolve the merge conflicts.

@alexcrichton

Member

alexcrichton commented Feb 11, 2019

Thanks for the explanations, but I don't think this is something that we'll want to support in the standard library. Thread cancellation seems like an additional feature for the channels in the standard library. I think it'd be fine to document that they're not compatible, but otherwise feature development for channels is encouraged to happen externally, in crates like crossbeam-channel, for this use case.

@jethrogb

Contributor Author

jethrogb commented Feb 12, 2019

I don't think you've understood the heart of the issue yet. The problem is not specific to thread cancellation. The issue occurs when the stack is unwound while an MPSC channel receiver is blocking on a receive. Thread cancellation is just an easy way to showcase this on Linux, but there are other ways to trigger unwinding at that point.

@alexcrichton

Member

alexcrichton commented Feb 12, 2019

Can you show an example test case that deadlocks without using thread cancellation?

@jethrogb

Contributor Author

jethrogb commented Feb 13, 2019

According to the Linux man page for pthread_cond_timedwait, it may return with EINTR. I haven't been able to figure out how to trigger that, so here I've simulated it by overriding pthread_cond_timedwait with my own stub implementation. This program never exits.

```rust
#![feature(rustc_private)] // for libc, you can also use crates.io libc instead.

extern crate libc;

#[no_mangle]
pub unsafe extern "C" fn pthread_cond_timedwait(
    _cond: *mut libc::pthread_cond_t,
    _mutex: *mut libc::pthread_mutex_t,
    _abstime: *const libc::timespec
) -> libc::c_int {
    *libc::__errno_location() = libc::EINTR;
    return 1;
}

fn main() {
    let (s, r) = std::sync::mpsc::channel();
    s.send(()).unwrap();
    r.recv().unwrap();
    s.send(()).unwrap();
    r.recv().unwrap();
    r.recv_timeout(std::time::Duration::from_millis(1)).unwrap()
}
```

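
For comparison, a minimal sketch of the same sequence of channel operations without the `pthread_cond_timedwait` override. The sender is still alive and the queue is empty at the final call, so `recv_timeout` would be expected to return `RecvTimeoutError::Timeout` after about a millisecond, and an `unwrap()` on it would panic rather than hang. The `baseline` function name is made up for this illustration.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::time::Duration;

/// Same send/recv sequence as the reproduction, minus the override.
/// Returns the result of the final `recv_timeout` instead of unwrapping it.
fn baseline() -> Result<(), RecvTimeoutError> {
    let (s, r) = mpsc::channel();
    s.send(()).unwrap();
    r.recv().unwrap();
    s.send(()).unwrap();
    r.recv().unwrap();
    // `s` is still in scope here, so the channel is not disconnected;
    // with nothing queued, the call should time out.
    r.recv_timeout(Duration::from_millis(1))
}

fn main() {
    assert_eq!(baseline(), Err(RecvTimeoutError::Timeout));
}
```

The difference between this outcome and a program that never exits is the behavior the reproduction is meant to highlight.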
@jethrogb

Contributor Author

jethrogb commented Feb 13, 2019

If unwinding were supported in Wasm, I'm pretty sure this would hang right now after printing the panic message:

```rust
#[cfg(test)]
extern crate wasm_bindgen_test;
#[cfg(test)]
use wasm_bindgen_test::*;

#[cfg(test)]
#[wasm_bindgen_test]
pub fn recv() {
    let (tx, rx) = std::sync::mpsc::channel::<()>();
    let _ = tx.clone();
    let _ = rx.recv();
}
```

@alexcrichton

Member

alexcrichton commented Feb 14, 2019

EINTR should be handled by the condvar implementation, and it's a bug if it isn't. Wasm is a special case that I don't think really counts for this, because it doesn't have threads anyway.
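
To illustrate what tolerating an early return from a condition-variable wait can look like, here is a minimal caller-side sketch using `std::sync::Condvar`: the wait sits in a loop that re-checks both the predicate and the remaining time, so a wakeup for any reason simply goes around the loop again. This is not the standard library's internal implementation, and `wait_until_ready` is a hypothetical helper named for this example.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Waits until the flag becomes true or `timeout` elapses.
/// Returns the final value of the flag.
fn wait_until_ready(pair: &(Mutex<bool>, Condvar), timeout: Duration) -> bool {
    let (lock, cvar) = pair;
    let deadline = Instant::now() + timeout;
    let mut ready = lock.lock().unwrap();
    while !*ready {
        let now = Instant::now();
        if now >= deadline {
            break;
        }
        // Whatever woke us (notification, spurious wakeup, or an early
        // return from the underlying wait), re-check the flag and the clock.
        let (guard, _timed_out) = cvar.wait_timeout(ready, deadline - now).unwrap();
        ready = guard;
    }
    *ready
}

fn main() {
    // A notifier thread sets the flag; the waiter should observe it.
    let pair = Arc::new((Mutex::new(false), Condvar::new()));
    let notifier = Arc::clone(&pair);
    let handle = thread::spawn(move || {
        *notifier.0.lock().unwrap() = true;
        notifier.1.notify_one();
    });
    assert!(wait_until_ready(&pair, Duration::from_secs(5)));
    handle.join().unwrap();

    // Nobody sets the flag: the wait gives up once the deadline passes.
    let idle = (Mutex::new(false), Condvar::new());
    assert!(!wait_until_ready(&idle, Duration::from_millis(10)));
}
```

Because the deadline is computed once up front, repeated early wakeups cannot extend the total wait beyond the requested timeout.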
