"died due to signal 11" in collectionstests on arm-android #55861

Open
kennytm opened this Issue Nov 11, 2018 · 13 comments

kennytm commented Nov 11, 2018

Symptom: The arm-android test failed with the following messages:

[01:44:38] died due to signal 11
[01:44:38] error: test failed, to rerun pass '--test collectionstests'
[01:44:38] 
[01:44:38] 
[01:44:38] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "test" "--target" "arm-linux-androideabi" "-j" "4" "--release" "--locked" "--color" "always" "--features" "panic-unwind backtrace" "--manifest-path" "/checkout/src/libstd/Cargo.toml" "-p" "alloc" "--"
[01:44:38] expected success, got: exit code: 3
[01:44:38] 
[01:44:38] 
[01:44:38] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test --target arm-linux-androideabi
[01:44:38] Build completed unsuccessfully in 1:32:50

Previous instances:

This might be caused by a mis-optimization like #49775.

alexcrichton commented Nov 12, 2018

Added #55052 (comment) to the list

kennytm commented Feb 5, 2019

I suspect #56869 (comment) on i686-musl is the same as this issue. Perhaps we should run collectionstests in Miri.

Program terminated with signal SIGSEGV, Segmentation fault.

#0  0x08166877 in unbin (i=8, c=0xa048008) at src/malloc/malloc.c:195
#1  malloc (n=<optimized out>) at src/malloc/malloc.c:322
#2  0x0814ddd9 in alloc () at src/libstd/sys/unix/alloc.rs:11
#3  __rdl_alloc () at src/libstd/alloc.rs:233
#4  0x08127c2d in alloc () at /rustc/b6fdcffc3d9d0b28d3a1fc34c49221ff13617b43/src/liballoc/alloc.rs:72
#5  exchange_malloc () at /rustc/b6fdcffc3d9d0b28d3a1fc34c49221ff13617b43/src/liballoc/alloc.rs:182
#6  new<core::cell::UnsafeCell<core::option::Option<core::result::Result<(), alloc::boxed::Box<Any>>>>> () at /rustc/b6fdcffc3d9d0b28d3a1fc34c49221ff13617b43/src/liballoc/sync.rs:288
#7  spawn_unchecked<closure,()> () at /rustc/b6fdcffc3d9d0b28d3a1fc34c49221ff13617b43/src/libstd/thread/mod.rs:458
#8  spawn<closure,()> () at /rustc/b6fdcffc3d9d0b28d3a1fc34c49221ff13617b43/src/libstd/thread/mod.rs:382
#9  test::run_test::run_test_inner::hb0857806220adfa0 () at src/libtest/lib.rs:1450
#10 0x0812645e in test::run_test::h5d2da69c3af16b8e () at src/libtest/lib.rs:1471
#11 0x08120a84 in run_tests<closure> () at src/libtest/lib.rs:1161
#12 test::run_tests_console::hd61c1544d577a32b () at src/libtest/lib.rs:957
#13 0x08119522 in test::test_main::h2355f0b379f819ba () at src/libtest/lib.rs:290
#14 0x08119c07 in test::test_main_static::h90a75711843ac1f7 () at src/libtest/lib.rs:324
#15 0x08106785 in collectionstests::main::h89799a166ae8ba23 ()

The stacktrace doesn't seem meaningful.

jethrogb commented Feb 13, 2019

Aaron1011 commented Feb 14, 2019

I've managed to reproduce this locally, using the same Android emulator used by CI. I'm working on creating a self-contained script to make it easy to test this locally.

Aaron1011 commented Feb 14, 2019

I believe I've obtained a backtrace from the emulator:

#0  0x0000662c in ?? ()
#1  0xb6d2c61c in __rust_maybe_catch_panic ()
   from /data/tmp/work/libstd-a3b039e1022c4e23.so
#2  0xb6dc76d0 in test::run_test::run_test_inner::h59a42c89d1b0ae7a ()
   from /data/tmp/work/libtest-0c5e281140db6463.so
#3  0xb6dc7368 in test::run_test::hd576cc177e253177 ()
   from /data/tmp/work/libtest-0c5e281140db6463.so
#4  0xb6dc49fc in test::run_tests_console::h8730cd4311bcbbcb ()
   from /data/tmp/work/libtest-0c5e281140db6463.so
#5  0xb6dc1284 in test::test_main::h343b975aae7c6b90 ()
   from /data/tmp/work/libtest-0c5e281140db6463.so
#6  0xb6dc148c in test::test_main_static::hfd242c8df31b861a ()
   from /data/tmp/work/libtest-0c5e281140db6463.so
#7  0xb6e6779c in std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::hf8fbc2e6d4c74fb5 ()
#8  0xb6d22244 in std::panicking::try::do_call::h49216886d8653049 ()
   from /data/tmp/work/libstd-a3b039e1022c4e23.so
#9  0xb6d2c61c in __rust_maybe_catch_panic ()
   from /data/tmp/work/libstd-a3b039e1022c4e23.so
#10 0xb6d206d4 in std::panic::catch_unwind::h9c8f0a4ad65b2a9d ()
   from /data/tmp/work/libstd-a3b039e1022c4e23.so
#11 0xb6cf8a10 in std::rt::lang_start_internal::hc01076141fd50efb ()
   from /data/tmp/work/libstd-a3b039e1022c4e23.so
#12 0xb6e589e4 in main ()

Interestingly enough, it appears that the process actually hangs in the emulator rather than dying. This allowed me to install gdb and attach to the process after the test had failed.

Aaron1011 commented Feb 14, 2019

I strongly suspect that this is related to the panic_safe test in src/liballoc/tests/slice.rs. However, I'm not certain, as I'm having difficulty narrowing things down to the offending test.

Aaron1011 commented Feb 14, 2019

Some additional results:

I added the following script as a test (it contains some logic from panic_safe):

use std::cell::Cell;
use std::thread;
use std::panic;

thread_local!(static SILENCE_PANIC: Cell<bool> = Cell::new(false));

#[test]
fn test_panic_hook() {
    let prev = panic::take_hook();
    panic::set_hook(Box::new(move |info| {
        if !SILENCE_PANIC.with(|s| s.get()) {
            prev(info);
        }
    }));

    for i in 0..1000 {
        let _ = thread::spawn(move || {
            SILENCE_PANIC.with(|s| s.set(true));
            panic!("Panicked from thread: {}", i);
        }).join();
    }
    println!("All done!");
}

I then manually uploaded it to the emulator, and invoked it in a loop with the following Bash script (run from /data/tmp/work):

export LD_LIBRARY_PATH='.'
while ./collectionstests-b33d04f51899f1c2 catch_panic; do :; done

After running this for about 15-20 minutes, I stopped the loop. For reasons I don't yet understand, several of the spawned processes had segfaulted without stopping the loop.

I then uploaded a static GDB binary from here to the emulator (this was the only way I could manage to get GDB to work).

I managed to obtain the following backtrace from my test program:

#0  0xb6c855a4 in __futex_syscall3 () from /system/lib/libc.so
#1  0xb6c7768c in __pthread_cond_timedwait_relative ()
   from /system/lib/libc.so
#2  0xb6c776ec in __pthread_cond_timedwait () from /system/lib/libc.so
#3  0xb6c7b726 in pthread_join () from /system/lib/libc.so
#4  0xb6d03810 in std::sys::unix::thread::Thread::join::h07b837055b9f571f ()
   from libstd-a3b039e1022c4e23.so
#5  0xb6e57ccc in _$LT$std..thread..JoinHandle$LT$T$GT$$GT$::join::hd81a7f2f5464521c ()
#6  0xb6e5e944 in collectionstests::catch_panic::test_panic_hook::h9f08377776546cb3 ()
#7  0xb6dc7ca8 in _$LT$F$u20$as$u20$alloc..boxed..FnBox$LT$A$GT$$GT$::call_box::h3970079b2a2e8325 () from libtest-0c5e281140db6463.so
#8  0xb6d3361c in __rust_maybe_catch_panic () from libstd-a3b039e1022c4e23.so
#9  0xb6dce6d0 in test::run_test::run_test_inner::h59a42c89d1b0ae7a ()
   from libtest-0c5e281140db6463.so
#10 0xb6dce368 in test::run_test::hd576cc177e253177 ()
   from libtest-0c5e281140db6463.so
#11 0xb6dcb9fc in test::run_tests_console::h8730cd4311bcbbcb ()
   from libtest-0c5e281140db6463.so
#12 0xb6dc8284 in test::test_main::h343b975aae7c6b90 ()
   from libtest-0c5e281140db6463.so
#13 0xb6dc848c in test::test_main_static::hfd242c8df31b861a ()
   from libtest-0c5e281140db6463.so
#14 0xb6e57424 in std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::hf8fbc2e6d4c74fb5 ()
#15 0xb6d29244 in std::panicking::try::do_call::h49216886d8653049 ()
   from libstd-a3b039e1022c4e23.so
#16 0xb6d3361c in __rust_maybe_catch_panic () from libstd-a3b039e1022c4e23.so
#17 0xb6d276d4 in std::panic::catch_unwind::h9c8f0a4ad65b2a9d ()
   from libstd-a3b039e1022c4e23.so
#18 0xb6cffa10 in std::rt::lang_start_internal::hc01076141fd50efb ()
   from libstd-a3b039e1022c4e23.so
#19 0xb6e75078 in main ()

From what I can see, the panic hook appears to be executing, and then somehow jumping back into the body of the loop. EDIT: I believe that this is actually showing the normal invocation of my test function.

Note that this was the only thread running. I'm not sure if this is related to the original issue, but it suggests that something weird is going on with the panicking threads.

My current hypothesis is that there's some sort of concurrency and/or codegen issue with the std::panic::catch_unwind logic (maybe the try compiler intrinsic?).
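
For readers unfamiliar with the code path under suspicion, here is a minimal sketch of what std::panic::catch_unwind does when it works correctly: it runs a closure and hands back the panic payload instead of letting the unwind continue. The helper name panic_message is hypothetical and is not part of std or the test suite.

```rust
use std::panic;

/// Hypothetical helper: runs `f` and returns the panic message, if any.
/// Returns `None` when `f` completes without panicking.
pub fn panic_message(f: impl FnOnce() + panic::UnwindSafe) -> Option<String> {
    match panic::catch_unwind(f) {
        Ok(()) => None,
        Err(payload) => {
            // Panic payloads are usually `&'static str` or `String`,
            // depending on how `panic!` was invoked.
            if let Some(s) = payload.downcast_ref::<&str>() {
                Some(s.to_string())
            } else if let Some(s) = payload.downcast_ref::<String>() {
                Some(s.clone())
            } else {
                Some("<non-string payload>".to_string())
            }
        }
    }
}
```

On a healthy target, calling panic_message with a panicking closure returns Some with the message, and the calling thread keeps running. The failures in this issue would be consistent with that hand-off going wrong, though that remains a hypothesis.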

Aaron1011 commented Feb 14, 2019

Further evidence for panicking being involved:

Another run crashed with this log (I wasn't able to obtain a backtrace):

test vec::drain_filter_complex ... ok
test str::slice_index::simple_big ... ok
test str::test_unsafe_slice ... ok
test string::test_unsized_to_string ... ok
died due to signal 11

The test immediately following string::test_unsized_to_string is string::test_str_truncate_split_codepoint. This test is marked with #[should_panic], so it seems that simply panicking while a test is running may trigger the issue.
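
As a rough illustration of the kind of panic involved, here is a sketch that assumes the test truncates a String in the middle of a multi-byte character, as its name suggests. The helper truncate_panics is hypothetical and does not come from liballoc; it only reports whether String::truncate panicked for a given length.

```rust
use std::panic;

/// Hypothetical helper: returns true if truncating `text` to `new_len`
/// bytes panics. `String::truncate` panics when `new_len` does not lie
/// on a UTF-8 character boundary.
pub fn truncate_panics(text: &str, new_len: usize) -> bool {
    let mut s = String::from(text);
    // The closure owns the String, so it is unwind-safe.
    panic::catch_unwind(move || s.truncate(new_len)).is_err()
}
```

For example, "\u{FC}" (ü) occupies two bytes in UTF-8, so truncating it to one byte splits the code point and panics, while truncating to two bytes does not. A #[should_panic] test passes precisely because this unwind happens and is caught by the test harness.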

Mark-Simulacrum commented Feb 26, 2019

Thanks @Aaron1011!

We discussed this in the infra team meeting today and decided that we would disable the should_panic tests on Android. This'll basically be a #[cfg(not(target_os = "android"))] gate on each of those tests. Note that Android is a tier-2 platform, so we (the Rust team) ourselves don't plan to invest time in fixing it, but if someone wants to attempt to narrow this down further, we'll gladly review patches.
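
A minimal sketch of what such a gate could look like on a single test. The function checked_index and both test names are hypothetical stand-ins. The second variant, using cfg_attr with ignore, is an alternative that is not mentioned in this thread and is shown only for comparison.

```rust
/// Hypothetical function under test: panics when `idx` is out of bounds.
pub fn checked_index(values: &[u32], idx: usize) -> u32 {
    values[idx]
}

// Variant A (the gate described above): the test is not compiled at all
// when targeting Android.
#[cfg(not(target_os = "android"))]
#[test]
#[should_panic]
fn out_of_bounds_panics() {
    checked_index(&[1, 2, 3], 10);
}

// Variant B (an assumption, not from the discussion): the test is still
// compiled on Android but is reported as ignored instead of being run.
#[test]
#[should_panic]
#[cfg_attr(target_os = "android", ignore)]
fn out_of_bounds_panics_ignored_on_android() {
    checked_index(&[1, 2, 3], 10);
}
```

Variant A removes the test from the Android binary entirely, while variant B keeps it visible in the test output. Either way the attribute has to be repeated on every should_panic test.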

Mark-Simulacrum commented Feb 26, 2019

I'm temporarily assigning this to myself -- I plan to add that cfg soon. Of course, if someone wants to take that on and save me some time, I'd be happy :)

Mark-Simulacrum self-assigned this Feb 26, 2019

Mark-Simulacrum commented Mar 3, 2019

I'm going to use the functionality added in #58689 instead of a cfg-gate across all of std/test/alloc -- that seems quite a bit cleaner.

Mark-Simulacrum commented Mar 3, 2019

Posted #58900 and unassigning myself. That PR doesn't fully fix this bug (it doesn't properly inform rustdoc to skip should_panic tests), so it should not close this issue.

Mark-Simulacrum removed their assignment Mar 3, 2019

Centril added a commit to Centril/rust that referenced this issue Mar 9, 2019

Rollup merge of rust-lang#58900 - Mark-Simulacrum:ignore-should-panic-android, r=kennytm

Ignore should_panic tests on android

These tests currently segfault sometimes. Android is a tier-2 target so
this is fine to disable instead of fixing.

Unfortunately, this isn't quite enough as rustdoc doesn't currently
correctly interpret the --exclude-should-panic flag (i.e., ignores it).
That proved to be harder to fix than I had time for so we're going to
leave it and hope that at least some of the failures are fixed.

Hopefully alleviates rust-lang#55861; I don't have the time to investigate fixing rustdoc.