rustc 1.64.0 crashes on riscv64gc-unknown-linux-gnu #102155

Closed
lunasophia opened this issue Sep 22, 2022 · 29 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. O-riscv Target: RISC-V architecture P-medium Medium priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@lunasophia

lunasophia commented Sep 22, 2022

I upgraded my stable-riscv64gc-unknown-linux-gnu Rust toolchain on my StarFive VisionFive board, running the official Ubuntu release, this afternoon. When I then tried to use the compiler, it crashed reproducibly. I can confirm that version 1.63.0 works without issue.

Code

I tried this code:

fn main() {
}

I expected to see this happen: the code builds

Instead, this happened: I receive a reproducible error (see backtrace below).

Version it worked on

It most recently worked on: Rust 1.63

Version with regression

rustc --version --verbose:

rustc 1.64.0 (a55dd71d5 2022-09-19)
binary: rustc
commit-hash: a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52
commit-date: 2022-09-19
host: riscv64gc-unknown-linux-gnu
release: 1.64.0
LLVM version: 14.0.6

Backtrace

$ RUST_BACKTRACE=1 cargo run
   Compiling foo v0.1.0 (/home/luna/foo)
/home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so(+0xc3d810)[0x3f8d079810]
linux-vdso.so.1(__vdso_rt_sigreturn+0x0)[0x3f93b0f800]
/home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so(+0x3cea1aa)[0x3f901261aa]
/home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so(_RNvCsZJaBwVYvUP_16rustc_query_impl15query_callbacks+0x916c)[0x3f9032cbaa]
/home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so(+0x94700a)[0x3f8cd8300a]
/home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so(+0xd57d14)[0x3f8d193d14]
/home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so(+0xcce018)[0x3f8d10a018]
/home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so(+0xcccb48)[0x3f8d108b48]
/home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so(_RNvMs0_NtCs2PGdSkTarcu_15rustc_interface7queriesNtB5_7Queries11global_ctxt+0x300)[0x3f8d182794]
/home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so(+0xbecada)[0x3f8d028ada]
/home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so(+0xc387a6)[0x3f8d0747a6]
/home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so(+0xbed8d4)[0x3f8d0298d4]
/home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so(+0xc0a996)[0x3f8d046996]
/home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so(+0xc04666)[0x3f8d040666]
/home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so(+0xc06da4)[0x3f8d042da4]
/home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so(+0xc0b6da)[0x3f8d0476da]
/home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/libstd-f41102d4d21d2c30.so(rust_metadata_std_ac92b06979af539e+0xa0542)[0x3f8c38e542]
/lib/riscv64-linux-gnu/libc.so.6(+0x675a6)[0x3f8c21e5a6]
error: could not compile `foo`

Caused by:
  process didn't exit successfully: `rustc --crate-name foo --edition=2021 src/main.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type bin --emit=dep-info,link -C embed-bitcode=no -C debuginfo=2 -C metadata=1b3b0ae194b7a453 -C extra-filename=-1b3b0ae194b7a453 --out-dir /home/luna/foo/target/debug/deps -C incremental=/home/luna/foo/target/debug/incremental -L dependency=/home/luna/foo/target/debug/deps` (signal: 11, SIGSEGV: invalid memory reference)
$

@rustbot modify labels: +regression-from-stable-to-stable -regression-untriaged

@lunasophia lunasophia added C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. labels Sep 22, 2022
@rustbot rustbot added regression-from-stable-to-stable Performance or correctness regression from one stable version to another. I-prioritize Issue: Indicates that prioritization has been requested for this issue. and removed regression-untriaged Untriaged performance or correctness regression. labels Sep 22, 2022
@inquisitivecrystal inquisitivecrystal added the I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. label Sep 23, 2022
@apiraino
Contributor

WG-prioritization assigning P-critical for the moment (Zulip discussion).

It would be great if someone could identify where this regression started (for example with cargo-bisect-rustc).

@lunasophia just curious, did you have a chance to try nightlies or betas between the two stable 1.63 and 1.64? The answer might be there somewhere. Thanks.

@rustbot label -I-prioritize +P-critical E-needs-bisection

@rustbot rustbot added E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc P-critical Critical priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Sep 23, 2022
@lunasophia
Author

@lunasophia just curious, did you have a chance to try nightlies or betas between the two stable 1.63 and 1.64? The answer might be there somewhere. Thanks.

I have not, but I'll look at writing a script to automate it over the weekend.

@lunasophia
Author

While testing this (i.e. I specified the date range for my search incorrectly and pulled in a post-1.64.0 nightly), I noticed the nightly I just installed does work without issue:

cargo 1.66.0-nightly (73ba3f35e 2022-09-18)
release: 1.66.0-nightly
commit-hash: 73ba3f35e0205844418260722c11602113179c4a
commit-date: 2022-09-18
host: riscv64gc-unknown-linux-gnu
libgit2: 1.5.0 (sys:0.15.0 vendored)
libcurl: 7.83.1-DEV (sys:0.4.55+curl-7.83.1 vendored ssl:OpenSSL/1.1.1q)
os: Ubuntu 22.04 (jammy) [64-bit]

If further bisecting is desired, I'd be happy to do so, though I'm a little unclear on how nightly dates match up to version numbers. I installed the 2022-08-11 nightly (expecting it to be 1.64 or 1.63), and it's 1.65 (and still broken), so a list of dates to try would help me narrow down my search.
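On matching nightly dates to version numbers: stable releases ship every six weeks, and the nightly channel is normally two minor versions ahead of whichever stable release is current on that date. Below is a minimal sketch of that rule of thumb, anchored on 1.63.0 shipping on 2022-08-11. The function names are illustrative, and the result is only approximate: the version number on the development branch is bumped some days before each stable release, so a nightly from that window may already report the next number.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian calendar date.
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    // Treat January and February as months 11 and 12 of the previous year.
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400);
    let mp = if m > 2 { m - 3 } else { m + 9 };
    let doy = (153 * mp + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

/// Approximate minor version of the nightly published on the given date.
/// Assumption: an exact six-week cadence anchored on 1.63.0 (2022-08-11),
/// with nightly two minor versions ahead of the current stable.
fn approx_nightly_minor(y: i64, m: i64, d: i64) -> i64 {
    let anchor = days_from_civil(2022, 8, 11);
    let cycles = (days_from_civil(y, m, d) - anchor).div_euclid(42);
    let stable_minor = 63 + cycles;
    stable_minor + 2
}

fn main() {
    for (y, m, d) in [(2022, 7, 18), (2022, 8, 11)] {
        println!(
            "nightly-{}-{:02}-{:02} is roughly 1.{}",
            y, m, d,
            approx_nightly_minor(y, m, d)
        );
    }
}
```

Under these assumptions the 2022-08-11 nightly comes out as 1.65, which agrees with what was observed above.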

@lunasophia
Author

I joined the Zulip chat and ran cargo-bisect-rustc as instructed. After some trial and error with the date range I found the following:

searched nightlies: from nightly-2022-05-30 to nightly-2022-08-01
regressed nightly: nightly-2022-07-18
searched commit range: d5e7f47...263edd4
regressed commit: 263edd4

bisected with cargo-bisect-rustc v0.6.4

Host triple: riscv64gc-unknown-linux-gnu
Reproduce with:

cargo bisect-rustc 2022-05-30 --end 2022-08-01 --test-dir .
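Conceptually, a bisection like the one reported above is a binary search over an ordered list of builds, first nightlies and then the commits inside the regressed nightly. The sketch below shows only the search idea, not cargo-bisect-rustc's actual implementation. It assumes the failure is monotonic (every build before the regression passes, every build from it onward fails); the dates and the predicate in `main` are hypothetical stand-ins for "install this toolchain and build the test crate".

```rust
/// Returns the index of the first item for which `is_bad` returns true,
/// assuming all earlier items are good and all later items are bad.
fn first_bad<T>(items: &[T], mut is_bad: impl FnMut(&T) -> bool) -> Option<usize> {
    let (mut lo, mut hi) = (0, items.len());
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if is_bad(&items[mid]) {
            hi = mid; // the regression is at `mid` or earlier
        } else {
            lo = mid + 1; // the regression is after `mid`
        }
    }
    if lo < items.len() { Some(lo) } else { None }
}

fn main() {
    // Hypothetical inputs: ISO dates compare correctly as plain strings.
    let nightlies = ["2022-07-15", "2022-07-16", "2022-07-17", "2022-07-18", "2022-07-19"];
    let crashes = |date: &&str| *date >= "2022-07-18";

    match first_bad(&nightlies, crashes) {
        Some(i) => println!("regressed nightly: nightly-{}", nightlies[i]),
        None => println!("no regression found in range"),
    }
}
```

Each step halves the remaining range, which matters on slow boards where every toolchain install and test build is expensive.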

@Rageking8
Contributor

Maybe cc @5225225 ?

@5225225
Contributor

5225225 commented Sep 24, 2022

Wait what?

That code shouldn't take effect unless you're running with -Zstrict-init-checks. And even then, it should be a panic, not an abort.

Very strange! Maybe try running gdb or valgrind on the binary and then seeing where the segfault is?

@lunasophia
Author

lunasophia commented Sep 24, 2022

I'm not sure how helpful the gdb output will be. A lot of the locals have been optimized out and there's not a lot of symbol table information.

gdb run on cargo
(gdb) bt full
#0  syscall (syscall_number=98, arg1=<optimized out>, arg2=137, arg3=0, arg4=0, arg5=0, arg6=-1, arg7=274609475632) at ../sysdeps/unix/sysv/linux/riscv/syscall.c:27
        ret = <optimized out>
#1  0x0000002aab379664 in std::sys::unix::futex::futex_wait () at library/std/src/sys/unix/futex.rs:62
No locals.
#2  0x0000002aab37c50e in std::sys::unix::locks::futex_condvar::Condvar::wait_optional_timeout () at library/std/src/sys/unix/locks/futex_condvar.rs:51
No locals.
#3  std::sys::unix::locks::futex_condvar::Condvar::wait () at library/std/src/sys/unix/locks/futex_condvar.rs:35
No locals.
#4  0x0000002aab3504d4 in <jobserver::HelperState>::for_each_request::<jobserver::imp::spawn_helper::{closure#1}::{closure#0}> ()
No symbol table info available.
#5  0x0000002aab350a60 in std::sys_common::backtrace::__rust_begin_short_backtrace::<jobserver::imp::spawn_helper::{closure#1}, ()> ()
No symbol table info available.
#6  0x0000002aab350cb2 in _RINvNvNtCseOBki07ryB6_3std9panicking3try7do_callINtNtNtCsidPuqEqzKzv_4core5panic11unwind_safe16AssertUnwindSafeNCNCINvMNtB6_6threadNtB1T_7Builder16spawn_unchecked_NCNvNtCsGjmX1GWYch_9jobserver3imp12spawn_helpers_0uEs_00EuEB2H_.llvm.3138756864971081497 ()
No symbol table info available.
#7  0x0000002aab350d4e in __rust_try.llvm.3138756864971081497 ()
No symbol table info available.
#8  0x0000002aab351954 in <<std::thread::Builder>::spawn_unchecked_<jobserver::imp::spawn_helper::{closure#1}, ()>::{closure#1} as core::ops::function::FnOnce<()>>::call_once::{shim:vtable#0} ()
No symbol table info available.
#9  0x0000002aab37bdc0 in alloc::boxed::{impl#44}::call_once<(), dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global> () at library/alloc/src/boxed.rs:1935
No locals.
#10 alloc::boxed::{impl#44}::call_once<(), alloc::boxed::Box<dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global>, alloc::alloc::Global> ()
    at library/alloc/src/boxed.rs:1935
No locals.
#11 std::sys::unix::thread::{impl#2}::new::thread_start () at library/std/src/sys/unix/thread.rs:108
No locals.
#12 0x0000003ff7e7c5a6 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
        ret = <optimized out>
        start = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {{__pc = 274742101330, __regs = {274741663600, 274741665600, 274877883878, 274877883879, 274743680272, 0, 183261183386, 6,
                    274743680272, 274739568640, 274741665600, 274741665600}, __sp = 274741663280, __fpregs = {0 <repeats 12 times>}}}, mask_was_saved = 0}}, priv = {pad = {0x0,
              0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        robust = <optimized out>
#13 0x0000003ff7ec8a02 in __thread_start () at ../sysdeps/unix/sysv/linux/riscv/clone.S:85
No locals.
(gdb)
gdb run on rustc
Reading symbols from /home/luna/.cargo/bin/rustc...
(gdb) set args --crate-name foo --edition=2021 src/main.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type bin --emit=dep-info,link -C embed-bitcode=no -C debuginfo=2 -C metadata=1b3b0ae194b7a453 -C extra-filename=-1b3b0ae194b7a453 --out-dir /home/luna/foo/target/debug/deps -C incremental=/home/luna/foo/target/debug/incremental -L dependency=/home/luna/foo/target/debug/deps
(gdb) run
Starting program: /home/luna/.cargo/bin/rustc --crate-name foo --edition=2021 src/main.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type bin --emit=dep-info,link -C embed-bitcode=no -C debuginfo=2 -C metadata=1b3b0ae194b7a453 -C extra-filename=-1b3b0ae194b7a453 --out-dir /home/luna/foo/target/debug/deps -C incremental=/home/luna/foo/target/debug/incremental -L dependency=/home/luna/foo/target/debug/deps
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/riscv64-linux-gnu/libthread_db.so.1".
process 28658 is executing new program: /home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/rustc
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/riscv64-linux-gnu/libthread_db.so.1".
[New Thread 0x3ff05f1ca0 (LWP 28660)]
[New Thread 0x3fefdedca0 (LWP 28661)]
[Thread 0x3fefdedca0 (LWP 28661) exited]
{"artifact":"/home/luna/foo/target/debug/deps/foo-1b3b0ae194b7a453.d","emit":"dep-info"}

Thread 2 "rustc" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x3ff05f1ca0 (LWP 28660)]
0x0000003ff45f21aa in <rustc_middle::arena::Arena>::alloc_from_iter::<rustc_middle::dep_graph::dep_node::DepKindStruct, rustc_arena::IsNotCopy, [rustc_middle::dep_graph::dep_node::DepKindStruct; 282]> () from /home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so
(gdb) bt full
#0  0x0000003ff45f21aa in <rustc_middle::arena::Arena>::alloc_from_iter::<rustc_middle::dep_graph::dep_node::DepKindStruct, rustc_arena::IsNotCopy, [rustc_middle::dep_graph::dep_node::DepKindStruct; 282]> () from /home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so
No symbol table info available.
#1  0x0000003ff47f8baa in rustc_query_impl::query_callbacks () from /home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so
No symbol table info available.
#2  0x0000003ff124f00a in <core::cell::once::OnceCell<_>>::get_or_try_init::outlined_call::<<core::cell::once::OnceCell<rustc_middle::ty::context::GlobalCtxt>>::get_or_init<rustc_interface::passes::create_global_ctxt::{closure#1}::{closure#0}>::{closure#0}, rustc_middle::ty::context::GlobalCtxt, !> ()
   from /home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so
No symbol table info available.
#3  0x0000003ff165fd14 in <core::cell::once::OnceCell<rustc_middle::ty::context::GlobalCtxt>>::get_or_init::<rustc_interface::passes::create_global_ctxt::{closure#1}::{closure#0}>
    () from /home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so
No symbol table info available.
#4  0x0000003ff15d6018 in <rustc_session::session::Session>::time::<&rustc_middle::ty::context::GlobalCtxt, rustc_interface::passes::create_global_ctxt::{closure#1}> ()
   from /home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so
No symbol table info available.
#5  0x0000003ff15d4b48 in rustc_interface::passes::create_global_ctxt ()
   from /home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so
No symbol table info available.
#6  0x0000003ff164e794 in <rustc_interface::queries::Queries>::global_ctxt ()
   from /home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so
No symbol table info available.
#7  0x0000003ff14f4ada in <rustc_interface::interface::Compiler>::enter::<rustc_driver::run_compiler::{closure#1}::{closure#2}, core::result::Result<core::option::Option<rustc_interface::queries::Linker>, rustc_errors::ErrorGuaranteed>> () from /home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so
No symbol table info available.
#8  0x0000003ff15407a6 in rustc_span::with_source_map::<core::result::Result<(), rustc_errors::ErrorGuaranteed>, rustc_interface::interface::create_compiler_and_run<core::result::Result<(), rustc_errors::ErrorGuaranteed>, rustc_driver::run_compiler::{closure#1}>::{closure#1}> ()
   from /home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so
No symbol table info available.
#9  0x0000003ff14f58d4 in <scoped_tls::ScopedKey<rustc_span::SessionGlobals>>::set::<rustc_interface::interface::run_compiler<core::result::Result<(), rustc_errors::ErrorGuaranteed>, rustc_driver::run_compiler::{closure#1}>::{closure#0}, core::result::Result<(), rustc_errors::ErrorGuaranteed>> ()
   from /home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so
No symbol table info available.
#10 0x0000003ff1512996 in std::sys_common::backtrace::__rust_begin_short_backtrace::<rustc_interface::util::run_in_thread_pool_with_globals<rustc_interface::interface::run_compiler<core::result::Result<(), rustc_errors::ErrorGuaranteed>, rustc_driver::run_compiler::{closure#1}>::{closure#0}, core::result::Result<(), rustc_errors::ErrorGuaranteed>>::{closure#0}, core::result::Result<(), rustc_errors::ErrorGuaranteed>> () from /home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so
No symbol table info available.
#11 0x0000003ff150c666 in _RINvNvNtCseOBki07ryB6_3std9panicking3try7do_callINtNtNtCsidPuqEqzKzv_4core5panic11unwind_safe16AssertUnwindSafeNCNCINvMNtB6_6threadNtB1T_7Builder16spawn_unchecked_NCINvNtCs2PGdSkTarcu_15rustc_interface4util31run_in_thread_pool_with_globalsNCINvNtB2I_9interface12run_compilerINtNtBR_6result6ResultuNtCs4NYEZz9yNmi_12rustc_errors15ErrorGuaranteedENCNvCs2vGVMgUuDv2_12rustc_driver12run_compilers_0E0B4o_E0B4o_Es_00EB4o_EB5B_.llvm.13548527062024321570 ()
   from /home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so
No symbol table info available.
#12 0x0000003ff150eda4 in __rust_try.llvm.13548527062024321570 ()
   from /home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so
No symbol table info available.
#13 0x0000003ff15136da in <<std::thread::Builder>::spawn_unchecked_<rustc_interface::util::run_in_thread_pool_with_globals<rustc_interface::interface::run_compiler<core::result::Result<(), rustc_errors::ErrorGuaranteed>, rustc_driver::run_compiler::{closure#1}>::{closure#0}, core::result::Result<(), rustc_errors::ErrorGuaranteed>>::{closure#0}, core::result::Result<(), rustc_errors::ErrorGuaranteed>>::{closure#1} as core::ops::function::FnOnce<()>>::call_once::{shim:vtable#0} ()
   from /home/luna/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so
No symbol table info available.
#14 0x0000003ff085a542 in alloc::boxed::{impl#44}::call_once<(), dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global> () at library/alloc/src/boxed.rs:1935
No locals.
--Type <RET> for more, q to quit, c to continue without paging--c
#15 alloc::boxed::{impl#44}::call_once<(), alloc::boxed::Box<dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global>, alloc::alloc::Global> () at library/alloc/src/boxed.rs:1935
No locals.
#16 std::sys::unix::thread::{impl#2}::new::thread_start () at library/std/src/sys/unix/thread.rs:108
No locals.
#17 0x0000003ff06ea5a6 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
        ret = <optimized out>
        start = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {{__pc = 274616722770, __regs = {274615702736, 274615704736, 274877885078, 274877885079, 274743680272, 0, 274618230044, 6, 274743680272, 274607316992, 274615704736, 274615704736}, __sp = 274615702416, __fpregs = {0 <repeats 12 times>}}}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        robust = <optimized out>
#18 0x0000003ff0736a02 in __thread_start () at ../sysdeps/unix/sysv/linux/riscv/clone.S:85
No locals.
(gdb)

If it matters, syscall 98 is indeed futex on this machine.

@r-value

r-value commented Sep 29, 2022

Here on Arch Linux RISC-V, we can't even build the rustc executables. The stage1 compiler panicked when building stage1 std artifacts.

config & logs: https://gist.github.com/r-value/61aa4658ec1a1c9bb803e768cb003d19

To be clear, we're building natively on RISC-V machines. We also tried native builds under qemu-user but it just crashes on SIGSEGV with some binary backtraces.

@r-value

r-value commented Oct 9, 2022

EDIT: This bisection was misconfigured; please ignore it.

searched nightlies: from nightly-2022-06-24 to nightly-2022-08-05
regressed nightly: nightly-2022-07-26
searched commit range: 7fe022f...6dbae3a
regressed commit: 7f93d4a

bisected with cargo-bisect-rustc v0.6.4

Host triple: riscv64gc-unknown-linux-gnu
Reproduce with:

cargo bisect-rustc --start=1.63.0 --end=1.64.0

@saethlin
Member

There are now two bisections in this issue, and two compiler crashes being reported. Is the second bisection for the ICE when building the compiler on RISC-V?

@r-value

r-value commented Oct 10, 2022

There are now two bisections in this issue, and two compiler crashes being reported. Is the second bisection for the ICE when building the compiler on RISC-V?

@saethlin Sorry about the misleading information.

I found a misconfiguration in my bisection and ran it again just now. When building the compiler natively on RISC-V, the regressed commit is actually 263edd43c from #99033, the same one @lunasophia found.

@r-value

r-value commented Oct 11, 2022

FYI, I have successfully built a functional rustc on a RISC-V machine natively with the regressed commit reverted. So there must be something wrong with commit 263edd43.

@saethlin
Member

Just to confirm, reverting the commit in question fixes both the compiler panic and the bad codegen?

@r-value

r-value commented Oct 11, 2022

I didn't try the cross-compiled compiler (I'm assuming the compiler binary used by @lunasophia is cross-compiled without bootstrap because it clearly can't compile itself). But since the native build bootstrapped successfully and works fine, I suppose it's reasonable to assume that reverting the commit fixes both problems.

@saethlin
Member

saethlin commented Oct 11, 2022

I'll admit that this is a part of the Rust project I'm not very familiar with, but I'm at a loss for how to explain how that panic is possible. DroplessArena has zero .borrow() calls and a single .borrow_mut() call in a non-recursive function.

So I hate to jump to this, but my best guess is that this is a miscompilation. This is a bit of a long shot, but we do have pretty rich debug assertions. Does setting

[rust]
debug = true

cause whatever you're doing to cause bootstrapping to do anything at all different than the above panic backtrace?
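To illustrate the reasoning about `.borrow_mut()`: assuming the panic in question is `RefCell`'s "already borrowed" check, that check can only fire when a second borrow is requested while an earlier one is still alive. The sketch below uses a hypothetical arena-like type, not rustc's actual `DroplessArena`, to show that a single non-recursive `borrow_mut()` releases its borrow on return and so cannot conflict with itself.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for an arena that keeps its chunks behind a RefCell.
struct TinyArena {
    chunks: RefCell<Vec<Vec<u8>>>,
}

impl TinyArena {
    fn new() -> Self {
        TinyArena { chunks: RefCell::new(Vec::new()) }
    }

    /// The only place that takes a mutable borrow. It does not call back
    /// into itself, so the borrow always ends before the next call starts.
    fn grow(&self, bytes: usize) -> usize {
        let mut chunks = self.chunks.borrow_mut();
        chunks.push(vec![0; bytes]);
        chunks.len()
    }

    /// True while some borrow of `chunks` is still alive.
    fn is_borrowed(&self) -> bool {
        self.chunks.try_borrow_mut().is_err()
    }
}

fn main() {
    let arena = TinyArena::new();
    arena.grow(16);
    arena.grow(32);
    // Sequential calls never overlap, so nothing is borrowed here.
    assert!(!arena.is_borrowed());

    // Only an overlapping borrow makes a second `borrow_mut()` fail.
    let guard = arena.chunks.borrow_mut();
    assert!(arena.is_borrowed());
    drop(guard);
    assert!(!arena.is_borrowed());
}
```

If correctly compiled code of this shape still reports a borrow conflict, the borrow flag itself must have been corrupted or mishandled, which is why a miscompilation is a plausible guess.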

@mrivnak

mrivnak commented Oct 11, 2022

I'm running Gentoo on my visionfive and see this issue with the dev-lang/rust-bin as well as from rustup. I'll try with the Gentoo dev-lang/rust package to see if that's different. I believe rust-bin binaries are from Gentoo and not Rustup so they shouldn't be the same if it's a miscompilation issue. Again, will check but the visionfive is quite slow so it'll take me a while.

@r-value

r-value commented Oct 11, 2022

I'll admit that this is a part of the Rust project I'm not very familiar with, but I'm at a loss for how to explain how that panic is possible. DroplessArena has zero .borrow() calls and a single .borrow_mut() call in a non-recursive function.

So I hate to jump to this, but my best guess is that this is a miscompilation. This is a bit of a long shot, but we do have pretty rich debug assertions. Does setting

[rust]
debug = true

cause whatever you're doing to cause bootstrapping to do anything at all different than the above panic backtrace?

Thanks for your advice! I'm setting this and rebuilding without the commit reverted. This may take some time to finish on the HiFive Unmatched :)

@r-value

r-value commented Oct 11, 2022

Well, things become even more interesting - the 1.64.0 release passed the bootstrap with debug = true, enabling the debug assertions.

I suppose there is something wrong with the optimizer. The assertions might have interfered with some optimization and thereby prevented rustc from being miscompiled.

@saethlin
Member

saethlin commented Oct 11, 2022

I agree with your speculation, but that's about all the help I have to offer. The commit that regresses this really shouldn't change codegen, but if it does I expect the change should be visible in MIR. So the only thing I can think of to do next is compile a bunch of Rust code to MIR with and without that commit and try to find an example that produces different MIR. That might hint at a small example that can be used to reproduce the miscompilation.

I'm going to do my best to update the labels. There is a compiler team meeting in about 2 days I think, they should have something to say about this. I hope.

@rustbot label -E-needs-bisection +A-LLVM +O-riscv

@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. O-riscv Target: RISC-V architecture and removed E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc labels Oct 11, 2022
@saethlin
Member

I attempted my above suggestion on nextest and the standard library. Exactly the same MIR with and without the commit. Which is so incredibly consistent I almost wonder if I did it wrong.
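One way to run such a comparison is to emit MIR with each compiler build (rustc accepts `--emit=mir`) and then diff the resulting text files. The helper below is a small sketch of the diff step only. The file names come from the command line and are hypothetical, and this is not necessarily how the comparison above was carried out.

```rust
use std::{env, fs};

/// Returns the 1-based line number and both lines at the first mismatch,
/// or None if the two texts are identical line by line.
/// A side is None when that text ran out of lines first.
fn first_difference<'a>(
    before: &'a str,
    after: &'a str,
) -> Option<(usize, Option<&'a str>, Option<&'a str>)> {
    let mut a = before.lines();
    let mut b = after.lines();
    let mut line_no = 0;
    loop {
        line_no += 1;
        match (a.next(), b.next()) {
            (None, None) => return None,
            (x, y) if x == y => continue,
            (x, y) => return Some((line_no, x, y)),
        }
    }
}

fn main() {
    let args: Vec<String> = env::args().skip(1).collect();
    if args.len() != 2 {
        eprintln!("usage: mir-diff <before.mir> <after.mir>");
        return;
    }
    let before = fs::read_to_string(&args[0]).expect("cannot read first file");
    let after = fs::read_to_string(&args[1]).expect("cannot read second file");
    match first_difference(&before, &after) {
        None => println!("identical"),
        Some((line, old, new)) => println!("line {}: {:?} vs {:?}", line, old, new),
    }
}
```

Identical MIR from both builds would suggest that any codegen difference arises later in the pipeline, which fits the A-LLVM label on this issue.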

gentoo-bot pushed a commit to gentoo/gentoo that referenced this issue Oct 20, 2022
This partially reverts commit 53f2e77.

Issue: rust-lang/rust#102155
Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>
@davidlt

davidlt commented Nov 26, 2022

FYI I am seeing the same issues in Fedora 37. Native compile on SiFive HiFive Unmatched.

@saethlin
Member

@davidlt Per this comment: #102155 (comment) can you confirm that nightly works? Or, ideally, beta? The current stable is now 1.65 so it's possible that the referenced nightly that fixes the issue is now beta. If that is the case, stable should work again for you in a few weeks.

Though we still have no idea what's going on here so it's possible this was "fixed" by some unrelated change which causes us to no longer tickle a miscompilation in LLVM, which means this bug could resurface at any time.

@apiraino
Contributor

What's the actual status for this issue? Thanks

@rustbot label -P-critical +P-medium +T-compiler

@rustbot rustbot added P-medium Medium priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed P-critical Critical priority labels Mar 29, 2023
@davidlt

davidlt commented Mar 29, 2023

I haven't seen this problem in Fedora/RISCV land in 1.66, or 1.67. Soon to be updated to 1.68.

@r-value

r-value commented Apr 5, 2023

The problem also disappeared in Arch Linux RISC-V since version 1.65.0.

@apiraino
Contributor

apiraino commented Apr 6, 2023

Thanks for the comments! I'll tentatively close this issue; please feel free to reopen it if the problem is not actually resolved (cc: @lunasophia )

@apiraino apiraino closed this as completed Apr 6, 2023
@devyn

devyn commented Aug 4, 2023

Sorry to bump an issue that's already been closed, but I am seeing exactly this error still happen, also on a VisionFive (V1) board. I debugged with GDB and the segfault is happening in futex_wait, just as before.

I don't think it's entirely deterministic. Sometimes, if I repeatedly run the same rustc command that cargo issued, it fails every time, but when I then run cargo again from the beginning, rustc fails on a different crate instead.

For example, I failed to build ripgrep twice, but the third attempt actually succeeded. When there is a segfault, it's always in futex_wait. When I look at the registers, a0 is zero (null), which would plausibly make a futex_wait call segfault. Keep in mind, though, that this is the register state after the syscall was entered (it never returned), so I'm not totally sure that a0 held that value before the ecall.

For reference I've tried on both 1.65.0 and 1.71.0, and if anything, 1.71.0 seemed to be worse.

I'd love to bisect this issue again to see if I can narrow it down, but I gather it would require me setting up a script to copy the cross compiled rustc over to the board and test it a few times to ensure it is/isn't an issue.
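Because the crash is intermittent, each bisection step would need to run the same command several times before calling a build good. A minimal sketch of such a check, assuming a Unix host: it reruns a command and reports the first fatal signal seen. It should be pointed at the rustc command line itself, since cargo only reports the failure of its child. The helper name and the run count are illustrative.

```rust
use std::env;
use std::io;
use std::os::unix::process::ExitStatusExt;
use std::process::{Command, Stdio};

/// Runs `program args...` up to `runs` times and returns the first fatal
/// signal number observed, or None if no run was killed by a signal.
fn first_fatal_signal(program: &str, args: &[&str], runs: u32) -> io::Result<Option<i32>> {
    for _ in 0..runs {
        let status = Command::new(program)
            .args(args)
            .stdout(Stdio::null())
            .stderr(Stdio::null())
            .status()?;
        // `signal()` is Some only when the process was terminated by a signal.
        if let Some(sig) = status.signal() {
            return Ok(Some(sig));
        }
    }
    Ok(None)
}

fn main() -> io::Result<()> {
    let argv: Vec<String> = env::args().skip(1).collect();
    if argv.is_empty() {
        eprintln!("usage: repeat-run <program> [args...]");
        return Ok(());
    }
    let args: Vec<&str> = argv[1..].iter().map(String::as_str).collect();
    match first_fatal_signal(&argv[0], &args, 10)? {
        Some(sig) => println!("killed by signal {}", sig),
        None => println!("no fatal signal in 10 runs"),
    }
    Ok(())
}
```

A clean result from a handful of runs does not prove a build is good; it only lowers the chance of mislabeling a flaky build during bisection.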

I also don't believe it's a hardware issue because I haven't experienced any weird segfaults in anything else, and I've also been able to use software that contains Rust without anything like this happening - for example, Firefox runs great. I would really love to believe that it's just an issue with insufficient power causing weird things to happen, but it's so consistently happening on that specific futex_wait syscall, even though it happens inconsistently.


I've been doing a bit more debugging to try to figure out what's going on. This appears to be something that can only happen if there's contention on the atomic word being waited on - sensibly, otherwise we wouldn't wait - so trying to get GDB to catch the value being passed to syscall fails completely, because the breakpoint interrupts any kind of contention and we don't see that happen anymore.

I've looked at the disassembly of Condvar::wait and futex_wait, but we're basically talking about the address of the self parameter being passed directly to the syscall as the first argument, and I can't see anywhere that the saved register that it lives in might get clobbered by something else. It basically goes between s2 and a0 twice, without ever being loaded from the stack or anything getting called in between that might inadvertently mess it up.

I wonder if strace might let me see what's happening?


One example of a segfault-causing futex() call, as captured by strace:

futex(0x3f9d057df0, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 10641, NULL, FUTEX_BITSET_MATCH_ANY) = ?

I have noticed that the pointer starting with 0x3f9 might actually be invalid, because on a successful run I saw:

futex(0x3fb70e10fc, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x3faf339df0, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 10889, NULL, FUTEX_BITSET_MATCH_ANY) = 0

but on a failure I saw:

futex(0x3fa61480fc, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x3f9e3a0df0, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 10938, NULL, FUTEX_BITSET_MATCH_ANY) = ?
+++ killed by SIGSEGV (core dumped) +++

It could be nothing, but it could be something.


I examined the core dump and found that the address passed to futex() was valid, and furthermore, that the memory address in question did in fact contain the value 10938. So that rules that out.

Part of me now wants to blame the kernel, as it doesn't feel like rustc is doing anything it shouldn't.

@r-value

r-value commented Aug 8, 2023

@devyn I don't think that's related to this issue. This issue describes a serious crash that segfaults almost unconditionally, and our logs show no sign of a futex problem. So far it has not recurred in newer releases. You should probably file a new issue that fully describes your scenario.
