SIGSEGV from rustc while building crate legion #77869

Open
alex5nader opened this issue Oct 12, 2020 · 37 comments
Labels
A-spurious Area: Spurious failures in builds (spuriously == for no apparent reason) C-bug Category: This is a bug. I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. P-medium Medium priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@alex5nader

Code

I am not sure what part of legion is causing this. I have not encountered this issue for any other crates.

Meta

rustc --version --verbose:

rustc 1.47.0 (18bf6b4f0 2020-10-07)
binary: rustc
commit-hash: 18bf6b4f01a6feaf7259ba7cdae58031af1b7b39
commit-date: 2020-10-07
host: x86_64-unknown-linux-gnu
release: 1.47.0
LLVM version: 11.0

Error output

   Compiling legion v0.3.1 (/data/Projects/legion)
error: could not compile `legion`.

Caused by:
  process didn't exit successfully: `rustc --crate-name legion --edition=2018 src/lib.rs --error-format=json --json=diagnostic-rendered-ansi --crate-type lib --emit=dep-info,metadata,link -C embed-bitcode=no -C debuginfo=2 --cfg 'feature="codegen"' --cfg 'feature="crossbeam-channel"' --cfg 'feature="crossbeam-events"' --cfg 'feature="default"' --cfg 'feature="erased-serde"' --cfg 'feature="legion_codegen"' --cfg 'feature="parallel"' --cfg 'feature="rayon"' --cfg 'feature="serde"' --cfg 'feature="serialize"' -C metadata=14f1150a42ae3e4b -C extra-filename=-14f1150a42ae3e4b --out-dir /data/Projects/legion/target/debug/deps -C incremental=/data/Projects/legion/target/debug/incremental -L dependency=/data/Projects/legion/target/debug/deps --extern bit_set=/data/Projects/legion/target/debug/deps/libbit_set-0f027bbe9088639b.rmeta --extern crossbeam_channel=/data/Projects/legion/target/debug/deps/libcrossbeam_channel-e02935c1a92635b3.rmeta --extern derivative=/data/Projects/legion/target/debug/deps/libderivative-027c3cec12a884ca.so --extern downcast_rs=/data/Projects/legion/target/debug/deps/libdowncast_rs-818a53b23fc7be82.rmeta --extern erased_serde=/data/Projects/legion/target/debug/deps/liberased_serde-c8566e1a0c06d2b3.rmeta --extern itertools=/data/Projects/legion/target/debug/deps/libitertools-4b46418de185c381.rmeta --extern legion_codegen=/data/Projects/legion/target/debug/deps/liblegion_codegen-7fefbee3b51a1a22.so --extern parking_lot=/data/Projects/legion/target/debug/deps/libparking_lot-1282ab6a8685ce14.rmeta --extern paste=/data/Projects/legion/target/debug/deps/libpaste-69df8912f33518e2.so --extern rayon=/data/Projects/legion/target/debug/deps/librayon-1e861157ad884d7a.rmeta --extern serde=/data/Projects/legion/target/debug/deps/libserde-2a6ef3a1ac05b029.rmeta --extern smallvec=/data/Projects/legion/target/debug/deps/libsmallvec-7e54452c7c62a719.rmeta --extern thiserror=/data/Projects/legion/target/debug/deps/libthiserror-e30ede5540027b3b.rmeta --extern 
uuid=/data/Projects/legion/target/debug/deps/libuuid-1b1ed382cc39f9ea.rmeta` (signal: 11, SIGSEGV: invalid memory reference)

Backtrace

#0  free (ptr=0x48c2df416aec43d6) at ../jemalloc/src/jemalloc.c:2393
#1  0x00007ffff3511bcc in <smallvec::SmallVec<A> as core::ops::drop::Drop>::drop ()
   from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#2  0x00007ffff35bf011 in <rustc_resolve::late::LateResolutionVisitor as rustc_ast::visit::Visitor>::visit_local ()
   from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#3  0x00007ffff35be822 in <rustc_resolve::late::LateResolutionVisitor as rustc_ast::visit::Visitor>::visit_block ()
   from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#4  0x00007ffff35c07d0 in <rustc_resolve::late::LateResolutionVisitor as rustc_ast::visit::Visitor>::visit_fn ()
   from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#5  0x00007ffff3549f9c in rustc_ast::visit::walk_assoc_item ()
   from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#6  0x00007ffff35cc54e in rustc_resolve::late::LateResolutionVisitor::with_generic_param_rib ()
   from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#7  0x00007ffff35c2f39 in rustc_resolve::late::LateResolutionVisitor::resolve_item ()
   from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#8  0x00007ffff355696e in rustc_ast::visit::walk_item ()
   from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#9  0x00007ffff35c2473 in rustc_resolve::late::LateResolutionVisitor::resolve_item ()
   from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#10 0x00007ffff355696e in rustc_ast::visit::walk_item ()
   from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#11 0x00007ffff35c2473 in rustc_resolve::late::LateResolutionVisitor::resolve_item ()
   from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#12 0x00007ffff3547d42 in rustc_ast::visit::walk_crate ()
   from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#13 0x00007ffff3588ac7 in rustc_resolve::Resolver::resolve_crate ()
   from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#14 0x00007ffff08f3c97 in rustc_interface::passes::configure_and_expand_inner ()
   from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#15 0x00007ffff08d06c9 in rustc_interface::passes::configure_and_expand::{{closure}} ()
   from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#16 0x00007ffff08aaecf in rustc_data_structures::box_region::PinnedGenerator<I,A,R>::new ()
   from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#17 0x00007ffff08f2965 in rustc_interface::passes::configure_and_expand ()
   from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#18 0x00007ffff0913f73 in rustc_interface::queries::Queries::expansion ()
   from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#19 0x00007ffff05bd887 in rustc_interface::queries::<impl rustc_interface::interface::Compiler>::enter ()
   from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#20 0x00007ffff0551f27 in rustc_span::with_source_map ()
   from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#21 0x00007ffff05bf513 in rustc_interface::interface::create_compiler_and_run ()
   from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#22 0x00007ffff059d9fa in scoped_tls::ScopedKey<T>::set ()
   from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#23 0x00007ffff05b2957 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
   from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#24 0x00007ffff053bdae in core::ops::function::FnOnce::call_once{{vtable-shim}} ()
   from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#25 0x00007fffef949f5a in <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once ()
    at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/alloc/src/boxed.rs:1042
#26 <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once ()
    at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/alloc/src/boxed.rs:1042
#27 std::sys::unix::thread::Thread::new::thread_start () at library/std/src/sys/unix/thread.rs:87
#28 0x00007fffef84f606 in ?? () from /usr/lib/libpthread.so.0
#29 0x00007fffef775753 in clone () from /usr/lib/haswell/libc.so.6

@alex5nader alex5nader added C-bug Category: This is a bug. I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 12, 2020
@alex5nader
Author

Building legion on Rust 1.46.0 does work.

rustc --version --verbose:

rustc 1.46.0 (04488afe3 2020-08-24)
binary: rustc
commit-hash: 04488afe34512aa4c33566eb16d8c912a3ae04f9
commit-date: 2020-08-24
host: x86_64-unknown-linux-gnu
release: 1.46.0
LLVM version: 10.0

@camelid camelid added I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. regression-from-stable-to-stable Performance or correctness regression from one stable version to another. and removed I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ labels Oct 12, 2020
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Oct 12, 2020
@jyn514
Member

jyn514 commented Oct 12, 2020

Possible duplicate of #77849

@camelid camelid added the O-linux Operating system: Linux label Oct 12, 2020
@ehuss
Contributor

ehuss commented Oct 14, 2020

I'm able to reproduce this, although it is finicky. I'm able to reproduce on stable, and as far back as 1.43. I've been having a hard time bisecting to a specific change, since it is a little inconsistent (it can take a few hundred incremental builds before it fails). The failures seem to start around 126ad2b (#68708), although it might be earlier.

I can only repro on my main Linux system; I can't seem to repro on a VM.

It seems to always fail with a call to free on an invalid pointer inside LateResolutionVisitor. It doesn't matter if it is built with jemalloc or not.

I might keep poking at it for a bit, but I think I'm unlikely to make any breakthroughs.

@camelid camelid removed the O-linux Operating system: Linux label Oct 14, 2020
@apiraino
Contributor

Just out of curiosity, are there conditions that could accelerate the "reproducibility"? For example, if it's memory exhaustion and allocations fail, could that theoretically happen sooner on a system (hand-wavy speaking) with resources artificially kept busy?

@Aaron1011
Member

@ehuss: What commit of legion did you build?

@ehuss
Contributor

ehuss commented Oct 15, 2020

@apiraino I don't think it has anything to do with resource exhaustion. So far I have 0 clues. I tried running on valgrind overnight, but it wouldn't fail.

@Aaron1011 I'm on 0733aa39b253b3404544afc3485d332429009799 (v0.3.1).

@alex5nader Can you include which model of CPU you are using?

@alex5nader
Author

@ehuss I'm using a Ryzen 5 1600.

@OvermindDL1

I've been getting exactly this same bug on many Rust versions, both stable and nightly (currently on 1.47), over the past ~6 months or so that I've been trying legion, from legion 2.4 to 3.0 to its git version, using a Ryzen7. Even a freshly created cargo new ... project with just legion added as a dependency and nothing else changed causes this every single time. I've been compiling many other projects with excessive dependencies without issues; it's only legion.

Here's a GDB backtrace of the SIGSEGV (which happens on thread 2):

#0  free (ptr=0x48c2df416aec23d6) at ../jemalloc/src/jemalloc.c:2393
#1  0x00007ffff3513bcc in <smallvec::SmallVec<A> as core::ops::drop::Drop>::drop () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#2  0x00007ffff35c1011 in <rustc_resolve::late::LateResolutionVisitor as rustc_ast::visit::Visitor>::visit_local () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#3  0x00007ffff35c0822 in <rustc_resolve::late::LateResolutionVisitor as rustc_ast::visit::Visitor>::visit_block () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#4  0x00007ffff35c27d0 in <rustc_resolve::late::LateResolutionVisitor as rustc_ast::visit::Visitor>::visit_fn () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#5  0x00007ffff354bf9c in rustc_ast::visit::walk_assoc_item () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#6  0x00007ffff35ce54e in rustc_resolve::late::LateResolutionVisitor::with_generic_param_rib () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#7  0x00007ffff35c4f39 in rustc_resolve::late::LateResolutionVisitor::resolve_item () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#8  0x00007ffff355896e in rustc_ast::visit::walk_item () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#9  0x00007ffff35c4473 in rustc_resolve::late::LateResolutionVisitor::resolve_item () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#10 0x00007ffff355896e in rustc_ast::visit::walk_item () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#11 0x00007ffff35c4473 in rustc_resolve::late::LateResolutionVisitor::resolve_item () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#12 0x00007ffff3549d42 in rustc_ast::visit::walk_crate () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#13 0x00007ffff358aac7 in rustc_resolve::Resolver::resolve_crate () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#14 0x00007ffff08f5c97 in rustc_interface::passes::configure_and_expand_inner () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#15 0x00007ffff08d26c9 in rustc_interface::passes::configure_and_expand::{{closure}} () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#16 0x00007ffff08acecf in rustc_data_structures::box_region::PinnedGenerator<I,A,R>::new () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#17 0x00007ffff08f4965 in rustc_interface::passes::configure_and_expand () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#18 0x00007ffff0915f73 in rustc_interface::queries::Queries::expansion () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#19 0x00007ffff05bf887 in rustc_interface::queries::<impl rustc_interface::interface::Compiler>::enter () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#20 0x00007ffff0553f27 in rustc_span::with_source_map () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#21 0x00007ffff05c1513 in rustc_interface::interface::create_compiler_and_run () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#22 0x00007ffff059f9fa in scoped_tls::ScopedKey<T>::set () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#23 0x00007ffff05b4957 in std::sys_common::backtrace::__rust_begin_short_backtrace () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#24 0x00007ffff053ddae in core::ops::function::FnOnce::call_once{{vtable-shim}} () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#25 0x00007fffef94bf5a in <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once () at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/alloc/src/boxed.rs:1042
#26 <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once () at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/alloc/src/boxed.rs:1042
#27 std::sys::unix::thread::Thread::new::thread_start () at library/std/src/sys/unix/thread.rs:87
#28 0x00007fffef83e669 in start_thread (arg=<optimized out>) at pthread_create.c:479
#29 0x00007fffef7642b3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

I can very reliably reproduce this. No other known issues with the system, everything else compiles without issue, everything runs without issue, memtest and other stress tests run without issue.

@jyn514 jyn514 added the E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example label Oct 16, 2020
@OvermindDL1

OvermindDL1 commented Oct 16, 2020

I've uploaded my no-op project that constantly reproduces on my system to: https://github.com/OvermindDL1/legion_testing

Just run cargo build; when it gets to legion, after all its other dependencies, it crashes. I'm guessing you might need a Ryzen CPU (perhaps on Linux; I'm using Ubuntu 20.04 here), based on all the other reports I've been seeing of this so far?

@OvermindDL1

Memory usage appears to be fairly minimal at the point of the crash: 365 MB of VIRT and 348 MB of RES, with 99004 of SHM. It does not appear to be resource exhaustion of anything that I can see.

@OvermindDL1

I cloned https://github.com/TomGillen/legion.git and building it via cargo build also produces the same error. So you can just clone the source project itself and build it to test.

@OvermindDL1

After testing a few things, I found that if I remove the legion_codegen library from inside the Cargo.toml, it then compiles.

In the small test project, leaving out the default features (which should leave out the legion_codegen crate) does not allow it to compile.

Note, it's legion failing to compile, legion_codegen compiles fine, I'm trying to see what legion_codegen does now...

@OvermindDL1

OvermindDL1 commented Oct 16, 2020

So legion itself doesn't do anything with legion_codegen other than re-export it, that's it. It seems to be the proc macro that generates the system attribute. Why would just re-exporting it cause compiling legion to crash though...

EDIT1: Commenting out the entirety of legion_codegen's source code still causes a compilation failure.

EDIT2: Commenting out all of its dependencies still causes a compilation failure...

EDIT3: Also commenting out proc-macro = true still fails to compile.

EDIT4: Commenting out legion_codegen from legion's Cargo.toml is failing to compile, even after a cargo clean, when it compiled properly before... There seems to be some indeterminacy here...

EDIT5: Removed all optional dependencies and it's still failing to compile, even after a clean.

EDIT6: I've been slowly commenting out large swaths of legion and replacing them with no-ops, and have got it down to something in the internals module so far...

EDIT7: So far I've got it down to src/internals/cons.rs!

EDIT8: And got it down to this macro call impl_flatten!(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z);, I'm now peeling it apart...

EDIT9: Got it down to this line in the macro: let cons!($($items),*) = self;
Peeling apart the cons macro now...

EDIT10: Okay, so the macros seem fine; however, the argument count to impl_flatten is causing it. If it is reduced to impl_flatten!(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R); then it works, but increasing it by 1 to impl_flatten!(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S); makes it SIGSEGV...

EDIT11: Interestingly, if I try to remove some of the entirely empty modules that I completely commented out then it compiles again...

EDIT12: Got it down to just an empty src/internals/entity.rs and the src/internals/cons.rs (mostly commented out except that macro and the trait it implements) and it still SIGSEGV's, trying to reduce further...

@OvermindDL1

So far the only code left uncommented is in src/internals/cons.rs and it is:

macro_rules! cons {
    () => (
        ()
    );
    ($head:tt) => (
        ($head, ())
    );
    ($head:tt, $($tail:tt),*) => (
        ($head, cons!($($tail),*))
    );
}

fn blah() {
    let cons!(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z) = todo!();
}

And in src/internals/mod.rs:

pub mod cons;

Apparently, the more code I remove, the more random the crash becomes, though it still happens about 50% of the time. And in src/lib.rs:

mod internals;

Going to try pulling this into its own project now to see if I can replicate it more standalone...

@OvermindDL1

I have reduced the code significantly, error is now:

$ cargo build
   Compiling legion_testing v0.1.0 (/home/overminddl1/rust/legion_testing)
error: could not compile `legion_testing`.

Caused by:
  process didn't exit successfully: `rustc --crate-name legion_testing --edition=2018 src/lib.rs --error-format=json --json=diagnostic-rendered-ansi --crate-type lib --emit=dep-info,metadata,link -C embed-bitcode=no -C debuginfo=2 -C metadata=668da26770ceeea9 -C extra-filename=-668da26770ceeea9 --out-dir /home/overminddl1/rust/legion_testing/target/debug/deps -C incremental=/home/overminddl1/rust/legion_testing/target/debug/incremental -L dependency=/home/overminddl1/rust/legion_testing/target/debug/deps` (signal: 11, SIGSEGV: invalid memory reference)

I have updated the https://github.com/OvermindDL1/legion_testing project to remove legion and just have the code that tests it. I'm trying to reduce it further but I may be hitting the limit. If I manage to reduce it further then I'll update that repo and post here.

@OvermindDL1

I've reduced it a little more. I've noticed that the more arguments I remove from the cons! call, the significantly lower the chance of the crash happening; leaving it with the full alphabet makes it crash about 75% of the time. Again, this is on a Ryzen7 with Ubuntu 18.10 with these versions:

$ rustc --version --verbose
rustc 1.47.0 (18bf6b4f0 2020-10-07)
binary: rustc
commit-hash: 18bf6b4f01a6feaf7259ba7cdae58031af1b7b39
commit-date: 2020-10-07
host: x86_64-unknown-linux-gnu
release: 1.47.0
LLVM version: 11.0
$ cargo --version --verbose
cargo 1.47.0 (f3c7e066a 2020-08-28)
release: 1.47.0
commit-hash: f3c7e066ad66e05439cf8eab165a2de580b41aaf
commit-date: 2020-08-28

Could anyone above who was having an issue compiling legion try out this minimal repo and run cargo clean; cargo build a few times to confirm? Perhaps try to reduce it further?

The current reproducing code is (in src/lib.rs):

macro_rules! cons {
    ($head:tt) => (
        ($head, ())
    );
    ($head:tt, $($tail:tt),*) => (
        ($head, cons!($($tail),*))
    );
}

fn blah() {
    let cons!(a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, z) = todo!();
}
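
For readers unfamiliar with the macro, here is a minimal sketch of what `cons!` does in both pattern and expression position. The macro body is the one from the reproduction above; the `flatten3` helper and the `i32` element type are assumptions made only for illustration and do not come from legion.

```rust
// Same shape as the `cons!` macro from the reproduction above.
macro_rules! cons {
    ($head:tt) => (
        ($head, ())
    );
    ($head:tt, $($tail:tt),*) => (
        ($head, cons!($($tail),*))
    );
}

/// Hypothetical helper: destructures a three-element cons list and
/// returns its parts as a flat tuple.
fn flatten3(list: (i32, (i32, (i32, ())))) -> (i32, i32, i32) {
    // In pattern position this expands to `(a, (b, (c, ())))`.
    let cons!(a, b, c) = list;
    (a, b, c)
}

fn main() {
    // In expression position this expands to `(1, (2, (3, ())))`.
    let list = cons!(1, 2, 3);
    println!("{:?}", flatten3(list));
}
```

Each extra argument adds one more level of tuple nesting, which is why the 25-argument call in the reproduction produces such a deep pattern.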

@OvermindDL1

Oh, and as a note, it still happens if you replace tt in the macro with ident as well.

@jyn514 jyn514 removed the E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example label Oct 16, 2020
@jyn514
Member

jyn514 commented Oct 16, 2020

@OvermindDL1 if you expand out the macro does it still crash? Or does it require using the macro?

@Aaron1011
Member

@OvermindDL1: I can't reproduce the crash at all with your repository.

Does this happen if you run rustc directly on the file? Could you record a trace with rr?

@OvermindDL1

The reason I stopped was that I ran out of time. I'm currently driving for a while, and I'm unsure if I'll be able to get to it. If I get time to look at it tonight I'll try to; otherwise I may be delayed until Monday, so if anyone else who is able to replicate it can get to it, that would probably be better.

@OvermindDL1

@Aaron1011 I'm curious about your CPU and OS.

@Aaron1011
Member

@OvermindDL1: I'm running Arch Linux with an Intel Core i9-8950HK

@ehuss
Contributor

ehuss commented Oct 17, 2020

@Aaron1011 I'm able to reproduce with the reduced macro rules example. It can take a fair number of runs for it to fail (for me, anywhere from 1 to 500 runs). I can't seem to get rr to work very well (I have an AMD cpu). If I try to reverse-next from the failure, it says Expected syscall_bp_vm to be clear but it's 2518439's address space with a breakpoint at 0x7f11b6c3e353 while we're at 0x70000008. I haven't used it before, so I'm not too familiar with it.

Just using gdb with the core dump, it's pretty much the same error as before. Inside resolve_pattern_top it is calling drop_in_place, as best I can see it is freeing a pointer into the middle of some object code (rustc_ast::ast::Pat::walk+348).

@OvermindDL1

OvermindDL1 commented Oct 17, 2020

@Aaron1011 So not a Ryzen, so far it seems everyone this happens to has a Ryzen, interesting...

For note, remotely from my phone over ssh I'm trying to do what I can, even rustc --edition=2018 src/lib.rs --crate-type=lib segfaults as well, so does rustc --edition=2018 src/lib.rs --crate-type=lib -Zunstable-options --pretty=expanded > src/lib_expanded.rs, but I got the macro expanded version after a few tries:

#![feature(prelude_import)]
#[prelude_import]
use std::prelude::v1::*;
#[macro_use]
extern crate std;
macro_rules! cons {
    ($ head : ident) => (($ head, ())) ;
    ($ head : ident, $ ($ tail : ident), *) =>
    (($ head, cons ! ($ ($ tail), *))) ;
}

fn blah() {
    let (a,
         (b,
          (c,
           (d,
            (e,
             (f,
              (g,
               (h,
                (i,
                 (j,
                  (k,
                   (l,
                    (m,
                     (n,
                      (o,
                       (p,
                        (q,
                         (r,
                          (s,
                           (t,
                            (u, (v, (w, (x, (z, ()))))))))))))))))))))))))) =
        { ::std::rt::begin_panic("not yet implemented") };
}

And compiling it via rustc --edition=2018 src/lib_expanded.rs --crate-type=lib also segfaults, so it's not a macro issue, still about a 50% crash rate (the other 50% is just reporting the file error as normal). Reduced further to:

fn blah() {
    let (a,
         (b,
          (c,
           (d,
            (e,
             (f,
              (g,
               (h,
                (i,
                 (j,
                  (k,
                   (l,
                    (m,
                     (n,
                      (o,
                       (p,
                        (q,
                         (r,
                          (s,
                           (t,
                            (u, (v, (w, (x, (z, ()))))))))))))))))))))))))) =
        { ::std::rt::begin_panic("not yet implemented") };
}

Again, reducing the depth of the tuples significantly lowers the chance that it happens: it crashes very rarely if I remove 2 levels, and more often if I add more. I can also replace { ::std::rt::begin_panic("not yet implemented") } with just (), so that it becomes:

fn blah() {
    let (a,
         (b,
          (c,
           (d,
            (e,
             (f,
              (g,
               (h,
                (i,
                 (j,
                  (k,
                   (l,
                    (m,
                     (n,
                      (o,
                       (p,
                        (q,
                         (r,
                          (s,
                           (t,
                            (u, (v, (w, (x, (z, (aa, ())))))))))))))))))))))))))) = ();
}

And still happens about 50% of the time for me.
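
Since the crash rate appears to depend on nesting depth, anyone trying to reproduce this may want to generate test files at varying depths rather than edit them by hand. The following is only a sketch: the function names and the `v0, v1, ...` binding names are made up for illustration (the thread uses single letters), and the generated function mirrors the `= ();` form above.

```rust
/// Builds the source text of a right-nested tuple pattern such as
/// `(a, (b, ()))` from a list of binding names.
fn nested_pattern(names: &[&str]) -> String {
    // Fold from the innermost element outwards, starting at the unit tuple.
    names
        .iter()
        .rev()
        .fold(String::from("()"), |inner, name| format!("({}, {})", name, inner))
}

/// Wraps a pattern of the given depth in a function shaped like the
/// reduced reproduction.
fn repro_source(depth: usize) -> String {
    // Hypothetical binding names v0, v1, ...
    let names: Vec<String> = (0..depth).map(|i| format!("v{}", i)).collect();
    let refs: Vec<&str> = names.iter().map(String::as_str).collect();
    format!("fn blah() {{\n    let {} = ();\n}}\n", nested_pattern(&refs))
}

fn main() {
    // Print a depth-3 example; redirect the output to a file to test it.
    print!("{}", repro_source(3));
}
```

As with the hand-written version, the generated file is expected to be rejected with an ordinary type error when the compiler does not crash.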

Hard to do much from my phone, but will try more later as I can.

@OvermindDL1

This crash feels very similar to https://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/b48dd28447fc8ef62fbc963accd301557fd9ac20 but I'm very unsure.

Unrelated, but is there a way to get rustc with a newer jemalloc or built without jemalloc just as a test?

@ehuss
Contributor

ehuss commented Oct 17, 2020

@OvermindDL1 I've been testing without jemalloc, and get the same results, so I don't think it is an issue. If you build rustc from source (x.py build library/std), the default is without jemalloc.

@OvermindDL1

@ehuss Very cool, thanks for checking without jemalloc. What's your CPU and OS? You said AMD, but is it a Ryzen? I have multiple machines here to test with, most of them AMD; only one is a Ryzen, and that one is the only one that has an issue. Unfortunately it's also my fastest CPU by a significant margin, so it's the system I usually use as a build host.

@ehuss
Contributor

ehuss commented Oct 17, 2020

I have a Ryzen Threadripper 2950X, on Ubuntu 20.04. I'm in the same boat, this is the only machine where it reproduces, but it is also by far the fastest one, so I'm still not sure if it is AMD-specific.

@OvermindDL1

It always happens on a different thread than the main thread, so I'm actually quite curious if it's some kind of race condition with many core CPUs. Is there a way to specify the number of threads that rustc is allowed to use? I would love to test with a single thread, two threads, on up until I can reproduce it.

I guess I can just load it with a forced CPU core affinity. I'll try to do that at the next opportunity I get, but it might not be for a little while, so if someone else is able to do it before me, that would probably be better.

@ehuss
Contributor

ehuss commented Oct 17, 2020

For the most part, rustc is single threaded, it just runs everything on a dedicated thread for various reasons. It only uses multiple threads for code generation (in llvm), and this crash is happening far earlier than that.

@ishitatsuyuki
Contributor

rustc always spawns a thread for the purpose of controlling stack size. If that's hindering your debugging, then you can patch rustc like in #48575.
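
To illustrate the general technique being described (running the real work on a spawned thread so that its stack size can be chosen), here is a minimal standard-library sketch. It is not rustc's actual code; the `run_with_stack` name, the thread name, and the 8 MiB value are assumptions for the example.

```rust
use std::thread;

/// Runs `work` on a freshly spawned thread with the requested stack size
/// and returns its result.
fn run_with_stack<T, F>(stack_bytes: usize, work: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .name("worker".to_string())
        .stack_size(stack_bytes)
        .spawn(work)
        .expect("failed to spawn thread")
        .join()
        .expect("worker thread panicked")
}

fn main() {
    // 8 MiB is an arbitrary value chosen for this example.
    let sum = run_with_stack(8 * 1024 * 1024, || (1..=10u64).sum::<u64>());
    println!("{}", sum);
}
```

A program structured this way shows its work on a non-main thread in a debugger even though it is effectively single threaded, which matches the backtraces above.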

@mati865
Contributor

mati865 commented Oct 17, 2020

@ehuss for rr on Zen you have to use one of workarounds from here: https://github.com/mozilla/rr/wiki/Zen

@ehuss
Contributor

ehuss commented Oct 17, 2020

Yea, I implemented the workaround, and the script printed Zen workaround in place. It seems to print that error whenever it steps over certain syscalls like IO. I'm also a bit confused: I ran rustc in a loop until it crashed. It very clearly dumped a core file, and that core file has the stack that I expect, but when I run rr replay and manually step through the problem area (resolve_pattern_top), it steps through as if everything is OK. It's like it is replaying one of the previous successful runs. It's quite confusing.

@apiraino
Contributor

@ehuss @OvermindDL1 impressive work done here to try to reproduce. Can we now set some facts about it? I'm trying to square the issue for the compiler team.

Is the latest snippet in this comment a good reproducible example, at least in some range of conditions? Second, can we rule out a CPU vendor-specific issue? What else can we say about this to help reproduce it reliably?

thanks!

@ehuss
Contributor

ehuss commented Oct 22, 2020

The simplification listed above fails on some versions, but not all. It seems to be really sensitive and will pass where the original legion still fails. For example, nightly-2020-10-03 fails where nightly-2020-10-04 passes. However, legion still fails for me on nightly-2020-10-04.

I did a fair bit of investigation, but did not find anything terribly useful. It is very sensitive to the exact code layout and optimization settings of rustc. For example, compiling rustc_resolve with -O2 causes the problem to go away. Adding #[inline(never)] to resolve_pattern_top also makes the problem go away.

I cannot rule out that it is AMD-specific because I don't have easy access to a fast Intel system. I was unable to repro in a virtual machine on an Intel machine. I was also unable to repro on macOS (Intel) or Windows (AMD).

If someone can reproduce on an Intel Linux system, that would help rule out anything CPU-specific. If they can get it to fail, then running rr could be really helpful, since I can't seem to get it to work correctly on my AMD system.

The script I use to run is:

#!/bin/bash

# Run with RUSTUP_TOOLCHAIN=<toolchain name> to test different toolchains.

ulimit -c unlimited

set -e

rustc -V

for i in {1..1000}
do
    echo $i
    rustc --crate-type rlib foo.rs --emit=metadata
    # Change to this if testing a cargo project:
    # touch src/lib.rs
    # cargo check
done
tput bel
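
For anyone who prefers to drive the loop from Rust, the following is a rough Unix-only sketch of the same idea. It differs from the script above in one respect: it stops only when the child is killed by a signal (such as SIGSEGV), not on any non-zero exit. The `first_signal` name and the hard-coded limit of 1000 runs in `main` are assumptions for illustration.

```rust
use std::os::unix::process::ExitStatusExt;
use std::process::Command;

/// Runs `program args...` up to `max_runs` times and returns the 1-based run
/// number and signal number of the first run that was killed by a signal.
fn first_signal(program: &str, args: &[&str], max_runs: u32) -> Option<(u32, i32)> {
    for run in 1..=max_runs {
        let status = Command::new(program)
            .args(args)
            .status()
            .expect("failed to start command");
        // `signal()` is `Some` only when the process was terminated by a signal.
        if let Some(sig) = status.signal() {
            return Some((run, sig));
        }
    }
    None
}

fn main() {
    // The command to repeat is taken from this program's own arguments,
    // e.g. `rustc --crate-type rlib foo.rs --emit=metadata`.
    let argv: Vec<String> = std::env::args().skip(1).collect();
    match argv.split_first() {
        Some((program, rest)) => {
            let args: Vec<&str> = rest.iter().map(String::as_str).collect();
            match first_signal(program, &args, 1000) {
                Some((run, sig)) => println!("run {} was killed by signal {}", run, sig),
                None => println!("no signal in 1000 runs"),
            }
        }
        None => eprintln!("usage: pass the command to repeat as arguments"),
    }
}
```

Checking for a signal rather than an exit code lets the loop keep going when the test file produces an ordinary compile error, as the `= ();` reduction does.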

@jyn514 jyn514 added the A-spurious Area: Spurious failures in builds (spuriously == for no apparent reason) label Oct 28, 2020
@jyn514
Member

jyn514 commented Oct 28, 2020

Assigning P-medium as discussed as part of the Prioritization Working Group procedure and removing I-prioritize. Also assigning I-nominated so we can try to get eyes on the root cause of the issue.

@jyn514 jyn514 added I-nominated P-medium Medium priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Oct 28, 2020
@pnkfelix pnkfelix self-assigned this Nov 12, 2020
@spastorino
Member
