
Nightly regression on x86_64_unknown_linux_gnu #43126

Closed
Aceeri opened this Issue Jul 8, 2017 · 25 comments

Aceeri commented Jul 8, 2017

Travis is failing for me on the newer nightlies, but not previous ones and not on stable/beta. I'm willing to guess it has something to do with procedural macros, since other PRs build correctly on this nightly.

Travis builds for comparison:
Success: rustc 1.19.0-nightly (10d7cb44c 2017-06-18)
Failure: rustc 1.19.0-nightly (04145943a 2017-06-19)

Related PR: slide-rs/specs#201

Exits with code 101 with no error messages, just a failure. Can't reproduce on Windows, so I am kind of limited in trying to debug this.

sfackler (Member) commented Jul 8, 2017

Seems to work for me (at least locally) on rustc 1.20.0-nightly (9b85e1cfa 2017-07-07) on Linux.

TimNN (Contributor) commented Jul 9, 2017

Ok, I think there are three things going on:

  1. rustup is annoying and turns a segmentation fault produced by rustc into an exit code 1
  2. The failing rustc invocation uses serde_derive, that is, it runs code previously produced by the compiler.
  3. IIRC the 2017-06-20 nightly falls into the range of "duplicate drop" nightlies and thus produces bad code.

Mark-Simulacrum (Member) commented Jul 19, 2017

It'd be good to verify if this is fixed by rerunning on a more recent nightly. Marking as a regression for now though...

Aceeri (Author) commented Jul 19, 2017

@Mark-Simulacrum I just re-ran the Travis build and it seems both nightly and beta have this issue now:
https://travis-ci.org/slide-rs/specs/builds/255438162?utm_source=github_status&utm_medium=notification

Mark-Simulacrum (Member) commented Jul 19, 2017

I can't seem to reproduce locally on macOS either...

alexcrichton (Member) commented Jul 23, 2017

@Aceeri would it be possible to make a standalone reproduction of this? Right now I'm not even sure how to get access to the commit on my local machine...

Aceeri (Author) commented Jul 23, 2017

@alexcrichton I'll try a couple of things to reproduce it if I can find some time.

Marwes (Contributor) commented Aug 4, 2017

I might be hitting the same bug in combine: https://travis-ci.org/Marwes/combine/jobs/261073809. Compiling on Windows with nightly works fine as well.

Marwes (Contributor) commented Aug 5, 2017

Reproduction: gist playground

Seems like for some reason x86_64_unknown_linux_gnu segfaults(?) when trying to compile the trait implementation of an extremely nested type. Each tuple implementation in the gist expands a type as Or<A, Or<B, Or<C, ...>>>, so it compiles if I implement it only for tuples of lesser arity, or if I break up the implementation to be wider instead of so deep.
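
For reference, a minimal sketch of the shape being described (the Or / Choice names below are hypothetical stand-ins, not the actual gist or combine's API): each extra tuple element nests the resulting type one level deeper, so the projection the compiler has to normalize grows with the arity.

use std::marker::PhantomData;

// Each nesting level wraps the remaining alternatives in another Or.
struct Or<L, R>(PhantomData<(L, R)>);

trait Choice {
    type Alternatives;
}

macro_rules! tuple_choice {
    // Base case: a one-element tuple.
    ($head:ident) => {
        impl<$head> Choice for ($head,) {
            type Alternatives = $head;
        }
    };
    // Recursive case: every additional element adds one level of nesting,
    // producing Or<A, Or<B, Or<C, ...>>>.
    ($head:ident $(, $tail:ident)+) => {
        impl<$head, $($tail),+> Choice for ($head, $($tail),+) {
            type Alternatives = Or<$head, <($($tail,)+) as Choice>::Alternatives>;
        }
        tuple_choice!($($tail),+);
    };
}

// The gist expands this for much higher arities, which is where memory use blew up.
tuple_choice!(A, B, C, D);

fn main() {}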

Aceeri (Author) commented Aug 5, 2017

That might actually be the problem: I have a macro in mine that recurses ~128 times, each level adding to a type with <<T as Split>::Next as Split>::Next ...
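
As an illustration of that kind of associated-type recursion (the Split trait and Cons/Nil types below are hypothetical stand-ins, not the actual macro), here are two levels of the projection written out by hand; the real macro nests roughly 128 of them:

use std::marker::PhantomData;

// Hypothetical trait whose associated type points at the "rest" of a type-level list.
trait Split {
    type Next;
}

// Hypothetical type-level cons list to recurse over.
struct Cons<Head, Tail>(PhantomData<(Head, Tail)>);
struct Nil;

impl<Head, Tail> Split for Cons<Head, Tail> {
    type Next = Tail;
}

fn main() {
    // Two levels of the projection; the macro in question produces a chain of
    // ~128 of these: <<T as Split>::Next as Split>::Next ...
    let _two_deep: <<Cons<u8, Cons<u16, Nil>> as Split>::Next as Split>::Next = Nil;
}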

alexcrichton (Member) commented Aug 9, 2017

@Marwes I believe your example here may be the same as #43613, where the segfault is just manifesting as OOM. I ran it locally and rustc ran away with half my memory before I killed it.

In that case I wonder if we can tag #43613 as a regression and close this?

nikomatsakis (Contributor) commented Aug 17, 2017

triage: P-high -- seems like a valid regression we need to investigate.

TimNN (Contributor) commented Aug 17, 2017

Edit: This is for the reproduction, not the original issue. #43546 was merged after this issue was opened.

According to rust-bisect, this regressed from a successful compilation to an OOM (when given 500MB ram) with #43546.

TimNN (Contributor) commented Aug 17, 2017

I attempted to bisect the commit from #43126 (comment) (again with 500MB of RAM; I couldn't repro on the latest nightly with full memory). The results were varied (sometimes a segmentation fault and sometimes an illegal instruction), so I'm not sure how helpful or correct the result will be.

rust-bisect (consistently) points to #40847, which, however, falls outside the success/failure range given in the OP (the failing commit from the OP was no longer available).


Even though I couldn't repro on the latest nightly with 1GB of RAM, I ran rust-bisect again with 1GB of RAM and got #39409, which does fall into the range in the OP.

alexcrichton (Member) commented Aug 22, 2017

@TimNN thanks for the work here! Just to see if I understand, is this an OOM? An infinite loop?

Do we think that there's a clear cause for this?

TimNN (Contributor) commented Aug 22, 2017

@alexcrichton: I did the bisection from my last comment again, this time with an 8 GB RAM limit (much larger than the 4 GB that the OP's failing builds had on Travis, assuming they ran on the container infrastructure).

So I currently believe that some or all of #43546, #40847 and #39409 (and/or maybe others) caused an increase in ram requirements (which may have decreased again in the latest nightly).

alexcrichton (Member) commented Aug 22, 2017

@TimNN are you on Linux? We could try to test this out with massif and valgrind. If you've got the exact compilers and nightlies I can also try to run locally.

In general though you can just do:

valgrind --tool=massif --trace-children=yes rustc ...
ms_print massif.* | less

And then there should be a "pretty graph" at the top of the output.

alexcrichton modified the milestone: 1.20 (Aug 23, 2017)

nikomatsakis (Contributor) commented Aug 24, 2017

@pnkfelix will investigate whether #39409 is indeed at fault, as it seems the most likely culprit of the three.

alexcrichton (Member) commented Aug 28, 2017

1.20 is now stable

nikomatsakis (Contributor) commented Aug 31, 2017

Update: @pnkfelix mentioned to me that he was going to attempt to land a -Z flag that would let you disable EndRegion emission, so that we can test the impact more easily. I don't think this has happened yet.

nikomatsakis (Contributor) commented Aug 31, 2017

#44219 is the PR.

pnkfelix (Member) commented Sep 1, 2017

@alexcrichton BTW I had a really hard time getting useful data out of massif (it was claiming the peak memory usage was something like 12kb), until last night @eddyb told me that valgrind/massif do not behave well with jemalloc.

So after I hacked the compiler to use the system allocator (via the diff below), I got much more believable results from valgrind/massif.

  • (This leads me to wonder if we should provide an easier way for the user to opt into using the system allocator when running rustc... I don't know if making it a dynamic choice when running rustc will totally defeat the gains of providing jemalloc as an option in the first place, but I figure I should at least mention the idea...)
diff --git a/src/rustc/rustc.rs b/src/rustc/rustc.rs
index bfd01146d2..1c3d8b1da2 100644
--- a/src/rustc/rustc.rs
+++ b/src/rustc/rustc.rs
@@ -9,7 +9,9 @@
 // except according to those terms.

 #![feature(rustc_private)]
+#![feature(alloc_system)]

+extern crate alloc_system;
 extern crate rustc_driver;

 fn main() { rustc_driver::main() }
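
For completeness, the same opt-in could be applied to one's own nightly crate when profiling it under massif; below is a minimal sketch assuming the 2017-era nightly alloc_system crate (newer compilers expose this through #[global_allocator] instead):

#![feature(alloc_system)]

// Pulling in alloc_system swaps the default jemalloc allocator for the system
// allocator, which valgrind/massif can track properly.
extern crate alloc_system;

fn main() {
    // A few allocations for massif to observe.
    let v: Vec<u64> = (0..1_000_000).collect();
    println!("allocated {} elements", v.len());
}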

nikomatsakis (Contributor) commented Sep 7, 2017

Can we confirm if this is fixed (for now) by #44249?

nikomatsakis (Contributor) commented Sep 14, 2017

Going to assume this was resolved by #44249. Closing. Please let us know if you are still seeing problems, @Aceeri!
