Nightly regression on x86_64_unknown_linux_gnu #43126

Closed
Aceeri opened this issue Jul 8, 2017 · 25 comments

Labels

C-bug (Category: This is a bug.)
P-high (High priority)
regression-from-stable-to-stable (Performance or correctness regression from one stable version to another.)
T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)

Comments

Aceeri commented Jul 8, 2017

Travis is failing for me on the newer nightlies, but not previous ones and not on stable/beta. I'm willing to guess it has something to do with procedural macros, since other PRs build correctly on this nightly.

Travis builds for comparison:
Success: rustc 1.19.0-nightly (10d7cb44c 2017-06-18)
Failure: rustc 1.19.0-nightly (04145943a 2017-06-19)

Related PR: amethyst/specs#201

Exits with code 101 with no error messages, just a failure. Can't reproduce on Windows, so I am kind of limited in trying to debug this.

sfackler (Member) commented Jul 8, 2017

Seems to work for me (at least locally) on rustc 1.20.0-nightly (9b85e1cfa 2017-07-07) on Linux.

TimNN (Contributor) commented Jul 9, 2017

OK, I think there are three things going on:

  1. rustup is annoying and turns a segmentation fault produced by rustc into an exit code of 1.
  2. The failing rustc invocation uses serde_derive; that is, it runs code previously produced by the compiler (a minimal sketch of that setup follows below).
  3. IIRC the 2017-06-20 nightly falls into the range of "duplicate drop" nightlies and thus produces bad code.
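
For context, here is a minimal sketch of the kind of serde_derive usage in play; the struct is invented for illustration, and serde plus serde_derive are assumed as dependencies (2017-era, pre-2018-edition style). The point is that rustc loads the previously compiled serde_derive library and executes it during macro expansion, so a miscompiled nightly can crash in that step even though the user's own source is perfectly ordinary.

// Illustrative only: the derive below makes rustc load the compiled
// serde_derive plugin and run it to generate the Serialize/Deserialize
// impls, i.e. the compiler executes code it previously produced.
#[macro_use]
extern crate serde_derive;

#[derive(Serialize, Deserialize)]
struct Resource {
    id: u32,
    name: String,
}

fn main() {}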

Mark-Simulacrum (Member) commented:

It'd be good to verify if this is fixed by rerunning on a more recent nightly. Marking as a regression for now though...

Mark-Simulacrum added the regression-from-stable-to-nightly and T-compiler labels on Jul 19, 2017
Aceeri (Author) commented Jul 19, 2017

@Mark-Simulacrum I just re-ran the Travis build, and it seems both nightly and beta have this issue now:
https://travis-ci.org/slide-rs/specs/builds/255438162?utm_source=github_status&utm_medium=notification

Mark-Simulacrum added the regression-from-stable-to-beta label and removed the regression-from-stable-to-nightly label on Jul 19, 2017
Mark-Simulacrum (Member) commented:

I can't seem to reproduce locally on macOS either...

alexcrichton (Member) commented:

@Aceeri would it be possible to make a standalone reproduction of this? Right now I'm not even sure how to get access to the commit on my local machine...

Aceeri (Author) commented Jul 23, 2017

@alexcrichton I'll see if I can try a couple of things to reproduce it if I can find some time.

Mark-Simulacrum added the C-bug label on Jul 28, 2017
Marwes (Contributor) commented Aug 4, 2017

I might have the same bug in combine: https://travis-ci.org/Marwes/combine/jobs/261073809. Compiling on Windows with nightly works fine as well.

Marwes (Contributor) commented Aug 5, 2017

Reproduction: gist playground

Seems like for some reason x86_64_unknown_linux_gnu segfaults(?) when trying to compile the trait implementation for an extremely deeply nested type. Each tuple implementation in the gist expands to a type of the form Or<A, Or<B, Or<C, ...>>>, so it compiles if I implement it only for tuples of lesser arity, or if I break up the implementation to be wider instead of so deep.
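
For readers without access to the gist, here is a minimal sketch of the shape being described; the names below are illustrative stand-ins, not combine's actual types. Each tuple element wraps the remaining alternatives in another Or, so an arity-N implementation has to name a type nested N levels deep.

// Illustrative only; `Or` and `Parser` stand in for the real types in the gist.
struct Or<L, R>(L, R);

trait Parser {
    type Output;
}

impl<L: Parser, R: Parser> Parser for Or<L, R> {
    // Placeholder: a real impl would choose between the two alternatives.
    type Output = L::Output;
}

// What a tuple impl of arity 4 effectively has to spell out: nested rather
// than flat, so higher arities make the type (and its trait obligations)
// correspondingly deeper.
type Choice4<A, B, C, D> = Or<A, Or<B, Or<C, D>>>;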

Aceeri (Author) commented Aug 5, 2017

That might actually be the problem. I have a macro in mine that recurses ~128 times, each step adding another level to the type: <<T as Split>::Next as Split>::Next ...
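
A hypothetical reconstruction of that pattern, shown at depth 3 rather than ~128: only the name Split comes from the comment above, while the trait body and the macro are invented here for illustration. Each recursion step appends one more ::Next projection, so the final type is a long chain of nested associated-type projections.

use std::marker::PhantomData;

// Invented for illustration: only the name `Split` is taken from the comment.
trait Split {
    type Next: Split;
}

// Each `next` token wraps the accumulated type in one more
// `<... as Split>::Next` projection.
macro_rules! nth_next {
    ($t:ty;) => { $t };
    ($t:ty; next $($rest:tt)*) => { nth_next!(<$t as Split>::Next; $($rest)*) };
}

// Depth 3 here; recursing ~128 times instead yields a projection chain
// nested ~128 levels deep.
struct Depth3<T: Split>(PhantomData<nth_next!(T; next next next)>);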

alexcrichton (Member) commented:

@Marwes I believe your example here may be the same as #43613, where the segfault is just manifesting as an OOM. I ran it locally and rustc ran away with half my memory before I killed it.

In that case I wonder if we can tag #43613 as a regression and close this?

nikomatsakis (Contributor) commented:

@TimNN @Mark-Simulacrum -- any chance either of you want to try and bisect the more narrow example that @Marwes gave here?

nikomatsakis (Contributor) commented:

triage: P-high -- seems like a valid regression we need to investigate.

rust-highfive added the P-high label on Aug 17, 2017
TimNN (Contributor) commented Aug 17, 2017

Edit: This is for the reproduction, not the original issue. #43546 was merged after this issue was opened.

According to rust-bisect, this regressed from a successful compilation to an OOM (when given 500 MB of RAM) with #43546.

TimNN (Contributor) commented Aug 17, 2017

I attempted to bisect the commit from #43126 (comment) (again with 500 MB of RAM; I couldn't repro on the latest nightly with full memory). The results were varied (sometimes a segmentation fault and sometimes an illegal instruction), so I'm not sure how helpful or correct the result will be.

rust-bisect (consistently) points to #40847, which, however, falls outside the failure/success range in the OP (the failing commit from the OP was no longer available).

Even though I couldn't repro on the latest nightly with 1 GB of RAM, I ran rust-bisect again with 1 GB of RAM and got #39409, which does fall into the range in the OP.

alexcrichton (Member) commented:

@TimNN thanks for the work here! Just to see if I understand: is this an OOM? An infinite loop?

Do we think that there's a clear cause for this?

TimNN (Contributor) commented Aug 22, 2017

@alexcrichton: I did the bisection from my last comment again, this time with an 8 GB RAM limit (which is much larger than the 4 GB the OP's failing builds had on Travis, if they ran on the container infrastructure).

So I currently believe that some or all of #43546, #40847 and #39409 (and/or maybe others) caused an increase in RAM requirements (which may have decreased again in the latest nightly).

alexcrichton (Member) commented:

@TimNN are you on Linux? We could try to test this out with massif and valgrind. If you've got the exact compilers and nightlies I can also try to run locally.

In general, though, you can just do:

valgrind --tool=massif --trace-children=yes rustc ...
ms_print massif.* | less

And then there should be a "pretty graph" at the top of the output.

alexcrichton modified the milestone: 1.20 on Aug 23, 2017
nikomatsakis (Contributor) commented:

@pnkfelix will investigate whether #39409 is indeed at fault, as it seems the most likely culprit of those three.

alexcrichton (Member) commented:

1.20 is now stable

alexcrichton added the regression-from-stable-to-stable label on Aug 28, 2017
alexcrichton removed the regression-from-stable-to-beta label on Aug 28, 2017
nikomatsakis (Contributor) commented:

Update: @pnkfelix mentioned to me that he was going to attempt to land a -Z flag that would let you disable EndRegion emission, so that we can test the impact more easily. I don't think this has happened yet.

nikomatsakis (Contributor) commented:

#44219 is the PR.

pnkfelix (Member) commented Sep 1, 2017

@alexcrichton BTW, I had a really hard time getting useful data out of massif (it was claiming the peak memory usage was something like 12 KB), until last night @eddyb told me that valgrind/massif do not behave well with jemalloc.

So after I hacked the compiler to use the system allocator (via the diff below), I got much more believable results from valgrind/massif.

  • (This leads me to wonder if we should provide an easier way for the user to opt into using the system allocator when running rustc... I don't know if making it a dynamic choice when running rustc will totally defeat the gains of providing jemalloc as an option in the first place, but I figure I should at least mention the idea...)
diff --git a/src/rustc/rustc.rs b/src/rustc/rustc.rs
index bfd01146d2..1c3d8b1da2 100644
--- a/src/rustc/rustc.rs
+++ b/src/rustc/rustc.rs
@@ -9,7 +9,9 @@
 // except according to those terms.

 #![feature(rustc_private)]
+#![feature(alloc_system)]

+extern crate alloc_system;
 extern crate rustc_driver;

 fn main() { rustc_driver::main() }

nikomatsakis (Contributor) commented:

Can we confirm if this is fixed (for now) by #44249?

nikomatsakis (Contributor) commented:

Going to assume this was resolved by #44249. Closing. Please let us know if you are still seeing problems, @Aceeri!
