Apparent 1000% regression in llvm phase of bootstrap #34831

Closed
nrc opened this Issue Jul 15, 2016 · 29 comments

nrc (Member) commented Jul 15, 2016

See perf

Given that there has been no shouting, this could be due to something that only affects the perf server. We should still investigate.

cc @rust-lang/compiler

nagisa (Contributor) commented Jul 17, 2016

Given #34891 (comment), #33890 may be the cause.

bors added a commit that referenced this issue Jul 18, 2016

Auto merge of #34899 - michaelwoerister:always_internalize_symbols, r=eddyb

Run base::internalize_symbols() even for single-codegen-unit crates.

The initial linkage-assignment (especially for closures) is a conservative one that makes some symbols more visible than they need to be. While this is not a correctness problem, it does force the LLVM inliner to be more conservative too, which results in poor performance. Once translation is based solely on MIR, it will be easier to also make the initial linkage assignment a better fitting one. Until then `internalize_symbols()` does a good job of preventing most performance regressions.

This should solve the regressions reported in #34891 and maybe also those in #34831.

As a side-effect, this will also solve most of the problematic cases described in #34793. Not reliably so, however. For that, we still need a solution like the one implemented in #34830.

cc @rust-lang/compiler
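
A rough way to check whether internalization actually took effect (an illustrative sketch, not part of the PR; the crate name and file are hypothetical) is to compare symbol visibility in a compiled rlib, since internalized functions show up as local rather than global text symbols:

    # Build a test crate and inspect symbol visibility in the resulting rlib.
    # "T" entries are externally visible functions; "t" entries are internal,
    # which leaves LLVM free to inline and then discard them.
    rustc --crate-type rlib -O foo.rs
    nm -C libfoo.rlib | grep ' T '
    nm -C libfoo.rlib | grep ' t '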

nikomatsakis (Contributor) commented Jul 21, 2016

triage: P-high

nagisa (Contributor) commented Jul 23, 2016

#34917 (which might be a fix) only landed 19 hours ago.

alexcrichton (Member) commented Jul 25, 2016

Copying over the regression/I-slow tags from #34998, which I'm closing in favor of this.

@nikomatsakis since this is a P-high bug, do you know who would be suitable to assign to this?

nagisa (Contributor) commented Jul 25, 2016

We still don't have any measurements from a nightly containing the PR I linked above; 2016-07-13 seems to be the last measurement done. @nrc, is there any way a manual run could be invoked?

pmarcelll (Contributor) commented Jul 25, 2016

@nagisa I wanted to benchmark my small BF interpreter (it slowed down a bit after the 2016-06-24 nightly), but unfortunately there's no released nightly containing that PR: the bots are failing due to #35002. @alexcrichton, is there a way to temporarily increase the timeout length on the bots? It would only be a short-term solution, but we would (hopefully) get a new nightly.

alexcrichton (Member) commented Jul 25, 2016

@pmarcelll yeah I'm trying to get a nightly out, not sure what's all been going on :(

brson (Contributor) commented Jul 25, 2016

It would be a huge mistake to release another version of Rust without fixing this.

arielb1 (Contributor) commented Jul 25, 2016

I could not see any regression with -Z orbit.

michaelwoerister (Contributor) commented Jul 25, 2016

If the compile-time regression is indeed related to the collector-driven trans changes, then my guess would be that the increased amount of IR pushes LLVM's memory usage past what fits into the RAM of the perf.rust-lang machine, or something similar. I don't think I've run into such a pronounced increase of bootstrapping time when building locally.
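
One way to test this hypothesis would be to record peak memory use during a bootstrap. A diagnostic sketch (not something reported in this thread; it assumes GNU time at /usr/bin/time and the makefile-based build tree used elsewhere in the discussion):

    # Record the peak resident set size seen among the build's processes.
    /usr/bin/time -v make all 2>&1 | tee bootstrap.log
    grep "Maximum resident set size" bootstrap.log
    # While the build runs, nonzero si/so columns here indicate swapping:
    vmstat 5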

nikomatsakis (Contributor) commented Jul 25, 2016

Has anyone tried to reproduce the regression? That is, comparing:

  • just before
  • just after
  • trunk
scottcarr (Contributor) commented Jul 25, 2016

I can't reproduce the regression. I tried building hyper.0.5.0, which according to perf.rust-lang.org has a 350% slowdown.

Without -Z orbit I get:

nightly-2016-06-23: 31.41s
nightly-2016-07-07: 31.61s
master (as of 2016-07-25): 29.40s

With -Z orbit I get:

nightly-2016-06-23: 33.40s
nightly-2016-07-07: 32.52s
master (as of 2016-07-25): 28.89s
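
For anyone who wants to repeat this kind of comparison, a rough sketch, assuming the relevant nightlies are installed as rustup toolchains (the toolchain names match the dates above):

    # Time a clean release build of the crate under each toolchain,
    # with and without -Z orbit (the nightly-only flag enabling MIR-based trans).
    for tc in nightly-2016-06-23 nightly-2016-07-07; do
        cargo +$tc clean
        time cargo +$tc build --release
        cargo +$tc clean
        export RUSTFLAGS="-Z orbit"
        time cargo +$tc build --release
        unset RUSTFLAGS
    done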

nagisa (Contributor) commented Jul 27, 2016

@nrc even though we now have a 07-25 nightly build, the last perf run still seems to be from 07-13. Could one be triggered manually somehow?

nrc (Member, Author) commented Jul 27, 2016

@nagisa the frontend server needs a kick, then that should be updated. I've been fiddling with the backend (trying to get -Zorbit results), so that is offline at the moment; it should be fixed soon though.

nikomatsakis (Contributor) commented Jul 28, 2016

Just FYI I think @edunham kicked the front end server.

nikomatsakis (Contributor) commented Jul 28, 2016

So there is a new measurement now (week of July 23), and the 1000% regression does not seem to have improved. I was wondering about @michaelwoerister's hypothesis that memory usage might be to blame, so I loaded up this graph charting the memory usage. Indeed, there is a huge spike on July 1; note that the perf drop was measured on July 2nd. Seems like a potential link!

nagisa (Contributor) commented Jul 28, 2016

huge spike

Note that the graph axes are not numbered from 0, which is why the spike looks so big. The increase is actually a bit less than 5%, which is to be expected since we started book-keeping a large amount of extra data.

nikomatsakis (Contributor) commented Jul 28, 2016

Note that the graph axes are not numbered from 0, which is why the spike looks so big. The increase is actually a bit less than 5%, which is to be expected since we started book-keeping a large amount of extra data.

Ha! Thanks. Oldest trick in the book, and I fell for it. ;)

nikomatsakis (Contributor) commented Jul 28, 2016

OK, so, I just tried bootstrapping (make rustc-stage1 && time make all) both before and after the regression locally, and I did not find much of a difference: 33m before, 35m after.
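
A sketch of what that before/after comparison might look like (the checkout/clean steps and commit placeholders are assumptions, not taken from this thread):

    # Time the bootstrap just before and just after the suspect change.
    git checkout <commit-just-before> && make clean
    make rustc-stage1 && time make all
    git checkout <commit-just-after> && make clean
    make rustc-stage1 && time make all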

nikomatsakis (Contributor) commented Jul 28, 2016

(Note that I was doing other things etc, but I figured a 1000% regression would be visible. :)

nikomatsakis (Contributor) commented Jul 29, 2016

@nrc I wonder if it would make sense to re-run the benchmarks on the perf machine? I guess we've seen the numbers repeat numerous times, but it seems like a phenomenon that just isn't being witnessed on other setups.

nrc (Member, Author) commented Jul 31, 2016

I've run the bootstrap on the perf machine a few times and it really does seem to take that long. I currently have an old commit running to see if something changed in the environment. The next step is to try a current commit without the scripts, in case something in the scripts is making it take so long.

eddyb (Member) commented Jul 31, 2016

@nrc How many hours is "that long"?

nrc (Member, Author) commented Aug 1, 2016

@eddyb 48?

Aatch (Contributor) commented Aug 1, 2016

@nrc sounds like it might be specific to the perf machine then. If full builds were regularly taking even a tenth of that to run, we'd know about it elsewhere.

nrc (Member, Author) commented Aug 1, 2016

@Aatch to be clear, that is a full run of the perf measurements, which is multiple bootstraps + 6 stage builds + 10 runs of each benchmark.

eddyb (Member) commented Aug 1, 2016

On IRC @nrc pointed out that building just stage2 jumped from 13m to 2h30m, so ~12x.

alexcrichton (Member) commented Aug 4, 2016

It was pointed out on IRC that this may be happening because LLVM was being compiled in debug mode. With that in mind, I now remember running into this problem on #34559, where the Android builder was randomly switching LLVM to debug mode, and the fix was to update CMake from 2.8 to something newer.

As of a few days ago the in-tree LLVM requires CMake 3.4 to compile, so future contributors shouldn't run into this, but @nrc do you know what the cmake version is on the perf bot?
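
A quick way to check both things on the perf machine (a diagnostic sketch; the path to LLVM's CMakeCache.txt is a guess and may differ on that setup):

    # Confirm the CMake version and whether the in-tree LLVM was configured as Debug.
    cmake --version
    grep CMAKE_BUILD_TYPE x86_64-unknown-linux-gnu/llvm/CMakeCache.txt
    # Expect CMAKE_BUILD_TYPE:STRING=Release; Debug would explain a large slowdown.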

nrc (Member, Author) commented Aug 5, 2016

\o/ This was fixed with a complete nuke of the perf server and a cmake upgrade.

nrc closed this Aug 5, 2016
