Release nightly compilers with ability to internally parallelize #59667

Closed
alexcrichton opened this issue Apr 3, 2019 · 4 comments
Labels
A-parallel-queries Area: Parallel query execution T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-compiler-performance Working group: Compiler Performance

Comments

@alexcrichton
Member

alexcrichton commented Apr 3, 2019

This is intended to be a tracking issue for releasing nightly compilers that have the ability to internally parallelize themselves but still default to single-threaded mode. This is part of the larger parallel compiler tracking issue, and is intended to be an incremental step towards fully closing that out.

A recent attempt to build binaries of the parallel compiler led to the thought of whether we could just enable a parallel compiler by default. Note that there are two axes we can change here over time:

  • Whether or not the compiler can be parallelized at all, aka whether it's built with the --cfg parallel_compiler flag.
  • Whether or not the compiler is parallelized by default, aka the default value of -Z threads.

The proposal in this issue is to default to -Z threads=1 (or the moral equivalent) but build nightly compilers with --cfg parallel_compiler (or the equivalent thereof). The intention is to get us closer to shipping a parallel compiler while buying us time to continue to fix any issues that arise. This would let users very easily test out parallel compilation locally, for example by using RUSTFLAGS=-Zthreads=16.

The main blocker for doing this is performance. In a recent thread we realized it's important to watch the wall-time numbers rather than the comparison of instruction counts. The instruction counts regress 2-3%, which looks deceptively good, but the wall-time numbers regress 10-20% (ish), which is much more serious.

Some further investigation shows that most of the slowdown is likely coming from the use of mutexes (as opposed to other avenues like removing parallel code, the overhead of using rayon, or using Arc instead of Rc).
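
To get a rough feel for the cost being discussed, here is a minimal sketch (not taken from rustc) that times the same uncontended counter update through a `RefCell` and through a `std::sync::Mutex`. The function names `bump_refcell` and `bump_mutex` are hypothetical, and rustc's own lock types may differ from `std::sync::Mutex`, so treat the numbers only as an illustration of wall time on one machine.

```rust
use std::cell::RefCell;
use std::hint::black_box;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Increment a counter `n` times through a `RefCell`
/// (single-threaded dynamic borrow check only, no atomics).
fn bump_refcell(n: u64) -> (u64, Duration) {
    let counter = RefCell::new(0u64);
    let start = Instant::now();
    for _ in 0..n {
        // `black_box` discourages the optimizer from folding the loop away.
        *black_box(&counter).borrow_mut() += 1;
    }
    let elapsed = start.elapsed();
    let total = *counter.borrow();
    (total, elapsed)
}

/// Increment a counter `n` times through an uncontended `Mutex`
/// (every iteration pays for a lock and an unlock).
fn bump_mutex(n: u64) -> (u64, Duration) {
    let counter = Mutex::new(0u64);
    let start = Instant::now();
    for _ in 0..n {
        *black_box(&counter).lock().unwrap() += 1;
    }
    let elapsed = start.elapsed();
    let total = *counter.lock().unwrap();
    (total, elapsed)
}

fn main() {
    let n = 10_000_000;
    let (a, t_refcell) = bump_refcell(n);
    let (b, t_mutex) = bump_mutex(n);
    assert_eq!(a, b);
    // Wall time, not instruction counts, is the number to compare here.
    println!("RefCell: {:?}, Mutex: {:?}", t_refcell, t_mutex);
}
```

Build with optimizations (for example `rustc -O`) before reading anything into the timings; the results vary by platform and say nothing about contended locks.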

The next steps here would be to investigate whether we can recover the performance lost from using mutexes (probably by removing the mutexes one way or another).

This issue will likely receive many updates over time!

@alexcrichton
Member Author

@Zoxc do you know off the top of your head what some hot mutexes might be? Some local profiling of a compiler from #59644 and the commit just before is not very illuminating: while everything does get a bit slower, it's hard to see where it's getting slower.

It does look like get_query (presumably this lock?) is pretty hot, but that also seems somewhat fundamental.

@HadrienG2

HadrienG2 commented Apr 3, 2019

You may want to try this lock contention profiling tool and see if it works for you: http://0pointer.de/blog/projects/mutrace.html

@Aaron1011
Member

Aaron1011 commented Jul 5, 2019

As a temporary workaround, we could try doing something similar to the fragile crate. At runtime, we would inspect -Z threads:

  1. If -Z threads > 1, we use a normal Mutex.
  2. If -Z threads = 1, we use a 'fake' mutex - a type which implements Send/Sync, but panics if used on any thread other than the one which created it. Since only one thread should ever be accessing these Mutexes, the panic should never actually occur.

Hopefully, the overhead of these runtime checks would be much less than the overhead of a full Mutex type. This would hopefully allow a parallelizable compiler to be shipped, while at the same time we continue to work on improving single-thread performance (with actual Mutexes).
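
A minimal sketch of the idea above, assuming a thread count that is known when each lock is created. The names `ThreadBound` and `Lock` are hypothetical and this is neither rustc's nor the fragile crate's actual code; it only illustrates choosing between a real `Mutex` and a thread-checked wrapper at runtime.

```rust
use std::cell::RefCell;
use std::sync::Mutex;
use std::thread::{self, ThreadId};

/// Hypothetical "fake" mutex: no locking, only an owner-thread check.
pub struct ThreadBound<T> {
    owner: ThreadId,
    value: RefCell<T>,
}

// SAFETY (sketch-level argument): the only access path is `with`, which
// panics before touching `value` unless it runs on the creating thread.
unsafe impl<T: Send> Sync for ThreadBound<T> {}

impl<T> ThreadBound<T> {
    pub fn new(value: T) -> Self {
        ThreadBound { owner: thread::current().id(), value: RefCell::new(value) }
    }

    pub fn with<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        assert_eq!(thread::current().id(), self.owner, "used off the owner thread");
        // The RefCell also turns accidental re-entrant use into a panic.
        let mut guard = self.value.borrow_mut();
        f(&mut *guard)
    }
}

/// Lock flavor picked at runtime from the thread count (the `-Z threads` value).
pub enum Lock<T> {
    Real(Mutex<T>),
    Fake(ThreadBound<T>),
}

impl<T> Lock<T> {
    pub fn new(value: T, threads: usize) -> Self {
        if threads > 1 {
            Lock::Real(Mutex::new(value))
        } else {
            Lock::Fake(ThreadBound::new(value))
        }
    }

    pub fn with<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        match self {
            Lock::Real(m) => {
                let mut guard = m.lock().unwrap();
                f(&mut *guard)
            }
            Lock::Fake(t) => t.with(f),
        }
    }
}

fn main() {
    // threads = 1 selects the thread-checked wrapper instead of a real mutex.
    let lock = Lock::new(0u32, 1);
    lock.with(|v| *v += 1);
    assert_eq!(lock.with(|v| *v), 1);
}
```

Note that the single-threaded path still pays for a thread-ID comparison and a `RefCell` borrow flag on every access, so whether it actually beats an uncontended mutex would need to be measured.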

@Alexendoo
Member

Implemented in #117435
