Release nightly compilers with ability to internally parallelize #59667
Comments
@Zoxc do you know off the top of your head what some hot mutexes might be? Some local profiling of a compiler from #59644 and the commit just before is not very illuminating: while everything does get a bit slower, it's hard to see where it's getting slower. It does look like |
You may want to try this lock contention profiling tool and see if it works for you: http://0pointer.de/blog/projects/mutrace.html . |
As a temporary workaround, we could try doing something similar to the fragile crate. At runtime, we would inspect
Hopefully, the overhead of these runtime checks would be much less than the overhead of a full |
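For reference, the general idea behind the fragile crate (a value that remembers which thread created it and checks that at runtime on each access) can be sketched with the standard library alone. This is only an illustration of the technique, not the crate's actual API and not what the compiler does; `ThreadBound` and `with` are hypothetical names made up for this sketch.

```rust
use std::cell::RefCell;
use std::thread::{self, ThreadId};

/// Hypothetical wrapper: remembers the creating thread and only
/// permits access to the inner value from that thread.
pub struct ThreadBound<T> {
    owner: ThreadId,
    value: RefCell<T>,
}

// Sketch-level safety argument: every access goes through `with`, which
// panics before touching the RefCell if the calling thread is not the
// owner, so the RefCell is never used from two threads at once.
unsafe impl<T: Send> Sync for ThreadBound<T> {}

impl<T> ThreadBound<T> {
    pub fn new(value: T) -> Self {
        ThreadBound {
            owner: thread::current().id(),
            value: RefCell::new(value),
        }
    }

    /// Runs `f` on the inner value after a cheap thread-ID comparison.
    /// Panics when called from any thread other than the creating one.
    pub fn with<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        assert_eq!(
            thread::current().id(),
            self.owner,
            "ThreadBound accessed from a non-owning thread"
        );
        f(&mut self.value.borrow_mut())
    }
}
```

The per-access cost here is a thread-ID comparison plus a RefCell borrow check rather than an atomic lock operation, which is the trade-off the comment above is hoping for.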
Implemented in #117435 |
This is intended to be a tracking issue for releasing nightly compilers that are able to parallelize themselves internally but still default to single-threaded mode. This is part of the larger parallel compiler tracking issue, and is intended to be an incremental step towards fully closing that out.
A recent attempt to build binaries of the parallel compiler raised the question of whether we could just enable a parallel compiler by default. Note that there are two axes we can change here over time:
- Whether the compiler is built with the --cfg parallel_compiler flag.
- The default value of -Z threads.
The proposal in this issue is to default to -Z threads=1 (or the moral equivalent) but build nightly compilers with --cfg parallel_compiler (or the equivalent thereof). The intention is to get us closer to shipping a parallel compiler while buying us time to continue to fix any issues that arise. This would allow, for example, users to very easily test out parallel compilation locally by using RUSTFLAGS=-Zthreads=16.
The main blocker for doing this is performance. Following a request in a recent thread, we realized it's important to watch the wall-time numbers rather than the comparison of instruction counts. The instruction count numbers regress 2-3%, which looks deceptively good, but the wall-time numbers regress 10-20% (ish), which is much more serious.
Some further investigation shows that most of the slowdown is likely coming from the use of mutexes (as opposed to other avenues like removing parallel code, the overhead of using rayon, or using Arc instead of Rc).
The next steps here would be to investigate whether we can recover the performance lost from using mutexes (probably by removing the mutexes one way or another).
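To get a rough local feel for the kind of overhead in question, a micro-benchmark along these lines compares uncontended single-threaded access through a RefCell against the same access through a Mutex. This is a sketch only: `std::sync::Mutex` stands in for whatever lock type the compiler actually uses, `bump_refcell` and `bump_mutex` are hypothetical helper names, and numbers from a toy loop say nothing definitive about the compiler itself.

```rust
use std::cell::RefCell;
use std::hint::black_box;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Increments a counter `iters` times behind a RefCell
/// (single-threaded interior mutability, no atomics involved).
pub fn bump_refcell(iters: u64) -> (u64, Duration) {
    let counter = RefCell::new(0u64);
    let start = Instant::now();
    for _ in 0..iters {
        // black_box discourages the optimizer from collapsing the loop.
        *black_box(&counter).borrow_mut() += 1;
    }
    let elapsed = start.elapsed();
    let total = *counter.borrow();
    (total, elapsed)
}

/// Same loop, but every access takes and releases an uncontended Mutex.
pub fn bump_mutex(iters: u64) -> (u64, Duration) {
    let counter = Mutex::new(0u64);
    let start = Instant::now();
    for _ in 0..iters {
        *black_box(&counter).lock().unwrap() += 1;
    }
    let elapsed = start.elapsed();
    let total = *counter.lock().unwrap();
    (total, elapsed)
}
```

Calling both with the same iteration count in a release build and comparing the returned durations gives a wall-time comparison, which is the metric that matters here rather than instruction counts.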
This issue will likely receive many updates over time!