Profile Guided Optimization (PGO) and LLVM Superoptimizer #1220
Comments
6D65 changed the title from "Profile Guided Optimization (PGO) and LLVM SuperOptimizer" to "Profile Guided Optimization (PGO) and LLVM Superoptimizer" on Jul 21, 2015
Some slides about PGO in LLVM (from 2013): http://llvm.org/devmtg/2013-11/slides/Carruth-PGO.pdf
A blog post describing a POC: https://unhandledexpression.com/2016/04/14/using-llvm-pgo-in-rust/
6D65 commented Apr 16, 2016

@Hywan thank you. This looks like a good first step: 15% performance in the best-case scenario sounds great, especially since it's free. I'm not sure what to do with this issue. It's possible to do this outside rustc using the LLVM toolchain, so I wonder whether it would work as a Cargo extension: run the benchmark tests, then optimize the code around those code paths. Though I'm not sure how well benchmarks reflect the hot paths of real-life usage.
Ping @Geal; he is the author of the blog post and could provide relevant answers.
Geal commented Apr 17, 2016

There are three ways this can happen practically: […]

There's also the issue I mentioned in my post: that test only applies to one crate with no dependencies except libstd. What happens when you have multiple libraries as dependencies? Do you generate profiling data for all of them? It would be nice to optimize the end program and its dependencies based on usage data from the end program. As a first step, having it in rustc as a […]
keean commented May 5, 2016

I do a lot of PGO in C++, mainly using GCC. I tend to have a specific function that runs the code kernels needing heavy optimisation many times over sample data. I would suggest something like the #[test] attribute used in testing, so that some functions are marked with #[profile]. When I compile, the build should clear the profile data, build once with instrumentation, run the marked 'profile' functions to generate profile data, and then build a second time without instrumentation using the generated profile data. I would suggest this all be handled by Cargo, perhaps tied to an optimisation level, so that optimisation level 4 runs this whole profile-build process when you run "cargo build".
kernelmachine referenced this issue on May 15, 2016: "Will there (or can there) also be profile-guided optimisation?" #10 (closed)
nrc added the T-dev-tools and T-compiler labels on Aug 25, 2016
valarauca commented Oct 15, 2016

Instead of using an extension to Cargo.toml, wouldn't it be a more elegant solution to use something like […]
Permutatrix commented Nov 13, 2016

@valarauca I think it's safe to say that a run through the benchmarks doesn't generally represent typical real-world use of the code. For instance, if you have a small function that's called in a tight loop within a larger function, it's often reasonable to benchmark the former and not the latter. But if you give the optimizer a profile based on those benchmarks, it won't see how often the larger function calls the smaller one, so it might not inline that call, making the loop slower than it was without the profile! Benchmarks also tend to follow the exact same code path over and over, which seems rather unhelpful for profiling purposes, since you don't learn anything new after the first iteration. I agree with you on the first part, though. I like @keean's suggestion a lot. I've never actually used PGO before, but it seems like maybe a […]
scottlamb commented Nov 14, 2016

FWIW, I'm looking forward to this feature. My setup at work is the second option @Geal described: use pre-generated profiling data. In particular, I use AutoFDO. A fancy pipeline gathers stats from my binary as it serves production traffic and saves a profile to version control; subsequent builds use the latest available profile. For my servers, this is about a 15% performance improvement. (The paper says, more generally, that AutoFDO yields "improvements commonly in the 10-15% range and sometimes over 30%".) So pre-generated profiles are absolutely valuable to support; AutoFDO is great when you have a way to instrument your real binary under real, consistent load. But I think it'd be hard to get that real, consistent load for mobile/desktop apps. And AutoFDO requires Intel-specific hardware counters on bare metal (they don't seem to work in VMware Fusion). It'd be a pain for the personal Rust project I'm working on now; I'd rather just write a […]
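For readers unfamiliar with the AutoFDO flow scottlamb describes, a hedged sketch of the usual pipeline (binary names and workloads are hypothetical; the `perf` event and `create_llvm_prof` tool are from the AutoFDO project):

```shell
# 1. Sample the already-optimized binary under real load, using the
#    last-branch-record hardware counters AutoFDO relies on:
perf record -b -e br_inst_retired.near_taken:pp -- ./my_server --serve-traffic

# 2. Convert the perf.data samples into an LLVM sample profile:
create_llvm_prof --binary=./my_server --out=my_server.prof

# 3. Feed the profile into the next build. With clang this is
#    -fprofile-sample-use=my_server.prof; at the time of this thread
#    rustc had no equivalent flag, which is the gap being discussed.
```

Unlike instrumented PGO, this never requires shipping or running a slower instrumented binary, which is why it suits production servers.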
valarauca commented Nov 14, 2016

Fair statement. I understand that […] There would still be some branch-hinting optimizations in the case you outlined, so there are still some, largely trivial, gains, albeit much like sparse […] Lastly, if done correctly, […]
Vurich commented Jul 10, 2017

This would be great for Parity, since we have one major path that we care very highly about (the import speed) and that we profile for. We could just run it over and over again for the sake of PGO. There are probably a bunch of cold paths that aren't marked as such in the bitcode. As far as design goes, I would have a […]
the8472 referenced this issue on Dec 28, 2017: "LLVM's pass ordering chokes on zero-cost abstractions." #44041 (open)
emilio commented Feb 12, 2018

It'd also be great for Firefox. We had to do a lot of manual tweaking for querySelector to be comparable to the C++ version, and that could presumably get some cleanup if we had PGO.
emilio commented Feb 19, 2018

I hacked on this this weekend, and I think I got something to work; I will post a WIP PR for feedback soon :)
emilio commented Feb 19, 2018

(Also note that I didn't do the Cargo integration on top of it; I only did the bits so that profile usage and generation can go through rustc instead of […])
lu-zero commented Oct 15, 2018

@emilio did you forget to update this issue?
robsmith11 commented Dec 30, 2018

What is the status of PGO with rustc/cargo? I tried building with […]
remexre commented Dec 30, 2018

It works for me; I'm using these commands: https://git.remexre.xyz/remexre/csci5607-final/src/branch/master/Justfile#L41-L48
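The linked Justfile implements the standard instrumented-PGO cycle. A hedged sketch of that flow using rustc's flags (`-C profile-generate` / `-C profile-use` as later stabilized; at the time of this comment the equivalent was behind `-Z` flags, and the file names here are hypothetical):

```shell
# 1. Build with instrumentation; raw profiles land in /tmp/pgo-data.
rustc -O -C profile-generate=/tmp/pgo-data main.rs -o main-instrumented

# 2. Run the instrumented binary on representative workloads.
./main-instrumented typical-input-1
./main-instrumented typical-input-2

# 3. Merge the raw profiles with llvm-profdata (shipped in the
#    llvm-tools rustup component).
llvm-profdata merge -o merged.profdata /tmp/pgo-data

# 4. Rebuild using the merged profile.
rustc -O -C profile-use=merged.profdata main.rs -o main
```

The quality of step 2's workloads determines the quality of the optimization, which is the concern raised earlier in the thread about benchmarks standing in for real usage.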
robsmith11 commented Jan 2, 2019

Thanks. It seems the problem I was having was specific to my use of external symbols. I've opened a new issue for it: rust-lang/rust#57258
6D65 commented Jul 21, 2015

Hi,

I'm wondering if it's possible to do PGO with rustc. I searched and haven't really found anything concrete aside from a few messages on the mailing list.

I guess it should be possible (if added to the compiler), since it's supported by LLVM (and it looks like Google might be interested in improving this: http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-February/082744.html).

Also, I've heard about the work on a superoptimizer for LLVM, and I'm curious what results it would give when run against code generated by rustc.

All of these seem like low-hanging performance fruit, though enabling them might mean a lot of work.

Thanks