cmd/compile: feedback-guided optimization #28262
This is an umbrella issue for supporting feedback-guided optimization. This has been discussed offhand in a few other contexts (inlining decisions: #17566, code layout decisions: #20356, language changes: #24204, stack growth: #18138).
It is not clear what the design for FGO/PGO support might look like. In particular, what do the feedback/profile files look like?
@aclements observed during conversation at GopherCon 2018 that in order to preserve reproducible builds and efficiency, the feedback file needs to be hashed by cmd/go along with all other inputs, which suggests that it ought to be committed to version control.
The text was updated successfully, but these errors were encountered:
Oh, hey, I have a concrete example of a potentially-relevant case for this, which is a recurring thing that bites people with microbenchmarks, but may also affect real code:
In rare cases, trivial questions of code alignment can produce very large performance differences. I had a test case, at one point, where I had two functions which were getting roughly a 2x performance difference in benchmarking. (As in, one took twice as long as the other.) But the more I experimented, the more confused I became, because it turned out the functions were identical, and I could change which one was faster by reordering them in source.
The microbenchmark case probably doesn't matter, but there are real-world cases where a couple of likely-concurrent hot paths end up with some kind of problem which is entirely a function of the exact location of their code -- I'm guessing either "aligned or not aligned with cache line" or "in same cache-associativity group as other hot paths", but I don't have the expertise to dive into it far enough to be sure.
My guess is that there's some actual specialized code out there where inserting a handful of nops before some specific functions could produce a +/-20% performance change in the program as a whole. Figuring out which ones seems likely to be arbitrarily hard; you'd need some kind of framework for figuring out how to benchmark the functions, and how to benchmark different sets of functions in parallel, and so on.
This is the sort of thing which is a huge amount of work, and which doesn't provide any noticeable benefit until it does, and then it can be inordinately significant. (The big question is whether you end up perceiving it as "sometimes we get lucky and our code runs faster" or "sometimes we get unlucky and our code runs slower").