Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: feedback-guided optimization #28262

Open
josharian opened this issue Oct 17, 2018 · 15 comments
Open

cmd/compile: feedback-guided optimization #28262

josharian opened this issue Oct 17, 2018 · 15 comments

Comments

@josharian
Copy link
Contributor

@josharian josharian commented Oct 17, 2018

This is an umbrella issue for supporting feedback-guided optimization. This has been discussed offhand in a few other contexts (inlining decisions: #17566, code layout decisions: #20356, language changes: #24204, stack growth: #18138).

It is not clear what the design for FGO/PGO support might look like. In particular, what do the feedback/profile files look like?

@aclements observed during conversation at GopherCon 2018 that in order to preserve reproducible builds and efficiency, the feedback file needs to be hashed by cmd/go along with all other inputs, which suggests that it ought to be committed to version control.

cc @randall77 @ianlancetaylor @aclements @crawshaw @bradfitz @CAFxX (and the list could go on, but I'm sure everyone will find it)

@josharian josharian added this to the Unplanned milestone Oct 17, 2018
@seebs
Copy link
Contributor

@seebs seebs commented Aug 29, 2019

Oh, hey, I have a concrete example of a potentially-relevant case for this, which is a recurring thing that bites people with microbenchmarks, but may also affect real code:

In rare cases, trivial questions of code alignment can produce very large performance differences. I had a test case, at one point, where I had two functions which were getting roughly a 2x performance difference in benchmarking. (As in, one took twice as long as the other.) But the more I experimented, the more confused I became, because it turned out the functions were identical, and I could change which one was faster by reordering them in source.

The microbenchmark case probably doesn't matter, but there are real-world cases where a couple of likely-concurrent hot paths end up with some kind of problem which is entirely a function of the exact location of their code -- I'm guessing either "aligned or not aligned with cache line" or "in same cache-associativity group as other hot paths", but I don't have the expertise to dive into it far enough to be sure.

My guess is that there's some actual specialized code out there where inserting a handful of nops before some specific functions could produce a +/-20% performance change in the program as a whole. Figuring out which ones seems likely to be arbitrarily hard; you'd need some kind of framework for figuring out how to benchmark the functions, and how to benchmark different sets of functions in parallel, and so on.

This is the sort of thing which is a huge amount of work, and which doesn't provide any noticeable benefit until it does, and then it can be inordinately significant. (The big question is whether you end up perceiving it as "sometimes we get lucky and our code runs faster" or "sometimes we get unlucky and our code runs slower").

Loading

@dangit815
Copy link

@dangit815 dangit815 commented Oct 18, 2021

More than two years have passed since the last discussion, is there any progress or plan on FGO recently?

Loading

@aclements
Copy link
Member

@aclements aclements commented Oct 18, 2021

I hadn't remembered we even had an open issue for this!

@cherrymui and @prattmic are actively doing prototype work right now to help us understand the potential wins from PGO/FGO/FDO and how to best integrate it into the build process. It's early stages, but we're tentatively hoping to have initial PGO optimizations available in 1.19.

Loading

@josharian
Copy link
Contributor Author

@josharian josharian commented Oct 18, 2021

Here's a related idea I've been kicking around.

Instead of doing running program -> profile -> compiler, add an extra intermediate step: running program -> profile -> optimization file -> compiler. The optimization file consists of optimization information and directives to the compiler.

There are a few advantages to this:

  • the compiler can stay simpler
  • being able to inspect and edit the optimizations file directly (or via a tool) will make it easier to understand what is happening, bisect, test, and minimize issues
  • it opens the door for other mechanisms/tooling to add information and optimizations, such as: offline static analysis, offline human analysis, bespoke profiling techniques, or combining profiles taken across a fleet of machines

Loading

@josharian
Copy link
Contributor Author

@josharian josharian commented Oct 18, 2021

@cherrymui and @prattmic are actively doing prototype work right now to help us understand the potential wins from PGO/FGO/FDO and how to best integrate it into the build process

Though I am delighted to hear this, I feel compelled to say yet again: I wish we had a more predictable, repeatable mechanism to learn things like this than someone happening to ping a relevant issue.

Loading

@thepudds
Copy link

@thepudds thepudds commented Oct 18, 2021

Though I am delighted to hear this, I feel compelled to say yet again: I wish we had a more predictable, repeatable mechanism to learn things like this than someone happening to ping a relevant issue.

Is this the type of thing that could get covered in the “Compiler and Runtime Meeting Notes“ tracking issue #43930 ?

That did seem useful, although I know it takes some work.

Loading

@prattmic
Copy link
Member

@prattmic prattmic commented Oct 18, 2021

Instead of doing running program -> profile -> compiler, add an extra intermediate step: running program -> profile -> optimization file -> compiler. The optimization file consists of optimization information and directives to the compiler.

I have received similar feedback in discussing this with one user: that they would like a way to manually specify optimizations that a PGO profile might indicate if capturing actual profiles is not possible/representative or via, as you say "offline human analysis". They specifically mentioned //go: directives, but I think a separate edit-able file would work as well.

Loading

@jeremyfaller
Copy link
Contributor

@jeremyfaller jeremyfaller commented Nov 5, 2021

Is this the type of thing that could get covered in the “Compiler and Runtime Meeting Notes“ tracking issue #43930 ?

That did seem useful, although I know it takes some work.

CC @josharian

Yeah. I haven't been maintaining that of late. My apologies. (It's kinda fallen on the pile of "stuff I should get to".) I'll try to do better. :)

Having said that it's unlikely that PGO would have made that tracking issue. PGO started in experimental form, and we were only trying to see if it was worth pursuing. Our experimentation and the results from some external groups lead us to believe PGO might make the 2022 roadmap, but we're still trying to figure out what form it will take.

Loading

@cristaloleg
Copy link

@cristaloleg cristaloleg commented Nov 8, 2021

Regarding PGO in the latest .NET 6 release https://devblogs.microsoft.com/dotnet/announcing-net-6/#dynamic-pgo and https://devblogs.microsoft.com/dotnet/announcing-net-6/#dynamic-pgo (UPD: there is a 2nd Dynamic PGO paragraph closer to the end of the article)

Loading

@mvdan
Copy link
Member

@mvdan mvdan commented Nov 22, 2021

To echo #28262 (comment), #49688 came as a surprise to seemingly everyone except the compiler team. If #43930 isn't the right medium to keep everyone else in the loop, perhaps consider trying something else. My gut feeling is that those public weekly notes are a good starting point, and I'd like to see them resume - especially if they included a very brief summary of the main projects currently in progress.

To hopefully give a bit of perspective: the lack of transparency into what is being actively considered or experimented on discourages external contributions. Using this thread as an example, it seems like Josh has been thinking about this idea for a while, and I'd bet he would contribute towards the design, testing, and perhaps even coding/reviews. That kind of external involvement simply isn't going to happen if there isn't a public way to be kept in the loop, though.

Loading

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Nov 22, 2021

@mvdan I'm sorry it seems that there is a lack of transparency by the Go team. The compiler team at Google hasn't done any work on PGO that is not public. The code in #49688 was developed entirely independently at Uber. The compiler team only became aware of it relatively recently when Uber reached out privately. The public #49688 is a step toward transparency and keeping everybody in the loop.

It would have been better to make a public announcement about it first. Still, we would now be in the same position anyhow. The communication on this is a lesson to bear in mind the next time there is a substantial privately developed contribution. Sorry for the confusion and angst.

Loading

@mvdan
Copy link
Member

@mvdan mvdan commented Nov 23, 2021

Perhaps I'm not seeing the full picture - in #28262 (comment) it was described that some prototype work was already taking place, and in #49688 (comment) it seems like the design and coordination with Uber to upstream their work is currently underway.

These updates are certainly welcome, and I'm really excited to see some progress in this space. However, I'd like to clarify that I don't think we'd have arrived at the same position anyway. To once again take Josh as an example, he's only learning about developments around this proposal via comments that feel like after-thoughts, leaving him little room to participate or even be kept in the loop with some detail. Which is unfortunate, given he's the author of the proposal and some of its design ideas, and has been one of the most reliable compiler contributors over the years :)

Maybe the way the compiler team functions is indeed to work on this issue privately, and then come out publicly with a proposal document and possibly even a prototype implementation. I'm trying to argue, hopefully in a constructive way, that we should aim to be more transparent and welcoming to external contributors. A good first step would be to post regular and early updates somewhere like #43930, making it easy for others to be kept up to date in a timely manner. Ideally some of the team's communication channels would be public too, much like golang-tools has been doing for years, but at least weekly written updates are an easier incremental step.

Loading

@jeremyfaller
Copy link
Contributor

@jeremyfaller jeremyfaller commented Nov 23, 2021

@mvdan I don't think you're missing the picture – there are a few details that could help fill you in, but most of it is all current events.

A bit of background:
The Go Compiler and Runtime team is split between two locations, Cambridge MA (with a small contingent in NYC), and Sunnyvale. The majority of the generics work has been handled in Sunnyvale, and while Go focused on 1.18, Cambridge decided to minimize instability in the compiler and runtime. So, we decided to do some experimentation. While lots of times work is internally motivated by things Google would like to see from Go (like the register ABI in 1.16/1.17/1.18), Google's happy if we just focus on generics right now. CAM decided to spend a small amount of resources to characterize what PGO would give us, but we didn't know if we would go further than experimentation. (and we still don't)

The present day:
About when Cherry started to get numbers from PGO inlining, Uber reached out about a PGO implementation they have. They have been thinking about it for longer than we have, and implemented more than we have with our recent experiments. Uber has expressed that they want to upstream their work, but we don't know it's the right thing for the Go ecosystem as a whole, so we need to take a look at it. I will be honest and say that there are parts of it weren't not sure are a good fit for cloud Go users and embedded Go users, but at first blush lots of decisions they've made sound good. We still need to see what they've got before we make any decisions. I doubt their work will be upstreamed without modification.

That is where we are. Uber wants us to consider their inlining implementation, and we need to take a look at it.

The feedback that the compiler and runtime team could do a better job is correct, but in this instance, I feel like we communicated when we knew things. The PGO work we undertook in 2021H2 was truly an experiment. We do experimentation all the time, and when we can share results (ie, it isn't based on internal Google data), we do.

I hope this explanation helps, and I'm happy to chat at any time.

Loading

@mvdan
Copy link
Member

@mvdan mvdan commented Nov 28, 2021

Thanks, Jeremy - I appreciate the extra updates and context on PGO in this thread, as well as all the work that is happening here.

The Go Compiler and Runtime team

Just so we're clear, this implies only Google, correct? I think that's what I'm trying to get at. People like Josh certainly deserve being considered part of the compiler team, but are repeatedly kept out of relevant threads and video calls simply because they are not employed by Google :)

I admit this is getting off-topic, though, so this will be my last comment on the topic here. I've already outlined how I think you could improve transparency and encourage more significant contributions from non-Googlers, and I'd rather not repeat myself as I don't have much else to add.

I hope it's clear this is not an anti-Google sentiment in any way - but rather, how other companies (like Josh's Tailscale or this thread's Uber) would likely participate more regularly and actively in those teams if they were given an honest chance.

Loading

@jeremyfaller
Copy link
Contributor

@jeremyfaller jeremyfaller commented Nov 29, 2021

Just so we're clear, this implies only Google, correct? I think that's what I'm trying to get at. People like Josh certainly deserve being considered part of the compiler team, but are repeatedly kept out of relevant threads and video calls simply because they are not employed by Google :)

Sorry, yes. That's how we split things up within google. These are the people within Google who think about the C&RT, don't mean to sound like we're alone in this fight. :)

Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet