Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: runtime: optionally dump PGO profile at the program exit #62444

Open
zamazan4ik opened this issue Sep 4, 2023 · 7 comments
Open

proposal: runtime: optionally dump PGO profile at the program exit #62444

zamazan4ik opened this issue Sep 4, 2023 · 7 comments
Labels
Milestone

Comments

@zamazan4ik
Copy link

According to the official Go documentation about PGO, the only described way to collect a PGO profile is via https://pkg.go.dev/net/http/pprof and collect the profile from a dedicated HTTP handler.

This way has a major drawback - it's not convenient to collect profiles from non-service-like workloads. E.g. if I have a one-shot system - it is almost not possible to collect a good profile from the application.

As far as I understand, in such cases could be applied tricks from https://pkg.go.dev/runtime/pprof documentation with
manual CPU dumping. This way also has some drawbacks:

  • Right now it's not clear - what kind of profile should be collected for the PGO build. CpuProfile, Memprofile? I guess CpuProfile should be used
  • A user needs to manually integrate profile collection at exit - it could be an error-prone activity

I suggest an additional way to collect PGO profiles - add a specific "Instrumentation" build for the Go application. This would be very similar to Instrumentation PGO in other languages/compilers like Clang, GCC, Rustc. In the Go application, this specific kind of build will simply insert a code by dumping a PGO profile to a disk at the program exit (like it's done in Clang). This makes it much easier to collect PGO profiles for programs like https://github.com/dominikh/go-tools .

Also, an additional way with auto-dumping makes using PGO in Go easier for people who have PGO experience from other languages like C++ and Rust.

@gopherbot gopherbot added this to the Proposal milestone Sep 4, 2023
@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Sep 4, 2023
@ianlancetaylor ianlancetaylor changed the title proposal: affected/package: optionally dump PGO profile at the program exit proposal: runtime: optionally dump PGO profile at the program exit Sep 4, 2023
@ianlancetaylor
Copy link
Member

CC @golang/runtime

@cherrymui
Copy link
Member

Thanks for the proposal.

Right now it's not clear - what kind of profile should be collected for the PGO build. CpuProfile, Memprofile? I guess CpuProfile should be used

Yes, for now, CPUProfile should be used.

In the proposal #55022, specifically the "future work" part of the design https://go.googlesource.com/proposal/+/master/design/55022-pgo.md?pli=1#profile-collection

If in the future we support more than CPU profiles, "As PGO profiles may be beyond just CPU profiles, we could also have a “collect a PGO profile” API, which enables a (possibly configurable) set of profiles to collect specifically for PGO." But we're not there for the now.

For collecting CPU profiles for a non-service-like program (e.g. CLI), usually this is done by pprof.StartCPUProfile(f) and defer pprof.StopCPUProfile() in the main function. This doesn't cover exit paths that are not from exiting main:

  • log.Fatal, panic, crash. These are abnormal cases of the program. It probably make sense not to collect profiles for those, as we don't want to optimize the program for those cases.
  • os.Exit especially os.Exit(0). This is probably a more valid case. We currently don't have a good way for that. Maybe a narrower proposal could be specifically adding support to this case.

Are you proposing to just automatically stop the profiling (and close the file), or also automatically start the profiling? For starting the profile, one would need to choose the file name and path. There doesn't seem to have an obvious default, so runtime would not be able to choose one automatically.

E.g. if I have a one-shot system - it is almost not possible to collect a good profile from the application.

This is an inherently difficult question. To collect a good profile, one needs to run the program with a set of representative workloads. For example, for the compiler, we collect profiles for all packages in the standard library and toolchain then merge it into a single file https://cs.opensource.google/go/go/+/master:src/cmd/compile/profile.sh The Go toolchain would not be able to decide what workloads are representative and helpful for profile collection in an automatic way.

I suggest an additional way to collect PGO profiles - add a specific "Instrumentation" build for the Go application.

Instrumentation-based profile is discussed in the proposal #55022, specifically the design doc https://go.googlesource.com/proposal/+/master/design/55022-pgo.md?pli=1#autofdo-vs_instrumentation_based-fdo
We chose not to add instrumentation-based profiles for now. If you want to add this support, it would be a separate proposal with more complete design and discussion. Thanks.

@zamazan4ik
Copy link
Author

Yes, for now, CPUProfile should be used.

In the proposal #55022, specifically the "future work" part of the design https://go.googlesource.com/proposal/+/master/design/55022-pgo.md?pli=1#profile-collection

If in the future we support more than CPU profiles, "As PGO profiles may be beyond just CPU profiles, we could also have a “collect a PGO profile” API, which enables a (possibly configurable) set of profiles to collect specifically for PGO." But we're not there for the now.

Thanks for the link! It also answers my question regarding the configurable profile rate :) Could we add the information regarding "If you decided to create PGO profiles manually with pprof, you need to use CPU profiles" somewhere to the official PGO documentation? Also, having a note about the temporal inability to configure rate limit is also neat information to know since tweaking the profile rate is a default technique to increase/decrease PGO overhead in the production (completely in the same way as it's done for Linux perf and its sample rate for AutoFDO case).

Are you proposing to just automatically stop the profiling (and close the file), or also automatically start the profiling?

I am proposing automatically start and stop profiling (like it's done in Clang, GCC and Rustc). Start at the beginning of the program, stop at the end.

For starting the profile, one would need to choose the file name and path. There doesn't seem to have an obvious default, so runtime would not be able to choose one automatically.

We can take a look at how it's implemented in other compilers like Clang - link. Clang uses default.profraw as a default filename for profiles. This filename is easily configurable via command-line options (-fprofile-instr-generate=code-%m.profraw) or in runtime via env var (LLVM_PROFILE_FILE="code-%m.profraw"). GCC and Rustc behave in a similar way. From my experience with applying PGO to multiple projects - at least on these projects I didn't have any name crossing with existing files in the project. But anyway - in Go we can choose other defaults.

This is an inherently difficult question. To collect a good profile, one needs to run the program with a set of representative workloads. For example, for the compiler, we collect profiles for all packages in the standard library and toolchain then merge it into a single file https://cs.opensource.google/go/go/+/master:src/cmd/compile/profile.sh The Go toolchain would not be able to decide what workloads are representative and helpful for profile collection in an automatic way.

Choosing the proper workload is another topic for discussion, yep - it's up to a project to find a good near real-life workload. Regarding one-shot systems, it's not a problem since Go tooling already has the ability to merge multiple profiles into one. So for the one-shot systems, we just run the program N times, collect N profiles, merge them into one, and use it during the PGO optimization. I did this multiple times, e.g. during optimizing clang-format, clang-tidy, and other similar projects.

Instrumentation-based profile is discussed in the proposal #55022, specifically the design doc https://go.googlesource.com/proposal/+/master/design/55022-pgo.md?pli=1#autofdo-vs_instrumentation_based-fdo
We chose not to add instrumentation-based profiles for now. If you want to add this support, it would be a separate proposal with more complete design and discussion. Thanks.

Once again - thanks for the link! I see I need to clarify my current proposal - I don't mean to implement the Instrumentation-based PGO. I believe the current AutoFDO approach is fine. I just want to have the ability to automatically start and stop profiling for programs without HTTP interfaces. So my ideal PGO workload for non-service systems looks like this:

  • Compile a Go program with auto-enabled PGO at the start (maybe like go build -pgo=autostart or smth like that)
  • Run my program N times on different workloads, get N PGO profiles
  • Merge N PGO profiles into one
  • Compile the program once again with the collected profiles

Right now to implement this scenario I need to manually tweak the program with pprof stuff like importing required packages, and writing a code for start and stop profiling. But I don't want to do it since the compiler can wonderfully implement it "automatically" via an additional build option. I hope my scenario and desire are clear enough ;)

@prattmic
Copy link
Member

prattmic commented Sep 5, 2023

Hi, thanks for filing this issue. It seems to me that there are two different parts here, feedback on the documentation, and a specific technical proposal. I will respond to both below:

According to the official Go documentation about PGO, the only described way to collect a PGO profile is via https://pkg.go.dev/net/http/pprof and collect the profile from a dedicated HTTP handler.

The main documentation for PGO profile collection is https://go.dev/doc/pgo#collecting-profiles. The first two sentences of this section are:

The Go compiler expects a CPU pprof profile as the input to PGO. Profiles generated by the Go runtime (such as from runtime/pprof and net/http/pprof) can be used directly as the compiler input.

I think that should be pretty clear that net/http/pprof is not the only way to collect profiles? I suspect your comment came out of the later note section about collecting profiles from a production system, which does focus only on net/http/pprof. Perhaps it would be worth adding a mention there about collection for batch systems. As you mention, this varies from application to application, but probably consists of running the system with a representative input N times, with profiling enabled for the entire run.

Right now it's not clear - what kind of profile should be collected for the PGO build. CpuProfile, Memprofile? I guess CpuProfile should be used

Per above, the documentation does specify "CPU pprof profile" in the first sentence of the collection section. Perhaps we should make this more clear by specifically citing pprof.StartCPUProfile.

Regarding the automatic profile collection proposal, if I understand correctly, what you are proposing is essentially a build flag that inserts the example at https://pkg.go.dev/runtime/pprof#hdr-Profiling_a_Go_program to the start of the program (except for the memprofile part, and perhaps name selection is automatic)?

Personally, I'm a bit skeptical of the usefulness as a dedicated build option. Profiling in Go is meant to be easy to integrate into applications. e.g., the CPU profiling portion of https://pkg.go.dev/runtime/pprof#hdr-Profiling_a_Go_program is only 9 lines of code. Given this ease, I'm not convinced we need an entire build mode to do this for folks. I think it may be better to invest energy in improving documentation to make sure it is clear how to do this.

I can see some minor advantages. e.g., applications that add CPU profiling support often end up with slightly different flags/environment variables/etc to enable profiling, and more consistency would be nice. But in that vain, rather than a PGO-specific option, I am tempted to consider a more universal approach. e.g., every Go binary will automatically enable profiling if GOCPUPROFILE=filename is set.

@zamazan4ik
Copy link
Author

Per above, the documentation does specify "CPU pprof profile" in the first sentence of the collection section. Perhaps we should make this more clear by specifically citing pprof.StartCPUProfile

Yes, it would be great. However, I recommend you insert an example (probably a bit reworked) from the article into the official documentation. In this case, users who want to dump the CPU profiles manually will know how to do it from the official documentation (and all PGO-related information will be written in one place).

Regarding the automatic profile collection proposal, if I understand correctly, what you are proposing is essentially a build flag that inserts the example at https://pkg.go.dev/runtime/pprof#hdr-Profiling_a_Go_program to the start of the program (except for the memprofile part, and perhaps name selection is automatic)?

Right. And inserts autodumping the profile information into the disk at the end of the program. Name selection could be automatic or controlled via environment variables.

Personally, I'm a bit skeptical of the usefulness as a dedicated build option. Profiling in Go is meant to be easy to integrate into applications. e.g., the CPU profiling portion of https://pkg.go.dev/runtime/pprof#hdr-Profiling_a_Go_program is only 9 lines of code. Given this ease, I'm not convinced we need an entire build mode to do this for folks. I think it may be better to invest energy in improving documentation to make sure it is clear how to do this.

Well, placing these 9 lines properly is not an easy task, if you are trying to optimize many different Go projects :) But even so, let's imagine I want to prepare a PGO profile for a program. I place these lines into the code, collect profiles, and recompile the project. And I need to remove these lines (or disable them somehow), because:

  • I do not plan to release my binaries with enabled instrumentation
  • I do not need unused dependencies in my production builds
    So every time I want to collect (collect initially and regular updates for them later) I need to add and remove this profiling-specific code. Compare this to a compiler option, where I need just recompile the same sources, collect profiles, and perform PGO optimization without changing my sources.

Also, there is an interesting case for PGO - when PGO is enabled not by direct developers of the program but by the maintainers (here I mean Linux distros maintainers). For them is much easier to enable an additional compiler option compared to tweaking sources, collecting profiles, and then reverting the PGO-oriented changes. This problem cannot be easily mitigated by committing profiles directly into the repo in the upstream since not all maintainers "trust" the profiles provided by an upstream (due to different concerns, including security concerns).

I can see some minor advantages. e.g., applications that add CPU profiling support often end up with slightly different flags/environment variables/etc to enable profiling, and more consistency would be nice. But in that vain, rather than a PGO-specific option, I am tempted to consider a more universal approach. e.g., every Go binary will automatically enable profiling if GOCPUPROFILE=filename is set.

I do not like the idea that every binary by default includes profiling-related code inside (it can lead to a slight binary size increase, etc.) but probably I am too strict in this area since my main domain is C++/Rust. If I am able to simply enable PGO profile creation with GOCPUPROFILE=filename and I will get the profile in this filename automatically after the program is closed - my use-case will be fully covered.

@myaaaaaaaaa
Copy link

myaaaaaaaaa commented Sep 6, 2023

Would go test suit your use case?

// main_pgo_test.go
package main

import "testing"

func BenchmarkMain(b *testing.B) {
	for i := 0; i < b.N; i++ {
		main()
	}
}
go test -bench BenchmarkMain -cpuprofile default.pgo

I'll say that I would like to see this supported as an official go test mode. This would enable it to automatically run benchmarks before and after PGO optimizations are applied, and produce a report on the performance deltas.

@zamazan4ik
Copy link
Author

Would go test suit your use case?

Not sure. All I want to do is run a Go binary on multiple sample workload scenarios. Your way means that I need to implement the benchmark manually. If we want to change Go program sources, adding pprof support would be a better idea, I guess.

If a program already has some benchmarks and we want to collect PGO profiles from running the benchmarks (that's not always the best idea but anyway) - yes, it will work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

No branches or pull requests

6 participants