
cmd/compile: design and implementation of Profile-Guided Optimization (PGO) #55025

Closed
@jinlin-bayarea

Description

This proposal presents the detailed design and implementation of Profile-Guided Optimization (PGO) in the Go compiler. It complements the high-level issue on PGO by covering its design and implementation aspects.

Background: Inefficiencies in Go programs can be isolated with profiling tools such as pprof and the Linux profiler perf. Such tools can pinpoint the source code regions where most of the execution time is spent. Unlike other optimizing compilers such as LLVM, the Go compiler does not yet perform Profile-Guided Optimization (PGO). PGO uses information about the code’s runtime behavior to guide compiler optimizations such as inlining and code layout. PGO can improve application performance in the range of 15-30% [LLVM, AutoFDO]. In this proposal, we extend the Go compiler with PGO.

In this proposal, we incorporate profiles into the frontend of the compiler to build a call graph with node and edge weights (called WeightedCallGraph). The inliner then uses the WeightedCallGraph to perform profile-guided inlining, which aggressively inlines hot functions. We introduce a profile-guided code specialization pass that is tightly integrated with the inliner and eliminates indirect method call overheads in hot code paths. Furthermore, we annotate IR instructions with their associated profile weights and propagate these to the SSA level to enable profile-guided basic-block layout optimization, which improves instruction-cache and TLB performance. Finally, we extend Go's linker to consume the profiles directly and perform function reordering across package boundaries, which also improves instruction-cache and TLB performance.
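To make the WeightedCallGraph idea concrete, here is a minimal sketch, assuming each profile sample has already been reduced to a call stack (leaf first, as in pprof) plus a sample value. The types `Sample`, `EdgeKey`, and `WeightedCallGraph` and the function names are hypothetical illustrations, not the compiler's actual data structures. Node weight here is the flat weight of the leaf function, and edge weight is the total value of the samples in which the caller directly invoked the callee.

```go
package main

import (
	"fmt"
	"sort"
)

// Sample is a hypothetical, simplified profile sample: a call stack
// (leaf first, matching pprof's location ordering) and its sample value.
type Sample struct {
	Stack []string
	Value int64
}

// EdgeKey identifies a caller->callee edge in the call graph.
type EdgeKey struct {
	Caller, Callee string
}

// WeightedCallGraph holds node weights (flat samples attributed to the leaf
// function) and edge weights (samples in which Caller directly called Callee).
type WeightedCallGraph struct {
	NodeWeight map[string]int64
	EdgeWeight map[EdgeKey]int64
}

// BuildGraph aggregates samples into a WeightedCallGraph.
func BuildGraph(samples []Sample) *WeightedCallGraph {
	g := &WeightedCallGraph{
		NodeWeight: make(map[string]int64),
		EdgeWeight: make(map[EdgeKey]int64),
	}
	for _, s := range samples {
		if len(s.Stack) == 0 {
			continue
		}
		// The leaf (index 0) is where the CPU time was actually spent.
		g.NodeWeight[s.Stack[0]] += s.Value
		// Stack[i+1] is the direct caller of Stack[i].
		for i := 0; i+1 < len(s.Stack); i++ {
			g.EdgeWeight[EdgeKey{Caller: s.Stack[i+1], Callee: s.Stack[i]}] += s.Value
		}
	}
	return g
}

// SortedEdges returns all edges ordered by descending weight, breaking ties
// by caller and then callee name so the order is deterministic.
func (g *WeightedCallGraph) SortedEdges() []EdgeKey {
	edges := make([]EdgeKey, 0, len(g.EdgeWeight))
	for e := range g.EdgeWeight {
		edges = append(edges, e)
	}
	sort.Slice(edges, func(i, j int) bool {
		wi, wj := g.EdgeWeight[edges[i]], g.EdgeWeight[edges[j]]
		if wi != wj {
			return wi > wj
		}
		if edges[i].Caller != edges[j].Caller {
			return edges[i].Caller < edges[j].Caller
		}
		return edges[i].Callee < edges[j].Callee
	})
	return edges
}

func main() {
	// Hypothetical samples from a small server program.
	samples := []Sample{
		{Stack: []string{"main.hash", "main.lookup", "main.serve", "main.main"}, Value: 70},
		{Stack: []string{"main.parse", "main.serve", "main.main"}, Value: 20},
		{Stack: []string{"main.serve", "main.main"}, Value: 10},
	}
	g := BuildGraph(samples)
	for _, e := range g.SortedEdges() {
		fmt.Printf("%s -> %s: %d\n", e.Caller, e.Callee, g.EdgeWeight[e])
	}
}
```

Sorting edges by weight gives the inliner an obvious order in which to consider call sites. A real implementation would also key edges by call-site position, since one caller may call the same callee from several lines.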

The profile format consumed by our PGO is identical to the protobuf format (profile.proto) produced by the pprof tool. This format is rich enough to carry additional hardware performance counter information such as cache misses and LBR (Last Branch Record) data. Google's existing perf_data_converter tool can convert a perf.data file produced by Linux perf into a profile.proto file in protobuf format.

The first version of the code that performs profile-guided inlining is available here. In summary, our first released version introduces the following flags to the Go compiler:

**-profileuse <filename>**: filename corresponds to protobuf CPU profile. This flag will build the WeightedCallGraph and use it to perform profile-guided inlining.
**-inlinehotthreshold <string_float>** and **-inlinehotbudget <int>**: These two flags are optional because they have default values. In advanced settings, they can be tuned to control the trade-off between code size and performance.
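The exact semantics of the two tuning flags are defined in the design document rather than here, so the following is only a sketch under an assumed interpretation: a call site counts as hot when its edge weight is at least `-inlinehotthreshold` percent of the total edge weight (parsed from a string, matching `<string_float>`), and hot call sites are inlined under the larger `-inlinehotbudget` instead of the default budget. The default of 80 mirrors the gc inliner's usual budget; the call-site names, the threshold "25.0", and the hot budget 160 are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// defaultInlineBudget mirrors the gc inliner's usual cost budget, used here
// for call sites that are not considered hot.
const defaultInlineBudget = 80

// hotCallSites marks a call site hot when its share of the total edge weight
// is at least threshold percent. ASSUMPTION: this is one plausible reading of
// -inlinehotthreshold, not the compiler's exact rule.
func hotCallSites(weights map[string]int64, threshold string) (map[string]bool, error) {
	pct, err := strconv.ParseFloat(threshold, 64)
	if err != nil {
		return nil, fmt.Errorf("bad threshold %q: %w", threshold, err)
	}
	var total int64
	for _, w := range weights {
		total += w
	}
	hot := make(map[string]bool)
	if total == 0 {
		return hot, nil
	}
	for site, w := range weights {
		if float64(w)*100/float64(total) >= pct {
			hot[site] = true
		}
	}
	return hot, nil
}

// inlineBudget returns the budget the inliner would use at a call site:
// the larger hot budget for hot sites, the default budget otherwise.
func inlineBudget(site string, hot map[string]bool, hotBudget int) int {
	if hot[site] {
		return hotBudget
	}
	return defaultInlineBudget
}

func main() {
	// Hypothetical call sites keyed as "caller -> callee @ file:line".
	weights := map[string]int64{
		"main.serve -> main.lookup @ server.go:42": 700,
		"main.serve -> main.parse @ server.go:57":  200,
		"main.main -> main.serve @ main.go:10":     100,
	}
	hot, err := hotCallSites(weights, "25.0")
	if err != nil {
		panic(err)
	}
	// Print in a stable order, since map iteration order is random.
	sites := make([]string, 0, len(weights))
	for s := range weights {
		sites = append(sites, s)
	}
	sort.Strings(sites)
	for _, s := range sites {
		fmt.Printf("%-45s budget=%d\n", s, inlineBudget(s, hot, 160))
	}
}
```

Under this reading, raising the threshold shrinks the hot set (less code growth), while raising the hot budget lets larger callees be inlined at the sites that remain hot.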

Other PGO optimizations such as code specialization, basic block reordering, and function reordering across packages will be open-sourced in subsequent Go compiler releases.
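Code specialization is not part of the first release, but its effect on a hot indirect call can be illustrated at the source level. The sketch below assumes a profile showing that a hypothetical `Shape.Area` call site almost always receives a `*Square`; `totalAreaSpecialized` is a hand-written equivalent of a guarded direct call with an indirect-call fallback, not actual compiler output.

```go
package main

import (
	"fmt"
	"math"
)

// Shape is the interface whose method is called on a hot path.
type Shape interface {
	Area() float64
}

type Square struct{ Side float64 }

func (s *Square) Area() float64 { return s.Side * s.Side }

type Circle struct{ R float64 }

func (c *Circle) Area() float64 { return math.Pi * c.R * c.R }

// totalAreaGeneric makes an indirect (dynamic-dispatch) call per element.
func totalAreaGeneric(shapes []Shape) float64 {
	var sum float64
	for _, s := range shapes {
		sum += s.Area()
	}
	return sum
}

// totalAreaSpecialized shows what a profile-guided specialization could
// produce if *Square dominated this call site: a type guard, a direct call
// on the fast path, and the original indirect call as a fallback.
func totalAreaSpecialized(shapes []Shape) float64 {
	var sum float64
	for _, s := range shapes {
		if sq, ok := s.(*Square); ok {
			sum += sq.Area() // direct call, eligible for inlining
		} else {
			sum += s.Area() // fallback preserves behavior for other types
		}
	}
	return sum
}

func main() {
	shapes := []Shape{&Square{Side: 2}, &Square{Side: 3}, &Circle{R: 1}}
	fmt.Println(totalAreaGeneric(shapes), totalAreaSpecialized(shapes))
}
```

The type guard is cheap, and the direct call on the fast path can itself be inlined; the fallback keeps the transformation correct for every other dynamic type.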

The detailed design document can be found here (https://go-review.googlesource.com/c/proposal/+/430398/1/design/55025-pgo-design.md).
