
cmd/compile: add basic block counters for PGO #65466

Open
@alexanius


Proposal Details

This is a proposal for implementing PGO basic block counters in the Go compiler. Issue #62463 describes profile-based optimizations that would be useful for the Go compiler. Most of them (basic block ordering, loop unrolling, PGO register allocation, and others) need counters in the basic blocks. Currently, the Go compiler has only a weighted call graph, which cannot be used directly for such optimizations.

I propose adding basic block counters so that these profile-guided optimizations can be implemented.

General approach

The general approach is to add counter values to the AST and SSA IR nodes, load these values from the pprof file, and correct them during compilation.

Step 1. Load the counter values into the AST nodes. The counters from the samples can easily be loaded into the corresponding AST nodes. Because we use a sampling profile, not every node will have a value.

Step 2. Propagate the values to the remaining nodes. Here we traverse the AST and propagate existing values to the nodes that have none. This is needed for the later steps.

Step 3. Correct the values after devirtualization and inlining. The nodes of a callee function contain the summed value over all calls, but after inlining we should re-evaluate these values according to the counter at the inline point.

Step 4. Assign counters to the basic blocks during the SSA generation.

Step 5. Correct the counters of the basic blocks if any optimization changes the control flow.

Step 6. Implement the optimizations that rely on basic block counters.
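
Steps 1 to 3 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `Node`, `LoadCounters`, `Propagate`, and `ScaleForInline` are hypothetical names, not cmd/compile types or functions, and the propagation rule (take the maximum of the children, then inherit from the parent) only makes sense for straight-line statements. It ignores branches and loops, which the real algorithm would have to handle.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for an AST node (not a cmd/compile type).
type Node struct {
	Line     int
	Count    int64 // sample count; meaningful only when HasCount is true
	HasCount bool
	Children []*Node
}

// LoadCounters sketches Step 1: attach per-line sample counts to the nodes
// whose line appears in the profile. Unsampled nodes stay without a count.
func LoadCounters(n *Node, samples map[int]int64) {
	if c, ok := samples[n.Line]; ok {
		n.Count, n.HasCount = c, true
	}
	for _, ch := range n.Children {
		LoadCounters(ch, samples)
	}
}

// bottomUp gives a node without a count the largest count among its children.
func bottomUp(n *Node) {
	var best int64
	found := false
	for _, ch := range n.Children {
		bottomUp(ch)
		if ch.HasCount && (!found || ch.Count > best) {
			best, found = ch.Count, true
		}
	}
	if !n.HasCount && found {
		n.Count, n.HasCount = best, true
	}
}

// topDown gives a child without a count the count of its parent.
func topDown(n *Node) {
	for _, ch := range n.Children {
		if !ch.HasCount && n.HasCount {
			ch.Count, ch.HasCount = n.Count, true
		}
		topDown(ch)
	}
}

// Propagate sketches Step 2 as a bottom-up walk followed by a top-down walk.
// Assumption: children are straight-line statements of the same block.
func Propagate(root *Node) {
	bottomUp(root)
	topDown(root)
}

// ScaleForInline sketches Step 3: a callee node's count summed over all calls
// is rescaled by the share of calls that came from the inlined call site.
func ScaleForInline(nodeCount, callSiteCount, calleeTotal int64) int64 {
	if calleeTotal == 0 {
		return 0
	}
	return nodeCount * callSiteCount / calleeTotal
}

func main() {
	// A block on line 1 with three statements; only line 3 was sampled.
	root := &Node{Line: 1, Children: []*Node{{Line: 2}, {Line: 3}, {Line: 4}}}
	LoadCounters(root, map[int]int64{3: 90})
	Propagate(root)
	for _, ch := range root.Children {
		fmt.Println(ch.Line, ch.Count) // every statement ends up with 90
	}
	// The callee was entered 100 times in total, 10 of them from this site.
	fmt.Println(ScaleForInline(90, 10, 100)) // 9
}
```

The scaling in `ScaleForInline` assumes the callee behaves the same way at every call site, which is only an approximation; a context-sensitive profile would be needed to do better.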

Notes on implementation

  1. Alternative approach. The suggested approach stores and corrects the counters throughout the whole compilation pipeline. This adds a field to the IR nodes and can complicate the implementation of optimizations (at a minimum, it adds steps to inlining). As an alternative, we could try to load the counters directly into the SSA basic blocks, based on the position information of the operations. This alternative has the following disadvantages: we would still need counter correction based on top-down and bottom-up traversal of the control flow graph, plus additional correction based on inline tree information. If any optimization changes the control flow, we would still need correction as well. Also, the "dynamic escapes on cold paths" optimization needs the counters on the AST nodes. So loading the counters into the AST nodes is no more complicated (probably even easier) and gives more opportunities.

  2. One of the non-trivial parts is Step 2: propagating values across the AST. This will probably be implemented as a bottom-up walk followed by a top-down walk through the tree. The exact algorithm will be designed during implementation.

  3. To make the profile more precise, we need line discriminators. Currently, the debug information in a Go binary contains only per-line information. This matters, for example, when a single "if" statement contains several conditions, but even without this information the profile will be useful. The approach for loading this information is described in issue cmd/compile: add intra-line discrimination to PGO profiles #59612.
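
To illustrate note 3, here is a small sketch of why per-line information is imprecise. The `Sample` type and its `Disc` field are hypothetical: they only model the idea of an intra-line discriminator and do not reflect the pprof format or the design in #59612. The sample counts are made up for the example.

```go
package main

import "fmt"

// Sample is a hypothetical profile sample. Disc is an assumed intra-line
// discriminator that tells apart several expressions on the same line.
type Sample struct {
	Line  int
	Disc  int
	Count int64
}

// Pos identifies a position within a line.
type Pos struct{ Line, Disc int }

// ByLine aggregates counts the way per-line debug information allows:
// everything on the same line is merged into one counter.
func ByLine(samples []Sample) map[int]int64 {
	m := make(map[int]int64)
	for _, s := range samples {
		m[s.Line] += s.Count
	}
	return m
}

// ByLineAndDisc keeps a separate counter for each discriminator on a line.
func ByLineAndDisc(samples []Sample) map[Pos]int64 {
	m := make(map[Pos]int64)
	for _, s := range samples {
		m[Pos{s.Line, s.Disc}] += s.Count
	}
	return m
}

func main() {
	// Imagine `if a() && b() { ... }` on line 10, where a() was sampled
	// 100 times and b() only 5 times because a() is usually false.
	samples := []Sample{{10, 0, 100}, {10, 1, 5}}

	fmt.Println(ByLine(samples)[10])                // 105: the two conditions are indistinguishable
	fmt.Println(ByLineAndDisc(samples)[Pos{10, 0}]) // 100
	fmt.Println(ByLineAndDisc(samples)[Pos{10, 1}]) // 5
}
```

With only the per-line total, both conditions would map to the same counter, so the blocks generated for `a()` and `b()` could not be given different weights.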

Implementation plan

I have made a prototype that loads counters into the AST IR nodes, and I am going to pass them on to the SSA basic blocks. After that, I will implement Steps 2 and 3. Then I am going to add discriminators and implement the rest. Finally, I am going to implement some of the optimizations, such as local basic block ordering.

I would like to get feedback from the community and understand if the community finds this useful.

Labels: NeedsInvestigation, compiler/runtime
