
cmd/compile: greedy basic block layout #66420

@y1yang0

Proposal Details

Fine-grained performance metrics (such as L1-iTLB, IPC, branch-load, and branch-load-miss) and optimization experiments (#63670) have repeatedly shown that block layout can noticeably impact code execution efficiency.

In contrast to the current case-by-case tuning approach, I propose adopting the traditional and time-tested Pettis-Hansen (PH) basic block layout algorithm. Its core concept is to place hot blocks as close together as possible, allowing basic blocks to fall through whenever feasible and thereby reducing the number of jump instructions. This principle is considered the golden rule in the field of block layout: state-of-the-art algorithms like extTSP, and almost all of their variants, build on this idea and add more advanced heuristics.

The PH algorithm relies on a weighted chain graph, where the weights represent the execution frequency of edges. In the absence of PGO information, we can only resort to the branch prediction results from the likelyadjust pass. In the future, we can incorporate PGO data as weights to make the algorithm even more effective.

Experiment Results

[Image: performance change per test case]

The x-axis represents testcase id, the y-axis indicates the performance change in percentage points, and negative values denote performance improvement.

Metadata


Labels: NeedsInvestigation, Performance, compiler/runtime

Status: In Progress
