Skip to content

cmd/compile, runtime: reduce function prologue overheadย #31193

Open
@aclements

Description

@aclements

As of Go 1.12, the stack bound check in every function prologue looks like

MOVQ FS:0xfffffff8, CX
CMPQ 0x10(CX), SP
JBE 0x4018e4

(or some variation thereof). This involves a chain of three dependent instructions, the first two of which are memory loads and the last of which is a conditional branch. I don't have hard data for this, but I suspect this is really bad for the CPU pipeline. The two loads are absolutely going to be in cache, but the CPU still has to wait for the first load before issuing the second, and has to wait for the second before resolving the branch. The branch is highly predictable and can probably be speculated over, but since almost every single function has such a branch, it's probably somewhat likely the branch predictor cache will fail us here.

Function prologue overhead was also reported to be high in The benefits and costs of writing a POSIX kernel in a high-level language by Cutler et al.

One way we could address this is by putting the stack bound in a dedicated register (leveraging our new ability to change the internal ABI, #27539). This would make the prologue a single register/register compare and a branch. The branch would still probably have poor prediction cache locality, but the register/register comparison would happen so quickly that we would lose very little to a speculation failure. We're already moving toward implementing goroutine preemption using signals, which would make it easy to poison this stack bound register when requesting a preemption.

/cc @dr2chase @randall77 @josharian

Metadata

Metadata

Assignees

No one assigned

    Labels

    NeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.Performancecompiler/runtimeIssues related to the Go compiler and/or runtime.

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions