Skip to content

runtime: optimize write barrier #22460

@aclements

Description

@aclements

This issue is to track work on optimizing the write barrier.

Our current write barrier is fairly inefficient simply because not much effort has been put into optimizing it. For most applications, the write barrier has little overhead simply because it's rarely enabled. However, for some applications (e.g., the compiler) it consumes a few percent of the CPU. Furthermore, to fix #14951, we have to reduce the CPU consumed by GC, which necessarily means that GC and the write barrier will be enabled more of the time, increasing the impact of write barrier overhead.

I propose switching to a "buffered" write barrier, in which a fast path simply enqueues the necessary pointers to a per-P buffer. This fast path can be done without a normal Go call, avoiding the cost of spilling registers around the write barrier. When the buffer fills up, the write barrier will enter the slow path, which will spill all registers and enter the runtime to flush the buffer. We can disallow stack splits and safe-points during flushing so we don't need type information for the spilled registers.

I've already implemented this for the single pointer barrier on amd64. It's ~4X faster than the current barrier, speeds up GC-heavy applications by ~2%, and reduces binary size by ~1.5%. I haven't yet implemented it for other architectures, or used these techniques to improve the bulk write barriers.

/cc @RLH @josharian

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions