This issue is to track work on optimizing the write barrier.
Our current write barrier is fairly inefficient because little effort has gone into optimizing it. For most applications, the write barrier has little overhead because it's rarely enabled. However, for some applications (e.g., the compiler) it consumes a few percent of the CPU. Furthermore, to fix #14951, we have to reduce the CPU consumed by GC, which necessarily means that GC and the write barrier will be enabled more of the time, increasing the impact of write barrier overhead.
I propose switching to a "buffered" write barrier, in which a fast path simply enqueues the necessary pointers to a per-P buffer. This fast path can be done without a normal Go call, avoiding the cost of spilling registers around the write barrier. When the buffer fills up, the write barrier will enter the slow path, which will spill all registers and enter the runtime to flush the buffer. We can disallow stack splits and safe-points during flushing so we don't need type information for the spilled registers.
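To make the fast-path/slow-path split concrete, here is a minimal user-level sketch in Go. It is not the runtime implementation: the names `wbBuf`, `wbBufEntries`, `put`, `flush`, and `writePointer` are hypothetical, the buffer size is arbitrary, and the sketch assumes only the newly written pointer is recorded (which pointers the real barrier must enqueue is not specified above). The real fast path would be emitted without a normal Go call; here ordinary functions stand in for it.

```go
package main

import "fmt"

// wbBufEntries is a hypothetical buffer capacity, kept small for illustration.
const wbBufEntries = 4

// wbBuf models a per-P write barrier buffer (hypothetical layout).
type wbBuf struct {
	next    int                // index of the next free slot
	buf     [wbBufEntries]*int // pointers recorded by the fast path
	flushes int                // number of slow-path flushes, for demonstration
	shaded  []*int             // stand-in for the GC's work queue
}

// put is the fast path: record the pointer and bump the index.
// Only when the buffer fills does it fall into the slow path.
func (b *wbBuf) put(p *int) {
	b.buf[b.next] = p
	b.next++
	if b.next == len(b.buf) {
		b.flush()
	}
}

// flush is the slow path: hand every buffered pointer to the
// collector (modeled here as an append) and reset the buffer.
func (b *wbBuf) flush() {
	for i := 0; i < b.next; i++ {
		if p := b.buf[i]; p != nil {
			b.shaded = append(b.shaded, p)
		}
		b.buf[i] = nil
	}
	b.next = 0
	b.flushes++
}

// writePointer models a barriered pointer store *slot = val.
// Assumption: only the new value is enqueued.
func writePointer(b *wbBuf, slot **int, val *int) {
	b.put(val)
	*slot = val
}

func main() {
	b := &wbBuf{}
	var slot *int
	vals := make([]int, 5)
	for i := range vals {
		writePointer(b, &slot, &vals[i])
	}
	// Five writes into a four-entry buffer: one flush, one entry pending.
	fmt.Println(b.flushes, len(b.shaded), b.next) // prints "1 4 1"
}
```

The point of the design is that `put` touches only the buffer and an index, so the common case stays cheap, and the expensive work in `flush` is amortized over a full buffer of writes.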
I've already implemented this for the single pointer barrier on amd64. It's ~4X faster than the current barrier, speeds up GC-heavy applications by ~2%, and reduces binary size by ~1.5%. I haven't yet implemented it for other architectures, or used these techniques to improve the bulk write barriers.
/cc @RLH @josharian