Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: replace heap bitmap with allocation headers where possible #66737

Closed
mknyszek opened this issue Apr 9, 2024 · 1 comment
Closed
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@mknyszek
Copy link
Contributor

mknyszek commented Apr 9, 2024

GC: Allocation headers

This issue is a retroactive design description of a runtime change that landed in the Go 1.22 release cycle.

The goal of the runtime change is to improve the CPU efficiency of the GC by improving locality of GC metadata, as well as reduce the GC's memory overheads by deduplicating metadata by type. The basic idea is to give objects containing pointers an 8-byte allocation header which points to the type for that object. This way, for popular types, the GC is hitting the same few words over and over in cache instead of constantly hitting new cache lines just to scan memory. The chance for locality is just much higher overall. Arrays can also have their type information represent the element type, meaning they can just tread the same bitmap instead of walking over a large unrolled bitmap.

Additionally, allocating many objects should be a bit cheaper, since there's no more complicated bitmap unrolling required. In practice this didn't make a huge difference since headers only apply to larger objects anyway, and most of the cost is still unrolling metadata for smaller objects; see the detailed design below.

Detailed design

For small objects, if s is the mspan for the span starting at start, then s.heapBits() returns a slice containing the bitmap for the whole span. That is, s.heapBits()[0] holds the goarch.PtrSize*8 bits for the first goarch.PtrSize*8 words from start through start+63*ptrSize in the span. On a related note, small objects are always small enough that their bitmap fits in goarch.PtrSize*8 bits, so writing out bitmap data takes two bitmap writes at most (because object boundaries don't generally lie on s.heapBits()[i] boundaries).

For larger objects, if t is the type for the object starting at start, within some span whose mspan is s, then the bitmap at t.GCData is "tiled" from start through start+s.elemsize. Specifically, the first bit of t.GCData corresponds to the word at start, the second to the word after start, and so on up to t.PtrBytes. At t.PtrBytes, we skip to start+t.Size_ and begin again from there. This process is repeated until we hit start+s.elemsize. This tiling algorithm supports array data, since the type always refers to the element type of the array. Single objects are considered the same as single-element arrays. The tiling algorithm may scan data past the end of the compiler-recognized
object, but any unused data within the allocation slot (i.e. within s.elemsize) is zeroed, so the GC just observes nil pointers. Note that this "tiled" bitmap isn't stored anywhere; it is generated on-the-fly.

For objects without their own span, the type metadata is stored in the first word before the object at the beginning of the allocation slot. This means objects might shift across size classes, but that's OK. Many objects are not tightly optimizes to size class sizes and have some internal fragmentation to spare, making this usually free. Any increase here is also balanced against the fact that we no longer have a permanent 1:64 heap bitmap that is proportional to the peak heap size. However, for objects with their own span, the type metadata is stored in the mspan to avoid pushing power-of-two objects slightly, causing them to use a whole additional page. Power-of-two objects are common in general, but less common for smaller objects containing pointers. Note that pointerless objects do not have headers, so sizes for those objects are preserved exactly, and we have no GC bitmap overheads for those objects at all anymore.

Part of the trick of this implementation is to make the "tiling" algorithm fast. This mostly came down to micro-optimizing and simplifying the code as much as possible. We can also leverage work done on the bitmap-based iterator implementation that Keith Randall worked out for the current (previous) GC metadata layout to make this fast.

Risks

One risk of this change is that the most common alignment of size classes in the memory allocator is now 8 bytes instead of 16. There were some concerns at the time of release around possibly breaking assembly code that used instructions requiring the higher alignment. As a result, the work was all done behind a GOEXPERIMENT. It turns out to have not been enough of a problem in practice, and no issues about this have been filed in the months after the Go 1.22 release. (Possibly because (for example) SSE instructions requiring 16-byte alignment usually operate on pointerless data, but it's not totally clear at this point).

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Apr 9, 2024
@mknyszek mknyszek closed this as completed Apr 9, 2024
@mknyszek mknyszek added this to the Go1.22 milestone Apr 9, 2024
@mknyszek mknyszek self-assigned this Apr 9, 2024
@mknyszek mknyszek added the NeedsFix The path to resolution is known, but the work has not been done. label Apr 9, 2024
@mknyszek
Copy link
Contributor Author

mknyszek commented Apr 9, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

2 participants