Looking at the Golang source code for how map is implemented [1], there seems to be a possible opportunity to enhance cacheline efficiency:
The current algorithm appears to work as follows for looking up a key in a map:
- Hash the key [2] and find the associated bucket [3].
- Search along the bucket's .tophash array [4] for a key candidate <-- cacheline fetch 1
- Binary compare the key in the key array [5] <-- cacheline fetch 2
- Grab the associated element address in the element array [6] <-- cacheline fetch 3 (when the caller uses the element address)
So there are 3 arrays in a bucket: the .tophash array, the key array, and the element (value) array. Each array has the same hard-coded bucketCnt length of 8 entries [7]. A key array item is typically 16 bytes (e.g. a string key on a 64-bit platform), so a typical total key array size is 8 items * 16 bytes = 128 bytes, i.e. 2x (typically sized) 64-byte cachelines. Because the entire key array sits between key[i] and element[i], the associated element array item is usually located in a different cacheline than its key.
This can also be seen (added padding and comments to make it easier to read) from the way the key and element addresses are calculated in the source code:
k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))                               // <-- [8]
e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize)) // <-- [9]
//                          ^^^^^^^^^^ skip the .tophash array
//                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ skip the key array
//                                                                 ^^^^^^^^^^^^^^^^^^^^^^ index into the element array
Possible optimization:
If the key and element arrays were interleaved (stored as key/element pairs instead of two separate arrays), then key[n] and element[n] would not be guaranteed to land in the same cacheline, but often they would :-)
As an example, the interleaved version of the above k and e assignments would look something like this:
k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize+t.elemsize)+uintptr(0))
e := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize+t.elemsize)+uintptr(t.keysize))
In theory, performance would often be better due to fewer CPU cacheline fetches, and otherwise the same as before?
[1] https://github.com/golang/go/blob/go1.17.1/src/runtime/map.go#L532
[2] https://github.com/golang/go/blob/go1.17.1/src/runtime/map.go#L515
[3] https://github.com/golang/go/blob/go1.17.1/src/runtime/map.go#L517
[4] https://github.com/golang/go/blob/go1.17.1/src/runtime/map.go#L532
[5] https://github.com/golang/go/blob/go1.17.1/src/runtime/map.go#L542
[6] https://github.com/golang/go/blob/go1.17.1/src/runtime/map.go#L543
[7] https://github.com/golang/go/blob/go1.17.1/src/runtime/map.go#L66
[8] https://github.com/golang/go/blob/go1.17.1/src/runtime/map.go#L538
[9] https://github.com/golang/go/blob/go1.17.1/src/runtime/map.go#L543