Looking at the Golang source code for how map is implemented [1], there seems to be an opportunity to improve cacheline efficiency.
The current algorithm appears to work as follows for looking up a key in a map:
- Hash the key [2] and find the associated bucket [3].
- Search along the bucket's .tophash array [4] for a key candidate <-- cacheline fetch 1
- Binary compare key in key array [5] <-- cacheline fetch 2
- Grab associated element address in element array [6] <-- cacheline fetch 3 (when caller uses element address)
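The three steps above can be sketched with a simplified model of a bucket. This is not the runtime code: the `bucket` type and its `lookup` method are hypothetical, the key and element types are fixed to `string` and `int` for illustration, and overflow buckets and the runtime's special empty-slot tophash markers are left out.

```go
package main

import "fmt"

const bucketCnt = 8 // slots per bucket, as in the runtime

// bucket is a hypothetical, simplified model of a map bucket:
// three separate arrays laid out one after the other.
type bucket struct {
	tophash [bucketCnt]uint8
	keys    [bucketCnt]string
	elems   [bucketCnt]int
}

// lookup follows the three steps from the list above.
func (b *bucket) lookup(top uint8, key string) *int {
	for i := 0; i < bucketCnt; i++ {
		if b.tophash[i] != top { // step 1: scan the tophash array
			continue
		}
		if b.keys[i] != key { // step 2: compare the full key in the key array
			continue
		}
		return &b.elems[i] // step 3: address inside the element array
	}
	return nil
}

func main() {
	b := &bucket{}
	b.tophash[3], b.keys[3], b.elems[3] = 0x2a, "go", 117
	if e := b.lookup(0x2a, "go"); e != nil {
		fmt.Println(*e)
	}
}
```

Each step reads from a different array, which is why a successful lookup can touch up to three distinct regions of the bucket.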
So there are 3 arrays in a bucket: the .tophash array, the key array, and the element (value) array. Each array has the same hard-coded bucketCnt length of 8 elements [7]. A key array item is often 16 bytes (for example, a string key on a 64-bit platform). A typical key array is therefore 8 items * 16 bytes = 128 bytes long, or 2x (typically sized) 64-byte cachelines. Because the key array is that large, the element associated with a key is usually located in a different cacheline than the key itself.
This can also be seen (added padding and comments to make it easier to read) from the way the key and element addresses are calculated in the source code:
k := add(unsafe.Pointer(b), dataOffset+i        *uintptr(t.keysize)                      ) // <-- [8]
e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize)) // <-- [9]
//                                                                  ^^^^^^^^^^^^^^^^^^^^^ element array
//                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ key array
//                          ^^^^^^^^^^ skip .tophash array
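To make the offset arithmetic concrete, here is a small model of the separate-array layout. It is only a sketch under several assumptions that are not in the source: a 64-byte cache line, a bucket that starts on a cache line boundary, a `dataOffset` of 8 (the 8 tophash bytes), 16-byte keys, and 8-byte elements. The function `separateOffsets` is a hypothetical helper, and only the starting offset of each item is considered.

```go
package main

import "fmt"

const (
	cacheLine  = 64 // assumed cache line size in bytes
	bucketCnt  = 8  // slots per bucket
	dataOffset = 8  // assumed: 8 tophash bytes precede the key array
	keySize    = 16 // the "typical" key size used in the text
	elemSize   = 8  // assumed element size, for illustration only
)

// separateOffsets returns the byte offsets of key i and element i
// when all keys are stored first, followed by all elements.
func separateOffsets(i int) (k, e int) {
	k = dataOffset + i*keySize
	e = dataOffset + bucketCnt*keySize + i*elemSize
	return k, e
}

func main() {
	for i := 0; i < bucketCnt; i++ {
		k, e := separateOffsets(i)
		fmt.Printf("slot %d: key starts in line %d, elem starts in line %d\n",
			i, k/cacheLine, e/cacheLine)
	}
}
```

With these numbers every key starts in line 0 or 1 and every element starts in line 2 or 3, so no slot has its key and element in the same line.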
Possible optimization:
If the key and element arrays were interleaved (instead of being separate arrays), this would not guarantee that the associated key[n] and element[n] are always in the same cacheline, but they often would be :-)
As an example, the interleaved version of the above k and e assignments would look something like this:
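k := add(unsafe.Pointer(b), dataOffset+i*(uintptr(t.keysize)+uintptr(t.elemsize))+uintptr(0)        )
e := add(unsafe.Pointer(b), dataOffset+i*(uintptr(t.keysize)+uintptr(t.elemsize))+uintptr(t.keysize))
// keysize and elemsize are converted separately so that the sum is computed as a uintptr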
In theory, performance would often be better thanks to fewer CPU cacheline fetches, and otherwise the same as before?
[1] https://github.com/golang/go/blob/go1.17.1/src/runtime/map.go#L532
[2] https://github.com/golang/go/blob/go1.17.1/src/runtime/map.go#L515
[3] https://github.com/golang/go/blob/go1.17.1/src/runtime/map.go#L517
[4] https://github.com/golang/go/blob/go1.17.1/src/runtime/map.go#L532
[5] https://github.com/golang/go/blob/go1.17.1/src/runtime/map.go#L542
[6] https://github.com/golang/go/blob/go1.17.1/src/runtime/map.go#L543
[7] https://github.com/golang/go/blob/go1.17.1/src/runtime/map.go#L66
[8] https://github.com/golang/go/blob/go1.17.1/src/runtime/map.go#L538
[9] https://github.com/golang/go/blob/go1.17.1/src/runtime/map.go#L543