
runtime: enhance map cacheline efficiency #48687

@simonhf

Description

Looking at the Go source code for how map is implemented [1], there appears to be an opportunity to improve cacheline efficiency:

The current algorithm appears to work as follows when looking up a key in a map:

  • Hash the key [2] and find the associated bucket [3].
  • Scan the bucket's .tophash array [4] for a key candidate <-- cacheline fetch 1
  • Byte-compare the candidate against the key in the key array [5] <-- cacheline fetch 2
  • Grab the associated element address in the element array [6] <-- cacheline fetch 3 (when the caller uses the element address)
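The steps above can be sketched as a minimal, self-contained Go model. The types and sizes here are hypothetical simplifications (the real runtime operates on unsafe pointers over variable-size key/element slots); only the three-array scan order is the point:

```go
package main

import "fmt"

const bucketCnt = 8 // same hard-coded bucket size as the runtime

// bucket is a simplified, hypothetical model of a runtime map bucket.
type bucket struct {
	tophash [bucketCnt]uint8  // scanned first
	keys    [bucketCnt]string // compared only on a tophash hit
	elems   [bucketCnt]int    // value returned to the caller
}

// lookup mirrors the three-step scan described above.
func lookup(b *bucket, key string, top uint8) (int, bool) {
	for i := 0; i < bucketCnt; i++ {
		if b.tophash[i] != top { // cacheline fetch 1: tophash array
			continue
		}
		if b.keys[i] != key { // cacheline fetch 2: key array
			continue
		}
		return b.elems[i], true // cacheline fetch 3: element array
	}
	return 0, false
}

func main() {
	b := &bucket{}
	b.tophash[3] = 0x42
	b.keys[3] = "hello"
	b.elems[3] = 7
	v, ok := lookup(b, "hello", 0x42)
	fmt.Println(v, ok) // prints "7 true"
}
```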

So there are 3 arrays in a bucket: the .tophash array, the key array, and the element (value) array. Each array has the same hard-coded bucketCnt length of 8 items [7]. A key array item is typically 16 bytes, so a typical key array is 8 items * 16 bytes = 128 bytes long, i.e. two (typically 64-byte) cachelines. Because the key array is that large, the associated element array item usually lands in a different cacheline from its key.
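This can be checked numerically. The sketch below mirrors the runtime's offset arithmetic for the current layout, using assumed sizes (8-byte tophash header, 16-byte keys, 8-byte elements, 64-byte cachelines); with these numbers, no slot's key and element ever start in the same cacheline:

```go
package main

import "fmt"

const (
	cacheline  = 64 // assumed cacheline size
	bucketCnt  = 8
	dataOffset = 8  // assumed: the 8-byte tophash array precedes the data
	keySize    = 16 // assumed: e.g. a string key
	elemSize   = 8  // assumed: e.g. a pointer-sized value
)

// keyOffset and elemOffset mirror the k/e address arithmetic quoted below,
// with i*keySize for keys and bucketCnt*keySize skipping the whole key array.
func keyOffset(i int) int  { return dataOffset + i*keySize }
func elemOffset(i int) int { return dataOffset + bucketCnt*keySize + i*elemSize }

func main() {
	for i := 0; i < bucketCnt; i++ {
		k, e := keyOffset(i), elemOffset(i)
		fmt.Printf("slot %d: key @ %3d (line %d), elem @ %3d (line %d)\n",
			i, k, k/cacheline, e, e/cacheline)
	}
}
```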

This can also be seen from the way the key and element addresses are calculated in the source code (padding and comments added for readability):

k  :=  add(unsafe.Pointer(b), dataOffset+i        *uintptr(t.keysize)                      ) // <-- [8]
e  :=  add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize)) // <-- [9]
//                                                                    ^^^^^^^^^^^^^^^^^^^^^ element array
//                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ key array
//                            ^^^^^^^^^^ skip .tophash array

Possible optimization:

If the key and element arrays were interleaved (a single array of key/element pairs instead of two separate arrays), this would not guarantee that the associated key[n] and element[n] are always in the same cacheline, but often they would be :-)

As an example, the interleaved version of the above k and e assignments would look something like this:

k  :=  add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize+t.elemsize)+uintptr(0)        )
e  :=  add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize+t.elemsize)+uintptr(t.keysize))
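Running the same offset arithmetic for this interleaved layout (same assumed sizes as before: 8-byte tophash header, 16-byte keys, 8-byte elements, 64-byte cachelines) shows the "often, not always" effect: with these numbers, 6 of the 8 slots have key and element starting in the same cacheline, and only the slots straddling a line boundary do not:

```go
package main

import "fmt"

const (
	cacheline  = 64 // assumed cacheline size
	bucketCnt  = 8
	dataOffset = 8  // assumed: the 8-byte tophash array precedes the data
	keySize    = 16 // assumed key size
	elemSize   = 8  // assumed element size
)

// sharedSlots counts slots whose key and element start in the same cacheline
// under the interleaved layout: each slot is a (key, element) pair of
// keySize+elemSize bytes. (Comparing start lines only; a key that itself
// straddles a line boundary is not modeled.)
func sharedSlots() int {
	shared := 0
	for i := 0; i < bucketCnt; i++ {
		k := dataOffset + i*(keySize+elemSize)
		e := k + keySize
		if k/cacheline == e/cacheline {
			shared++
		}
	}
	return shared
}

func main() {
	fmt.Printf("%d of %d slots share a cacheline\n", sharedSlots(), bucketCnt)
	// prints "6 of 8 slots share a cacheline"
}
```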

In theory, performance would often improve due to fewer CPU cacheline fetches, and would otherwise be the same as before?

[1] https://github.com/golang/go/blob/go1.17.1/src/runtime/map.go#L532
[2] https://github.com/golang/go/blob/go1.17.1/src/runtime/map.go#L515
[3] https://github.com/golang/go/blob/go1.17.1/src/runtime/map.go#L517
[4] https://github.com/golang/go/blob/go1.17.1/src/runtime/map.go#L532
[5] https://github.com/golang/go/blob/go1.17.1/src/runtime/map.go#L542
[6] https://github.com/golang/go/blob/go1.17.1/src/runtime/map.go#L543
[7] https://github.com/golang/go/blob/go1.17.1/src/runtime/map.go#L66
[8] https://github.com/golang/go/blob/go1.17.1/src/runtime/map.go#L538
[9] https://github.com/golang/go/blob/go1.17.1/src/runtime/map.go#L543

Metadata

Labels: NeedsInvestigation, Performance