runtime: enhance map cacheline efficiency via prefetch #52902
Was watching this video on Golang map design. It says a 6.5 load factor is used and that this can cause overflow buckets, and I wondered: how many overflow buckets are there in a typical map?
So I wrote a one-liner to simulate the number of overflow buckets for maps with an increasing number of fixed, non-overflow buckets. For each map size, it adds "keys" until the 6.5 load factor is reached (usually leaving the map about 81% full, the point at which a real map would grow and double in size). Upon reaching the 6.5 load factor, the number of 1st-level overflow buckets is shown, as well as 2nd & 3rd level:
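(The one-liner itself wasn't captured above; a minimal Go sketch of the same experiment, assuming a uniform hash, 8 slots per bucket, and ignoring the incremental growth a real map performs, would be:)

```go
package main

import (
	"fmt"
	"math/rand"
)

func main() {
	const slots = 8 // key/elem slots per (non-overflow) bucket
	for logB := 4; logB <= 20; logB += 4 {
		buckets := 1 << logB
		keys := buckets * 13 / 2 // stop at load factor 6.5 keys/bucket
		counts := make([]int, buckets)
		for i := 0; i < keys; i++ {
			counts[rand.Intn(buckets)]++ // assume a uniform hash
		}
		// A bucket holding c keys needs ceil(c/8)-1 overflow buckets, so a
		// chain depth of >=1 means c > 8, depth >=2 means c > 16, and so on.
		var ovf1, ovf2, ovf3 int
		for _, c := range counts {
			if c > slots {
				ovf1++
			}
			if c > 2*slots {
				ovf2++
			}
			if c > 3*slots {
				ovf3++
			}
		}
		fmt.Printf("buckets=%8d  1st=%5.1f%%  2nd=%5.2f%%  3rd=%6.3f%%\n",
			buckets,
			100*float64(ovf1)/float64(buckets),
			100*float64(ovf2)/float64(buckets),
			100*float64(ovf3)/float64(buckets))
	}
}
```

The exact percentages such a toy model prints depend on the uniform-hash assumption, so they won't necessarily match the original simulation's numbers.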
Upon reaching the 6.5 load factor there are generally about 12% 1st-level overflow buckets in the simulation. Which presumably means about 1 in 8 bucket accesses might have to read an overflow bucket to find the key?
Worse, as the number of buckets increases, the number of 2nd-level overflow buckets also increases, although it stays small relative to the number of 1st-level overflow buckets. So if the programmer is interested in absolute worst-case key lookup times, there will be a very small percentage of keys whose lookup takes longer than the others, because it has to scan not only a 1st-level overflow bucket but a 2nd-level overflow bucket too?
The key lookup code is still very much like in the presentation and has these two loops:
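(The snippet wasn't captured above; abridged from mapaccess1 in runtime/map.go -- the exact code varies by Go version -- the two loops look roughly like this:)

```go
bucketloop:
	for ; b != nil; b = b.overflow(t) { // outer loop: walk the overflow chain
		for i := uintptr(0); i < bucketCnt; i++ { // inner loop: scan the 8 slots
			if b.tophash[i] != top {
				if b.tophash[i] == emptyRest {
					break bucketloop
				}
				continue
			}
			k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
			if t.indirectkey() {
				k = *((*unsafe.Pointer)(k))
			}
			if t.key.equal(key, k) {
				e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
				if t.indirectelem() {
					e = *((*unsafe.Pointer)(e))
				}
				return e
			}
		}
	}
```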
I was thinking that this would be the perfect opportunity to use a CPU prefetch instruction -- if it exists in Golang... does it? -- and the pseudocode would look something like:
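(Sketch only; prefetch() here is a hypothetical intrinsic that issues the CPU's prefetch instruction for the given address:)

```go
bucketloop:
	for ; b != nil; b = b.overflow(t) {
		// hypothetical: start pulling the next overflow bucket's cacheline
		// into cache now, while the loop below scans this bucket's 8 slots
		if next := b.overflow(t); next != nil {
			prefetch(unsafe.Pointer(next))
		}
		for i := uintptr(0); i < bucketCnt; i++ {
			// ... check tophash and compare keys as before ...
		}
	}
```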
Why would it be the perfect opportunity? Because __builtin_prefetch() (from GCC, in this case) is generally difficult to use: depending upon how fast the RAM is, after executing this instruction there will be a sizeable and difficult-to-predict async delay before the desired cacheline is cached and ready to be used. Most code does not know which memory address it is going to need to access that far in the future, and thus it's notoriously difficult to use the prefetch instruction in real code...
In this case -- for the regular code in use today, without prefetching -- this async delay becomes a sync delay the first time the cacheline containing address b is accessed, e.g. at b.tophash[i]. However, this code is always going to loop over 8 bucket items before accessing the next overflow bucket.
So if the time to fetch the overflow b cacheline is x, the time to execute the prefetch instruction is y, and the time to loop over the 8 bucket items is z, then without prefetching a lookup pays z + x (scan, then stall), while with prefetching it pays y + z + (x - z) = y + x, because the fetch overlaps the scan. So we stand to save z - y time per overflow access (assuming that x > z, so the scan only partially hides the fetch) :-)
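With purely illustrative numbers (assumed, not measured): if a RAM fetch is x ≈ 100ns, the prefetch instruction costs y ≈ 1ns, and scanning the 8 slots takes z ≈ 20ns, then the stall at the overflow bucket shrinks from 100ns to about 80ns, saving roughly z - y ≈ 19ns per overflow hop.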
I would love to try this out, but does Golang already implement a prefetch instruction? I have searched and not found one. If not, how would I go about implementing it? Presumably, if it existed, this could speed up roughly 1 in 8 bucket reads at the current load factor for a fuller map?