runtime: enhance map cacheline efficiency via prefetch #52902
Was watching this video on Golang map design. It says the 6.5 load factor is used, and that this can cause overflow buckets to be created.
So I wrote a one-liner to simulate the number of overflow buckets for maps with an increasing number of fixed, non-overflow buckets. For each map size it adds "keys" until the 6.5 load factor is reached (usually leaving the map about 81% full, the point at which a real map would expand and double in size). On reaching the 6.5 load factor, it reports the number of 1st-level overflow buckets, as well as 2nd- and 3rd-level ones:
On reaching the 6.5 load factor, the simulation generally shows about 12% 1st-level overflow buckets. Which presumably means about 1 in 8 bucket accesses might have to read an overflow bucket to find the key?
Worse, as the number of buckets increases, the number of 2nd-level overflow buckets also increases, although it stays small relative to the number of 1st-level overflow buckets. So if the programmer cares about absolute worst-case key lookup times, there will be a very small percentage of keys whose lookup takes longer than the rest, because finding them means scanning not only a 1st-level overflow bucket but a 2nd-level one too?
The key lookup code is still very much like in the presentation and has these two loops:
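The shape of those two loops can be sketched with a toy model of the bucket layout -- a simplified illustration with made-up types, not the actual runtime code. The outer loop walks the overflow chain; the inner loop scans the 8 slots of each bucket:

```go
package main

import "fmt"

const bucketCnt = 8

// bmap is a toy model of a map bucket: 8 tophash bytes, 8 key/value
// slots, and a pointer to an overflow bucket. (The names echo
// runtime/map.go, but the types here are invented for illustration.)
type bmap struct {
	tophash  [bucketCnt]uint8
	keys     [bucketCnt]uint64
	values   [bucketCnt]string
	overflow *bmap
}

// lookup shows the two-loop structure: the outer loop follows the
// overflow chain, the inner loop scans the slots of each bucket.
func lookup(b *bmap, top uint8, key uint64) (string, bool) {
	for ; b != nil; b = b.overflow {
		for i := 0; i < bucketCnt; i++ {
			if b.tophash[i] != top {
				continue
			}
			if b.keys[i] == key {
				return b.values[i], true
			}
		}
	}
	return "", false
}

func main() {
	over := &bmap{}
	over.tophash[0], over.keys[0], over.values[0] = 7, 42, "found in overflow"
	root := &bmap{overflow: over}
	v, ok := lookup(root, 7, 42)
	fmt.Println(v, ok) // prints: found in overflow true
}
```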
I was thinking that this would be the perfect opportunity to use a CPU prefetch instruction -- if one exists in Golang... does it? -- and the pseudo code would look something like:
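Something like the following sketch, where `sys.Prefetch` is a placeholder name for whatever prefetch intrinsic might be available, and `b.overflow(t)` stands for reading the overflow pointer as the lookup loop does today:

```go
// Pseudo code: at the top of the outer loop, start fetching the overflow
// bucket's cacheline before scanning this bucket's 8 slots, so the memory
// latency overlaps with the scan. sys.Prefetch is a hypothetical intrinsic.
for ; b != nil; b = b.overflow(t) {
	if next := b.overflow(t); next != nil {
		sys.Prefetch(uintptr(unsafe.Pointer(next)))
	}
	for i := uintptr(0); i < bucketCnt; i++ {
		// ... compare tophash and keys as today ...
	}
}
```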
Why would it be the perfect opportunity? Because generally a prefetch instruction only starts the memory fetch; the fetch then proceeds asynchronously while the CPU carries on executing.

In this case -- for the regular code in use today, normally and without prefetching -- that async delay becomes a sync delay the first time the cacheline containing the overflow bucket's address is actually read; the CPU stalls until the cacheline arrives.

So if the time to fetch the overflow bucket's cacheline is less than or equal to the time spent scanning the 8 slots of the current bucket, then by the time the outer loop advances to the overflow bucket its cacheline is already present, and reading it costs (almost) nothing extra.
I would love to try this out, but does Golang already implement a prefetch instruction? I have searched and am not finding one. If not, how would one go about implementing it? Presumably, if it existed, this could speed up roughly 1 in 8 bucket reads for a fuller map at the current load factor?
We do have a prefetch you can try, it is
I'm not entirely convinced this would be worth it.
That said, when the prefetch is useful it can save lots of cycles. Definitely worth an experiment.
@randall77 good points, and thanks for the tip about the prefetch.
Presumably it wouldn't only help at the max load factor, since overflow buckets start accumulating long before the max load factor is reached? Isn't that also the normal case for any map that grows bigger? I.e. a map will reach the max load factor perhaps many times while it grows?
This gave me the thought that maybe there could be an extra conditional, so that this logic only kicks in once the map reaches a certain load factor? Presumably the current load factor is relatively cheap to compute? That way, the overhead on a general access to a not-heavily-loaded map would be a single, simpler comparison.
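As pseudo code, the gate might look something like this. The field names `h.count` and `h.B` follow the runtime's map header (where the bucket count is `1<<h.B`), but the threshold constant and `sys.Prefetch` are made up for illustration:

```go
// Only pay for the prefetch when the map is loaded enough that overflow
// buckets are likely; count / 2^B is the current load factor.
const prefetchLoadThreshold = 5 // hypothetical tuning constant

if h.count > prefetchLoadThreshold<<h.B {
	sys.Prefetch(uintptr(unsafe.Pointer(next))) // start fetching the overflow bucket
}
```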
Maybe the experiment could be as follows:
Step 1: Time adding n more keys to a map, and then time reading all map keys using key lookup.
Run this experiment with and without the prefetch mechanism, and compare the overall times, and the times for each step 1 operation.
Presumably this experiment would let us compare performance for different load levels? What do you think?
Yes, a map might be at the max load factor many times during its life. But it will also be at lower load factors many times during its life. The chance that you have to look at an overflow bucket is dependent on the current load factor. That load factor could be as high as 6.5, but it just as easily could be as low as 3.25. I like to think that the "average" or "random" map has a load factor of 4.875, which implies a much lower "average" overflow bucket probability. Of course that's kind of imprecise, but I think my point is clear.
Sure, that's one possible way to do it. Everything has a cost, though. You'd need the cost of the "test if we're in the high load factor regime" to be cheaper than just issuing the prefetch.
I was thinking an initial experiment could be simpler. Allocate and populate a 1M entry map. Time a loop that looks up all the keys in a randomish order. Add prefetching and see if it gets faster. Vary the map size a bit to get to both the min and max load factor.
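That baseline loop might look something like the sketch below. Plain Go can only measure the current runtime, so comparing against prefetching would require running the same program on a patched runtime; the sizes are illustrative choices meant to straddle different load factors:

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// timeLookups populates a map with n random keys, then times looking up
// every key in a shuffled order. Varying n moves the map between lower
// and higher load factors for a given bucket count.
func timeLookups(n int) time.Duration {
	m := make(map[uint64]uint64) // no size hint: grow through many load factors
	keys := make([]uint64, n)
	rng := rand.New(rand.NewSource(1))
	for i := range keys {
		k := rng.Uint64()
		keys[i] = k
		m[k] = uint64(i)
	}
	// Shuffle so lookups happen in a randomish order, defeating any
	// accidental locality from insertion order.
	rng.Shuffle(len(keys), func(i, j int) { keys[i], keys[j] = keys[j], keys[i] })

	var sink uint64
	start := time.Now()
	for _, k := range keys {
		sink += m[k]
	}
	d := time.Since(start)
	_ = sink // keep the loop from being optimized away
	return d
}

func main() {
	for _, n := range []int{1 << 19, 3 << 18, 1 << 20} {
		d := timeLookups(n)
		fmt.Printf("n=%8d  total=%10v  per-lookup=%v\n", n, d, d/time.Duration(n))
	}
}
```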