runtime: enhance map cacheline efficiency via prefetch #52902
CC @randall77

We do have a prefetch you can try, it is `runtime/internal/sys.Prefetch`. I'm not entirely convinced this would be worth it.
That said, when the prefetch is useful it can save lots of cycles. Definitely worth an experiment.
@randall77 good points, and thanks for the tip about `runtime/internal/sys.Prefetch`.
Presumably it wouldn't have to run at max load factor, and the overflow buckets just increase in number long before the max load factor is reached? Isn't that also the normal case for any map that grows bigger? I.e. a map will reach the max load factor perhaps many times while growing?
This gave me the thought that maybe there could be an extra conditional so that this business logic only kicks in when the map reaches a certain load factor? Presumably the current load factor is relatively easy to compute? This way, the overhead for general access to a not heavily loaded map would be a single simple check, sketched below.
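For illustration only, such a gate might look like this; `count` and `B` stand in for the `hmap` fields of the same names in runtime/map.go, and the threshold of 5 entries per bucket is an arbitrary guess, not a tuned value:

```go
// Hypothetical gate: only bother prefetching when the map's average
// load factor (count / 2^B entries per bucket) exceeds a threshold,
// computed without floating point in the style of overLoadFactor.
func shouldPrefetch(count int, B uint8) bool {
	return uintptr(count) > 5*(uintptr(1)<<B)
}
```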
Maybe the experiment could be as follows. Step 1: time adding n more keys to a map, and then time reading all map keys using key lookup. Run this experiment with and without the prefetch mechanism, and compare the overall times, and the times for each step 1 operation. Presumably this experiment would let us compare performance at different load levels? What do you think?
Yes, a map might be at the max load factor many times during its life. But it will also be at lower load factors many times during its life. The chance that you have to look at an overflow bucket is dependent on the current load factor. That load factor could be as high as 6.5, but it just as easily could be as low as 3.25. I like to think that the "average" or "random" map has a load factor of 4.875, which implies a much lower "average" overflow bucket probability. Of course that's kind of imprecise, but I think my point is clear.
Sure, that's one possible way to do it. Everything has a cost, though. You'd need the cost of the "test if we're in the high load factor regime" to be cheaper than just issuing the prefetch.
I was thinking an initial experiment could be simpler. Allocate and populate a 1M entry map. Time a loop that looks up all the keys in a randomish order. Add prefetching and see if it gets faster. Vary the map size a bit to get to both the min and max load factor.
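A minimal sketch of that experiment as a standalone program (the key type, map size, and summing sink are arbitrary choices; the prefetch variant would come from a modified toolchain):

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

func main() {
	const n = 1 << 20 // ~1M entries; vary this to hit different load factors
	m := make(map[int64]int64)
	keys := make([]int64, n)
	for i := range keys {
		keys[i] = rand.Int63()
		m[keys[i]] = int64(i)
	}
	// Shuffle so lookups hit buckets in a randomish order and the
	// hardware prefetcher can't predict the access pattern.
	rand.Shuffle(len(keys), func(i, j int) { keys[i], keys[j] = keys[j], keys[i] })

	start := time.Now()
	var sink int64
	for _, k := range keys {
		sink += m[k] // each lookup is a likely cache miss
	}
	fmt.Println(time.Since(start), sink)
}
```

Running this against toolchains built with and without the prefetch change, at a few different values of n, would cover both ends of the load-factor range.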
Was watching this [1] video on Golang map design. It says a 6.5 load factor is used and that this can cause overflow buckets, and I wondered: how many overflow buckets are there in a typical map? So I wrote a one-liner to simulate the number of overflow buckets for maps with an increasing number of fixed, non-overflow buckets. For each map size, it adds "keys" until the 6.5 load factor is reached (usually resulting in the map being ~81% full, at which point a real map would grow to double its size). On reaching the 6.5 load factor, it reports the number of 1st-level overflow buckets, as well as 2nd and 3rd level:
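The original one-liner isn't reproduced here; the following Go program is a rough equivalent of that simulation, under the assumptions that keys land in base buckets uniformly at random and that each bucket holds 8 entries before chaining another overflow bucket (the exact percentages depend on those assumptions):

```go
package main

import (
	"fmt"
	"math/rand"
)

func pct(n, total int) float64 { return 100 * float64(n) / float64(total) }

func main() {
	for B := 4; B <= 20; B += 4 {
		nbuckets := 1 << B
		counts := make([]int, nbuckets) // entries landing in each base bucket
		limit := nbuckets * 13 / 2      // stop at 6.5 entries per bucket on average
		for i := 0; i < limit; i++ {
			counts[rand.Intn(nbuckets)]++
		}
		// Each bucket holds 8 entries; every further run of 8 spills
		// into one more chained overflow bucket.
		var ov1, ov2, ov3 int
		for _, c := range counts {
			if c > 8 {
				ov1++
			}
			if c > 16 {
				ov2++
			}
			if c > 24 {
				ov3++
			}
		}
		fmt.Printf("B=%2d buckets=%8d 1st=%5.1f%% 2nd=%5.2f%% 3rd=%6.3f%%\n",
			B, nbuckets, pct(ov1, nbuckets), pct(ov2, nbuckets), pct(ov3, nbuckets))
	}
}
```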
At the point of reaching the 6.5 load factor there are generally about 12% 1st-level overflow buckets in the simulation, which presumably means about 1 in 8 bucket accesses might have to read an overflow bucket to find the key?
Worse, as the number of buckets increases, the number of 2nd-level overflow buckets also increases, though it stays relatively minor compared to the number of 1st-level overflow buckets. So if the programmer is interested in absolute worst-case key lookup times, there will be a very small percentage of keys whose lookups take longer than others, due to having to scan not only a 1st-level overflow bucket but a 2nd-level overflow bucket too?
The key lookup code [2] is still very much like in the presentation [1] and has these two loops:
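For reference, the heart of `mapaccess1` looks roughly like this (abridged; the outer loop walks the overflow-bucket chain, the inner loop scans the 8 slots of each bucket):

```go
bucketloop:
	for ; b != nil; b = b.overflow(t) {
		for i := uintptr(0); i < bucketCnt; i++ {
			if b.tophash[i] != top {
				if b.tophash[i] == emptyRest {
					break bucketloop
				}
				continue
			}
			k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
			// ... compare k against the lookup key and, on a match,
			// return a pointer to the corresponding element ...
		}
	}
```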
I was thinking that this would be the perfect opportunity to use a CPU prefetch instruction -- if it exists in Golang... does it? -- and the pseudo code would look something like:
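Something like this, where `prefetch` is a hypothetical intrinsic (not, as far as I can find, an existing Go function) that starts pulling a cacheline into cache without blocking:

```go
for ; b != nil; b = b.overflow(t) {
	// Hypothetical: kick off the fetch of the next overflow bucket's
	// cacheline now, so it can arrive while we scan this bucket's 8 slots.
	if next := b.overflow(t); next != nil {
		prefetch(unsafe.Pointer(next))
	}
	for i := uintptr(0); i < bucketCnt; i++ {
		// ... tophash check and key comparison exactly as today ...
	}
}
```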
Why would it be the perfect opportunity? Because generally `__builtin_prefetch()` (from GCC in this case) is difficult to use in code: depending upon how fast the RAM is, after executing this instruction there is a sizeable and difficult-to-predict async delay before the desired cacheline is cached and ready to be used. Most code does not know which memory address it will need to access that far in the future, and thus the prefetch instruction is notoriously difficult to use in real code.

In this case -- for the regular code in use today, without prefetching -- that async delay becomes a sync delay the first time the cacheline containing address `b` is accessed, e.g. at `b.tophash[i]`. However, this code is always going to loop over 8 bucket items before accessing the next overflow `b`.

So if the time to fetch the overflow `b` cacheline is x, the time to execute the prefetch instruction is y, and the time to loop over the 8 bucket items is z, then using the prefetch instruction we stand to save min(x, z) - y time; assuming x > z, the fetch fully overlaps the loop and the saving is z - y. :-)

I would love to try this out, but does Golang already implement a prefetch instruction? I have searched and not found one. If not, how would I go about implementing it? Presumably, if it existed, this could speed up approximately 1 in 8 bucket reads for a map near the current max load factor?
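To put illustrative numbers on the x/y/z estimate above (assumed, not measured): if a main-memory fetch costs x ≈ 100 ns, issuing the prefetch costs y ≈ 1 ns, and scanning the 8 tophash slots costs z ≈ 10 ns, then the saving per overflow-bucket access is min(x, z) - y ≈ 9 ns, i.e. essentially the whole scan is hidden behind the in-flight fetch.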
Thoughts?
[1] https://www.youtube.com/watch?v=Tl7mi9QmLns
[2] https://github.com/golang/go/blob/master/src/runtime/map.go#L432