runtime: performance regression for small Swiss Table map access for non-specialized keys #70849
If we instead compare master from a week or so ago (8c3e391) against a version with just the map lookup behavior change (but without the benchmark changes), we get:
Note that both columns from this run are generally slower overall than the first benchmarks presented above. This is likely for a couple of reasons, including that these benchmarks still have a somewhat expensive mod operation in the benchmark code itself. The most interesting change here from the top comment might be
Also, I initially poked at the implementation in master a bit with
Finally, still using the old benchmarks (with predictable keys), if we compare master as of a week or so ago with and without the Swiss Table enabled, we can see that on these benchmarks the default Swiss Table in master does seem slower than the old runtime map (geomean of +9.31% worse). In other words, there is a decent chance that master has a performance regression in these lookup benchmarks compared to Go 1.23, though I did not check Go 1.23 directly.
(Those are still the slower version of the benchmarks, which include the somewhat expensive mod operation in the benchmark code, so the denominator on the percent change here is larger than in our first set of benchmarks above.) I'll add a caveat that I was juggling different machines and different versions, so apologies in advance if I crossed some wires here.
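To make the denominator effect concrete with made-up numbers: if a raw lookup improves from 5 ns to 4 ns but the timed loop also spends 3 ns per iteration on the mod and indexing, the reported change is (8 - 7) / 8 ≈ 12.5%, rather than the 20% you'd see if only the lookup itself were measured.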
Change https://go.dev/cl/634396 mentions this issue:
Thanks for the thorough investigation! I can indeed reproduce this using the existing benchmarks as you did in #70849 (comment).
For reference, the same comparison against the old map implementation:
It's interesting, since this linear scan was an improvement when I wrote https://go.dev/cl/611189, but a lot has changed since then, so I'm not too surprised. What CPU are you testing on? It's also interesting that you seem to see more extreme results than me; I'm on an Intel(R) Xeon(R) W-2135 (Skylake). I'll also try out your new benchmarks, I just haven't gotten around to it yet.
Edit: Initially, I thought this was an "optimization opportunity" and wrote it up as such, but later realized it is seemingly a regression from Go 1.23 (comment). Changed title accordingly.
I was reading through the (excellent!) new Swiss Table implementation from #54766 and noticed a potential optimization opportunity for lookups with small maps (<= 8 elems) for the normal case of non-specialized keys.
I suspect the current linear scan for these maps might be better off if we instead build a match bitmap and jump to the candidate match. I mailed https://go.dev/cl/634396 with this change.
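To make the two strategies concrete, here is a minimal sketch over a single 8-slot group. The types and helpers below (group, ctrlEmpty, matchH2) are simplified stand-ins rather than the runtime's actual identifiers, and the real control-byte comparison uses SIMD/SWAR rather than a scalar loop:

```go
package main

import (
	"fmt"
	"math/bits"
)

// Toy stand-ins for the real runtime types; sizes and names are illustrative.
const groupSize = 8
const ctrlEmpty = uint8(0x80)

type group struct {
	ctrl [groupSize]uint8 // per-slot control byte: h2 of the key's hash, or ctrlEmpty
	keys [groupSize]uint64
	vals [groupSize]uint64
}

// lookupLinear walks the slots in order, doing a full key comparison for each
// occupied slot. With an unpredictable key, the branch on each comparison is
// hard for the CPU to predict.
func lookupLinear(g *group, key uint64) (uint64, bool) {
	for i := 0; i < groupSize; i++ {
		if g.ctrl[i] != ctrlEmpty && g.keys[i] == key {
			return g.vals[i], true
		}
	}
	return 0, false
}

// matchH2 builds a bitmap with bit i set when ctrl[i] equals h2. This is a
// scalar stand-in for the SIMD/SWAR comparison the runtime actually uses.
func matchH2(ctrl *[groupSize]uint8, h2 uint8) uint8 {
	var m uint8
	for i := 0; i < groupSize; i++ {
		if ctrl[i] == h2 {
			m |= 1 << i
		}
	}
	return m
}

// lookupBitmap builds the match bitmap first and jumps straight to the
// candidate slots, so a hit usually costs a single full key comparison.
func lookupBitmap(g *group, key uint64, h2 uint8) (uint64, bool) {
	for m := matchH2(&g.ctrl, h2); m != 0; m &= m - 1 {
		i := bits.TrailingZeros8(m)
		if g.keys[i] == key {
			return g.vals[i], true
		}
	}
	return 0, false
}

func main() {
	g := &group{}
	for i := range g.ctrl {
		g.ctrl[i] = ctrlEmpty
	}
	g.ctrl[5], g.keys[5], g.vals[5] = 0x2a, 42, 420
	fmt.Println(lookupLinear(g, 42))       // 420 true
	fmt.Println(lookupBitmap(g, 42, 0x2a)) // 420 true
}
```

The point of the second variant is that a hit typically pays one key comparison plus the bitmap work, whereas the linear scan pays one comparison per occupied slot it walks past.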
For predictable keys, it might be a small win or close to a wash, but for unpredictable keys, it might be a larger win.
To better illustrate this (as well as to help with analyzing a couple other experiments I'm trying), I also updated the benchmarks (#70700) to shuffle the keys for the newer benchmarks, and also added a new "Hot" benchmark that repeatedly looks up a single key (Hot=1 below) or a small number of random keys (Hot=3 below).
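For reference, this is the rough shape of what I mean by the shuffled "Hot" variant (illustrative only, not the actual benchmark code in #70700; the names and sizes here are made up):

```go
package bench_test

import (
	"fmt"
	"math/rand"
	"testing"
)

// Hypothetical shape of the shuffled "Hot" lookup benchmark: the looked-up
// key is drawn from a small hot set in a pre-shuffled order, so the key is
// not predictable from the loop counter.
func BenchmarkSmallMapHot(b *testing.B) {
	for _, hot := range []int{1, 3} {
		b.Run(fmt.Sprintf("Hot=%d", hot), func(b *testing.B) {
			const elems = 6
			m := make(map[int64]int64, elems)
			keys := make([]int64, elems)
			for i := int64(0); i < elems; i++ {
				keys[i] = i
				m[i] = i * 10
			}
			// Pre-expand the lookup sequence so the timed loop does not pay
			// for random number generation, only the (cheap-ish) mod.
			seq := make([]int64, 1024)
			for i := range seq {
				seq[i] = keys[rand.Intn(hot)]
			}
			var sink int64
			b.ResetTimer()
			for i := 0; i < b.N; i++ {
				sink += m[seq[i%len(seq)]]
			}
			_ = sink
		})
	}
}
```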
The geomean here is -28.69%. These results are for amd64 and use the SIMD optimizations. I did not test on arm or without the SIMD optimizations.
In all cases, I suspect the misses are predictable, but in some cases a predictable miss happens with a predictable key, while in other cases it happens with an unpredictable key. For a single group like these, that probably doesn't matter.
For the hits and misses, these are the ones that I expect to have predictable keys:
(Those three are also listed above in the larger list, but I'm pulling them out here for commentary.)
The first has predictable keys because a single key is looked up repeatedly per run (with 6 elements in the small map). The other two have predictable keys because there is only a single element. In other words, the keys are shuffled in all ~20 of these benchmarks, but for these three, the shuffling is effectively a no-op.
CC @prattmic