runtime: consider caching previous map hash value #5147
I've hacked this up as an exercise: https://golang.org/cl/8160043

This isn't being proposed for Go 1.1. I just wanted to see what it'd involve, and learn some new code.
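For readers skimming the thread, here is a user-level model of the idea (not the code in the CL above; every name below is invented for illustration): remember the last key hashed for a map and reuse its hash when the next operation presents the same key.

package main

import (
	"fmt"
	"hash/maphash"
)

// lastKeyCache models "cache the previous map hash value": if the next
// operation uses the same key, the stored hash is returned without rehashing.
type lastKeyCache struct {
	seed  maphash.Seed
	key   string
	hash  uint64
	valid bool
}

func (c *lastKeyCache) hashOf(key string) uint64 {
	if c.valid && c.key == key {
		return c.hash // cache hit: skip hashing entirely
	}
	c.key = key
	c.hash = maphash.String(c.seed, key)
	c.valid = true
	return c.hash
}

func main() {
	c := lastKeyCache{seed: maphash.MakeSeed()}
	h1 := c.hashOf("Content-Type") // miss: hashes the key
	h2 := c.hashOf("Content-Type") // hit: reuses the stored hash
	fmt.Println(h1 == h2)          // true
}

The cost/benefit question discussed in the rest of the thread is exactly the branch in hashOf: a hit saves the whole hash computation, a miss adds the key comparison on top of it.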
This issue was updated by revision ecdcec1.

R=khr, r
CC=golang-dev
https://golang.org/cl/8165044
It helps when it triggers, and hurts when it doesn't, as expected. Now the challenge would be making it cost less when it doesn't trigger:

benchmark                            old ns/op  new ns/op  delta
BenchmarkNewEmptyMap                 144        142        -1.39%
BenchmarkHashStringSpeed             42         46         +10.21%
BenchmarkHashInt32Speed              34         31         -8.45%
BenchmarkHashInt64Speed              31         32         +2.52%
BenchmarkHashStringArraySpeed        112        126        +12.50%
BenchmarkMegMap                      298981     28         -99.99%  << best case
BenchmarkMegOneMap                   24         24         +0.40%
BenchmarkMegEmptyMap                 4          4          +0.44%
BenchmarkSmallStrMap                 23         23         +0.00%
BenchmarkIntMap                      17         17         +0.56%
BenchmarkRepeatedLookupStrMapKey32   45         35         -20.22%  << new test
BenchmarkRepeatedLookupStrMapKey1M   306886     154843     -49.54%  << new test
Strings have lengths; you could compare those before comparing the strings. I bet in real life (words in text, etc.) the random serial correlation would be low enough to bring the 10% down to 1-2% (unless you're already doing that ;-). Same for String Array: do all the lengths first and then the string bodies.

I wonder what the real diversity is. Maybe you should make a version that does the compares, tabulates matches/non-matches, and prints that at exit. Run the benchmarks that way, and do some utilities (build the docs, etc.), and see what the norm is. That's a good way to understand what really happens in practice.
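A quick sketch of the length-first check being suggested (a hypothetical helper, not code from the CL): comparing lengths is a single word compare, so most non-matching keys are rejected before any bytes are touched. Go's == on strings already short-circuits on length; the suggestion is to do the equivalent in the runtime's cached-key check.

// sameKey reports whether key matches the cached key, rejecting on length
// first so mismatched keys usually avoid a byte-by-byte comparison.
// Illustrative only; the names are made up.
func sameKey(cached, key string) bool {
	if len(cached) != len(key) {
		return false // cheap reject: different lengths cannot be equal
	}
	return cached == key // full comparison only when lengths agree
}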
Out of curiosity, I ran the go1 benchmark before and after this CL:

ante:go1 $ ~/go/misc/benchcmp before after
benchmark                        old ns/op   new ns/op   delta
BenchmarkBinaryTree17            6370288817  6376970261  +0.10%
BenchmarkFannkuch11              3774653509  3782454073  +0.21%
BenchmarkFmtFprintfEmpty         94          95          +1.49%
BenchmarkFmtFprintfString        290         288         -0.69%
BenchmarkFmtFprintfInt           189         190         +0.53%
BenchmarkFmtFprintfIntInt        319         317         -0.63%
BenchmarkFmtFprintfPrefixedInt   356         358         +0.56%
BenchmarkFmtFprintfFloat         496         503         +1.41%
BenchmarkFmtManyArgs             1317        1311        -0.46%
BenchmarkGobDecode               12415020    12463592    +0.39%
BenchmarkGobEncode               14668974    14508952    -1.09%
BenchmarkGzip                    569880241   570438324   +0.10%
BenchmarkGunzip                  128886401   138989651   +7.84%
BenchmarkHTTPClientServer        72004       76433       +6.15%
BenchmarkJSONEncode              48857161    49610984    +1.54%
BenchmarkJSONDecode              111535431   112135503   +0.54%
BenchmarkMandelbrot200           4760656     4494016     -5.60%  << only good result
BenchmarkGoParse                 6673246     6722963     +0.75%
BenchmarkRegexpMatchEasy0_32     122         123         +0.82%
BenchmarkRegexpMatchEasy0_1K     384         385         +0.26%
BenchmarkRegexpMatchEasy1_32     106         106         +0.00%
BenchmarkRegexpMatchEasy1_1K     1103        1106        +0.27%
BenchmarkRegexpMatchMedium_32    216         215         -0.46%
BenchmarkRegexpMatchMedium_1K    90747       92652       +2.10%
BenchmarkRegexpMatchHard_32      4405        4739        +7.58%
BenchmarkRegexpMatchHard_1K      139907      144608      +3.36%
BenchmarkRevcomp                 1026696025  1021416881  -0.51%
BenchmarkTemplate                141715311   142208244   +0.35%
BenchmarkTimeParse               487         482         -1.03%
BenchmarkTimeFormat              548         552         +0.73%

The Mandelbrot result suggests that having the compiler give a hint to the runtime hash insert/lookup functions about when to update & consult the cache would be worthwhile.
For the go1 bench tests overall:

204     [fast2_hit]
2674    [gen_hit]
202447  [gen_miss]
259524  [fast1_miss]
281748  [fast1_hit]
472442  [fast2_miss]

Where fast1 and fast2 are the kind int32/int64/string lookup fast paths, and "gen" is any map assignment, deletion, or a non-specialized (not int/string) lookup.

Broken up per-test, the baseline is:

2       [fast2_hit]
48      [gen_hit]
63      [fast2_miss]
331     [gen_miss]
12823   [fast1_hit]
102468  [fast1_miss]

... for just the test infrastructure / binary start-up somehow. Then the only interesting ones using maps are (these include the baseline numbers):

BenchmarkHTTPClientServer
2       [fast2_hit]
48      [gen_hit]
64      [fast2_miss]
12823   [fast1_hit]
102468  [fast1_miss]
151846  [gen_miss]

BenchmarkJSONEncode
2       [fast2_hit]
48      [gen_hit]
63      [fast2_miss]
331     [gen_miss]
102469  [fast1_miss]
281748  [fast1_hit]  <<<<< nice

BenchmarkGoParse
2       [fast2_hit]
2674    [gen_hit]
12823   [fast1_hit]
50932   [gen_miss]
259523  [fast1_miss]
471228  [fast2_miss]

For net/textproto ReadMIMEHeader's benchmark, which has m[key] = append(m[key], value) (from the comment above), it does:

25      [fast2_miss]
30050   [fast1_miss]  // m[key] lookup on 9th and 10th keys
30071   [gen_hit]     // m[key] assignment on 9th and 10th keys
60101   [BenchmarkReadMIMEHeader-loop]
541151  [gen_miss]    // m[key] assignment on 1st-8th keys (miss because no hash was done for short strings when len(m) <= 8)

It does better if the header is bigger (more unique keys) or has longer (over 32 byte) keys.

For net/http's BenchmarkServerFakeConnNoKeepAlive:

22      [gen_hit]
27      [fast2_miss]
841760  [gen_miss]

JSONEncode looks to benefit the most from this.
@bradfitz: if you do the analysis part with go/ssa, would you please share the code? I'd be interested to see how it might fit into a go/ssa-based compiler (llgo) as an optimisation. If you can statically identify the map lookups/modifications that operate on the same key value, then you should be able to do the cache/reuse statically, with minimal impact on other sites.
The doublehash program in this CL uses go/ssa to find occurrences: https://golang.org/cl/182420043/

% doublehash encoding/json
/home/adonovan/go/src/encoding/json/encode.go:1073:14: this map[reflect.Type]int lookup
/home/adonovan/go/src/encoding/json/encode.go:1073:14: dominates this update with same map and key
/home/adonovan/go/src/encoding/json/encode.go:1074:17: this map[reflect.Type]int lookup
/home/adonovan/go/src/encoding/json/encode.go:1073:14: dominates this update with same map and key

It only finds update-after-lookup patterns, but it could easily be adapted to lookup-after-lookup too, although those are rare in code that's already performance sensitive.

Since, unlike gc, go/ssa doesn't have a precise interprocedural escape analysis, I've added an -unsound flag to make optimistic aliasing assumptions about the map expression, e.g. that in this program:

if !o.m[k] {
    ...
    o.m[k] = true
    ...
}

the two references to o.m must alias the same variable. doublehash finds 77 definite occurrences in the standard library, and 176 with the -unsound flag. (Escape analysis would likely yield even more, since the key expression may also be subject to aliasing, e.g. m[o.k].)
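For reference, the update-after-lookup shape that doublehash reports looks like the snippet below (illustrative only, not the encoding/json code it flags): the lookup dominates an update with the same map and key, so the key is hashed twice where once would do.

var seen = make(map[string]int)

// record performs a lookup that dominates an update with the same map and
// key; today k is hashed once for each of the two operations.
func record(k string) {
	n := seen[k]    // lookup: first hash of k
	seen[k] = n + 1 // update: second hash of the same k
}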
Instead of caching the map value per key, what if the frontend turned map operations into separate SSA operations for hashing the key vs. using the hash to do the various map operations? Then there would be SSA operations for map lookup and map assignment which take the hash in addition to the key. The downside is that the subsequent map operations would still need to do the rest of the work (bucket probing and key comparison); only the hashing would be shared.

The alternative (having map operations return a pointer into the map value) is a little more invasive, in that it has to be aware of maps being resized during sets. You might need a separate SSA value for each potential layout of the hash table, with all map mutation operations returning a new SSA value for the potentially-new map layout (e.g. after growth).

@randall77, do you have plans in general of moving some of the frontend's runtime call lowering later into SSA, so more of the SSA goodness can eliminate things?
No plans at the moment, although that is a possibility.
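For concreteness, here is a self-contained model of the hash-vs-use split floated two comments up (this is not the runtime's API; the types and functions below are invented): the hash is computed once as its own value, and both the lookup and the assignment take it as an argument, so only the post-hash work (bucket probe and key comparison) is repeated.

package main

import (
	"fmt"
	"hash/maphash"
)

type entry struct {
	key string
	val int
}

// tinyMap is a toy bucketed map used only to show the hash/use split.
type tinyMap struct {
	seed    maphash.Seed
	buckets [][]entry
}

func newTinyMap() *tinyMap {
	return &tinyMap{seed: maphash.MakeSeed(), buckets: make([][]entry, 8)}
}

// hashKey stands in for the proposed "hash the key" SSA operation.
func (m *tinyMap) hashKey(k string) uint64 { return maphash.String(m.seed, k) }

// lookupByHash and assignByHash stand in for map ops that take the hash too.
func (m *tinyMap) lookupByHash(h uint64, k string) (int, bool) {
	for _, e := range m.buckets[h%uint64(len(m.buckets))] {
		if e.key == k { // the non-hash work (probe + compare) still happens here
			return e.val, true
		}
	}
	return 0, false
}

func (m *tinyMap) assignByHash(h uint64, k string, v int) {
	b := h % uint64(len(m.buckets))
	for i, e := range m.buckets[b] {
		if e.key == k {
			m.buckets[b][i].val = v
			return
		}
	}
	m.buckets[b] = append(m.buckets[b], entry{k, v})
}

func main() {
	m := newTinyMap()
	h := m.hashKey("k")            // hash once
	v, _ := m.lookupByHash(h, "k") // lookup reuses h
	m.assignByHash(h, "k", v+1)    // assignment reuses h, no rehash
	fmt.Println(m.lookupByHash(h, "k"))
}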
Does copying a map fall into the same category of optimisation with respect to key hashing? The following is, I believe, the encouraged way of writing a map copy (assuming we want a "simple", exact copy):

var m map[t1]t2
theCopy := make(map[t1]t2, len(m))
for k, v := range m {
    theCopy[k] = v
}

This results in a call to mapassign (and hence a key hash) for every entry copied. Feels like this particular scenario can be optimised heavily? That said, whilst this copy example is another instance of avoiding unnecessary hashing, it perhaps falls into a different category of (SSA) optimisation...
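As an aside, since Go 1.21 the same copy can be written with the standard library's maps.Copy. It runs an equivalent per-entry insert loop internally, so every key is still hashed on insertion; the sketch below uses concrete key/value types because t1 and t2 above are placeholders.

package mapcopy

import "maps"

// copyStringInt copies m via maps.Copy; semantically the same as the
// range-and-assign loop above, including one key hash per inserted entry.
func copyStringInt(m map[string]int) map[string]int {
	theCopy := make(map[string]int, len(m))
	maps.Copy(theCopy, m)
	return theCopy
}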
CL https://golang.org/cl/30815 mentions this issue.
To compile:

    m[k] = v

instead of:

    mapassign(maptype, m, &k, &v)

do:

    *mapassign(maptype, m, &k) = v

mapassign returns a pointer to the value slot in the map. It is just like mapaccess except that it will allocate a new slot if k is not already present in the map.

This makes map accesses faster but potentially larger (codewise). It is faster because the write into the map is done when the compiler knows the concrete type, so it can be done with a few store instructions instead of calling typedmemmove. We also potentially avoid stack temporaries to hold v.

The code can be larger when the map has pointers in its value type, since there is a write barrier call in addition to the mapassign call. That makes the code at the callsite a bit bigger (the go binary is 0.3% bigger).

This CL is in preparation for doing operations like m[k] += v with only a single runtime call. That will roughly double the speed of such operations.

Update #17133
Update #5147

Change-Id: Ia435f032090a2ed905dac9234e693972fe8c2dc5
Reviewed-on: https://go-review.googlesource.com/30815
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
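The shape that stands to gain from the follow-up work mentioned at the end (an assignment that reads and writes the same key) looks like this, illustratively:

// countWords is the classic m[k] += v / m[k]++ shape: today it lowers to a
// map lookup plus a map assignment of the same key; the stated goal is to
// lower it to a single mapassign whose returned slot pointer is updated in place.
func countWords(words []string) map[string]int {
	counts := make(map[string]int)
	for _, w := range words {
		counts[w]++ // lookup + assignment of the same key
	}
	return counts
}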
Change https://golang.org/cl/100838 mentions this issue.
Currently, the rvalue m[k] is transformed during walk into:

    tmp1 := *mapaccess(m, k)
    tmp2 := append(tmp1, ...)
    *mapassign(m, k) = tmp2

However, this is suboptimal, as we could instead produce just:

    tmp := mapassign(m, k)
    *tmp = append(*tmp, ...)

The optimization is possible only if, during Order, we can tell that m[k] is exactly the same on the left and right sides of the assignment. It doesn't work for:

1) m[f(k)] = append(m[f(k)], ...)
2) sink, m[k] = sink, append(m[k]...)
3) m[k] = append(..., m[k], ...)

Benchmark:

name                           old time/op    new time/op    delta
MapAppendAssign/Int32/256-8      33.5ns ± 3%    22.4ns ±10%  -33.24%  (p=0.000 n=16+18)
MapAppendAssign/Int32/65536-8    68.2ns ± 6%    48.5ns ±29%  -28.90%  (p=0.000 n=20+20)
MapAppendAssign/Int64/256-8      34.3ns ± 4%    23.3ns ± 5%  -32.23%  (p=0.000 n=17+18)
MapAppendAssign/Int64/65536-8    65.9ns ± 7%    61.2ns ±19%   -7.06%  (p=0.002 n=18+20)
MapAppendAssign/Str/256-8         116ns ±12%     79ns ±16%   -31.70%  (p=0.000 n=20+19)
MapAppendAssign/Str/65536-8       134ns ±15%    111ns ±45%   -16.95%  (p=0.000 n=19+20)

name                           old alloc/op   new alloc/op   delta
MapAppendAssign/Int32/256-8      47.0B ± 0%     46.0B ± 0%    -2.13%  (p=0.000 n=19+18)
MapAppendAssign/Int32/65536-8    27.0B ± 0%     20.7B ±30%   -23.33%  (p=0.000 n=20+20)
MapAppendAssign/Int64/256-8      47.0B ± 0%     46.0B ± 0%    -2.13%  (p=0.000 n=20+17)
MapAppendAssign/Int64/65536-8    27.0B ± 0%     27.0B ± 0%       ~    (all equal)
MapAppendAssign/Str/256-8        94.0B ± 0%     78.0B ± 0%   -17.02%  (p=0.000 n=20+16)
MapAppendAssign/Str/65536-8      54.0B ± 0%     54.0B ± 0%       ~    (all equal)

Fixes #24364
Updates #5147

Change-Id: Id257d052b75b9a445b4885dc571bf06ce6f6b409
Reviewed-on: https://go-review.googlesource.com/100838
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
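To make the applicability condition concrete, here is a small sketch (variable names assumed) of a case the Order check accepts next to one it must reject:

func appendAssign(m map[string][]int, k string, f func(string) string, v int) {
	m[k] = append(m[k], v)       // accepted: provably the same map and key on both sides
	m[f(k)] = append(m[f(k)], v) // rejected: f(k) may evaluate differently on each side
}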
This would make declaring flags about 10% faster, which matters as