
runtime: consider caching previous map hash value #5147

bradfitz opened this issue Mar 28, 2013 · 25 comments

bradfitz commented Mar 28, 2013

Michael Jones notes that the following pattern is common in Go code:

  if v, ok := m[key]; ok { // or !ok
      m[key] = newValue
  }

If hashing that key type is expensive (say: large string/struct) and the map isn't empty
or very small, the above code does it twice.

If we kept a per-P cache of the last key & last computed hash value for that P, we
could eliminate some hash calls.

Probably worth measuring. (Probably not for Go 1.1)
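The idea can be sketched in ordinary Go as a single-entry cache in front of a map. This is only an illustration of the caching concept: the hypothetical cachedMap type below remembers the last key and value rather than the actual hash, and the real runtime version would keep its cache per-P.

```go
package main

import "fmt"

// cachedMap memoizes the most recent successful lookup so that a
// lookup-then-assign pair on the same key does the expensive work
// (standing in for hashing) only once. Illustration only, not runtime code.
type cachedMap struct {
	m       map[string][]int
	lastKey string
	lastVal []int
	valid   bool
}

func (c *cachedMap) get(key string) ([]int, bool) {
	if c.valid && c.lastKey == key {
		return c.lastVal, true // cache hit: no second map lookup
	}
	v, ok := c.m[key]
	if ok {
		c.lastKey, c.lastVal, c.valid = key, v, true
	}
	return v, ok
}

func (c *cachedMap) set(key string, v []int) {
	c.m[key] = v
	c.lastKey, c.lastVal, c.valid = key, v, true
}

func main() {
	c := &cachedMap{m: map[string][]int{"a": {1}}}
	if v, ok := c.get("a"); ok {
		c.set("a", append(v, 2))
	}
	fmt.Println(c.m["a"]) // [1 2]
}
```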
Contributor Author

Comment 2:

Example from net/textproto.ReadMIMEHeader:
                m[key] = append(m[key], value)
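That single line performs two map operations on the same key: a lookup for the append argument and an assignment for the store, each hashing key. A rough micro-benchmark sketch (hypothetical names; a long key makes the hashing cost noticeable) exercises the pattern:

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// appendAssign mirrors the textproto pattern: the same key is
// hashed once for the m[key] read and again for the write.
func appendAssign(m map[string][]string, key, value string) {
	m[key] = append(m[key], value)
}

func main() {
	// A long key makes per-operation hashing expensive.
	key := strings.Repeat("X-Custom-Header-", 64)
	res := testing.Benchmark(func(b *testing.B) {
		m := map[string][]string{key: nil}
		for i := 0; i < b.N; i++ {
			m[key] = m[key][:0] // reset the slice without deleting the key
			appendAssign(m, key, "v")
		}
	})
	fmt.Println(res)
}
```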

Contributor Author

Comment 3:

I've hacked this up as an exercise. This isn't being proposed for Go 1.1.  I just
wanted to see what it'd involve, and learn some new code.


Comment 4:

Pleased to see this. It's what I would expect. Poor performance is a "death
by a thousand cuts" issue, so every place where something is done twice is
a place where one of those cuts may be healed.

Contributor Author

Comment 5:

This issue was updated by revision ecdcec1.

R=khr, r

Contributor Author

Comment 6:

It helps when it triggers, and hurts when it doesn't, as expected.
Now the challenge would be making it cost less when it doesn't trigger:
benchmark                             old ns/op    new ns/op    delta
BenchmarkNewEmptyMap                        144          142   -1.39%
BenchmarkHashStringSpeed                     42           46  +10.21%
BenchmarkHashInt32Speed                      34           31   -8.45%
BenchmarkHashInt64Speed                      31           32   +2.52%
BenchmarkHashStringArraySpeed               112          126  +12.50%
BenchmarkMegMap                          298981           28  -99.99%  << best case
BenchmarkMegOneMap                           24           24   +0.40%
BenchmarkMegEmptyMap                          4            4   +0.44%
BenchmarkSmallStrMap                         23           23   +0.00%
BenchmarkIntMap                              17           17   +0.56%
BenchmarkRepeatedLookupStrMapKey32           45           35  -20.22%  << new test
BenchmarkRepeatedLookupStrMapKey1M       306886       154843  -49.54%  << new test


Comment 7:

Strings have lengths; you could compare those before comparing the strings.
I bet in real life (words in text, etc.) the random
serial correlation would be low enough to bring the 10% down to 1-2%
(unless you're already doing that. ;-)
Same for the string array -- do all the lengths first and then the strings.
I wonder what the real diversity is. Maybe you should make a version that
does the compares, tabulates matches/non-matches, and prints that at exit.
Run the benchmarks that way and do some utilities (build the docs, etc.)
and see what the norm is. That's a good way to understand what really
happens in practice.
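Spelling out the length-first suggestion (as the reply notes, the runtime already does this; the hypothetical eqLenFirst below is just the idea made explicit):

```go
package main

import "fmt"

// eqLenFirst compares two strings by length before contents:
// the cheap O(1) pre-check rejects most non-equal pairs before
// any byte comparison happens.
func eqLenFirst(a, b string) bool {
	if len(a) != len(b) {
		return false // O(1) reject, no byte comparison needed
	}
	for i := 0; i < len(a); i++ {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(eqLenFirst("hello", "hi"))    // false, length reject
	fmt.Println(eqLenFirst("hello", "help!")) // false, byte compare
	fmt.Println(eqLenFirst("hello", "hello")) // true
}
```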

Contributor Author

Comment 8:

We already do that.
That's not what this bug is about.

Contributor Author

Comment 9:

Out of curiosity, I ran the go1 benchmark before and after this CL:
ante:go1 $ ~/go/misc/benchcmp before after 
benchmark                         old ns/op    new ns/op    delta
BenchmarkBinaryTree17            6370288817   6376970261   +0.10%
BenchmarkFannkuch11              3774653509   3782454073   +0.21%
BenchmarkFmtFprintfEmpty                 94           95   +1.49%
BenchmarkFmtFprintfString               290          288   -0.69%
BenchmarkFmtFprintfInt                  189          190   +0.53%
BenchmarkFmtFprintfIntInt               319          317   -0.63%
BenchmarkFmtFprintfPrefixedInt          356          358   +0.56%
BenchmarkFmtFprintfFloat                496          503   +1.41%
BenchmarkFmtManyArgs                   1317         1311   -0.46%
BenchmarkGobDecode                 12415020     12463592   +0.39%
BenchmarkGobEncode                 14668974     14508952   -1.09%
BenchmarkGzip                     569880241    570438324   +0.10%
BenchmarkGunzip                   128886401    138989651   +7.84%
BenchmarkHTTPClientServer             72004        76433   +6.15%
BenchmarkJSONEncode                48857161     49610984   +1.54%
BenchmarkJSONDecode               111535431    112135503   +0.54%
BenchmarkMandelbrot200              4760656      4494016   -5.60% << only good
BenchmarkGoParse                    6673246      6722963   +0.75%
BenchmarkRegexpMatchEasy0_32            122          123   +0.82%
BenchmarkRegexpMatchEasy0_1K            384          385   +0.26%
BenchmarkRegexpMatchEasy1_32            106          106   +0.00%
BenchmarkRegexpMatchEasy1_1K           1103         1106   +0.27%
BenchmarkRegexpMatchMedium_32           216          215   -0.46%
BenchmarkRegexpMatchMedium_1K         90747        92652   +2.10%
BenchmarkRegexpMatchHard_32            4405         4739   +7.58%
BenchmarkRegexpMatchHard_1K          139907       144608   +3.36%
BenchmarkRevcomp                 1026696025   1021416881   -0.51%
BenchmarkTemplate                 141715311    142208244   +0.35%
BenchmarkTimeParse                      487          482   -1.03%
BenchmarkTimeFormat                     548          552   +0.73%
The Mandelbrot result suggests that having the compiler give a hint to the runtime hash
insert/lookup functions about when to update & consult the cache would be worthwhile.

Contributor Author

Comment 10:

Sigh, the mandelbrot test doesn't even use maps. (I didn't even think about it for a second.)
Yay benchmarks.
Maybe with a higher benchtime it'd be more reliable.  (This isn't even on a laptop,
where I could write it off as CPU frequency scaling.)


Comment 11:

If nothing else, it's not a glowing endorsement. Hmm...


Comment 12:

Maybe you could instrument what fraction of the lookups would result in a hit?  It would
be interesting to know for each of the benchmarks, and that kind of measurement doesn't
suffer from timing variance issues.
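A user-level approximation of that instrumentation is a wrapper that counts how often a lookup repeats the immediately preceding key, which gives an upper bound on what a last-key cache could save (hypothetical hitCounter, not the runtime's actual counters):

```go
package main

import "fmt"

// hitCounter tallies how many lookups reuse the previous key:
// an upper bound on what a last-key hash cache could save.
type hitCounter struct {
	lastKey      string
	haveLast     bool
	hits, misses int
}

func (h *hitCounter) record(key string) {
	if h.haveLast && key == h.lastKey {
		h.hits++
	} else {
		h.misses++
	}
	h.lastKey, h.haveLast = key, true
}

func main() {
	var h hitCounter
	for _, k := range []string{"a", "a", "b", "b", "b", "c"} {
		h.record(k)
	}
	fmt.Printf("hits=%d misses=%d\n", h.hits, h.misses) // hits=3 misses=3
}
```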

Contributor Author

Comment 13:

For go1 bench tests overall:
 204 [fast2_hit]
2674 [gen_hit]
202447 [gen_miss]
259524 [fast1_miss]
281748 [fast1_hit]
472442 [fast2_miss]
Where fast1 and fast2 are the int32/int64/string lookup fast paths, and "gen" is
any map assignment, deletion, or a non-specialized (not int/string) lookup.
Broken up per-test, the baseline is:
   2 [fast2_hit]
  48 [gen_hit]
  63 [fast2_miss]
 331 [gen_miss]
12823 [fast1_hit]
102468 [fast1_miss]
... for just the test infrastructure / binary start-up somehow.  
Then the only interesting ones using maps are:  (these include the baseline numbers)
   2 [fast2_hit]
  48 [gen_hit]
  64 [fast2_miss]
12823 [fast1_hit]
102468 [fast1_miss]
151846 [gen_miss]
   2 [fast2_hit]
  48 [gen_hit]
  63 [fast2_miss]
 331 [gen_miss]
102469 [fast1_miss]
281748 [fast1_hit]   <<<<< nice
   2 [fast2_hit]
2674 [gen_hit]
12823 [fast1_hit]
50932 [gen_miss]
259523 [fast1_miss]
471228 [fast2_miss]
For net/textproto ReadMIMEHeader's benchmark, which has:
     m[key] = append(m[key], value)
... from the comment above.  Does:
  25 [fast2_miss]
30050 [fast1_miss]   // m[key] lookup on 9th and 10th keys
30071 [gen_hit]      // m[key] assignment on 9th and 10th keys
60101 [BenchmarkReadMIMEHeader-loop]
541151 [gen_miss]    // m[key] assignment on 1st-8th keys (miss because no hash is done
for short strings when len(m) <= 8)
It does better if the header is bigger (more unique keys) or has longer keys (over 32 bytes).
For net/http's BenchmarkServerFakeConnNoKeepAlive:
  22 [gen_hit]
  27 [fast2_miss]
841760 [gen_miss]
JSONEncode looks to benefit the most from this.

Contributor Author

bradfitz commented Apr 3, 2013

Comment 14:

Labels changed: added performance.

Contributor Author

bradfitz commented Apr 3, 2013

Comment 15:

Labels changed: removed optimization.


rsc commented Dec 4, 2013

Comment 16:

Labels changed: added repo-main.


rsc commented Mar 3, 2014

Comment 17:

Adding Release=None to all Priority=Someday bugs.

Labels changed: added release-none.

Contributor Author

bradfitz commented Dec 2, 2014

Comment 18:

I realize that nowadays (as opposed to when this bug was filed) we have the go/ssa
package, which should ease doing an analysis over all public Go code to see how often
this pattern occurs.


axw commented Dec 3, 2014

Comment 19:

@bradfitz: if you do the analysis part with go/ssa, would you please share the code?
I'd be interested to see how it might fit in with a go/ssa-based compiler (llgo) as an
optimisation. If you can statically identify the map lookups and modifications that operate on
the same key value, then you should be able to do the cache/reuse statically with
minimal impact on other sites.


adonovan commented Dec 3, 2014

Comment 20:

The doublehash program in this CL uses go/ssa to find occurrences:
% doublehash encoding/json
/home/adonovan/go/src/encoding/json/encode.go:1073:14:  this map[reflect.Type]int lookup
/home/adonovan/go/src/encoding/json/encode.go:1073:14:          dominates this update
with same map and key
/home/adonovan/go/src/encoding/json/encode.go:1074:17:  this map[reflect.Type]int lookup
/home/adonovan/go/src/encoding/json/encode.go:1073:14:          dominates this update
with same map and key
It only finds update-after-lookup patterns, but it could be easily adapted to
lookup-after-lookup too, although they're rare in code that's already performance-sensitive.
Since unlike gc, go/ssa doesn't have a precise interprocedural escape analysis, I've
added an -unsound flag to make optimistic aliasing assumptions about the map expression,
e.g. that in this program,
  if !o.m[k] {
    o.m[k] = true
  }
the two references o.m must alias to the same variable.
doublehash finds 77 definite occurrences in the standard library, and 176 with the
-unsound flag.  (Escape analysis would likely yield even more since the key expression
may also be subject to aliasing, e.g. m[o.k].)

Contributor Author

Instead of caching the map value per key, what if the frontend turned map operations into separate SSA operations for hashing the key vs. using the hash to do the various map operations?

Then there would be SSA operations for map lookup and map assignment which take the hash in addition to the key.

The downside is that the subsequent map operations would still need to do == on all entry values hashed to the same bucket, but at least the hash operation itself would only be done once.

The alternative (having map operations return a pointer into the map value) is a little more invasive in that it has to be aware of maps being resized during sets. You might need a separate SSA value for each potential layout of the hash, with all map mutation operations returning a new SSA value for the potentially-new map layout (e.g. after growth).

@randall77, do you have plans in general of moving some of the frontend's runtime call lowering later into SSA, so more of the SSA goodness can eliminate things?


No plans at the moment, although that is a possibility.


myitcv commented Sep 18, 2016

Does copying a map fall into the same category of optimisation with respect to key hashing?

The following is, I believe, the encouraged way of writing a map copy (assuming we want a "simple", exact copy):

var m map[t1]t2

theCopy := make(map[t1]t2, len(m))

for k, v := range m {
  theCopy[k] = v
}
This results in a call to runtime.mapassign1 within the loop which, as I understand it, results in a call to hash the key value.

Feels like this particular scenario can be optimised heavily?

That said, whilst this copy example is another instance of avoiding unnecessary hashing, it perhaps falls into a different category of (SSA) optimisation...
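For reference, a runnable version of the copy loop (hypothetical copyMap; the size hint to make avoids incremental growth, but each assignment in the loop still re-hashes k in the runtime's map-assign path):

```go
package main

import "fmt"

// copyMap is the loop from the comment made concrete. Every
// iteration goes through the runtime's map-assign path, which
// hashes k again even though the range loop already holds the entry.
func copyMap(m map[string]int) map[string]int {
	out := make(map[string]int, len(m)) // size hint avoids rehash-on-growth
	for k, v := range m {
		out[k] = v
	}
	return out
}

func main() {
	m := map[string]int{"a": 1, "b": 2}
	c := copyMap(m)
	c["a"] = 99
	fmt.Println(m["a"], c["a"]) // 1 99: the copies are independent
}
```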


CL mentions this issue.

gopherbot pushed a commit that referenced this issue Oct 12, 2016
To compile:
  m[k] = v
instead of:
  mapassign(maptype, m, &k, &v)
do:
  *mapassign(maptype, m, &k) = v

mapassign returns a pointer to the value slot in the map.  It is just
like mapaccess except that it will allocate a new slot if k is not
already present in the map.

This makes map accesses faster but potentially larger (codewise).

It is faster because the write into the map is done when the compiler
knows the concrete type, so it can be done with a few store
instructions instead of calling typedmemmove.  We also potentially
avoid stack temporaries to hold v.

The code can be larger when the map has pointers in its value type,
since there is a write barrier call in addition to the mapassign call.
That makes the code at the callsite a bit bigger (go binary is 0.3% bigger).

This CL is in preparation for doing operations like m[k] += v with
only a single runtime call.  That will roughly double the speed of
such operations.

Update #17133
Update #5147

Change-Id: Ia435f032090a2ed905dac9234e693972fe8c2dc5
Run-TryBot: Keith Randall <>
Reviewed-by: Matthew Dempsky <>
TryBot-Result: Gobot Gobot <>

Change mentions this issue: cmd/compile: avoid mapaccess at m[k]=append(m[k]..

gopherbot pushed a commit that referenced this issue Mar 20, 2018
Currently rvalue m[k] is transformed during walk into:

        tmp1 := *mapaccess(m, k)
        tmp2 := append(tmp1, ...)
        *mapassign(m, k) = tmp2

However, this is suboptimal, as we could instead produce just:
        tmp := mapassign(m, k)
        *tmp := append(*tmp, ...)

The optimization is possible only if, during Order, we can tell that m[k] is
exactly the same on the left and right sides of the assignment. It doesn't work for:
1) m[f(k)] = append(m[f(k)], ...)
2) sink, m[k] = sink, append(m[k]...)
3) m[k] = append(..., m[k], ...)

name                           old time/op    new time/op    delta
MapAppendAssign/Int32/256-8      33.5ns ± 3%    22.4ns ±10%  -33.24%  (p=0.000 n=16+18)
MapAppendAssign/Int32/65536-8    68.2ns ± 6%    48.5ns ±29%  -28.90%  (p=0.000 n=20+20)
MapAppendAssign/Int64/256-8      34.3ns ± 4%    23.3ns ± 5%  -32.23%  (p=0.000 n=17+18)
MapAppendAssign/Int64/65536-8    65.9ns ± 7%    61.2ns ±19%   -7.06%  (p=0.002 n=18+20)
MapAppendAssign/Str/256-8         116ns ±12%      79ns ±16%  -31.70%  (p=0.000 n=20+19)
MapAppendAssign/Str/65536-8       134ns ±15%     111ns ±45%  -16.95%  (p=0.000 n=19+20)

name                           old alloc/op   new alloc/op   delta
MapAppendAssign/Int32/256-8       47.0B ± 0%     46.0B ± 0%   -2.13%  (p=0.000 n=19+18)
MapAppendAssign/Int32/65536-8     27.0B ± 0%     20.7B ±30%  -23.33%  (p=0.000 n=20+20)
MapAppendAssign/Int64/256-8       47.0B ± 0%     46.0B ± 0%   -2.13%  (p=0.000 n=20+17)
MapAppendAssign/Int64/65536-8     27.0B ± 0%     27.0B ± 0%     ~     (all equal)
MapAppendAssign/Str/256-8         94.0B ± 0%     78.0B ± 0%  -17.02%  (p=0.000 n=20+16)
MapAppendAssign/Str/65536-8       54.0B ± 0%     54.0B ± 0%     ~     (all equal)

Fixes #24364
Updates #5147

Change-Id: Id257d052b75b9a445b4885dc571bf06ce6f6b409
Reviewed-by: Matthew Dempsky <>
Run-TryBot: Matthew Dempsky <>
TryBot-Result: Gobot Gobot <>
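The optimizable and non-optimizable forms can be put side by side. The rewrite into a single mapassign happens inside the compiler, so this sketch only shows which source forms qualify (f is a stand-in for any key-producing call):

```go
package main

import "fmt"

// f stands in for any function call in the key position; the
// compiler can't assume two f(k) calls yield the same key.
func f(k string) string { return k }

func main() {
	m := map[string][]int{}

	// Optimizable: same map and same key expression on both sides,
	// so one mapassign call can serve both the read and the write.
	m["k"] = append(m["k"], 1)

	// Not optimizable (case 1): the key goes through a function
	// call, so the two m[f(k)] expressions may differ.
	m[f("k")] = append(m[f("k")], 2)

	// Not optimizable (case 3): the rvalue reads m["k"] somewhere
	// other than the first append argument.
	m["k"] = append([]int{0}, m["k"]...)

	fmt.Println(m["k"]) // [0 1 2]
}
```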

mvdan commented Mar 25, 2023

This would make declaring flags about 10% faster, which matters as cmd/go spends as much as 15% of its init time declaring flags. I had to abandon that CL as it was technically a breaking change, but all it tried to do was avoid accessing the same map with the same string key twice.
