runtime: consider caching previous map hash value #5147

Open
bradfitz opened this Issue Mar 28, 2013 · 24 comments

@bradfitz
Member

bradfitz commented Mar 28, 2013

Michael Jones notes that the following pattern is common in Go code:

  if v, ok := m[key]; ok { // or !ok
      ....
      m[key] = newValue
  }

If hashing that key type is expensive (say, a large string or struct) and the map isn't empty
or very small, the above code hashes the key twice.

If we kept a per-P cache of the last key & last computed hash value for that P, we
could eliminate some hash calls.

Probably worth measuring. (Probably not for Go 1.1)
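As a concrete, runnable instance of the pattern (the helper and map names here are illustrative, not from the issue):

```go
package main

import "fmt"

// recordFirstSeen is an illustrative helper: the lookup hashes key
// once, and on a miss the store hashes the same key a second time.
// With a large string key and a non-tiny map, each hash is a full
// pass over the key's bytes.
func recordFirstSeen(seen map[string]int, key string, pos int) bool {
	if _, ok := seen[key]; ok { // hash computation #1
		return false
	}
	seen[key] = pos // hash computation #2, same key
	return true
}

func main() {
	seen := map[string]int{}
	fmt.Println(recordFirstSeen(seen, "alpha", 0)) // true: first occurrence
	fmt.Println(recordFirstSeen(seen, "alpha", 1)) // false: already present
}
```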
@bradfitz

Member

bradfitz commented Mar 29, 2013

Comment 2:

Example from net/textproto.ReadMIMEHeader:
                m[key] = append(m[key], value)
@bradfitz

Member

bradfitz commented Mar 29, 2013

Comment 3:

I've hacked this up as an exercise: https://golang.org/cl/8160043
This isn't being proposed for Go 1.1.  I just wanted to see what it'd involve, and learn
some new code.
@MichaelTJones

Contributor

MichaelTJones commented Mar 29, 2013

Comment 4:

Pleased to see this. It's what I would expect. Poor performance is a "death
by a thousand cuts" issue, so every place where something is done twice is
a place where one of those cuts may be healed.
@bradfitz

Member

bradfitz commented Mar 29, 2013

Comment 5:

This issue was updated by revision ecdcec1.

R=khr, r
CC=golang-dev
https://golang.org/cl/8165044
@bradfitz

Member

bradfitz commented Mar 29, 2013

Comment 6:

It helps when it triggers, and hurts when it doesn't, as expected.
Now the challenge would be making it cost less when it doesn't trigger:
benchmark                             old ns/op    new ns/op    delta
BenchmarkNewEmptyMap                        144          142   -1.39%
BenchmarkHashStringSpeed                     42           46  +10.21%
BenchmarkHashInt32Speed                      34           31   -8.45%
BenchmarkHashInt64Speed                      31           32   +2.52%
BenchmarkHashStringArraySpeed               112          126  +12.50%
BenchmarkMegMap                          298981           28  -99.99%  << best case
BenchmarkMegOneMap                           24           24   +0.40%
BenchmarkMegEmptyMap                          4            4   +0.44%
BenchmarkSmallStrMap                         23           23   +0.00%
BenchmarkIntMap                              17           17   +0.56%
BenchmarkRepeatedLookupStrMapKey32           45           35  -20.22%  << new test
BenchmarkRepeatedLookupStrMapKey1M       306886       154843  -49.54%  << new test
@MichaelTJones

Contributor

MichaelTJones commented Mar 29, 2013

Comment 7:

Strings have lengths, so you could compare those before comparing the string bodies.
I bet in real life (words in text, etc.) the random
serial correlation would be low enough to bring the 10% down to 1-2%
(unless you're already doing that. ;-)
Same for the string array case: compare all the lengths first and then the string
bodies.
I wonder what the real diversity is. Maybe you should make a version that
does the compares, tabulates matches/non-matches, and prints that at exit.
Run the benchmarks that way, plus some utilities (build the docs, etc.),
and see what the norm is. That's a good way to understand what really
happens in practice.
@bradfitz

Member

bradfitz commented Mar 29, 2013

Comment 8:

We already do that.
That's not what this bug is about.
@bradfitz

Member

bradfitz commented Mar 31, 2013

Comment 9:

Out of curiosity, I ran the go1 benchmark before and after this CL:
ante:go1 $ ~/go/misc/benchcmp before after 
benchmark                         old ns/op    new ns/op    delta
BenchmarkBinaryTree17            6370288817   6376970261   +0.10%
BenchmarkFannkuch11              3774653509   3782454073   +0.21%
BenchmarkFmtFprintfEmpty                 94           95   +1.49%
BenchmarkFmtFprintfString               290          288   -0.69%
BenchmarkFmtFprintfInt                  189          190   +0.53%
BenchmarkFmtFprintfIntInt               319          317   -0.63%
BenchmarkFmtFprintfPrefixedInt          356          358   +0.56%
BenchmarkFmtFprintfFloat                496          503   +1.41%
BenchmarkFmtManyArgs                   1317         1311   -0.46%
BenchmarkGobDecode                 12415020     12463592   +0.39%
BenchmarkGobEncode                 14668974     14508952   -1.09%
BenchmarkGzip                     569880241    570438324   +0.10%
BenchmarkGunzip                   128886401    138989651   +7.84%
BenchmarkHTTPClientServer             72004        76433   +6.15%
BenchmarkJSONEncode                48857161     49610984   +1.54%
BenchmarkJSONDecode               111535431    112135503   +0.54%
BenchmarkMandelbrot200              4760656      4494016   -5.60%  << only good result
BenchmarkGoParse                    6673246      6722963   +0.75%
BenchmarkRegexpMatchEasy0_32            122          123   +0.82%
BenchmarkRegexpMatchEasy0_1K            384          385   +0.26%
BenchmarkRegexpMatchEasy1_32            106          106   +0.00%
BenchmarkRegexpMatchEasy1_1K           1103         1106   +0.27%
BenchmarkRegexpMatchMedium_32           216          215   -0.46%
BenchmarkRegexpMatchMedium_1K         90747        92652   +2.10%
BenchmarkRegexpMatchHard_32            4405         4739   +7.58%
BenchmarkRegexpMatchHard_1K          139907       144608   +3.36%
BenchmarkRevcomp                 1026696025   1021416881   -0.51%
BenchmarkTemplate                 141715311    142208244   +0.35%
BenchmarkTimeParse                      487          482   -1.03%
BenchmarkTimeFormat                     548          552   +0.73%
The Mandelbrot result suggests that having the compiler give a hint to the runtime hash
insert/lookup functions about when to update & consult the cache would be worthwhile.
@bradfitz

Member

bradfitz commented Mar 31, 2013

Comment 10:

Sigh, the mandelbrot test doesn't even use maps. (I didn't even think about it a second
ago)
Yay benchmarks.
Maybe with a higher benchtime it'd be more reliable.  (This isn't even on a laptop,
where I could write it off as CPU frequency scaling)
@MichaelTJones

Contributor

MichaelTJones commented Mar 31, 2013

Comment 11:

If nothing else, it's not a glowing endorsement. Hmm...
@randall77

Contributor

randall77 commented Mar 31, 2013

Comment 12:

Maybe you could instrument what fraction of the lookups would result in a hit?  It would
be interesting to know for each of the benchmarks, and that kind of measurement doesn't
suffer from timing variance issues.
@bradfitz

Member

bradfitz commented Mar 31, 2013

Comment 13:

For go1 bench tests overall:
 204 [fast2_hit]
2674 [gen_hit]
202447 [gen_miss]
259524 [fast1_miss]
281748 [fast1_hit]
472442 [fast2_miss]
Where fast1 and fast2 are the specialized int32/int64/string lookup fast paths, and "gen" is
any map assignment, deletion, or a non-specialized (not int/string) lookup.
Broken up per-test, the baseline is:
   2 [fast2_hit]
  48 [gen_hit]
  63 [fast2_miss]
 331 [gen_miss]
12823 [fast1_hit]
102468 [fast1_miss]
... for just the test infrastructure / binary start-up somehow.  
Then the only interesting ones using maps are:  (these include the baseline numbers)
BenchmarkHTTPClientServer
   2 [fast2_hit]
  48 [gen_hit]
  64 [fast2_miss]
12823 [fast1_hit]
102468 [fast1_miss]
151846 [gen_miss]
BenchmarkJSONEncode
   2 [fast2_hit]
  48 [gen_hit]
  63 [fast2_miss]
 331 [gen_miss]
102469 [fast1_miss]
281748 [fast1_hit]   <<<<< nice
BenchmarkGoParse
   2 [fast2_hit]
2674 [gen_hit]
12823 [fast1_hit]
50932 [gen_miss]
259523 [fast1_miss]
471228 [fast2_miss]
For net/textproto ReadMIMEHeader's benchmark, which (per the comment above) contains:
     m[key] = append(m[key], value)
the counts are:
  25 [fast2_miss]
30050 [fast1_miss]   // m[key] lookup on 9th and 10th keys
30071 [gen_hit]      // m[key] assignment on 9th and 10th keys
60101 [BenchmarkReadMIMEHeader-loop]
541151 [gen_miss]    // m[key] assignment on 1st-8th keys (a miss because no hash is done
for short strings while len(m) <= 8)
It does better if the header is bigger (more unique keys) or has longer (over 32 byte)
keys.
For net/http's BenchmarkServerFakeConnNoKeepAlive:
  22 [gen_hit]
  27 [fast2_miss]
841760 [gen_miss]
JSONEncode looks to benefit the most from this.
@bradfitz

Member

bradfitz commented Apr 3, 2013

Comment 14:

Labels changed: added performance.

@bradfitz

Member

bradfitz commented Apr 3, 2013

Comment 15:

Labels changed: removed optimization.

@rsc

Contributor

rsc commented Dec 4, 2013

Comment 16:

Labels changed: added repo-main.

@rsc

Contributor

rsc commented Mar 3, 2014

Comment 17:

Adding Release=None to all Priority=Someday bugs.

Labels changed: added release-none.

@bradfitz

Member

bradfitz commented Dec 2, 2014

Comment 18:

I realize that nowadays (as opposed to when this bug was filed) we have the go/ssa
package, which should ease doing an analysis over all public Go code to see how often
this pattern occurs.
@axw

Contributor

axw commented Dec 3, 2014

Comment 19:

@bradfitz: if you do the analysis part with go/ssa, would you please share the code?
I'd be interested to see how it might fit in with a go/ssa-based compiler (llgo) as an
optimisation. If you can statically identify the map lookup/modification pairs that
operate on the same key value, then you should be able to do the cache/reuse statically
with minimal impact on other sites.
@adonovan

adonovan commented Dec 3, 2014

Comment 20:

The doublehash program in this CL uses go/ssa to find occurrences:
  https://golang.org/cl/182420043/
% doublehash encoding/json
/home/adonovan/go/src/encoding/json/encode.go:1073:14:  this map[reflect.Type]int lookup
/home/adonovan/go/src/encoding/json/encode.go:1073:14:          dominates this update with same map and key
/home/adonovan/go/src/encoding/json/encode.go:1074:17:  this map[reflect.Type]int lookup
/home/adonovan/go/src/encoding/json/encode.go:1073:14:          dominates this update with same map and key
It only finds update-after-lookup patterns, but it could be easily adapted to
lookup-after-lookup too, although they're rare in code that's already performance
sensitive.
Since unlike gc, go/ssa doesn't have a precise interprocedural escape analysis, I've
added an -unsound flag to make optimistic aliasing assumptions about the map expression,
e.g. that in this program,
  if !o.m[k] {
    ...
    o.m[k] = true
    ...
  }
the two references o.m must alias to the same variable.
doublehash finds 77 definite occurrences in the standard library, and 176 with the
-unsound flag.  (Escape analysis would likely yield even more since the key expression
may also be subject to aliasing, e.g. m[o.k].)
@bradfitz

Member

bradfitz commented Sep 16, 2016

Instead of caching the map value per key, what if the frontend turned map operations into separate SSA operations for hashing the key vs. using the hash to do the various map operations?

Then there would be SSA operations for map lookup and map assignment which take the hash in addition to the key.

The downside is that the subsequent map operations would still need to do == on all entry keys hashed to the same bucket, but at least the hash operation itself would only be done once.

The alternative (having map operations return a pointer into the map value) is a little more invasive in that it has to be aware of maps being resized during sets. You might need a separate SSA value for each potential layout of the hash, with all map mutation operations returning a new SSA value for the potentially-new map layout (e.g. after growth).

@randall77, do you have plans in general of moving some of the frontend's runtime call lowering later into SSA, so more of the SSA goodness can eliminate things?
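To illustrate the proposal, a hedged pseudocode lowering with hashing split into its own SSA operation (the maphash, mapaccess2_hashed, and mapassign_hashed names are invented for illustration; they are not real runtime functions):

```
// before: each runtime call hashes k internally
v, ok = mapaccess2(maptype, m, &k)
mapassign(maptype, m, &k, &newValue)

// after: the hash is its own SSA value, so CSE can reuse it
h     = maphash(maptype, &k)                     // computed once
v, ok = mapaccess2_hashed(maptype, m, &k, h)
mapassign_hashed(maptype, m, &k, h, &newValue)   // reuses h
```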

@randall77

Contributor

randall77 commented Sep 16, 2016

No plans at the moment, although that is a possibility.

@myitcv

Member

myitcv commented Sep 18, 2016

Does copying a map fall into the same category of optimisation with respect to key hashing?

The following is, I believe, the encouraged way of writing a map copy (assuming we want a "simple", exact copy):

  var m map[t1]t2

  theCopy := make(map[t1]t2, len(m))
  for k, v := range m {
      theCopy[k] = v
  }

This results in a call to runtime.mapassign1 within the loop which, as I understand it, results in a call to hash the key value.

Feels like this particular scenario can be optimised heavily?

That said, whilst this copy example is another instance of avoiding unnecessary hashing, it perhaps falls into a different category of (SSA) optimisation...
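For concreteness, the same loop with the types filled in (string/int chosen purely for illustration); each iteration's assignment hashes k even though range just produced that key from the source map:

```go
package main

import "fmt"

// copyMap is the generic copy loop above with concrete types.
// theCopy[k] = v hashes k on every iteration.
func copyMap(m map[string]int) map[string]int {
	theCopy := make(map[string]int, len(m))
	for k, v := range m {
		theCopy[k] = v
	}
	return theCopy
}

func main() {
	orig := map[string]int{"a": 1, "b": 2}
	cp := copyMap(orig)
	cp["a"] = 99 // mutating the copy leaves the original untouched
	fmt.Println(orig["a"], cp["a"]) // 1 99
}
```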

@gopherbot

gopherbot commented Oct 12, 2016

CL https://golang.org/cl/30815 mentions this issue.

gopherbot pushed a commit that referenced this issue Oct 12, 2016

cmd/compile,runtime: redo how map assignments work
To compile:
  m[k] = v
instead of generating:
  mapassign(maptype, m, &k, &v)
do:
  *mapassign(maptype, m, &k) = v

mapassign returns a pointer to the value slot in the map.  It is just
like mapaccess except that it will allocate a new slot if k is not
already present in the map.

This makes map accesses faster but potentially larger (codewise).

It is faster because the write into the map is done when the compiler
knows the concrete type, so it can be done with a few store
instructions instead of calling typedmemmove.  We also potentially
avoid stack temporaries to hold v.

The code can be larger when the map has pointers in its value type,
since there is a write barrier call in addition to the mapassign call.
That makes the code at the callsite a bit bigger (go binary is 0.3%
bigger).

This CL is in preparation for doing operations like m[k] += v with
only a single runtime call.  That will roughly double the speed of
such operations.

Update #17133
Update #5147

Change-Id: Ia435f032090a2ed905dac9234e693972fe8c2dc5
Reviewed-on: https://go-review.googlesource.com/30815
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@gopherbot

gopherbot commented Mar 15, 2018

Change https://golang.org/cl/100838 mentions this issue: cmd/compile: avoid mapaccess at m[k]=append(m[k]..

gopherbot pushed a commit that referenced this issue Mar 20, 2018

cmd/compile: avoid mapaccess at m[k]=append(m[k]..
Currently an assignment like m[k] = append(m[k], ...) is transformed during walk into:

        tmp1 := *mapaccess(m, k)
        tmp2 := append(tmp1, ...)
        *mapassign(m, k) = tmp2

However, this is suboptimal, as we could instead produce just:
        tmp := mapassign(m, k)
        *tmp := append(*tmp, ...)

The optimization is possible only if, during Order, the compiler can tell that m[k] is
exactly the same on the left and right sides of the assignment. It does not apply in cases like:
1) m[f(k)] = append(m[f(k)], ...)
2) sink, m[k] = sink, append(m[k]...)
3) m[k] = append(..., m[k], ...)

Benchmark:
name                           old time/op    new time/op    delta
MapAppendAssign/Int32/256-8      33.5ns ± 3%    22.4ns ±10%  -33.24%  (p=0.000 n=16+18)
MapAppendAssign/Int32/65536-8    68.2ns ± 6%    48.5ns ±29%  -28.90%  (p=0.000 n=20+20)
MapAppendAssign/Int64/256-8      34.3ns ± 4%    23.3ns ± 5%  -32.23%  (p=0.000 n=17+18)
MapAppendAssign/Int64/65536-8    65.9ns ± 7%    61.2ns ±19%   -7.06%  (p=0.002 n=18+20)
MapAppendAssign/Str/256-8         116ns ±12%      79ns ±16%  -31.70%  (p=0.000 n=20+19)
MapAppendAssign/Str/65536-8       134ns ±15%     111ns ±45%  -16.95%  (p=0.000 n=19+20)

name                           old alloc/op   new alloc/op   delta
MapAppendAssign/Int32/256-8       47.0B ± 0%     46.0B ± 0%   -2.13%  (p=0.000 n=19+18)
MapAppendAssign/Int32/65536-8     27.0B ± 0%     20.7B ±30%  -23.33%  (p=0.000 n=20+20)
MapAppendAssign/Int64/256-8       47.0B ± 0%     46.0B ± 0%   -2.13%  (p=0.000 n=20+17)
MapAppendAssign/Int64/65536-8     27.0B ± 0%     27.0B ± 0%     ~     (all equal)
MapAppendAssign/Str/256-8         94.0B ± 0%     78.0B ± 0%  -17.02%  (p=0.000 n=20+16)
MapAppendAssign/Str/65536-8       54.0B ± 0%     54.0B ± 0%     ~     (all equal)

Fixes #24364
Updates #5147

Change-Id: Id257d052b75b9a445b4885dc571bf06ce6f6b409
Reviewed-on: https://go-review.googlesource.com/100838
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
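The optimized pattern's user-visible semantics are unchanged; a runnable sketch of the m[k] = append(m[k], ...) shape the commit above targets (the helper name is illustrative, echoing the net/textproto use earlier in the thread):

```go
package main

import "fmt"

// addValue is the m[key] = append(m[key], value) shape this CL
// optimizes: after the change it compiles to a single mapassign
// instead of mapaccess followed by mapassign.
func addValue(m map[string][]string, key, value string) {
	m[key] = append(m[key], value)
}

func main() {
	m := map[string][]string{}
	addValue(m, "Accept", "text/html")
	addValue(m, "Accept", "application/json")
	fmt.Println(m["Accept"]) // [text/html application/json]
}
```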