New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: recognize map-clearing range idiom #20138

Closed
josharian opened this Issue Apr 26, 2017 · 13 comments

Comments

Projects
None yet
8 participants
@josharian
Contributor

josharian commented Apr 26, 2017

Sometimes, maps get emptied for re-use. Idiom:

for k := range m {
  delete(m, k)
}

This is pretty inefficient, though; it involves juggling iterators and making one delete call per map element. This could be recognized by cmd/compile and converted into a (newly added) runtime mapclear call, which could just walk and clear the buckets directly.

We could also decide to have it free all extra overflow buckets that have been allocated for it, which would ameliorate some of the problem in #20135. (The main bucket array and pre-allocated overflow buckets would remain, but that's desirable; emptying a map and re-filling is an intentional way to reuse those.) This would make the implementation of mapclear very efficient, mostly just a typedmemclr of hmap.buckets.

cc @randall77 @martisch

@josharian josharian added this to the Go1.10 milestone Apr 26, 2017

@bradfitz

This comment has been minimized.

Show comment
Hide comment
@bradfitz

bradfitz Apr 26, 2017

Member

Another way to do that is:

m = map[K]V{}

How common is this?

I think more common is:

for k, v := range m { // m == work to do
  doSomeWork(k, v)
  delete(m, k)
}

... but that probably wouldn't match.

Member

bradfitz commented Apr 26, 2017

Another way to do that is:

m = map[K]V{}

How common is this?

I think more common is:

for k, v := range m { // m == work to do
  doSomeWork(k, v)
  delete(m, k)
}

... but that probably wouldn't match.

@josharian

This comment has been minimized.

Show comment
Hide comment
@josharian

josharian Apr 26, 2017

Contributor

m = map[K]V{}

This is different. In the range code the map is emptied, in this, it is replaced with a new, empty map. Consider

x := m
// range clear or assign new empty map
x[0] = "a"
println(m[0])

These will yield different results based on the range clear vs assignment.

And in practice, they have very different performance characteristics, which is why the range loop gets used instead of just abandoning the old map.

How common is this?

Flipping through the cmd subdir, I see:

  • cmd/gofmt/rewrite.go lines 70-20
  • cmd/compile/internal/ssa/regalloc.go lines 1766-1768
  • cmd/go/internal/load/pkg.go lines 280-282
  • cmd/go/internal/load/pkg.go lines 1670-1672

I think more common is:

If the range delete idiom were fast, I suspect that this would get rewritten into two loops in performance-critical code, one to do the work and one to empty the map.

Contributor

josharian commented Apr 26, 2017

m = map[K]V{}

This is different. In the range code the map is emptied, in this, it is replaced with a new, empty map. Consider

x := m
// range clear or assign new empty map
x[0] = "a"
println(m[0])

These will yield different results based on the range clear vs assignment.

And in practice, they have very different performance characteristics, which is why the range loop gets used instead of just abandoning the old map.

How common is this?

Flipping through the cmd subdir, I see:

  • cmd/gofmt/rewrite.go lines 70-20
  • cmd/compile/internal/ssa/regalloc.go lines 1766-1768
  • cmd/go/internal/load/pkg.go lines 280-282
  • cmd/go/internal/load/pkg.go lines 1670-1672

I think more common is:

If the range delete idiom were fast, I suspect that this would get rewritten into two loops in performance-critical code, one to do the work and one to empty the map.

@martisch

This comment has been minimized.

Show comment
Hide comment
@martisch

martisch Apr 26, 2017

Member

I had a use case for this too and would like to work on it as discussed with josharian.
Might even still make it in for 1.9.

The route of clearing all existing buckets and removing extra overflow buckets sounds good.

just supporting:

for k := range m {
  delete(m, k)
}

for now is consistent with slice clearing and provides an easy idiom to fast clearing of maps and reusing the memory.

for k, v := range m { // m == work to do
  doSomeWork(k, v)
  delete(m, k)
}

can just be written as

for k, v := range m {
  doSomeWork(k, v)
}

for k, v := range m {
  delete(m, k)
}

if performance is critical. For anything additional in the loop it is currently to complex to detemine in the frontend that nothing else changes the map.

i think the side effects of the code above are less surprising than if the result (map backing memory)
of the below code is dependent on the size of m.

m = map[K]V{}

And it seems inconsistent with:

s := make([]int, 0, 10)
s = []int{}

not resulting in cap(s) == 10 currently.

Member

martisch commented Apr 26, 2017

I had a use case for this too and would like to work on it as discussed with josharian.
Might even still make it in for 1.9.

The route of clearing all existing buckets and removing extra overflow buckets sounds good.

just supporting:

for k := range m {
  delete(m, k)
}

for now is consistent with slice clearing and provides an easy idiom to fast clearing of maps and reusing the memory.

for k, v := range m { // m == work to do
  doSomeWork(k, v)
  delete(m, k)
}

can just be written as

for k, v := range m {
  doSomeWork(k, v)
}

for k, v := range m {
  delete(m, k)
}

if performance is critical. For anything additional in the loop it is currently to complex to detemine in the frontend that nothing else changes the map.

i think the side effects of the code above are less surprising than if the result (map backing memory)
of the below code is dependent on the size of m.

m = map[K]V{}

And it seems inconsistent with:

s := make([]int, 0, 10)
s = []int{}

not resulting in cap(s) == 10 currently.

@martisch martisch self-assigned this Apr 26, 2017

@randall77

This comment has been minimized.

Show comment
Hide comment
@randall77

randall77 May 17, 2017

Contributor

Two tricky cases:

func g(m map[int]int) int {
	var k int
	for k = range m {
		delete(m, k)
	}
	return k
}

We need to avoid the optimization here because k is used after the loop.
(Or have the mapclear call return a key? But that doesn't work for len(m)==0.)
This is easier for ranging over slices because we just assign k=len(s)-1 and if not used it gets deadcode eliminated.

func f(m map[float64]int) {
	for k := range m {
		delete(m, k)
	}
}

If there are NaNs as keys, then their entries should survive the loop.

Contributor

randall77 commented May 17, 2017

Two tricky cases:

func g(m map[int]int) int {
	var k int
	for k = range m {
		delete(m, k)
	}
	return k
}

We need to avoid the optimization here because k is used after the loop.
(Or have the mapclear call return a key? But that doesn't work for len(m)==0.)
This is easier for ranging over slices because we just assign k=len(s)-1 and if not used it gets deadcode eliminated.

func f(m map[float64]int) {
	for k := range m {
		delete(m, k)
	}
}

If there are NaNs as keys, then their entries should survive the loop.

@odeke-em

This comment has been minimized.

Show comment
Hide comment
@odeke-em

odeke-em Jul 30, 2017

Member

To add to @randall77's mentioned tricky cases in #20138 (comment), it might seem a little obvious, but there is another one in which doWork is not a pure function in regards to m, or actually does something with m

doWork := func(k, v) {
   if m[k+1] % 2 == 0 {
      return
   }
   // do the work or even modifies m
}

for k, v := range m {
   doWork(k, v)
   delete(m, k)
}

in this case we just can't convert it to

for k, v := range m {
  doWork(k, v)
}
for k := range m {
  delete(m, k)
}
// or
runtime_mapclear(m)
Member

odeke-em commented Jul 30, 2017

To add to @randall77's mentioned tricky cases in #20138 (comment), it might seem a little obvious, but there is another one in which doWork is not a pure function in regards to m, or actually does something with m

doWork := func(k, v) {
   if m[k+1] % 2 == 0 {
      return
   }
   // do the work or even modifies m
}

for k, v := range m {
   doWork(k, v)
   delete(m, k)
}

in this case we just can't convert it to

for k, v := range m {
  doWork(k, v)
}
for k := range m {
  delete(m, k)
}
// or
runtime_mapclear(m)
@martisch

This comment has been minimized.

Show comment
Hide comment
@martisch

martisch Jul 30, 2017

Member

@odeke-em indeed. As a start it suffices to recognize the simple case:

for k, v := range m {
  delete(m, k)
}

as this is consistent with the detection capability of the slice/array clearing idiom.

Later we can think about making all the clearing idioms more sophisticated if warranted.

Member

martisch commented Jul 30, 2017

@odeke-em indeed. As a start it suffices to recognize the simple case:

for k, v := range m {
  delete(m, k)
}

as this is consistent with the detection capability of the slice/array clearing idiom.

Later we can think about making all the clearing idioms more sophisticated if warranted.

@davecheney

This comment has been minimized.

Show comment
Hide comment
@davecheney

davecheney Jul 30, 2017

Contributor
Contributor

davecheney commented Jul 30, 2017

@bradfitz

This comment has been minimized.

Show comment
Hide comment
@bradfitz

bradfitz Jul 30, 2017

Member

@davecheney, that would require it can prove that no references to the old m map value exist in a live state anywhere. I wouldn't hold my breath waiting for such things. (e.g. #2205)

Whereas just recognizing the for-delete loop doesn't require any new analysis.

Member

bradfitz commented Jul 30, 2017

@davecheney, that would require it can prove that no references to the old m map value exist in a live state anywhere. I wouldn't hold my breath waiting for such things. (e.g. #2205)

Whereas just recognizing the for-delete loop doesn't require any new analysis.

@davecheney

This comment has been minimized.

Show comment
Hide comment
@davecheney

davecheney Jul 31, 2017

Contributor
Contributor

davecheney commented Jul 31, 2017

@randall77

This comment has been minimized.

Show comment
Hide comment
@randall77

randall77 Jul 31, 2017

Contributor
m := map[int]int{1:2,3:4}
c <- m
m = map[int]int{}

You can't clear m at the third line. It would modify the original map that we sent over the channel.

Contributor

randall77 commented Jul 31, 2017

m := map[int]int{1:2,3:4}
c <- m
m = map[int]int{}

You can't clear m at the third line. It would modify the original map that we sent over the channel.

@martisch martisch modified the milestones: Go1.10, Go1.11 Nov 10, 2017

@mvdan

This comment has been minimized.

Show comment
Hide comment
@mvdan

mvdan Dec 16, 2017

Member

Could something similar be done for map copies? That is, recognising:

m2 := make(map[K]V, len(m1)) // or no capacity
for k, v := range m1 {
    m2[k] = v
}

A quick search yields a dozen matches in the tree (ignore the one with Header):

$ gogrep '~ $m2 = $_; $*_; for $k, $v := range $(m1 is(map)) { $m2[$k] = $v }' std cmd
archive/tar/common.go:680:4: h.Xattrs = make(map[string]string); for k, v := range sys.Xattrs { h.Xattrs[k] = v; }
archive/tar/common.go:692:4: h.PAXRecords = make(map[string]string); for k, v := range sys.PAXRecords { h.PAXRecords[k] = v; }
cmd/fix/typecheck.go:192:7: cfg1.Type = make(map[string]*Type); for k, v := range cfg.Type { cfg1.Type[k] = v; }
cmd/vendor/github.com/google/pprof/driver/driver.go:276:2: isrcs := MappingSources{}; for m, s := range srcs { isrcs[m] = s; }
encoding/gob/type.go:272:2: builtinIdToType = make(map[typeId]gobType); for k, v := range idToType { builtinIdToType[k] = v; }
encoding/gob/type.go:754:2: newm := make(map[reflect.Type]*typeInfo); m, _ := typeInfoMap.Load().(map[reflect.Type]*typeInfo); for k, v := range m { newm[k] = v; }
encoding/json/encode.go:1285:2: newM := make(map[reflect.Type][]field, len(m)+1); for k, v := range m { newM[k] = v; }
net/http/server.go:3127:3: dst := w.Header(); for k, vv := range tw.h { dst[k] = vv; }
net/http/transport.go:512:2: newMap := make(map[string]RoundTripper); for k, v := range oldMap { newMap[k] = v; }
net/internal/socktest/switch.go:47:2: tab := make(Sockets, len(sw.sotab)); for i, s := range sw.sotab { tab[i] = s; }
runtime/pprof/label.go:40:2: childLabels := make(labelMap); parentLabels := labelValue(ctx); for k, v := range parentLabels { childLabels[k] = v; }
sync/atomic/value_test.go:189:3: m2 := make(Map); for k, v := range m1 { m2[k] = v; }
sync/map_reference_test.go:146:2: dirty := make(map[interface{}]interface{}, len(clean)+1); for k, v := range clean { dirty[k] = v; }

For the sake of completeness, here's the command one can use to find instances of the map-clearing range idiom:

$ gogrep 'for $k := range $m { delete($m, $k) }' std cmd
cmd/compile/internal/gc/ssa.go:373:2: for n := range s.fwdVars { delete(s.fwdVars, n); }
cmd/compile/internal/ssa/regalloc.go:1799:2: for k := range e.contents { delete(e.contents, k); }
cmd/go/internal/load/pkg.go:306:2: for name := range packageCache { delete(packageCache, name); }
cmd/go/internal/load/pkg.go:1275:2: for name := range cmdCache { delete(cmdCache, name); }
cmd/gofmt/rewrite.go:70:3: for k := range m { delete(m, k); }
Member

mvdan commented Dec 16, 2017

Could something similar be done for map copies? That is, recognising:

m2 := make(map[K]V, len(m1)) // or no capacity
for k, v := range m1 {
    m2[k] = v
}

A quick search yields a dozen matches in the tree (ignore the one with Header):

$ gogrep '~ $m2 = $_; $*_; for $k, $v := range $(m1 is(map)) { $m2[$k] = $v }' std cmd
archive/tar/common.go:680:4: h.Xattrs = make(map[string]string); for k, v := range sys.Xattrs { h.Xattrs[k] = v; }
archive/tar/common.go:692:4: h.PAXRecords = make(map[string]string); for k, v := range sys.PAXRecords { h.PAXRecords[k] = v; }
cmd/fix/typecheck.go:192:7: cfg1.Type = make(map[string]*Type); for k, v := range cfg.Type { cfg1.Type[k] = v; }
cmd/vendor/github.com/google/pprof/driver/driver.go:276:2: isrcs := MappingSources{}; for m, s := range srcs { isrcs[m] = s; }
encoding/gob/type.go:272:2: builtinIdToType = make(map[typeId]gobType); for k, v := range idToType { builtinIdToType[k] = v; }
encoding/gob/type.go:754:2: newm := make(map[reflect.Type]*typeInfo); m, _ := typeInfoMap.Load().(map[reflect.Type]*typeInfo); for k, v := range m { newm[k] = v; }
encoding/json/encode.go:1285:2: newM := make(map[reflect.Type][]field, len(m)+1); for k, v := range m { newM[k] = v; }
net/http/server.go:3127:3: dst := w.Header(); for k, vv := range tw.h { dst[k] = vv; }
net/http/transport.go:512:2: newMap := make(map[string]RoundTripper); for k, v := range oldMap { newMap[k] = v; }
net/internal/socktest/switch.go:47:2: tab := make(Sockets, len(sw.sotab)); for i, s := range sw.sotab { tab[i] = s; }
runtime/pprof/label.go:40:2: childLabels := make(labelMap); parentLabels := labelValue(ctx); for k, v := range parentLabels { childLabels[k] = v; }
sync/atomic/value_test.go:189:3: m2 := make(Map); for k, v := range m1 { m2[k] = v; }
sync/map_reference_test.go:146:2: dirty := make(map[interface{}]interface{}, len(clean)+1); for k, v := range clean { dirty[k] = v; }

For the sake of completeness, here's the command one can use to find instances of the map-clearing range idiom:

$ gogrep 'for $k := range $m { delete($m, $k) }' std cmd
cmd/compile/internal/gc/ssa.go:373:2: for n := range s.fwdVars { delete(s.fwdVars, n); }
cmd/compile/internal/ssa/regalloc.go:1799:2: for k := range e.contents { delete(e.contents, k); }
cmd/go/internal/load/pkg.go:306:2: for name := range packageCache { delete(packageCache, name); }
cmd/go/internal/load/pkg.go:1275:2: for name := range cmdCache { delete(cmdCache, name); }
cmd/gofmt/rewrite.go:70:3: for k := range m { delete(m, k); }
@martisch

This comment has been minimized.

Show comment
Hide comment
@martisch

martisch Dec 17, 2017

Member

Still coding up the solution to this one for 1.11.

Map copies could be done but it would need to be considered how this will interact when the copied map is changed during the copy so the new copied map does not end up corrupted. Also needs to honor the "If a map entry that has not yet been reached is removed during iteration, the corresponding iteration value will not be produced." semantics.

Best to open a new issue to discuss the details.

Instead of doing all this pattern matching which also is made harder by usually being applied after the order pass we might want to consider having a generic clear and shallow copy function working on maps (or all types) for go 2. Those functions could also handle e.g. the NaN key case differently (clear the entry) to allow for a simpler and faster implementation.

Member

martisch commented Dec 17, 2017

Still coding up the solution to this one for 1.11.

Map copies could be done but it would need to be considered how this will interact when the copied map is changed during the copy so the new copied map does not end up corrupted. Also needs to honor the "If a map entry that has not yet been reached is removed during iteration, the corresponding iteration value will not be produced." semantics.

Best to open a new issue to discuss the details.

Instead of doing all this pattern matching which also is made harder by usually being applied after the order pass we might want to consider having a generic clear and shallow copy function working on maps (or all types) for go 2. Those functions could also handle e.g. the NaN key case differently (clear the entry) to allow for a simpler and faster implementation.

@gopherbot

This comment has been minimized.

Show comment
Hide comment
@gopherbot

gopherbot Apr 27, 2018

Change https://golang.org/cl/110055 mentions this issue: cmd/compile: optimize map-clearing range idiom

Change https://golang.org/cl/110055 mentions this issue: cmd/compile: optimize map-clearing range idiom

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment