New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: runs out of memory allocating 12k + items in a map at init, works as literal map #13554

Open
pindamonhangaba opened this Issue Dec 9, 2015 · 11 comments

Comments

Projects
None yet
10 participants
@pindamonhangaba

pindamonhangaba commented Dec 9, 2015

On Ubuntu 14.04, Go 1.4.1 64 bits code compiles fine, with 1.5.2 64 bits I get the following error:

    # command-line-arguments
    fatal error: runtime: out of memory

    runtime stack:
    runtime.throw(0x843f10, 0x16)
            /usr/local/go/src/runtime/panic.go:527 +0x90
    runtime.sysMap(0xc8fc0c0000, 0xf6d0000, 0x44a600, 0xdab818)
            /usr/local/go/src/runtime/mem_linux.go:203 +0x9b
    runtime.mHeap_SysAlloc(0xc95400, 0xf6d0000, 0x0)
            /usr/local/go/src/runtime/malloc.go:426 +0x160
    runtime.mHeap_Grow(0xc95400, 0x7b68, 0x0)
            /usr/local/go/src/runtime/mheap.go:628 +0x63
    runtime.mHeap_AllocSpanLocked(0xc95400, 0x7b63, 0x0)
            /usr/local/go/src/runtime/mheap.go:532 +0x5f1
    runtime.mHeap_Alloc_m(0xc95400, 0x7b63, 0x100000000, 0xc81982ccff)
            /usr/local/go/src/runtime/mheap.go:425 +0x1ac
    runtime.mHeap_Alloc.func1()
            /usr/local/go/src/runtime/mheap.go:484 +0x41
    runtime.systemstack(0xc82003dee0)
            /usr/local/go/src/runtime/asm_amd64.s:278 +0xab
    runtime.mHeap_Alloc(0xc95400, 0x7b63, 0x100000000, 0xc820022720)
            /usr/local/go/src/runtime/mheap.go:485 +0x63
    runtime.largeAlloc(0xf6c6000, 0x3, 0x0)
            /usr/local/go/src/runtime/malloc.go:748 +0xb3
    runtime.mallocgc.func3()
            /usr/local/go/src/runtime/malloc.go:637 +0x33
    runtime.systemstack(0xc820021500)
            /usr/local/go/src/runtime/asm_amd64.s:262 +0x79
    runtime.mstart()
            /usr/local/go/src/runtime/proc1.go:668

    goroutine 1 [running]:
    runtime.systemstack_switch()
            /usr/local/go/src/runtime/asm_amd64.s:216 fp=0xc8b55892c8 sp=0xc8b55892c0
    runtime.mallocgc(0xf6c6000, 0x0, 0x3, 0xc8efa66000)
            /usr/local/go/src/runtime/malloc.go:638 +0x9c4 fp=0xc8b5589398 sp=0xc8b55892c8
    runtime.rawmem(0xf6c6000, 0xc8b55893d8)
            /usr/local/go/src/runtime/malloc.go:791 +0x32 fp=0xc8b55893c0 sp=0xc8b5589398
    runtime.growslice(0x785c20, 0xc8efa66000, 0xc56a000, 0xc56a000, 0xc56a001, 0x0, 0x0, 0x0)
            /usr/local/go/src/runtime/slice.go:92 +0x233 fp=0xc8b5589430 sp=0xc8b55893c0
    cmd/internal/obj.Symgrow(0xc8200aa000, 0xc8309ec1e0, 0xc56a001)
            /usr/local/go/src/cmd/internal/obj/data.go:52 +0x23b fp=0xc8b55894f0 sp=0xc8b5589430
    cmd/internal/obj.Setuintxx(0xc8200aa000, 0xc8309ec1e0, 0xc56a000, 0x0, 0x1, 0xc56a000)
            /usr/local/go/src/cmd/internal/obj/data.go:126 +0x76 fp=0xc8b5589548 sp=0xc8b55894f0
    cmd/compile/internal/gc.duintxx(0xc8309a5100, 0xc56a000, 0x0, 0x1, 0xc56a000)
            /usr/local/go/src/cmd/compile/internal/gc/obj.go:178 +0x97 fp=0xc8b5589630 sp=0xc8b5589548
    cmd/compile/internal/gc.duint8(0xc8309a5100, 0xc56a000, 0x12b00, 0xc56a000)
            /usr/local/go/src/cmd/compile/internal/gc/obj.go:182 +0x3e fp=0xc8b5589660 sp=0xc8b5589630
    cmd/compile/internal/gc.onebitwritesymbol(0xc8bcc82000, 0x63df, 0x7600, 0xc8309a5100)
            /usr/local/go/src/cmd/compile/internal/gc/plive.go:1747 +0x19b fp=0xc8b55896e8 sp=0xc8b5589660
    cmd/compile/internal/gc.liveness(0xc8201fcc60, 0xc8214de000, 0xc8309a5080, 0xc8309a5100)
            /usr/local/go/src/cmd/compile/internal/gc/plive.go:1815 +0x3ba fp=0xc8b55897c8 sp=0xc8b55896e8
    cmd/compile/internal/gc.compile(0xc8201fcc60)
            /usr/local/go/src/cmd/compile/internal/gc/pgen.go:528 +0xe07 fp=0xc8b5589a38 sp=0xc8b55897c8
    cmd/compile/internal/gc.funccompile(0xc8201fcc60)
            /usr/local/go/src/cmd/compile/internal/gc/dcl.go:1484 +0x1c9 fp=0xc8b5589ab0 sp=0xc8b5589a38
    cmd/compile/internal/gc.Main()
            /usr/local/go/src/cmd/compile/internal/gc/lex.go:473 +0x1f08 fp=0xc8b5589db0 sp=0xc8b5589ab0
    cmd/compile/internal/amd64.Main()
            /usr/local/go/src/cmd/compile/internal/amd64/galign.go:127 +0x58d fp=0xc8b5589e18 sp=0xc8b5589db0
    main.main()
            /usr/local/go/src/cmd/compile/main.go:26 +0x189 fp=0xc8b5589ef0 sp=0xc8b5589e18
    runtime.main()
            /usr/local/go/src/runtime/proc.go:111 +0x2b0 fp=0xc8b5589f40 sp=0xc8b5589ef0
    runtime.goexit()
            /usr/local/go/src/runtime/asm_amd64.s:1721 +0x1 fp=0xc8b5589f48 sp=0xc8b5589f40

As a side note, after compiling this example I saw that the executable had around 500 MB in size! So I realized that I should've just made it a literal map, and that brought it back to 8MB. And it compiled in both 1.4.1 and 1.5.2.

noram.go

@bradfitz

This comment has been minimized.

Member

bradfitz commented Dec 9, 2015

/cc @aclements for triage.

@bradfitz bradfitz changed the title from 1.5.2 out of memory when allocating 12k + items in a map at init(), works when defined as a literal map to runtime: 1.5.2 out of memory when allocating 12k + items in a map at init(), works when defined as a literal map Dec 9, 2015

@ianlancetaylor ianlancetaylor changed the title from runtime: 1.5.2 out of memory when allocating 12k + items in a map at init(), works when defined as a literal map to cmd/compile: runs out of memory allocating 12k + items in a map at init, works as literal map Dec 9, 2015

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Dec 9, 2015

This is the compiler running out of memory, not the runtime.

@ianlancetaylor ianlancetaylor added this to the Go1.7 milestone Dec 9, 2015

@aclements

This comment has been minimized.

Member

aclements commented Dec 10, 2015

I don't think this is new in 1.5.2. Both 1.5.2 and 1.4.1 use ~3.5GB of memory when compiling this.

$ /usr/bin/time $(gover run 1.5.2 go tool -n compile noram.go)
28.48user 8.36system 0:31.88elapsed 115%CPU (0avgtext+0avgdata 3512724maxresident)k
25224inputs+506144outputs (108major+800295minor)pagefaults 0swaps
$ du -h noram.o
248M    noram.o

$ /usr/bin/time $(gover run 1.4.1 go tool -n 6g noram.go) 
17.16user 2.46system 0:20.93elapsed 93%CPU (0avgtext+0avgdata 3701620maxresident)k
248inputs+983576outputs (1major+922620minor)pagefaults 0swaps
$ du -h noram.o
248M    noram.o

Tip (b8b65c1) is basically the same as 1.5.2. Since the memory use of the C version of the compiler is similar to the Go version of the compiler (a little worse, actually), this is certainly just the compiler using a lot of memory and not a runtime memory management issue.

@aclements aclements assigned randall77 and unassigned aclements Dec 10, 2015

@randall77

This comment has been minimized.

Contributor

randall77 commented Apr 29, 2016

This appears somewhat better now. It only used ~2GB to compile at tip.
I'm going to punt this to 1.8 for now. Almost freeze and no pending changes I know of will help this issue.

@randall77 randall77 modified the milestones: Go1.8, Go1.7 Apr 29, 2016

@quentinmit quentinmit added the NeedsFix label Oct 11, 2016

@rsc rsc modified the milestones: Go1.9, Go1.8 Oct 21, 2016

@josharian

This comment has been minimized.

Contributor

josharian commented May 17, 2017

Comparing 1.7.5, 1.8.1, and tip at c20e545 shows a mixed bag. Compile time is up, but memory use is down, and object file size is way down.

name \ time/op       17           18           tip
Pkg                   18.1s ± 3%   32.3s ± 3%   53.1s ± 2%

name \ user-time/op  17           18           tip
Pkg                   20.9s ± 4%   35.2s ± 3%   55.0s ± 2%

name \ alloc/op      17           18           tip
Pkg                  2.58GB ± 0%  4.14GB ± 0%  2.04GB ± 0%

name \ allocs/op     17           18           tip
Pkg                   6.34M ± 0%   6.99M ± 0%   6.21M ± 0%

name \ object-bytes  17           18           tip
Pkg                    252M ± 0%    193M ± 0%     28M ± 0%

name \ export-bytes  17           18           tip
Pkg                     176 ± 0%     181 ± 0%     208 ± 0%

I suspect that the memory usage decrease is due to CL 35114.

Compiling this file now spends a whopping 64% of its CPU time on line fuse.go:123, which is v.Block = c from:

	// move all of b's values to c.
	for _, v := range b.Values {
		v.Block = c
	}

Looks like fuseBlockPlain is accidentally quadratic. In a long series of plain blocks, every value gets moved along one at a time, block by block. I'll think about whether there's a low risk fix for that.

@gopherbot

This comment has been minimized.

gopherbot commented May 17, 2017

CL https://golang.org/cl/43570 mentions this issue.

@gopherbot

This comment has been minimized.

gopherbot commented May 17, 2017

CL https://golang.org/cl/43571 mentions this issue.

@josharian

This comment has been minimized.

Contributor

josharian commented May 17, 2017

There were actually two quadratic bits. CLs 43570 and 43571 contain fairly minimal fixes for them. Up to @randall77 and @cherrymui whether they want them for 1.9 or 1.10, and whether @cherrymui would like to do 43571 differently.

After those CLs, this code now compiles in a somewhat acceptable 8s, although memory usage is still high at 2gb. I don't see any obvious next fixes for the memory usage. The next likely source of speed-ups would be to fix #20393.

gopherbot pushed a commit that referenced this issue May 17, 2017

cmd/compile: fuse from end to beginning
fuseBlockPlain was accidentally quadratic.
If you had plain blocks b1 -> b2 -> b3 -> b4,
each containing single values v1, v2, v3, and v4 respectively,
fuseBlockPlain would move v1 from b1 to b2 to b3 to b4,
then v2 from b2 to b3 to b4, etc.

There are two obvious fixes.

* Look for runs of blocks in fuseBlockPlain
  and handle them in a single go.
* Fuse from end to beginning; any given value in a run
  of blocks to fuse then moves only once.

The latter is much simpler, so that's what this CL does.

Somewhat surprisingly, this change does not pass toolstash-check.

The resulting set of blocks is the same,
and the values in them are the same,
but the order of values in them differ,
and that order of values (while arbitrary)
is enough to change the compiler's output.
This may be due to #20178; deadstore is the next pass after fuse.

Adding basic sorting to the beginning of deadstore
is enough to make this CL pass toolstash-check:

	for _, b := range f.Blocks {
		obj.SortSlice(b.Values, func(i, j int) bool { return b.Values[i].ID < b.Values[j].ID })
	}

Happily, this CL appears to result in better code on average,
if only by accident. It cuts 4k off of cmd/go; go1 benchmarks
are noisy as always but don't regress (numbers below).

No impact on the standard compilebench benchmarks.
For the code in #13554, this speeds up compilation dramatically:

name  old time/op       new time/op       delta
Pkg         53.1s ± 2%        12.8s ± 3%  -75.92%  (p=0.008 n=5+5)

name  old user-time/op  new user-time/op  delta
Pkg         55.0s ± 2%        14.9s ± 3%  -73.00%  (p=0.008 n=5+5)

name  old alloc/op      new alloc/op      delta
Pkg        2.04GB ± 0%       2.04GB ± 0%   +0.18%  (p=0.008 n=5+5)

name  old allocs/op     new allocs/op     delta
Pkg         6.21M ± 0%        6.21M ± 0%     ~     (p=0.222 n=5+5)

name  old object-bytes  new object-bytes  delta
Pkg         28.4M ± 0%        28.4M ± 0%   +0.00%  (p=0.008 n=5+5)

name  old export-bytes  new export-bytes  delta
Pkg           208 ± 0%          208 ± 0%     ~     (all equal)


Updates #13554


go1 benchmarks:

name                     old time/op    new time/op    delta
BinaryTree17-8              2.29s ± 2%     2.26s ± 2%  -1.43%  (p=0.000 n=48+50)
Fannkuch11-8                2.74s ± 2%     2.79s ± 2%  +1.63%  (p=0.000 n=50+49)
FmtFprintfEmpty-8          36.6ns ± 3%    34.6ns ± 4%  -5.29%  (p=0.000 n=49+50)
FmtFprintfString-8         58.3ns ± 3%    59.1ns ± 3%  +1.35%  (p=0.000 n=50+49)
FmtFprintfInt-8            62.4ns ± 2%    63.2ns ± 3%  +1.19%  (p=0.000 n=49+49)
FmtFprintfIntInt-8         95.1ns ± 2%    96.7ns ± 3%  +1.61%  (p=0.000 n=49+50)
FmtFprintfPrefixedInt-8     118ns ± 3%     113ns ± 2%  -4.00%  (p=0.000 n=50+49)
FmtFprintfFloat-8           191ns ± 2%     192ns ± 2%  +0.40%  (p=0.034 n=50+50)
FmtManyArgs-8               419ns ± 2%     420ns ± 2%    ~     (p=0.228 n=49+49)
GobDecode-8                5.26ms ± 3%    5.19ms ± 2%  -1.33%  (p=0.000 n=50+49)
GobEncode-8                4.12ms ± 2%    4.15ms ± 3%  +0.68%  (p=0.007 n=49+50)
Gzip-8                      198ms ± 2%     197ms ± 2%  -0.50%  (p=0.018 n=48+48)
Gunzip-8                   31.9ms ± 3%    31.8ms ± 3%  -0.47%  (p=0.024 n=50+50)
HTTPClientServer-8         64.4µs ± 0%    64.0µs ± 0%  -0.55%  (p=0.000 n=43+46)
JSONEncode-8               10.6ms ± 2%    10.6ms ± 3%    ~     (p=0.543 n=49+49)
JSONDecode-8               43.3ms ± 3%    43.1ms ± 2%    ~     (p=0.079 n=50+50)
Mandelbrot200-8            3.70ms ± 2%    3.70ms ± 2%    ~     (p=0.553 n=47+50)
GoParse-8                  2.70ms ± 2%    2.71ms ± 3%    ~     (p=0.843 n=49+50)
RegexpMatchEasy0_32-8      70.5ns ± 4%    70.4ns ± 4%    ~     (p=0.867 n=48+50)
RegexpMatchEasy0_1K-8       162ns ± 3%     162ns ± 2%    ~     (p=0.739 n=48+48)
RegexpMatchEasy1_32-8      66.1ns ± 5%    66.2ns ± 4%    ~     (p=0.970 n=50+50)
RegexpMatchEasy1_1K-8       297ns ± 7%     296ns ± 7%    ~     (p=0.406 n=50+50)
RegexpMatchMedium_32-8      105ns ± 5%     105ns ± 5%    ~     (p=0.702 n=50+50)
RegexpMatchMedium_1K-8     32.3µs ± 4%    32.2µs ± 3%    ~     (p=0.614 n=49+49)
RegexpMatchHard_32-8       1.75µs ±18%    1.74µs ±12%    ~     (p=0.738 n=50+48)
RegexpMatchHard_1K-8       52.2µs ±14%    51.3µs ±13%    ~     (p=0.230 n=50+50)
Revcomp-8                   366ms ± 3%     367ms ± 3%    ~     (p=0.745 n=49+49)
Template-8                 48.5ms ± 4%    48.5ms ± 4%    ~     (p=0.824 n=50+48)
TimeParse-8                 263ns ± 2%     256ns ± 2%  -2.98%  (p=0.000 n=48+49)
TimeFormat-8                265ns ± 3%     262ns ± 3%  -1.35%  (p=0.000 n=48+49)
[Geo mean]                 41.1µs         40.9µs       -0.48%


Change-Id: Ib35fa15b54282abb39c077d150beee27f610891a
Reviewed-on: https://go-review.googlesource.com/43570
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>

gopherbot pushed a commit that referenced this issue May 18, 2017

cmd/compile: search for remaining WB ops from end to beginning
The writebarrier pass processes WB ops from beginning to end,
replacing them by other values.
But it also checks whether there are more ops to process
by walking from beginning to end.
This is quadratic, so walk from end to beginning instead.

This speeds up compiling the code in issue 13554:

name  old time/op       new time/op       delta
Pkg         11.9s ± 2%         8.3s ± 3%  -29.88%  (p=0.000 n=18+17)

Updates #13554

Passes toolstash-check.

Change-Id: I5f8a872ddc4b783540220d89ea2ee188a6d2b2ff
Reviewed-on: https://go-review.googlesource.com/43571
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>

@josharian josharian modified the milestones: Go1.10, Go1.9 May 29, 2017

@mdempsky mdempsky modified the milestones: Go1.10, Go1.11 Nov 30, 2017

@gopherbot

This comment has been minimized.

gopherbot commented May 1, 2018

Change https://golang.org/cl/110560 mentions this issue: cmd/compile: optimize bvec routines

gopherbot pushed a commit that referenced this issue May 1, 2018

cmd/compile: optimize bvec routines
The recent improvements to the prove pass
make it possible to provide bounds
hints to the compiler in some bvec routines.

This speeds up the compilation of the code in

name  old time/op       new time/op       delta
Pkg         7.93s ± 4%        7.69s ± 3%  -2.98%  (p=0.000 n=29+26)

While we're here, clean up some C-isms.

Updates #13554
Updates #20393

Change-Id: I47a0ec68543a9fc95c5359c3f37813fb529cb4f0
Reviewed-on: https://go-review.googlesource.com/110560
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>

@josharian josharian modified the milestones: Go1.11, Go1.12 May 1, 2018

@gopherbot

This comment has been minimized.

gopherbot commented Nov 27, 2018

Change https://golang.org/cl/151497 mentions this issue: cmd/compile: don't generate algorithms for slice literal backing store

@gopherbot

This comment has been minimized.

gopherbot commented Nov 27, 2018

Change https://golang.org/cl/151498 mentions this issue: cmd/compile: eliminate write barriers when writing non-heap ptrs

gopherbot pushed a commit that referenced this issue Nov 29, 2018

cmd/compile: eliminate write barriers when writing non-heap ptrs
We don't need a write barrier if:
1) The location we're writing to doesn't hold a heap pointer, and
2) The value we're writing isn't a heap pointer.

The freshly returned value from runtime.newobject satisfies (1).
Pointers to globals, and the contents of the read-only data section satisfy (2).

This is particularly helpful for code like:
p := []string{"abc", "def", "ghi"}

Where the compiler generates:
   a := new([3]string)
   move(a, statictmp_)  // eliminates write barriers here
   p := a[:]

For big slice literals, this makes the code a smaller and faster to
compile.

Update #13554. Reduces the compile time by ~10% and RSS by ~30%.

Change-Id: Icab81db7591c8777f68e5d528abd48c7e44c87eb
Reviewed-on: https://go-review.googlesource.com/c/151498
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment