New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: an FSM with ~40k states takes a lot of memory to build #19379

Open
opennota opened this Issue Mar 3, 2017 · 9 comments

Comments

Projects
None yet
7 participants
@opennota

opennota commented Mar 3, 2017

What version of Go are you using (go version)?

go version go1.8 linux/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/opennota/gocode"
GORACE=""
GOROOT="/home/opennota/go"
GOTOOLDIR="/home/opennota/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build723393668=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"

What did you do?

I tried to build x.go from here: https://github.com/opennota/x . It is a 3.3 MB / 230 kLOC file containing an autogenerated finite state machine with about 40k states.

What did you expect to see?

The build is completed in a matter of seconds (there's no cgo, after all).

What did you see instead?

The build takes a goodish while and slowly consumes all the available memory. Actually, despite having 8 GB of memory I wasn't able to build the package.

With Go 1.7.3 the memory usage is stable (not increasing), but again, the build is a way too slow.

@mvdan mvdan changed the title from cmd/go: an FSM with ~40k states takes a lot of memory to build to cmd/compile: an FSM with ~40k states takes a lot of memory to build Mar 3, 2017

@mvdan

This comment has been minimized.

Member

mvdan commented Mar 3, 2017

Perhaps dup of #14082?

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Mar 3, 2017

@josharian

This comment has been minimized.

Contributor

josharian commented Mar 3, 2017

Slow things here: phi insertion, deadcode elimination. Similar to #17926, but not quite a duplicate. Tip is about 5% better than 1.8 on both CPU and memory.

We could do a better job with switch statements, which would make a real difference here. My plan for them is to fix #19250, then move lowering of OSWITCH from the mid-tier to the AST->SSA conversion. That should prevent us from generating many pointless blocks (via labels and gotos), which will mean doing much less work here.

In the meantime, if you want to work around this, a good first step is: if your inner switch statements have dense enough cases, I recommend writing them as array lookups--st = arr[r-<offset>]. This should be both faster to compile and faster to run. (Yes, long term, the compiler could do this transformation for you. Even then, you'll get better compilation speeds by not making it do that work. And since the code is autogenerated, seems like you may as well.)

@josharian josharian added the ToolSpeed label Mar 3, 2017

@josharian josharian added this to the Go1.9Maybe milestone Mar 3, 2017

@josharian

This comment has been minimized.

Contributor

josharian commented Mar 4, 2017

Ugh. Every time I look at lowering switch statements differently I am reminded what a pain it will be. I found a simpler approach in the meantime that helps almost as much here. Forthcoming.

@gopherbot

This comment has been minimized.

gopherbot commented Mar 4, 2017

CL https://golang.org/cl/37760 mentions this issue.

gopherbot pushed a commit that referenced this issue Mar 6, 2017

cmd/compile: avoid generating some dead blocks
We generate a lot of pointless dead blocks
during the AST to SSA conversion.
There are a few commonly occurring kinds
of statements that contain neither variables
nor code and that switch to a new block themselves.
Stop making dead blocks for them.

For the code in #19379, this reduces compilation
wall time by 36% and max rss by 28%.

This also helps a little for regular code,
particularly code heavy on switch statements.

name       old time/op      new time/op      delta
Template        231ms ± 3%       230ms ± 5%    ~     (p=0.402 n=17+16)
Unicode         101ms ± 4%       103ms ± 5%    ~     (p=0.221 n=19+18)
GoTypes         635ms ± 5%       625ms ± 4%    ~     (p=0.063 n=20+18)
Compiler        2.93s ± 2%       2.89s ± 2%  -1.22%  (p=0.003 n=20+19)
SSA             4.53s ± 3%       4.52s ± 3%    ~     (p=0.380 n=20+19)
Flate           132ms ± 4%       133ms ± 5%    ~     (p=0.647 n=20+19)
GoParser        161ms ± 3%       161ms ± 4%    ~     (p=0.749 n=20+19)
Reflect         403ms ± 4%       397ms ± 3%  -1.53%  (p=0.030 n=20+19)
Tar             121ms ± 2%       121ms ± 8%    ~     (p=0.544 n=19+19)
XML             225ms ± 3%       224ms ± 4%    ~     (p=0.396 n=20+19)

name       old user-ns/op   new user-ns/op   delta
Template   302user-ms ± 1%  297user-ms ± 7%  -1.49%  (p=0.048 n=15+18)
Unicode    142user-ms ± 3%  143user-ms ± 5%    ~     (p=0.363 n=19+17)
GoTypes    852user-ms ± 5%  851user-ms ± 3%    ~     (p=0.851 n=20+18)
Compiler   4.11user-s ± 6%  3.98user-s ± 3%  -3.08%  (p=0.000 n=20+19)
SSA        6.91user-s ± 5%  6.82user-s ± 7%    ~     (p=0.113 n=20+19)
Flate      164user-ms ± 4%  168user-ms ± 4%  +2.42%  (p=0.001 n=18+19)
GoParser   207user-ms ± 4%  206user-ms ± 4%    ~     (p=0.176 n=20+18)
Reflect    509user-ms ± 4%  505user-ms ± 4%    ~     (p=0.113 n=20+19)
Tar        153user-ms ± 7%  151user-ms ± 9%    ~     (p=0.283 n=20+19)
XML        284user-ms ± 4%  282user-ms ± 4%    ~     (p=0.270 n=20+19)

name       old alloc/op     new alloc/op     delta
Template       42.6MB ± 0%      41.9MB ± 0%  -1.55%  (p=0.000 n=19+19)
Unicode        31.7MB ± 0%      31.7MB ± 0%    ~     (p=0.828 n=20+18)
GoTypes         124MB ± 0%       121MB ± 0%  -2.11%  (p=0.000 n=20+17)
Compiler        534MB ± 0%       523MB ± 0%  -2.06%  (p=0.000 n=20+19)
SSA             989MB ± 0%       977MB ± 0%  -1.28%  (p=0.000 n=20+19)
Flate          27.8MB ± 0%      27.5MB ± 0%  -0.98%  (p=0.000 n=20+19)
GoParser       34.3MB ± 0%      34.0MB ± 0%  -0.81%  (p=0.000 n=20+19)
Reflect        84.6MB ± 0%      82.9MB ± 0%  -2.00%  (p=0.000 n=17+18)
Tar            28.8MB ± 0%      28.3MB ± 0%  -1.52%  (p=0.000 n=16+18)
XML            47.2MB ± 0%      45.8MB ± 0%  -2.99%  (p=0.000 n=20+19)

name       old allocs/op    new allocs/op    delta
Template         421k ± 1%        419k ± 1%  -0.41%  (p=0.001 n=20+19)
Unicode          338k ± 1%        338k ± 1%    ~     (p=0.478 n=20+19)
GoTypes         1.28M ± 0%       1.28M ± 0%  -0.36%  (p=0.000 n=20+18)
Compiler        5.06M ± 0%       5.03M ± 0%  -0.63%  (p=0.000 n=20+19)
SSA             9.14M ± 0%       9.11M ± 0%  -0.34%  (p=0.000 n=20+19)
Flate            267k ± 1%        266k ± 1%    ~     (p=0.149 n=20+19)
GoParser         347k ± 0%        347k ± 1%    ~     (p=0.103 n=19+19)
Reflect         1.07M ± 0%       1.07M ± 0%  -0.42%  (p=0.000 n=16+18)
Tar              274k ± 0%        273k ± 1%    ~     (p=0.116 n=19+19)
XML              449k ± 0%        446k ± 1%  -0.60%  (p=0.000 n=20+19)

Updates #19379

Change-Id: Ie798c347a0c081f5e349e1529880bebaae290967
Reviewed-on: https://go-review.googlesource.com/37760
Reviewed-by: David Chase <drchase@google.com>
@josharian

This comment has been minimized.

Contributor

josharian commented May 18, 2017

Tip (85b2940) is better than 1.8, but still not great:

name  old time/op       new time/op       delta
Pkg          196s ± 3%         131s ± 1%  -33.40%  (p=0.008 n=5+5)

name  old user-time/op  new user-time/op  delta
Pkg          233s ± 5%         153s ± 3%  -34.23%  (p=0.008 n=5+5)

name  old alloc/op      new alloc/op      delta
Pkg        19.8GB ± 0%       12.5GB ± 0%  -36.80%  (p=0.008 n=5+5)

name  old allocs/op     new allocs/op     delta
Pkg         12.6M ± 0%        11.6M ± 0%   -7.97%  (p=0.008 n=5+5)

name  old object-bytes  new object-bytes  delta
Pkg         5.90M ± 0%        5.21M ± 0%  -11.66%  (p=0.008 n=5+5)

name  old export-bytes  new export-bytes  delta
Pkg          93.0 ± 0%         80.0 ± 0%  -13.98%  (p=0.008 n=5+5)

Moving to 1.10.

@josharian josharian modified the milestones: Go1.10, Go1.9Maybe May 18, 2017

@bradfitz bradfitz modified the milestones: Go1.10, Go1.11 Nov 29, 2017

@gopherbot gopherbot modified the milestones: Go1.11, Unplanned May 23, 2018

@agnivade

This comment has been minimized.

Member

agnivade commented Oct 1, 2018

@opennota - Can you show me how are you compiling the file ? There is no func main as I see. Is it with go tool compile ?

@opennota

This comment has been minimized.

opennota commented Oct 1, 2018

@agnivade go build x.go

@agnivade

This comment has been minimized.

Member

agnivade commented Oct 1, 2018

Ah thanks. So you are not generating any binary. Just tested with 1.11. With my 16GB RAM, it took nearly 11.5GB before finishing, taking nearly 5 mins approx.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment