New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: go1.7beta2 performance regression (floats and arrays) #16187

Open
taruti opened this Issue Jun 26, 2016 · 10 comments

Comments

Projects
None yet
@taruti
Contributor

taruti commented Jun 26, 2016

Please answer these questions before submitting your issue. Thanks!

  1. What version of Go are you using (go version)?
go version go1.6.2 linux/amd64
go version go1.7beta2 linux/amd64
  1. What operating system and processor architecture are you using (go env)?
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/taru/go"
GORACE=""
GOROOT="/usr/lib/go-1.6"
GOTOOLDIR="/usr/lib/go-1.6/pkg/tool/linux_amd64"
GO15VENDOREXPERIMENT="1"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0"
CXX="g++"
CGO_ENABLED="1"
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/taru/go"
GORACE=""
GOROOT="/tmp/go"
GOTOOLDIR="/tmp/go/pkg/tool/linux_amd64"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build453968611=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
  1. What did you do?

Try some float32 heavy code with arrays on 1.7beta2, saw a problem and reduced it to a self contained example:

go get github.com/taruti/float_degradation
cd ...
go test -test.bench=.

Or fancier
while date; do go test -test.bench=. >> /tmp/old.txt; GOROOT=/tmp/go /tmp/go/bin/go test -test.bench=. >> /tmp/new.txt; done

  1. What did you expect to see?

1.7beta2 producing as good or better code than 1.6

  1. What did you see instead?
benchcmp /tmp/old.txt /tmp/new.txt 
benchmark                 old ns/op     new ns/op     delta
BenchmarkOptimize16-2     33310269      41841404      +25.61%
BenchmarkOptimize16-2     33337835      41889202      +25.65%
BenchmarkOptimize16-2     33341432      42080975      +26.21%
BenchmarkOptimize16-2     33280481      41913719      +25.94%
BenchmarkOptimize16-2     33236073      41971975      +26.28%
BenchmarkOptimize16-2     33274075      41866159      +25.82%
@dgryski

This comment has been minimized.

Contributor

dgryski commented Jun 26, 2016

Related to #15925 ?

@taruti

This comment has been minimized.

Contributor

taruti commented Jun 26, 2016

At least related probably. -gcflags -ssa=0 restores 1.6 like performance.

I think that array zeroing is here not the fault, most time is spent in
linearBin that takes a *[16]float32 but does not zero any array.

This code is simplified from machine generated code.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jun 26, 2016

@ianlancetaylor ianlancetaylor added this to the Go1.8 milestone Jun 26, 2016

@randall77

This comment has been minimized.

Contributor

randall77 commented Jun 27, 2016

Looks like the register allocator is spilling some stuff to the stack that is unnecessary.
I don't really understand why it is doing that. Will fix for 1.8.

@rsc

This comment has been minimized.

Contributor

rsc commented Oct 21, 2016

ping @randall77

@gopherbot

This comment has been minimized.

gopherbot commented Oct 31, 2016

CL https://golang.org/cl/32442 mentions this issue.

@cherrymui cherrymui modified the milestones: Go1.9, Go1.8 Nov 11, 2016

gopherbot pushed a commit that referenced this issue Nov 18, 2016

cmd/compile: make a copy of Phi input if it is still live
Register of Phi input is allocated to the Phi. So if the Phi
input is still live after Phi, we may need to use a spill. In
this case, copy the Phi input to a spare register to avoid a
spill.

Originally targeted the code in issue #16187, and this CL
indeed removes the spill, but doesn't seem to help on benchmark
result. It may help in general, though.

On AMD64:
name                      old time/op    new time/op    delta
BinaryTree17-12              2.79s ± 1%     2.76s ± 0%  -1.33%  (p=0.000 n=10+10)
Fannkuch11-12                3.02s ± 0%     3.14s ± 0%  +3.99%  (p=0.000 n=10+10)
FmtFprintfEmpty-12          51.2ns ± 0%    51.4ns ± 3%    ~      (p=0.368 n=8+10)
FmtFprintfString-12          145ns ± 0%     144ns ± 0%  -0.69%    (p=0.000 n=6+9)
FmtFprintfInt-12             127ns ± 1%     124ns ± 1%  -2.79%   (p=0.000 n=10+9)
FmtFprintfIntInt-12          186ns ± 0%     184ns ± 0%  -1.34%   (p=0.000 n=10+9)
FmtFprintfPrefixedInt-12     196ns ± 0%     194ns ± 0%  -0.97%    (p=0.000 n=9+9)
FmtFprintfFloat-12           293ns ± 2%     287ns ± 0%  -2.00%   (p=0.000 n=10+9)
FmtManyArgs-12               847ns ± 1%     829ns ± 0%  -2.17%   (p=0.000 n=10+7)
GobDecode-12                7.17ms ± 0%    7.18ms ± 0%    ~     (p=0.123 n=10+10)
GobEncode-12                6.08ms ± 1%    6.08ms ± 0%    ~      (p=0.497 n=10+9)
Gzip-12                      277ms ± 1%     275ms ± 1%  -0.47%   (p=0.028 n=10+9)
Gunzip-12                   39.1ms ± 2%    38.2ms ± 1%  -2.20%   (p=0.000 n=10+9)
HTTPClientServer-12         90.9µs ± 4%    87.7µs ± 2%  -3.51%   (p=0.001 n=9+10)
JSONEncode-12               17.3ms ± 1%    16.5ms ± 0%  -5.02%    (p=0.000 n=9+9)
JSONDecode-12               54.6ms ± 1%    54.1ms ± 0%  -0.99%    (p=0.000 n=9+9)
Mandelbrot200-12            4.45ms ± 0%    4.45ms ± 0%  -0.02%    (p=0.006 n=8+9)
GoParse-12                  3.44ms ± 0%    3.48ms ± 1%  +0.95%  (p=0.000 n=10+10)
RegexpMatchEasy0_32-12      84.9ns ± 0%    85.0ns ± 0%    ~       (p=0.241 n=8+8)
RegexpMatchEasy0_1K-12       867ns ± 3%     915ns ±11%  +5.55%  (p=0.037 n=10+10)
RegexpMatchEasy1_32-12      82.7ns ± 5%    83.9ns ± 4%    ~      (p=0.161 n=9+10)
RegexpMatchEasy1_1K-12       361ns ± 1%     363ns ± 0%    ~      (p=0.098 n=10+8)
RegexpMatchMedium_32-12      126ns ± 0%     126ns ± 1%    ~      (p=0.549 n=8+10)
RegexpMatchMedium_1K-12     38.8µs ± 0%    39.1µs ± 0%  +0.67%    (p=0.000 n=9+8)
RegexpMatchHard_32-12       1.95µs ± 0%    1.96µs ± 0%  +0.43%    (p=0.000 n=9+9)
RegexpMatchHard_1K-12       59.0µs ± 0%    59.1µs ± 0%  +0.27%   (p=0.000 n=10+9)
Revcomp-12                   436ms ± 1%     431ms ± 1%  -1.19%  (p=0.005 n=10+10)
Template-12                 56.7ms ± 1%    57.1ms ± 1%  +0.71%   (p=0.001 n=10+9)
TimeParse-12                 312ns ± 0%     310ns ± 0%  -0.80%   (p=0.000 n=10+9)
TimeFormat-12                336ns ± 0%     332ns ± 0%  -1.19%    (p=0.000 n=8+7)
[Geo mean]                  59.2µs         58.9µs       -0.42%

On PPC64:
name                     old time/op    new time/op    delta
BinaryTree17-2              4.67s ± 2%     4.71s ± 1%    ~     (p=0.421 n=5+5)
Fannkuch11-2                3.92s ± 1%     3.94s ± 0%  +0.46%  (p=0.032 n=5+5)
FmtFprintfEmpty-2           122ns ± 0%     120ns ± 2%  -1.80%  (p=0.016 n=4+5)
FmtFprintfString-2          305ns ± 1%     299ns ± 1%  -1.84%  (p=0.008 n=5+5)
FmtFprintfInt-2             243ns ± 0%     241ns ± 1%  -0.66%  (p=0.016 n=4+5)
FmtFprintfIntInt-2          361ns ± 1%     356ns ± 1%  -1.49%  (p=0.016 n=5+5)
FmtFprintfPrefixedInt-2     355ns ± 1%     357ns ± 1%    ~     (p=0.333 n=5+5)
FmtFprintfFloat-2           502ns ± 2%     498ns ± 1%    ~     (p=0.151 n=5+5)
FmtManyArgs-2              1.55µs ± 2%    1.59µs ± 1%  +2.52%  (p=0.008 n=5+5)
GobDecode-2                13.0ms ± 1%    13.0ms ± 1%    ~     (p=0.841 n=5+5)
GobEncode-2                11.8ms ± 1%    11.8ms ± 1%    ~     (p=0.690 n=5+5)
Gzip-2                      499ms ± 1%     503ms ± 0%    ~     (p=0.421 n=5+5)
Gunzip-2                   86.5ms ± 0%    86.4ms ± 1%    ~     (p=0.841 n=5+5)
HTTPClientServer-2         68.2µs ± 2%    69.6µs ± 3%    ~     (p=0.151 n=5+5)
JSONEncode-2               39.0ms ± 1%    37.2ms ± 1%  -4.65%  (p=0.008 n=5+5)
JSONDecode-2                122ms ± 1%     126ms ± 1%  +2.63%  (p=0.008 n=5+5)
Mandelbrot200-2            6.08ms ± 1%    5.89ms ± 1%  -3.06%  (p=0.008 n=5+5)
GoParse-2                  5.95ms ± 2%    5.98ms ± 1%    ~     (p=0.421 n=5+5)
RegexpMatchEasy0_32-2       331ns ± 1%     328ns ± 1%    ~     (p=0.056 n=5+5)
RegexpMatchEasy0_1K-2      1.45µs ± 0%    1.47µs ± 0%  +1.13%  (p=0.008 n=5+5)
RegexpMatchEasy1_32-2       359ns ± 0%     353ns ± 0%  -1.84%  (p=0.008 n=5+5)
RegexpMatchEasy1_1K-2      1.79µs ± 0%    1.81µs ± 1%  +1.16%  (p=0.008 n=5+5)
RegexpMatchMedium_32-2      420ns ± 2%     413ns ± 0%  -1.72%  (p=0.008 n=5+5)
RegexpMatchMedium_1K-2     70.2µs ± 1%    69.5µs ± 1%  -1.09%  (p=0.032 n=5+5)
RegexpMatchHard_32-2       3.87µs ± 1%    3.65µs ± 0%  -5.86%  (p=0.008 n=5+5)
RegexpMatchHard_1K-2        111µs ± 0%     105µs ± 0%  -5.49%  (p=0.016 n=5+4)
Revcomp-2                   1.00s ± 1%     1.01s ± 2%    ~     (p=0.151 n=5+5)
Template-2                  113ms ± 1%     113ms ± 2%    ~     (p=0.841 n=5+5)
TimeParse-2                 555ns ± 0%     550ns ± 1%  -0.87%  (p=0.032 n=5+5)
TimeFormat-2                736ns ± 1%     704ns ± 1%  -4.35%  (p=0.008 n=5+5)
[Geo mean]                  120µs          119µs       -0.77%

Reduce "spilled value remains" by 0.6% in cmd/go on AMD64.

Change-Id: If655df343b0f30d1a49ab1ab644f10c698b96f3e
Reviewed-on: https://go-review.googlesource.com/32442
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
@odeke-em

This comment has been minimized.

Member

odeke-em commented Nov 23, 2016

@randall77 would you consider this a duplicate, or perhaps superset of #16141? Both deal with floats, regressions and approximately same title.

@randall77

This comment has been minimized.

Contributor

randall77 commented Nov 23, 2016

I would not be surprised if they had the same underlying cause (partial SSE register stalls). Not sure though.

@josharian

This comment has been minimized.

Contributor

josharian commented May 18, 2017

Reproduced with tip at 638ebb0. Better than 1.7 and 1.8, but not 25% better.

name \ time/op  17           18           tip
Optimize16-8    21.9ms ± 3%  22.5ms ± 2%  21.2ms ± 1%

@randall77 randall77 modified the milestones: Go1.10, Go1.9 May 31, 2017

@mdempsky mdempsky modified the milestones: Go1.10, Go1.11 Nov 30, 2017

@mdempsky

This comment has been minimized.

Member

mdempsky commented Nov 30, 2017

Still repros for me: BenchmarkOptimize16-4 is about 28% slower at 1.10 than 1.6 on my workstation.

@bradfitz bradfitz modified the milestones: Go1.11, Unplanned Jun 19, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment