cmd/compile: optimization of constant pool on arm #19844

Closed
benshi001 opened this issue Apr 5, 2017 · 33 comments

6 participants
@benshi001
Member

commented Apr 5, 2017

For the following code

func ass12345678(a int) int {
    return a + 0xffff
}

Currently Go stores 0xffff in the constant pool and loads it at run time:
a.go:6 0x95a04 e59fb06c MOVW 0x6c(R15), R11
a.go:6 0x95a08 e080000b ADD R11, R0, R0
.................................
a.go:9 0x95a78 0000ffff STRD.EQ [R0], -PC, R15, R15

But gcc optimizes it to
a = a + 0x10000
a = a - 1

Both 1 and 0x10000 can be encoded directly in the instructions as 12-bit immediates ($immediate-12), without any memory access.
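
To make the encoding rule concrete, here is a minimal sketch in Go. It assumes the classic 32-bit ARM data-processing immediate, an 8-bit value rotated right by an even amount. The helper name isARMImmediate is hypothetical and is not taken from the Go toolchain.

```go
package main

import (
	"fmt"
	"math/bits"
)

// isARMImmediate reports whether v fits the 32-bit ARM data-processing
// immediate form: an 8-bit value rotated right by an even number of bits.
// Hypothetical helper for illustration; not the toolchain's own code.
func isARMImmediate(v uint32) bool {
	for rot := 0; rot < 32; rot += 2 {
		// Rotating left by rot undoes a rotate-right by rot.
		if bits.RotateLeft32(v, rot)&^0xff == 0 {
			return true
		}
	}
	return false
}

func main() {
	for _, c := range []uint32{0xffff, 0x10000, 1} {
		fmt.Printf("%#x encodable: %v\n", c, isARMImmediate(c))
	}
}
```

Under this rule 0xffff is rejected because its 16 set bits cannot fit in any 8-bit window, while 0x10000 and 1 are both accepted.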

@benshi001 benshi001 changed the title Opimization of constant pool cmd/internal/obj/arm: Opimization of constant pool Apr 5, 2017

@benshi001 benshi001 changed the title cmd/internal/obj/arm: Opimization of constant pool cmd/compile: Opimization of constant pool Apr 5, 2017

@bradfitz bradfitz added the Performance label Apr 5, 2017

@bradfitz bradfitz added this to the Unplanned milestone Apr 5, 2017

@bradfitz bradfitz changed the title cmd/compile: Opimization of constant pool cmd/compile: optimization of constant pool on arm Apr 5, 2017

@benshi001

Member Author

commented Apr 5, 2017

Also,

a | 0xffff ->
orr $0xff, a, a
orr $0xff00, a, a

a & 0xffff0 ->
bic $0xf000000f, a, a
bic $0x0ff00000, a, a
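
The two rewrites above can be sanity-checked in plain Go, where BIC corresponds to the &^ (and-not) operator. This is only a sketch on 32-bit values; the function names orSplit and andSplit are made up for the example.

```go
package main

import "fmt"

// orSplit computes a | 0xffff as two ORs whose constants each fit an
// 8-bit rotated immediate.
func orSplit(a uint32) uint32 {
	a |= 0xff   // orr $0xff, a, a
	a |= 0xff00 // orr $0xff00, a, a
	return a
}

// andSplit computes a & 0xffff0 as two bit-clear operations.
func andSplit(a uint32) uint32 {
	a &^= 0xf000000f // bic $0xf000000f, a, a
	a &^= 0x0ff00000 // bic $0x0ff00000, a, a
	return a
}

func main() {
	for _, a := range []uint32{0, 0x12345678, 0xdeadbeef, 0xffffffff} {
		fmt.Println(orSplit(a) == a|0xffff, andSplit(a) == a&0xffff0)
	}
}
```

The two BIC masks together cover 0xfff0000f, which is the complement of 0xffff0, so clearing them leaves exactly the bits that the AND would keep.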

@gopherbot

commented Apr 5, 2017

CL https://golang.org/cl/39552 mentions this issue.

@bradfitz bradfitz modified the milestones: Go1.9Maybe, Unplanned Apr 5, 2017

@benshi001

Member Author

commented Apr 6, 2017

@benshi001

Member Author

commented Apr 6, 2017

The attachment above is a rough test. The output log on my Raspberry Pi 2 is
test0(constant pool) cost 29 seconds, test1(imm12) cost 20 seconds

This means that 17179869120 ldr/add pairs take 29 seconds, while 17179869120 add/add pairs take 20 seconds, which is about 31% less time (a roughly 1.45x speedup).

However, in this test the constant pool stays in the data cache. If it did not, the cache misses would make the load version even slower.

@bradfitz

Member

commented Apr 6, 2017

@josharian, can you point @benshi001 to directions on how to run compiler benchmark tests?

@benshi001, you want to make pretty commit messages using https://godoc.org/golang.org/x/perf/cmd/benchstat like this one 50688fc

@josharian

Contributor

commented Apr 6, 2017

The time has come for me to clean up compiler benchmarking a bit. Let me do that first.

@josharian

Contributor

commented Apr 6, 2017

Not done, but cleaned up enough for the moment. Do:

go get -u golang.org/x/tools/cmd/compilebench github.com/josharian/compilecmp golang.org/x/perf/cmd/benchstat

Make sure all resulting binaries are in your $PATH.

Commit your work. For memory benchmarking:

compilecmp -n 10

For execution time benchmarking (with everything else closed):

compilecmp -n 50 -cpu

These will compare master to HEAD. compilecmp supports lots of other variations, run with -h for more. Ask if you have questions or feature requests.

@benshi001

Member Author

commented Apr 7, 2017

I had trouble running the benchmarks that @josharian suggested, because of my network security policy.

I connected over ssh to a remote host located in the USA and ran

git clone https://go.googlesource.com/go
git fetch https://go.googlesource.com/go refs/changes/52/39552/2 && git checkout FETCH_HEAD
GOPATH=/root/gopath go get -u golang.org/x/tools/cmd/compilebench
GOPATH=/root/gopath go get -u github.com/josharian/compilecmp
GOPATH=/root/gopath go get -u golang.org/x/perf/cmd/benchstat
PATH=/root/gopath/bin:$PATH
compilecmp -n 10

Then I got the expected result on the remote host via the ssh terminal.

I copied /root/go and /root/gopath from the remote host to my local Raspberry Pi 2:
/root/go -> /home/pi/go
/root/gopath -> /home/pi/gopath

Then I ran
GOPATH=/home/pi/gopath go build golang.org/x/tools/cmd/compilebench
GOPATH=/home/pi/gopath go build github.com/josharian/compilecmp
GOPATH=/home/pi/gopath go build golang.org/x/perf/cmd/benchstat

Then the tools were built as ARM ELF binaries.

But when I ran compilecmp -n 10, I only got part of the results:
compilecmp master HEAD
06:48:59 copy tree at master ( 19bd145 ) to /home/pi/.compilecmp/19bd145d0721a28658b15deb548f22a3405d83bd
06:49:03 /home/pi/.compilecmp/19bd145d0721a28658b15deb548f22a3405d83bd/src/make.bash
07:00:39 copy tree at HEAD ( 0f679e481a7cbcb0fbb930670581fa57cd027eee ) to /home/pi/.compilecmp/0f679e481a7cbcb0fbb930670581fa57cd027eee
07:00:57 /home/pi/.compilecmp/0f679e481a7cbcb0fbb930670581fa57cd027eee/src/make.bash
before: /home/pi/.compilecmp/19bd145d0721a28658b15deb548f22a3405d83bd
after: /home/pi/.compilecmp/0f679e481a7cbcb0fbb930670581fa57cd027eee
benchstat /tmp/590206338 /tmp/787780345
completed 10 of 10, estimated time remaining 0s (eta 7:13AM)
name old text-bytes new text-bytes delta
HelloSize 595k ± 0% 594k ± 0% -0.07% (p=0.000 n=10+10)

name old data-bytes new data-bytes delta
HelloSize 3.59k ± 0% 3.59k ± 0% ~ (all equal)

name old bss-bytes new bss-bytes delta
HelloSize 75.4k ± 0% 75.4k ± 0% ~ (all equal)

name old exe-bytes new exe-bytes delta
HelloSize 1.03M ± 0% 1.03M ± 0% ~ (all equal)

The other tables, such as
name old user-ns/op new user-ns/op delta
Template 488M ±11% 488M ± 6% ~ (p=0.920 n=10+9)
Unicode 249M ±10% 242M ±12% ~ (p=0.304 n=10+10)
GoTypes 1.45G ± 5% 1.48G ± 6% ~ (p=0.342 n=10+10)
Flate 216M ±15% 223M ±10% ~ (p=0.201 n=9+10)
GoParser 380M ± 9% 381M ± 8% ~ (p=0.762 n=10+9)
Reflect 603M ± 5% 599M ± 7% ~ (p=0.919 n=9+10)
Tar 258M ±16% 263M ± 6% ~ (p=0.752 n=10+10)
XML 502M ± 8% 522M ± 7% ~ (p=0.106 n=10+9)

are missing. What is wrong with my steps?

@benshi001

Member Author

commented Apr 7, 2017

The network security policy forbids my Raspberry Pi board from accessing the internet, so I have to transfer everything via tar cfz / scp / tar xfz.

@benshi001

Member Author

commented Apr 7, 2017

Does any of the three tools need internet access when running? I do not think there is any difference between my board and the remote host.

@josharian

Contributor

commented Apr 7, 2017

Does any of the three tools need internet access when running?

No. I suspect that the problem is that compilecmp uses \r to update a running estimate of when the task will be done, and that your terminal didn't like it. But (unless the tmp dir has been emptied), the results are still there. Just manually run:

benchstat /tmp/590206338 /tmp/787780345

You could also tweak compilecmp to remove the live update. If that is in fact the problem, I'd be happy to either add a flag to remove it or (better) do some terminal sniffing to decide when not to use it. (Suggestions welcome on the latter front.)

@benshi001

Member Author

commented Apr 10, 2017

I encountered two issues when running the benchmark test, and neither was related to the '\r'.

  1. can't create compilebench.o: open compilebench.o: permission denied
  2. bexport.go:94:2: can't find import: "cmd/compile/internal/big"

The first one can be fixed by changing line 234 of src/golang.org/x/tools/cmd/compilebench/main.go
from
args := []string{"-o", "compilebench.o"}
to
args := []string{"-o", "/tmp/compilebench.o"}

BTW: I am using Go 1.6.2 as the bootstrap toolchain and to build golang.org/x/tools/cmd/compilebench.
log.zip

However, I did get the benchmark result. Thank you.

Here is the log (/tmp/185436984)
can't create compilebench.o: open compilebench.o: permission denied
can't create compilebench.o: open compilebench.o: permission denied
can't create compilebench.o: open compilebench.o: permission denied
bexport.go:94:2: can't find import: "cmd/compile/internal/big"
compilebench: cannot find package "cmd/compile/internal/ssa" in any of:
/usr/lib/go-1.6/src/cmd/compile/internal/ssa (from $GOROOT)
($GOPATH not set)
can't create compilebench.o: open compilebench.o: permission denied
can't create compilebench.o: open compilebench.o: permission denied
can't create compilebench.o: open compilebench.o: permission denied
can't create compilebench.o: open compilebench.o: permission denied
can't create compilebench.o: open compilebench.o: permission denied
BenchmarkHelloSize 1 688575 text-bytes 5808 data-bytes 134376 bss-bytes 1058816 exe-bytes
can't create compilebench.o: open compilebench.o: permission denied
can't create compilebench.o: open compilebench.o: permission denied
can't create compilebench.o: open compilebench.o: permission denied
bexport.go:94:2: can't find import: "cmd/compile/internal/big"
compilebench: cannot find package "cmd/compile/internal/ssa" in any of:
/usr/lib/go-1.6/src/cmd/compile/internal/ssa (from $GOROOT)
($GOPATH not set)
can't create compilebench.o: open compilebench.o: permission denied
can't create compilebench.o: open compilebench.o: permission denied
can't create compilebench.o: open compilebench.o: permission denied
can't create compilebench.o: open compilebench.o: permission denied
can't create compilebench.o: open compilebench.o: permission denied
BenchmarkHelloSize 1 688575 text-bytes 5808 data-bytes 134376 bss-bytes 1058816 exe-bytes

@benshi001

Member Author

commented Apr 10, 2017

Go 1.6.2 is installed in /usr/lib/go-1.6, which requires root privileges to write to.

@benshi001

Member Author

commented Apr 10, 2017

Can I treat the following output as "there is an improvement after applying my patch CL 39552"?

pi@raspberrypi:~/pending/go/src $ compilecmp -n 20 -cpu

name old time/op new time/op delta
Template 2.42s ± 2% 2.41s ± 2% ~ (p=0.301 n=20+20)
Unicode 1.31s ± 6% 1.32s ± 4% ~ (p=0.369 n=20+20)
GoTypes 7.92s ± 2% 7.92s ± 1% ~ (p=0.813 n=20+19)
SSA 58.6s ± 3% 58.1s ± 3% ~ (p=0.068 n=20+20)
Flate 1.52s ± 3% 1.52s ± 2% ~ (p=0.647 n=20+19)
GoParser 1.88s ± 2% 1.88s ± 3% ~ (p=0.813 n=20+19)
Reflect 5.29s ± 1% 5.30s ± 2% ~ (p=0.258 n=19+20)
Tar 1.49s ± 5% 1.49s ± 4% ~ (p=0.925 n=20+20)
XML 2.67s ± 2% 2.67s ± 2% ~ (p=0.738 n=20+20)

name old user-ns/op new user-ns/op delta
Template 2.95G ± 2% 2.95G ± 2% ~ (p=0.515 n=19+20)
Unicode 1.60G ± 4% 1.59G ± 3% ~ (p=0.381 n=20+19)
GoTypes 9.49G ± 2% 9.50G ± 1% ~ (p=0.515 n=20+20)
SSA 74.1G ± 1% 73.6G ± 2% -0.65% (p=0.011 n=18+19)
Flate 1.74G ± 3% 1.74G ± 3% ~ (p=0.928 n=19+20)
GoParser 2.23G ± 3% 2.24G ± 4% ~ (p=0.479 n=20+20)
Reflect 6.25G ± 2% 6.25G ± 1% ~ (p=0.884 n=20+19)
Tar 1.85G ± 4% 1.85G ± 4% ~ (p=0.652 n=20+20)
XML 3.15G ± 3% 3.15G ± 2% ~ (p=0.856 n=20+20)

name old text-bytes new text-bytes delta
HelloSize 595k ± 0% 594k ± 0% -0.07% (p=0.000 n=20+20)

name old data-bytes new data-bytes delta
HelloSize 3.59k ± 0% 3.59k ± 0% ~ (all equal)

name old bss-bytes new bss-bytes delta
HelloSize 75.4k ± 0% 75.4k ± 0% ~ (all equal)

name old exe-bytes new exe-bytes delta
HelloSize 1.03M ± 0% 1.03M ± 0% ~ (all equal)

@benshi001

Member Author

commented Apr 10, 2017

I also attached the log to CL 39552's commit message. I will try
"compilecmp -n 50 -cpu" tomorrow.

@ALTree

Member

commented Apr 10, 2017

I doubt -n 50 will help; those p-values are quite high.

But it doesn't really matter, does it? Correct me if I'm wrong, but the goal of your CL was to make the generated code faster, not to make the compiler faster. So now you've verified that your CL does not make the compiler slower (in fact, it's slightly faster now), but you still have to write benchmarks that show that the code we generate on ARM executes faster.

@bradfitz Am I missing something? Why are we making OP run the compiler benchmarks? Were you expecting the change to slow down the compiler?

@benshi001

Member Author

commented Apr 10, 2017

@dr2chase

Contributor

commented Apr 10, 2017

For benchmarks, you can use some of the go1 benchmarks as a model:
https://github.com/golang/go/blob/master/test/bench/go1/binarytree_test.go
The file name must end in _test.go, and I think the benchmark function name must start with Benchmark.

Your benchmark program should contain a loop that runs b.N times so that the benchmark harness can determine an appropriate run count:

func BenchmarkBinaryTree17(b *testing.B) {
    for i := 0; i < b.N; i++ {
        binarytree(17)
    }
}

Compile-and-test is go test -bench Benchmark -count 10, where "Benchmark" is a regular expression to match tests/benchmarks ("Benchmark" matches all the benchmarks, probably "B" would work just as well).

To compile into a binary, so that you can examine the generated code or rerun it later for testing against a different version, use go test -c . (the dot is the package path).
Run the resulting test binary with (for example) ./go1.test -test.bench Benchmark -test.count 10.

We use benchstat (go get rsc.io/benchstat) to compare before and after results. If you save the output of two runs, you can compare them with

benchstat -geomean before.log after.log

I tend to run the benchmarks for a count of 25 to be sure that I get good numbers, and of course the more you can reduce changes in machine performance during a benchmark run, the better, but benchstat will help mitigate that somewhat or at least let you know that you have a problem.
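
Applied to this issue, a benchmark in that shape might look like the sketch below. The names BenchmarkAddConst, addConst and sink are hypothetical, and this is not the BreakImmediate benchmark from the CL, whose body is not shown in this thread. It is written as a standalone program that uses testing.Benchmark so it can run without go test; in a real _test.go file only the benchmark function would be needed.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the result alive so the compiler cannot discard the loop body.
var sink int

// addConst mirrors the function from the top of this issue. The noinline
// directive keeps it from being inlined into the benchmark loop.
//
//go:noinline
func addConst(a int) int {
	return a + 0xffff
}

// BenchmarkAddConst follows the b.N loop shape described above.
func BenchmarkAddConst(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = addConst(sink)
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	fmt.Println(testing.Benchmark(BenchmarkAddConst))
}
```

Feeding each result back into the next call is a deliberate choice: it creates a dependency chain, so the measured time reflects the latency of the add sequence rather than how many independent adds the CPU can overlap.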

@ALTree

Member

commented Apr 10, 2017

@benshi001 if you think your change will benefit ARM code in general, you can start from the go1 benchmark suite mentioned by @dr2chase, i.e. do:

$ cd go/test/bench/go1
$ go test -bench=. -count 20 > old.txt

from the current master, then again from your patch:

$ go test -bench=. -count 20 > new.txt

and then run benchstat old.txt new.txt to compare them.

benchstat is at golang.org/x/perf/cmd/benchstat (the rsc.io/benchstat repository is old and deprecated).

Otherwise (if you see no change on the go1 benchmark suite) you'll have to write a Go function that your CL makes faster and benchmark it as @dr2chase explained.

The archive you posted is not a valid benchmark because it's C and assembly code. Benchmark improvements need to be measured on Go code.

@benshi001

Member Author

commented Apr 11, 2017

Thank you all for your help. I will try it later.

@benshi001

Member Author

commented Apr 12, 2017

With patch set 5 of CL 39552, I ran the go1 benchmark suite. Here are my steps and results.

Steps:

  1. go test -bench=. -count 20 -timeout 60000s > ~/old.txt
  2. apply patch set 5 of CL 39552
  3. go test -bench=. -count 20 -timeout 60000s > ~/new.txt
  4. ~/gopath/bin/benchstat ~/old.txt ~/new.txt

How should I interpret the following results?

name old time/op new time/op delta
BreakImmediate-4 599ns ± 2% 435ns ± 0% -27.31% (p=0.000 n=20+17)
Fannkuch11-4 25.0s ± 0% 25.0s ± 0% +0.14% (p=0.012 n=19+20)
FmtFprintfEmpty-4 895ns ± 2% 893ns ± 0% ~ (p=0.579 n=20+17)
FmtFprintfString-4 1.51µs ± 2% 1.48µs ± 2% -2.16% (p=0.000 n=17+20)
FmtFprintfInt-4 1.50µs ± 2% 1.51µs ± 1% +0.57% (p=0.004 n=18+18)
FmtFprintfIntInt-4 2.19µs ± 1% 2.18µs ± 1% ~ (p=0.292 n=20+17)
FmtFprintfPrefixedInt-4 2.52µs ± 1% 2.52µs ± 0% +0.17% (p=0.010 n=16+20)
FmtFprintfFloat-4 4.60µs ± 1% 4.55µs ± 0% -0.93% (p=0.000 n=19+20)
FmtManyArgs-4 9.02µs ± 1% 8.92µs ± 1% -1.11% (p=0.000 n=20+17)
GobDecode-4 106ms ± 4% 107ms ± 3% +1.00% (p=0.008 n=20+20)
GobEncode-4 91.2ms ± 1% 91.3ms ± 1% ~ (p=0.461 n=19+20)
Gzip-4 4.29s ± 1% 4.30s ± 1% ~ (p=0.355 n=20+20)
Gunzip-4 611ms ± 1% 611ms ± 1% ~ (p=0.301 n=16+19)
HTTPClientServer-4 669µs ± 3% 665µs ± 3% ~ (p=0.277 n=20+20)
JSONEncode-4 284ms ± 2% 282ms ± 1% ~ (p=0.102 n=20+20)
JSONDecode-4 936ms ± 2% 940ms ± 1% ~ (p=0.079 n=20+19)
Mandelbrot200-4 49.3ms ± 0% 49.3ms ± 0% -0.06% (p=0.030 n=20+18)
GoParse-4 44.8ms ± 1% 45.1ms ± 1% +0.61% (p=0.002 n=16+16)
RegexpMatchEasy0_32-4 1.29µs ± 0% 1.31µs ± 1% +1.82% (p=0.000 n=13+18)
RegexpMatchEasy0_1K-4 7.64µs ± 4% 7.67µs ± 5% ~ (p=0.642 n=20+19)
RegexpMatchEasy1_32-4 1.34µs ± 1% 1.33µs ± 1% -0.62% (p=0.000 n=18+18)
RegexpMatchEasy1_1K-4 10.3µs ± 4% 10.4µs ± 4% ~ (p=0.251 n=20+20)
RegexpMatchMedium_32-4 2.09µs ± 0% 2.12µs ± 1% +1.39% (p=0.000 n=12+20)
RegexpMatchMedium_1K-4 532µs ± 0% 534µs ± 1% +0.41% (p=0.001 n=16+18)
RegexpMatchHard_32-4 29.5µs ± 1% 29.8µs ± 0% +0.91% (p=0.000 n=17+19)
RegexpMatchHard_1K-4 889µs ± 2% 895µs ± 0% +0.60% (p=0.003 n=19+16)
Revcomp-4 84.9ms ± 2% 85.3ms ± 2% ~ (p=0.141 n=20+19)
Template-4 1.07s ± 3% 1.04s ± 2% -1.91% (p=0.000 n=18+20)
TimeParse-4 7.12µs ± 2% 7.19µs ± 2% +0.98% (p=0.001 n=19+19)
TimeFormat-4 13.5µs ± 0% 13.5µs ± 1% ~ (p=0.143 n=18+20)

name old speed new speed delta
GobDecode-4 7.22MB/s ± 4% 7.15MB/s ± 3% -0.87% (p=0.014 n=20+19)
GobEncode-4 8.42MB/s ± 1% 8.40MB/s ± 1% ~ (p=0.439 n=19+20)
Gzip-4 4.52MB/s ± 1% 4.51MB/s ± 1% ~ (p=0.211 n=20+20)
Gunzip-4 31.7MB/s ± 1% 31.8MB/s ± 1% ~ (p=0.295 n=16+19)
JSONEncode-4 6.83MB/s ± 2% 6.88MB/s ± 1% ~ (p=0.097 n=20+20)
JSONDecode-4 2.07MB/s ± 2% 2.07MB/s ± 1% -0.45% (p=0.040 n=20+19)
GoParse-4 1.29MB/s ± 1% 1.28MB/s ± 0% -0.71% (p=0.000 n=12+12)
RegexpMatchEasy0_32-4 24.8MB/s ± 0% 24.4MB/s ± 1% -1.77% (p=0.000 n=13+18)
RegexpMatchEasy0_1K-4 134MB/s ± 4% 133MB/s ± 5% ~ (p=0.474 n=20+20)
RegexpMatchEasy1_32-4 23.9MB/s ± 1% 24.1MB/s ± 1% +0.63% (p=0.000 n=18+18)
RegexpMatchEasy1_1K-4 100MB/s ± 4% 99MB/s ± 4% ~ (p=0.250 n=20+20)
RegexpMatchMedium_32-4 480kB/s ± 0% 470kB/s ± 0% -2.08% (p=0.000 n=16+18)
RegexpMatchMedium_1K-4 1.93MB/s ± 0% 1.92MB/s ± 1% -0.50% (p=0.000 n=16+18)
RegexpMatchHard_32-4 1.09MB/s ± 1% 1.07MB/s ± 1% -1.18% (p=0.000 n=17+19)
RegexpMatchHard_1K-4 1.15MB/s ± 2% 1.14MB/s ± 1% -0.73% (p=0.003 n=19+16)
Revcomp-4 29.9MB/s ± 2% 29.8MB/s ± 2% ~ (p=0.139 n=20+19)
Template-4 1.82MB/s ± 4% 1.86MB/s ± 3% +2.29% (p=0.000 n=20+20)

@benshi001

Member Author

commented Apr 12, 2017

I also attached the results to commit message of CL 39552.

@ALTree

Member

commented Apr 12, 2017

Mixed results I'd say? You made 5 benchmarks slightly faster, but 10 are slightly slower (and a dozen are unchanged). You can also pass the -geomean flag to benchstat to make it print a line that summarizes the average effect.

Also please change the commit message in the CL to make clear that BreakImmediate was not part of the go1 benchmark suite. E.g. show the effects on go1 bench suite and then add that you also committed a specific Benchmark function BreakImmediate that shows a 30% improvement with your CL.

At this point I'd say the effects of the change are documented, so you'll have to wait for the opinion of whoever reviews the change.

@dr2chase

Contributor

commented Apr 12, 2017

Helpful to include the -geomean option (why is it not the default? oh well) so you get the aggregate change.

In practice, for other optimizations, we look for a geomean improvement, and we look at the outliers, and we look skeptically at the inner loops of some of the frequent offenders in benchmark noise (Revcomp in particular).

@benshi001

Member Author

commented Apr 14, 2017

I updated my patch, did a new benchmark test, and attached the result in the commit message of patch set 7 of CL 39552 (https://go-review.googlesource.com/?polygerrit=0#/c/39552/).

How should I interpret "[Geo mean] 410µs 586µs +42.99%"? Is that a very bad result?

@ALTree

Member

commented Apr 14, 2017

That geomean can't be correct: either there's something wrong with the files you passed to benchstat or there's a bug in the computation of the geomean. Most of the benchmarks got faster and the worst one is +2.65%, so the geomean can't be +40%.

@benshi001

Member Author

commented Apr 17, 2017

Even when the go1 benchmark suite is run twice with the same Go build, the results differ.

name old time/op new time/op delta
BinaryTree17-4 42.2s ± 1% 42.5s ± 1% +0.61% (p=0.000 n=30+29)
Fannkuch11-4 23.8s ± 1% 23.8s ± 1% ~ (p=0.523 n=30+30)
FmtFprintfEmpty-4 883ns ± 1% 887ns ± 2% ~ (p=0.948 n=27+30)
FmtFprintfString-4 1.47µs ± 0% 1.44µs ± 1% -2.44% (p=0.000 n=28+27)
FmtFprintfInt-4 1.51µs ± 2% 1.49µs ± 1% -1.24% (p=0.000 n=30+27)
FmtFprintfIntInt-4 2.26µs ± 2% 2.25µs ± 1% -0.54% (p=0.003 n=30+30)
FmtFprintfPrefixedInt-4 2.59µs ± 0% 2.63µs ± 0% +1.21% (p=0.000 n=17+20)
FmtFprintfFloat-4 4.49µs ± 0% 4.52µs ± 1% +0.53% (p=0.000 n=24+26)
FmtManyArgs-4 8.78µs ± 1% 8.74µs ± 1% -0.44% (p=0.000 n=27+28)
GobDecode-4 103ms ± 1% 103ms ± 1% ~ (p=0.307 n=24+29)
GobEncode-4 89.8ms ± 2% 89.9ms ± 1% ~ (p=0.482 n=30+28)
Gzip-4 4.23s ± 1% 4.22s ± 2% ~ (p=0.124 n=30+29)
Gunzip-4 608ms ± 2% 605ms ± 1% ~ (p=0.147 n=27+27)
HTTPClientServer-4 729µs ± 3% 707µs ± 3% -2.94% (p=0.000 n=30+29)
JSONEncode-4 281ms ± 0% 281ms ± 1% ~ (p=0.848 n=20+29)
JSONDecode-4 921ms ± 1% 919ms ± 1% -0.17% (p=0.032 n=26+29)
Mandelbrot200-4 49.4ms ± 0% 49.4ms ± 0% ~ (p=0.057 n=25+23)
GoParse-4 45.1ms ± 2% 44.9ms ± 1% ~ (p=0.110 n=29+29)
RegexpMatchEasy0_32-4 1.32µs ± 2% 1.32µs ± 2% ~ (p=0.193 n=29+30)
RegexpMatchEasy0_1K-4 7.81µs ± 6% 7.69µs ± 6% -1.54% (p=0.013 n=30+29)
RegexpMatchEasy1_32-4 1.34µs ± 1% 1.34µs ± 1% ~ (p=0.242 n=29+28)
RegexpMatchEasy1_1K-4 10.5µs ± 2% 10.4µs ± 4% -1.03% (p=0.042 n=26+30)
RegexpMatchMedium_32-4 2.06µs ± 2% 2.05µs ± 1% ~ (p=0.267 n=30+27)
RegexpMatchMedium_1K-4 531µs ± 0% 531µs ± 1% ~ (p=0.274 n=26+28)
RegexpMatchHard_32-4 29.3µs ± 1% 29.3µs ± 1% ~ (p=0.072 n=25+28)
RegexpMatchHard_1K-4 887µs ± 3% 884µs ± 2% ~ (p=0.255 n=30+28)
Revcomp-4 82.6ms ± 3% 82.4ms ± 2% ~ (p=0.922 n=30+29)
Template-4 1.03s ± 1% 1.04s ± 1% ~ (p=0.166 n=29+26)
TimeParse-4 7.09µs ± 2% 7.10µs ± 2% ~ (p=0.190 n=30+30)
TimeFormat-4 13.4µs ± 0% 13.3µs ± 1% -0.45% (p=0.000 n=24+28)
[Geo mean] 746µs 744µs -0.33%

name old speed new speed delta
GobDecode-4 7.43MB/s ± 1% 7.42MB/s ± 1% ~ (p=0.254 n=24+29)
GobEncode-4 8.55MB/s ± 2% 8.54MB/s ± 2% ~ (p=0.370 n=30+29)
Gzip-4 4.59MB/s ± 1% 4.60MB/s ± 2% ~ (p=0.067 n=30+29)
Gunzip-4 31.9MB/s ± 2% 32.1MB/s ± 1% ~ (p=0.160 n=27+27)
JSONEncode-4 6.91MB/s ± 1% 6.90MB/s ± 1% ~ (p=0.576 n=21+29)
JSONDecode-4 2.11MB/s ± 1% 2.11MB/s ± 1% ~ (p=0.413 n=26+29)
GoParse-4 1.29MB/s ± 3% 1.29MB/s ± 1% ~ (p=0.188 n=30+29)
RegexpMatchEasy0_32-4 24.3MB/s ± 2% 24.3MB/s ± 2% ~ (p=0.140 n=29+30)
RegexpMatchEasy0_1K-4 131MB/s ± 6% 133MB/s ± 5% +1.58% (p=0.012 n=30+29)
RegexpMatchEasy1_32-4 23.9MB/s ± 1% 23.9MB/s ± 1% ~ (p=0.298 n=29+28)
RegexpMatchEasy1_1K-4 97.5MB/s ± 2% 98.5MB/s ± 4% +1.06% (p=0.042 n=26+30)
RegexpMatchMedium_32-4 485kB/s ± 3% 490kB/s ± 0% +0.96% (p=0.001 n=30+24)
RegexpMatchMedium_1K-4 1.93MB/s ± 0% 1.93MB/s ± 1% ~ (p=0.286 n=26+28)
RegexpMatchHard_32-4 1.09MB/s ± 0% 1.09MB/s ± 0% ~ (all equal)
RegexpMatchHard_1K-4 1.15MB/s ± 3% 1.16MB/s ± 2% ~ (p=0.287 n=30+28)
Revcomp-4 30.8MB/s ± 3% 30.8MB/s ± 2% ~ (p=0.921 n=30+29)
Template-4 1.88MB/s ± 1% 1.87MB/s ± 1% -0.28% (p=0.041 n=29+26)
[Geo mean] 6.61MB/s 6.63MB/s +0.29%

@benshi001

Member Author

commented Apr 17, 2017

Here is a comparison. Can I treat the result as "there is a -0.19% / +1.08% improvement with my patch" (after excluding the run-to-run variation)?

go VS go
[Geo mean] 748µs 749µs +0.25%
[Geo mean] 6.54MB/s 6.55MB/s +0.11%

My patch VS my patch
[Geo mean] 746µs 744µs -0.33%
[Geo mean] 6.61MB/s 6.63MB/s +0.29%

go VS my patch
[Geo mean] 748µs 744µs -0.52%
[Geo mean] 6.54MB/s 6.63MB/s +1.37%

@benshi001

Member Author

commented Apr 17, 2017

I also attached a detailed log to the commit message of CL 39552 (https://go-review.googlesource.com/?polygerrit=0#/c/39552/ ).

@benshi001

Member Author

commented Apr 18, 2017

I ran the go1 benchmark suite twice for the original Go and twice for my patch. Then I made 6 comparisons (old means the original Go and new means my patch):
old_1 vs old_2
new_1 vs new_2
old_1 vs new_1
old_1 vs new_2
old_2 vs new_1
old_2 vs new_2

The conclusions are:

  1. There is run-to-run variation (floating error) between rounds of the test
  2. Some individual benchmarks (HTTPClientServer-4) vary much more than others between rounds
  3. My patch is an improvement in general (after excluding that variation)

The attachment is the detailed comparison.
compare.zip

@benshi001

Member Author

commented Apr 26, 2017

Current status,

Keith Randall has a better solution https://go-review.googlesource.com/41612
than mine https://go-review.googlesource.com/#/c/39552/

And here is a supplement https://go-review.googlesource.com/#/c/41679/

@benshi001

Member Author

commented Apr 28, 2017

Update:
Keith Randall's solution https://go-review.googlesource.com/41612 is merged, but more cases remain to be optimized:

a = b + 0x00ffff00 -> a = (b + 0x01000000) - 0x00000100
a = b + 0xfffffff0 -> a = b - 0x10
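
Both remaining cases can be found mechanically. The sketch below is an illustration only and is not how any of the CLs implement it: it assumes 32-bit wraparound arithmetic and the 8-bit rotated immediate rule, and the names isARMImmediate and rewriteAdd are hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// isARMImmediate reports whether v is an 8-bit value rotated right by an
// even amount. Hypothetical helper, not the toolchain's own code.
func isARMImmediate(v uint32) bool {
	for rot := 0; rot < 32; rot += 2 {
		if bits.RotateLeft32(v, rot)&^0xff == 0 {
			return true
		}
	}
	return false
}

// rewriteAdd describes how b + c could be computed without a constant
// pool load: a plain ADD, a single SUB of -c, or an ADD followed by a SUB.
func rewriteAdd(c uint32) string {
	if isARMImmediate(c) {
		return fmt.Sprintf("ADD $%#x", c)
	}
	if isARMImmediate(-c) {
		return fmt.Sprintf("SUB $%#x", -c)
	}
	// Try c = hi - lo, where lo is the lowest set bit of c.
	lo := c & -c
	if hi := c + lo; isARMImmediate(hi) && isARMImmediate(lo) {
		return fmt.Sprintf("ADD $%#x; SUB $%#x", hi, lo)
	}
	return "constant pool"
}

func main() {
	for _, c := range []uint32{0x00ffff00, 0xfffffff0, 0xffff} {
		fmt.Printf("%#x: %s\n", c, rewriteAdd(c))
	}
}
```

The lowest-set-bit trick only catches constants that are a single contiguous run of ones; other constants would need a more general search or stay in the constant pool.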

@gopherbot

commented May 9, 2017

CL https://golang.org/cl/42430 mentions this issue.

@gopherbot gopherbot closed this in 6897030 May 11, 2017

@golang golang locked and limited conversation to collaborators May 11, 2018
