cmd/compile: performance regression on append-like loop #19633

Closed
ALTree opened this issue Mar 21, 2017 · 5 comments

@ALTree (Member) commented Mar 21, 2017

Consider the following benchmark:

func BenchmarkFoo(b *testing.B) {
	a := make([]byte, 1000)
	b.SetBytes(int64(len(a)))

	for i := 0; i < b.N; i++ {
		var n int
		for j := 0; j < 1000; j++ {
			a[n] = byte(j)
			n++
		}
		a = a[:n]
	}
}

Comparing tip (go version devel +d972dc2de9 Tue Mar 21 06:36:56 2017 +0000 linux/amd64) with go1.8, I get the following on my machine:

name   old time/op    new time/op    delta
Foo-4     684ns ± 1%    1022ns ± 0%  +49.34%  (p=0.000 n=18+15)

name   old speed      new speed      delta
Foo-4  1.46GB/s ± 1%  0.98GB/s ± 0%  -32.99%  (p=0.000 n=18+18)

A pretty bad regression.

@ALTree ALTree added the Performance label Mar 21, 2017

@ALTree ALTree added this to the Go1.9 milestone Mar 21, 2017

@ALTree (Member, Author) commented Mar 21, 2017

The regression was introduced in CL 36205 (cmd/internal/obj: remove Follow pass). Also see: #18977.

cc @cherrymui

@cherrymui (Contributor) commented Mar 21, 2017

I tried this on two machines, a Linux desktop and a Mac laptop; the regression I see is much smaller:

name    old time/op    new time/op    delta
Foo-12     488ns ± 0%     524ns ± 0%  +7.44%  (p=0.000 n=19+18)

name    old speed      new speed      delta
Foo-12  2.05GB/s ± 0%  1.91GB/s ± 0%  -6.93%  (p=0.000 n=19+18)

So this is probably very processor-dependent?

Assembly for tip:

        0x000c 00012 (x_test.go:17)     MOVQ    "".a+24(SP), AX
        0x0011 00017 (x_test.go:17)     MOVQ    "".a+16(SP), CX
        0x0016 00022 (x_test.go:16)     MOVQ    $0, DX
        0x0018 00024 (x_test.go:17)     CMPQ    DX, $1000
        0x001f 00031 (x_test.go:17)     JGE     $0, 53
        0x0021 00033 (x_test.go:18)     CMPQ    DX, AX
        0x0024 00036 (x_test.go:18)     JCC     $0, 46
        0x0026 00038 (x_test.go:18)     MOVB    DL, (CX)(DX*1)
        0x0029 00041 (x_test.go:19)     INCQ    DX
        0x002c 00044 (x_test.go:17)     JMP     24
        0x002e 00046 (x_test.go:18)     PCDATA  $0, $1
        0x002e 00046 (x_test.go:18)     CALL    runtime.panicindex(SB)
        0x0033 00051 (x_test.go:18)     UNDEF
        0x0035 00053 (x_test.go:21)     MOVQ    "".a+32(SP), AX
        0x003a 00058 (x_test.go:21)     CMPQ    DX, AX
        0x003d 00061 (x_test.go:21)     JHI     $0, 72
        0x003f 00063 (x_test.go:22)     MOVQ    (SP), BP
        0x0043 00067 (x_test.go:22)     ADDQ    $8, SP
        0x0047 00071 (x_test.go:22)     RET
        0x0048 00072 (x_test.go:21)     PCDATA  $0, $1
        0x0048 00072 (x_test.go:21)     CALL    runtime.panicslice(SB)
        0x004d 00077 (x_test.go:21)     UNDEF

Assembly for Go 1.8:

        0x000c 00012 (x_test.go:17)     MOVQ    "".a+24(FP), AX
        0x0011 00017 (x_test.go:17)     MOVQ    "".a+16(FP), CX
        0x0016 00022 (x_test.go:16)     MOVQ    $0, DX
        0x0018 00024 (x_test.go:17)     CMPQ    DX, $1000
        0x001f 00031 (x_test.go:17)     JGE     $0, 53
        0x0021 00033 (x_test.go:18)     CMPQ    DX, AX
        0x0024 00036 (x_test.go:18)     JCC     $0, 79
        0x0026 00038 (x_test.go:18)     MOVB    DL, (CX)(DX*1)
        0x0029 00041 (x_test.go:19)     INCQ    DX
        0x002c 00044 (x_test.go:17)     CMPQ    DX, $1000
        0x0033 00051 (x_test.go:17)     JLT     $0, 33
        0x0035 00053 (x_test.go:21)     MOVQ    "".a+32(FP), AX
        0x003a 00058 (x_test.go:21)     CMPQ    DX, AX
        0x003d 00061 (x_test.go:21)     JHI     $0, 72
        0x003f 00063 (x_test.go:22)     MOVQ    (SP), BP
        0x0043 00067 (x_test.go:22)     ADDQ    $8, SP
        0x0047 00071 (x_test.go:22)     RET
        0x0048 00072 (x_test.go:21)     PCDATA  $0, $1
        0x0048 00072 (x_test.go:21)     CALL    runtime.panicslice(SB)
        0x004d 00077 (x_test.go:21)     UNDEF
        0x004f 00079 (x_test.go:18)     PCDATA  $0, $1
        0x004f 00079 (x_test.go:18)     CALL    runtime.panicindex(SB)
        0x0054 00084 (x_test.go:18)     UNDEF

It seems the only difference is that on tip we emit an unconditional jump for the loop backedge (JMP 24), whereas Go 1.8 copies two instructions and reverses the condition (CMPQ DX, $1000; JLT 33).
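
At the source level, Go 1.8's output roughly corresponds to rotating the loop: test once before entering, then repeat the test at the bottom with the condition reversed. A sketch of that shape, for illustration only (not what the compiler literally emits; the fillRotated name is made up, and like the benchmark it assumes len(a) >= 1000):

func fillRotated(a []byte) []byte {
	var n int
	j := 0
	if j < 1000 { // initial test, taken once (CMPQ DX, $1000; JGE exit)
		for {
			a[n] = byte(j) // bounds-checked store (MOVB DL, (CX)(DX*1))
			n++
			j++
			if j >= 1000 { // duplicated test, condition reversed (JLT back to the body)
				break
			}
		}
	}
	return a[:n]
}

On tip that duplicated bottom test is gone, so each iteration pays for the extra unconditional JMP back to the shared test at the top.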

@dr2chase made range loops generate for-until-like code, which effectively does this. But this example is not a range loop, so it is not affected by that change.
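
For comparison, a hypothetical range-based rewrite of the inner loop (the fillRange name is made up, and whether it actually avoids the regression here would need to be measured) would look like:

func fillRange(a []byte) []byte {
	var n int
	// Loops written with range are the kind covered by the for-until change.
	// As in the benchmark, this assumes len(a) >= 1000.
	for j := range a[:1000] {
		a[n] = byte(j)
		n++
	}
	return a[:n]
}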

@dr2chase is also working on loop unrolling, which may help from a different direction.

I guess we still don't want to copy instructions in the assembler?

@ALTree (Member, Author) commented Mar 21, 2017

What I'm seeing is +50% (with the benchmark above) on a Sandy Bridge i5, but only +8% on a Haswell i7.

It looks like it hits hard on Sandy Bridge (2nd gen), while from 4th gen and up it's not that bad. I don't have any 3rd gen (Ivy Bridge) machine to check.

@gopherbot commented Mar 21, 2017

CL https://golang.org/cl/38431 mentions this issue.

@navytux (Contributor) commented Mar 29, 2017

Benchmark results on a fairly recent 6th-gen i7 (i7-6600U) on linux/amd64, comparing go1.8 and tip (go version devel +846f925464 Wed Mar 29 06:50:07 2017 +0000 linux/amd64):

name   old time/op    new time/op    delta
Foo-4     341ns ± 0%     505ns ± 0%  +48.09%  (p=0.000 n=6+10)

name   old speed      new speed      delta
Foo-4  2.93GB/s ± 0%  1.98GB/s ± 0%  -32.44%  (p=0.000 n=10+10)

So here it is also a ~50% slowdown.

@gopherbot gopherbot closed this in 39ce590 Apr 24, 2017

@golang golang locked and limited conversation to collaborators Apr 24, 2018
