-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Description
I have a small benchmark that shows a significant regression between go1.9.2 and go1.10beta2.
You can check out the code at https://github.com/cespare/xxhash. This is done with the latest commit as of writing (e4e2bd419cbedb5b2ec48e4a88e4c993ddb74555).
I'm on linux/amd64. The benchmark in question is BenchmarkStringHash. The code is simply:
var result uint64
func BenchmarkStringHash(b *testing.B) {
const s = "abcdefghijklmnop"
var r uint64
b.ReportAllocs()
for n := 0; n < b.N; n++ {
r = Sum64([]byte(s))
}
result = r
}
Between 1.9.2 and 1.10beta2 I see a 2x slowdown:
StringHash-72 10.8ns ± 0% 22.5ns ± 0% +108.33% (p=0.008 n=5+5)
Note that the implementation of Sum64 is in assembly. If I use the noasm
tag to select a pure-Go implementation, I still see a large slowdown:
StringHash-72 12.3ns ± 0% 24.1ns ± 0% +95.93% (p=0.008 n=5+5)
Because of the slowdown in the assembly case where the code for Sum64 should be identical, I infer that the difference is inside the BenchmarkStringHash function itself (related to the string->[]byte conversion or the function call). Perhaps that's an incorrect assumption, though.
I've poked around a profile a bit but I haven't spotted a smoking gun. Perhaps someone else can make more sense of it.
Here's some pprof disasm output for 1.9.2 (fast):
$ go1.9.2 test -bench StringHash -benchtime 5s -cpuprofile go1.9.2.prof
goos: linux
goarch: amd64
pkg: github.com/cespare/xxhash
BenchmarkStringHash-72 1000000000 10.8 ns/op 0 B/op 0 allocs/op
PASS
ok github.com/cespare/xxhash 12.010s
$ PPROF_BINARY_PATH=. go1.9.2 tool pprof go1.9.2.prof
File: xxhash.test
Type: cpu
Time: Jan 12, 2018 at 2:57am (UTC)
Duration: 12.01s, Total samples = 11.84s (98.60%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) disasm StringHash
Total: 11.84s
ROUTINE ======================== github.com/cespare/xxhash.BenchmarkStringHash
1.55s 11.84s (flat, cum) 100% of Total
. . 4f0040: MOVQ FS:0xfffffff8, CX ;xxhash_test.go:18
. . 4f0049: CMPQ 0x10(CX), SP
. . 4f004d: JBE 0x4f00ed
. . 4f0053: SUBQ $0x60, SP
. . 4f0057: MOVQ BP, 0x58(SP)
. . 4f005c: LEAQ 0x58(SP), BP
. . 4f0061: MOVQ 0x68(SP), AX
. . 4f0066: MOVB $0x1, 0x122(AX) ;benchmark.go:119
. . 4f006d: XORL CX, CX
. . 4f006f: XORL DX, DX
. . 4f0071: JMP 0x4f00d0 ;xxhash_test.go:22
330ms 330ms 4f0073: MOVQ CX, 0x30(SP) ;github.com/cespare/xxhash.BenchmarkStringHash xxhash_test.go:22
10ms 10ms 4f0078: LEAQ 0x38(SP), AX
. . 4f007d: MOVQ AX, 0(SP) ;xxhash_test.go:23
. . 4f0081: LEAQ 0x4c3dc(IP), AX
280ms 280ms 4f0088: MOVQ AX, 0x8(SP) ;github.com/cespare/xxhash.BenchmarkStringHash xxhash_test.go:23
. . 4f008d: MOVQ $0x10, 0x10(SP) ;xxhash_test.go:23
. 4.69s 4f0096: CALL runtime.stringtoslicebyte(SB) ;github.com/cespare/xxhash.BenchmarkStringHash xxhash_test.go:23
. . 4f009b: MOVQ 0x20(SP), AX ;xxhash_test.go:23
340ms 340ms 4f00a0: MOVQ 0x18(SP), CX ;github.com/cespare/xxhash.BenchmarkStringHash xxhash_test.go:23
. . 4f00a5: MOVQ 0x28(SP), DX ;xxhash_test.go:23
. . 4f00aa: MOVQ CX, 0(SP)
. . 4f00ae: MOVQ AX, 0x8(SP)
320ms 320ms 4f00b3: MOVQ DX, 0x10(SP) ;github.com/cespare/xxhash.BenchmarkStringHash xxhash_test.go:23
20ms 5.62s 4f00b8: CALL github.com/cespare/xxhash.Sum64(SB)
. . 4f00bd: MOVQ 0x30(SP), AX ;xxhash_test.go:23
. . 4f00c2: LEAQ 0x1(AX), CX ;xxhash_test.go:22
250ms 250ms 4f00c6: MOVQ 0x18(SP), DX ;github.com/cespare/xxhash.BenchmarkStringHash xxhash_test.go:23
. . 4f00cb: MOVQ 0x68(SP), AX ;xxhash_test.go:23
. . 4f00d0: MOVQ 0xf0(AX), BX ;xxhash_test.go:22
. . 4f00d7: CMPQ BX, CX
. . 4f00da: JL 0x4f0073
. . 4f00dc: MOVQ DX, 0x118d1d(IP) ;xxhash_test.go:25
. . 4f00e3: MOVQ 0x58(SP), BP ;xxhash_test.go:26
. . 4f00e8: ADDQ $0x60, SP
. . 4f00ec: RET
. . 4f00ed: CALL runtime.morestack_noctxt(SB) ;xxhash_test.go:18
. . 4f00f2: ?
. . 4f00f3: ?
. . 4f00f4: ?
. . 4f00f5: ?
(pprof)
and here's the same output for 1.10beta2 (slow):
$ go1.10beta2 test -bench StringHash -benchtime 5s -cpuprofile go1.10beta2.prof
goos: linux
goarch: amd64
pkg: github.com/cespare/xxhash
BenchmarkStringHash-72 300000000 22.5 ns/op 0 B/op 0 allocs/op
PASS
ok github.com/cespare/xxhash 9.209s
$ PPROF_BINARY_PATH=. go1.10beta2 tool pprof go1.10beta2.prof
File: xxhash.test
Type: cpu
Time: Jan 12, 2018 at 2:58am (UTC)
Duration: 9.21s, Total samples = 9s (97.76%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) disasm StringHash
Total: 9s
ROUTINE ======================== github.com/cespare/xxhash.BenchmarkStringHash
2.46s 9s (flat, cum) 100% of Total
. . 4e7470: MOVQ FS:0xfffffff8, CX ;xxhash_test.go:18
. . 4e7479: CMPQ 0x10(CX), SP
. . 4e747d: JBE 0x4e7513
. . 4e7483: SUBQ $0x60, SP
. . 4e7487: MOVQ BP, 0x58(SP)
. . 4e748c: LEAQ 0x58(SP), BP
. . 4e7491: MOVQ 0x68(SP), AX
. . 4e7496: MOVB $0x1, 0x122(AX) ;benchmark.go:119
. . 4e749d: XORL CX, CX
. . 4e749f: XORL DX, DX
. . 4e74a1: JMP 0x4e74f6 ;xxhash_test.go:22
. . 4e74a3: MOVQ CX, 0x30(SP)
. . 4e74a8: LEAQ 0x38(SP), AX ;xxhash_test.go:23
. . 4e74ad: MOVQ AX, 0(SP)
120ms 120ms 4e74b1: LEAQ 0x4ca45(IP), AX ;github.com/cespare/xxhash.BenchmarkStringHash xxhash_test.go:23
. . 4e74b8: MOVQ AX, 0x8(SP) ;xxhash_test.go:23
. . 4e74bd: MOVQ $0x10, 0x10(SP)
. 1.64s 4e74c6: CALL runtime.stringtoslicebyte(SB) ;github.com/cespare/xxhash.BenchmarkStringHash xxhash_test.go:23
. . 4e74cb: MOVUPS 0x18(SP), X0 ;xxhash_test.go:23
1.92s 1.92s 4e74d0: MOVQ 0x28(SP), AX ;github.com/cespare/xxhash.BenchmarkStringHash xxhash_test.go:23
. . 4e74d5: MOVUPS X0, 0(SP) ;xxhash_test.go:23
260ms 260ms 4e74d9: MOVQ AX, 0x10(SP) ;github.com/cespare/xxhash.BenchmarkStringHash xxhash_test.go:23
. 4.90s 4e74de: CALL github.com/cespare/xxhash.Sum64(SB)
. . 4e74e3: MOVQ 0x30(SP), AX ;xxhash_test.go:23
. . 4e74e8: LEAQ 0x1(AX), CX ;xxhash_test.go:22
80ms 80ms 4e74ec: MOVQ 0x18(SP), DX ;github.com/cespare/xxhash.BenchmarkStringHash xxhash_test.go:23
. . 4e74f1: MOVQ 0x68(SP), AX ;xxhash_test.go:23
. . 4e74f6: MOVQ 0xf0(AX), BX ;xxhash_test.go:22
. . 4e74fd: CMPQ BX, CX
80ms 80ms 4e7500: JL 0x4e74a3 ;github.com/cespare/xxhash.BenchmarkStringHash xxhash_test.go:22
;xxhash_test.go:25
. . 4e7502: MOVQ DX, github.com/cespare/xxhash.result(SB)
. . 4e7509: MOVQ 0x58(SP), BP ;xxhash_test.go:26
. . 4e750e: ADDQ $0x60, SP
. . 4e7512: RET
. . 4e7513: CALL runtime.morestack_noctxt(SB) ;xxhash_test.go:18
. . 4e7518: ?
. . 4e7519: PUSHL BX
. . 4e751a: ?
. . 4e751b: ?
(pprof)