Skip to content

cmd/compile: benchmark regression in github.com/cespare/xxhash between 1.9.2 and 1.10beta2 #23424

@cespare

Description

@cespare

I have a small benchmark that shows a significant regression between go1.9.2 and go1.10beta2.

You can check out the code at https://github.com/cespare/xxhash. This is done with the latest commit as of writing (e4e2bd419cbedb5b2ec48e4a88e4c993ddb74555).

I'm on linux/amd64. The benchmark in question is BenchmarkStringHash. The code is simply:

var result uint64

func BenchmarkStringHash(b *testing.B) {
	const s = "abcdefghijklmnop"
	var r uint64
	b.ReportAllocs()
	for n := 0; n < b.N; n++ {
		r = Sum64([]byte(s))
	}
	result = r
}

Between 1.9.2 and 1.10beta2 I see a 2x slowdown:

StringHash-72                      10.8ns ± 0%    22.5ns ± 0%  +108.33%  (p=0.008 n=5+5)

Note that the implementation of Sum64 is in assembly. If I use the noasm tag to select a pure-Go implementation, I still see a large slowdown:

StringHash-72                      12.3ns ± 0%    24.1ns ± 0%   +95.93%  (p=0.008 n=5+5)

Because of the slowdown in the assembly case where the code for Sum64 should be identical, I infer that the difference is inside the BenchmarkStringHash function itself (related to the string->[]byte conversion or the function call). Perhaps that's an incorrect assumption, though.

I've poked around a profile a bit but I haven't spotted a smoking gun. Perhaps someone else can make more sense of it.

Here's some pprof disasm output for 1.9.2 (fast):

$ go1.9.2 test -bench StringHash -benchtime 5s -cpuprofile go1.9.2.prof
goos: linux
goarch: amd64
pkg: github.com/cespare/xxhash
BenchmarkStringHash-72          1000000000              10.8 ns/op             0 B/op          0 allocs/op
PASS
ok      github.com/cespare/xxhash       12.010s
$ PPROF_BINARY_PATH=. go1.9.2 tool pprof go1.9.2.prof
File: xxhash.test
Type: cpu
Time: Jan 12, 2018 at 2:57am (UTC)
Duration: 12.01s, Total samples = 11.84s (98.60%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) disasm StringHash
Total: 11.84s
ROUTINE ======================== github.com/cespare/xxhash.BenchmarkStringHash
     1.55s     11.84s (flat, cum)   100% of Total
         .          .     4f0040: MOVQ FS:0xfffffff8, CX                         ;xxhash_test.go:18
         .          .     4f0049: CMPQ 0x10(CX), SP
         .          .     4f004d: JBE 0x4f00ed
         .          .     4f0053: SUBQ $0x60, SP
         .          .     4f0057: MOVQ BP, 0x58(SP)
         .          .     4f005c: LEAQ 0x58(SP), BP
         .          .     4f0061: MOVQ 0x68(SP), AX
         .          .     4f0066: MOVB $0x1, 0x122(AX)                     ;benchmark.go:119
         .          .     4f006d: XORL CX, CX
         .          .     4f006f: XORL DX, DX
         .          .     4f0071: JMP 0x4f00d0                             ;xxhash_test.go:22
     330ms      330ms     4f0073: MOVQ CX, 0x30(SP)                           ;github.com/cespare/xxhash.BenchmarkStringHash xxhash_test.go:22
      10ms       10ms     4f0078: LEAQ 0x38(SP), AX
         .          .     4f007d: MOVQ AX, 0(SP)                                 ;xxhash_test.go:23
         .          .     4f0081: LEAQ 0x4c3dc(IP), AX
     280ms      280ms     4f0088: MOVQ AX, 0x8(SP)                             ;github.com/cespare/xxhash.BenchmarkStringHash xxhash_test.go:23
         .          .     4f008d: MOVQ $0x10, 0x10(SP)                     ;xxhash_test.go:23
         .      4.69s     4f0096: CALL runtime.stringtoslicebyte(SB)         ;github.com/cespare/xxhash.BenchmarkStringHash xxhash_test.go:23
         .          .     4f009b: MOVQ 0x20(SP), AX                           ;xxhash_test.go:23
     340ms      340ms     4f00a0: MOVQ 0x18(SP), CX                           ;github.com/cespare/xxhash.BenchmarkStringHash xxhash_test.go:23
         .          .     4f00a5: MOVQ 0x28(SP), DX                           ;xxhash_test.go:23
         .          .     4f00aa: MOVQ CX, 0(SP)
         .          .     4f00ae: MOVQ AX, 0x8(SP)
     320ms      320ms     4f00b3: MOVQ DX, 0x10(SP)                           ;github.com/cespare/xxhash.BenchmarkStringHash xxhash_test.go:23
      20ms      5.62s     4f00b8: CALL github.com/cespare/xxhash.Sum64(SB)
         .          .     4f00bd: MOVQ 0x30(SP), AX                           ;xxhash_test.go:23
         .          .     4f00c2: LEAQ 0x1(AX), CX                             ;xxhash_test.go:22
     250ms      250ms     4f00c6: MOVQ 0x18(SP), DX                           ;github.com/cespare/xxhash.BenchmarkStringHash xxhash_test.go:23
         .          .     4f00cb: MOVQ 0x68(SP), AX                           ;xxhash_test.go:23
         .          .     4f00d0: MOVQ 0xf0(AX), BX                           ;xxhash_test.go:22
         .          .     4f00d7: CMPQ BX, CX
         .          .     4f00da: JL 0x4f0073
         .          .     4f00dc: MOVQ DX, 0x118d1d(IP)                   ;xxhash_test.go:25
         .          .     4f00e3: MOVQ 0x58(SP), BP                           ;xxhash_test.go:26
         .          .     4f00e8: ADDQ $0x60, SP
         .          .     4f00ec: RET
         .          .     4f00ed: CALL runtime.morestack_noctxt(SB)           ;xxhash_test.go:18
         .          .     4f00f2: ?
         .          .     4f00f3: ?
         .          .     4f00f4: ?
         .          .     4f00f5: ?
(pprof)

and here's the same output for 1.10beta2 (slow):

$ go1.10beta2 test -bench StringHash -benchtime 5s -cpuprofile go1.10beta2.prof
goos: linux
goarch: amd64
pkg: github.com/cespare/xxhash
BenchmarkStringHash-72          300000000               22.5 ns/op             0 B/op          0 allocs/op
PASS
ok      github.com/cespare/xxhash       9.209s
$ PPROF_BINARY_PATH=. go1.10beta2 tool pprof go1.10beta2.prof
File: xxhash.test
Type: cpu
Time: Jan 12, 2018 at 2:58am (UTC)
Duration: 9.21s, Total samples = 9s (97.76%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) disasm StringHash
Total: 9s
ROUTINE ======================== github.com/cespare/xxhash.BenchmarkStringHash
     2.46s         9s (flat, cum)   100% of Total
         .          .     4e7470: MOVQ FS:0xfffffff8, CX                         ;xxhash_test.go:18
         .          .     4e7479: CMPQ 0x10(CX), SP
         .          .     4e747d: JBE 0x4e7513
         .          .     4e7483: SUBQ $0x60, SP
         .          .     4e7487: MOVQ BP, 0x58(SP)
         .          .     4e748c: LEAQ 0x58(SP), BP
         .          .     4e7491: MOVQ 0x68(SP), AX
         .          .     4e7496: MOVB $0x1, 0x122(AX)                     ;benchmark.go:119
         .          .     4e749d: XORL CX, CX
         .          .     4e749f: XORL DX, DX
         .          .     4e74a1: JMP 0x4e74f6                             ;xxhash_test.go:22
         .          .     4e74a3: MOVQ CX, 0x30(SP)
         .          .     4e74a8: LEAQ 0x38(SP), AX                           ;xxhash_test.go:23
         .          .     4e74ad: MOVQ AX, 0(SP)
     120ms      120ms     4e74b1: LEAQ 0x4ca45(IP), AX                     ;github.com/cespare/xxhash.BenchmarkStringHash xxhash_test.go:23
         .          .     4e74b8: MOVQ AX, 0x8(SP)                             ;xxhash_test.go:23
         .          .     4e74bd: MOVQ $0x10, 0x10(SP)
         .      1.64s     4e74c6: CALL runtime.stringtoslicebyte(SB)         ;github.com/cespare/xxhash.BenchmarkStringHash xxhash_test.go:23
         .          .     4e74cb: MOVUPS 0x18(SP), X0                       ;xxhash_test.go:23
     1.92s      1.92s     4e74d0: MOVQ 0x28(SP), AX                           ;github.com/cespare/xxhash.BenchmarkStringHash xxhash_test.go:23
         .          .     4e74d5: MOVUPS X0, 0(SP)                             ;xxhash_test.go:23
     260ms      260ms     4e74d9: MOVQ AX, 0x10(SP)                           ;github.com/cespare/xxhash.BenchmarkStringHash xxhash_test.go:23
         .      4.90s     4e74de: CALL github.com/cespare/xxhash.Sum64(SB)
         .          .     4e74e3: MOVQ 0x30(SP), AX                           ;xxhash_test.go:23
         .          .     4e74e8: LEAQ 0x1(AX), CX                             ;xxhash_test.go:22
      80ms       80ms     4e74ec: MOVQ 0x18(SP), DX                           ;github.com/cespare/xxhash.BenchmarkStringHash xxhash_test.go:23
         .          .     4e74f1: MOVQ 0x68(SP), AX                           ;xxhash_test.go:23
         .          .     4e74f6: MOVQ 0xf0(AX), BX                           ;xxhash_test.go:22
         .          .     4e74fd: CMPQ BX, CX
      80ms       80ms     4e7500: JL 0x4e74a3                               ;github.com/cespare/xxhash.BenchmarkStringHash xxhash_test.go:22
                                                                          ;xxhash_test.go:25
         .          .     4e7502: MOVQ DX, github.com/cespare/xxhash.result(SB)
         .          .     4e7509: MOVQ 0x58(SP), BP                           ;xxhash_test.go:26
         .          .     4e750e: ADDQ $0x60, SP
         .          .     4e7512: RET
         .          .     4e7513: CALL runtime.morestack_noctxt(SB)           ;xxhash_test.go:18
         .          .     4e7518: ?
         .          .     4e7519: PUSHL BX
         .          .     4e751a: ?
         .          .     4e751b: ?
(pprof)

Metadata

Metadata

Assignees

No one assigned

    Labels

    FrozenDueToAgeNeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.release-blocker

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions