Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testing: don't truncate allocs/op #24631

Closed
teh-cmc opened this Issue Apr 1, 2018 · 11 comments

Comments

Projects
None yet
7 participants
@teh-cmc
Copy link

teh-cmc commented Apr 1, 2018

$ go version
go version go1.10.1 linux/amd64

testing.Benchmark gives me flaky allocation reports in the following case:

package main

import (
	"fmt"
	"testing"
)

func main() {
	var Eface interface{}
	res := testing.Benchmark(func(b *testing.B) {
		// i := 1, just in case, to avoid convT2E32's optimization wrt zero-values.
		for i := 1; i < b.N; i++ {
			Eface = uint32(i)
		}
	})
	fmt.Println(res.MemString())
}

The generated code for Eface = uint32(i) is what you'd expect:

;; Eface = uint32(i)
0x0050 00080 (main.go:12)	MOVL	CX, ""..autotmp_3+36(SP)
0x0054 00084 (main.go:12)	LEAQ	type.uint32(SB), AX
0x005b 00091 (main.go:12)	MOVQ	AX, (SP)
0x005f 00095 (main.go:12)	LEAQ	""..autotmp_3+36(SP), DX
0x0064 00100 (main.go:12)	MOVQ	DX, 8(SP)
0x0069 00105 (main.go:12)	PCDATA	$0, $1
0x0069 00105 (main.go:12)	CALL	runtime.convT2E32(SB)
0x006e 00110 (main.go:12)	MOVQ	24(SP), AX
0x0073 00115 (main.go:12)	MOVQ	16(SP), CX
0x0078 00120 (main.go:12)	MOVQ	"".&Eface+48(SP), DX
0x007d 00125 (main.go:12)	MOVQ	CX, (DX)
0x0080 00128 (main.go:12)	MOVL	runtime.writeBarrier(SB), CX
0x0086 00134 (main.go:12)	LEAQ	8(DX), DI
0x008a 00138 (main.go:12)	TESTL	CX, CX
0x008c 00140 (main.go:12)	JNE	148
0x008e 00142 (main.go:12)	MOVQ	AX, 8(DX)
0x0092 00146 (main.go:12)	JMP	46
0x0094 00148 (main.go:12)	CALL	runtime.gcWriteBarrier(SB)
0x0099 00153 (main.go:12)	JMP	46

Results:

$ go run main.go
       4 B/op	       1 allocs/op
$ go run main.go
       4 B/op	       1 allocs/op
$ go run main.go
       4 B/op	       1 allocs/op
$ go run main.go
       4 B/op	       0 allocs/op
$ go run main.go
       4 B/op	       0 allocs/op
$ go run main.go
       4 B/op	       1 allocs/op

I.e. while the number of allocated bytes per op is always correct, the number of allocations per op is sometimes wrong (expecting 1 allocs/op).

I guess I'm hitting some kind of rounding error here, but then again why would this behavior not be deterministic? Maybe due to the heuristics around the value of b.N?
I'm not sure what I'm missing here?

Thanks.

@teh-cmc

This comment has been minimized.

Copy link
Author

teh-cmc commented Apr 1, 2018

Just in case, here's the complete result string, which shows that the program is doing as much work in both cases:

$ go run main.go 
100000000	        12.7 ns/op        4 B/op	       1 allocs/op
$ go run main.go 
100000000	        12.4 ns/op        4 B/op	       0 allocs/op
$ go run main.go 
100000000	        12.3 ns/op        4 B/op	       1 allocs/op
$ go run main.go 
100000000	        12.2 ns/op        4 B/op	       0 allocs/op
@odeke-em

This comment has been minimized.

Copy link
Member

odeke-em commented Apr 2, 2018

@ALTree

This comment has been minimized.

Copy link
Member

ALTree commented Apr 2, 2018

The number of reported allocs varies between 999999 and 1000003; when it' the former 0 allocs are reported, when it's the latter 1 alloc is reported.

Either we fix the flakiness in the memstat calls (I don't know if that can be done reliably) or we stop using integer truncated division to compute the result (where 999999 / 1000000 = 0) and instead we round to the nearest integer. This will give us more accurate results in the cases that are more likely to be flaky (very fast benchmarks that only allocate once or twice).

@gopherbot

This comment has been minimized.

Copy link

gopherbot commented Apr 2, 2018

Change https://golang.org/cl/104055 mentions this issue: testing: round (instead of truncating) in AllocsPerOp

@aclements

This comment has been minimized.

Copy link
Member

aclements commented Apr 2, 2018

@teh-cmc, your benchmark loop is slightly non-linear, which I believe is what's causing this. The loop needs to repeat the benchmark b.N times, but your loop repeats it b.N-1 times. Either:

for i := 1; i < b.N+1; i++ {
	Eface = uint32(i)
}

Or, better, since it a keeps the standard benchmark loop form:

for i := 0; i < b.N; i++ {
	Eface = uint32(i+1)
}

Regardless, perhaps we shouldn't be truncating the reported allocs/op to an integer at all. The benchmark format is allowed to contain floating point numbers, so we could print out a few decimal places, like we already do for ns/op and MB/s.

@teh-cmc

This comment has been minimized.

Copy link
Author

teh-cmc commented Apr 2, 2018

Silly me. That makes sense @aclements, thanks.

I do agree with you and @ALTree that this shouldn't get truncated down to zero in any case though, that seems quite deceptive.

@bcmills

This comment has been minimized.

Copy link
Member

bcmills commented Apr 2, 2018

In #19128 (comment), @rsc suggested:

I'd also like to have a check somewhere (in package testing or benchstat, probably the former) that b.N really is scaling linearly.

I wonder whether that check would be sensitive enough to catch the nonlinearity here.

At any rate, it sounds like just fixing the rounding would help. Shall we retitle the bug for that?

@aclements aclements changed the title testing: Benchmark reports flaky number of allocations testing: don't truncate allocs/op Apr 2, 2018

@aclements

This comment has been minimized.

Copy link
Member

aclements commented Apr 2, 2018

OTOH, unlike ns/op and MB/s, allocs/op really is a whole number for a given iteration. I worry slightly that the random background allocations that aren't coupled to iterations (e.g., when a GC cycle runs) would almost always cause this number to be slightly above a whole number and that that could confuse people.

@bcmills

This comment has been minimized.

Copy link
Member

bcmills commented Apr 2, 2018

Is allocs/op really a whole number? I would think that any API that involves a cache could legitimately end up with a fractional mean.

@bcmills bcmills added this to the Go1.11 milestone Apr 2, 2018

@aclements

This comment has been minimized.

Copy link
Member

aclements commented Apr 2, 2018

For a given iteration, yes, it's a whole number. But caches are another good example of iteration-to-iteration variance and yet another example where summarizing a benchmark result distribution as just a mean can lose a great deal of information.

@rsc

This comment has been minimized.

Copy link
Contributor

rsc commented Apr 9, 2018

The code under test was buggy which caused the flake. When we added allocs/op we explicitly decided to truncate it, to report the number of allocs that happen every single time. It's too late to change that definition.

@rsc rsc closed this Apr 9, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.