Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testing: consider calling ReadMemStats less during benchmarking #20875

Open
josharian opened this Issue Jun 30, 2017 · 14 comments

Comments

Projects
None yet
7 participants
@josharian
Copy link
Contributor

josharian commented Jun 30, 2017

https://golang.org/cl/36791 reduced the number of times benchmarks call ReadMemStats. However, the implementation was incorrect (#20590, #20863), and it was rolled back (https://golang.org/cl/46612 and https://golang.org/cl/47350).

The rationale for the rollback is that ReadMemStats is now fast (https://golang.org/cl/34937). However, using tip as of June 30, 2017 (445652f), cpuprofiling of package sort's benchmarks still shows almost half of all execution time in ReadMemStats. See sort-cpu.pdf.

So for 1.10, either ReadMemStats should be made cheaper still, or we should re-roll https://golang.org/cl/36791 with a better implementation; see the initial patchsets of https://golang.org/cl/46612 for improvements.

cc @bradfitz @aclements @meirf @ALTree

@josharian josharian added this to the Go1.10 milestone Jun 30, 2017

@bradfitz

This comment has been minimized.

Copy link
Member

bradfitz commented Jun 30, 2017

Sorry about that. I suspected it might be the case that ReadMemStats would still be noticeably expensive after the rollback, but I also suspected you would file this very bug. I would've been happy rolling forward a few weeks ago, but at this point we're trying to lock down for a release.

@josharian

This comment has been minimized.

Copy link
Contributor Author

josharian commented Jun 30, 2017

I also suspected you would file this very bug.

How did you guess? :)

at this point we're trying to lock down for a release.

But of course; not a problem. Happy to fix in 1.10 instead.

@aclements

This comment has been minimized.

Copy link
Member

aclements commented Jul 3, 2017

Interesting. Do you happen to know how many times ReadMemStats is called versus how much time each call takes? After I optimized it I wasn't able to get it to take more than a few microseconds even under stress testing.

@josharian

This comment has been minimized.

Copy link
Contributor Author

josharian commented Jul 11, 2017

Do you happen to know how many times ReadMemStats is called versus how much time each call takes?

ReadMemStats gets called a lot--every iteration, via StartTimer/StopTimer, e.g. at https://github.com/golang/go/blob/master/src/sort/sort_test.go#L640.

@aclements

This comment has been minimized.

Copy link
Member

aclements commented Jul 17, 2017

ReadMemStats gets called a lot--every iteration, via StartTimer/StopTimer, e.g. at https://github.com/golang/go/blob/master/src/sort/sort_test.go#L640.

Oh, geez, okay. :)

If there's a way to safely avoid ReadMemStats, that's fine, but I also don't think we should over-optimize for this one weirdness in this one benchmark

@josharian

This comment has been minimized.

Copy link
Contributor Author

josharian commented Jul 21, 2017

If there's a way to safely avoid ReadMemStats, that's fine, but I also don't think we should over-optimize for this one weirdness in this one benchmark

I think there is, more or less described above. And for better or for worse, called StartTimer/StopTimer every iteration is not all that rare. :) Anyway, this is on my plate to try again for 1.10.

@iand

This comment has been minimized.

Copy link
Contributor

iand commented Sep 26, 2018

How about an opt-in approach: a new SkipMemStats method on B. When called it would disable the collection of memory usage. Running the benchmark with -test.benchmem would override SkipMemStats as would a later call to ReportAllocs in the same benchmark if people cared to do that.

@iand

This comment has been minimized.

Copy link
Contributor

iand commented Sep 27, 2018

I implemented the opt-in I suggested above and tried it out on the sort package tests. I found it made very little difference in the actual benchmark. This is on linux/amd64.

The cpu profile of master on my system shows that ReadMemStats has negligible impact. This may just be an issue with the speed of ReadMemStats on some architectures. cc @josharian @aclements

Benchmarks of sort when calling SkipMemStats compared to master:

benchmark                         old ns/op      new ns/op      delta
BenchmarkSearchWrappers-8         88.1           89.3           +1.36%
BenchmarkSortString1K-8           137554         139288         +1.26%
BenchmarkSortString1K_Slice-8     105527         104311         -1.15%
BenchmarkStableString1K-8         193180         195908         +1.41%
BenchmarkSortInt1K-8              82989          84371          +1.67%
BenchmarkStableInt1K-8            98501          99706          +1.22%
BenchmarkStableInt1K_Slice-8      53439          53342          -0.18%
BenchmarkSortInt64K-8             8134449        8232955        +1.21%
BenchmarkSortInt64K_Slice-8       4556906        4604299        +1.04%
BenchmarkStableInt64K-8           8856651        8728228        -1.45%
BenchmarkSort1e2-8                46021          43748          -4.94%
BenchmarkStable1e2-8              90498          89224          -1.41%
BenchmarkSort1e4-8                9528589        9573428        +0.47%
BenchmarkStable1e4-8              28448203       28591226       +0.50%
BenchmarkSort1e6-8                1484072102     1470491590     -0.92%
BenchmarkStable1e6-8              5864716595     5833299379     -0.54%
@ALTree

This comment has been minimized.

Copy link
Member

ALTree commented Sep 27, 2018

@iand you should run each benchmark multiple times (see the -count flag of go test), and then compare the output using benchstat (instad of benchcmp). It'll give you a report that is less likely to be infuenced by random external noise during the test.

@iand

This comment has been minimized.

Copy link
Contributor

iand commented Sep 27, 2018

@ALTree I did that too, but didn't post the results here since the profile showed that ReadMemStats was not affecting the run times on my system

@josharian

This comment has been minimized.

Copy link
Contributor Author

josharian commented Oct 9, 2018

How about an opt-in approach

The approach I sketched (in the linked CLs) should not require any API. I still think it is worth exploring, and it remains in my queue.


I originally found that:

cpuprofiling of package sort's benchmarks still shows almost half of all execution time in ReadMemStats

@iand, you report:

The cpu profile of master on my system shows that ReadMemStats has negligible impact.

That's an interesting mismatch. I will have to re-measure to see whether I can still reproduce.

@mvdan

This comment has been minimized.

Copy link
Member

mvdan commented Jan 13, 2019

I agree with Josh that calling StartTimer and StopTimer at every iteration isn't rare. For example, I tend to do that often when the setup/cleanup cost per benchmark iteration is comparable to the benchmark work itself.

And this is a big problem when both of these costs are small. For example, right now I was benchmarking a function that took just under a microsecond, and the pprof cpu profile showed that ReadMemStats took over 60% of the CPU, while my function took less than 3%.

Removing Start/StopTimer from my benchmark is a possibility, but then I'm not really benchmarking my function anymore. The numbers could get better or worse by only changing the setup/cleanup code.

The cost of starting and stopping the timer can also throw off -benchtime=duration estimates, if its cost is much larger than the benchmarked func itself. For example, see #27217 (comment), where a go test -benchtime=0.1s ran for over five seconds. I can provide a small standalone example if that would help illustrate the point.

@iand

This comment has been minimized.

Copy link
Contributor

iand commented Jan 13, 2019

@mvdan what OS are you testing on? Are you able to repeat the results across Linux/Win/Mac?

@mvdan

This comment has been minimized.

Copy link
Member

mvdan commented Jan 13, 2019

I am on go version devel +5f699e400a Thu Jan 10 21:21:34 2019 +0000 linux/amd64. I'd like to have Windows/Mac machines around to test things with, but I'd rather not pay for those :)

mvdan added a commit to mvdan/benchinit that referenced this issue Jan 14, 2019

remove StartTimer/StopTimer from the bench loop
It can completely mess up the benchmark numbers for init functions that
were too small. Moreover, it could make the -benchtime estimates be way
off. For example, 'benchinit cmd/go' was taking over a minute to run the
benchmark, instead of the expected ~1s.

The benchtime estimate being off is likely the upstream issue
golang/go#27217.

The fact that StartTimer and StopTimer are expensive is being tracked in
golang/go#20875.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.