New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testing: consider calling ReadMemStats less during benchmarking #20875

Open
josharian opened this Issue Jun 30, 2017 · 11 comments

Comments

Projects
None yet
6 participants
@josharian
Contributor

josharian commented Jun 30, 2017

CL 36791 reduced the number of times benchmarks call ReadMemStats. However, the implementation was incorrect (#20590, #20863), and it was rolled back (CLs 46612 and 47350).

The rationale for the rollback is that ReadMemStats is now fast (CL 34937). However, using tip as of June 30, 2017 (445652f), cpuprofiling of package sort's benchmarks still shows almost half of all execution time in ReadMemStats. See sort-cpu.pdf.

So for 1.10, either ReadMemStats should be made cheaper still, or we should re-roll CL 36791 with a better implementation; see the initial patchsets of CL 44612 for improvements.

cc @bradfitz @aclements @meirf @ALTree

@josharian josharian added this to the Go1.10 milestone Jun 30, 2017

@bradfitz

This comment has been minimized.

Member

bradfitz commented Jun 30, 2017

Sorry about that. I suspected it might be the case that ReadMemStats would still be noticeably expensive after the rollback, but I also suspected you would file this very bug. I would've been happy rolling forward a few weeks ago, but at this point we're trying to lock down for a release.

@josharian

This comment has been minimized.

Contributor

josharian commented Jun 30, 2017

I also suspected you would file this very bug.

How did you guess? :)

at this point we're trying to lock down for a release.

But of course; not a problem. Happy to fix in 1.10 instead.

@aclements

This comment has been minimized.

Member

aclements commented Jul 3, 2017

Interesting. Do you happen to know how many times ReadMemStats is called versus how much time each call takes? After I optimized it I wasn't able to get it to take more than a few microseconds even under stress testing.

@josharian

This comment has been minimized.

Contributor

josharian commented Jul 11, 2017

Do you happen to know how many times ReadMemStats is called versus how much time each call takes?

ReadMemStats gets called a lot--every iteration, via StartTimer/StopTimer, e.g. at https://github.com/golang/go/blob/master/src/sort/sort_test.go#L640.

@aclements

This comment has been minimized.

Member

aclements commented Jul 17, 2017

ReadMemStats gets called a lot--every iteration, via StartTimer/StopTimer, e.g. at https://github.com/golang/go/blob/master/src/sort/sort_test.go#L640.

Oh, geez, okay. :)

If there's a way to safely avoid ReadMemStats, that's fine, but I also don't think we should over-optimize for this one weirdness in this one benchmark

@josharian

This comment has been minimized.

Contributor

josharian commented Jul 21, 2017

If there's a way to safely avoid ReadMemStats, that's fine, but I also don't think we should over-optimize for this one weirdness in this one benchmark

I think there is, more or less described above. And for better or for worse, called StartTimer/StopTimer every iteration is not all that rare. :) Anyway, this is on my plate to try again for 1.10.

@iand

This comment has been minimized.

Contributor

iand commented Sep 26, 2018

How about an opt-in approach: a new SkipMemStats method on B. When called it would disable the collection of memory usage. Running the benchmark with -test.benchmem would override SkipMemStats as would a later call to ReportAllocs in the same benchmark if people cared to do that.

@iand

This comment has been minimized.

Contributor

iand commented Sep 27, 2018

I implemented the opt-in I suggested above and tried it out on the sort package tests. I found it made very little difference in the actual benchmark. This is on linux/amd64.

The cpu profile of master on my system shows that ReadMemStats has negligible impact. This may just be an issue with the speed of ReadMemStats on some architectures. cc @josharian @aclements

Benchmarks of sort when calling SkipMemStats compared to master:

benchmark                         old ns/op      new ns/op      delta
BenchmarkSearchWrappers-8         88.1           89.3           +1.36%
BenchmarkSortString1K-8           137554         139288         +1.26%
BenchmarkSortString1K_Slice-8     105527         104311         -1.15%
BenchmarkStableString1K-8         193180         195908         +1.41%
BenchmarkSortInt1K-8              82989          84371          +1.67%
BenchmarkStableInt1K-8            98501          99706          +1.22%
BenchmarkStableInt1K_Slice-8      53439          53342          -0.18%
BenchmarkSortInt64K-8             8134449        8232955        +1.21%
BenchmarkSortInt64K_Slice-8       4556906        4604299        +1.04%
BenchmarkStableInt64K-8           8856651        8728228        -1.45%
BenchmarkSort1e2-8                46021          43748          -4.94%
BenchmarkStable1e2-8              90498          89224          -1.41%
BenchmarkSort1e4-8                9528589        9573428        +0.47%
BenchmarkStable1e4-8              28448203       28591226       +0.50%
BenchmarkSort1e6-8                1484072102     1470491590     -0.92%
BenchmarkStable1e6-8              5864716595     5833299379     -0.54%
@ALTree

This comment has been minimized.

Member

ALTree commented Sep 27, 2018

@iand you should run each benchmark multiple times (see the -count flag of go test), and then compare the output using benchstat (instad of benchcmp). It'll give you a report that is less likely to be infuenced by random external noise during the test.

@iand

This comment has been minimized.

Contributor

iand commented Sep 27, 2018

@ALTree I did that too, but didn't post the results here since the profile showed that ReadMemStats was not affecting the run times on my system

@josharian

This comment has been minimized.

Contributor

josharian commented Oct 9, 2018

How about an opt-in approach

The approach I sketched (in the linked CLs) should not require any API. I still think it is worth exploring, and it remains in my queue.


I originally found that:

cpuprofiling of package sort's benchmarks still shows almost half of all execution time in ReadMemStats

@iand, you report:

The cpu profile of master on my system shows that ReadMemStats has negligible impact.

That's an interesting mismatch. I will have to re-measure to see whether I can still reproduce.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment