
proposal: testing: add B.Lap for phases of benchmarks #33194

Closed
seebs opened this issue Jul 19, 2019 · 15 comments

@seebs (Contributor) commented Jul 19, 2019

What version of Go are you using (go version)?

N/A (but 1.12.x)

What did you do?

I'm benchmarking some stuff. I have a longish, complicated process with multiple parts which are hard to separate or run independently (roughly in the vicinity of setup/teardown code, where each setup has to be matched by a paired teardown). I was looking at benchmarking options.

I thought of something I'm not sure is all that useful, but the more I think about it, the more I think it would have desirable consequences.

What did you expect to see?

expect is a strong term.

What I'd like is a way to handle sub-benchmarks where it's not possible to run each part 1..N times separately; the entire procedure has to be run 1..N times, but I'd still like to track its different parts.

What did you see instead?

This is sort of possible to emulate using StartTimer/StopTimer, but there's a performance problem: if one of the sections takes significantly less time than the others, running the entire procedure enough times for that section alone to accumulate a second of runtime takes a very long time and is highly redundant. This is the sort of thing for which stopwatches have lap timers: sub-timers that can be captured while the main timer continues, and then reviewed separately.

So the idea I have is something like:

func BenchmarkFoo(b *testing.B) {
	for i := 0; i < b.N; i++ {
		b.Lap("foo_setup", func(b *testing.B) {
			// setup code
		})
		b.Lap("foo_process", func(b *testing.B) {
			// functional code
		})
		b.Lap("foo_teardown", func(b *testing.B) {
			// cleanup code
		})
	}
}

And now we get to the kicker: The reason I think this is actually a sort of good idea.

Right now, the above would probably be written as:

func BenchmarkFoo(b *testing.B) {
	b.StopTimer()
	for i := 0; i < b.N; i++ {
		// setup code
		b.StartTimer()
		// functional code
		b.StopTimer()
		// cleanup code
	}
}

Often, when I've had problems figuring out why benchmarks are taking so long to run even though the performance reported is good, the reason is issues in the performance of the setup/cleanup code.

If this feature existed, it would be close to free to report on the performance of those sections too -- and in several cases, if I'd been tracking that performance, I'd have noticed a serious performance regression in something that I wasn't consciously thinking about, but which was impacted by a change I was making.

Basically, I want to encourage people to collect the data, since for some of the benchmarks I do, the setup or cleanup code can easily be 80% of the runtime of a benchmark, and if I'm gonna be spending the CPU cycles to run it, it'd be nice to be able to track it without preventing me from getting good data on the performance of the inner bit.

(I haven't thought very carefully about the API, but I think it probably does want to get an inner b, because it might want to record custom stats, etc.)

@gopherbot gopherbot added this to the Proposal milestone Jul 19, 2019
@gopherbot gopherbot added the Proposal label Jul 19, 2019
@rsc (Contributor) commented Sep 25, 2019

@egonelbre (Contributor) commented Sep 26, 2019

@seebs The testing package supports https://golang.org/pkg/testing/#B.ReportMetric, which can be used to report custom metrics. As an example, https://godoc.org/github.com/loov/hrtime/hrtesting adds percentiles to the output, giving:

pkg: github.com/loov/hrtime/hrtesting
BenchmarkReport-32        3361303    360 ns/op    300 ns/p50     400 ns/p90     400 ns/p99
BenchmarkTSCReport-32     8682408    135 ns/op    120 tsc/p50    130 tsc/p90    150 tsc/p99
PASS

A similar package that reported lap times wouldn't be much more complicated:

func BenchmarkFoo(b *testing.B) {
	lp := laps.NewBenchmark(b)
	defer lp.Report()

	setup := lp.Stage("setup")
	process := lp.Stage("process")
	teardown := lp.Stage("teardown")

	for lp.Next() {
		lp.Enter(setup)
		// setup code
		lp.Enter(process)
		// processing
		lp.Enter(teardown)
		// cleanup code
	}
}

The reason it's done that way is that it can have lower measurement overhead.

@egonelbre (Contributor) commented Sep 26, 2019

Depending on your situation you can also measure the overhead separately:

func BenchmarkFoo(b *testing.B) {
	for _, run := range []bool{false, true} {
		name := "Test"
		if !run {
			name = "Overhead"
		}
		b.Run(name, func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				// setup code
				if run {
					// process code
				}
				// cleanup code
			}
		})
	}
}

This can be made more concise with a helper package.

@seebs (Contributor, Author) commented Sep 26, 2019

My concern with that second example is that while I can measure the overhead separately, I'm still running the overhead again to measure it separately, rather than just getting the measurements separately. Also, it's not obvious to me how I could implement that "laps" package -- I could record timestamps, I guess. But what about recording allocations, for instance? It just seems like being able to record sub-benchmarks, and sum them, would be a lot simpler. (Especially if I start adding custom metrics, because then I have to manage my custom metrics and my lap timer.)

@rsc rsc added this to Incoming in Proposals Dec 4, 2019
@rsc (Contributor) commented Apr 8, 2020

It's unclear that there's a new core concept here that belongs in the testing package proper. It does seem like the timing of subpieces can be done outside the package, as custom metrics, and maybe that would be a good place to start.

(It's also true that things like counting allocations for different phases can't be done as easily as custom metrics, but it's hard to make the phases completely orthogonal from all the things you might want to do.)

@rsc rsc moved this from Incoming to Active in Proposals Apr 8, 2020
@rsc rsc changed the title proposal: testing (*B).Lap(string name, b *testing.B) for sub-benchmarks of a single process proposal: testing: add B.Lap for phases of benchmarks Apr 8, 2020
@rsc (Contributor) commented Apr 15, 2020

(Also, @ianlancetaylor points out b.Lap is the wrong name. A lap is something that keeps getting repeated. Maybe each of the b.N top-level iterations is a lap.)

I wrote last week that it seemed like sub-part timing should start outside this package. No responses. Does anyone object to that?

@seebs (Contributor, Author) commented Apr 15, 2020

Yeah, "Lap" is probably the wrong name. "Segment" maybe.

I don't see a way for an external package to hook into the benchmark package's ability to measure allocations, for instance; would we be assuming that external packages would be duplicating that logic, then?

I think the problem here is that I suspect that a really nice implementation of this may need to be inside testing to get access to the knobs and levers it wants to do a good job of accumulating sample data; at a bare minimum, it'll have to duplicate a lot of code. But it may also be a good idea to have that experience to draw on before trying to design an interface for a thing like this that will work in practice. But on the Nth hand, it might be that an interface which is possible without that internal access would be very different from one that works with that internal access.

So I think the answer is "I may well some day have a very good objection to this being outside testing, but I don't think I'll know what it is unless I first try to write the external package and see how it works."

@egonelbre (Contributor) commented Apr 15, 2020

@seebs There's https://golang.org/pkg/runtime/#ReadMemStats for reading memory stats. Based on my experience writing hrtesting, I think such a package would need to duplicate very little code.

Here's a quick example for implementing phases: https://play.golang.org/p/wVZIA4TODxL. The API can be improved, but should suffice as a proof-of-concept.

@seebs (Contributor, Author) commented Apr 15, 2020

I tried calling ReadMemStats in a loop at one point and it caused about a two-orders-of-magnitude performance hit on a loop that I had expected to be "expensive" to run. The issue, I think, is that it does a ton of additional work that isn't actually necessary for the benchmem results.

@egonelbre (Contributor) commented Apr 15, 2020

The benchmarking code uses the same ReadMemStats func: https://github.com/golang/go/blob/master/src/testing/benchmark.go#L122. One option would be to exclude the time spent inside ReadMemStats from the results, e.g. add that time to some "overhead" phase.

@seebs (Contributor, Author) commented Apr 15, 2020

... huh, now I'm curious about how expensive ReadMemStats is itself, and whether something else went wrong in my thing that had performance problems when I was using it.

Answer: It stops the world and runs on the system stack, which is very cheap in a benchmark and insanely expensive in a production app actually running 30+ active goroutines.

@josharian (Contributor) commented Apr 15, 2020

Related: #20875

@josharian (Contributor) commented Apr 15, 2020

Also possibly related to your last comment: #19812

@rsc (Contributor) commented Apr 22, 2020

I asked last week if anyone objected to this support starting outside the testing package. I don't see any objections to that. Good discussion, but no objections.

Based on the discussion, this seems like a likely decline.

@rsc rsc moved this from Active to Likely Decline in Proposals Apr 22, 2020
@rsc (Contributor) commented Apr 29, 2020

No change in consensus; declined.

@rsc rsc closed this Apr 29, 2020
@rsc rsc moved this from Likely Decline to Declined in Proposals Apr 29, 2020