proposal: testing: add B method for adding stats #26037

Open
josharian opened this Issue Jun 25, 2018 · 16 comments

@josharian
Contributor

josharian commented Jun 25, 2018

Package testing has built-in support for three statistics: wall time, allocations, and memory allocated. There is discussion of adding more: #24918.

I propose that we also let users report their own benchmark-specific statistics. An example comes from package sort (proximally via #26031): the number of calls to Swap and Less.

Proposed API:

func (b *B) ReportStats(n int, units string)

This will cause benchmark invocations to include <n/b.N> <units>/op in the benchmark output. Then the sort of output reported in #26031 could be used as input to benchstat.

@gopherbot gopherbot added this to the Proposal milestone Jun 25, 2018

@gopherbot gopherbot added the Proposal label Jun 25, 2018

@josharian

Contributor

josharian commented Jun 25, 2018

On a second read, perhaps ReportMetric or just Report would be a better name. And perhaps it should accept an int64.

@rsc

Contributor

rsc commented Jul 9, 2018

Not a float64?

/cc @aclements

@josharian

Contributor

josharian commented Jul 16, 2018

Not a float64?

I couldn't think of any non-integral metrics, and an int has the advantage of making it clear that the intended usage is to report a total count, not count/b.N. But I'm not opposed to a float64.

We'd also want to make sure that downstream tools (like benchstat) are prepared to parse floats, on pain of ending up in a weird situation similar to testing.AllocsPerRun, whose documentation reads: "AllocsPerRun returns the average number of allocations during calls to f. Although the return value has type float64, it will always be an integral value."

@bradfitz

Member

bradfitz commented Jul 16, 2018

Seems fine to me and @ianlancetaylor, @aclements, @rsc?

@aclements

Member

aclements commented Jul 20, 2018

Yes please!

Though I'm not so sure about implicitly dividing by b.N or automatically adding "/op". Many custom benchmarks report metrics that are not per-op. gcbench reports many metrics that are aggregated differently across iterations, such as quantiles and time-rates. x/benchmarks reports things like peak RSS (arguably also a quantile). benchcmd is similar.

Users can easily divide by b.N themselves, or we could provide two methods to make the distinction clear.

In this case, the type should clearly be float64, though I would argue that it should be float64 even if the framework implicitly divides by b.N. We already made this mistake with the most common metric: "time" is not an integral resource, and we have to multiply it by 10⁹ in order to pretend that it is.

@aclements

Member

aclements commented Jul 20, 2018

This may be a dup of #16110, though the discussion there got a bit confusing.

@josharian

Contributor

josharian commented Jul 20, 2018

This may be a dup of #16110, though the discussion there got a bit confusing.

I agree. I'll close that one in favor of this, since this one seems to be moving along.

Users can easily divide by b.N themselves, or we could provide two methods to make the distinction clear.

I'm inclined to add only a single method, and use examples and docs to get the rest of the way there. I'm about out of laptop time; I'll update this proposal with something concrete soon.

@josharian

Contributor

josharian commented Jul 22, 2018

New proposal:

// ReportMetric adds n/unit to the reported benchmark results.
// If the metric is per-iteration, the caller should divide by b.N,
// and by convention units should end in "/op".
// ReportMetric will panic if unit is the empty string
// or if unit contains any whitespace.
func (b *testing.B) ReportMetric(n float64, unit string)

Examples to include something like a swap count (sorting) and a cache hit rate (sync.Pool? other?).

BenchmarkResult should also have a new field:

Extra map[string]float64 // Extra records additional metrics, reported by ReportMetric

Open question: What should we do if ReportMetric is called multiple times with the same unit? (Overwrite? Panic? Ignore?) What if ReportMetric is called with a reserved unit like "ns/op"?

I'm inclined to answer overwrite and panic, respectively, although documenting that is annoying.

Ready for a new round of input, thanks!

@aclements

Member

aclements commented Jul 23, 2018

SGTM.

Open question: What should we do if ReportMetric is called multiple times with the same unit? (Overwrite? Panic? Ignore?) What if ReportMetric is called with a reserved unit like "ns/op"?

I agree that it should overwrite if called multiple times with the same unit (anything else seems more annoying or needlessly constrained).

I worry more about panicking on reserved units. Is "allocs/op" always reserved or only with -benchmem? What if we add more built-in units in the future (e.g., for #21295)? Since the set is somewhat dynamic, that would make the panicking behavior somewhat unpredictable. I would lean toward overwriting the built-in measure, but that's a bit messy to implement. And, arguably, if a later call to ReportMetric overwrites the earlier value, then it should just ignore reserved units, since those are being reported by the benchmark framework at the end of the benchmark and hence would "override" any user-reported values.

How does this interact with parallel benchmarks?

@rsc

Contributor

rsc commented Jul 23, 2018

In the description two messages up, "n/unit" should be "n unit" I think. Otherwise this seems OK. Will leave for Austin to accept the proposal when ready.

@rsc

Contributor

rsc commented Aug 6, 2018

@josharian, @aclements, any ideas about parallel benchmarks or other details needed to accept this proposal?

@aclements

Member

aclements commented Aug 6, 2018

I think if we're taking the value and unit from the user at face value, parallel benchmarks don't present any particular complication. We'll just report the value and unit in the benchmark line, whether or not it's parallel. (I'm not sure what, if anything, I had in mind when I questioned how this would interact with parallel benchmarks.)

So, tweaking Josh's proposal, how about:

// ReportMetric adds n unit to the reported benchmark results.
// If the metric is per-iteration, the caller should divide by b.N,
// and by convention units should end in "/op".
// If ReportMetric is called multiple times for the same unit,
// later calls override earlier calls.
// If ReportMetric is called for a unit normally reported by the
// benchmark framework itself (such as "ns/op"), n will override
// the value reported by the benchmark framework.
// ReportMetric will panic if unit is the empty string
// or if unit contains any whitespace.
func (b *testing.B) ReportMetric(n float64, unit string)

plus extending BenchmarkResult with

Extra map[string]float64 // Extra records additional metrics, reported by ReportMetric

@josharian

Contributor

josharian commented Aug 7, 2018

@aclements LGTM, thanks.

@aclements

Member

aclements commented Aug 7, 2018

I started prototyping this and ran into two problems.

// If ReportMetric is called for a unit normally reported by the
// benchmark framework itself (such as "ns/op"), n will override
// the value reported by the benchmark framework.

I'm not sure how to implement this. The built-in units are computed from fields in testing.BenchmarkResult, but aren't actually represented directly in that type, so we can't just fill these fields in from user-reported metrics. This makes me think that maybe it should instead panic on these.

Also, what should B.ResetTimer do with user-reported metrics? I think it should clear them just like it resets the built-in metrics.

@josharian

Contributor

josharian commented Aug 10, 2018

The built-in units are computed from fields in testing.BenchmarkResult, but aren't actually represented directly in that type, so we can't just fill these fields in from user-reported metrics.

I'm inclined to say that we do what it says on the box: Extra records additional metrics, reported by ReportMetric. So we don't touch the original fields in testing.BenchmarkResult; everything that comes in via ReportMetric goes into BenchmarkResult.Extra, full stop. And then consult Extra to do the right thing in BenchmarkResult.String, BenchmarkResult.NsPerOp, and so on, and document that in those methods.

(I was originally drawn to panicking here. You asked, though: 'Is "allocs/op" always reserved or only with -benchmem? What if we add more built-in units in the future?' I think the answer to the first question is probably "always", but I don't see any good answer to the second question.)

Also, what should B.ResetTimer do with user-reported metrics? I think it should clear them just like it resets the built-in metrics.

Agreed.

@mmcloughlin

Contributor

mmcloughlin commented Oct 26, 2018

Reminds me of "User-defined counters" in Google Benchmark: https://github.com/google/benchmark#user-defined-counters
