
proposal: testing: add B method for adding stats #26037

Closed
josharian opened this Issue Jun 25, 2018 · 32 comments


josharian commented Jun 25, 2018

Package testing has built-in support for three statistics: wall time, allocations, and memory allocated. There is discussion of adding more: #24918.

I propose that we also expose to users the ability to report their own benchmark-specific statistics. An example comes from package sort (proximally via #26031): The number of calls to Swap and Less.

Proposed API:

func (b *B) ReportStats(n int, units string)

This will cause benchmark invocations to include <n/b.N> <units>/op in the benchmark output. Then the sort of output reported in #26031 could be used as input to benchstat.

@gopherbot gopherbot added this to the Proposal milestone Jun 25, 2018

@gopherbot gopherbot added the Proposal label Jun 25, 2018


josharian commented Jun 25, 2018

On a second read, perhaps ReportMetric or just Report would be a better name. And perhaps it should accept an int64.


rsc commented Jul 9, 2018

Not a float64?

/cc @aclements


josharian commented Jul 16, 2018

Not a float64?

I couldn't think of any non-integral metrics, and an int has the advantage of making it clear that the intended usage is to report a total count, not count/b.N. But I'm not opposed to a float64.

We'd also want to make sure that downstream tools (like benchstat) are prepared to parse floats, on pain of ending up in a weird situation similar to testing.AllocsPerRun: AllocsPerRun returns the average number of allocations during calls to f. Although the return value has type float64, it will always be an integral value.


bradfitz commented Jul 16, 2018

Seems fine to me and @ianlancetaylor, @aclements, @rsc?


aclements commented Jul 20, 2018

Yes please!

Though I'm not so sure about implicitly dividing by b.N or automatically adding "/op". Many custom benchmarks report metrics that are not per-op. gcbench reports many metrics that are aggregated differently across iterations, such as quantiles and time-rates. x/benchmarks reports things like peak RSS (arguably also a quantile). benchcmd is similar.

Users can easily divide by b.N themselves, or we could provide two methods to make the distinction clear.

In this case, the type should clearly be float64, though I would argue that it should be float64 even if the framework implicitly divides by b.N. We already made this mistake with the most common metric: "time" is not an integral resource, and we have to multiply it by 10⁹ in order to pretend that it is.


aclements commented Jul 20, 2018

This may be a dup of #16110, though the discussion there got a bit confusing.


josharian commented Jul 20, 2018

This may be a dup of #16110, though the discussion there got a bit confusing.

I agree. I'll close that one in favor of this, since this one seems to be moving along.

Users can easily divide by b.N themselves, or we could provide two methods to make the distinction clear.

I'm inclined to add only a single method, and use examples and docs to get the rest of the way there. I'm about out of laptop time; I'll update this proposal with something concrete soon.


josharian commented Jul 22, 2018

New proposal:

// ReportMetric adds n/unit to the reported benchmark results.
// If the metric is per-iteration, the caller should divide by b.N,
// and by convention units should end in "/op".
// ReportMetric will panic if unit is the empty string
// or if unit contains any whitespace.
func (b *testing.B) ReportMetric(n float64, unit string)

Examples to include something like a swap count (sorting) and a cache hit rate (sync.Pool? other?).

BenchmarkResult should also have a new field:

Extra map[string]float64 // Extra records additional metrics, reported by ReportMetric

Open question: What should we do if ReportMetric is called multiple times with the same unit? (Overwrite? Panic? Ignore?) What if ReportMetric is called with a reserved unit like "ns/op"?

I'm inclined to answer overwrite and panic, respectively, although documenting that is annoying.

Ready for a new round of input, thanks!


aclements commented Jul 23, 2018

SGTM.

Open question: What should we do if ReportMetric is called multiple times with the same unit? (Overwrite? Panic? Ignore?) What if ReportMetric is called with a reserved unit like "ns/op"?

I agree that it should overwrite if called multiple times with the same unit (anything else seems more annoying or needlessly constrained).

I worry more about panicking on reserved units. Is "allocs/op" always reserved, or only with -benchmem? What if we add more built-in units in the future (e.g., for #21295)? Since the set is somewhat dynamic, that would make the panicking behavior somewhat unpredictable. I would lean toward overwriting the built-in measure, but that's a bit messy to implement. And, arguably, if a later call to ReportMetric overwrites the earlier value, then it should just ignore reserved units, since those are being reported by the benchmark framework at the end of the benchmark and hence would "override" any user-reported values.

How does this interact with parallel benchmarks?


rsc commented Jul 23, 2018

In the description two messages up, "n/unit" should be "n unit" I think. Otherwise this seems OK. Will leave for Austin to accept the proposal when ready.


rsc commented Aug 6, 2018

@josharian, @aclements, any ideas about parallel benchmarks or other details needed to accept this proposal?


aclements commented Aug 6, 2018

I think if we're taking the value and unit from the user at face value, parallel benchmarks don't present any particular complication. We'll just report the value and unit in the benchmark line, whether or not it's parallel. (I'm not sure what, if anything, I had in mind when I questioned how this would interact with parallel benchmarks.)

So, tweaking Josh's proposal, how about:

// ReportMetric adds n unit to the reported benchmark results.
// If the metric is per-iteration, the caller should divide by b.N,
// and by convention units should end in "/op".
// If ReportMetric is called multiple times for the same unit,
// later calls override earlier calls.
// If ReportMetric is called for a unit normally reported by the
// benchmark framework itself (such as "ns/op"), n will override
// the value reported by the benchmark framework.
// ReportMetric will panic if unit is the empty string
// or if unit contains any whitespace.
func (b *testing.B) ReportMetric(n float64, unit string)

plus extending BenchmarkResult with

Extra map[string]float64 // Extra records additional metrics, reported by ReportMetric


josharian commented Aug 7, 2018

@aclements LGTM, thanks.


aclements commented Aug 7, 2018

I started prototyping this and ran into two problems.

// If ReportMetric is called for a unit normally reported by the
// benchmark framework itself (such as "ns/op"), n will override
// the value reported by the benchmark framework.

I'm not sure how to implement this. The built-in units are computed from fields in testing.BenchmarkResult, but aren't actually represented directly in that type, so we can't just fill these fields in from user-reported metrics. This makes me think that maybe it should instead panic on these.

Also, what should B.ResetTimer do with user-reported metrics? I think it should clear them just like it resets the built-in metrics.


josharian commented Aug 10, 2018

The built-in units are computed from fields in testing.BenchmarkResult, but aren't actually represented directly in that type, so we can't just fill these fields in from user-reported metrics.

I'm inclined to say that we do what it says on the box: Extra records additional metrics, reported by ReportMetric. So we don't touch the original fields in testing.BenchmarkResult; everything that comes in via ReportMetric goes into BenchmarkResult.Extra, full stop. And then consult Extra to do the right thing in BenchmarkResult.String, BenchmarkResult.NsPerOp, and so on, and document that in those methods.

(I was originally drawn to panicking here. You asked, though: 'Is "allocs/op" always reserved or only with -benchmem? What if we add more built-in units in the future?' I think the answer to the first question is probably "always", but I don't see any good answer to the second question.)

Also, what should B.ResetTimer do with user-reported metrics? I think it should clear them just like it resets the built-in metrics.

Agreed.


mmcloughlin commented Oct 26, 2018

Reminds me of "User-defined counters" in Google Benchmark: https://github.com/google/benchmark#user-defined-counters


jfn7 commented Jan 11, 2019

What is the status on this work?


josharian commented Jan 11, 2019

There are some outstanding questions—see Austin’s last post, above.

Austin, perhaps you could post your WIP CL so someone could see whether my proposed answers above make sense in context and/or suggest other answers. (That someone might be me, but I can’t commit to doing it in a timely way.)


jfn7 commented Jan 11, 2019

Ok, I would love to see progress made on this topic. I might be able to spend some cycles on this.


gopherbot commented Jan 29, 2019

Change https://golang.org/cl/160097 mentions this issue: linux/perf: API proposal


aclements commented Mar 8, 2019

// If ReportMetric is called for a unit normally reported by the
// benchmark framework itself (such as "ns/op"), n will override
// the value reported by the benchmark framework.

I found a use case for this. I just wrote a benchmark where the only purpose is to measure STW time. The ns/op of the benchmark itself are irrelevant, so it would be nice to override this metric so the "op" is GC STW.


josharian commented Mar 9, 2019

I found a use case for this.

Nice. Makes sense. I think that my proposed approach should be able to handle this.

I started prototyping this

Can you put up your prototype, whatever shape it is in? Maybe we can collectively get it finished.


cespare commented Mar 9, 2019

I've been following this issue and noticing many times when it would be helpful in my own work. I am also available to work on or review this feature in the next couple of months. Is it feasible to get this in for 1.13?


aclements commented Mar 11, 2019

One other detail question: currently, if MB/s is 0 we don't report it (which is probably just an artifact of that being the default value), but we report ns/op even if it's 0. Should we not? That will never happen in a "real" benchmark, and would give a way for users to suppress the ns/op metric if it wasn't meaningful for a particular benchmark.


gopherbot commented Mar 11, 2019

Change https://golang.org/cl/166717 mentions this issue: testing: add B.ReportMetric for custom benchmark metrics


josharian commented Mar 11, 2019

Should we not? That will never happen in a "real" benchmark, and would give a way for users to suppress the ns/op metric if it wasn't meaningful for a particular benchmark.

Suppressing it when 0 sounds sensible to me.


seebs commented Mar 17, 2019

One possible flaw with suppressing a metric when zero: benchcmp or benchstat.

Consider what happens if I try to run comparisons between two benchmark result sets, and one of them actually manages to, say, eliminate allocations from a particular use case. I no longer have a reported allocation count for that benchmark, which means the comparison may not work as expected.


josharian commented Mar 17, 2019

In the discussion in the CL we (tentatively) settled on zero-is-special only for ns/op.

@gopherbot gopherbot closed this in 834d229 Mar 19, 2019


janusdn commented Mar 20, 2019

In which release should we expect to see this cool functionality?


aclements commented Mar 20, 2019

It's now committed, so it should appear in the next release (Go 1.13).


josharian commented Mar 20, 2019

Probably 1.13. (There’s always a chance it gets reverted.)


janusdn commented Mar 20, 2019

Thanks, looking forward to using it.
