
The benchmark harness runs a minimum of 300 iterations. #11010

Closed
huonw opened this Issue Dec 16, 2013 · 7 comments

@huonw (Member) commented Dec 16, 2013

After the change to handle benches taking longer than 1 ms, the test runner will run a minimum of 50 + 5 * 50 = 300 iterations of the piece of code of interest. This should probably be scaled back for very slow functions (at the cost of some statistical accuracy).
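
For reference, here is a rough sketch of the sampling scheme that produces this floor; it is purely illustrative (not libtest's actual implementation), and the loop bounds simply mirror the 50 + 5 * 50 figure above:

fn run_bench<F: FnMut()>(mut f: F) {
    // Iterations per sample; the harness scales this up for fast closures,
    // but for slow closures it stays at 1.
    let n = 1;

    // Initial batch of 50 samples used for calibration/warm-up.
    for _ in 0..50 {
        for _ in 0..n {
            f();
        }
    }

    // Measurement phase: 5 more rounds of 50 samples each,
    // so at minimum 50 + 5 * 50 = 300 calls to `f`.
    for _round in 0..5 {
        for _sample in 0..50 {
            for _ in 0..n {
                f();
            }
        }
    }
}

At one iteration per sample, a closure that takes 100 ms therefore costs about 30 seconds per benchmark just to satisfy this minimum.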

@steveklabnik (Member) commented Feb 15, 2015

related to #20142

@briansmith commented Oct 26, 2015

This affects crypto-bench, in particular the PBKDF2 benchmarks. 300 seems like an arbitrary number, so it would be good to at least ensure that the minimum is based on some science.

@Craig-Macomber (Contributor) commented Dec 29, 2016

I have a benchmark with a long setup time, and this runs the setup 101 times. Running the benchmark inner loop a lot of times would be pretty bad, but all those setup runs are far worse: the benchmark takes ~6 minutes.

For comparison, Google Benchmark (Google's C++ library) runs a comparable test's setup once and the test itself 10 times, and takes about a second.
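
To make the pattern concrete, here is a hypothetical benchmark in the shape being described (a sketch, not code from this issue). Because the harness re-invokes the whole #[bench] function for each sample it takes, any setup done outside b.iter is repeated on the order of a hundred times:

#![feature(test)] // crate root of e.g. benches/setup_heavy.rs
extern crate test;

use test::Bencher;

#[bench]
fn bench_with_expensive_setup(b: &mut Bencher) {
    // Hypothetical expensive setup. It runs once per invocation of this
    // function, and the harness invokes the function once per sample,
    // so this cost is paid ~100 times per run.
    let data: Vec<u64> = (0..1_000_000).collect();

    // Only this closure is the code we actually want to time.
    b.iter(|| data.iter().sum::<u64>());
}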

@Craig-Macomber (Contributor) commented Jan 2, 2017

I have a branch, https://github.com/Craig-Macomber/rust/tree/fast_bench, that fixes this. The statistics are a work in progress, but in its current state it seems to work pretty well. It's enough to unblock my work with slow benchmarks, but don't trust the data from it: the confidence-interval math is wrong in a few different ways.

This changes all the statistical logic, so it's going to need some careful review before I even consider sending in a pull request.

@Mark-Simulacrum (Member) commented May 25, 2017

Perhaps we should add a per-benchmark limit from the command line, at least for now.

@joliss commented Jun 30, 2017

To add another similar use case: I've found myself wishing I had a function to benchmark something just once (or a fixed number of times), e.g. b.once, in addition to the b.iter method:

#[bench]
fn foo_bench(b: &mut Bencher) {
    b.once(|| {
        // Do something slow and expensive
    });
}

Alternatively, perhaps libtest could allow printing timings for the entire function. It could do that either by detecting that we're not calling b.iter, or by inspecting the function signature to check whether there is no Bencher parameter:

#[bench]
fn foo_bench(_b: &mut Bencher) {
    // Do something slow and expensive
}

Of course, running something once arguably doesn't give very reliable timings. The motivation for this is twofold, though:

  • Some of our benchmarks are "integration benchmarks" that run on realistic, complex datasets. If we run them many times it simply takes too long to be useful, but running them once or a few times would provide helpful data points.
  • We have a bunch of test cases to run our CPU profiler on. Arguably the profiling test cases produce useful benchmarking data on their own, so it seems reasonable to throw them in with the other benchmarks, even if it abuses the benchmarking facility slightly. In other words, we'll essentially run cargo bench --no-run and run the profiler on ./target/release/<binary> the_benchmark. But some of those profiling test cases are too slow for b.iter, so we'd only want to run them once -- this is definitely true for profiling, but probably also in the usual cargo bench run, to keep the running time in check.
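
As a footnote to the "run it just once" idea: until something like b.once exists, one workaround (an illustrative sketch under that assumption, not part of the actual proposal) is to time the expensive case manually in an ordinary #[test] and print the elapsed time, which gives a single data point without the harness's iteration floor:

use std::time::Instant;

#[test]
fn integration_timing_once() {
    let start = Instant::now();
    expensive_integration_run(); // placeholder for the real workload
    // Visible when run with `cargo test -- --nocapture`.
    println!("integration_timing_once took {:?}", start.elapsed());
}

// Hypothetical stand-in for a slow run over a realistic dataset.
fn expensive_integration_run() {
    std::thread::sleep(std::time::Duration::from_millis(100));
}
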
@dtolnay (Member) commented Nov 19, 2017

I would like to track this as part of rust-lang/rfcs#816. That way we keep the standard test framework minimal while allowing libraries to develop more sophisticated custom test frameworks with knobs for number of iterations, access to statistics, control over CPU caches, etc.

@dtolnay closed this Nov 19, 2017
