Do not run outer setup part of benchmarks multiple times to fix issue 20142 #38779
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @sfackler (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information.
This is very nice; it could also bring variance down a bit in some benchmarks (heap fragmentation, stuff crossing page/cache lines, etc.).
This seems reasonable to me, though I don't know enough about why the benchmarking infrastructure is designed the way it is.
Thanks for the PR @Craig-Macomber! I don't personally know why the benchmark harness was set up this way, and I doubt there's anyone around who still does, so this seems like an OK change to me. Can you elaborate on why `BenchMode` is needed?
`BenchMode` is used to indicate whether `iter` should actually do its benchmarking, or just run the closure once (which happens when running as a test instead of a benchmark, via `run_once` called by `convert_benchmarks_to_tests`). This is the same behavior as before, but the information is now needed in `iter`, since that is where the logic that decides which runs to do lives.
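A hypothetical sketch of the idea described above (this is not the actual libtest source): a mode flag lets a single `iter` entry point either loop many times for real benchmarking, or run the closure exactly once when benchmarks are converted to tests. The struct shapes and the fixed iteration count are illustrative assumptions.

```rust
#[derive(Clone, Copy, PartialEq)]
enum BenchMode {
    Auto,   // real benchmarking: iterate many times and measure
    Single, // running as a test: execute the closure once, no statistics
}

struct Bencher {
    mode: BenchMode,
    iterations: u64,
}

impl Bencher {
    fn iter<T, F: FnMut() -> T>(&mut self, mut inner: F) {
        self.iterations = match self.mode {
            // `convert_benchmarks_to_tests` / `run_once` path: one call only
            BenchMode::Single => 1,
            // the real harness picks the count adaptively; a fixed count
            // stands in for that logic here
            BenchMode::Auto => 100,
        };
        for _ in 0..self.iterations {
            inner();
        }
    }
}

fn iterations_for(mode: BenchMode) -> u64 {
    let mut b = Bencher { mode, iterations: 0 };
    b.iter(|| 2 + 2);
    b.iterations
}

fn main() {
    println!("Single mode: {} call(s)", iterations_for(BenchMode::Single));
    println!("Auto mode: {} call(s)", iterations_for(BenchMode::Auto));
}
```

The point is that once the sampling loop lives inside `iter`, the harness needs some signal there about which behavior is wanted, which is exactly what the mode flag carries.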
I'd also support this change. This has been a pain point for me as well, which I've typically solved by using
@Craig-Macomber oh oops, sorry, this fell out of my inbox; thanks for the explanation! In that case this looks good to me, thanks again for the PR! @bors: r+
📌 Commit 7cb2040 has been approved by |
Do not run outer setup part of benchmarks multiple times to fix issue 20142

Fix #20142

This is my first real Rust code, so I expect the quality is quite bad. Please let me know in which ways it is horrible and I'll fix it.

Previously the whole benchmark function was rerun many times; with this change, only the callback passed to `iter` is rerun. This improves performance by saving benchmark startup time: the setup used to be called a minimum of 101 times, and now runs only once.

I wasn't sure exactly what should be done for the case where `iter` is never called, so I left a FIXME for that: currently it does not error, and I added tests to cover that.

I have left the algorithm and statistics unchanged: I don't like that the minimum number of runs is 301 (that's bad for very slow benchmarks), but I consider such changes out of scope for this fix.
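A minimal model of why this change matters (the names here are hypothetical, not libtest's API): under the old scheme the whole benchmark function is re-invoked once per sample, so any setup before `iter` runs once per sample; under the new scheme the function runs once and only the closure loops. The counters make the difference observable.

```rust
const SAMPLES: u64 = 100;

struct Bencher {
    loop_inside_iter: bool, // true models the new behavior
    inner_runs: u64,
}

impl Bencher {
    fn iter(&mut self, mut f: impl FnMut()) {
        // New behavior: `iter` owns the sampling loop.
        // Old behavior: `iter` runs the closure once per outer invocation.
        let n = if self.loop_inside_iter { SAMPLES } else { 1 };
        for _ in 0..n {
            f();
            self.inner_runs += 1;
        }
    }
}

// A benchmark in the usual shape: expensive setup first, then `iter`.
fn benchmark(b: &mut Bencher, setup_runs: &mut u64) {
    *setup_runs += 1; // stands in for expensive setup work
    let data: Vec<u64> = (0..1000).collect();
    b.iter(|| {
        let _total: u64 = data.iter().sum();
    });
}

// Returns (setup runs, inner-closure runs) for each scheme.
fn run(new_behavior: bool) -> (u64, u64) {
    let mut b = Bencher { loop_inside_iter: new_behavior, inner_runs: 0 };
    let mut setup_runs = 0;
    if new_behavior {
        benchmark(&mut b, &mut setup_runs); // called once; `iter` loops
    } else {
        for _ in 0..SAMPLES {
            benchmark(&mut b, &mut setup_runs); // old: whole function re-run
        }
    }
    (setup_runs, b.inner_runs)
}

fn main() {
    println!("old scheme (setup, inner) = {:?}", run(false)); // (100, 100)
    println!("new scheme (setup, inner) = {:?}", run(true));  // (1, 100)
}
```

Both schemes execute the measured closure the same number of times; only the amount of setup work differs, which is the startup-time saving the PR describes.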
☀️ Test successful - status-appveyor, status-travis |
Sorry, I know I'm late to the party, but would it be possible to make
IMHO, I'd like to see this stabilized soon, so I'd argue against exposing anything other than what's strictly needed, for the sake of quick stabilization and future-proofing.
Just to clarify, I was referring to the private field
With the implementation as is, the statistics won't work with a single run, so we don't even bother collecting the data.

Having one single stabilized implementation of benchmarking doesn't seem very important compared to having the tests runnable with the standard tools. So I wonder: is it likely that bench might get stabilized in something like its current state? It seems to me like it might make more sense to make the test-running tools a bit more extensible, and move bench (or at least the uncommon use cases/configuration options) into a crate.

Alternatively, we could make bench very extensible. There are so many different things people may want from something like bench (e.g. for some benchmarks you want the min time, for others the mean, for others the mean without outliers; and some you want to report as rates, others as times, others as time complexities). If we simply made #[bench] take in any configuration from the environment (running as benchmark or test, mainly, but maybe more in the future), plus something to write progress and results to, then the main loops and statistics code could be swappable. We could stabilize one simple common case, and delegate any complex/custom policies to users/crates.

I'm new to Rust (this change was my first Rust project), so I don't have a good sense of what direction things should head from here, but I may be able to help with an implementation.
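The "swappable statistics" idea above can be sketched by representing the summary policy as a plain function over the collected samples, so min, mean, trimmed mean, and so on become interchangeable. This is an illustration of the proposal, not any existing libtest API; all function names are made up.

```rust
fn min_ns(samples: &[u64]) -> u64 {
    *samples.iter().min().expect("need at least one sample")
}

fn mean_ns(samples: &[u64]) -> u64 {
    samples.iter().sum::<u64>() / samples.len() as u64
}

// Mean after dropping the bottom and top 10% of samples: a crude way to
// discard outliers such as a sample that hit a page fault or GC-like pause.
fn trimmed_mean_ns(samples: &[u64]) -> u64 {
    let mut s = samples.to_vec();
    s.sort_unstable();
    let cut = s.len() / 10;
    let kept = &s[cut..s.len() - cut];
    kept.iter().sum::<u64>() / kept.len() as u64
}

fn main() {
    // Fake timing samples in nanoseconds, with one large outlier.
    let samples: Vec<u64> = vec![100, 102, 98, 101, 99, 100, 103, 97, 100, 900];

    // The per-benchmark policy is just a function pointer the harness
    // could accept as configuration.
    let summarize: fn(&[u64]) -> u64 = trimmed_mean_ns;

    println!("min     = {} ns", min_ns(&samples));     // 97
    println!("mean    = {} ns", mean_ns(&samples));    // 180, skewed by outlier
    println!("summary = {} ns", summarize(&samples)); // 100, outlier trimmed
}
```

Because each policy has the same signature, the harness's main loop would not need to know which statistic a given benchmark asked for.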
@Craig-Macomber Generally speaking, this type of discussion is better had on the rfcs repo or even in the forums. This very topic actually has a pretty recent thread: https://internals.rust-lang.org/t/pre-rfc-stabilize-bench-bencher-and-black-box/4565
It has been in wide use for years at this point. Folks acknowledge that there are shortcomings, but it Generally Works.
This is already done.