Benchmarking / cargo bench #2287

Closed: wants to merge 4 commits into base: master from Manishearth:benchmarking

Conversation

@Centril (Contributor) commented Jan 11, 2018 (comment minimized)

@Manishearth (Member) commented Jan 11, 2018

I think the current API is pretty decent. We should probably have custom test framework support at some point (e.g. this would be great for cargo-fuzz), but given that this has been languishing with no progress for three years, I think it's better to stabilize the current API so that people can actually use it, and later try to build a custom test/benchmarking plugin system.
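
(For context, the nightly API under discussion looks roughly like this; a minimal sketch, with fib standing in for an arbitrary workload, run via cargo bench on a nightly toolchain.)

#![feature(test)]
extern crate test;

use test::Bencher;

fn fib(n: u64) -> u64 {
    if n < 2 { n } else { fib(n - 1) + fib(n - 2) }
}

#[bench]
fn bench_fib_20(b: &mut Bencher) {
    // The bencher runs the closure repeatedly and reports ns/iter.
    b.iter(|| fib(test::black_box(20)));
}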

@Manishearth (Member) commented Jan 11, 2018

@Centril Yes. I think it does a bunch more things and for now I want to get the most basic stuff in stable. More functions can be added to the bencher in the future, or, ideally, we'll have a custom test framework system that we can plug Criterion into.

@Centril (Contributor) left a comment

Looks all good to me - I have a single nit =)

[reference-level-explanation]: #reference-level-explanation

The bencher reports the median value and deviation (difference between min and max).
Samples are winsorized, so extreme outliers get clamped.

@Centril (Contributor), Jan 11, 2018

Might be a good idea to insert a link here: https://en.wikipedia.org/wiki/Winsorizing

@Manishearth (Member), Jan 11, 2018

done, thanks!
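
(To make the quoted behaviour concrete: winsorizing clamps the most extreme samples to a chosen percentile instead of discarding them, and the median and the max-min deviation are then taken over the clamped samples. A minimal sketch of the idea; the 5% cut-off and the function shape are assumptions for illustration, not libtest's actual implementation.)

// Clamp samples below the p-th percentile and above the (100 - p)-th.
fn winsorize(samples: &mut [f64], p: f64) {
    if samples.is_empty() { return; }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let idx = |q: f64| ((q / 100.0) * (sorted.len() - 1) as f64).round() as usize;
    let (lo, hi) = (sorted[idx(p)], sorted[idx(100.0 - p)]);
    for s in samples.iter_mut() {
        if *s < lo { *s = lo; } else if *s > hi { *s = hi; }
    }
}

// E.g. winsorize(&mut samples, 5.0) clamps to the 5th/95th percentiles.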

@Manishearth Manishearth force-pushed the Manishearth:benchmarking branch from 00d1a55 to fe29ce5 Jan 11, 2018

@SimonSapin (Contributor) commented Jan 11, 2018

This RFC is written as if this is a new feature, which is a good description to have. But a "diff" from the current state of Nightly would also be good: are you proposing any change, or only stabilization of what’s already implemented today?

@Manishearth (Member) commented Jan 11, 2018

@SimonSapin oh, I should have clarified. This proposes zero changes; however, it stabilizes only a part of the bencher API.

@alkis commented Jan 11, 2018

Thanks @Manishearth for starting this up. I think adding iter_n() to bencher will get the current API to a good enough state for most tasks and everything else can be done with enough gymnastics on top: type/value parametrization, etc. See the rationale for iter_n() here.

@Manishearth (Member) commented Jan 11, 2018

The intent here is to stabilize a minimum useful surface, kept minimum to maximize the probability of stabilization. I have no objection to stabilizing iter_n, but others might.

Do folks have concerns? I'll add it otherwise.

@alkis commented Jan 11, 2018

I understand the need for a minimal surface and I am totally on board. iter_n cannot be implemented on top of the current API, though, so in that sense adding it is in line with a "minimal API surface" and not adding it is an "incomplete API surface" :-)

@Manishearth (Member) commented Jan 11, 2018 (comment minimized)

@alkis commented Jan 11, 2018

Let me try for a compelling argument.

My thesis is that more often than not, iter_n() is what you want to use when benchmarking. If it counts for anything, we have both APIs at Google (in C++), and most benchmarks are written with iter_n() rather than iter().

Here are a few examples from random code:

benchmarking insertion in a deque implementation

We want to benchmark deques of sizes that fit in different caches. When benchmarking insertion we need to insert enough elements to target some cache, so each call to iter() will have to insert N elements, where N depends on the cache we are targeting. This means the time per iter() is not comparable across invocations. If we use iter_n(), we report time per insertion, which makes the times comparable across caches (and gives more insight into how our deque behaves).
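
(A hypothetical sketch of how the proposed iter_n could express this; the name, the signature, and the assumption that the elapsed time is divided by n to give time per insertion are all taken from this discussion, not from any existing API. The #![feature(test)] boilerplate and the deque size are likewise illustrative.)

use std::collections::VecDeque;
use test::Bencher;

// Chosen so the deque stays within the cache level we are targeting;
// the exact value is an assumption for illustration.
const N: u64 = 64 * 1024;

#[bench]
fn push_back_cache_sized(b: &mut Bencher) {
    // Hypothetical API: run the closure as usual, but divide the measured
    // time by N, so the report reads as time per insertion.
    b.iter_n(N, || {
        let mut d = VecDeque::with_capacity(N as usize);
        for i in 0..N {
            d.push_back(i);
        }
        d
    });
}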

benchmarking a parser

We benchmark across different data sets, and naturally each benchmark input is of a different size. There is a lot more context when reporting time per byte instead of time to parse document X, Y, or Z, and it makes the times comparable between data sets. It also gives more insight: if, for example, the time per byte to parse document X is much larger than for document Y, that suggests there might be an optimization opportunity for documents that look like X.

benchmarking boolean formula optimizer

Given a boolean formula, the optimizer reduces the number of clauses in it by removing redundant ones. Naturally, the number of clauses in the original formula affects how much time it takes to optimize it. So again we want to benchmark using iter_n(), where N is the number of clauses in the expression.

@Manishearth (Member) commented Jan 11, 2018

I don't quite get why this affects the parser case, but the other two are pretty compelling. I'm not clear that these will be common enough, but I'll add it and see if others object.

@BurntSushi (Member) commented Jan 11, 2018

Thanks @Manishearth! I would love to see this stabilized. I'd like to lend my voice to say that what's there is already useful, and stabilizing that alone would be a win. I know I have at least been productively using it for years now. I look forward to improving my workflow of course, but I think this is a step in the right direction.

@Manishearth (Member) commented Jan 11, 2018

Actually, wait, iter_n seems to be an API that doesn't currently exist.

I'd rather use this RFC to stabilize the existing API with tweaks; things like iter_n can be added in further RFCs.

The reason being that if this RFC lands as is, we can immediately start the stabilization process; however for iter_n we have to go through a couple nightly cycles as well.

@Manishearth (Member) commented Jan 11, 2018

However if folks think that we can immediately stabilize an iter_n (I'm not sure what the procedure is, @BurntSushi would know) I'm open to adding it to the RFC.

@BurntSushi (Member) commented Jan 11, 2018

@Manishearth I believe you're right. Any new API generally needs to go through a couple cycles as an unstable API first. With that said, I think it might be possible to add iter_n as an unstable API and then stabilize it later via normal FCP if it's simple enough without necessarily needing to get it into this RFC.

(But I am not an expert on matters of policy, and others should check that I have this right.)


- Should stuff be in `std::test` or a partially-stabilized `libtest`?
- Should we stabilize any other `Bencher` methods (like `run_once`)?
- Stable mchine-readable output for this would be nice, but can be done in a separate RFC.

@udoprog, Jan 11, 2018

typo: mchine

@nagisa (Contributor) commented Jan 11, 2018

In hopes of not accidentally prohibiting custom benchmark/test engines, I'll write down all the components of the current benchmark engine and briefly describe potential pain points when integrating with such a future system.

  • #[bench] attribute -- would be collected and dispatched to the custom engine as per compiler flags, just like #[test]. No problems here.
  • The Bencher argument -- a potential pain point. The type comes from test::* and so cannot be swapped out easily, which custom crates will 100% want to do. Should we make Bencher a trait and either use fn function(b: impl Bencher) or fn function(b: &mut Bencher) (with dynamic dispatch)? (A rough sketch follows this list.)
  • The black_box for values returned out of iter, and the black_box function -- seems mostly like a non-problem, but we could also put this into core::mem or some such, since it is applicable to more than just benchmarking.
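
(One possible shape for such a trait, sketched here only to make the dispatch question concrete; nothing like this exists in test today, and the names are illustrative.)

// A generic method gives static dispatch via `impl Bencher`; a
// `&mut dyn Bencher` variant would instead need a non-generic,
// object-safe method, e.g. one taking `&mut dyn FnMut()`.
pub trait Bencher {
    fn iter<T, F: FnMut() -> T>(&mut self, f: F);
}

fn my_benchmark(b: &mut impl Bencher) {
    b.iter(|| {
        let mut v = Vec::with_capacity(1);
        v.push(42);
        v
    });
}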

This RFC does not say whether we should stabilise the functionality from within the test crate or move it into core. I feel like moving it to core would be more appropriate, especially since most of the test crate is perma-unstable and has nothing useful other than Bencher and black_box.

@udoprog commented Jan 11, 2018

I'm glad to see this happening.

Should probably mention the move from test to std::test, or does this not count as a change?

What @nagisa just wrote is also what I'd like to see clarified in the RFC: make sure that we don't lock ourselves into a corner w.r.t. external benchers, or at least have a plan for how to integrate them in the future.

@steveklabnik (Member) commented Jan 11, 2018

I personally would much, much, much rather see ways to do custom test frameworks than stabilize yet another minimal framework in the language forever.

@Manishearth (Member) commented Jan 11, 2018

I personally would much, much, much rather see ways to do custom test frameworks than stabilize yet another minimal framework in the language forever.

We've been saying this for years, and there has been no progress on this front. I think this is a pretty important framework to have.

I would rather design it such that it doesn't prohibit us adding custom frameworks in the future.

It's taken us more than a year to stabilize pluggable allocators, and we're stabilizing that pretty much as-is. This is despite it being something people (especially those integrating with C/C++ apps, like Firefox) really wanted. There is no comparably significant ask for custom test frameworks right now, and I have nearly zero confidence that such a proposal will move forward at a reasonable pace. Benchmarking is something folks actually use significantly; let's stabilize that first.

@Manishearth Manishearth force-pushed the Manishearth:benchmarking branch from fe29ce5 to 0d0e89c Jan 11, 2018

@Manishearth (Member) commented Jan 11, 2018

The Bencher argument -- a potential pain point.

I mostly envision custom test frameworks as being able to choose what attribute to use, and being able to have custom args. It is not necessary that #[bench]-using benchers all have the same, cross-compatible format, as long as what we currently stabilize is possible to rewrite as a custom framework in the future.

For example, for cargo-fuzz we ideally need the ability for the function to take any arg of type T: Arbitrary. Benchers will each have their own bencher arg, which may be a different bencher from this one.

Should we make Bencher a trait and either use fn function(b: impl Bencher) or fn function(b: &mut Bencher) (with dynamic dispatch)?

I think this stabilizes the interface which custom test frameworks can be accessed through, which might be too much. Other benchers may wish to provide more fine grained control, which can't be accessed through a generic bencher.

The black_box for values returned out of iter, and the black_box function -- seems mostly like a non-problem, but we could also put this into core::mem or some such, since it is applicable to more than just benchmarking.

Good point. Adding to unresolved questions.

@steveklabnik (Member) commented Jan 11, 2018

We've been saying this for years, and there has been no progress on this front. I think this is a pretty important framework to have.

I agree, but we should do it right rather than stabilize what we have. I know that the intention of this RFC is specifically to push back against this kind of thinking, and I agree with it in many cases, but not here.

I agree this is very important, which is why we shouldn't just stabilize whatever was written by some random contributors years ago and then not really touched since then.

@Manishearth (Member) commented Jan 11, 2018

I agree, but we should do it right rather than stabilize what we have.

Sure. I disagree that doing this right implies going for custom test frameworks. IMO our "perfect is the enemy of the good" problem isn't that we try to get things right; it's that we try to get things right in the most general way possible, and that ends up stopping things in their tracks because people have neither the bandwidth nor the drive to push for huge projects which address one use case they care about and a million use cases they don't.

Custom test frameworks are a project comparable in size to the simple string-based proc macro stabilization, and that one actually had a lot of folks who wanted it. It still took forever to get that ball rolling.

If we want to get it right, we can also tweak the API, tweak the functionality, whatever. Folks mostly seem to be happy with what exists so I'm trying to keep it the same, but if there are strong reasons for there being a better API or set of functionality, I'm game. So far, there is @alkis's suggestion, which I'm considering, but that doesn't change the existing APIs, just adds one more, which I'd rather table to another RFC. But if we end up changing other things anyway I'd probably want to add it to this RFC.


Basically, I'm not convinced that we must go for custom test frameworks to get this right. Even if we do, I'd want us to provide some kind of bencher out of the box, so I feel like stabilizing a decent, minimal bencher, and making sure it can be made to work with custom test framework support when that happens (as a test plugin), is the best path.

@Manishearth (Member) commented Jan 19, 2018

Added a better API for bencher, moved black_box to std::mem, and added iter_n

@Manishearth Manishearth force-pushed the Manishearth:benchmarking branch from 0d0e89c to ea4543f Jan 19, 2018

@gnzlbg (Contributor) commented Jan 19, 2018

@Manishearth we discussed this a bit on irc.

This is how test::black_box is currently defined:

/// A function that is opaque to the optimizer, to allow benchmarks to
/// pretend to use outputs to assist in avoiding dead-code
/// elimination.
///
/// This function is a no-op, and does not even read from `dummy`.
pub fn black_box<T>(dummy: T) -> T {
    // we need to "use" the argument in some way LLVM can't
    // introspect.
    unsafe { asm!("" : : "r"(&dummy)) }
    dummy
}

I think we should make its definition, independently of the implementation, a bit more precise. What exactly does it mean to be "opaque to the optimizer"?

For example, Google's Benchmark framework defines two functions, black_box and clobber (they call black_box DoNotOptimize, but I'll use black_box here to avoid confusion). This is how they would be defined in Rust (I ported them from here; maybe I screwed up):

/// Read/write barrier; flushes pending writes to global memory.
///
/// Memory managed by block scope objects must be "escaped" using 
/// `black_box` before it can be clobbered.
#[inline(always)]
fn clobber_memory() { 
  unsafe { asm!("" : : : "memory" : "volatile") };
}

/// Prevents a value or the result of an expression from being optimized away by 
/// the compiler adding as little overhead as possible.
///
/// It does not prevent optimizations on the expression generating `x` in any way: 
/// the expression might be removed entirely when the result is already known. It 
/// forces, however, the result of `x` to be stored in either memory or a register.
#[inline(always)]
fn black_box<T>(x: &T) {
    unsafe { 
        asm!("" // template
          : // output operands
          : "r"(x) // input operands
          // r: any general purpose register
          : "memory"    // clobbers all memory
          : "volatile"  // has side-effects
        );
   }
}

This is how you test Vec::push_back with test::black_box vs google::black_box:

pub fn test_black_box() {
    let mut v = Vec::with_capacity(1);
    v.push(42);
    test::black_box(v);
}

pub fn google_black_box() {
    let mut v = Vec::with_capacity(1);
    unsafe { black_box(&*v.as_ptr()) }; // Data to be clobbered
    v.push(42);
    clobber_memory(); // Force 42 to be written to memory.
}

If someone wants to play with these two and our current test::black_box, they can start from this rust.godbolt.org link and check the generated assembly. Looking at the assembly generated in our test::black_box case, it does not look like 42 is written to memory (but I am no expert; could somebody check?). The Google Benchmark versions look like they generate slightly fewer instructions (1-2 fewer). Their docs are here (in the preventing-optimization section).

The main differences between test::black_box and google's black_box are that google's:

  • is volatile (has side effects),
  • clobbers all memory,
  • is intended to disappear as a call (it is #[inline(always)]), and
  • does not move its argument into the function.

I think that clobber is a generally useful addition to mem.

In any case, we should define exactly what test::black_box does; depending on the outcome of that, we might need to add clobber as well, plus a "How do we teach this" section explaining how to use them together to write meaningful benchmarks.

@Manishearth (Member) commented Jan 19, 2018

@gnzlbg this sounds good. could you open a PR to my RFC that changes the guide-level section with more examples and adds impl details to the reference-level explanation? Let's see what others think of it.

Note that black_box currently returns the value passed to it as well, which may be useful to keep so that you can place an "optimization barrier" between two operations. For example, it lets you ensure that 400 is not constant-folded in pow(100, black_box(400)). This could also be achieved by giving it an &mut reference to the thing, but that leads to worse-looking code IMO.
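
(A small sketch of that pattern; pow here is just a stand-in for whatever function is being measured, and the usual #![feature(test)] boilerplate is elided.)

use test::Bencher;

fn pow(base: u64, exp: u64) -> u64 {
    base.wrapping_pow(exp as u32)
}

#[bench]
fn bench_pow(b: &mut Bencher) {
    // black_box returns its argument, so it can sit inline as a barrier:
    // the optimizer can no longer assume the exponent is the constant 400.
    b.iter(|| pow(100, test::black_box(400)));
}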

@Manishearth (Member) commented Jan 19, 2018

Opened a supplementary PR with @gnzlbg's suggestions at Manishearth#1

I'll merge it in if folks seem to agree on it.

@clarcharr (Contributor) commented on text/0000-benchmarking.md in 10dde7e Jan 19, 2018

With the new API you should remove the semicolon here.

@Manishearth (Member) replied Jan 19, 2018

fixed, thanks

@Manishearth Manishearth force-pushed the Manishearth:benchmarking branch from ea4543f to b37b88f Jan 19, 2018

@clarcharr (Contributor) commented Jan 24, 2018

Thinking again, perhaps iter and iter_n should be bench and bench_exact.

Additionally, I agree that setup/teardown should be excluded from the benchmark. Perhaps a trait could be used in place of a closure, like so:

trait BuildBench {
    type Bench: Bench;
    fn build_bench(&self) -> Self::Bench;
}
trait Bench {
    fn run(&mut self);
}

impl<F, T> BuildBench for F where F: Fn() -> T {
    type Bench = &Self;
    fn build_bench(&self) -> Self::Bench {
        self
    }
}
impl<F, T> Bench for &F where F: Fn() -> T {
    fn run(&mut self) {
        std::mem::black_box(self());
    }
}

This would allow using closures like the existing implementation does, while allowing parts to be excluded from the benchmark by calling build_bench and drop outside the benchmarked part (see the sketch below).
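
(A hypothetical driver loop, only to show where setup and teardown would fall relative to the timed region; Bench is the trait from the sketch above, and run_one is an illustrative name.)

use std::time::{Duration, Instant};

fn run_one<B: Bench>(mut bench: B) -> Duration {
    // `bench` was produced by build_bench() before this call, so setup
    // is not timed.
    let start = Instant::now();
    bench.run();                  // only this is timed
    let elapsed = start.elapsed();
    drop(bench);                  // teardown (Drop) is also not timed
    elapsed
}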

@hdevalence commented Jan 31, 2018

How does this relate to #1484 ? There's some discussion there about reasons to provide an optimization barrier that don't involve benchmarking.

@Manishearth referenced this pull request Feb 1, 2018: black_box improvements #1 (Open)

@nrc (Member) commented Feb 1, 2018

We discussed this at the dev-tools meeting today. The consensus was that we prefer #2318 as an alternative - to be clear, we would not stabilise bench in any form and would remove support from the compiler, instead providing a 'custom test framework' which does what bench does today. There are probably some improvements that could be made too (as discussed in this thread), but they should probably be iterated on rather than put through an RFC now. The only piece that may need some RFC work now-ish is black_box. My intuition is that we'll need to nail down what #2318 looks like and what the bench framework looks like before we can do that. But it might be worth opening an issue or a discuss thread now to collect any thoughts.

@rfcbot fcp close

@rfcbot commented Feb 1, 2018

Team member @nrc has proposed to close this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@gnzlbg (Contributor) commented Feb 2, 2018

@nrc

The only piece that may need some RFC work now-ish is black_box. My intuition is that we'll need to nail down what #2318 looks like and what the bench framework looks like before we can do that. But it might be worth opening an issue or a discuss thread now to collect any thoughts.

We already have an issue open for that here: #1484

I've started writing an RFC for it (based on the pull request to this one). I'll post it on internals "soon" and link it here and on that issue, so that those interested in evolving it can give feedback and we can iterate on it before submitting it.

@fitzgen (Member) commented Feb 2, 2018

@rfcbot reviewed

@japaric (Member) commented Feb 3, 2018

@rfcbot reviewed

@wycats (Contributor) commented Feb 5, 2018

@rfcbot reviewed

@michaelwoerister commented Feb 12, 2018

@rfcbot reviewed

@killercup (Member) commented Mar 5, 2018

@rfcbot reviewed

@Manishearth (Member) commented Mar 5, 2018

Closing, we're going pretty strongly for custom test frameworks.

@gnzlbg care to open that RFC for test::black_box?

@gnzlbg (Contributor) commented Mar 6, 2018

@Manishearth ok, the discussion in the tracking issue is frozen, so an RFC for those is the next step.

@frewsxcv (Member) commented Jun 11, 2018

for anyone who (like myself) discovered this pull request and wants a link to the black_box rfc: #2360
