Improve runtime benchmark CLI output #1463

Merged: 7 commits merged into master on Oct 17, 2022

Conversation

Kobzol (Contributor, PR author) commented Oct 12, 2022:

Benchmark results are now printed interactively, and the output includes basic statistics of the results:

[screenshot of the new CLI output]
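
To make "basic statistics" concrete, here is a minimal, self-contained sketch of the kind of summary a CLI could print for repeated measurements of one benchmark. The function name, sample data, and output format are illustrative only, not benchlib's actual API:

```rust
// Illustrative only: compute and print a mean ± standard deviation summary
// for repeated timings of a single benchmark, similar in spirit to the
// statistics shown in the new CLI output.
fn print_summary(name: &str, samples_ns: &[f64]) {
    let n = samples_ns.len() as f64;
    let mean = samples_ns.iter().sum::<f64>() / n;
    let variance = samples_ns.iter().map(|s| (s - mean).powi(2)).sum::<f64>() / n;
    let stddev = variance.sqrt();
    println!(
        "{name}: {:.3} ms ± {:.3} ms ({} iterations)",
        mean / 1e6,
        stddev / 1e6,
        samples_ns.len()
    );
}

fn main() {
    // Hypothetical sample data, purely to demonstrate the output format.
    print_summary("hashmap-insert", &[1.20e6, 1.31e6, 1.25e6, 1.28e6]);
}
```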

nnethercote (Contributor) commented:

I'm concerned about our terminology here, i.e. the meaning of words like "result", "measurement", and "metric". It looks like this patch changes some of these in the code. We have a glossary; I'll need to look over it carefully to see if the code matches. (And if some of the definitions in the glossary are bad, we can consider changing them.)

Kobzol (Contributor, author) commented Oct 13, 2022:

I agree that the terminology of the runtime benchmarks should be improved. The annoying thing is that the runtime benchmarks will have a slightly different structure than the compile-time benchmarks. For example, there are currently individual benchmarks, but they are also grouped into benchmark suites (binaries), which has no equivalent on the compile-time side. Another difference is that we will most likely measure each runtime benchmark repeatedly (the code already does this), which again isn't done for compile-time benchmarks. So we may need to update some of the terminology.

I'll start with aligning this PR with the existing terms.
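
For readers following along, a rough sketch of the structure described above, using hypothetical type names (the actual collector types are not shown in this thread): a suite corresponds to one compiled binary exposing several benchmarks, and each benchmark is measured repeatedly.

```rust
// Hypothetical data model, for illustration only; not the real collector code.
#![allow(dead_code)]
use std::path::PathBuf;

/// One benchmark suite = one compiled binary exposing several benchmarks
/// (the discussion below later renames this concept to a benchmark "group").
struct BenchmarkSuite {
    binary: PathBuf,
    benchmarks: Vec<String>,
}

/// Unlike a compile-time benchmark, a runtime benchmark is executed several
/// times, so its result carries one measurement per iteration.
struct BenchmarkResult {
    benchmark: String,
    measurements: Vec<Measurement>,
}

struct Measurement {
    wall_time_ns: u64,
    instructions: Option<u64>,
}
```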

Kobzol (Contributor, author) commented Oct 13, 2022:

I updated the code and documentation to hopefully better align with the previous meanings of the terms.

Kobzol force-pushed the runtime-cli-progress branch 2 times, most recently from f89e283 to b3e79ca, on October 13, 2022 21:32.

nnethercote (Contributor) left a review:

I haven't mentioned every place you'll need to change, e.g. there are mentions of "suite" in various comments that should be changed to "group" or "binary". And the benchmark_suite macro added in #1464 should be changed to benchmark_group.

(Review comments were left on collector/src/runtime/benchmark.rs, collector/src/runtime/mod.rs, and docs/glossary.md; since resolved.)

nnethercote (Contributor) commented:

I think what I've suggested above makes sense. However, I am now wondering if we really need the "benchmark" vs "benchmark group" distinction.

I can see why some people might want, for example, to have multiple hashmap benchmarks: insertions, deletions, mixed, etc. But I also think that pushes us strongly in the direction of microbenchmarks, which I want to avoid. I'd prefer to have, for example, a single hashmap benchmark that replays a variety of insertions/deletions/operations that come from a real program. (Relatedly, for the compile-time benchmarks, at one point we had multiple ctfe benchmarks measuring different aspects, but I ended up merging them into a single benchmark.)

What do you think?

Kobzol (Contributor, author) commented Oct 14, 2022:

I like the new terminology; suite -> group -> benchmark makes sense.

I'm opposed to removing the groups, though.

If we don't have groups, we'd need either a binary per benchmark (lots of unnecessary code in the repo, plus additional overhead for building the benchmarks and for running the benchmark suite) or all benchmarks in a single binary (no way to choose different dependency versions or compilation flags for different benchmarks, and higher compilation latency when experimenting with the benchmarks).

It's true that a binary per benchmark would probably discourage micro-benchmarks somewhat, but that doesn't seem like a good way to enforce it; it seems to me that it would actually discourage adding new benchmarks altogether if you always needed to create a new crate rather than just add a macro invocation and a closure to an existing benchmark group.

I also don't think it's that useful to discourage micro-benchmarks in general. It's definitely great if we have snippets from real-world programs, but I also think that micro-benchmarking things that are crucial and ubiquitous (like sorting or hashmap operations) is worth it.

With the runtime benchmarks, we have a big advantage over compile-time benchmarks: we can benchmark exactly the thing that interests us. With a compile-time benchmark, even if you compile hello world, you always execute an enormous amount of code that isn't really specific to hello world. It therefore might not make sense to have 5 CTFE benchmarks, since it would be hard to distinguish them anyway when, say, only 10% of the execution is benchmark-specific. But with runtime benchmarks we can actually distinguish 5 different micro-benchmarks, and if only one of them regresses, it may be easier to investigate what happened. With a hashmap benchmark that does 5 things, it will take more time to investigate what regressed than with 5 micro-benchmarks that each do only a single thing. Of course, there are also disadvantages to micro-benchmarks (more noise, less applicable to real-world usage), so we should still reject things like define_benchmark! { || a + b } in review.
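
As an illustration of the trade-off described above, here is a self-contained sketch of the group model: one binary registers several benchmarks as closures and runs each one repeatedly. The BenchmarkGroup and register_benchmark names are hypothetical and heavily simplified; they are not benchlib's actual API.

```rust
// Hypothetical, simplified sketch of a benchmark group; not the real benchlib API.
use std::collections::HashMap;
use std::time::Instant;

struct BenchmarkGroup {
    benchmarks: Vec<(&'static str, Box<dyn Fn()>)>,
}

impl BenchmarkGroup {
    fn new() -> Self {
        Self { benchmarks: Vec::new() }
    }

    // Adding a benchmark is just another closure; no new crate is needed.
    fn register_benchmark(&mut self, name: &'static str, f: impl Fn() + 'static) {
        self.benchmarks.push((name, Box::new(f)));
    }

    // Run every benchmark `iterations` times and print the average wall time,
    // so a regression can be pinned to an individual benchmark.
    fn run(&self, iterations: u32) {
        for (name, f) in &self.benchmarks {
            let mut total_ns: u128 = 0;
            for _ in 0..iterations {
                let start = Instant::now();
                f();
                total_ns += start.elapsed().as_nanos();
            }
            let avg_ms = total_ns as f64 / iterations as f64 / 1e6;
            println!("{name}: {avg_ms:.3} ms avg over {iterations} iterations");
        }
    }
}

fn main() {
    let mut group = BenchmarkGroup::new();
    group.register_benchmark("hashmap-insert", || {
        let mut map = HashMap::new();
        for i in 0u64..100_000 {
            map.insert(i, i);
        }
        assert_eq!(map.len(), 100_000);
    });
    group.register_benchmark("hashmap-remove", || {
        let mut map: HashMap<u64, u64> = (0u64..100_000).map(|i| (i, i)).collect();
        for i in 0u64..100_000 {
            map.remove(&i);
        }
        assert!(map.is_empty());
    });
    group.run(10);
}
```

With this shape, adding a benchmark is one more closure inside an existing binary, which is the low-friction path argued for above, while trivial micro-benchmarks remain easy to reject in review.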

nnethercote (Contributor) commented:

I'm happy to continue with groups for now, even though all the current groups only have one benchmark in them.

nnethercote (Contributor) left a review:

Quite a few suggestions for small changes, but I think we're very close.

(Review comments were left on collector/benchlib/src/benchmark.rs and collector/src/runtime/benchmark.rs; since resolved.)

Kobzol (Contributor, author) commented Oct 17, 2022:

I addressed the comments. I think that the only thing left is to decide on the name for run_benchmark_group.

nnethercote (Contributor) commented:

Good enough, IMO. Ship it!

Kobzol enabled auto-merge on October 17, 2022 11:00.
Kobzol merged commit d386f1e into master on Oct 17, 2022.
Kobzol deleted the runtime-cli-progress branch on October 17, 2022 11:00.