Ability to benchmark a top-level binary #2

Closed
epage opened this issue Jul 25, 2023 · 10 comments
Labels: enhancement (New feature or request)

Comments


epage commented Jul 25, 2023

Thanks for working to improve on iai!

I'm looking to do deterministic end-to-end benchmarking. It'd be great if I could have my target binary and flags passed directly to callgrind. First off, I'm unsure if callgrind is recursive for process spawns. Even if it is, I'd still have the overhead of the "bench" function.
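
For context, this is roughly the manual equivalent I have in mind (the binary name, flag, and input path below are just placeholders):

valgrind --tool=callgrind ./target/release/my-binary --some-flag input.txt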

Joining7943 (Collaborator) commented:

Thanks for your interest in iai-callgrind. I'm not sure if I understand correctly. Could you give me an example with a little bit more context, and how would you expect the command line of cargo bench to look? As far as I understand, you would like to run a binary of your project instead of calling a library function in a benchmarking function.

First off, I'm unsure if callgrind is recursive for process spawns. Even if it is, I'd still have the overhead of the "bench" function.

Afaik, callgrind does not support recursive process spawns. You would get the event counts for the process spawn itself but not the event counts for what is happening inside the subprocess.

There's currently only a hacky way to run a binary directly with iai-callgrind, getting the event counts but without the possibility of passing arguments to the binary. iai-callgrind uses iai-callgrind-runner under the hood to spawn the benchmark binary. It's possible to use the iai-callgrind-runner directly; for example

$ iai-callgrind-runner 0.4.0 src/main.rs playground --iai-bench=main target/release/playground

would run the binary target/release/playground, looking for the main function of the playground module in the src/main.rs file, and count all the events happening under it.

Is this basically what you're looking for? Just with the possibility to pass arguments to the binary?


epage commented Jul 26, 2023

I'm not sure if I understand correctly. Could you give me an example with a little bit more context, and how would you expect the command line of cargo bench to look? As far as I understand, you would like to run a binary of your project instead of calling a library function in a benchmarking function.

I would like an API within the iai-callgrind Rust bench API to set up a fixture, run a binary with arbitrary arguments, and then clean up the fixture.

Take my project typos. I want to do end-to-end benchmarking of the final binary. For the basic case, I would want to tell the benchmarking framework to benchmark [env!("CARGO_BIN_EXE_typos-cli"), "tests/fixture/corpus.txt"]. For a more advanced case, I would want to copy the file into a temp directory, run [env!("CARGO_BIN_EXE_typos-cli"), "tests/fixture/corpus.txt", "--write-changes"], and then have the temp directory cleaned up.
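
In plain Rust, the setup/teardown work for the advanced case would amount to something like this (just a sketch with made-up helper names, not a proposed API):

use std::fs;
use std::path::{Path, PathBuf};

// Copy the fixture into a scratch directory before the run...
fn setup_fixture() -> PathBuf {
    let dir = std::env::temp_dir().join("typos-bench");
    fs::create_dir_all(&dir).unwrap();
    fs::copy("tests/fixture/corpus.txt", dir.join("corpus.txt")).unwrap();
    dir
}

// ...and remove it again afterwards, since --write-changes modifies the file.
fn teardown_fixture(dir: &Path) {
    fs::remove_dir_all(dir).unwrap();
}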

Is this basically what you're looking for? Just with the possibility to pass arguments to the binary?

I would want

  • Runs via cargo bench
  • Set it up with a Rust API
  • Bench setup code
  • Run with arguments
  • Bench teardown code

Joining7943 added the enhancement label on Jul 27, 2023
Joining7943 (Collaborator) commented:

Thanks! Your description helped me a lot. I think all of your points should be possible and would be a great addition to the framework. However, this change would be fairly big and the final implementation may take some weeks.

Joining7943 (Collaborator) commented:

I'm pretty close to finishing the implementation, so if you'd like, now would be a great time to hear your feedback and opinion.

You can have a look at the implementation on the https://github.com/Joining7943/iai-callgrind/tree/2-ability-to-benchmark-a-top-level-binary branch. I've updated the README there and most of the new stuff is described in the https://github.com/Joining7943/iai-callgrind/tree/2-ability-to-benchmark-a-top-level-binary#binary-benchmarks section.

If you want to try it out with your crate, you can check out the 2-ability-to-benchmark-a-top-level-binary branch. You would need to build the iai-callgrind-runner and specify the path to it with the IAI_CALLGRIND_RUNNER environment variable when running your benchmark. This special treatment of the iai-callgrind-runner is only needed for development. The benchmarks in benchmark-tests/benches also show some examples of how benchmarking binaries could look. Note that you can run benchmarks with RUST_LOG=debug or RUST_LOG=trace if you want to have a look behind the curtains.
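
Concretely, a benchmark run against the development runner could look like this (the runner path is just a placeholder):

IAI_CALLGRIND_RUNNER=/path/to/iai-callgrind-runner RUST_LOG=debug cargo bench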


epage commented Jul 31, 2023

You would need to build the iai-callgrind-runner and specify the path to it with the IAI_CALLGRIND_RUNNER environment variable when running your benchmark. This special treatment of the iai-callgrind-runner is only needed for development.

By "only needed for development", are you saying that this is a short term hack while this is still being worked on? If not, this would be a no-go to require people install an extra binary.

Haven't given it a try yet, but some quick thoughts:

  • I'm surprised everything is going through main!. I would assume a more criterion-like API.
    • This is especially problematic if I want to have different benchmark cases within the same binary
  • I feel like the sandboxing should be left for me to decide.
    • I don't see how the security requirement is any different than any of my other tests
    • Even if the results become more deterministic, it imposes API constraints that make this more difficult to work with
  • I'm curious why #[inline(never)] is needed for setup and teardown. I thought that was for callgrind to filter the results, but setup and teardown shouldn't be running in a process under callgrind; my binary should.


epage commented Jul 31, 2023

Gave it a quick go. You can track my side at crate-ci/typos#783

The first problem I ran into is that my cmd must be a string literal, which doesn't allow me to use env!("CARGO_BIN_EXE_<name>") to look up the binary's path.

Joining7943 (Collaborator) commented:

By "only needed for development", are you saying that this is a short term hack while this is still being worked on? If not, this would be a no-go to require people install an extra binary.

iai-callgrind-runner is needed to run the benchmarks, and there's no way around it. Usually, installing the runner is as easy as running cargo install. After that, you can mostly forget about the iai-callgrind-runner. It's like any other tool you would install, such as cargo-fuzz, with the difference that you can still use cargo bench to run iai-callgrind benchmarks. I don't think this is a no-go. The IAI_CALLGRIND_RUNNER environment variable is a way to specify the binary of the development branch; it makes developing iai-callgrind easier, or in your case makes trying out a development version of iai-callgrind possible. You can install the binary, for example, with

cd typos
cargo install --git 'https://github.com/Joining7943/iai-callgrind.git' --branch '2-ability-to-benchmark-a-top-level-binary' --root /tmp iai-callgrind-runner
IAI_CALLGRIND_RUNNER=/tmp/bin/iai-callgrind-runner cargo bench --package typos-cli --bench cli

to install the binary into /tmp/bin and then run the benchmark cli of the package typos-cli. The runner is necessary in order to separate the code which is needed to run the benchmarks from the benchmarked code, and therefore from the benchmark binary. If all code ended up in the same binary (the benchmark binary), the runner code (and every change to it) would have an effect on the benchmarked code.

However, there's room for improvement. I thought about installing the runner into a directory (maybe $HOME/.cache/iai-callgrind-runner or somewhere in the target directory) in case it cannot be found in the PATH. Managing different runner versions that way shouldn't be a big problem either.

I'm surprised everything is going through main!. I would assume a more criterion-like API.
This is especially problematic if I want to have different benchmark cases within the same binary

In general, criterion benchmarks and walltime benchmarks are very different from iai-callgrind and from how valgrind's callgrind works, which necessarily leads to a different API. In particular, if you expect deterministic behavior, the main function's code within the benchmark binary has to be reduced to a minimum, or else every change to the benchmark's setup code has an effect on the benchmarked code.

I feel like the sandboxing should be left for me to decide.
I don't see how the security requirement is any different than any of my other tests
Even if the results become more deterministic, it imposes API constraints that make this more difficult to work with

I like having the possibility of sandboxing because it ensures better and more deterministic results. However, I can add a switch like sandboxing = false so you have the possibility to choose. I also thought about adding a follow_symlinks = bool option to the fixtures argument. This would allow for symlinks in a fixtures directory; instead of copying the symlink, the file it points to would be copied into the sandbox.

Assuming a fixtures directory in crates/typos-cli/benches/fixtures/ with a symlink to the file you'd like to have available in the sandboxed fixtures directory (like ../../typos-dict/assets/words.csv), your main! would then look like this (the setup wouldn't be required):

iai_callgrind::main!(
    fixtures = "benches/fixtures", follow_symlinks = true;
    run = cmd = "typos", args = ["fixtures/words.csv"]
);

Note that your benchmark as it currently is would work in sandbox mode because the include_str! macro is expanded at compile time.

The first problem I ran into is that my cmd must be a string literal, which doesn't allow me to use env!("CARGO_BIN_EXE_<name>") to look up the binary's path.

You can simply specify the name of the binary, in your case typos, which is the real name of the binary as far as I've seen. Binary discovery works as follows: if the path is not absolute, env!(CARGO_BIN_EXE_name) is tried first. If that doesn't exist, the binary is searched for in the PATH. This behavior is described in the README, and I made it the default because most users would want to benchmark a binary of their own crate, so spelling out env!(CARGO_BIN_EXE_name) is somewhat redundant. However, I think it's no problem to change cmd to take an expression instead of a literal, so your usage would work too.
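
Schematically, the lookup order described above looks roughly like this (a simplified illustration, not the actual runner code; in reality the CARGO_BIN_EXE_name value comes from cargo at build time of the benchmark):

use std::path::PathBuf;

// Illustration only: an absolute path wins, then the path cargo provides for
// the crate's own binaries, then whatever can be found in the PATH.
fn discover(cmd: &str) -> Option<PathBuf> {
    let given = PathBuf::from(cmd);
    if given.is_absolute() {
        return Some(given);
    }
    if let Ok(path) = std::env::var(format!("CARGO_BIN_EXE_{cmd}")) {
        return Some(PathBuf::from(path));
    }
    std::env::var_os("PATH").and_then(|paths| {
        std::env::split_paths(&paths)
            .map(|dir| dir.join(cmd))
            .find(|candidate| candidate.is_file())
    })
}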

I'm curious why #[inline(never)] is needed for setup and teardown. I thought that was for callgrind to filter the results, but setup and teardown shouldn't be running in a process under callgrind; my binary should.

The before, after, setup and teardown functions are run under callgrind the moment they are benchmarked. The inline(never) ensures that callgrind can see these functions and toggle the event counting on when entering the function and off when leaving it. It also ensures that these functions aren't optimized away or merged with other code, or at least that the compiler optimizations are reduced to the minimum possible.
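
In other words, the benchmark file ends up with functions along these lines (illustrative only):

// Kept as a real, non-inlined symbol so callgrind can toggle event
// collection on entering the function and off on leaving it.
#[inline(never)]
fn setup() {
    // prepare fixtures here
}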

I noticed that typos fails when run with words.csv, which makes the benchmark fail too. I really think this should stay the default behavior. However, I'll add an option so that you can set the expected exit code to something other than 0.

Joining7943 (Collaborator) commented:

Hi @epage. I integrated some changes. Here's what I added since the last time I wrote:

  • The top-level argument sandbox, so you can switch off the sandboxing behavior
  • The top-level argument fixtures now takes an argument follow_symlinks
  • An option exit_with(ExitWith) for the opts argument (Options) to specify exit codes other than 0
  • cmd now takes an expr instead of a literal, so you can choose to specify either simply typos or env!(CARGO_BIN_EXE_typos); see the sketch right after this list
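
For instance, mirroring the examples further below but with the env! form of cmd (a sketch, not necessarily the exact final syntax):

iai_callgrind::main!(
    fixtures = "benches/fixtures", follow_symlinks = true;
    run = cmd = env!("CARGO_BIN_EXE_typos"),
          opts = Options::default().exit_with(ExitWith::Code(2)),
          args = ["fixtures/words.csv"]
);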

With all that in place you should now be able to run benchmarks for the typos binary. The binary benchmarks feature feels pretty much complete now. However, if you have any concerns, ideas etc. I would love to hear them. I recommend reading the section about binary benchmarks in the README of the 2-ability-to-benchmark-a-top-level-binary branch.

Your cli benchmark could be written in many ways. For example

pub static CORPUS: &str = include_str!("../../typos-dict/assets/words.csv");

fn setup() {
    std::fs::write("words.csv", CORPUS).unwrap();
}

iai_callgrind::main!(
    setup = setup;
    sandbox = false;
    run = cmd = "typos",
          opts = Options::default().exit_with(ExitWith::Code(2)),
          args = ["words.csv"]
);

Note that the above example would also work with sandbox = true, with the benefit that the words.csv file would be cleaned up automatically. Below is an example assuming a fixtures directory in crates/typos-cli/benches/fixtures with a symlink to ../../../typos-dict/assets/words.csv in it:

iai_callgrind::main!(
    fixtures = "benches/fixtures", follow_symlinks = true;
    run = cmd = "typos",
          opts = Options::default().exit_with(ExitWith::Code(2)),
          args = ["fixtures/words.csv"]
);


epage commented Aug 3, 2023

For me, the biggest issue is that I'm a bit averse to doing everything within a macro, as that is more obtuse to work with (everything depends on how well the documentation is written versus what rustdoc can extract automatically, and when that fails I have to decide how much I'm willing to dig into the macros). I also feel this design would encourage bad practices by using a more JUnit-like style of global and local setup and teardown, making it likely that people will put too much in the global one.

Joining7943 (Collaborator) commented:

Macros might be a little bit obtuse to work with, but a builder-like API, on the other hand, feels excessive. I really do understand your concerns, and autocompletion, formatting and rustdocs don't work that well within macros. However, I took care that the compiler error messages are as helpful as they can be and guide you a little bit through the macro.

everything depends on how well the documentation is written

That's what I've done in the README: the macro is fully documented there. I'm going to update the outdated library documentation and add missing docs for some library functions, so from a documentation point of view there should be nothing missing.

I think the current implementation delivers all the functionality needed to comfortably benchmark binaries with reliable results. I'm going to have a look into a builder-like API as an alternative to the macro API, but I'll postpone that until after merging this branch.
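
Just to give a rough idea of the direction, a builder-like alternative might look something like this (entirely hypothetical; none of these types or methods exist yet):

// Hypothetical sketch only, not an existing iai-callgrind API.
fn main() {
    BinaryBenchmark::new(env!("CARGO_BIN_EXE_typos"))
        .setup(setup)
        .args(["words.csv"])
        .exit_with(ExitWith::Code(2))
        .run();
}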

I also feel this design would encourage bad practices by using a more JUnit-like style of global and local setup and teardown, making it likely that people will put too much in the global one.

All I can do is provide the tools to create different kinds of setups for different kinds of binaries.
