Add a benchmark suite based on criterion #26
Conversation
Hey, thanks for this PR and for putting in the work. Unfortunately I'm going to have to reject it, at least for inclusion in this repository. The main reason is that it has no context inside of this repository, and can very easily lead to the root of all evil, "premature optimization". I'd highly recommend open sourcing this work in a repository that can compare all the different implementations (or at least the ones you care about), something like BenchmarkGames does: a series of examples that can be run and plotted for all the particular implementations you care about comparing.

I say this because, even as I reject this, I can't lie and say I'm not interested to know the performance difference (and I would happily fix bugs that turn out to be performance hindrances). As a maintainer, though, these benchmarks by themselves don't really mean much, because I'm not comparing them against anything. It's just some numbers on a screen with no context. If I make a function 50 nanoseconds slower in exchange for easier maintenance, what does that mean? Will it actually impact me? Will it impact people using my library? I'd argue in 99% of cases probably not. Even at the scale we were using this at (hundreds of thousands of message validations a day), no one has ever blinked an eye.

Now obviously some users may have incredibly strict timing requirements, but even for them these benchmarks might be useless. As written, the benchmarks don't show me different sizes of data (what if I'm using large payloads? what if I'm using smaller payloads?), which may dramatically change the results. What if I'm using the more popular raw interfaces, which many more people have reported using? What if I'm not using specific validations, etc.? Context matters a whole bunch for micro-benchmarking, and it is left out of these benchmarks. Not to mention I'm also missing the context of what I'm comparing against (the Go benchmarks in this case, as well as the other implementations).

However, even if you modified this PR to include that context, it wouldn't be generically applicable. Say you parameterized the benchmarks to run at a series of different sizes. Great; which sizes do I run on CI consistently that cover a decent majority? How do I know when a specific number of added nanoseconds/milliseconds matters? How do I look at a PR and say "this change is not acceptable due to perf reasons", or "this change does worsen perf, but not enough to impact the general majority"? For example, the PAE benchmark: no one should be using PAE directly, ever (in fact, at the next breaking change I intend to un…).

In a separate repo you can document all of these tradeoffs that you care about, along with the context. It's easy to say things like: …
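To make the data-size point concrete, a size-parameterized criterion group could look roughly like the sketch below. This is illustrative only: `build_token` is a placeholder for whatever library call would actually be measured (it is not this crate's API), and the sizes are arbitrary examples.

```rust
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};

// Placeholder for the call under test (e.g. building or validating a token).
// A real benchmark would call into the library here instead.
fn build_token(payload: &str) -> usize {
    payload.len()
}

fn bench_payload_sizes(c: &mut Criterion) {
    let mut group = c.benchmark_group("token_by_payload_size");
    // Arbitrary example sizes; choosing representative ones is exactly the hard part.
    for size in [64usize, 1_024, 65_536] {
        let payload = "x".repeat(size);
        group.throughput(Throughput::Bytes(size as u64));
        group.bench_with_input(BenchmarkId::from_parameter(size), &payload, |b, p| {
            b.iter(|| build_token(p))
        });
    }
    group.finish();
}

criterion_group!(size_benches, bench_payload_sizes);
criterion_main!(size_benches);
```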
Again, I would like to thank you for this PR, and I'd be happy to assist in any way I can in your benchmarking endeavours. (Some ideas I've laid out in the body of this text: comparing against exactly the same data as the golang interface, using the raw interfaces as well as the token builder, making the data size configurable, and dropping the PAE benchmarks.) And of course, if you find any performance problems I'd be more than happy to look into them and file fixes, or to review/merge any PRs that make a performance difference. I just really don't want to fall into the trap of prematurely optimizing without context.

Thanks,
👍 No problem with that in general. A few points:
Motivation
I wanted to know the performance of each primitive, especially the difference between local and public tokens, but also between the Go and the Rust versions.
Test Plan
cargo bench
This gives the results of the new benches; it was checked to run under both Windows and Linux (via WSL).
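For reference, a minimal, self-contained sketch of a criterion harness of this kind is shown below. The measured closures here are placeholders, not the library's actual local/public token calls; `cargo bench` picks such a file up as long as the bench target sets `harness = false` in Cargo.toml.

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn bench_local(c: &mut Criterion) {
    c.bench_function("local_token", |b| {
        // Placeholder work; a real bench would build/validate a local token here.
        b.iter(|| black_box("payload").len())
    });
}

fn bench_public(c: &mut Criterion) {
    c.bench_function("public_token", |b| {
        // Placeholder work; a real bench would sign/verify a public token here.
        b.iter(|| black_box("payload").len())
    });
}

criterion_group!(benches, bench_local, bench_public);
criterion_main!(benches);
```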