Add a benchmark suite based on criterion #26
Conversation
Hey, thanks for this PR and for putting in the work. Unfortunately I'm going to have to reject it, at least for inclusion in this repository. The main reason is that it has no context inside of this repository, and can very easily lead to the root of all evil, "premature optimization". I'd highly recommend open sourcing this work in a repository that can compare all the different implementations (or at least the ones you care about), something like BenchmarkGames does: a series of examples that can be run and plotted for all the particular implementations you care about comparing.

I say this because, even as I reject this, I can't lie and say I'm not interested to know the performance difference (and I would happily fix bugs that turn out to be performance hindrances). As a maintainer, though, these benchmarks by themselves don't really mean much, because I'm not comparing them against anything. It's just some numbers on a screen with no context. If I make a function 50 nanoseconds slower in exchange for easier maintenance, what does that mean? Will it actually impact me? Will it impact people using my library? I'd argue in 99% of cases probably not. Even at the scale we were using this at (hundreds of thousands of message validations a day), no one has ever blinked an eye.

Now obviously some users may have incredibly strict timing requirements, but even for them these benchmarks might be useless. As written, the benchmarks don't show me different sizes of data (what if I'm using large payloads? what if I'm using smaller payloads?), which may dramatically change the results. What if I'm using the more popular raw interfaces, which many more people have reported using? What if I'm not using specific validations, etc.? Context matters a whole bunch for micro-benchmarking, and it is left out of these benchmarks. Not to mention I'm also missing the context of what I'm comparing against (the Go benchmarks in this case, as well as the other implementations).

However, even if you modified this PR to include that context, it wouldn't be generically applicable. Say you parameterized the benchmarks to run at a series of different sizes. Great; which sizes do I run on CI consistently that cover a decent majority? How do I know when a specific number of added nanoseconds/milliseconds matters? How do I look at a PR and say "this change is not acceptable due to perf reasons", or "this change does worsen perf, but not enough to impact the general majority"? For example, the PAE benchmark: no one should be using PAE directly, ever (in fact, at the next breaking change I intend to un…).

In a separate repo you can document all of these tradeoffs that you care about, along with the context. It's easy to say things like: …
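To make the data-size point concrete, a size-parameterized criterion group could look roughly like the sketch below. This is illustrative only: `build_token` is a placeholder for whatever library call would actually be measured (it is not this crate's API), and the sizes are arbitrary examples.

```rust
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};

// Placeholder for the call under test (e.g. building or validating a token).
// A real benchmark would call into the library here instead.
fn build_token(payload: &str) -> usize {
    payload.len()
}

fn bench_payload_sizes(c: &mut Criterion) {
    let mut group = c.benchmark_group("token_by_payload_size");
    // Arbitrary example sizes; choosing representative ones is exactly the hard part.
    for size in [64usize, 1_024, 65_536] {
        let payload = "x".repeat(size);
        group.throughput(Throughput::Bytes(size as u64));
        group.bench_with_input(BenchmarkId::from_parameter(size), &payload, |b, p| {
            b.iter(|| build_token(p))
        });
    }
    group.finish();
}

criterion_group!(size_benches, bench_payload_sizes);
criterion_main!(size_benches);
```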
Again, I would like to thank you for this PR, and I'd be happy to assist in any way I can in your benchmarking endeavours. (Some ideas I've laid out in the body of this text: comparing against exactly the same data as the golang interface, using the raw interfaces as well as the token builder, making the data size configurable, and dropping the PAE benchmarks.) And of course, if you find any performance problems I'd be more than happy to look into them and file fixes, or to review/merge any PRs that make a performance difference. I just really don't want to fall into the trap of prematurely optimizing without context.

Thanks,
👍 No problem with that in general. A few points:
Motivation
I wanted to know the performance of each primitive, especially the difference between local and public tokens, but also between the Go and the Rust versions.
Test Plan
cargo bench
This gives the results of the new benches; it was checked to run under both Windows and Linux (via WSL).
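For reference, a minimal, self-contained sketch of a criterion harness of this kind is shown below. The measured closures here are placeholders, not the library's actual local/public token calls; `cargo bench` picks such a file up as long as the bench target sets `harness = false` in Cargo.toml.

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn bench_local(c: &mut Criterion) {
    c.bench_function("local_token", |b| {
        // Placeholder work; a real bench would build/validate a local token here.
        b.iter(|| black_box("payload").len())
    });
}

fn bench_public(c: &mut Criterion) {
    c.bench_function("public_token", |b| {
        // Placeholder work; a real bench would sign/verify a public token here.
        b.iter(|| black_box("payload").len())
    });
}

criterion_group!(benches, bench_local, bench_public);
criterion_main!(benches);
```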