Improving and extending benchmarks #103

Closed · bytesnake opened this issue Mar 19, 2021 · 1 comment

Labels: infrastructure (General tasks affecting all implementations)

bytesnake (Member) commented Mar 19, 2021

One area where we are lacking right now is benchmark coverage. I would like to improve that in the coming weeks.

Infrastructure for benchmarking

Benchmarks are an essential part of linfa. They should give contributors feedback on their implementations and give users confidence that we're doing good work. In order to automate the process we have to employ a CI system which creates a benchmark report on (a) PRs and (b) commits to the master branch. This is difficult with wall-clock benchmarks (i.e. criterion.rs) but possible with valgrind.

  • use iai for benchmarking (a minimal sketch follows this list)
  • add a workflow executing the benchmarks on PRs/commits to master and creating reports in JSON format
  • build a script parsing the reports and posting them as comments on the PR (see here)
  • add a page to the website which displays the reports in a human-readable way
  • (pro) use polynomial regression to find the influence of predictors (e.g. #weights, #features, #samples, etc.) on targets (e.g. L1 cache misses, cycles, etc.) and post the algorithmic complexity as well
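A minimal sketch of what an iai benchmark could look like. The workload function and the sizes here are placeholders (not taken from linfa); a real benchmark would construct a Dataset and call an estimator's `fit`. iai runs each function under Valgrind/Cachegrind and reports instruction and cache-access counts, which is what makes the numbers stable enough for CI.

```rust
// Cargo.toml (dev-dependency, assumed version): iai = "0.1"
//
// [[bench]]
// name = "fit"
// harness = false

use iai::black_box;

// Hypothetical workload standing in for an estimator's `fit`;
// a real benchmark would build a linfa Dataset and fit a model on it.
fn fit_workload(n_samples: usize, n_features: usize) -> f64 {
    let mut acc = 0.0;
    for i in 0..n_samples {
        for j in 0..n_features {
            acc += ((i * n_features + j) as f64).sqrt();
        }
    }
    acc
}

fn bench_fit_small() -> f64 {
    // black_box keeps the compiler from optimising the call away
    fit_workload(black_box(1_000), black_box(10))
}

fn bench_fit_large() -> f64 {
    fit_workload(black_box(100_000), black_box(10))
}

// iai runs each function under Cachegrind and reports instruction
// counts, cache accesses and an estimated cycle count.
iai::main!(bench_fit_small, bench_fit_large);
```

As far as I know, iai 0.1 only prints its counters as plain text to stdout, so the JSON reports from the second task would likely come from the parsing script rather than from iai itself.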
relf mentioned this issue Mar 19, 2021
bytesnake added the infrastructure (General tasks affecting all implementations) label Jul 22, 2021
YuhanLiin mentioned this issue Jun 15, 2022
YuhanLiin (Collaborator) commented

While Iai is more consistent, it hasn't been updated in almost two years and it can't exclude setup code from the benchmarks. I'm also not sure how valid instruction-count benchmarks would be for multithreaded code. I believe we should stick with Criterion for future benchmarks. For most changes we can just do benchmark comparisons manually. For CI usage I'd rather set up a dedicated benchmarking machine, like rustc-perf.
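For context, a sketch of how Criterion keeps setup out of the measured routine, which is the capability the comment contrasts with iai. The data construction and `fit` stand-in below are illustrative, not taken from linfa's benchmarks.

```rust
// Cargo.toml (dev-dependency, assumed version): criterion = "0.5"
//
// [[bench]]
// name = "fit"
// harness = false

use criterion::{criterion_group, criterion_main, BatchSize, Criterion};

// Illustrative stand-in for building a dataset (setup we do not want to time).
fn make_data(n: usize) -> Vec<f64> {
    (0..n).map(|i| i as f64).collect()
}

// Illustrative stand-in for the routine under test (e.g. an estimator's `fit`).
fn fit(data: &[f64]) -> f64 {
    data.iter().map(|x| x.sqrt()).sum()
}

fn bench_fit(c: &mut Criterion) {
    c.bench_function("fit 10k samples", |b| {
        // iter_batched re-runs the setup closure for every batch and only
        // times the second closure, so dataset construction is excluded
        // from the measurement.
        b.iter_batched(
            || make_data(10_000),
            |data| fit(&data),
            BatchSize::SmallInput,
        )
    });
}

criterion_group!(benches, bench_fit);
criterion_main!(benches);
```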
