Benchmark overhaul #41408
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
McPatate left a comment
btw, to disable the associated CI workflows while I rework them later:

    on:
      workflow_dispatch:

and you can delete the rest of the triggers.
A few naming comments overall, but minor stuff I believe, feel free to ignore, gj 👌🏻
* Big refactor, still classes to move around and script to re-complexify
* Move to streamer, isolate benches, propagate num tokens
* Some refacto
* Added compile mode to name
* Re-order
* Move to dt_tokens
* Better format
* Fix and disable use_cache by default
* Fixed compile and SDPA backend default
* Refactor results format
* Added default compile mode
* Always use cache
* Fixed cache and added flex
* Plan for missing modules
* Experiments: no cg and shuffle
* Disable compile for FA
* Remove wall time, add sweep mode, get git commit
* Review compliance, start
* Apply suggestions from code review

Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>

* Update benchmark_v2/framework/benchmark_runner.py

Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>

* Disable workflow
* Pretty print
* Added some pretty names to have pretty logs
* Review n2 compliance (end?)
* Style and end of PR

---------

Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
This PR overhauls the benchmarking suite that is included in transformers.
The benchmarking suite is now based around three main components:
- `BenchmarkingConfig` is a dataclass-like object which contains everything needed to reproduce a benchmark on the same machine: input length, generation length, whether to use `kernels` or `compile`, attention implementation, etc. (subject to name change)
- `BenchmarkRunner` is the class that runs the benchmarks defined by the configs, with a given number of measurement iterations, warmup iterations, and a model id. The runner takes care of setting up the runs in a way that ensures no run interacts with the downstream ones: the model is reloaded, the cache is emptied, and the GPU memory is flushed. It also saves the results, the config, and any additional metadata needed to reproduce the benchmark, like hardware information and package versions.

For now, the new benchmarking suite replaces the `benchmark_v2` part of `transformers`, but it could also overwrite the `benchmark` (v1) part. It would be good to make that decision in this PR, and to update the CI workflows that rely on the current `benchmark_v2` (putting the PR in draft mode until then).

An example of how to use the new benchmarking suite can be found in `run_benchmarks.py`. The format of the results file can (and may be bound to) change as we develop tools to analyze them.
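For illustration, here is a minimal sketch of what driving the suite could look like, based on the two components described above. The module paths, constructor arguments, and method names here are assumptions for the sake of the example, not the actual API of this PR; `run_benchmarks.py` is the real reference.

```python
# Hypothetical usage sketch -- module paths, argument names, and method
# signatures below are illustrative only; see run_benchmarks.py for the
# actual entry point introduced by this PR.
from framework.benchmark_config import BenchmarkingConfig  # name subject to change
from framework.benchmark_runner import BenchmarkRunner

# A config captures everything needed to reproduce a run on the same machine.
config = BenchmarkingConfig(
    input_length=128,            # prompt tokens
    generation_length=256,       # tokens to generate
    attn_implementation="sdpa",  # attention backend
    compile_mode=None,           # e.g. "default" to benchmark torch.compile
)

# The runner reloads the model, empties the cache, and flushes GPU memory
# between runs so benchmarks do not interact with downstream ones.
runner = BenchmarkRunner(
    model_id="meta-llama/Llama-3.1-8B",  # placeholder model id
    warmup_iterations=3,
    measurement_iterations=10,
)

# Results are saved alongside the config, hardware info, and package versions.
results = runner.run([config])
```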
If there is a metric you want to see measured in `transformers`, please leave a comment before this is merged 🙂