Conversation

@remi-or (Collaborator) commented Oct 7, 2025

This PR overhauls the benchmarking suite that is included in transformers.
The benchmarking suite is now based around three main components:

  • BenchmarkingConfig is a dataclass-like object that contains everything needed to reproduce a benchmark on the same machine: input length, generation length, whether to use kernels or compile, attention implementation, etc. (subject to name change)
  • BenchmarkRunner is the class that runs the benchmarks defined by the configs, with a given number of measurement iterations, warmup iterations, and a model id. The runner sets up the runs so that no run interferes with the downstream ones: the model is reloaded, the cache is emptied, and the GPU memory is flushed. It also saves the results, the config, and any additional metadata needed to reproduce the benchmark, such as hardware information and package versions.
  • The results files contain enough information to derive (to my knowledge) most of the metrics used to evaluate a model: e2e_latency, tpot, ttft, even inter-token latency. Results also include a sample of what was generated, which is useful for checking whether it was gibberish. The results files are in JSON format and are made to be easily created from the dataclass-like objects and vice versa.
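The latency metrics listed above can all be derived from per-token timestamps. A minimal sketch of those derivations (not the PR's actual code; the function and field names here are illustrative):

```python
# Derive common generation-latency metrics from per-token timestamps.
# request_start: time the request was issued; token_times: wall-clock
# time at which each generated token became available.

def latency_metrics(request_start: float, token_times: list[float]) -> dict:
    ttft = token_times[0] - request_start            # time to first token
    e2e_latency = token_times[-1] - request_start    # end-to-end latency
    # inter-token latencies: gaps between consecutive tokens
    itl = [b - a for a, b in zip(token_times, token_times[1:])]
    # time per output token, averaged over the decode phase
    tpot = (token_times[-1] - token_times[0]) / max(len(token_times) - 1, 1)
    return {"ttft": ttft, "e2e_latency": e2e_latency, "tpot": tpot, "itl": itl}
```

Storing raw timestamps in the results file, rather than pre-computed aggregates, is what lets all of these be recomputed after the fact.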

For now, the new benchmarking suite replaces the benchmark_v2 part of transformers, but it could also replace the benchmark (v1) part. It would be good to make that decision in this PR and to update the CI workflows that rely on the current benchmark_v2 (the PR stays in draft mode until then).
An example of how to use the new benchmarking suite can be found in run_benchmarks.py.

The format of the results files can (and likely will) change as we develop tools to analyze them.
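To illustrate the "easily created from the dataclass-like objects and vice versa" point: a plain dataclass round-trips to and from JSON with only the standard library. This is a sketch, assuming a simplified config; the real BenchmarkingConfig has different fields.

```python
import json
from dataclasses import dataclass, asdict

# Illustrative config; field names here are hypothetical.
@dataclass
class BenchmarkingConfig:
    input_length: int
    generation_length: int
    attn_implementation: str = "sdpa"
    compile: bool = False

def to_json(cfg: BenchmarkingConfig) -> str:
    # asdict() recursively converts the dataclass to plain dicts/lists
    return json.dumps(asdict(cfg))

def from_json(payload: str) -> BenchmarkingConfig:
    # the JSON keys match the dataclass fields, so ** unpacking suffices
    return BenchmarkingConfig(**json.loads(payload))
```

Keeping the on-disk schema a direct mirror of the dataclass fields is what makes the "vice versa" direction a one-liner.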
If there is a metric you want to see measured in transformers, please leave a comment before this is merged 🙂

@remi-or remi-or self-assigned this Oct 7, 2025
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@McPatate (Member) left a comment


btw to disable the associated CI workflows while I rework them later:

on:
  workflow_dispatch:

and you can delete the rest of the triggers
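Put together, a workflow file disabled this way would look roughly like the following (a sketch; the workflow name, job, and runner are placeholders, not the repo's actual CI config):

```yaml
# Only the manual trigger remains; push/pull_request/schedule triggers
# are deleted, so the workflow runs only when dispatched by hand.
name: Benchmarks
on:
  workflow_dispatch:

jobs:
  benchmark:
    runs-on: ubuntu-latest  # real benchmarks would target a GPU runner
    steps:
      - uses: actions/checkout@v4
      - run: python run_benchmarks.py
```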

A few naming comments overall, but minor stuff I believe, feel free to ignore, gj 👌🏻

remi-or and others added 5 commits October 14, 2025 13:28
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
@remi-or remi-or marked this pull request as ready for review October 14, 2025 13:42
@remi-or remi-or merged commit 94df0e6 into huggingface:main Oct 14, 2025
13 checks passed
i3hz pushed a commit to i3hz/transformers that referenced this pull request Oct 15, 2025
* Big refactor, still classes to move around and script to re-complexify

* Move to streamer, isolate benches, propagate num tokens

* Some refacto

* Added compile mode to name

* Re-order

* Move to dt_tokens

* Better format

* Fix and disable use_cache by default

* Fixed compile and SDPA backend default

* Refactor results format

* Added default compile mode

* Always use cache

* Fixed cache and added flex

* Plan for missing modules

* Experiments: no cg and shuffle

* Disable compile for FA

* Remove wall time, add sweep mode, get git commit

* Review compliance, start

* Apply suggestions from code review

Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>

* Update benchmark_v2/framework/benchmark_runner.py

Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>

* Disable workflow

* Pretty print

* Added some pretty names to have pretty logs

* Review n2 compliance (end?)

* Style and end of PR

---------

Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
ngazagna-qc pushed a commit to ngazagna-qc/transformers that referenced this pull request Oct 23, 2025
(same commit message as above)

3 participants