Conversation

@StrongerXi
Contributor

This adds a benchmark for the Flux image generation pipeline. Specifically, it only benchmarks the diffusion transformer (and omits the text encoder and VAE, which account for little of the e2e generation time in Flux).

Needs pytorch/pytorch#168176 to run in the pytorch repo:
```
python ./benchmarks/dynamo/torchbench.py --accuracy --inference --backend=inductor --only flux
python ./benchmarks/dynamo/torchbench.py --performance --inference --backend=inductor --only flux
```
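For reference, here is a minimal hand-rolled sketch of roughly what gets measured. It is not the actual TorchBench wrapper (which follows the usual `BenchmarkModel` conventions); it assumes diffusers' `FluxPipeline`, access to the gated FLUX.1-dev weights, and a CUDA device, and it approximates "transformer only" by compiling just the transformer and timing the denoising loop:
```python
# Minimal sketch, not the TorchBench model code: compile only the diffusion
# transformer and time image generation. The text encoders and VAE stay in
# eager mode since they contribute little to e2e time.
import torch
from diffusers import FluxPipeline  # assumes a recent diffusers release

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.transformer = torch.compile(pipe.transformer, mode="max-autotune")

def time_ms(prompt: str) -> float:
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    pipe(prompt, num_inference_steps=28, height=1024, width=1024)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end)

time_ms("a photo of an astronaut riding a horse")  # warm-up / compilation
print(f"{time_ms('a photo of an astronaut riding a horse'):.1f} ms per image")
```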
@StrongerXi
Contributor Author

cc @sayakpaul: we are trying to add some diffusers models to our nightly benchmark, and this is a first attempt using Flux. Would love to get your thoughts on:

  1. What models to add. I'm thinking of focusing on categories first (e.g., Flux for txt2img, Qwen Edit for txt-img2img, Wan for txt2video, ...), then adding more popular models to improve coverage.
  2. Whether there are incentives to consolidate this and the diffusers benchmarks. At a high level, I think the pytorch benchmark will enable early detection of pytorch changes that break compile + diffusers (helping nightly users and avoiding bug congestion during release), and the diffusers benchmark will do the same, but for diffusers changes and users. If the benchmark infra are sufficiently different, maybe it's fine to have a bit of code duplication.
  3. What exactly to benchmark -- right now we are just benchmarking the denoising transformer. Are there incentives to benchmark the txt/img encoders and VAE as well, say if the number of steps is low for distilled model variants? There are also variables like image size, which actually affects the torch.compile speedup ratio, since larger sizes make it more compute bound e2e.

Also cc @anijain2305 @BoyuanFeng.

@@ -0,0 +1,72 @@
import torch
from torchbenchmark.tasks import COMPUTER_VISION
@@ -0,0 +1,10 @@
devices:
NVIDIA A100-SXM4-40GB:
Contributor Author

Not sure why it's all A100-40GB in this repo.

I got about a 1.3x inference speedup on an A100 80GB.

Contributor

@huydhn Nov 19, 2025

For more context, A100 40G was the SKU we had from AWS at the beginning, when A100 was still new. Nowadays the TorchInductor benchmark is run on H100, so there is no incentive to migrate A100 to the 80GB version on CI anymore.

Contributor Author

So are these annotations really used anywhere.....?

Contributor

Strictly speaking, https://github.com/pytorch/pytorch/actions/workflows/inductor-perf-test-nightly.yml is still running and failing without anyone caring. I think I should submit a PR to stop it, but that's the compiler team's decision.

@StrongerXi
Contributor Author

Got this from CI; might need some help from devinfra.

```
Access to model black-forest-labs/FLUX.1-dev is restricted and you are not in the authorized list. Visit https://huggingface.co/black-forest-labs/FLUX.1-dev to ask for access.
```

@huydhn
Contributor

huydhn commented Nov 20, 2025

> Access to model black-forest-labs/FLUX.1-dev is restricted and you are not in the authorized list. Visit https://huggingface.co/black-forest-labs/FLUX.1-dev to ask for access.

This issue has been fixed. Another one shows up though (pytorch/pytorch#167895), so let me skip the installation of stabilityai/stable-diffusion-2 on TorchBench for now.

@huydhn
Contributor

huydhn commented Nov 20, 2025

@StrongerXi Let's see if #2656 helps

@sayakpaul

> What models to add. I'm thinking of focusing on categories first (e.g., Flux for txt2img, Qwen Edit for txt-img2img, Wan for txt2video, ...), then adding more popular models to improve coverage.

Yeah, this list is perfect! Many models in the image and video gen space just come with a lot of promise and vanish quickly. Only a few stick around. The ones you mentioned are the most promising ones and have somewhat stood the "test of time". I would also add SDXL to this mix.

> Whether there are incentives to consolidate this and the diffusers benchmarks. At a high level, I think the pytorch benchmark will enable early detection of pytorch changes that break compile + diffusers (helping nightly users and avoiding bug congestion during release), and the diffusers benchmark will do the same, but for diffusers changes and users. If the benchmark infra are sufficiently different, maybe it's fine to have a bit of code duplication.

I am fine with that, I think that's a great idea! We could always test against the nightlies (both PyTorch and Diffusers) to ensure no performance regression. This (reference on AWS) is the infra on which our benchmark is run.

> What exactly to benchmark -- right now we are just benchmarking the denoising transformer. Are there incentives to benchmark the txt/img encoders and VAE as well, say if the number of steps is low for distilled model variants? There are also variables like image size, which actually affects the torch.compile speedup ratio, since larger sizes make it more compute bound e2e.

I don't think there are any incentives to benchmark the other components, TBH, as any gain in the denoiser is just a compounding factor in most cases. I think benchmarking for different batch sizes and for 2/3 different resolutions is a good idea. Also, there are multiple use cases beyond text-to-{image,video,audio} such as image-guided editing, etc. But I think it's fine to just keep it at the text-to-X level as it's easy to extrapolate to the other use cases.
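For illustration, a sweep along those two axes could look roughly like the sketch below; the model name, step count, and shape grid are placeholders rather than the actual benchmark configuration:
```python
# Rough sketch of a batch-size / resolution sweep; values are illustrative.
import itertools
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.transformer = torch.compile(pipe.transformer)

def time_ms(fn) -> float:
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end)

prompt = "a photo of an astronaut riding a horse"
for batch, (h, w) in itertools.product([1, 4], [(512, 512), (1024, 1024)]):
    run = lambda: pipe([prompt] * batch, num_inference_steps=28, height=h, width=w)
    time_ms(run)  # warm-up; new shapes may also trigger recompilation
    print(f"batch={batch}, {h}x{w}: {time_ms(run):.1f} ms")
```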

Just as a note, in our benchmark, we also check for quantization backends and memory features like layerwise casting. I understand that this deviates from a core PyTorch benchmark, but these features are quite popular in the community, hence we benchmark them. Nothing to fret about, just noting.
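As a rough illustration of the memory feature mentioned above (assuming a recent diffusers release that exposes `enable_layerwise_casting`; this is not part of the PyTorch-side benchmark):
```python
# Sketch of layerwise casting on the Flux transformer: weights are stored in
# fp8 and upcast to bf16 per layer at compute time, cutting weight memory at a
# small speed cost. Assumes a recent diffusers release with this API.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.transformer.enable_layerwise_casting(
    storage_dtype=torch.float8_e4m3fn, compute_dtype=torch.bfloat16
)

image = pipe("a cat wearing sunglasses", num_inference_steps=28).images[0]
image.save("cat.png")
```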
