Fix misleadingly high AOT Inductor dashboard performance #153060

benjaminglass1 · 2025-05-07T15:56:44Z

Fixes misleadingly high AOTInductor performance benchmark numbers in scenarios where a model updates internal parameters during torch.export.export. Since FakeTensorMode is enabled during export, all such parameters become FakeTensors, which substantially slows down subsequent eager-mode runs of that model. This, in turn, produces misleading performance stats: the slow eager-mode baseline makes AOTInductor look very good.
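To make the failure mode concrete, here is a minimal pure-Python sketch of state pollution during a tracing step, and of one way to isolate the eager baseline from it. Everything in it is a hypothetical stand-in: `ToyTracingMode`, `CachingModel`, and `toy_export` do not exist in PyTorch, no real tensors are involved, and deep-copying the model is only one possible mitigation, not necessarily what this PR does.

```python
import copy


class ToyTracingMode:
    """Hypothetical stand-in for a tracing context such as FakeTensorMode."""

    active = False

    def __enter__(self):
        ToyTracingMode.active = True
        return self

    def __exit__(self, exc_type, exc, tb):
        ToyTracingMode.active = False
        return False


class CachingModel:
    """Toy model that lazily caches internal state on its first forward call."""

    def __init__(self):
        self.cache = None

    def forward(self, x):
        if self.cache is None:
            # Under tracing, the cached value is a placeholder, not real data.
            self.cache = "placeholder" if ToyTracingMode.active else "real"
        return x


def toy_export(model, x):
    """Hypothetical export step: runs the model once under the tracing mode."""
    with ToyTracingMode():
        model.forward(x)


# Exporting the model directly leaves placeholder state behind on it.
polluted = CachingModel()
toy_export(polluted, 1)

# Exporting a deep copy keeps the model used for the eager baseline clean.
clean = CachingModel()
toy_export(copy.deepcopy(clean), 1)
clean.forward(1)

print(polluted.cache, clean.cache)
```

In this sketch the model that was exported directly keeps the placeholder it cached during tracing, while the model whose copy was exported caches real state on its first eager call.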

One example benchmark exhibits this issue: the equivalent cpp_wrapper benchmark run shows a 2x performance gain, not the 20x shown for AOTInductor.

Only two benchmarks we regularly run are affected by this, both in the TIMM set.
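The 2x versus 20x gap follows directly from how a speedup ratio behaves when only the baseline gets slower. The sketch below assumes the reported speedup is eager latency divided by compiled latency; the latencies and the 10x eager slowdown are hypothetical values chosen only to reproduce the numbers quoted above.

```python
def speedup(eager_ms: float, compiled_ms: float) -> float:
    """Speedup as the ratio of eager latency to compiled latency."""
    if eager_ms <= 0 or compiled_ms <= 0:
        raise ValueError("latencies must be positive")
    return eager_ms / compiled_ms


# Hypothetical latencies in milliseconds (not measured values).
clean_eager_ms = 10.0
compiled_ms = 5.0

# Assumed slowdown of the eager baseline after export polluted the model.
eager_slowdown = 10.0

true_speedup = speedup(clean_eager_ms, compiled_ms)
reported_speedup = speedup(clean_eager_ms * eager_slowdown, compiled_ms)

print(f"true: {true_speedup:.1f}x, reported: {reported_speedup:.1f}x")
```

The compiled latency is identical in both calls; only the denominator-free side of the ratio changes, which is why a slower baseline alone is enough to inflate the reported gain.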

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames

pytorch-bot · 2025-05-07T15:56:48Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/153060

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

[PREEMPTIVE] Removal of ephemeral variants on scale-config.yml

❌ 1 Cancelled Job, 1 Unrelated Failure

As of commit 07ca959 with merge base 7243c69:

CANCELLED JOB - The following job was cancelled. Please retry:

trunk / linux-jammy-rocm-py3.10 / test (distributed, 1, 1, linux.rocm.gpu.4) (gh)

BROKEN TRUNK - The following job failed, but the failure was also present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

inductor / unit-test / cuda12.6-py3.10-gcc9-sm86 / test (inductor_cpp_wrapper, 1, 2, lf.linux.g5.4xlarge.nvidia.gpu) (gh) (trunk failure)
[ FAILED ] AotInductorTest.BasicTestCpu

This comment was automatically generated by Dr. CI and updates every 15 minutes.

benchmarks/dynamo/common.py

benjaminglass1 · 2025-05-13T15:00:49Z

@pytorchbot merge

pytorchmergebot · 2025-05-13T15:02:49Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2025-05-13T20:40:15Z

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / linux-jammy-rocm-py3.10 / test (distributed, 1, 1, linux.rocm.gpu.4)

Details for Dev Infra team

Raised by workflow job

benjaminglass1 · 2025-05-13T20:52:29Z

Test failure was an infra flake.

@pytorchbot merge -i

pytorchmergebot · 2025-05-13T20:54:21Z

Merge started

Your change will be merged while ignoring the following 2 checks: inductor / unit-test / cuda12.6-py3.10-gcc9-sm86 / test (inductor_cpp_wrapper, 1, 2, lf.linux.g5.4xlarge.nvidia.gpu), trunk / linux-jammy-rocm-py3.10 / test (distributed, 1, 1, linux.rocm.gpu.4)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

Summary:

Fixes misleadingly high AOTInductor performance benchmark numbers in scenarios where a model updates internal parameters during `torch.export.export`. Since `FakeTensorMode` is enabled during export, all such parameters become `FakeTensor`s, which substantially slows down subsequent eager-mode runs of that model. This, in turn, produces misleading performance stats: the slow eager-mode baseline makes `AOTInductor` look _very_ good.

An [example benchmark](https://hud.pytorch.org/benchmark/timm_models/inductor_aot_inductor?dashboard=torchinductor&startTime=Wed%2C%2030%20Apr%202025%2015%3A54%3A04%20GMT&stopTime=Wed%2C%2007%20May%202025%2015%3A54%3A04%20GMT&granularity=hour&mode=inference&dtype=bfloat16&deviceName=cuda%20(h100)&lBranch=main&lCommit=1dd36ad2d440a4f3faf724b3a8e13925e3180c24&rBranch=main&rCommit=cc7346bf19c019255dcb4484694a75850ed74d5a&model=convit_base) with this issue. The equivalent `cpp_wrapper` benchmark run shows a 2x performance gain, not 20x.

Only two benchmarks we regularly run are affected by this, both in the TIMM set.

X-link: pytorch/pytorch#153060

Approved by: https://github.com/desertfire

Reviewed By: jeanschmidt

Differential Revision: D74729281

fbshipit-source-id: bf25cd22933d9670018d935747b0604dec4178aa

benjaminglass1 self-assigned this May 7, 2025

pytorch-bot bot added ciflow/inductor module: dynamo labels May 7, 2025

benjaminglass1 added the topic: not user facing label May 7, 2025

pytorchbot added the open source label May 7, 2025

benjaminglass1 force-pushed the benjaminglass1/fixup-aot-inductor-performance-benchmarks branch 2 times, most recently from 7899276 to 4657eb9 May 10, 2025 18:05

Fix skewed AOTInductor performance stats

07ca959

benjaminglass1 force-pushed the benjaminglass1/fixup-aot-inductor-performance-benchmarks branch from 4657eb9 to 07ca959 May 12, 2025 20:35

benjaminglass1 requested a review from desertfire May 12, 2025 20:39

benjaminglass1 marked this pull request as ready for review May 12, 2025 20:39

colesbury added the triaged label May 13, 2025

desertfire reviewed May 13, 2025

benchmarks/dynamo/common.py

desertfire approved these changes May 13, 2025

pytorch-bot bot added the ciflow/trunk label May 13, 2025

pytorchmergebot added the merging label May 13, 2025

pytorchmergebot removed the merging label May 13, 2025

pytorchmergebot added the merging label May 13, 2025

pytorchmergebot closed this in e8596c2 May 13, 2025

pytorchmergebot added Merged and removed merging labels May 13, 2025

github-actions bot deleted the benjaminglass1/fixup-aot-inductor-performance-benchmarks branch June 18, 2025 02:20
