Improve cache key graph printing performance #151928

aorenste · 2025-04-22T18:24:05Z

Teach the graph printer to allow overriding how SymTypes (SymInt, SymFloat, SymBool) are printed, then use that override to reuse the fast SymNode printing from torch._inductor.utils.sympy_str() so that computing the cache key is faster.
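As an illustration of the idea only (not the code in this PR), here is a minimal, torch-free sketch of a printer that lets the caller override how symbolic values are rendered. `ToySym`, `format_arg`, `print_graph`, and `fast_sym_str` are all hypothetical names invented for this example; `fast_sym_str` merely plays the role that `sympy_str()` plays in the description above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple, Union


@dataclass(frozen=True)
class ToySym:
    """Hypothetical stand-in for a SymInt/SymFloat/SymBool; not a torch type."""

    expr: str

    def __str__(self) -> str:
        # Pretend this is the slow, general-purpose printing path.
        return f"Sym({self.expr})"


Arg = Union[int, float, ToySym]
SymPrinter = Callable[[ToySym], str]


def format_arg(arg: Arg, sym_printer: Optional[SymPrinter] = None) -> str:
    """Format one argument, letting the caller override how symbols print."""
    if isinstance(arg, ToySym) and sym_printer is not None:
        return sym_printer(arg)
    return str(arg)


def print_graph(
    nodes: Sequence[Tuple[str, Sequence[Arg]]],
    sym_printer: Optional[SymPrinter] = None,
) -> str:
    """Render each (name, args) node on its own line."""
    lines: List[str] = []
    for name, args in nodes:
        rendered = ", ".join(format_arg(a, sym_printer) for a in args)
        lines.append(f"{name}({rendered})")
    return "\n".join(lines)


def fast_sym_str(sym: ToySym) -> str:
    """Hypothetical fast printer used as the override."""
    return sym.expr


nodes = [("add", (ToySym("s0"), 1)), ("mul", (ToySym("s0*s1"), 2.0))]
default_text = print_graph(nodes)  # symbols go through ToySym.__str__
fast_text = print_graph(nodes, sym_printer=fast_sym_str)  # symbols go through the override
```

The point of the design is that the printer itself stays unchanged for ordinary callers, while a caller that only needs a stable string (such as a cache key) can opt into the cheaper path.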

On my computer the repro from #151823 goes from 480s -> 80s (still terrible... but better).

Fixes #151823

Stack from ghstack (oldest at bottom):

-> Improve cache key graph printing performance #151928

cc @ezyang @SherlockNoMad @EikanWang @jgong5 @wenzhe-nrv @voznesenskym @penguinwu @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

[ghstack-poisoned]

pytorch-bot · 2025-04-22T18:24:08Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/151928

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

CI workflows being skipped on PR

❌ 1 New Failure, 5 Unrelated Failures

As of commit c9e0ad7 with merge base d57bf53:

NEW FAILURE - The following job has failed:

pull / linux-jammy-py3.9-gcc11 / test (backwards_compat, 1, 1, ephemeral.linux.2xlarge) (gh)
test_modules_can_be_imported

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

inductor / cuda12.6-py3.10-gcc9-sm86 / test (inductor_torchbench, 1, 2, ephemeral.linux.g5.4xlarge.nvidia.gpu) (gh) (trunk failure)
Process completed with exit code 1.
inductor / cuda12.6-py3.10-gcc9-sm86 / test (inductor_torchbench, 2, 2, ephemeral.linux.g5.4xlarge.nvidia.gpu) (gh) (trunk failure)
Process completed with exit code 1.
pull / linux-jammy-py3.9-gcc11-mobile-lightweight-dispatch-build / build (gh) (trunk failure)
/var/lib/jenkins/workspace/aten/src/ATen/native/BlasKernel.cpp:741:23: error: ‘bf16_dot’ is not a member of ‘at::native::blas_impl’; did you mean ‘fp16_dot’?
trunk / macos-py3-arm64 / test (default, 2, 3, macos-m1-stable) (gh) (trunk failure)
'test/dynamo/test_unittest.py::CPythonTest_Assertions::testAssertNotRegex'

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

inductor / unit-test / cuda12.6-py3.10-gcc9-sm86 / test (inductor_cpp_wrapper, 1, 2, ephemeral.linux.g5.4xlarge.nvidia.gpu) (gh) (#152916)
../aten/src/ATen/native/BlasKernel.cpp:741:23: error: ‘bf16_dot’ is not a member of ‘at::native::blas_impl’; did you mean ‘fp16_dot’?

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ghstack-source-id: fb7a44f Pull Request resolved: #151928

torch/_inductor/compile_fx.py

laithsakka

Looks good. Just one comment: consider pushing the override logic inside print_readable, to make it easier to replicate this in other places and to reduce code repetition.

aorenste · 2025-05-06T15:04:41Z

test_modules_can_be_imported looks unrelated (appears in several different PRs)
@pytorchbot merge -i

pytorchmergebot · 2025-05-06T15:06:43Z

Merge started

Your change will be merged while ignoring the following 5 checks: pull / linux-jammy-py3.9-gcc11-mobile-lightweight-dispatch-build / build, pull / linux-jammy-py3.9-gcc11 / test (backwards_compat, 1, 1, ephemeral.linux.2xlarge), inductor / cuda12.6-py3.10-gcc9-sm86 / test (inductor_torchbench, 2, 2, ephemeral.linux.g5.4xlarge.nvidia.gpu), inductor / cuda12.6-py3.10-gcc9-sm86 / test (inductor_torchbench, 1, 2, ephemeral.linux.g5.4xlarge.nvidia.gpu), inductor / unit-test / cuda12.6-py3.10-gcc9-sm86 / test (inductor_cpp_wrapper, 1, 2, ephemeral.linux.g5.4xlarge.nvidia.gpu)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

Improve cache key graph printing performance

e35105c

[ghstack-poisoned]

aorenste mentioned this pull request Apr 22, 2025

Add precedence to the infix printing done by sympy_str. #151920

Closed

pytorch-bot bot added ciflow/inductor module: inductor release notes: fx release notes category labels Apr 22, 2025

facebook-github-bot added the fx label Apr 22, 2025

aorenste added a commit that referenced this pull request Apr 22, 2025

Improve cache key graph printing performance

9df00a3

ghstack-source-id: fb7a44f Pull Request resolved: #151928

aorenste added the topic: not user facing topic category label Apr 22, 2025

aorenste requested a review from laithsakka April 23, 2025 15:08

aorenste marked this pull request as ready for review April 24, 2025 01:37

laithsakka reviewed Apr 29, 2025

torch/_inductor/compile_fx.py

laithsakka approved these changes Apr 29, 2025

aorenste requested a review from bdhirsh as a code owner May 5, 2025 20:02

aorenste added a commit that referenced this pull request May 5, 2025

Improve cache key graph printing performance

a71dada

ghstack-source-id: e14d5bb Pull Request resolved: #151928

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label May 6, 2025

pytorchmergebot added the merging label May 6, 2025

pytorchmergebot added the Merged label May 6, 2025

pytorchmergebot closed this in 7a0781e May 6, 2025

pytorchmergebot removed the merging label May 6, 2025

github-actions bot deleted the gh/aorenste/224/head branch June 17, 2025 02:19
