@jamesjwu (Contributor) commented Feb 19, 2025:

Stack from ghstack (oldest at bottom):

This PR intends to fix the cache-related issues from #147405.
It does not handle the in-process dynamo recompile case, because it does not introduce any extra guards. For FXGraphCache and AOTAutogradCache, we simply need to include the device context in the cache key.

Note that for any function that accepts tensor inputs, the device context is already included in the cache key via the metadata of the example inputs. However, for functions that return constants or take no arguments, the device context still needs to be part of the cache key.
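
For illustration, a minimal repro sketch of the failure mode (hypothetical, not taken from #147405; it assumes a machine with at least two CUDA devices and the FX graph cache enabled):

import torch

@torch.compile
def make_ones():
    # No tensor arguments: the target device comes only from the ambient context.
    return torch.ones(4, device="cuda")

torch.cuda.set_device(0)
print(make_ones().device)  # cuda:0

torch.cuda.set_device(1)
torch._dynamo.reset()      # force a fresh compile that consults the warm cache
print(make_ones().device)  # expected cuda:1; with a device-agnostic cache key the
                           # warm artifact could still target cuda:0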

A more robust fix for this would be to have inductor generate device guards that are dynamic, instead of specialized. This would also help us share more cache artifacts.

I've added unit tests for FXGraphCache and AOTAutogradCache, both of which would fail without this change.
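
A rough sketch of what such a test can look like (illustrative only, not the actual tests added in this PR; it assumes the fxgraph_cache_miss counter in torch._dynamo.utils.counters and at least two CUDA devices):

import torch
import torch._inductor.config as inductor_config
from torch._dynamo.utils import counters

def test_device_index_is_part_of_cache_key():
    inductor_config.fx_graph_cache = True
    counters.clear()

    def fn():
        # No tensor inputs, so only the ambient device distinguishes the two runs.
        return torch.ones(2, device="cuda")

    torch.cuda.set_device(0)
    torch.compile(fn)()
    assert counters["inductor"]["fxgraph_cache_miss"] == 1

    # Switching the current device should change the cache key, so the second
    # compile must be another miss rather than a stale hit.
    torch._dynamo.reset()
    torch.cuda.set_device(1)
    torch.compile(fn)()
    assert counters["inductor"]["fxgraph_cache_miss"] == 2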

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

Differential Revision: [D69875939](https://our.internmc.facebook.com/intern/diff/D69875939)


@pytorch-bot (bot) commented Feb 19, 2025:

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/147464

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 2adfd4d with merge base f63db62:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Review comments on this hunk from the diff:

# This device index is usually already encoded by the device of the inputs
# but fx graphs don't necessarily have tensor inputs
if torch.cuda.is_available():
    self.default_cuda_device_index = torch.cuda.current_device()
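
Roughly, the field above takes effect because the hash-details object is serialized and digested as a whole, so any new attribute changes the resulting key. A simplified sketch of that idea (not the actual inductor implementation; the class and method names here are made up):

import hashlib
import pickle

import torch

class HashDetailsSketch:
    # Toy stand-in for inductor's hash-details object.
    def __init__(self, graph_repr: str) -> None:
        self.graph_repr = graph_repr
        # Record the ambient device index so two runs that differ only in the
        # current device produce different keys.
        self.default_cuda_device_index = (
            torch.cuda.current_device() if torch.cuda.is_available() else None
        )

    def cache_key(self) -> str:
        # Every attribute, including the device index, feeds the digest.
        return hashlib.sha256(pickle.dumps(vars(self))).hexdigest()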
@bdhirsh (Contributor) commented Feb 19, 2025:

partially a PSA / question for @albanD - to what extent do you expect people to be using torch.accelerator.current_device() + PT2 today (and I guess... changing their device index from run to run)? James and I talked about it offline a bit - we'll need to do something similar for other accelerators in the long run to avoid accelerator-specific warm cache problems

A collaborator replied:
Yes, you should use torch.accelerator instead of cuda for these things. They lead to the same thing for the cuda device, but you also get rocm/mtia/xpu support for free.

@jamesjwu (author) replied:
Is it guaranteed that:

if torch.cuda.is_available(), then torch.accelerator.is_available()?
And that torch.cuda.current_device() == torch.accelerator.current_device?

If so, happy to change.

@jamesjwu (author) replied:
Seems like it works in my tests, will run with it!
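
For reference, a sanity check along these lines might look like the following (a sketch only; it assumes the accelerator API exposes torch.accelerator.is_available() and torch.accelerator.current_device_index(), and the exact names may differ across PyTorch versions):

import torch

if torch.cuda.is_available():
    # On a CUDA build the generic accelerator API should agree with torch.cuda.
    assert torch.accelerator.is_available()
    assert torch.accelerator.current_device_index() == torch.cuda.current_device()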

@bdhirsh (Contributor) left a comment:
sgtm! Although we should fix the dynamo guard issue too :)

@jamesjwu has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pytorch-bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) on Feb 19, 2025.
jamesjwu added a commit that referenced this pull request Feb 20, 2025
ghstack-source-id: ec82e6e
Pull Request resolved: #147464

@jamesjwu (author) commented:
Rebase to rerun tests

@facebook-github-bot commented:
@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot commented:
Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

@github-actions (bot) deleted the gh/jamesjwu/110/head branch on March 27, 2025.