Conversation

@eellison (Contributor) commented Oct 22, 2024

Stack from ghstack (oldest at bottom):

I was debugging an internal NE divergence for a while that turned out to be caused by a bad meta kernel. I added a config option and an explicit backend, aot_eager_decomp_partition_crossref, to enable FakeCrossRefMode when running the graph. I added the explicit backend because I suspect it will be useful for internal models, but I'm also happy to leave it as just a config option.

It will only test ops that have a meta kernel, to avoid the memory overhead of hitting the fallback path and running in eager.
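For context, a minimal usage sketch of the new backend (only the backend name comes from this PR; the toy function and tensor are illustrative):

```python
import torch

def f(x):
    return x.sin() + x.cos()

# Compile with the debugging backend added in this PR: it runs
# aot_eager_decomp_partition with FakeCrossRefMode enabled, so each op's
# meta/fake result is cross-checked against the real eager result.
compiled = torch.compile(f, backend="aot_eager_decomp_partition_crossref")
print(compiled(torch.randn(8)))
```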

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov @rec

pytorch-bot bot commented Oct 22, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138651

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 0a106eb with merge base 2e48788:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@zou3519 (Contributor) left a comment


I don't have thoughts on backend vs config, maybe Brian does. It seems easy enough to switch or to add both, so approving.

kwargs = kwargs or {}

fake_r = None
breakpoint()
Contributor

don't think this is supposed to be here? :P



# Run aot eager decomp partition with CrossRefFakeMode
fake_tensor_crossref = False
Contributor

to check, your PR made this both a config option and a backend?

Contributor Author

that's correct
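For completeness, a sketch of the equivalent config-flag path confirmed above (the module holding the flag is an assumption on my part; only the name fake_tensor_crossref appears in the diff above):

```python
import torch
# Assumed location of the new flag; only the name `fake_tensor_crossref`
# is visible in the diff above.
import torch._functorch.config as functorch_config

functorch_config.fake_tensor_crossref = True

# With the flag set, the plain backend should behave like the dedicated
# aot_eager_decomp_partition_crossref backend.
compiled = torch.compile(lambda x: x * 2, backend="aot_eager_decomp_partition")
compiled(torch.randn(4))
```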

eellison added a commit that referenced this pull request Oct 24, 2024
@eellison (Contributor Author)

@pytorchbot merge

pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Oct 24, 2024
@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@bdhirsh (Contributor) left a comment


SGTM; agreed that piping configs through internal can be a pain, and an extra backend seems harmless.

@pytorchmergebot (Collaborator)

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information, see the pytorch-bot wiki.

@eellison (Contributor Author)

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator)

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information, see the pytorch-bot wiki.

@eellison (Contributor Author)

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@eellison (Contributor Author)

@pytorchbot merge -f

pytorch-bot bot commented Oct 25, 2024

❌ 🤖 pytorchbot command failed:

@pytorchbot merge: error: argument -f/--force: expected one argument

usage: @pytorchbot merge [-f MESSAGE | -i] [-ic] [-r [{viable/strict,main}]]

Try @pytorchbot --help for more info.

@eellison (Contributor Author)

@pytorchbot merge -f "merge taking forever"

@pytorchmergebot (Collaborator)

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information, see the pytorch-bot wiki.

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

github-actions bot deleted the gh/eellison/707/head branch November 25, 2024 02:10