[dynamo] fix dynamo + DTensor to work with 2d #108329
Conversation
This PR fixes the dynamo + DTensor integration so that the current graph-break FSDP can work with tensor parallel by moving torch.compile to after FSDP wrapping. [ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/108329
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 2452522 with merge base e68b3ad:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR fixes the dynamo + DTensor integration so that the current graph-break FSDP can work with tensor parallel by moving torch.compile to after FSDP wrapping. ghstack-source-id: 1922a173c9b148a8a6bfe19e820bbebd531435dd Pull Request resolved: #108329

Pair debugged with @wconstab and we found some issues in both dynamo and the TP's FSDP extension side. This PR fixes the dynamo + DTensor integration so that the current graph-break FSDP can work with tensor parallel by moving torch.compile to after FSDP wrapping. [ghstack-poisoned]

Pair debugged with @wconstab and we found some issues in both dynamo and the TP's FSDP extension side. This PR fixes the dynamo + DTensor integration so that the current graph-break FSDP can work with tensor parallel by moving torch.compile to after FSDP wrapping. ghstack-source-id: 1922a173c9b148a8a6bfe19e820bbebd531435dd Pull Request resolved: #108329

Pair debugged with @wconstab and we found some issues in both dynamo and the TP's FSDP extension side. This PR fixes the dynamo + DTensor integration so that the current graph-break FSDP can work with tensor parallel by moving torch.compile to after FSDP wrapping. cc @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @chenyang78 @aakhundov [ghstack-poisoned]

Pair debugged with @wconstab and we found some issues in both dynamo and the TP's FSDP extension side. This PR fixes the dynamo + DTensor integration so that the current graph-break FSDP can work with tensor parallel by moving torch.compile to after FSDP wrapping. ghstack-source-id: 4c26e2721a36a92d0988044f4c4fdc7491dc6dfd Pull Request resolved: #108329
lgtm, but definitely add the assert for kwargs; it might even be easy enough to just support kwargs too
    process_group=fsdp_pg,
    device_id=self.rank,
    use_orig_params=True,
)

# TODO: once aot autograd support is ready we can just use default backend
compiled_2d = torch.compile(fsdp_2d, backend="eager")
nit: we might also want to test the inductor backend, as it sometimes breaks more than eager due to customized logic in some parts of aot-autograd
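For context, a rough sketch of what covering both backends in the test could look like; `fsdp_2d` and `inp` are stand-ins for the 2D-parallel module and input the test builds, not code from this PR:

```python
import torch

# Hypothetical parametrization over compile backends; fsdp_2d and inp are
# placeholders for the FSDP+TP wrapped module and its input in the test.
for backend in ("eager", "inductor"):
    compiled_2d = torch.compile(fsdp_2d, backend=backend)
    out = compiled_2d(inp)
    out.sum().backward()
```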
torch/_dynamo/variables/torch.py (Outdated)
@@ -572,6 +572,7 @@ def get_state_from_generator():
elif is_from_local(self.value):
    # rewrite non-primitive args/kwargs to be included in the on-the-fly prim function
    # and rewrite args to have only proxyable args, then insert call_function
    # TODO: support cases where device_mesh + placements specified as kwargs
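For readers unfamiliar with this branch: the comment describes baking the non-proxyable constructor arguments into a small on-the-fly function so that only tensor arguments flow through the traced graph. A purely illustrative sketch of that idea, not the actual dynamo code:

```python
from torch.distributed._tensor import DTensor

def make_from_local_fn(device_mesh, placements, run_check):
    # device_mesh/placements/run_check are captured here instead of being
    # proxied, so the synthesized function only takes proxyable tensor args.
    def from_local_fn(local_tensor):
        return DTensor.from_local(
            local_tensor, device_mesh, placements, run_check=run_check
        )
    return from_local_fn
```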
maybe good to assert kwargs is empty (or that it just contains the one bool you expect, etc.)
Actually it's not that hard to support kwargs directly, so I added that.
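A hypothetical sketch of what "supporting kwargs" amounts to here: treat `device_mesh`/`placements` the same whether they arrive positionally or as keyword arguments, keeping tensors as graph inputs and capturing everything else. The helper name below is illustrative, not the PR's code:

```python
import torch

def split_proxyable(args, kwargs):
    # Tensors stay as graph inputs; non-tensor values (device_mesh, placements,
    # run_check, ...) get baked into the synthesized from_local wrapper.
    graph_inputs = [a for a in args if isinstance(a, torch.Tensor)]
    graph_inputs += [v for v in kwargs.values() if isinstance(v, torch.Tensor)]
    captured_args = [a for a in args if not isinstance(a, torch.Tensor)]
    captured_kwargs = {
        k: v for k, v in kwargs.items() if not isinstance(v, torch.Tensor)
    }
    return graph_inputs, captured_args, captured_kwargs
```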
Pair debugged with @wconstab and we found some issues in both dynamo and the TP's FSDP extension side. This PR fixes the dynamo + DTensor integration so that the current graph-break FSDP can work with tensor parallel by moving torch.compile to after FSDP wrapping. ghstack-source-id: 4f48e003224e0d48e32b3db57923859d91b50e0e Pull Request resolved: #108329
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Nice!
Stack from ghstack (oldest at bottom):
Pair debugged with @wconstab and we found some issues in both dynamo and
the TP's FSDP extension side. This PR fixes the dynamo + DTensor integration
so that the current graph-break FSDP can work with tensor parallel by moving
torch.compile to after FSDP wrapping.
cc @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @chenyang78 @aakhundov
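For anyone trying the 2D setup this enables, a minimal sketch of the ordering, assuming a `model`, a tensor-parallel plan `tp_plan`, a TP device mesh `tp_mesh`, and an FSDP process group `fsdp_pg` are already set up; those names are placeholders, not from this PR:

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.tensor.parallel import parallelize_module

# Tensor parallelism is applied first, then FSDP wraps the TP'd module.
tp_model = parallelize_module(model, tp_mesh, tp_plan)
fsdp_2d = FSDP(tp_model, process_group=fsdp_pg, use_orig_params=True)

# torch.compile goes on *after* FSDP wrapping; the eager backend mirrors the
# test's TODO about waiting for full aot autograd support.
compiled_2d = torch.compile(fsdp_2d, backend="eager")
```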