Update DDP docs for Dynamo/DDPOptimizer #89096

Closed
wconstab wants to merge 15 commits

Conversation

wconstab
Contributor

@wconstab wconstab commented Nov 15, 2022

Stack from ghstack (oldest at bottom):

@pytorch-bot

pytorch-bot bot commented Nov 15, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/89096

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 Failures

As of commit 55d73a6:

The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

wconstab added a commit that referenced this pull request Nov 15, 2022
ghstack-source-id: 558da2a190769ecb6d99dd7315fba37814961750
Pull Request resolved: #89096
.. code::

    ddp_model = DDP(model, device_ids=[rank])
    ddp_model = torch.compile(ddp_model)

Member

So we can't merge this quite yet, since torch.compile won't exist until sometime next week. In the meantime, you can use the optimize API if you'd rather merge this now.

Contributor Author

no rush, i can just land it once it's ready. keep me posted.

------------------------

DDP's performance advantage comes from overlapping allreduce collectives with computations during backwards.
AotAutograd prevents this overlap when used with TorchDynamo for compiling a whole forward and whole backward graph,

Member

I would imagine the DDP audience may not know what AotAutograd is; I'd rather expand on this a bit more.

Maybe a picture would help make things clearer.

Contributor Author

Do you think it's better to duplicate the picture/explanation here, or would a link out to @davidberard98's blog suffice? He explains it well and has pictures

Member

Link is fine

Contributor Author

ok. link is a paragraph below, but i could move it up if you think it would help.
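
For readers following this thread, a toy timing model may help show why the overlap mentioned in the doc text matters. This is only an illustrative sketch: the function names, the assumption of a single communication stream, and the per-bucket timings are all hypothetical, and none of it is taken from the PR or from DDP's actual implementation. It compares a backward pass where each bucket's allreduce can start as soon as its gradients are ready against one where all communication waits until the whole backward graph has finished.

```python
def step_time_with_overlap(compute, comm):
    """Backward time when each bucket's allreduce may overlap later compute.

    compute[i] and comm[i] are the (hypothetical) backward-compute and
    allreduce durations for bucket i, in the order gradients become ready.
    Allreduces are assumed to run one at a time on a single stream.
    """
    compute_done = 0.0
    comm_done = 0.0
    for c, a in zip(compute, comm):
        compute_done += c
        # The allreduce starts once this bucket's gradients are ready
        # and the previous allreduce has finished.
        comm_done = max(comm_done, compute_done) + a
    return comm_done


def step_time_without_overlap(compute, comm):
    """Backward time when communication only starts after all compute."""
    return sum(compute) + sum(comm)


# Made-up timings for four gradient buckets (arbitrary units).
compute = [3.0, 3.0, 3.0, 3.0]
comm = [2.0, 2.0, 2.0, 2.0]

overlapped = step_time_with_overlap(compute, comm)
serialized = step_time_without_overlap(compute, comm)
print(overlapped, serialized)
```

With these made-up numbers, only the last bucket's allreduce is left exposed in the overlapped case, whereas the non-overlapped case pays for every allreduce after the compute has finished.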

wconstab added a commit that referenced this pull request Nov 16, 2022
ghstack-source-id: b4d0a1abeb03710aa477781ce5675513b495f3ab
Pull Request resolved: #89096
wconstab added a commit that referenced this pull request Nov 16, 2022
ghstack-source-id: 716a9055db202f30ff9f2545ede1148092a9f07d
Pull Request resolved: #89096
wconstab added a commit that referenced this pull request Nov 22, 2022
ghstack-source-id: a84797334268424aaf311e771bed627400a8e237
Pull Request resolved: #89096
wconstab added a commit that referenced this pull request Nov 28, 2022
ghstack-source-id: 359655fac3984a341cb9703191268b20d34a1252
Pull Request resolved: #89096
wconstab added a commit that referenced this pull request Nov 28, 2022
ghstack-source-id: 2c34285b13781a21b886cd633706c129a135c8c2
Pull Request resolved: #89096
wconstab added a commit that referenced this pull request Nov 29, 2022
ghstack-source-id: a8e4fde4a0db66e245534a7c8f31fd498141d5d9
Pull Request resolved: #89096
wconstab added a commit that referenced this pull request Nov 29, 2022
ghstack-source-id: bd7a4b3748d21e55bf82401a74ac8398794007e2
Pull Request resolved: #89096
wconstab added a commit that referenced this pull request Nov 29, 2022
ghstack-source-id: 464ccb5944f000297df63655e9135e3bad774b79
Pull Request resolved: #89096
wconstab added a commit that referenced this pull request Nov 29, 2022
ghstack-source-id: 43a8e5795cd8c5d6fb6f5702080a9cccc2b6b1b5
Pull Request resolved: #89096
wconstab added a commit that referenced this pull request Nov 29, 2022
ghstack-source-id: 6c18a6c7c4d5c517302e57cb2ffb5a46d60f8696
Pull Request resolved: #89096
wconstab added a commit that referenced this pull request Nov 29, 2022
ghstack-source-id: 88cc651f705d7c6f60fc6d245a0629f8e99259f8
Pull Request resolved: #89096
@wconstab wconstab added the release notes: distributed (ddp) release notes category label Nov 29, 2022
@wconstab
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 29, 2022
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Collaborator

Merge failed

Reason: 4 additional jobs have failed, first few of them are: windows-binary-libtorch-debug, windows-binary-libtorch-debug / libtorch-cpu-shared-with-deps-debug-test, trunk, trunk / win-vs2019-cuda11.6-py3 / test (force_on_cpu, 1, 1, windows.4xlarge)

Details for Dev Infra team Raised by workflow job

@wconstab
Contributor Author

@pytorchbot merge -f "unrelated CI fail"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).


kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
@facebook-github-bot facebook-github-bot deleted the gh/wconstab/38/head branch June 8, 2023 19:17
Labels
ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: distributed (ddp) release notes category

3 participants