Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PT-D][Tensor parallelism] Add documentations for TP #94421

Closed
wants to merge 4 commits into from

Conversation

fduwjj
Copy link
Contributor

@fduwjj fduwjj commented Feb 8, 2023

Stack from ghstack (oldest at bottom):

This is far from completed and we will definitely polish it down the road.

@pytorch-bot
Copy link

pytorch-bot bot commented Feb 8, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/94421

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 8d99683:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

fduwjj added a commit that referenced this pull request Feb 8, 2023
ghstack-source-id: 06caff66222400a86a803e30ca9d09afe24a6aba
Pull Request resolved: #94421
@fduwjj fduwjj marked this pull request as draft February 8, 2023 18:24
@fduwjj fduwjj added the release notes: distributed (dtensor) release notes category label Feb 8, 2023
@fduwjj fduwjj marked this pull request as ready for review February 8, 2023 18:26
@fduwjj fduwjj added the ciflow/trunk Trigger trunk jobs on your pull request label Feb 8, 2023
@fduwjj fduwjj requested a review from wanchaol February 8, 2023 18:27
@fduwjj fduwjj marked this pull request as draft February 8, 2023 18:29
fduwjj added a commit that referenced this pull request Feb 8, 2023
ghstack-source-id: f105a1ade4ccac6ea9a8337d34e1830d12099132
Pull Request resolved: #94421
@fduwjj fduwjj marked this pull request as ready for review February 8, 2023 20:17
Copy link
Contributor

@wz337 wz337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

docs/source/distributed.tensor.parallel.rst Outdated Show resolved Hide resolved
Copy link
Contributor

@wanchaol wanchaol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first pass, let's add the experimental line as this is prototype release.

docs/source/distributed.tensor.parallel.rst Outdated Show resolved Hide resolved
docs/source/distributed.tensor.parallel.rst Show resolved Hide resolved
:members:

We also enabled 2D parallelism to integrate with ``FullyShardedDataParallel``.
Users just need to call the following API explicitly:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remembered we have a FSDP extension, Is TP automatically register the extension now?

Also, I wonder if we should give a small code snippet showing how the 2-D parallel look like

Copy link
Contributor Author

@fduwjj fduwjj Feb 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The registrations is in the is_available. Let me send a follow-up PR for this one.



.. currentmodule:: torch.distributed.tensor.parallel.fsdp
.. autofunction:: is_available
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need to add this API to the doc? I remembered is_available is introduced when we are in tau, but since now it's pytorch I think fsdp should always be available?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because of 2D hook registration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will send a follow-up PR to address the naming of this one.

docs/source/distributed.tensor.parallel.rst Outdated Show resolved Hide resolved
docs/source/distributed.tensor.parallel.rst Outdated Show resolved Hide resolved
docs/source/distributed.tensor.parallel.rst Outdated Show resolved Hide resolved
docs/source/distributed.tensor.parallel.rst Outdated Show resolved Hide resolved
docs/source/distributed.tensor.parallel.rst Outdated Show resolved Hide resolved
docs/source/distributed.tensor.parallel.rst Outdated Show resolved Hide resolved

This is far from completed and we will definitely polish it down the road.


[ghstack-poisoned]
fduwjj added a commit that referenced this pull request Feb 8, 2023
ghstack-source-id: 8fa4b414901367acdd0ac76b0844a1eb59ee5c67
Pull Request resolved: #94421
@fduwjj
Copy link
Contributor Author

fduwjj commented Feb 8, 2023

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a rebase job. Check the current status here


This is far from completed and we will definitely polish it down the road.


[ghstack-poisoned]
@pytorchmergebot
Copy link
Collaborator

Successfully rebased gh/fduwjj/71/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/94421)

pytorchmergebot pushed a commit that referenced this pull request Feb 8, 2023
ghstack-source-id: d03f0b1bf33d5f3f662d1f574a35828bbc336330
Pull Request resolved: #94421
@fduwjj
Copy link
Contributor Author

fduwjj commented Feb 8, 2023

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@facebook-github-bot facebook-github-bot deleted the gh/fduwjj/71/head branch June 8, 2023 17:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: distributed (dtensor) release notes category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants