Initial add of torch.distributed.pipelining #124776
Conversation
[ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/124776
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure) As of commit 866111e with merge base c82fcb7:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Random thought: Any chance that …
Thanks for the suggestion, @awgu!
I was about to say I have a similar suggestion to @awgu's; I feel …
Curious why it's not a kind of parallelism?
Where are these names in our package offering?
It is really a subtle difference; here is an answer from Quora's bot:
In short, pipelining focuses on breaking down a job, while parallelism focuses on having multiple workers do the same job.
I think the name 'pipeline parallel' is sufficiently established that it doesn't matter what Quora says; people know what it means and understand it. This is the obvious safe option to me. Pipelining is arguably a correct name, but also not one the ML community uses as often. It is also shorter and cleaner. I would have been more convinced by this if I wasn't already so used to saying PP, but I am.
Thanks @wconstab. I agree that "pipeline parallel" is a well-known concept -- it is also what we use in our README. But, as a package name, I think it is too long. "PP" is short, but not descriptive enough. Ideally, I prefer a package name that's one word, such as "distributed", "compiler", "profiler", "export". It can also be two shortened meaningful words concatenated, such as "autograd". But I think "pipepara" looks weird.
```
output = schedule.step()
```
Note that since we split our model into three stages, we must run this script with three workers. For this example, we will use `torchrun` to run multiple processes within a single machine for demonstration purposes. We can collect up all of the code blocks above into a file named [example.py](https://github.com/pytorch/PiPPy/tree/main/examples/basic) and then run it with `torchrun` like so:
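(The actual `torchrun` command is not captured in this excerpt; presumably it is something along the lines of `torchrun --nproc-per-node 3 example.py`, i.e. one process per pipeline stage.)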
Are the links to the PiPPy repo going to migrate too eventually, or do we leave the examples there?
The temporary decision is to leave the examples there (not migrated). Eventually they would be hosted in some tutorial repo, I hope.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Pull Request resolved: #124875 Approved by: https://github.com/H-Huang ghstack dependencies: #124776
This is a helper function which: 1. computes the gradients for the stage inputs, and 2. accumulates gradients for the stage module's parameters. A unit test for this function is also added. Pull Request resolved: #124958 Approved by: https://github.com/wconstab ghstack dependencies: #124776, #124875
Pull Request resolved: #125273 Approved by: https://github.com/H-Huang ghstack dependencies: #124776, #124875, #124958
Pull Request resolved: #124776 Approved by: https://github.com/wconstab
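To make the stage-backward helper from #124958 above more concrete, here is a hypothetical sketch of that idea using plain `torch.autograd`; it is not the code added in that PR, and the function name and arguments (`stage_backward_sketch`, `stage_outputs`, `output_grads`, `stage_inputs`, `parameters`) are made up for illustration:

```
import torch

def stage_backward_sketch(stage_outputs, output_grads, stage_inputs, parameters):
    # Hypothetical sketch: given the stage's outputs, the gradients arriving from
    # the next stage, the stage's inputs, and the stage module's parameters,
    # compute input gradients and accumulate parameter gradients.
    inputs = list(stage_inputs)
    params = list(parameters)
    grads = torch.autograd.grad(
        outputs=stage_outputs,
        inputs=inputs + params,
        grad_outputs=output_grads,
        allow_unused=True,
    )
    # 1. Gradients for the stage inputs, to be sent to the previous stage.
    input_grads = grads[: len(inputs)]
    # 2. Accumulate (rather than overwrite) parameter gradients, so that
    #    backwards from multiple microbatches sum into .grad.
    for p, g in zip(params, grads[len(inputs):]):
        if g is not None:
            p.grad = g if p.grad is None else p.grad + g
    return input_grads
```

Accumulating into `p.grad` instead of overwriting it is what lets gradients from several microbatches add up over the course of a schedule, which matches the "accumulates gradients" behavior described above.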
Stack from ghstack (oldest at bottom):
cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang @d4l3k