
Conversation

kwen2501
Contributor

@kwen2501 kwen2501 commented Apr 23, 2024


pytorch-bot bot commented Apr 23, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/124776

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 866111e with merge base c82fcb7:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot added the `ci-td-distributed`, `oncall: distributed` (add this issue/PR to distributed oncall triage queue), and `topic: not user facing` (topic category) labels Apr 23, 2024
kwen2501 added a commit that referenced this pull request Apr 23, 2024
ghstack-source-id: d4977d7
Pull Request resolved: #124776
kwen2501 added the `release notes: distributed (pipeline)` (release notes category) label and removed the `topic: not user facing` (topic category) label Apr 23, 2024
@kwen2501 kwen2501 requested review from H-Huang and wconstab April 23, 2024 20:43
@awgu
Collaborator

awgu commented Apr 23, 2024

Random thought: Any chance that torch.distributed.pipeline_parallel is an option? :)

@kwen2501
Contributor Author

kwen2501 commented Apr 23, 2024

Thanks for the suggestion @awgu !
We did consider torch.distributed.pipeline_parallel but unfortunately it seemed a bit long :)
Also, some people may argue that pipelining is not a kind of parallelism in a strict sense.

@wanchaol
Collaborator

> Thanks for the suggestion @awgu ! We did consider torch.distributed.pipeline_parallel but unfortunately it seemed a bit long :) Also, some people may argue that pipelining is not a kind of parallelism in a strict sense.

I was about to say I had a similar suggestion to @awgu's. I feel `pipeline_parallel` aligns well with other parts of our offerings: `fully_sharded_data_parallel`, `tensor_parallel`.

> pipelining is not a kind of parallelism in a strict sense.

Curious why it's not a kind of parallelism?

@kwen2501
Contributor Author

kwen2501 commented Apr 23, 2024

> fully_sharded_data_parallel, tensor_parallel

Where are these names in our package offering?

> Curious why it's not a kind of parallelism?

It is really a subtle difference; here is an answer from Quora's bot:

> Pipelining and parallelism are both techniques used in computer architecture to improve performance, but they operate in different ways.
> Pipelining involves breaking down the execution of instructions into a series of stages, where each stage performs a different part of the instruction. This allows multiple instructions to be processed simultaneously, with each stage working on a different instruction. As a result, the overall throughput of the processor is increased.
> Parallelism, on the other hand, involves executing multiple instructions simultaneously by using multiple processing units. This can be achieved through techniques such as multi-core processors or multi-processor systems. Parallelism allows for true simultaneous execution of instructions, which can significantly improve overall system performance.
> In summary, pipelining focuses on breaking down the execution of individual instructions into smaller stages to improve throughput, while parallelism involves executing multiple instructions at the same time using multiple processing units to improve overall system performance.

In short, pipelining focuses on breaking down a job, while parallelism focuses on having multiple workers do the same job.
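The distinction above can be sketched in a few lines of plain Python (a toy illustration only; `stage_a`/`stage_b` are made-up names, not PyTorch APIs): pipelining splits one job into stages that each item flows through, while parallelism hands whole copies of the job to multiple workers.

```python
from concurrent.futures import ThreadPoolExecutor

def stage_a(x):
    # first half of the "job"
    return x + 1

def stage_b(x):
    # second half of the "job"
    return x * 2

def pipelined(items):
    # Pipelining: each item passes through both stages in order.
    # In real hardware (or pipeline-parallel training), the stages
    # overlap in time across different items/micro-batches.
    return [stage_b(stage_a(x)) for x in items]

def parallel(items, workers=2):
    # Parallelism: multiple identical workers each run the *full* job
    # on their own share of the inputs.
    full_job = lambda x: stage_b(stage_a(x))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(full_job, items))

print(pipelined([1, 2, 3]))  # [4, 6, 8]
print(parallel([1, 2, 3]))   # [4, 6, 8]
```

Both produce the same results; the difference is purely in how the work is decomposed, which is the subtlety being debated here.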

@wconstab
Contributor

I think the name 'pipeline parallel' is sufficiently established that it doesn't matter what quora says, people know what it means and understand it. This is the obvious safe option to me.

Pipelining is arguably also a correct name, but not one the ML community uses as often. It is also shorter and cleaner. I would have been more convinced by this if I weren't already so used to saying PP, but I am.

@kwen2501
Contributor Author

kwen2501 commented Apr 24, 2024

Thanks @wconstab. I agree that "pipeline parallel" is a well-known concept -- it is also what we use in our README. But, as a package name, I think it is too long. "PP" is short, but not descriptive enough.

Ideally, I prefer a package name that's one word, such as "distributed", "compiler", "profiler", "export". It can also be two shortened meaningful words concatenated, such as "autograd". But I think "pipepara" looks weird.

```python
output = schedule.step()
```

Note that since we split our model into three stages, we must run this script with three workers. For this example, we will use `torchrun` to run multiple processes within a single machine for demonstration purposes. We can collect up all of the code blocks above into a file named [example.py](https://github.com/pytorch/PiPPy/tree/main/examples/basic) and then run it with `torchrun` like so:
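The invocation would look something like the following (the exact flags are an assumption based on standard `torchrun` usage; the script name matches the `example.py` mentioned above):

```shell
# Launch 3 worker processes on one machine, one per pipeline stage
torchrun --nproc_per_node=3 example.py
```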
Copy link
Contributor


Are the links to pippy repo going to migrate too eventually, or do we leave the examples there?

Contributor Author

@kwen2501 kwen2501 Apr 24, 2024


The temporary decision is to leave the examples there (not migrated). Eventually, I hope, they will be hosted in some tutorial repo.

@kwen2501
Contributor Author

@pytorchbot merge

pytorch-bot bot added the `ciflow/trunk` (trigger trunk jobs on your pull request) label Apr 24, 2024
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

pytorchmergebot pushed a commit that referenced this pull request Apr 30, 2024
pytorchmergebot pushed a commit that referenced this pull request May 1, 2024
This is a helper function which:
1. computes the gradients for the stage inputs, and
2. accumulates gradients for the stage module's parameters.

A unit test for this function is also added.

Pull Request resolved: #124958
Approved by: https://github.com/wconstab
ghstack dependencies: #124776, #124875
pytorchmergebot pushed a commit that referenced this pull request May 2, 2024
pytorch-bot bot pushed a commit that referenced this pull request May 3, 2024
pytorch-bot bot pushed a commit that referenced this pull request May 3, 2024
Pull Request resolved: #124875
Approved by: https://github.com/H-Huang
ghstack dependencies: #124776
pytorch-bot bot pushed a commit that referenced this pull request May 3, 2024
@github-actions github-actions bot deleted the gh/kwen2501/14/head branch June 2, 2024 02:04

Labels: `ci-td-distributed`, `ciflow/trunk`, `Merged`, `oncall: distributed`, `release notes: distributed (pipeline)`

5 participants