
[pipelining] Add manual pipeline stage #126123

Closed

H-Huang wants to merge 3 commits into gh/H-Huang/115/base from gh/H-Huang/115/head

Member

H-Huang commented May 13, 2024 (edited)

Add `ManualPipelineStage` under `_PipelineStage.py`.

Fix some type hints, since `args_recv_info` can contain more than one `RecvInfo`. Previously the hint was `Tuple[InputInfo]`, which means a tuple of exactly one item. That differs from `List[InputInfo]`, which can contain any number of items. I updated the hint to `Tuple[InputInfo, ...]` so the tuple can hold a variable number of items.
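The difference between the two tuple hints can be seen by inspecting them with `typing.get_origin` and `typing.get_args`. In this sketch `InputInfo` is a hypothetical stand-in class and `count_inputs` is illustrative only. Type hints are not enforced at runtime, so in practice the old hint would only be flagged by a static checker such as mypy.

```python
from typing import Tuple, get_args, get_origin


class InputInfo:
    """Hypothetical stand-in for the receive-info type used by a pipeline stage."""


OneInput = Tuple[InputInfo]         # a tuple of exactly one InputInfo
ManyInputs = Tuple[InputInfo, ...]  # a tuple of any length, every item an InputInfo


def describe_tuple_hint(hint) -> str:
    """Report whether a Tuple[...] hint is fixed-length or variable-length."""
    if get_origin(hint) is not tuple:
        raise TypeError("expected a Tuple[...] hint")
    args = get_args(hint)
    # Tuple[X, ...] is represented as the args (X, Ellipsis).
    if len(args) == 2 and args[1] is Ellipsis:
        return f"variable-length tuple of {args[0].__name__}"
    return f"fixed-length tuple with {len(args)} item(s)"


def count_inputs(args_recv_info: Tuple[InputInfo, ...]) -> int:
    # With the old Tuple[InputInfo] hint, a type checker would reject a 2-item tuple here.
    return len(args_recv_info)


print(describe_tuple_hint(OneInput))             # fixed-length tuple with 1 item(s)
print(describe_tuple_hint(ManyInputs))           # variable-length tuple of InputInfo
print(count_inputs((InputInfo(), InputInfo())))  # 2
```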

Stack from ghstack (oldest at bottom):

-> [pipelining] Add manual pipeline stage #126123

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang @d4l3k


[pipelining] Add manual pipeline stage

a38652d

[ghstack-poisoned]

pytorch-bot bot commented May 13, 2024 (edited)


✅ No Failures

As of commit c14c901 with merge base 7f1d5ab ():
💚 Looks good so far! There are no failures yet. 💚


pytorch-bot bot added oncall: distributed release notes: distributed (pipeline) labels

H-Huang added a commit that referenced this pull request


[pipelining] Add manual pipeline stage

0542b0d

ghstack-source-id: 810e042f19328e74385fa6a57ccf8d15cfbaa7bc
Pull Request resolved: #126123


Update on "[pipelining] Add manual pipeline stage"

17da510

Add `ManualPipelineStage` under `_PipelineStage.py`




cc mrshenli pritamdamania87 zhaojuanmao satgera gqchen aazzolini osalpekar jiayisuse kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 chauhang d4l3k

[ghstack-poisoned]

H-Huang added a commit that referenced this pull request


[pipelining] Add manual pipeline stage

8b0b424

ghstack-source-id: 25a61caadb9e7ef56f5b51e0766aeca501dd59d6
Pull Request resolved: #126123

H-Huang requested review from wconstab and kwen2501 (May 14, 2024 14:42)


Update on "[pipelining] Add manual pipeline stage"

c14c901

Add `ManualPipelineStage` under `_PipelineStage.py`

Fix some type hints since `args_recv_info` can contain more than one RecvInfo. Previously the hint was `Tuple[InputInfo]` which meant it is a tuple of size 1. This is different from `List[InputInfo]` which can contain any number of items. I needed to update to `Tuple[InputInfo, ...]` to make the number of items flexible.




cc mrshenli pritamdamania87 zhaojuanmao satgera gqchen aazzolini osalpekar jiayisuse kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 chauhang d4l3k

[ghstack-poisoned]

H-Huang added a commit that referenced this pull request


[pipelining] Add manual pipeline stage

acc69cb

ghstack-source-id: cae56595598646cc58af6dd6298d03aa31cc4740
Pull Request resolved: #126123

kwen2501 approved these changes


Contributor

kwen2501 left a comment

LGTM.
Nit: privatize methods.

wconstab reviewed


torch/distributed/pipelining/_PipelineStage.py

		return metadata


		def get_stage_shapes(

Contributor

wconstab May 14, 2024

I'm happy to land this code as is first, but for the lazy shape inference changes (and for adding complete metadata such as strides, dtype, etc.), can you open an issue to start discussing the proposed changes?

wconstab reviewed


torch/distributed/pipelining/_PipelineStage.py

+                          )
+                      return grad_recv_info
+                  def init_p2p_neighbors(self):

Contributor

wconstab May 14, 2024

Should this API be exposed only from `ManualPipelineStage`? Ideally, I think the base stage should expose it.

(Should we recommend that users call this? How much does it help performance?)

wconstab reviewed


torch/distributed/pipelining/_PipelineStage.py

		return True


		def validate_stage_shapes(pipeline_stages: List[ManualPipelineStage]):

Contributor

wconstab May 14, 2024

Will this method still be needed after we switch to lazy shape inference? I think we should be confident enough in our shape inference code that a separate validation pass would be superfluous. Maybe we should just delete it?
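To make concrete what a separate validation pass checks, here is a minimal pure-Python sketch. It assumes each stage reports a hypothetical `(input_shapes, output_shapes)` pair and that validation means stage `i`'s outputs must equal stage `i + 1`'s inputs. The real `validate_stage_shapes` takes `ManualPipelineStage` objects and may check more than this.

```python
from typing import List, Sequence, Tuple

Shape = Tuple[int, ...]


def validate_stage_shapes(
    stage_shapes: Sequence[Tuple[List[Shape], List[Shape]]]
) -> None:
    """Check that each stage's output shapes match the next stage's input shapes.

    `stage_shapes[i]` is a hypothetical (input_shapes, output_shapes) pair for stage i.
    """
    for i in range(len(stage_shapes) - 1):
        _, outputs = stage_shapes[i]
        next_inputs, _ = stage_shapes[i + 1]
        if list(outputs) != list(next_inputs):
            raise ValueError(
                f"Stage {i} outputs {outputs} do not match stage {i + 1} inputs {next_inputs}"
            )


# Two stages whose boundary shapes agree: passes silently.
validate_stage_shapes([
    ([(8, 16)], [(8, 32)]),
    ([(8, 32)], [(8, 10)]),
])
```

If the next stage's input shapes are inferred from the previous stage's actual outputs, this adjacency invariant holds by construction, which is roughly the argument for dropping the pass.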

wconstab reviewed


torch/distributed/pipelining/_PipelineStage.py

		PLACEHOLDER_VAL = -1


		def create_empty_tensors(

Contributor

wconstab May 14, 2024

As a lower priority than the shape inference work, I'd like to replace this with pytree if that makes sense. The upside is that we wouldn't need a helper function at all, and we would get support for other data structures (like a dict of tensors) for free.

The downsides: pytree performance isn't great, but that won't matter if we only use it during initialization. Also, while we would support more types of inputs, we'd then need to define which input types are not supported and guard against them.
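A sketch of the pytree idea, assuming a hypothetical `TensorMeta` leaf type that describes each buffer to allocate. `torch.utils._pytree.tree_map` lives in a private PyTorch module whose API may change, and this `create_empty_tensors` is illustrative, not the helper from this PR. Because `tree_map` preserves the container structure, a dict or nested list of tensors works without extra code.

```python
from dataclasses import dataclass
from typing import Tuple

import torch
from torch.utils._pytree import tree_map


@dataclass(frozen=True)
class TensorMeta:
    # Hypothetical leaf type: it is not a registered pytree node,
    # so tree_map treats each instance as a leaf.
    shape: Tuple[int, ...]
    dtype: torch.dtype = torch.float32


def create_empty_tensors(meta_tree, device="cpu"):
    """Allocate an empty tensor for every TensorMeta leaf, keeping the container structure."""
    return tree_map(
        lambda m: torch.empty(m.shape, dtype=m.dtype, device=device)
        if isinstance(m, TensorMeta)
        else m,  # non-TensorMeta leaves pass through unchanged
        meta_tree,
    )


inputs_meta = {
    "hidden": TensorMeta((4, 16)),
    "mask": TensorMeta((4,), torch.bool),
    "extras": [TensorMeta((2, 2)), TensorMeta((3,), torch.int64)],
}
buffers = create_empty_tensors(inputs_meta)
print(buffers["hidden"].shape)     # torch.Size([4, 16])
print(buffers["extras"][1].dtype)  # torch.int64
```

Guarding against unsupported input types, as the comment suggests, would mean raising in the pass-through branch instead of returning the leaf unchanged.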

wconstab reviewed


torch/distributed/pipelining/_PipelineStage.py

		return metadata_tensor


		def extract_metadata_from_tensor(tensor: torch.Tensor) -> List[torch.Size]:

Contributor

wconstab May 14, 2024

I didn't see any helper method on `torch.Tensor` for getting its metadata as a dict, but I think building a dict here with keys like shape, stride, and dtype makes sense. My approach would be to send that whole dict over the wire and then feed the received dict as `**kwargs` to a `torch.empty` call. Do you think that would work?
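A sketch of the metadata-dict approach, with one adjustment: `torch.empty` has no `stride` argument, so a dict containing a stride cannot be passed to it directly. `torch.empty_strided(size, stride, dtype=..., device=...)` accepts both. The helper names below are hypothetical, and the transport step is omitted; an object-based call such as `torch.distributed.send_object_list` could carry the pickled dict, at some cost compared with sending a tensor.

```python
import torch


def tensor_metadata(t: torch.Tensor) -> dict:
    """Collect what a receiver needs to allocate a matching buffer (hypothetical helper)."""
    return {"size": tuple(t.size()), "stride": t.stride(), "dtype": t.dtype}


def empty_from_metadata(meta: dict, device: str = "cpu") -> torch.Tensor:
    # torch.empty cannot take a stride, so use torch.empty_strided with
    # size and stride passed positionally plus dtype/device keywords.
    return torch.empty_strided(
        meta["size"], meta["stride"], dtype=meta["dtype"], device=device
    )


src = torch.randn(3, 4).t()  # non-contiguous: size (4, 3), stride (1, 4)
meta = tensor_metadata(src)
buf = empty_from_metadata(meta)
print(buf.shape, buf.stride(), buf.dtype)  # torch.Size([4, 3]) (1, 4) torch.float32
```

Carrying the stride matters for non-contiguous activations like the transposed tensor above; a shape-only protocol would allocate a contiguous buffer instead.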

H-Huang mentioned this pull request

Allow Lazy Initialization for ManualPipelineStage #126216

Open

Member Author

H-Huang commented May 14, 2024

@pytorchbot merge

pytorch-bot bot added the ciflow/trunk label

pytorchmergebot added the merging label

Collaborator

pytorchmergebot commented May 14, 2024

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


pytorchmergebot added the Merged label

pytorchmergebot closed this in 470723f

pytorchmergebot removed the merging label

ZelboK pushed a commit to ZelboK/pytorch that referenced this pull request


[pipelining] Add manual pipeline stage (pytorch#126123)

ed27236

Add `ManualPipelineStage` under `_PipelineStage.py`

Fix some type hints since `args_recv_info` can contain more than one RecvInfo. Previously the hint was `Tuple[InputInfo]` which meant it is a tuple of size 1. This is different from `List[InputInfo]` which can contain any number of items. I needed to update to `Tuple[InputInfo, ...]` to make the number of items flexible.

Pull Request resolved: pytorch#126123
Approved by: https://github.com/kwen2501

github-actions bot deleted the gh/H-Huang/115/head branch (June 14, 2024 01:55)
