Open
Labels
oncall: distributed (Add this issue/PR to distributed oncall triage queue)
pipeline parallelism (Issues related to https://pytorch.org/docs/master/pipeline.html)
Description
🚀 Feature
Continuing the requests to support the various needs of models in PyTorch's new Pipe feature, this one brings up tied layers.
Tied layers
For models that have tied weights (e.g. the encoder/decoder in transformer-type models), it's important to:
- avoid memory duplication
- handle the gradients correctly (have to be reduced twice in the typical encoder+decoder transformer scenario)
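To make the two points above concrete, here is a minimal single-process sketch of weight tying in plain PyTorch. It does not use Pipe at all; the module names (`embed`, `proj`) and sizes are made up for illustration. It shows that a tied weight is stored once and that its gradient is the sum of the contributions from both places it is used, which is the behaviour a pipeline would need to preserve when the two uses land on different stages.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, dim = 10, 4  # illustrative sizes, not taken from any real model

# Input embedding and output projection share one weight (tied layers).
embed = nn.Embedding(vocab, dim)
proj = nn.Linear(dim, vocab, bias=False)
proj.weight = embed.weight  # same Parameter object, so no second copy in memory

tokens = torch.tensor([1, 2, 3])
logits = proj(embed(tokens))  # the shared weight is used twice in the forward pass
logits.sum().backward()

# Memory: both modules point at the same storage and expose one unique parameter.
same_storage = embed.weight.data_ptr() == proj.weight.data_ptr()
n_unique_params = len({id(p) for m in (embed, proj) for p in m.parameters()})

# Gradients: untie the weights, compute each use's gradient separately,
# and check that the tied gradient equals the sum of the two.
w_in = embed.weight.detach().clone().requires_grad_(True)
w_out = embed.weight.detach().clone().requires_grad_(True)
ref = (nn.functional.embedding(tokens, w_in) @ w_out.t()).sum()
g_in, g_out = torch.autograd.grad(ref, (w_in, w_out))
grads_match = torch.allclose(embed.weight.grad, g_in + g_out)

print(same_storage, n_unique_params, grads_match)
```

On a single device autograd does this summation automatically. Once the two uses sit on different pipeline stages, each stage would hold its own copy, so the copies would have to be kept in sync and their gradients reduced across stages explicitly.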
You can see this feature provided in DeepSpeed Tied Layers and its discussion.
Thank you.
cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @agolynski @SciPioneer @H-Huang @mrzzd @cbalioglu @gcramer23