
[Pipe] Tied layers #51931

@stas00

Description


🚀 Feature

Continuing the series of requests to support various model needs in the new PyTorch Pipe feature, this one brings up:

Tied layers

For models that have tied weights (e.g. the shared encoder/decoder embeddings in transformer-type models), it is important to:

  • avoid memory duplication
  • handle the gradients correctly (they have to be reduced twice in the typical encoder+decoder transformer scenario)
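As background for the two requirements above, here is a minimal single-process sketch (not the Pipe API) of what weight tying means in PyTorch: one parameter object is shared by two modules, so there is a single copy in memory, and a backward pass through both uses accumulates both gradient contributions into the same tensor. This is the invariant a pipeline implementation must preserve when the two uses live on different stages.

```python
import torch
import torch.nn as nn

# Toy sizes; any values work for the illustration.
vocab, dim = 10, 4

embed = nn.Embedding(vocab, dim)
proj = nn.Linear(dim, vocab, bias=False)

# Tie the weights: both modules now hold the same nn.Parameter,
# so there is only one copy of the tensor in memory.
proj.weight = embed.weight
assert proj.weight.data_ptr() == embed.weight.data_ptr()

# A forward pass that uses the tied weight twice (input embedding
# and output projection), as in an encoder+decoder transformer.
tokens = torch.tensor([1, 2, 3])
loss = proj(embed(tokens)).sum()
loss.backward()

# Autograd accumulates both contributions into one gradient tensor;
# a pipeline implementation must reproduce this by reducing the
# gradient across the stages that hold the tied copies.
assert embed.weight.grad is proj.weight.grad
```

When the embedding and the projection are placed on different pipeline stages, each stage ends up with its own replica of the parameter, which is exactly why the feature request asks for deduplicated storage and a twice-reduced gradient.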

This feature is already provided in DeepSpeed's Tied Layers; see that implementation and its discussion.

Thank you.

@pritamdamania87

cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @agolynski @SciPioneer @H-Huang @mrzzd @cbalioglu @gcramer23

Metadata

    Labels

    oncall: distributed (add this issue/PR to the distributed oncall triage queue), pipeline parallelism (issues related to https://pytorch.org/docs/master/pipeline.html)
