Model sharding using Pipeline Parallel
======================================

Let us start with a toy model that contains two linear layers.
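As a sketch, such a toy model might look like the following. The layer sizes, and the ReLU between the two linear layers, are illustrative choices:

```python
import torch
import torch.nn as nn


class ToyModel(nn.Module):
    """A toy model: two linear layers with a ReLU in between."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 10)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(10, 5)

    def forward(self, x):
        return self.fc2(self.relu(self.fc1(x)))
```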

To run this model on 2 GPUs we need to convert the model to torch.nn.Sequential and then wrap it with fairscale.nn.Pipe.
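A minimal sketch of both steps, assuming two GPUs are available and using illustrative layer sizes; `balance=[2, 1]` assigns the first two modules to the first device and the last module to the second:

```python
import torch.nn as nn
import fairscale

# Express the toy model as a sequence of modules so Pipe can split it.
model = nn.Sequential(
    nn.Linear(10, 10),
    nn.ReLU(),
    nn.Linear(10, 5),
)

# Partition the sequence across two GPUs: the first two modules
# (Linear + ReLU) go to the first device, the last Linear to the second.
model = fairscale.nn.Pipe(model, balance=[2, 1])
```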

This will run the first two layers on cuda:0 and the last layer on cuda:1. To learn more, visit the Pipe documentation.

You can then define any optimizer and loss function.
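For example, with SGD and a mean-squared-error loss (both arbitrary choices here; `model` is the Pipe-wrapped model from the previous step):

```python
import torch

# Any torch.optim optimizer and any loss function will do.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
loss_fn = torch.nn.MSELoss()
```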

Finally, to run the model and compute the loss, make sure that the outputs and the target are on the same device.
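A sketch of one training step under these assumptions, using the optimizer and loss function from the previous step (named `optimizer` and `loss_fn` here for illustration). Because the pipeline's output lives on the last device, moving `target` to `outputs.device` keeps the loss computation on a single device:

```python
# Inputs must live on the first device in the pipeline.
data = torch.randn(20, 10).to("cuda:0")
target = torch.randn(20, 5)

optimizer.zero_grad()
outputs = model(data)                 # outputs come back on the last device
loss = loss_fn(outputs, target.to(outputs.device))
loss.backward()
optimizer.step()
```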

You can find a complete example under the examples folder in the fairscale repo.