Let us start with a toy model that contains two linear layers.
```python
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self):
        super(ToyModel, self).__init__()
        self.net1 = torch.nn.Linear(10, 10)
        self.relu = torch.nn.ReLU()
        self.net2 = torch.nn.Linear(10, 5)

    def forward(self, x):
        x = self.relu(self.net1(x))
        return self.net2(x)

model = ToyModel()
```
To run this model on 2 GPUs, we need to convert the model to torch.nn.Sequential and then wrap it with fairscale.nn.Pipe.
This will run the first two layers on cuda:0 and the last layer on cuda:1. To learn more, visit the Pipe documentation.
You can then define any optimizer and loss function.
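For instance, with SGD and mean-squared-error loss (both arbitrary choices; any optimizer/loss pair works the same way — a plain Sequential stands in for the Pipe-wrapped model here, since model.parameters() behaves identically):

```python
import torch
import torch.nn as nn

# Stand-in for the Pipe-wrapped model from above.
model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 5))

# Illustrative choices; swap in any torch.optim optimizer and nn loss.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
loss_fn = nn.MSELoss()
```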
Finally, to run the model and compute the loss, make sure that the outputs and the target are on the same device.
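A single training step could then be sketched like this (again using a plain Sequential as a stand-in; with Pipe, the outputs live on the last stage's device, e.g. cuda:1, which is why the target is moved to outputs.device before the loss is computed):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 5))
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
loss_fn = nn.MSELoss()

data = torch.rand(20, 10)
target = torch.rand(20, 5)

optimizer.zero_grad()
outputs = model(data)
# With Pipe, outputs sit on the last pipeline device; move the target
# there so the loss is computed on a single device.
target = target.to(outputs.device)
loss = loss_fn(outputs, target)
loss.backward()
optimizer.step()
```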
You can find a complete example under the examples folder in the fairscale repo.