Copyright (c) 2023 Graphcore Ltd. All rights reserved.

This Notebook walks through the steps needed to run a model on the IPU with Lightning (https://github.com/Lightning-AI/lightning/). It takes a torchvision model, wraps it in a `LightningModule`, and uses Lightning's standard hooks to define training, validation and optimiser behaviour. The only IPU-specific code is the dataloader and the options telling Lightning to run on the IPU.

This Notebook assumes you are running in a Docker container which needs to be updated to include all the required Linux packages.

The code in this Notebook shares requirements and dependencies with the adjacent PyTorch models. Install all requirements from the pytorch directory.

In [None]:
import os

import_location = os.getenv("POPTORCH_CNN_IMPORTS", "../pytorch")
number_of_ipus = 4
dataset_directory = os.getenv("DATASET_DIR", "fashionmnist_data/")
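The two `os.getenv` calls above fall back to a default when the environment variable is not set. A minimal sketch of that lookup pattern follows; `resolve_setting` is a hypothetical helper introduced only for illustration and is not part of this Notebook.

```python
import os


def resolve_setting(name, default, environ=None):
    """Return the value of environment variable `name`, or `default` if it is unset."""
    # `environ` is injectable so the behaviour can be checked without touching os.environ.
    env = os.environ if environ is None else environ
    return env.get(name, default)


# With nothing set, the Notebook's fallback value is used.
print(resolve_setting("DATASET_DIR", "fashionmnist_data/", environ={}))

# With the variable set (for example via `export DATASET_DIR=/data/fmnist`,
# an illustrative path), the environment value takes precedence.
print(resolve_setting("DATASET_DIR", "fashionmnist_data/", environ={"DATASET_DIR": "/data/fmnist"}))
```

The variable has to be exported before the Jupyter kernel starts, because the kernel inherits its environment at launch.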

Move to the main PyTorch Lightning directory.

In [None]:
%cd ../..

If you are running this Notebook in a Docker container, you will have the root privileges needed to execute the cell below. If not, you may need to run these commands separately with sudo.

In [None]:
!apt update
!apt-get install -y $(< {import_location}/required_apt_packages.txt)

Install PyTorch requirements

In [None]:
!make install -C {import_location}
!make install-turbojpeg -C {import_location}

In [None]:
import torch
import torchvision
import poptorch
import pytorch_lightning as pl
from pytorch_lightning.strategies import IPUStrategy
import torchvision.models as models
from torch import nn
import argparse

This Notebook runs an off-the-shelf PyTorch model.

Take an off-the-shelf ResNet18 torchvision model and modify it for the 10 FashionMNIST classes.

In [None]:
class TorchVisionBackbone(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.network = models.resnet18()

        # Overwriting the imported model's Conv1 layer to change it from 3 RGB channels to FashionMNIST's 1.
        self.network.conv1 = nn.Conv2d(1, 64, 7)

        # Overwriting the imported model's FC layer
        num_features = self.network.fc.in_features
        self.network.fc = nn.Linear(num_features, 10)

    def forward(self, x):
        x = self.network(x)
        # Normalise over the class dimension; omitting `dim` is deprecated.
        x = torch.nn.functional.log_softmax(x, dim=1)
        return x
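The replacement first layer, `nn.Conv2d(1, 64, 7)`, uses PyTorch's default stride of 1 and padding of 0, whereas torchvision's stock ResNet18 `conv1` uses stride 2 and padding 3. The sketch below applies the standard convolution output-size formula to a 28x28 FashionMNIST image to show the effect of that difference. `conv2d_output_size` is a hypothetical helper written for this illustration.

```python
def conv2d_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Spatial output size of a 2D convolution along one dimension."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# FashionMNIST images are 28x28 pixels.
# Settings of the replaced layer, nn.Conv2d(1, 64, 7): stride 1, no padding.
replaced_conv1 = conv2d_output_size(28, kernel_size=7)

# Settings of torchvision's original ResNet18 conv1: stride 2, padding 3.
stock_conv1 = conv2d_output_size(28, kernel_size=7, stride=2, padding=3)

print(replaced_conv1, stock_conv1)  # 22 14
```

Keeping the stride at 1 preserves more spatial resolution for these small images before the rest of the network downsamples them.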

The following code shows how you can use a PyTorch Lightning module to wrap your model class and describe the behaviour of the training and (optionally) validation steps. We use the `LightningModule`'s built-in `configure_optimizers` method to configure the optimiser.
For more information, see the [PyTorch Lightning Documentation for pl.LightningModule](https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html).

In [None]:
# Wraps the modified ResNet18 and defines the training step, validation step and optimiser
class ResNetClassifier(pl.LightningModule):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        x = self.model(x)
        return x

    def training_step(self, batch, _):
        x, y = batch
        output = self.forward(x)
        loss = torch.nn.functional.nll_loss(output, y)
        return loss

    def validation_step(self, batch, _):
        x, y = batch
        output = self.forward(x)
        preds = torch.argmax(output, dim=1)
        acc = torch.sum(preds == y).float() / len(y)
        return acc

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.02)
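The training step pairs the model's log-softmax output with a negative log-likelihood loss, and the validation step reports the fraction of correct argmax predictions. The pure-Python sketch below reproduces that arithmetic on two made-up samples so the calculation can be followed without a framework. The function names mirror the PyTorch ones but are standalone re-implementations written for this illustration, and the logits are invented values.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a single row of logits."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return [v - log_sum for v in logits]


def nll_loss(log_probs, targets):
    """Mean negative log-likelihood given per-sample log-probabilities and class indices."""
    return -sum(row[t] for row, t in zip(log_probs, targets)) / len(targets)


def accuracy(log_probs, targets):
    """Fraction of samples whose highest-scoring class matches the target."""
    preds = [max(range(len(row)), key=row.__getitem__) for row in log_probs]
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)


# Two invented samples with three classes each.
logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
targets = [0, 1]

log_probs = [log_softmax(row) for row in logits]
print("loss:", nll_loss(log_probs, targets))
print("accuracy:", accuracy(log_probs, targets))  # 0.5: the second sample is misclassified
```

Applying `nll_loss` to log-softmax outputs gives the same value as a cross-entropy loss on the raw logits, which is why the model applies `log_softmax` in its `forward` method.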

This class defines how to feed data to the model. It downloads the data to the dataset directory (fashionmnist_data/ by default) and declares one dataloader for training and one for validation, both based on the IPU-specific PopTorch dataloader (https://docs.graphcore.ai/projects/poptorch-user-guide/en/latest/reference.html#poptorch.DataLoade). The data module is passed to the trainer below.

In [None]:
class FashionMNIST(pl.LightningDataModule):
    def __init__(self, options, batch_size=4):
        super().__init__()
        self.batchsize = batch_size
        self.options = options

    def setup(self, stage="train"):
        # Retrieving the datasets
        self.train_data = torchvision.datasets.FashionMNIST(
            dataset_directory,
            train=True,  # use the training split here; validation uses the test split
            download=True,
            transform=torchvision.transforms.Compose(
                [
                    torchvision.transforms.ToTensor(),
                    torchvision.transforms.Normalize((0.1307,), (0.3081,)),
                ]
            ),
        )

        self.validation_data = torchvision.datasets.FashionMNIST(
            dataset_directory,
            train=False,
            download=True,
            transform=torchvision.transforms.Compose(
                [
                    torchvision.transforms.ToTensor(),
                    torchvision.transforms.Normalize((0.1307,), (0.3081,)),
                ]
            ),
        )

    def train_dataloader(self):
        return poptorch.DataLoader(
            dataset=self.train_data,
            batch_size=self.batchsize,
            options=self.options,
            shuffle=True,
            drop_last=True,
            mode=poptorch.DataLoaderMode.Async,
            num_workers=64,
        )

    def val_dataloader(self):
        return poptorch.DataLoader(
            dataset=self.validation_data,
            batch_size=self.batchsize,
            options=self.options,
            drop_last=True,
            mode=poptorch.DataLoaderMode.Async,
            num_workers=64,
        )
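Both datasets apply `ToTensor` followed by `Normalize((0.1307,), (0.3081,))`. As a minimal sketch of what that pipeline does to a single pixel: `ToTensor` scales 8-bit values into the range 0.0 to 1.0, and `Normalize` then computes `(x - mean) / std`. The two helper functions below are hypothetical stand-ins for those transforms, written in plain Python for illustration.

```python
def to_unit_range(byte_value):
    """Mimic ToTensor's scaling of an 8-bit pixel value to the range [0.0, 1.0]."""
    return byte_value / 255.0


def normalize(pixel, mean=0.1307, std=0.3081):
    """Apply the same per-channel arithmetic as Normalize: (x - mean) / std."""
    return (pixel - mean) / std


# A black pixel, a mid-grey pixel and a white pixel.
for raw in (0, 128, 255):
    print(raw, normalize(to_unit_range(raw)))
```

A pixel whose scaled value equals the mean maps to 0, and values above or below the mean map to positive or negative numbers measured in units of the standard deviation.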

Set up training.

In [None]:
model = TorchVisionBackbone()

Set the number of epochs to train for. The number of IPUs to use was already set in the first cell (`number_of_ipus`).

In [None]:
num_epochs = 2

Pass the model to the PyTorch Lightning classifier and create a trainer with some IPU-specific options:
call `pl.Trainer` with `accelerator` set to "ipu" and `strategy` set to an `IPUStrategy` instance.

In [None]:
model = ResNetClassifier(model)

options = poptorch.Options()
options.deviceIterations(250)
options.replicationFactor(number_of_ipus)

datamodule = FashionMNIST(options)

trainer = pl.Trainer(
    accelerator="ipu",
    devices=number_of_ipus,
    max_epochs=num_epochs,
    log_every_n_steps=1,
    accumulate_grad_batches=8,
    strategy=IPUStrategy(inference_opts=options, training_opts=options),
)
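The settings above interact: the micro-batch size of 4 (the `FashionMNIST` default), 250 device iterations, a replication factor of 4 and 8 gradient-accumulation steps all multiply together. The sketch below is a back-of-the-envelope calculation under two assumptions that are not stated in this Notebook: that the PopTorch dataloader yields the product of these factors per host step, and that Lightning forwards `accumulate_grad_batches` to PopTorch's gradient accumulation for training only. The dataset sizes are FashionMNIST's standard 60,000 training and 10,000 test images, and both helper functions are hypothetical.

```python
def samples_per_step(micro_batch_size, device_iterations, replication_factor, gradient_accumulation=1):
    """Samples consumed per host step, assuming the four factors simply multiply."""
    return micro_batch_size * device_iterations * replication_factor * gradient_accumulation


def full_steps(dataset_size, step_size):
    """Number of complete steps per epoch when incomplete batches are dropped."""
    return dataset_size // step_size


# Values taken from the cells in this Notebook.
train_step = samples_per_step(4, 250, 4, gradient_accumulation=8)
val_step = samples_per_step(4, 250, 4)  # assumption: no gradient accumulation at inference

print("training samples per step:", train_step)    # 32000
print("validation samples per step:", val_step)    # 4000
print("training steps per epoch:", full_steps(60_000, train_step))   # 1
print("validation steps per epoch:", full_steps(10_000, val_step))   # 2
```

If these assumptions hold, each epoch amounts to very few host steps, so lowering `deviceIterations` or `accumulate_grad_batches` is one way to get more frequent weight updates and log entries on a dataset this small.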

Now train the model.

In [None]:
trainer.fit(model, datamodule)