### Image Recognition on MNIST using PyTorch Lightning

This demo covers three elements of machine learning:

1) Experience (Datasets and Dataloaders)
2) Task (Classifier Model)
3) Performance (Accuracy)

**Experience:** <br>
We use the MNIST dataset for this demo. MNIST consists of 28x28 grayscale images of handwritten digits. The train split has 60,000 images and the test split has 10,000 images.
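
To make the shapes concrete, here is a minimal sketch of how a dataloader batches MNIST-sized images. It assumes a synthetic stand-in (random pixels and random labels in a `TensorDataset`) rather than the real dataset, so it runs without a download; the 100-sample size is an arbitrary choice for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST (assumption: random pixels and labels,
# not real digits). Shapes match what ToTensor() yields for MNIST.
num_samples = 100
images = torch.rand(num_samples, 1, 28, 28)    # N x C x H x W, values in [0, 1)
labels = torch.randint(0, 10, (num_samples,))  # one class index (0-9) per image

loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=True)

# Collect the shape of every batch the loader produces.
batch_shapes = [(x.shape, y.shape) for x, y in loader]
first_x_shape, first_y_shape = batch_shapes[0]
last_x_shape, _ = batch_shapes[-1]

print(first_x_shape, first_y_shape)     # a full batch of 32 images and 32 labels
print(len(batch_shapes), last_x_shape)  # 4 batches; the last one holds the 4 leftovers
```

The last batch is smaller whenever the dataset size is not a multiple of the batch size, which is also the case for the real 10,000-image test split with a batch size of 32.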

**Task:**<br>
Our task is to classify the images into 10 classes, one per digit. We use the ResNet18 model from `torchvision.models`, subclassed to accept single-channel input, with the number of classes set to 10.

**Performance:**<br>
We use accuracy to evaluate the performance of our model. Accuracy is measured on the test split and computed with `torchmetrics`.
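
As a rough illustration of what the accuracy metric measures, the sketch below computes it by hand in plain PyTorch: take the highest-scoring class per sample and count how often it matches the label. The logits and targets are made-up values, and this is not how `torchmetrics` is implemented internally.

```python
import torch

# Hypothetical logits for 4 samples and 3 classes (made-up values).
logits = torch.tensor([
    [2.0, 0.1, 0.3],  # highest score: class 0
    [0.2, 1.5, 0.1],  # highest score: class 1
    [0.3, 0.2, 0.9],  # highest score: class 2
    [1.2, 0.4, 0.1],  # highest score: class 0
])
targets = torch.tensor([0, 1, 1, 0])

preds = logits.argmax(dim=1)                    # predicted class per sample
acc = (preds == targets).float().mean().item()  # fraction of correct predictions
print(acc)  # 0.75 (3 of 4 predictions match)
```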

**[PyTorch Lightning](https://www.pytorchlightning.ai/):**<br>
Our demo uses PyTorch Lightning to simplify the process of training and testing. We use the PyTorch Lightning `Trainer` both to train our model (`trainer.fit`) and to evaluate it (`trainer.test`).

Let us install `pytorch-lightning`, `lightning-bolts`, and `torchmetrics`.

In [11]:
!pip install pytorch-lightning --upgrade
!pip install lightning-bolts --upgrade
!pip install torchmetrics --upgrade



In [12]:
import torch
import torchvision
from pytorch_lightning import LightningModule, Trainer
from torchmetrics.functional import accuracy

### ResNet18MNIST Model

This is the ResNet18 model from `torchvision.models`, subclassed to support single-channel input. We replace the first convolutional layer, which expects 3 input channels, with one that expects 1.

`ResNet` class can be found [here](https://pytorch.org/vision/0.8/_modules/torchvision/models/resnet.html)

In [13]:
class ResNet18MNIST(torchvision.models.ResNet):
    def __init__(self, **kwargs):
        super().__init__(torchvision.models.resnet.BasicBlock, [2, 2, 2, 2], **kwargs)
        self.conv1 = torch.nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)

### PyTorch Lightning Model

A PyTorch Lightning model is a subclass of `LightningModule`, which is itself a PyTorch `nn.Module`. It serves as a container for the model, the optimizer, the loss function, the metrics, and the data loaders.

With PyTorch Lightning, we simplify the training and testing process since we do not need to write boilerplate code for training and testing.
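
For comparison, here is a minimal sketch of the kind of hand-written training loop that the `Trainer` takes over. It assumes a toy linear model and random data in place of ResNet18 and MNIST, and it leaves out device placement, logging, and checkpointing, which would add further code.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Toy stand-ins (assumption): a linear classifier and random data.
model = torch.nn.Linear(8, 3)
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

x = torch.rand(64, 8)
y = torch.randint(0, 3, (64,))
loader = DataLoader(TensorDataset(x, y), batch_size=16)

steps = 0
for epoch in range(2):
    model.train()
    for xb, yb in loader:
        optimizer.zero_grad()          # clear gradients from the previous step
        loss = loss_fn(model(xb), yb)  # forward pass and loss
        loss.backward()                # backward pass
        optimizer.step()               # parameter update
        steps += 1

print(steps)  # 2 epochs x 4 batches = 8 optimizer steps
```

In the Lightning version, only the forward pass and loss computation remain in `training_step`; the loop, `zero_grad`, `backward`, and `step` calls are handled by the `Trainer`.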

In [14]:
class LitMNISTModel(LightningModule):
    def __init__(self, num_classes=10, lr=0.001, batch_size=32):
        super().__init__()
        self.save_hyperparameters()
        self.model = ResNet18MNIST(num_classes=num_classes)
        self.loss = torch.nn.CrossEntropyLoss()

    def forward(self, x):
        return self.model(x)

    # called for each training batch during fit()
    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.forward(x)
        loss = self.loss(y_hat, y)
        acc = accuracy(y_hat, y)
        return {"loss": loss, "acc": acc}

    # called for each test batch during test()
    def test_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.forward(x)
        loss = self.loss(y_hat, y)
        acc = accuracy(y_hat, y)
        return {"test_loss": loss, "test_acc": acc}

    # called once at the end of the test epoch, with the outputs of every test_step
    def test_epoch_end(self, outputs):
        avg_loss = torch.stack([x["test_loss"] for x in outputs]).mean()
        avg_acc = torch.stack([x["test_acc"] for x in outputs]).mean()
        self.log("avg_test_loss", avg_loss, prog_bar=True)
        self.log("avg_test_acc", avg_acc, prog_bar=True)
        return {"avg_test_loss": avg_loss, "avg_test_acc": avg_acc}

    # optimizer used for training
    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)
    
    # called at the start of fit() and test(); building the dataloaders here downloads MNIST if needed
    def setup(self, stage=None):
        self.train_dataloader()
        self.test_dataloader()

    def train_dataloader(self):
        return torch.utils.data.DataLoader(
            torchvision.datasets.MNIST(
                "./data", train=True, download=True, transform=torchvision.transforms.ToTensor()
            ),
            batch_size=self.hparams.batch_size,
            shuffle=True,
            num_workers=48,
            pin_memory=True,
        )

    def test_dataloader(self):
        return torch.utils.data.DataLoader(
            torchvision.datasets.MNIST(
                "./data", train=False, download=True, transform=torchvision.transforms.ToTensor()
            ),
            batch_size=self.hparams.batch_size,
            shuffle=False,
            num_workers=48,
            pin_memory=True,
        )
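
One detail of `test_epoch_end` above is worth a small numeric sketch: it takes the mean of per-batch values, which treats every batch equally. With 10,000 test images and a batch size of 32, the last batch holds only 16 images, so this can differ slightly from the per-sample average. The per-batch accuracies below are made up to exaggerate the effect.

```python
import torch

# Hypothetical per-batch accuracies and batch sizes (made-up values).
batch_acc = torch.tensor([1.0, 1.0, 0.5])
batch_sizes = torch.tensor([32.0, 32.0, 16.0])

# Mean of batch means: every batch counts the same.
unweighted = batch_acc.mean().item()

# Per-sample mean: each batch is weighted by how many samples it holds.
weighted = ((batch_acc * batch_sizes).sum() / batch_sizes.sum()).item()

print(round(unweighted, 4))  # 0.8333
print(round(weighted, 4))    # 0.9
```

On the real test split the gap is far smaller, since only 1 of 313 batches is short.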

### Program Arguments

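When run from the command line, the program can take arguments. In this notebook, `parse_args("")` parses an empty argument list, so the defaults are always used. Note that `--num_workers` is parsed but not passed to the model, whose dataloaders hard-code `num_workers=48`.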

In [15]:
def get_args():
    import argparse
    parser = argparse.ArgumentParser(description="PyTorch Lightning MNIST Example")
    parser.add_argument("--batch_size", type=int, default=32, help="batch size")
    parser.add_argument("--lr", type=float, default=0.001, help="learning rate")
    parser.add_argument("--num_workers", type=int, default=48, help="num workers")
    parser.add_argument("--num_classes", type=int, default=10, help="num classes")
    parser.add_argument("--gpus", type=int, default=1, help="num gpus")
    parser.add_argument("--epochs", type=int, default=5, help="num epochs")
    args = parser.parse_args("")
    return args
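
The difference between parsing an empty string and parsing real arguments can be seen in a short sketch. It uses a reduced copy of the parser with only two of the options, and the override values are arbitrary examples.

```python
import argparse

# Reduced copy of the parser above (only two of the options).
parser = argparse.ArgumentParser(description="PyTorch Lightning MNIST Example")
parser.add_argument("--batch_size", type=int, default=32, help="batch size")
parser.add_argument("--lr", type=float, default=0.001, help="learning rate")

# An empty string yields no tokens, so every option keeps its default.
defaults = parser.parse_args("")

# An explicit list behaves like arguments typed on the command line.
overridden = parser.parse_args(["--batch_size", "64", "--lr", "0.01"])

print(defaults.batch_size, defaults.lr)      # 32 0.001
print(overridden.batch_size, overridden.lr)  # 64 0.01
```

In a script run from a shell, calling `parser.parse_args()` with no argument reads `sys.argv` instead.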

### Main Function

Get the program arguments. Instantiate the PyTorch Lightning model. Train the model. Evaluate the model.

In [16]:
if __name__ == "__main__":
    args = get_args()
    model = LitMNISTModel(num_classes=args.num_classes, lr=args.lr, batch_size=args.batch_size)
    model.setup()
    trainer = Trainer(
        max_epochs=args.epochs,
        gpus=args.gpus,
    )
    trainer.fit(model)
    trainer.test(model)

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]

  | Name  | Type             | Params
-------------------------------------------
0 | model | ResNet18MNIST    | 11.2 M
1 | loss  | CrossEntropyLoss | 0     
-------------------------------------------
11.2 M    Trainable params
0         Non-trainable params
11.2 M    Total params
44.701    Total estimated model params size (MB)


Training: 0it [00:00, ?it/s]

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]


Testing: 0it [00:00, ?it/s]