# Project 2: A minimal model training experiment

**Goal**:

- Create a [PyTorch LightningModule](https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html) named `ImageClassifier` that holds a convolutional network with ResNet18 backbone.
- Learn to adapt the dataset class to be compatible with the model.
- Understand a minimal set of components of a model training pipeline: loss, optimizer, metrics, training loop.
- Assemble the components to train a model using the dataset class and the dataloader created in the previous project.
- Learn how to visualize model predictions.
- Understand the concept of fine-tuning and the benefits of starting from a pre-trained model. (may move to the next project)
- Understand the benefits and options of a dataloader. (may move to the next project)

**Acceptance Criteria**:

- Implement a test that checks that a simple `ImageClassifier` can predict on an image that has the correct shape.
- The `ImageClassifier` can be trained on the CIFAR10 dataset, showing decreasing loss and increasing accuracy over several epochs.

**Resources**:

- If you want to use Docker to run the code in this notebook, `pytorch/pytorch:1.12.0-cuda11.3-cudnn8-runtime` is a good choice.

## Step 1: Create a model using `LightningModule`

`LightningModule` is a convenient and structured way to implement a PyTorch model, as well as its training and validation behaviors. For more information, please refer to the [documentation](https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html) for `LightningModule`.

In [None]:
# Install the dependencies:
!pip install torch==1.12.0 torchvision==0.13.0 pytorch-lightning==1.6.4 torchmetrics

In [8]:
import pytorch_lightning as pl
from torchvision.models import resnet18

class ImageClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = resnet18(num_classes=10, weights=None)
    
    def forward(self, x):
        return self.net(x)

In [9]:
model = ImageClassifier()

### Testing the forward path of the model

As a trainable function, the model is callable. The model's forward path (as defined in `forward()`) is the normal execution of the function. We can test this by passing an image to the model.

In [13]:
assert callable(model)  # is the model callable?
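The reason the model is callable is that `LightningModule` builds on `torch.nn.Module`, whose call operator runs `forward()`. A minimal sketch with a hypothetical toy module (`Doubler` is not part of this project) illustrates the relationship:

```python
import torch
from torch import nn


class Doubler(nn.Module):
    """Hypothetical toy module used only for illustration."""

    def forward(self, x):
        return 2 * x


doubler = Doubler()
x = torch.tensor([1.0, 2.0])

# Calling the module dispatches to forward() (plus any registered hooks).
print(doubler(x))  # tensor([2., 4.])
```

Prefer `module(x)` over `module.forward(x)` in practice, since only the former runs hooks.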

#### What is a valid input?

To test the forward path of the model, we need to pass a valid input. The input should be a tensor of shape `(b, 3, 32, 32)` where `b` is the batch size (an integer).

**Your Task**: Fix the code below to pass the test.

**Tips:** `torch.from_numpy()` can be used to convert a numpy array to a tensor.


In [None]:
import numpy as np
import torch

# TODO: The test is broken. Please fix it.
def test_model_can_predict_on_a_random_image():
    input_image = np.ones(shape=(3, 224, 224), dtype=np.float32)
    output = model(input_image)  # run the model on the input image
    assert output.shape == (1, 10)  # is the output shape correct?


test_model_can_predict_on_a_random_image()

## Step 2: Prepare the dataset and data loader

You have two choices here.

1. Use the `CIFAR10` dataset class provided by `torchvision`.
   ```
   from torchvision.datasets import CIFAR10

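   # `root` is a required argument; "./data" is an assumed download location.
   cifar10_train = CIFAR10(root="./data", train=True, download=True)
   cifar10_val = CIFAR10(root="./data", train=False, download=True)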
   ```
2. Reuse the `CustomDataset` and dataloader from the previous project.

In [None]:
# TODO: import `CIFAR10` or copy over the code from the previous project.

## Step 3: Training the model

#### Loss

A loss is a function that takes the model's output and the ground truth as input and returns a scalar value. The loss is used as the feedback mechanism to optimize the model.

For classification, the loss is typically the cross-entropy. Use `torch.nn.functional.cross_entropy` when the model outputs are logits (unnormalized scores), or `torch.nn.functional.nll_loss` when the outputs are log-probabilities (i.e. the result of `log_softmax`).
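To make the relationship between the two functions concrete, here is a small sketch with made-up logits for 3 classes. It shows that `cross_entropy` on logits gives the same value as `nll_loss` on log-probabilities; the tensor values are arbitrary illustrations.

```python
import torch
from torch.nn import functional as F

# Two examples, three classes; values are arbitrary.
logits = torch.tensor([[0.0, 0.0, 5.0], [2.0, -1.0, 1.0]])
targets = torch.tensor([2, 0])  # class indices (integer dtype)

loss_from_logits = F.cross_entropy(logits, targets)
loss_from_log_probs = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# Both values should agree up to floating-point error.
print(loss_from_logits, loss_from_log_probs)
```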

**Your Task**: The following code almost works but there is a bug related to data type. Please fix it.

In [None]:
from torch.nn import functional as F

# To simplify the code, assume there are 3 image classes.
y_pred = torch.from_numpy(np.array([[0, 0, 5]], dtype=np.float32))
y_true = torch.from_numpy(np.array([1], dtype=np.float32))
print(F.cross_entropy(y_pred, y_true))

**Your Task**: Can you make the loss lower? Try changing `y_pred` in the next cell.

In [None]:
# TODO: change y_pred to make the loss lower.
y_pred = torch.from_numpy(np.array([[0, 0, 5]], dtype=np.float32))
y_true = torch.from_numpy(np.array([1], dtype=int))
print(F.cross_entropy(y_pred, y_true))

#### Optimizer

The gradient of the loss function with respect to the model parameters (a.k.a. model weights) indicates how the parameters should be updated. The specifics of how the update is applied are handled by an optimizer.

An optimizer can be created by passing the model parameters (`model.parameters()`) to the optimizer constructor, as well as learning rate, momentum, and other hyper-parameters. Below is a minimal example of using an optimizer.

```{python}
for input, target in dataset:
    optimizer.zero_grad()
    output = model(input)
    loss = loss_fn(output, target)
    loss.backward()
    optimizer.step()
```
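The loop above leaves `model`, `loss_fn`, `optimizer` and `dataset` undefined. The following is a self-contained sketch of the same pattern, assuming a toy linear classifier and random synthetic data (`toy_model`, `toy_optimizer` and the tensor shapes are illustrative choices, not part of this project):

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy setup: 16 examples with 4 features each, 3 classes.
inputs = torch.randn(16, 4)
targets = torch.randint(0, 3, (16,))

toy_model = nn.Linear(4, 3)
loss_fn = nn.CrossEntropyLoss()
# Other hyper-parameters (e.g. momentum) could be passed here as well.
toy_optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.1)

losses = []
for step in range(50):
    toy_optimizer.zero_grad()          # clear gradients from the previous step
    output = toy_model(inputs)         # forward pass
    loss = loss_fn(output, targets)    # scalar loss
    loss.backward()                    # compute gradients
    toy_optimizer.step()               # update parameters
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

The loss on this fixed batch should go down over the steps, which is a quick sanity check that the optimizer is wired to the right parameters.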

When using `LightningModule`, this is taken care of under the hood (pseudo-code):

```
# put model in train mode and enable gradient calculation
model.train()
torch.set_grad_enabled(True)

outs = []
for batch_idx, batch in enumerate(train_dataloader):
    loss = training_step(batch, batch_idx)
    outs.append(loss.detach())

    # clear gradients
    optimizer.zero_grad()

    # backward
    loss.backward()

    # update parameters
    optimizer.step()
```

For more details, please refer to [the documentation for PyTorch optimizers](https://pytorch.org/docs/stable/optim.html) and [the documentation for `LightningModule`](https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html).

#### Training Step

We can start specifying the model's training behavior by adding a `training_step` method to the `ImageClassifier` class.

In [None]:
class ImageClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = resnet18(num_classes=10, weights=None)
    
    def forward(self, x):
        return self.net(x)
    
    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.001)
    
    def loss(self, y_hat, y):
        return F.cross_entropy(y_hat, y)
    
    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = self.loss(y_hat, y)
        return {'loss': loss}

**Your Task**: Implement the following unit test. Use the data loader to get a batch of data.
Then, call `training_step` and assert that the return value is a dictionary with a scalar `loss` entry
that looks like this: `{'loss': tensor(0.5)}`.

In [None]:
def test_training_step_works():
    # TODO: Use the data loader to get a batch of data.
    #    Then, call the `training_step` and assert that the return value is a dictionary that looks
    #    like this: {'loss': 0.5}.
    ...


test_training_step_works()

### Metrics

A metric is a function that takes the model's output and the ground truth as input and returns a scalar value. The metric is used to evaluate the model's performance. It is mainly used to monitor the training progress and does not directly influence the model's training behavior. However, it can influence model selection and early stopping.

For classification, the metric is typically the accuracy. We will use `torchmetrics.Accuracy` in this project.

In [44]:
from torchmetrics import Accuracy

# Again, for simplicity, assume there are only 3 image classes.
y_pred = torch.from_numpy(np.array([[0, 0, 5], [2, -1, 1]], dtype=np.float32))
# The following line is optional and shouldn't change the result.
# y_pred = F.softmax(y_pred, dim=1)
y_true = torch.from_numpy(np.array([1, 0], dtype=int))
acc = Accuracy(num_classes=3)
acc.update(y_pred, y_true)
print("Accuracy:", acc.compute())

Accuracy: tensor(0.5000)
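To see where the value 0.5 comes from, the same accuracy can be worked out by hand: take the arg-max class for each row and compare it with the ground truth. This is only a sketch of the idea, not how `torchmetrics` is implemented internally.

```python
import torch

y_pred = torch.tensor([[0.0, 0.0, 5.0], [2.0, -1.0, 1.0]])
y_true = torch.tensor([1, 0])

predicted_classes = y_pred.argmax(dim=1)  # tensor([2, 0])
correct = predicted_classes == y_true     # tensor([False,  True])
manual_accuracy = correct.float().mean()
print("Manual accuracy:", manual_accuracy)  # tensor(0.5000)
```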


### Training loop

A training loop manages the training process: it iteratively passes batches of training examples to the model and runs the training step.

The training steps are organized into "epochs", where an epoch can be a single pass through the entire training dataset, or simply a predefined number of training steps. At the end of an epoch, the model is used to predict on the validation dataset and calculate accuracy metrics.

To do this, we need to add `validation_step` and `validation_epoch_end` methods and accuracy metrics to the model.

In [None]:
from torchmetrics import Accuracy


class ImageClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = resnet18(num_classes=10, weights=None)
        self.val_accuracy = Accuracy(num_classes=10)
    
    def forward(self, x):
        return self.net(x)
    
    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.001)
    
    def loss(self, y_hat, y):
        return F.cross_entropy(y_hat, y)
    
    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = self.loss(y_hat, y)
        return {'loss': loss}
    
    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        self.val_accuracy.update(preds=y_hat, target=y)
        val_loss = self.loss(y_hat, y)
        return val_loss
    
    def validation_epoch_end(self, outputs):
        avg_loss = torch.stack(outputs).mean()
        avg_acc = self.val_accuracy.compute()
        # Reset the metric so the next epoch does not accumulate old batches.
        self.val_accuracy.reset()
        self.log('val_loss', avg_loss, prog_bar=True)
        self.log('val_acc', avg_acc, prog_bar=True)

### Start model training

**Your Task**: In the next cells, fill in the details of the training loop implementation. The training loop can be described in pseudo-code:

```{python}
for epoch in epochs:
    for train_step in train_steps_per_epoch:
        # TODO: loop over the training dataset and run training steps
        pass
    
    # TODO: loop over the validation dataset and run validation steps.
```

In [None]:
epochs = 3
train_steps_per_epoch = 10

for epoch in range(epochs):
    for train_step in range(train_steps_per_epoch):
        # TODO: loop over the training dataset and run training steps
        pass
    
    # TODO: loop over the validation dataset and run validation steps.

## Step 4: Visualize model predictions

Seeing the loss go down and the accuracy go up is nice. But for peace of mind that the model really works and has improved, it helps to visualize the model's predictions.
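Before plotting, the raw model outputs (logits) need to be turned into readable labels. The sketch below is one possible way to do this, assuming the standard CIFAR-10 class order; `describe_predictions` is a hypothetical helper and the example logits are made up rather than produced by a model.

```python
import torch
from torch.nn import functional as F

# CIFAR-10 class names in label order (index 0 to 9).
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def describe_predictions(logits):
    """Return a (class name, confidence) pair for each row of logits."""
    probs = F.softmax(logits, dim=1)
    confidences, indices = probs.max(dim=1)
    return [
        (CIFAR10_CLASSES[i], c)
        for i, c in zip(indices.tolist(), confidences.tolist())
    ]


# Made-up logits for two images, favoring "cat" and "ship".
example_logits = torch.zeros(2, 10)
example_logits[0, 3] = 4.0
example_logits[1, 8] = 4.0
print(describe_predictions(example_logits))
```

The returned labels can be used as titles when displaying the images, which makes it easy to compare an untrained and a trained model side by side.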

**Your Task**: Complete the following cells to visualize the model's predictions.

First, visualize predictions from a randomly initialized model:

In [None]:
# TODO: call the un-trained model to predict on a few images from the validation set.
#   Use the skill learned in project 1 to display a batch of images.

Next, visualize predictions from the trained model:

In [None]:
# TODO: call the trained model to predict on a few images from the validation set.