# Improving training and transfer learning

## Data augmentation

Data augmentation is a technique used to increase the size of a training dataset by creating new training samples from the existing ones. This is particularly useful when the original dataset is small or when the model is not able to generalize well.

There are several types of data augmentation techniques:

1. **Image transformations**: This includes rotation, translation, scaling, flipping, and cropping.
2. **Color manipulation**: Adjusting brightness, contrast, saturation, and hue.
3. **Noise addition**: Adding random noise to the images.
4. **Blurring**: Applying blurring filters to the images.
5. **Distortion**: Distorting the images to create new samples.

## Optimizers

An optimizer is an algorithm used to update the parameters of a model during training. The goal of an optimizer is to adjust the model's parameters to minimize the loss function.

### SGD

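Stochastic Gradient Descent (SGD) is a simple and effective optimizer. It estimates the gradient of the loss function with respect to the parameters on a mini-batch of training samples. The update then moves each parameter a small step, scaled by the learning rate, in the direction opposite to that gradient.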
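
To make the update concrete, here is a minimal sketch that performs one plain SGD step by hand and compares it with `torch.optim.SGD`. The toy linear regression problem, the learning rate and the variable names are assumptions made for illustration.

```python
import torch

torch.manual_seed(0)
x, y = torch.randn(8, 3), torch.randn(8)  # toy mini-batch (illustrative data)
w_manual = torch.randn(3, requires_grad=True)
w_optim = w_manual.detach().clone().requires_grad_(True)
lr = 0.1

def loss_fn(w):
    # Mean squared error of a linear model
    return ((x @ w - y) ** 2).mean()

# Manual step: w <- w - lr * gradient
loss_fn(w_manual).backward()
with torch.no_grad():
    w_manual -= lr * w_manual.grad

# The same step using the built-in optimizer (no momentum, no weight decay)
optimizer = torch.optim.SGD([w_optim], lr=lr)
optimizer.zero_grad()
loss_fn(w_optim).backward()
optimizer.step()

print(torch.allclose(w_manual, w_optim))
```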

### Adam

Adam also uses the gradient of the loss function with respect to the parameters, but with a more sophisticated update rule than SGD. It keeps exponential moving averages of the gradient (the first moment, which acts like momentum) and of the squared gradient (the second moment), and uses them to scale the step for each parameter individually.
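
The update rule can be sketched by hand and checked against `torch.optim.Adam`. This is a minimal sketch on a toy regression problem; the data, the number of steps and the hyperparameter values (which match the PyTorch defaults) are assumptions for illustration.

```python
import torch

torch.manual_seed(0)
x, y = torch.randn(8, 3), torch.randn(8)  # toy mini-batch (illustrative data)
w_manual = torch.randn(3, requires_grad=True)
w_optim = w_manual.detach().clone().requires_grad_(True)

lr, beta1, beta2, eps = 1e-3, 0.9, 0.999, 1e-8
optimizer = torch.optim.Adam([w_optim], lr=lr, betas=(beta1, beta2), eps=eps)
m = torch.zeros(3)  # running average of the gradient (first moment)
v = torch.zeros(3)  # running average of the squared gradient (second moment)

def loss_fn(w):
    return ((x @ w - y) ** 2).mean()

for t in range(1, 6):
    # Manual Adam step
    w_manual.grad = None
    loss_fn(w_manual).backward()
    g = w_manual.grad
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    with torch.no_grad():
        w_manual -= lr * m_hat / (v_hat.sqrt() + eps)

    # Built-in Adam step on the copy
    optimizer.zero_grad()
    loss_fn(w_optim).backward()
    optimizer.step()

print(torch.allclose(w_manual, w_optim, atol=1e-6))
```

Dividing by the square root of the second moment means parameters with consistently large gradients take smaller steps, which is why Adam is often less sensitive to the learning rate than plain SGD.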

## Regularization

Regularization is a family of techniques used to prevent overfitting. Many of them add a penalty term to the loss function to discourage the model from fitting the training data too closely; others, such as dropout, modify the network during training. In both cases the aim is to improve the model's ability to generalize.

### Dropout

Dropout is a regularization technique where randomly selected neurons are ignored during training. This helps in preventing the neurons from co-adapting.
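
A small sketch of how `nn.Dropout` behaves in training and evaluation mode. The drop probability `p=0.5` and the all-ones input are assumptions chosen so the effect is easy to see.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(1, 10)

dropout.train()
out_train = dropout(x)  # each entry is either 0 or scaled by 1 / (1 - p) = 2

dropout.eval()
out_eval = dropout(x)   # dropout is disabled: the input passes through unchanged

print(out_train)
print(out_eval)
```

The scaling in training mode keeps the expected activation the same in both modes, which is why calling `model.eval()` before validation and testing matters.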

### L2 Regularization

L2 regularization adds a penalty proportional to the sum of the squared weights to the loss function. This pushes the weights toward small values, which discourages the model from fitting the training data too closely and helps it generalize.
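
In PyTorch, L2 regularization is commonly applied through the optimizer's `weight_decay` argument. The sketch below checks, for plain SGD on a toy problem, that an explicit penalty of `0.5 * lam * ||w||^2` gives the same step as `weight_decay=lam`. The data and the values of `lr` and `lam` are assumptions for illustration.

```python
import torch

torch.manual_seed(0)
x, y = torch.randn(8, 3), torch.randn(8)  # toy mini-batch (illustrative data)
w_penalty = torch.randn(3, requires_grad=True)
w_decay = w_penalty.detach().clone().requires_grad_(True)
lr, lam = 0.1, 0.01

def data_loss(w):
    return ((x @ w - y) ** 2).mean()

# Variant 1: add the L2 penalty to the loss explicitly
opt_penalty = torch.optim.SGD([w_penalty], lr=lr)
loss = data_loss(w_penalty) + 0.5 * lam * w_penalty.pow(2).sum()
loss.backward()
opt_penalty.step()

# Variant 2: let the optimizer add lam * w to the gradient
opt_decay = torch.optim.SGD([w_decay], lr=lr, weight_decay=lam)
data_loss(w_decay).backward()
opt_decay.step()

print(torch.allclose(w_penalty, w_decay))
```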




In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torch.utils.data import sampler

import torchvision.datasets as dset
import torchvision.transforms as T
import matplotlib.pyplot as plt

import numpy as np

USE_GPU = True
dtype = torch.float32 # We will be using float throughout this tutorial.

if USE_GPU and torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

# Constant to control how frequently we print train loss.
print_every = 100
print('using device:', device)

In [None]:
transform = T.Compose([
                T.ToTensor(),
                # Single-channel mean and std, passed as one-element tuples
                T.Normalize((0.5719,), (0.1684,))
            ])

# Load the MedMNIST dataset

from medmnist import OrganSMNIST
dataset_train = OrganSMNIST(root='utils/datasets', split='train', transform=transform, download=True)
loader_train = DataLoader(dataset_train, batch_size=64, shuffle=True, num_workers=2)

dataset_val = OrganSMNIST(root='utils/datasets', split='val', transform=transform, download=True)
loader_val = DataLoader(dataset_val, batch_size=64, shuffle=False, num_workers=2)

dataset_test = OrganSMNIST(root='utils/datasets', split='test', transform=transform, download=True)
loader_test = DataLoader(dataset_test, batch_size=64, shuffle=False, num_workers=2)


## Augmentation
Here we will only work with the VGG model and see how we can improve its performance by using data augmentation, regularization and transfer learning.

First let's train the VGG model and see how it performs on the test set.

In [None]:
from utils.models import vgg
from utils.train import train, check_accuracy


model = vgg.VGG13(num_classes=11, in_channels=1)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Set number of epochs
num_epochs = 10


# Train the model
model, results = train(model, loader_train, loader_val, criterion, optimizer, device, num_epochs)

# Check accuracy on the test set
test_loss, test_accuracy = check_accuracy(model, loader_test, criterion, device)
train_loss, train_accuracy = check_accuracy(model, loader_train, criterion, device)
print(f"Test Loss: {test_loss:.4f}, Test Accuracy: {test_accuracy:.4f}")
print(f"Train Loss: {train_loss:.4f}, Train Accuracy: {train_accuracy:.4f}")



Have a look at the [torchvision documentation](https://pytorch.org/vision/0.19/) to find out how you can use data augmentation with torchvision. Then design an augmentation pipeline and apply it to the training data.

In [None]:
import torchvision.transforms.v2 as v2
transform = v2.Compose([
                T.ToTensor(),
                # TODO: Add data augmentation here (before normalization)
                T.Normalize((0.5719,), (0.1684,))
            ])

test_transform = v2.Compose([
                T.ToTensor(),
                # No data augmentation here
                T.Normalize((0.5719,), (0.1684,))
            ])
# Load the MedMNIST dataset

from medmnist import OrganSMNIST
dataset_train_aug = OrganSMNIST(root='utils/datasets', split='train', transform=transform, download=True)
loader_train_aug = DataLoader(dataset_train_aug, batch_size=64, shuffle=True, num_workers=2)

dataset_val_aug = OrganSMNIST(root='utils/datasets', split='val', transform=test_transform, download=True)
loader_val_aug = DataLoader(dataset_val_aug, batch_size=64, shuffle=False, num_workers=2)

dataset_test = OrganSMNIST(root='utils/datasets', split='test', transform=test_transform, download=True)
loader_test = DataLoader(dataset_test, batch_size=64, shuffle=False, num_workers=2)


Let's plot the images before and after augmentation to see how it affects the images. This is an important step to ensure that the augmentation is working as expected.

In [None]:
def plot_images(dataset, title, start_index=0):
    # Plot 10 images starting from start_index
    fig, axes = plt.subplots(2, 5, figsize=(15, 6))
    fig.suptitle(title)
    
    for i, ax in enumerate(axes.flat):
        if i < 10:
            # Get the image and its label
            img, _ = dataset[start_index + i]
            
            # Denormalize the image
            img = img.squeeze().cpu().numpy()
            img = img * 0.1684 + 0.5719
            img = np.clip(img, 0, 1)
            
            ax.imshow(img, cmap='gray')
            ax.axis('off')
    
    plt.tight_layout()
    plt.show()

# Define the augmented transform
transform_aug = T.Compose([
    T.ToTensor(),
    T.RandomRotation(10),  # Rotate by up to 10 degrees
    T.RandomAffine(0, translate=(0.1, 0.1)),  # Translate by up to 10%
    T.Normalize((0.5719,), (0.1684,))
])

# Create two views of the validation set for plotting only. Separate names keep
# dataset_val_aug (used by loader_val_aug) free of augmentation.
dataset_plot = OrganSMNIST(root='utils/datasets', split='val', transform=test_transform, download=True)
dataset_plot_aug = OrganSMNIST(root='utils/datasets', split='val', transform=transform_aug, download=True)

# Set a random seed for reproducibility
torch.manual_seed(42)

# Plot original validation images
plot_images(dataset_plot, "Original Validation Images")

# Plot augmented validation images
plot_images(dataset_plot_aug, "Augmented Validation Images")

## Question
What is an example of when a data augmentation technique may not be helpful or even harmful to the performance of a model?

In [None]:
from utils.models import vgg
from utils.train import train, check_accuracy


model = vgg.VGG13(num_classes=11, in_channels=1)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Set number of epochs
num_epochs = 10


# Train the model
model, results = train(model, loader_train_aug, loader_val_aug, criterion, optimizer, device, num_epochs)

# Check accuracy on the test set
test_loss, test_accuracy = check_accuracy(model, loader_test, criterion, device)
train_loss, train_accuracy = check_accuracy(model, loader_train_aug, criterion, device)
print(f"Test Loss: {test_loss:.4f}, Test Accuracy: {test_accuracy:.4f}")
print(f"Train Loss: {train_loss:.4f}, Train Accuracy: {train_accuracy:.4f}")



## Question
What differences do you observe in the training and validation loss curves when using data augmentation? Plot the training and validation loss curves for both cases.
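
One possible way to draw such a comparison is sketched below. The structure of the `results` object returned by `train` is not shown in this notebook, so the hypothetical helper `plot_loss_curves` takes plain per-epoch lists; how those lists are extracted from `results` is left as an assumption.

```python
import matplotlib.pyplot as plt

def plot_loss_curves(histories):
    """Plot train/val loss per epoch for several runs.

    histories maps a run name to a (train_losses, val_losses) pair of
    per-epoch lists. This layout is an assumption for this sketch.
    """
    fig, ax = plt.subplots(figsize=(8, 5))
    for name, (train_losses, val_losses) in histories.items():
        epochs = range(1, len(train_losses) + 1)
        # Solid line for training loss, dashed line in the same color for validation loss
        line, = ax.plot(epochs, train_losses, label=f"{name} (train)")
        ax.plot(epochs, val_losses, linestyle="--",
                color=line.get_color(), label=f"{name} (val)")
    ax.set_xlabel("Epoch")
    ax.set_ylabel("Loss")
    ax.legend()
    return fig, ax
```

Calling it with one entry for the baseline run and one for the augmented run puts all four curves on the same axes, which makes a gap between training and validation loss easy to compare.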

## Optimizers

Let's compare the performance of the SGD and Adam optimizers. We will use the same model and the same dataset as above. Train the model once with SGD and once with Adam, then compare the training and validation loss curves of the two runs.

## Question
What are the advantages and disadvantages of SGD and Adam?

In [None]:
# TODO Training with SGD and plot the training and validation loss curves

In [None]:
# TODO Training with Adam and plot the training and validation loss curves

## Transfer learning

Transfer learning is a technique where a model trained on one task is reused as the starting point for training a model on a second, related task. This is useful when the second task has less data available than the first task.

### Using a pre-trained model

We will use a model pre-trained on ImageNet and fine-tune it on the OrganSMNIST dataset.

Look at the [torchvision documentation](https://pytorch.org/vision/stable/models.html) to find out how to load a pre-trained model.

1. Load a model without pre-trained weights, replace the last layer with a new layer that has the number of classes in the OrganSMNIST dataset and train the model from scratch.

2. Load a pre-trained model, freeze all the layers of the model except the last one and replace the last layer with a new layer that has the number of classes in the OrganSMNIST dataset.

3. Load a pre-trained model, replace the last layer with a new layer that has the number of classes in the OrganSMNIST dataset and train the model on the OrganSMNIST dataset.

### 1. Training with a model from Torchvision from scratch

In [None]:
from torchvision import models

model = None # TODO: Load a model from torchvision.models and train it from scratch
print(model)
# Replace the first layer with a new layer that has the number of channels in the OrganSMNIST dataset e.g.
#model.conv1 = nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

# Replace the last layer with a new layer that has the number of classes in the OrganSMNIST dataset e.g.
#model.fc = nn.Linear(model.fc.in_features, 11)





In [None]:
# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Set number of epochs
num_epochs = 10


# Train the model
model, results = train(model, loader_train_aug, loader_val_aug, criterion, optimizer, device, num_epochs)

# Check accuracy on the test set
test_loss, test_accuracy = check_accuracy(model, loader_test, criterion, device)
train_loss, train_accuracy = check_accuracy(model, loader_train_aug, criterion, device)
print(f"Test Loss: {test_loss:.4f}, Test Accuracy: {test_accuracy:.4f}")
print(f"Train Loss: {train_loss:.4f}, Train Accuracy: {train_accuracy:.4f}")

### 2. Training with a pre-trained model and frozen layers

In [None]:
model = None # TODO: Load a model from torchvision.models and its weights
# Freeze all pre-trained parameters; the layers replaced below are created afterwards and stay trainable
for param in model.parameters():
    param.requires_grad = False

# Replace the first layer with a new layer that has the number of channels in the OrganSMNIST dataset e.g.
#model.conv1 = nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

# Replace the last layer with a new layer that has the number of classes in the OrganSMNIST dataset e.g.
#model.fc = nn.Linear(model.fc.in_features, 11)

#TODO Train the model

### 3. Training with a pre-trained model and unfrozen layers

In [None]:
model = None # TODO: Load a model from torchvision.models and its weights

# Replace the first layer with a new layer that has the number of channels in the OrganSMNIST dataset e.g.
#model.conv1 = nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

# Replace the last layer with a new layer that has the number of classes in the OrganSMNIST dataset e.g.
#model.fc = nn.Linear(model.fc.in_features, 11)

#TODO Train the model
