# Practical session n°3

Notions:
- Training from scratch
- Validation step
- Learning curves
- Transfer learning
- Fine tuning
- Freezing

Duration: 2 h

Now that we have covered the basic building blocks, we will train a Convolutional Neural Network (CNN) on slightly more challenging problems than separation of points in a 2D space:
- handwritten digit recognition (part **I.**)
- binary classification of photos (part **II.**)

The first machine learning problem will give us the opportunity to train a tiny CNN from scratch through a complete training loop (including training and validation steps).
Training from scratch efficiently on the second problem would require many more images than the few photos available (200). We therefore rely on one of the most interesting features of neural networks: once trained on a very large dataset for a very general task, they can be "retrained" (fine-tuned) on a very specific task that shares the same kind of inputs. Since such pretrained neural networks are much bigger than our first tiny CNN, a graphics card will be used to significantly speed up the process.

### **I.A.** The MNIST Database of Handwritten Digits

The Database of Handwritten Digits of the NIST (National Institute of Standards and Technology) comprises 70,000 grayscale images of handwritten digits of 28x28 pixels. A dedicated dataset class is available for it in the torchvision.datasets module. \\
The following cells import the packages, download the MNIST database, define the DataLoaders and display some images.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import os
import torch
import torchvision
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from torchvision import datasets, models, transforms

In [None]:
# transforms (format/normalization)
tr=torchvision.transforms.Compose([
   torchvision.transforms.ToTensor(),
   torchvision.transforms.Normalize((0.1307,), (0.3081,))
   ])

# Definition of training sets:
trainval_dataset = datasets.MNIST(root='./data',
                                  train=True,
                                  download=True,
                                  transform=tr)

# Split indices for training and validation
num_images = len(trainval_dataset)
indices = list(range(num_images))
split = int(np.floor(0.2 * num_images))  # 20% validation

# Shuffle indices
np.random.seed(42)  # Seed for reproducibility
np.random.shuffle(indices)

# Create train and validation samplers
train_indices, val_indices = indices[split:], indices[:split]

from torch.utils.data import SubsetRandomSampler
train_sampler = SubsetRandomSampler(train_indices)
val_sampler = SubsetRandomSampler(val_indices)
train_size = len(train_sampler)
val_size = len(val_sampler)

# Definition of the train/val loaders
bs = 8
num_workers = 2 # try : print(os.cpu_count())

train_loader = DataLoader(trainval_dataset, batch_size=bs,
                          sampler=train_sampler, num_workers=num_workers)
val_loader = DataLoader(trainval_dataset, batch_size=bs,
                        sampler=val_sampler, num_workers=num_workers)
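
As a rough illustration of what `Normalize((0.1307,), (0.3081,))` computes, the sketch below applies the same affine map, `(x - mean) / std`, to a few pixel values with NumPy. The array `pixels` is made up for the example; 0.1307 and 0.3081 are the constants used in the cell above.

```python
import numpy as np

mean, std = 0.1307, 0.3081  # constants passed to Normalize above

# Hypothetical pixel intensities in [0, 1], as produced by ToTensor()
pixels = np.array([0.0, 0.1307, 1.0])

# Same affine map as Normalize: subtract the mean, divide by the std
normalized = (pixels - mean) / std
print(normalized)
```

A pixel equal to the mean is mapped to 0, darker pixels become negative and brighter ones positive.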

In [None]:
x, t = next(iter(train_loader))

print(x.shape)

fig = plt.figure()
for i in range(8):
  plt.subplot(4,2,i+1)
  plt.tight_layout()
  plt.imshow(x[i,0,:,:], cmap='gray') #, interpolation='none')
  plt.title("Ground Truth: {}".format(t[i]))
  plt.xticks([])
  plt.yticks([])

**Exercise 1**:
- Are images sampled by train_loader and val_loader normalized?
- How many images are in *train_loader* and *val_loader*?
- What will be the role of the validation loader?

### **I.B.** A vanilla CNN

Now, we will define a vanilla CNN with two convolution layers.

**Exercise 2:**  Determine *N* in such a way that the network can accept MNIST images as input.
How will the outputs be interpreted after training?

In [None]:
# N = ...

class CNN(nn.Module):

    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5, padding=2)
        self.conv2 = nn.Conv2d(10, 10, kernel_size=5, padding=2)
        self.fc1 = nn.Linear(N, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2(x), 2))

        # convert an image to a 1D torch.tensor:
        x = x.view(-1, N)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
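
To reason about the size of the tensor that reaches the first fully connected layer, it can help to track the spatial size layer by layer. The sketch below uses the usual formula `floor((size + 2*padding - kernel) / stride) + 1` (no dilation). The helper `out_size` and the example network are hypothetical and deliberately different from the CNN above.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

# Hypothetical example, not the network of this exercise:
# a 32x32 input, one 3x3 convolution without padding, then 2x2 max pooling.
size = 32
size = out_size(size, kernel=3)            # convolution: 32 -> 30
size = out_size(size, kernel=2, stride=2)  # max pooling: 30 -> 15

channels = 8  # assumed number of output channels of the convolution
flattened = channels * size * size
print(size, flattened)  # 15 1800
```

Note that `F.max_pool2d(x, 2)` uses a stride equal to the kernel size by default, hence `stride=2` in the pooling step.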

### **I.C.** Training of the CNN

To train a CNN, let's define a loss function. Since the log of the output probabilities has been computed with *F.log_softmax*, we only need to gather the log-probabilities associated with the target classes, negate them and average over the batch. This can be done with the torch.gather function (see practical session n°1), but the standard way in PyTorch is to use *torch.nn.NLLLoss()*.
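
The equivalence between the two approaches can be checked on a small random batch. This is only a sketch: the tensors `scores` and `targets` are made up, and the default `reduction='mean'` of *NLLLoss* is assumed.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical batch: 4 samples, 10 classes
scores = torch.randn(4, 10)
targets = torch.tensor([3, 0, 7, 1])
log_probs = F.log_softmax(scores, dim=1)

# Manual version: pick the log-probability of each target class,
# then average the negated values over the batch
picked = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
manual_loss = -picked.mean()

# Built-in version (default reduction is the mean over the batch)
builtin_loss = torch.nn.NLLLoss()(log_probs, targets)

print(manual_loss.item(), builtin_loss.item())
```

Both values should agree up to floating-point error.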

In [None]:
model = CNN()

#optimizer = torch.optim.SGD(model.parameters(), lr = 0.01, momentum = 0.9)
optimizer = torch.optim.Adam(model.parameters(), lr = 0.001)

# NLLLoss() will have the same effect as torch.gather (see practical session n°1)
loss_fn =  torch.nn.NLLLoss()

A complete training loop has (at least) two phases: weights are updated only in the first phase dedicated to training. During the validation phase, **generalization performance** on independent images is monitored.
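
Switching between the two phases is done with `model.train()` and `model.eval()`, which set the `training` flag that the CNN above passes to `F.dropout`. The following sketch, on a made-up tensor of ones, shows what this flag changes for dropout alone.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.ones(1000)

# training=True: about half of the entries are zeroed (default p=0.5),
# the remaining ones are rescaled by 1 / (1 - p) = 2
train_out = F.dropout(x, training=True)

# training=False: dropout is the identity
eval_out = F.dropout(x, training=False)

print(train_out.unique())
print(torch.equal(eval_out, x))
```

This is why forgetting `model.eval()` before validation (or `model.train()` before the next training epoch) silently changes the scores.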

**Exercise 3**:
Complete the following code to print the mean loss and the accuracy on the train and validation sets.


In [None]:
import time

num_epochs = 2

# Initialize time
start_time = time.time()

# Learning Loop:
for epoch in range(num_epochs):
    print(f'Epoch: {epoch}')

    running_loss_train = 0.0
    running_corrects_train = 0
    running_loss_val = 0.0
    running_corrects_val = 0

    # Phase 1: Training
    model.train()  # Set the model to training mode
    for x, label in train_loader:
        optimizer.zero_grad()
        output = model(x)
        loss = loss_fn(output, label)
        loss.backward()
        optimizer.step()

        # Get predicted classes:
        _, preds = torch.max(output, 1)

        # Update counters:
        running_loss_train += loss.item() * x.shape[0]
        running_corrects_train += torch.sum(preds == label.data)

    # Calculate training scores (todo):
    epoch_loss_train = ...
    epoch_acc_train = ...

    print(f'Train Loss: {epoch_loss_train:.4f} Acc: {epoch_acc_train:.4f}')

    # Phase 2: Validation
    model.eval()  # Set the model to evaluation mode
    with torch.no_grad():
        for x, label in val_loader:
            output = model(x)
            loss = loss_fn(output, label)

            # Get predicted classes:
            _, preds = torch.max(output, 1)

            # Update counters:
            running_loss_val += loss.item() * x.shape[0]
            running_corrects_val += torch.sum(preds == label.data)

    # Calculate validation scores (todo):
    epoch_loss_val = ...
    epoch_acc_val = ...

    print(f'Validation Loss: {epoch_loss_val:.4f} Acc: {epoch_acc_val:.4f}')

    # Print elapsed time:
    elapsed_time = time.time() - start_time
    print(f'Time: {round(elapsed_time)} seconds')

    # Update start time for the next epoch
    start_time = time.time()

In [None]:
train_losses = []
val_losses = []

train_accs = []
val_accs = []

### BEGIN SOLUTION

# Initialize time
start_time = time.time()

# Learning Loop:
for epoch in range(6):
    print('epoch :' + str(epoch))

    running_loss_train = 0.
    running_corrects_train = 0.
    running_loss_val = 0.
    running_corrects_val = 0.

    # Training
    model.train()  # back to training mode (dropout active) after validation
    for x, label in train_loader:
        optimizer.zero_grad()
        output = model(x)
        l = loss_fn(output, label)
        l.backward()
        optimizer.step()

        # Get predicted classes:
        _, preds = torch.max(output, 1)

        # Counters:
        running_loss_train += l.item() * x.shape[0]
        running_corrects_train += torch.sum(preds == label.data)

    # Calculate training scores and store:
    epoch_loss_train = running_loss_train / train_size
    epoch_acc_train = running_corrects_train.float() / train_size
    train_losses.append(epoch_loss_train)
    train_accs.append(epoch_acc_train)

    print('{} Loss: {:.4f} Acc: {:.4f}'.format(
        'train', epoch_loss_train, epoch_acc_train))


    # validation
    model.eval()

    for x, label in val_loader:

        with torch.no_grad():
            output = model(x)
            l = loss_fn(output, label)

        # Get predicted classes:
        _, preds = torch.max(output, 1)

        # Counters:
        running_loss_val += l.item() * x.shape[0]
        running_corrects_val += torch.sum(preds == label.data)

    # Calculate validation scores and store:
    epoch_loss_val = running_loss_val / val_size
    epoch_acc_val = running_corrects_val.float() / val_size
    val_losses.append(epoch_loss_val)
    val_accs.append(epoch_acc_val)


    print('{} Loss: {:.4f} Acc: {:.4f}'.format(
        'val', epoch_loss_val, epoch_acc_val))

    # Print elapsed time:
    elapsed_time = time.time() - start_time
    print(f'Time: {round(elapsed_time)} seconds')


### END SOLUTION

**Exercise 4**:
At each epoch, store the accuracy and the cost function value in the lists *train_losses*, *val_losses*, *train_accs*, and *val_accs*.
Plot the **learning curves** over six epochs.

In [None]:
fig, ax = plt.subplots()
plt.title('evolution of training and validation accuracies')

ax.plot( ... , color = 'r')
ax.plot( ... , color = 'b')
ax.legend(['train acc.', 'val acc.'])


In [None]:
fig2, ax2 = plt.subplots()
plt.title('loss = f(epoch)')
ax2.plot( ... ,  color = 'r')
ax2.plot( ... , color = 'b')
ax2.legend(['train losses', 'val losses'])

**Exercise 5:** Complete the following perceptron (P60) to directly take MNIST images as input.
Compare the standalone perceptron to the CNN in terms of size (number of weights) and performance on the validation set.
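
For the size comparison, the number of parameters of a layer can be worked out by hand. The sketch below uses two hypothetical helpers, `linear_params` and `conv2d_params`, and applies them only to layers whose sizes are fully specified in the cells above (biases are assumed to be enabled, which is the PyTorch default).

```python
def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def conv2d_params(c_in, c_out, kernel):
    """Weights plus biases of a square-kernel 2D convolution."""
    return c_in * c_out * kernel * kernel + c_out

# Layers whose sizes are fully specified in the cells above
print(linear_params(50, 10))     # fc2: 510
print(conv2d_params(1, 10, 5))   # conv1: 260
print(conv2d_params(10, 10, 5))  # conv2: 2510
```

On an actual `nn.Module`, the same totals can be obtained by summing `p.numel()` over `model.parameters()`.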

In [None]:
class P60(nn.Module):

    def __init__(self):
        super(P60, self).__init__()
        self.fc1 = nn.Linear(... , 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        # flattening x
        x = x.view(-1, ...)

        # apply first layer
        x = F.relu(self.fc1(x))

        # apply second layer
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        x = F.log_softmax(x, dim=1)
        return x

perceptron = P60()
optimizer = torch.optim.Adam(perceptron.parameters(), lr = 0.001)
loss_fn =  torch.nn.NLLLoss()

In [None]:
# Size comparison:

...

In [None]:
# Performance comparison (on the validation set):

...

### **II.A.** Load and visualize the Hymenoptera dataset:

Through a second image classification problem, we focus on two other important aspects of deep learning: speeding up the learning with GPU cards and the ability to use pretrained networks.

To illustrate the first aspect, we will use the GPUs available under Google Colab. To do this, before starting this part, go to **Edit**/**Notebook settings** and select a GPU.

In [None]:
# Check GPU availability
if torch.cuda.is_available():
  device = torch.device("cuda:0") # 0 is the index of the GPU
  print(torch.cuda.get_device_name(device))
else:
  device = torch.device("cpu")  # fallback so that later cells still run
  print('Change the runtime type to GPU')

Now let's download inputs (RGB images of bees or ants) and targets ("bee" or "ant").

In [None]:
# download the dataset
! wget https://download.pytorch.org/tutorial/hymenoptera_data.zip
! unzip -qq hymenoptera_data.zip

In [None]:
dir_data = 'hymenoptera_data'
print(os.listdir(dir_data))

The dataset is in a standard format, and we can manipulate it with a ready-to-use dataset object of the datasets.ImageFolder class:

In [None]:
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        #transforms.RandomVerticalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

image_datasets = {x: datasets.ImageFolder(os.path.join(dir_data, x),
                                          data_transforms[x])
                  for x in ['train', 'val']}

dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4,
                                             shuffle=True, num_workers=0)
              for x in ['train', 'val']}

dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}

print('Dataset sizes:')
print(dataset_sizes)

Since the provided dataset is very small, we need to make the most of it. We will produce new images through additional transformations that preserve the nature of the object (data augmentation). \\
In the code, *transforms.RandomResizedCrop()* crops a random region of the image and resizes it to 224x224 pixels, while *transforms.RandomHorizontalFlip()* and *transforms.RandomVerticalFlip()* (commented out here) apply a horizontal or vertical flip with a probability of 1/2. Note that flips might not be suitable for other datasets like MNIST since the mirror image of a digit is generally not another digit. \\
Some images are presented below.
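
To make the idea of a random flip concrete, here is a minimal NumPy sketch. The function `random_horizontal_flip` is a hypothetical stand-in written for this illustration, not the torchvision implementation, and the tiny array `img` is made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_horizontal_flip(img, p=0.5):
    """Mirror an (H, W) or (C, H, W) array left-right with probability p."""
    if rng.random() < p:
        return img[..., ::-1]
    return img

# Hypothetical 2x3 single-channel "image"
img = np.array([[1, 2, 3],
                [4, 5, 6]])
flipped = img[..., ::-1]  # deterministic flip, for illustration
print(flipped)
```

Each call to `random_horizontal_flip(img)` returns either the original or the mirrored array, so the network sees a slightly different dataset at every epoch.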

In [None]:
def imshow(inp, ax=None, title=None):
    """Imshow for Tensor."""
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    if ax is None:
      plt.imshow(inp)
      plt.title(title)
    else:
      ax.imshow(inp)
      ax.set_title(title)

In [None]:
def plot_batch(images, labels, class_names):
    num_images = len(images)
    fig, axs = plt.subplots(1, num_images, figsize=(15, 5))

    for i in range(num_images):
        axs[i].axis('off')
        imshow(images[i],axs[i],class_names[labels[i]])
    plt.show()

# Get a batch of training data
inputs, classes = next(iter(dataloaders['train']))
class_names = image_datasets['train'].classes
# Assuming `inputs` is a batch of images and `classes` are the corresponding class labels
plot_batch(inputs, classes, class_names)

### **II.B.** Adapting a ResNet to the task:

In this part, the lightest of the ResNet architectures is adapted to our binary classification problem and trained over one epoch.

**Exercise 6:**

- Load an untrained ResNet18. How many total weights does it contain? Check [here](https://pytorch.org/vision/main/models/generated/torchvision.models.resnet18.html).

- How many neurons does the last layer of the network have?

- Is there a softmax operation at the end of *ResNet.forward()*?

- Modify the last layer of the classifier so that it has as many neurons as there are classes in hymenoptera_data.

In [None]:
model = models.resnet18(pretrained=False)
...

In [None]:
# Modification of the last layer of the classifier
def get_model(pretrained):
  model = models.resnet18(pretrained=pretrained)

  ...

  return model

model = get_model(False)

Now, let's define the negative log-likelihood as the cost function. To compute the log-likelihood, we could add a LogSoftmax layer to the ResNet. Another common way to do that is to use a loss function that includes *LogSoftmax*. In this regard, in PyTorch,  *nn.CrossEntropyLoss* combines both *LogSoftmax* and *NLLLoss*.
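
This combination can be verified numerically. The sketch below compares the two formulations on made-up logits and labels for a two-class problem; the default mean reduction is assumed for both losses.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical raw network outputs for 4 images and 2 classes
logits = torch.randn(4, 2)
labels = torch.tensor([0, 1, 1, 0])

# One-step version: CrossEntropyLoss applied to raw scores
ce = nn.CrossEntropyLoss()(logits, labels)

# Two-step version: LogSoftmax followed by NLLLoss
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), labels)

print(ce.item(), nll.item())
```

The two printed values should match, which is why no softmax layer needs to be appended to the network when *nn.CrossEntropyLoss* is used.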

In [None]:
loss_fn = nn.CrossEntropyLoss()

Finally, let's define a function that incorporates the training loop:

In [None]:
def train_model(dataloaders, model, loss_fn, optimizer, num_epochs=1):
    since = time.time()

    for epoch in range(num_epochs):
        print(f'Epoch {epoch}/{num_epochs - 1}')
        print('-' * 10)

        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()
            else:
                model.eval()
                # Weights are not updated during the validation phase

            running_loss = 0.0
            running_corrects = 0

            for inputs, labels in dataloaders[phase]:
                optimizer.zero_grad()
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = loss_fn(outputs, labels)
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print(f'{phase} Loss: {epoch_loss:.4f} Acc: {100*epoch_acc:.2f}%')

    time_elapsed = time.time() - since
    print(f'Training complete in {time_elapsed // 60:.0f}m {time_elapsed % 60:.0f}s')

    return model

**Exercise 7:** With the *train_model* function, train the ResNet over one epoch with mini-batches of 64 images.

In [None]:
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=64,
                                             shuffle=True, num_workers=2)
              for x in ['train', 'val']}

model = get_model(pretrained=False)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

# Training over 1 epoch:
...

### **II.C.** Using a Graphics Card:

With more than 10 million parameters, training a ResNet18 on a CPU is much slower than training the networks of Part I. \\
Let's repeat the same training using the GPU.

In [None]:
print(f'Runtime device :{device}')

# Load the model to the GPU:
model = model.to(device)

To load a torch.tensor on GPU, the syntax is the same:

In [None]:
x = torch.rand(2,1,4,4)
print("On CPU :\n",x)
x = x.to(device)
# Note: You can also use .cuda() without specifying the device name
# but this method is not recommended especially in a multi-gpu environment
print("On GPU :\n",x)

# bring back the x tensor to the CPU RAM:
x = x.to('cpu') # or x.cpu()
print('Back to CPU:\n',x)
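
During training, the model, the inputs and the targets must all live on the same device. The sketch below shows this pattern on a hypothetical tiny linear model and a random batch (stand-ins for ResNet18 and real images); the name `dev` and the CPU fallback are choices made for this example so that it runs on any machine.

```python
import torch
import torch.nn as nn

# Fall back to the CPU when no GPU is visible, so the sketch runs anywhere
dev = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Hypothetical tiny model and batch, standing in for ResNet18 and real images
toy_model = nn.Linear(4, 2).to(dev)
batch = torch.rand(8, 4).to(dev)
targets = torch.randint(0, 2, (8,)).to(dev)

# Model, inputs and targets are now on the same device
outputs = toy_model(batch)
loss = nn.CrossEntropyLoss()(outputs, targets)
print(outputs.device, loss.item())
```

Mixing devices, for example a model on the GPU and a batch left on the CPU, raises a runtime error at the forward pass.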

**Exercise 8:**
- Complete the function *train_model_gpu* to train the model on the GPU.
- Compare the CPU and GPU training times.
- What are the validation scores after 20 epochs on the GPU?

In [None]:
def train_model_gpu(dataloaders, model, loss_fn, optimizer, num_epochs=1):
  ...

### **II.D.** Impact of pretraining on performance:

Training is faster on a GPU, but it only leads to a very poor score, barely better than random chance. To improve performance, a simple idea is to use a network trained on a similar (or more general) task as a starting point for learning. Here, it works particularly well with networks trained on ImageNet, whose convolutional filters are already very rich.

**Note:**
This method is referred to as **fine-tuning** a **pretrained model**.

**Exercise 9:** Compare two ResNet18 trainings, one randomly initialized and the other pre-trained, using learning curves, over 25 epochs.

In [None]:
max_epochs = 25
# Learning "from scratch" (random weights) :
# get the model
# Put the model on GPU
# get the loss, the optimizer, the scheduler and start the training
# ...
# resnet_scratch, accs_scratch = train(...)



In [None]:
# fine tuning a pretrained model:
# ...
# resnet_ft, accs_ft = train(...)


The fine-tuning approach has many variations that fit into the broader framework of **transfer learning**. Partial fine-tuning, as illustrated in the following exercise, is one of these variations.

**Exercise 10:** Instead of retraining all the weights, you can retrain only the weights of the classifier. This is referred to as *freezing* the other weights during retraining. \\
Implement this approach and compare it with the previous ones.

In [None]:
...

# freeze all the layers except the classifier (the last dense layers at end)
# using this snippet :
    ...
    for param in module.parameters():
      param.requires_grad = False
    ...
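
The effect of `requires_grad = False` can be observed on a toy network before applying it to ResNet18. In the sketch below, `toy` is a hypothetical two-layer model whose first layer plays the role of the frozen backbone and whose last layer plays the role of the classifier.

```python
import torch
import torch.nn as nn

# Hypothetical toy network: a "backbone" followed by a "classifier"
toy = nn.Sequential(
    nn.Linear(8, 16),  # backbone, to be frozen
    nn.ReLU(),
    nn.Linear(16, 2),  # classifier, kept trainable
)

# Freeze the backbone only
for param in toy[0].parameters():
    param.requires_grad = False

trainable = sum(p.numel() for p in toy.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in toy.parameters() if not p.requires_grad)
print(trainable, frozen)  # 34 144

# After a backward pass, frozen parameters have no gradient
toy(torch.rand(4, 8)).sum().backward()
print(toy[0].weight.grad is None, toy[2].weight.grad is not None)
```

Since frozen parameters never receive a gradient, the optimizer leaves them unchanged and only the classifier is updated.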


In the end, for this small dataset, retraining the last layer performs just as well as global training. To conclude, let's make some predictions with the model on the validation dataset:

In [None]:
def visualize_model(model, num_images=10):
    was_training = model.training
    model.eval()
    images_so_far = 0
    fig = plt.figure(figsize=(25,num_images//5*5))

    with torch.no_grad():
        for i, (inputs, labels) in enumerate(dataloaders['val']):
            inputs = inputs.to(device)
            labels = labels.to(device)

            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)

            for j in range(inputs.size()[0]):
                images_so_far += 1
                ax = plt.subplot(num_images//5, 5, images_so_far)
                ax.axis('off')
                imshow(inputs.cpu().data[j],ax,'Predicted: {}'.format(class_names[preds[j]]))

                if images_so_far == num_images:
                    model.train(mode=was_training)
                    return
        model.train(mode=was_training)