# Model Trainer
Train models on a chosen dataset, either by feature extraction or by finetuning.

**Author**

`Nathan Inkawhich <https://github.com/inkawhich>`

**Customizations**

`Marco Alecci <https://github.com/MarcoAlecci>`

`Francesco Marchiori <https://github.com/FrancescoMarchiori>`

`Luca Martinelli <https://github.com/luca-martinelli-09>`

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/luca-martinelli-09/learn-the-art/blob/main/modelTrainer.ipynb)

In [None]:
import os

# @markdown ## Setup project
# @markdown This section will download the datasets from GitHub to use for the training phase

if not os.path.exists("./datasets"):
    !git clone "https://github.com/luca-martinelli-09/learn-the-art.git"

    %cd learn-the-art/

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import torchvision
from torchvision import models, transforms
import time
import copy
import math

print("PyTorch Version:", torch.__version__)
print("Torchvision Version:", torchvision.__version__)

In [None]:
# Detect if we have a GPU available
print("CUDA available:", torch.cuda.is_available())
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

### Set a manual seed

In [None]:
SEED = 151836

def set_seed(seed):
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    np.random.seed(seed)

set_seed(SEED)

## Settings

In [None]:
# @markdown ## Dataset
# @markdown Directory of the dataset
dataset_dir = "bing" # @param ["bing", "ddg", "google"]
data_dir = r"./datasets/{}".format(dataset_dir)

# @markdown Number of classes in the dataset
num_classes = 2 # @param {type:"integer", min: 1}

# @markdown Number of maximum samples per class (training)
num_samples = 3500  # @param {type:"integer"}

# @markdown Ratio between classes cat and dog
ratios = "50,50" # @param ["50,50", "40,60", "30,70", "20,80"]
ratio = int(ratios.split(",")[0]) / int(ratios.split(",")[1])
dataset_sizes = [math.ceil(num_samples * ratio), num_samples]

# @markdown Check images in the dataset before training
check_images = False # @param {type: "boolean"}

# @markdown Save PIL loaded image in a dictionary (consume more memory)
use_cache = True  # @param {type: "boolean"}

# @markdown ## DataLoader

num_workers = 0 # @param {type:"integer", min: 0}

pin_memory = True # @param {type:"boolean"}

# @markdown ## Model

# @markdown Model to use
model_name = "scratch" # @param ["resnet", "alexnet", "vgg", "squeezenet", "densenet", "inception", "scratch"]

# @markdown Batch size for training (change depending on how much memory you have)
batch_size = 16 # @param {type:"integer", min: 1}

# @markdown Number of epochs to train for
num_epochs = 500 # @param {type:"integer", min: 1}

# @markdown Patience for early stopping
patience_es = 20 # @param {type:"integer", min: 1}

# @markdown Delta for early stopping
delta_es = 0.0001 # @param {type:"number"}

# @markdown Flag for feature extracting. When False, we finetune the whole model, 
# @markdown when True we only update the reshaped layer params
feature_extract = False # @param {type:"boolean"}

# @markdown The learning rate of the optimizer
learning_rate = 0.001 # @param {type:"number"}

# @markdown The momentum of the optimizer
momentum = 0.9 # @param {type:"number"}

# @markdown ## Output
# @markdown Save the model weights after training
save_model = False # @param {type: "boolean"}

# @markdown Save entire model (not only weights)
save_entire_model = False # @param {type: "boolean"}

# @markdown Save all (model, history, optimizer, criterion, best_epoch)
save_all = True # @param {type: "boolean"}

model_save_path = "{}_{}".format(model_name, "_".join(ratios.split(",")))


# Normalization values
normalization_vals = {
    "bing": {
        "train": [[0.5407, 0.5059, 0.4523], [0.2830, 0.2794, 0.2898]],
        "val": [[0.5341, 0.5012, 0.4385], [0.2809, 0.2752, 0.2863]],
        "test": [[0.5257, 0.4953, 0.4290], [0.2799, 0.2730, 0.2844]]
    },
    "ddg": {
        "train": [[0.5366, 0.5061, 0.4544], [0.2860, 0.2820, 0.2917]],
        "val": [[0.5364, 0.5036, 0.4522], [0.2868, 0.2817, 0.2917]],
        "test": [[0.5323, 0.5006, 0.4465], [0.2825, 0.2784, 0.2881]]
    },
    "google": {
        "train": [[0.5635, 0.5371, 0.4781], [0.2899, 0.2861, 0.3035]],
        "val": [[0.5653, 0.5397, 0.4751], [0.2872, 0.2835, 0.3018]],
        "test": [[0.5736, 0.5468, 0.4893], [0.2954, 0.2914, 0.3083]]
    }
}
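
The per-class caps in ``dataset_sizes`` follow directly from the selected ``ratios`` string. As a small worked sketch of the same arithmetic (the helper name ``class_caps`` is only illustrative and not part of the notebook), the first class is capped at ``ceil(num_samples * a / b)`` and the second at ``num_samples``:

```python
import math

def class_caps(ratios, num_samples):
    # Hypothetical helper: mirrors the settings cell's arithmetic for a ratio string "a,b"
    first, second = (int(part) for part in ratios.split(","))
    ratio = first / second
    return [math.ceil(num_samples * ratio), num_samples]

caps_balanced = class_caps("50,50", 3500)  # both classes capped at 3500
caps_skewed = class_caps("20,80", 3500)    # first class capped at 875
```

With "20,80" the first class is capped at a quarter of the second, so it makes up 20% of the resulting training set (875 of 4375 samples), provided enough images are available on disk.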

Helper Functions
----------------

Before we write the code for adjusting the models, let's define a few
helper functions.

### Model Training and Validation Code

The ``train_model`` function handles the training and validation of a
given model. As input, it takes a PyTorch model, a dictionary of
dataloaders, a loss function, an optimizer, a number of epochs
to train and validate for, a boolean flag for when the model is an
Inception model, and the ``delta`` and ``patience`` values used for
early stopping. The *is_inception* flag is used to accommodate the
*Inception v3* model, as that architecture uses an auxiliary output and
the overall model loss combines both the auxiliary output and the final
output, as described
`here <https://discuss.pytorch.org/t/how-to-optimize-inception-model-with-auxiliary-classifiers/7958>`.
The function trains for at most the specified number of epochs and after
each epoch runs a full validation step. It keeps track of the best
performing model (in terms of validation F-score) and stops early when
the validation F-score has not improved by more than ``delta`` for
``patience`` consecutive epochs. At the end of training it returns the
best performing model together with the per-epoch validation scores.
After each phase, the loss, accuracy, precision, recall, and F-score are
printed.
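
The ``delta`` and ``patience`` arguments of ``train_model`` control early stopping on the validation F-score. The following is a minimal sketch of that counter logic in isolation, operating on a plain list of scores; the function name ``epochs_run`` and the sample scores are made up for illustration:

```python
def epochs_run(val_scores, delta=0.0, patience=10):
    # Returns how many epochs would run before early stopping triggers.
    best_score = None
    counter = 0
    for epoch, score in enumerate(val_scores, start=1):
        if best_score is None:
            best_score = score
        elif score <= best_score + delta:
            # No improvement beyond delta: count one more stagnant epoch
            counter += 1
            if counter >= patience:
                return epoch
        else:
            # Improvement: remember the new best and reset the counter
            best_score = score
            counter = 0
    return len(val_scores)

history = [0.50, 0.60, 0.60, 0.59, 0.61, 0.70]  # illustrative scores only
stopped_at = epochs_run(history, delta=0.0, patience=2)
```

Here the run stops after the fourth epoch, because epochs three and four both fail to beat the best score of 0.60. With ``patience=3`` the same history would run to completion, since the fifth epoch resets the counter.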




In [None]:
def print_gpu_stats():
    print('Using device:', device)
    print()

    # Additional Info when using cuda
    if device.type == 'cuda':
        print(torch.cuda.get_device_name(0))
        print('[💻 MEMORY USAGE]')
        print('[📌 ALLOCATED]', round(
            torch.cuda.memory_allocated(0) / 1024 ** 3, 1), 'GB')
        print('[🧮 CACHED]', round(torch.cuda.memory_reserved(0) / 1024 ** 3, 1), 'GB')

In [None]:
def get_scores(labels, predicted):
    acc = torch.sum(predicted == labels) / len(predicted)

    tp = (labels * predicted).sum()
    tn = ((1 - labels) * (1 - predicted)).sum()
    fp = ((1 - labels) * predicted).sum()
    fn = (labels * (1 - predicted)).sum()

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)

    f1 = 2 * (precision * recall) / (precision + recall)

    return acc, precision, recall, f1
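
One caveat with ``get_scores``: if the model predicts no positive samples in an epoch, ``tp + fp`` is zero and the tensor division yields NaN, which then propagates into the F-score. The sketch below shows the same formulas on plain confusion-matrix counts with a zero-denominator guard. Returning 0.0 in that case is a common convention and an assumption here, not something the notebook does, and ``safe_scores`` is a hypothetical name:

```python
def safe_scores(tp, tn, fp, fn):
    # Hypothetical helper working on plain confusion-matrix counts
    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0
    # Assumption: an undefined precision/recall/F1 is reported as 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return acc, precision, recall, f1

scores = safe_scores(tp=8, tn=6, fp=2, fn=4)
degenerate = safe_scores(tp=0, tn=10, fp=0, fn=10)  # model predicts only negatives
```

In the degenerate case every score except accuracy comes out as 0.0 instead of NaN, so comparisons such as the best-score check stay well defined.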

In [None]:
def train_model(model, dataloaders, criterion, optimizer, num_epochs=25, is_inception=False, delta=0, patience=10):
    since = time.time()
    last_since = time.time()

    scores_history = []
    
    best_model_wts = copy.deepcopy(model.state_dict())
    best_f1 = 0.0

    best_score = None
    counter = 0

    for epoch in range(num_epochs):
        print('[💪 EPOCH] {}/{}'.format(epoch + 1, num_epochs))
        print('-' * 10)

        epoch_score = None

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            labels_outputs = torch.tensor([]).to(device, non_blocking=True)
            labels_targets = torch.tensor([]).to(device, non_blocking=True)

            # Iterate over data
            set_seed(SEED)
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device, non_blocking=True)
                labels = labels.to(device, non_blocking=True)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    # Get model outputs and calculate loss
                    # Special case for inception because in training it has an auxiliary output. In train
                    #   mode we calculate the loss by summing the final output and the auxiliary output
                    #   but in testing we only consider the final output.
                    if is_inception and phase == 'train':
                        # From https://discuss.pytorch.org/t/how-to-optimize-inception-model-with-auxiliary-classifiers/7958
                        outputs, aux_outputs = model(inputs)
                        loss1 = criterion(outputs, labels)
                        loss2 = criterion(aux_outputs, labels)
                        loss = loss1 + 0.4 * loss2
                    else:
                        outputs = model(inputs)
                        loss = criterion(outputs, labels)

                    _, preds = torch.max(outputs, 1)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                labels_outputs = torch.cat([labels_outputs, preds], dim=0)
                labels_targets = torch.cat([labels_targets, labels], dim=0)

            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            epoch_acc, epoch_prec, epoch_rec, epoch_f1 = get_scores(labels_targets, labels_outputs)

            print('[🗃️ {}] Loss: {:.4f} Acc: {:.4f} Pre: {:.4f} Rec: {:.4f} F-Score: {:.4f}'.format(
                phase.upper(), epoch_loss, epoch_acc, epoch_prec, epoch_rec, epoch_f1))
            
            time_elapsed = time.time() - last_since
            last_since = time.time()
            print("\t[🕑] {:.0f}m {:.0f}s".format(time_elapsed // 60, time_elapsed % 60))
            
            if phase == 'val':
                epoch_score = epoch_f1

                # deep copy the model
                if epoch_f1 > best_f1:
                    best_f1 = epoch_f1
                    best_model_wts = copy.deepcopy(model.state_dict())
                
                # Store scores history
                scores_history.append({
                    "loss": epoch_loss,
                    "acc": epoch_acc.cpu().numpy(),
                    "precision": epoch_prec.cpu().numpy(),
                    "recall": epoch_rec.cpu().numpy(),
                    "f1": epoch_f1.cpu().numpy()
                })
        
        if best_score is None:
            best_score = epoch_score
        elif epoch_score <= best_score + delta:
            counter += 1
            print("\t[⚠️ EARLY STOPPING] {}/{}".format(counter, patience))
            if counter >= patience:
                break
        else:
            best_score = epoch_score
            counter = 0

        print()

    time_elapsed = time.time() - since
    print()
    print('[🕑 TRAINING COMPLETE] {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('[🥇 BEST SCORE] F-Score: {:.4f}'.format(best_f1))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model, scores_history

### Set Model Parameters’ .requires_grad attribute

This helper function sets the ``.requires_grad`` attribute of the
parameters in the model to False when we are feature extracting. By
default, when we load a pretrained model all of the parameters have
``.requires_grad=True``, which is fine if we are training from scratch
or finetuning. However, if we are feature extracting and only want to
compute gradients for the newly initialized layer then we want all of
the other parameters to not require gradients. This will make more sense
later.
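
As a rough illustration of the effect, the sketch below freezes every parameter of a tiny stand-in network and then replaces its last layer. The ``nn.Sequential`` model and the ``freeze_all`` helper are placeholders chosen for brevity, not the pretrained models used in this notebook:

```python
import torch.nn as nn

def freeze_all(model):
    # Same idea as feature extracting: stop gradient computation for existing parameters
    for param in model.parameters():
        param.requires_grad = False

# Hypothetical tiny network standing in for a pretrained backbone
model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 3))
freeze_all(model)

# A newly created layer has requires_grad=True by default
model[2] = nn.Linear(4, 2)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
```

Only the weight and bias of the replaced layer remain trainable, which is the short list one would expect to see when feature extracting.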




In [None]:
def set_parameter_requires_grad(model, feature_extracting):
    if feature_extracting:
        for param in model.parameters():
            param.requires_grad = False

## Initialize and Reshape the Networks

In [None]:
import torch.nn.functional as F

class Scratch(nn.Module): 
    def __init__(self):
        super(Scratch, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=3)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=3)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(58320, 1024)
        self.fc2 = nn.Linear(1024, 2)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(x.shape[0],-1)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return x
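
The ``58320`` input features of ``fc1`` are tied to the input resolution. A quick way to check the number, assuming square 224x224 inputs (the ``input_size`` set for this model), is to trace the spatial size through the two convolution and pooling stages. The helper names below are illustrative:

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Standard output-size formula for a convolution without dilation
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel):
    # max_pool2d with stride equal to the kernel size, no padding, floor mode
    return size // kernel

side = 224
side = pool_out(conv_out(side, 3), 2)  # conv1 + pool: 224 -> 222 -> 111
side = pool_out(conv_out(side, 3), 2)  # conv2 + pool: 111 -> 109 -> 54
flattened = 20 * side * side           # 20 output channels from conv2
```

The result matches the ``58320`` hard-coded in ``fc1``, so feeding images of a different size would require changing that layer accordingly.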

In [None]:
def initialize_model(model_name, num_classes, feature_extract, use_pretrained=True):
    # Initialize these variables which will be set in this if statement. Each of these
    #   variables is model specific.
    model_ft = None
    input_size = 0

    if model_name == "resnet":
        """ Resnet18
        """
        model_ft = models.resnet18(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.fc.in_features
        model_ft.fc = nn.Linear(num_ftrs, num_classes)
        input_size = 224

    elif model_name == "alexnet":
        """ Alexnet
        """
        model_ft = models.alexnet(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier[6].in_features
        model_ft.classifier[6] = nn.Linear(num_ftrs, num_classes)
        input_size = 224

    elif model_name == "vgg":
        """ VGG11_bn
        """
        model_ft = models.vgg11_bn(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier[6].in_features
        model_ft.classifier[6] = nn.Linear(num_ftrs, num_classes)
        input_size = 224

    elif model_name == "squeezenet":
        """ Squeezenet
        """
        model_ft = models.squeezenet1_0(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        model_ft.classifier[1] = nn.Conv2d(512, num_classes, kernel_size=(1,1), stride=(1,1))
        model_ft.num_classes = num_classes
        input_size = 224

    elif model_name == "densenet":
        """ Densenet
        """
        model_ft = models.densenet121(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier.in_features
        model_ft.classifier = nn.Linear(num_ftrs, num_classes) 
        input_size = 224

    elif model_name == "inception":
        """ Inception v3 
        Be careful, expects (299,299) sized images and has auxiliary output
        """
        model_ft = models.inception_v3(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        # Handle the auxiliary net
        num_ftrs = model_ft.AuxLogits.fc.in_features
        model_ft.AuxLogits.fc = nn.Linear(num_ftrs, num_classes)
        # Handle the primary net
        num_ftrs = model_ft.fc.in_features
        model_ft.fc = nn.Linear(num_ftrs, num_classes)
        input_size = 299

    elif model_name == "scratch":
        """ Our own simple CNN model
        """
        model_ft = Scratch()
        #num_ftrs = model_ft.fc.in_features
        #model_ft.fc = nn.Linear(num_ftrs, num_classes)
        input_size = 224

    else:
        raise ValueError("Invalid model name: {}".format(model_name))
    
    return model_ft, input_size

# Initialize the model for this run
model_ft, input_size = initialize_model(model_name, num_classes, feature_extract, use_pretrained=True)

# Print the model we just instantiated
print(model_ft)

Load Data
---------

Now that we know what the input size must be, we can initialize the data
transforms, image datasets, and the dataloaders. Note that the pretrained
models were trained with the hard-coded ImageNet normalization values, as
described
`here <https://pytorch.org/docs/master/torchvision/models.html>`,
whereas this notebook normalizes each split with the dataset-specific
statistics stored in ``normalization_vals``.
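
For reference, ``transforms.Normalize`` subtracts a per-channel mean and divides by a per-channel standard deviation, applied to the ``[0, 1]`` values produced by ``ToTensor``. A minimal sketch of that operation on a single RGB pixel, using the "bing" training statistics from ``normalization_vals`` (the function name is illustrative):

```python
def normalize_pixel(rgb, mean, std):
    # Per-channel (value - mean) / std, as applied by transforms.Normalize
    return [(v - m) / s for v, m, s in zip(rgb, mean, std)]

bing_train_mean = [0.5407, 0.5059, 0.4523]
bing_train_std = [0.2830, 0.2794, 0.2898]

# A pixel equal to the dataset mean maps to zero in every channel
centered = normalize_pixel(bing_train_mean, bing_train_mean, bing_train_std)
# A pure white pixel ends up above zero in every channel
white = normalize_pixel([1.0, 1.0, 1.0], bing_train_mean, bing_train_std)
```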




In [None]:
from imageLimitedDataset import ImageLimitedDataset

# Data resize and normalization
normalization_pars = normalization_vals[dataset_dir]
data_transforms = {
    "train": transforms.Compose([
        transforms.Resize(input_size),
        transforms.ToTensor(),
        transforms.Normalize(
            normalization_pars["train"][0],
            normalization_pars["train"][1]
        )
    ]),
    "val": transforms.Compose([
        transforms.Resize(input_size),
        transforms.ToTensor(),
        transforms.Normalize(
            normalization_pars["val"][0],
            normalization_pars["val"][1]
        )
    ]),
}

# Create the slices that decide which training samples to keep per class
slices = {
    "train": [slice(None, cut_point) for cut_point in dataset_sizes],
    "val": None,
}

# Create training and validation datasets
image_datasets = {x: ImageLimitedDataset(os.path.join(data_dir, x),
                    transform=data_transforms[x],
                    slices=slices[x],
                    check_images=check_images,
                    use_cache=use_cache) for x in ["train", "val"]}

# Check the sizes of the created datasets
for x in ["train", "val"]:
    print()

    print("[🗃️ {}]".format(x.upper()))
    for cls in image_datasets[x].classes:
        cls_index = image_datasets[x].class_to_idx[cls]
        num_cls = np.count_nonzero(np.array(image_datasets[x].targets) == cls_index)
        print("[🧮 # ELEMENTS] {}: {}".format(cls, num_cls))

# Create training and validation dataloaders
set_seed(SEED)
dataloaders_dict = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batch_size, shuffle=True, num_workers=num_workers, pin_memory=pin_memory) for x in ["train", "val"]}

Create the Optimizer
--------------------

Now that the model structure is correct, the final step for finetuning
and feature extracting is to create an optimizer that only updates the
desired parameters. Recall that after loading the pretrained model, but
before reshaping, if ``feature_extract=True`` we manually set all of the
parameters’ ``.requires_grad`` attributes to False. Then the
reinitialized layer’s parameters have ``.requires_grad=True`` by
default. So now we know that *all parameters that have
.requires_grad=True should be optimized.* Next, we make a list of such
parameters and input this list to the SGD algorithm constructor.

To verify this, check out the printed parameters to learn. When
finetuning, this list should be long and include all of the model
parameters. However, when feature extracting this list should be short
and only include the weights and biases of the reshaped layers.




In [None]:
model_ft = model_ft.to(device, non_blocking=True)

# Gather the parameters to be optimized/updated in this run. If we are
#  finetuning we will be updating all parameters. However, if we are 
#  doing feature extract method, we will only update the parameters
#  that we have just initialized, i.e. the parameters with requires_grad
#  is True.
params_to_update = model_ft.parameters()
print("[🧠 PARAMS TO LEARN]")
if feature_extract:
    params_to_update = []
    for name, param in model_ft.named_parameters():
        if param.requires_grad == True:
            params_to_update.append(param)
            print("\t", name)
else:
    for name, param in model_ft.named_parameters():
        if param.requires_grad == True:
            print("\t", name)

# Pass only the parameters gathered above to the optimizer
optimizer_ft = optim.SGD(params_to_update, lr=learning_rate, momentum=momentum)

Run Training and Validation Step
--------------------------------

Finally, the last step is to set up the loss for the model, then run the
training and validation function for the set number of epochs. Notice,
depending on the number of epochs this step may take a while on a CPU.
Also, the default learning rate is not optimal for all of the models, so
to achieve maximum accuracy it would be necessary to tune for each model
separately.




In [None]:
# Setup the loss fxn
criterion = nn.CrossEntropyLoss()

# Train and evaluate
set_seed(SEED)
model_ft, scores_history = train_model(model_ft, dataloaders_dict, criterion, optimizer_ft,
  num_epochs=num_epochs, is_inception=(model_name=="inception"),
    delta=delta_es, patience=patience_es)

print_gpu_stats()

In [None]:
if save_model:
  # Save only the weights (state_dict)
  torch.save(model_ft.state_dict(), model_save_path + "_weights.pt")
  print("[💾 SAVED] Weights")

if save_entire_model:
  # Save the entire pickled model, not only its weights
  torch.save(model_ft, model_save_path + ".pt")
  print("[💾 SAVED] Entire model")

if save_all:
  torch.save({
    'model': model_ft,
    'dataset': dataset_dir,
    'learning_rate': learning_rate,
    'momentum': momentum,
    'dataset_sizes': dataset_sizes,
    'model_name': model_name,
    'batch_size': batch_size,
    'num_epochs': num_epochs,
    'criterion': criterion,
    'optimizer': optimizer_ft,
    'scores_history': scores_history,
    'delta_es': delta_es,
    'patience_es': patience_es
  }, model_save_path + "_all.pt")

  print("[💾 SAVED] All")