<a href="https://colab.research.google.com/github/mrSaggio/NEIRO/blob/main/PyTorch_%7C_Multiclass_Image_Classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# For neural network model summary, similar to Keras
!pip install torchsummary

In [None]:
from google.colab import drive
drive.mount('/content/drive/')

In [None]:
from IPython.display import display, HTML, Javascript

color_map = ['#FFFFFF','#FF5733']

prompt = color_map[-1]
main_color = color_map[0]
strong_main_color = color_map[1]
custom_colors = [strong_main_color, main_color]

css_file = '''
div #notebook {
background-color: white;
line-height: 20px;
}

#notebook-container {
%s
margin-top: 2em;
padding-top: 2em;
border-top: 4px solid %s;
-webkit-box-shadow: 0px 0px 8px 2px rgba(224, 212, 226, 0.5);
    box-shadow: 0px 0px 8px 2px rgba(224, 212, 226, 0.5);
}

div .input {
margin-bottom: 1em;
}

.rendered_html h1, .rendered_html h2, .rendered_html h3, .rendered_html h4, .rendered_html h5, .rendered_html h6 {
color: %s;
font-weight: 600;
}

div.input_area {
border: none;
    background-color: %s;
    border-top: 2px solid %s;
}

div.input_prompt {
color: %s;
}

div.output_prompt {
color: %s;
}

div.cell.selected:before, div.cell.selected.jupyter-soft-selected:before {
background: %s;
}

div.cell.selected, div.cell.selected.jupyter-soft-selected {
    border-color: %s;
}

.edit_mode div.cell.selected:before {
background: %s;
}

.edit_mode div.cell.selected {
border-color: %s;

}
'''

def to_rgb(h):
    return tuple(int(h[i:i+2], 16) for i in [0, 2, 4])

main_color_rgba = 'rgba(%s, %s, %s, 0.1)' % (to_rgb(main_color[1:]))
# write the stylesheet, closing the file handle once done
with open('notebook.css', 'w') as f:
    f.write(css_file % ('width: 95%;', main_color, main_color, main_color_rgba,
                        main_color,  main_color, prompt, main_color, main_color,
                        main_color, main_color))

def nb():
    with open("notebook.css", "r") as f:
        return HTML("<style>" + f.read() + "</style>")
nb()

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.express as px
from torchvision import datasets
import torchvision.transforms as transforms
import collections
import torchsummary
from torchvision import utils
from torch.utils.data import DataLoader
from torchvision import models
from torchsummary import summary
from torch import nn
from torch import optim
from torch.optim.lr_scheduler import CosineAnnealingLR
import torch
%matplotlib inline
import os

print(os.getcwd())
path = '/kaggle/working/data'
os.makedirs(path, exist_ok=True)  # also creates any missing parent directories

# <b><span style='color:#F1C40F'>1 |</span> INTRODUCTION</b>

<div style="color:white;display:fill;border-radius:8px;
            background-color:#323232;font-size:150%;
            font-family:Nexa;letter-spacing:0.5px">
    <p style="padding: 8px;color:white;"><b>1.1 | STL-10 DATASET</b></p>
</div>

In this notebook, a dataset consisting of 10 classes (__STL-10__) is used as the basis for a **<span style='color:#F1C40F'>multiclass image classification problem</span>**

> - The STL-10 dataset is an image recognition dataset for developing unsupervised feature learning, deep learning, self-taught learning algorithms
> - It is inspired by the CIFAR-10 dataset but with some modifications. In particular, each class has fewer labeled training examples than in CIFAR-10, but a very large set of unlabeled examples is provided to learn image models prior to supervised training
> - The primary challenge is to make use of the unlabeled data (which comes from a similar but different distribution from the labeled data) to build a useful prior
> - We also expect that the higher resolution of this dataset (96x96) will make it a challenging benchmark for developing more scalable unsupervised learning methods


<div style="color:white;display:fill;border-radius:8px;
            background-color:#323232;font-size:150%;
            font-family:Nexa;letter-spacing:0.5px">
    <p style="padding: 8px;color:white;"><b>1.2 | INITIALISATION OF CNN NETWORK</b></p>
</div>


The aim is to train a __CNN__ model using two approaches:
- (I) randomised weight initialisation
- (II) transfer learning weight initialisation

and compare the difference between the two.

- __Randomised initialisation__ is the default starting point: the network begins from random weights and updates them iteratively. The downside is that __it can take a while for the network to learn anything useful about the dataset__. This mirrors a familiar issue in the numerical solution of partial differential equations (PDEs), where the initial conditions can affect the solution
- Alternatively, we can __potentially benefit from the weights of a previously trained model__ and reach an accurate model much more quickly, provided those weights are actually useful for the problem at hand. This process is often called **<span style='color:#F1C40F'>Transfer Learning</span>**: the network's weights are initialised from an already trained model rather than at random

<div style="color:white;display:fill;border-radius:8px;
            background-color:#323232;font-size:150%;
            font-family:Nexa;letter-spacing:0.5px">
    <p style="padding: 8px;color:white;"><b>1.3 | PRETRAINED MODELS</b></p>
</div>

- For our model, a pre-existing architecture will be used; **[resnet18](https://pytorch.org/vision/0.8/models.html)** is one of the models available in torchvision and has already been pretrained on a much larger dataset (ImageNet), so we can potentially benefit from these coefficients for our classification problem using the **<span style='color:#F1C40F'>STL-10</span>** dataset
- We will try both of the weight initialisations stated above and compare the results using the **<span style='color:#F1C40F'>resnet18</span>** CNN model



# <b><span style='color:#F1C40F'>2 |</span> GET THE TRAINING DATA</b>

- We will be using the **<span style='color:#F1C40F'>STL-10</span>** for this problem, available from the
<code>torchvision.datasets</code> module.

__Our Dataset Division__
- **<span style='color:#F1C40F'>STL-10</span>** contains 10 unique classes; <code>collections.Counter()</code> can be used to count the images in each class.
- Training Dataset: 5000 images (3 channels, 96x96 px)
- Evaluation & Test Set: 8000 images (3 channels, 96x96 px)

In [None]:
''' 1. Training STL-10 Dataset '''
# Fetch the data & convert numpy to pytorch tensor format

transf = transforms.Compose([transforms.ToTensor()])
train_ds = datasets.STL10(path,download=True,
                          split='train',
                          transform=transf)

''' 2. Test / Evaluation STL-10 Dataset '''

test_all = datasets.STL10(path,
                          download=True,
                          split='test',
                          transform=transf)

In [None]:
# data stored in torchvision.datasets format
print(type(train_ds))
print(type(train_ds.data))

In [None]:
# Get the numpy array data
print(f'Shape of training set: {train_ds.data.shape}')
print(f'Shape of test set: {test_all.data.shape}')  # both evaluation & test sets

- Images have three channels & have a dimension of 96 x 96 px
-  <code>collections.Counter()</code> can be used to count all the unique images for each class
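
As a small illustration of how <code>collections.Counter()</code> tallies labels, the sketch below uses a made-up label list (`toy_labels` is hypothetical and is not STL-10 data):

```python
import collections

# Hypothetical toy labels standing in for the class ids of a dataset
toy_labels = [0, 1, 1, 2, 2, 2, 0, 1]

counter = collections.Counter(toy_labels)

# Counter maps each class id to its number of occurrences
for class_id, n_images in sorted(counter.items()):
    print(f'class {class_id}: {n_images} images')
```

A balanced dataset should show roughly the same count for every class id.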

In [None]:
# Count the number of images per category
y_train = [y for _,y in train_ds]
counter_train = collections.Counter(y_train)
print('Class Image Counter for Training Data')
print(counter_train,'\n')

# Count the number of images per category
y_testall = [y for _,y in test_all]
counter_testall = collections.Counter(y_testall)
print('Class Image Counter for Test Data')
print(counter_testall)

# <b><span style='color:#F1C40F'>3 |</span> SPLITTING TEST DATA</b>

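- The downloaded __test_all__ dataset has to serve both validation and testing, so we split it in two using **<span style='color:#F1C40F'>StratifiedShuffleSplit</span>**, which preserves the class proportions in each part
- With <code>test_size=0.2</code>, the code below assigns **<span style='color:#F1C40F'>20%</span>** of the images (1,600) to the validation set and the remaining **<span style='color:#F1C40F'>80%</span>** (6,400) to the test set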
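
To make the idea of stratification concrete, here is a hand-rolled sketch in plain Python. The helper `stratified_split` and the toy labels are hypothetical; this is not scikit-learn's implementation, only an illustration of the property it provides (every class keeps the same share in each part):

```python
import collections
import random

def stratified_split(labels, holdout_fraction, seed=0):
    """Return (rest_idx, holdout_idx) with per-class proportions preserved."""
    rng = random.Random(seed)

    # Group sample indices by class label
    by_class = collections.defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rest_idx, holdout_idx = [], []
    for label, idxs in by_class.items():
        rng.shuffle(idxs)
        n_holdout = round(len(idxs) * holdout_fraction)
        holdout_idx.extend(idxs[:n_holdout])
        rest_idx.extend(idxs[n_holdout:])
    return rest_idx, holdout_idx

# Toy labels: 10 classes with 800 samples each (8000 in total)
toy_labels = [c for c in range(10) for _ in range(800)]
rest_idx, holdout_idx = stratified_split(toy_labels, holdout_fraction=0.2)

print(len(rest_idx), len(holdout_idx))
print(collections.Counter(toy_labels[i] for i in holdout_idx))
```

Each class contributes the same number of samples to the held-out part, which is what a plain random split cannot guarantee.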

In [None]:
from sklearn.model_selection import StratifiedShuffleSplit
from torch.utils.data import Subset

''' Stratified Shuffle Splitting '''
# using stratified splitting of classes

split = StratifiedShuffleSplit(n_splits=1,
                               test_size=0.2,
                               random_state=0)

indices = list(range(len(test_all))) # index list covering every sample in test_all
y_test0 = [y for _,y in test_all]    # extract list of all class ids

''' Validation / Test Split Index '''

# split() yields (train, test) index arrays; with test_size=0.2 the first
# array (idx_test) holds 80% of the samples and the second (idx_val) holds 20%
print('Validation / Test Split Index:')
for idx_test,idx_val in split.split(indices,y_test0):
    print("Test Indices:", idx_test)
    print("Validation Indices:", idx_val)
    print(len(idx_val),len(idx_test))

- Using **<span style='color:#F1C40F'>Subset</span>**, we can generate torch datasets
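
Conceptually, a subset is just an index indirection over the parent dataset. The class below is a minimal stand-in written for illustration (`SubsetView` and `toy_dataset` are hypothetical names, not part of PyTorch); it only sketches the idea and is not the library's code:

```python
class SubsetView:
    """Minimal sketch of a dataset view restricted to the given indices."""

    def __init__(self, dataset, indices):
        self.dataset = dataset
        self.indices = indices

    def __getitem__(self, i):
        # Look up the i-th selected index in the parent dataset
        return self.dataset[self.indices[i]]

    def __len__(self):
        return len(self.indices)

# Hypothetical parent dataset of (sample, label) pairs
toy_dataset = [('img0', 0), ('img1', 1), ('img2', 0), ('img3', 1)]
view = SubsetView(toy_dataset, [3, 0])

print(len(view))
print(view[0], view[1])
```

Because items are fetched from the parent on every access, no data is copied, and any change made to the parent dataset is visible through the view.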

In [None]:
''' Create two datasets from test_all '''
# use torch utility to make subsets for test_all

val_ds = Subset(test_all,
                idx_val)  # list of indices for the validation split
test_ds = Subset(test_all,
                 idx_test) # list of indices for the test split

''' Recount the number of images per class (similar to training data)'''

y_test = [y for _,y in test_ds]
y_val = [y for _,y in val_ds]

counter_test = collections.Counter(y_test)
counter_val = collections.Counter(y_val)
print(counter_test)
print(counter_val)

# <b><span style='color:#F1C40F'>4 |</span> PLOTTING TENSOR DATA</b>

- Let's plot our dataset, so we know what exactly our classes represent. Examples are taken from the validation dataset; <code>val_ds</code>
- Looks like our classes represent in order:
> Aircraft (0), Birds (1), Automobiles (2), Cats (3), Deer (4), Dogs (5), Horses (6), Monkeys (7), Ships (8), Trucks (9); a total of 10 classes.

In [None]:
def plot_img(img,y=None,color=True):
    npimg = img.numpy()
    npimg_T = np.transpose(npimg,(1,2,0))
    plt.imshow(npimg_T)
    plt.title('Image samples from each of the 10 classes')
    plt.axis('on')

# Plot PyTorch Tensor Samples
def plot_tensor(tensor,random_id=False,class_id=None):

    if(random_id is True):
        rnd_inds = np.random.randint(0,len(tensor),100)
        X_show = [tensor[i][0] for i in rnd_inds]
        target = [tensor[i][1] for i in rnd_inds]
    else:

        if(class_id is None):
            X_show = []
            # cycle through all classes
            for j in range(0,10):
                ii=-1
                for i in range(0,1000):
                    if(tensor[i][1] == j):  # compare label values, not object identity
                        ii+=1
                        if(ii>19):
                            break
                        else:
                            X_show.append(tensor[i][0])

        if(class_id is not None):

            print(f'Showing samples from {len(tensor)} tensors:')

            X_show = []
            ii=-1
            for i in range(0,1000):
                if(tensor[i][1] == class_id):  # compare label values, not object identity
                    ii+=1
                    if(ii>19):
                        break
                    else:
                        X_show.append(tensor[i][0])


    X_grid = utils.make_grid(X_show,nrow=20,padding=1)
    plt.figure(figsize=(30,10))
    plot_img(X_grid,y=None,color=True)

In [None]:
# Show data from all classes
plot_tensor(val_ds)

In [None]:
# Show data from one class
plot_tensor(val_ds,class_id=1)

# <b><span style='color:#F1C40F'>5 |</span> TRAINING DATASET TRANSFORMATIONS</b>

<div style="color:white;display:fill;border-radius:8px;
            background-color:#323232;font-size:150%;
            font-family:Nexa;letter-spacing:0.5px">
    <p style="padding: 8px;color:white;"><b>5.1 | TRANSFORMATION LIST</b></p>
</div>

We'll apply the transformations below to the training data. The random ones are re-sampled every time an image is loaded, so the image data changes every epoch; we can visualise the first samples as well
> - RandomHorizontalFlip
> - RandomVerticalFlip
> - ToTensor
> - Normalize (custom normaliser)
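
As a rough picture of what the two flips do, the sketch below reverses the width and height axes of a tiny made-up array in (C, H, W) layout. The array `img` is hypothetical, and unlike the torchvision transforms the flips here are applied unconditionally rather than with a probability:

```python
import numpy as np

# Hypothetical 1-channel 2x3 "image" in (C, H, W) layout
img = np.arange(6).reshape(1, 2, 3)

h_flipped = img[:, :, ::-1]  # horizontal flip: reverse the width axis
v_flipped = img[:, ::-1, :]  # vertical flip: reverse the height axis

print(img[0])
print(h_flipped[0])
print(v_flipped[0])
```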

<div style="color:white;display:fill;border-radius:8px;
            background-color:#323232;font-size:150%;
            font-family:Nexa;letter-spacing:0.5px">
    <p style="padding: 8px;color:white;"><b>5.2 | CALCULATE THE MEAN & STD OF TRAIN_DS</b></p>
</div>

For normalisation (Normalize), we need to specify the **mean** & **standard deviation** values we want to use, as part of the transformation function
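
Normalisation subtracts a per-channel mean and divides by a per-channel standard deviation. The sketch below shows this on random stand-in data (the `images` array is hypothetical), pooling all pixels of a channel to get the statistics. Note that the helper in the following cell averages per-image statistics instead, which gives the same mean for equally sized images but a slightly different standard deviation:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for a dataset: 8 images, 3 channels, 4x4 px, values in [0, 1)
images = rng.random((8, 3, 4, 4))

# Per-channel statistics pooled over all images and pixels
mean = images.mean(axis=(0, 2, 3))
std = images.std(axis=(0, 2, 3))

# Normalise: (x - mean) / std, broadcast over the channel axis
normalised = (images - mean[None, :, None, None]) / std[None, :, None, None]

print(normalised.mean(axis=(0, 2, 3)))  # close to 0 for every channel
print(normalised.std(axis=(0, 2, 3)))   # close to 1 for every channel
```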

In [None]:
def get_meanstd(data):

    # list of lists of mean values for each image
    meanRGB = [np.mean(x.numpy(),axis=(1,2)) for x,_ in data]
    stdRGB = [np.std(x.numpy(),axis=(1,2)) for x,_ in data]
    print('Mean & std values for sample:')
    print(meanRGB[0]); print(stdRGB[0])

    # global dataset mean of those means
    meanR = np.mean([m[0] for m in meanRGB])
    meanG = np.mean([m[1] for m in meanRGB])
    meanB = np.mean([m[2] for m in meanRGB])

    # mean of the per-image standard deviations (an approximation of the dataset-wide std)
    stdR = np.mean([s[0] for s in stdRGB])
    stdG = np.mean([s[1] for s in stdRGB])
    stdB = np.mean([s[2] for s in stdRGB])

    print('\nMean value for dataset:')
    print(f'Mean Values: {meanR} {meanG} {meanB}')
    print(f'STD Values: {stdR} {stdG} {stdB}')

    return [meanR,meanG,meanB],[stdR,stdG,stdB]

means,stds = get_meanstd(train_ds)

<div style="color:white;display:fill;border-radius:8px;
            background-color:#323232;font-size:150%;
            font-family:Nexa;letter-spacing:0.5px">
    <p style="padding: 8px;color:white;"><b>5.3 | SET TRANSFORMATIONS</b></p>
</div>

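- Both the training and test pipelines convert images to tensors with <code>ToTensor()</code> and then normalise them
- Only the training pipeline adds the random flips; plotting the transformed data lets us visually confirm that **<span style='color:#F1C40F'>RandomHorizontalFlip</span>** and **<span style='color:#F1C40F'>RandomVerticalFlip</span>** have been applied
- The normalisation is visible too, since it changes the colours of the plotted images
- Because <code>val_ds</code> and <code>test_ds</code> are <code>Subset</code> views of <code>test_all</code>, they pick up the new transform automatically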

In [None]:
''' Define the image transformations ( for train_ds & test_all) '''

# Transformations for training set
train_transformer = transforms.Compose([transforms.RandomHorizontalFlip(p=0.5),
                                        transforms.RandomVerticalFlip(p=0.5),
                                        transforms.ToTensor(),
                                        transforms.Normalize([means[0],means[1],means[2]],
                                                             [stds[0],stds[1],stds[2]])])

# Standard transformations for test set
test0_transformer = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize([means[0],means[1],means[2]],
                                                             [stds[0],stds[1],stds[2]])])

''' Update the transform functions for train_ds & test_all '''
train_ds.transform = train_transformer
test_all.transform = test0_transformer

plot_tensor(train_ds,class_id=1) # Can plot the converted data after transformation

# <b><span style='color:#F1C40F'>6 |</span> CREATE DATALOADERS</b>

- Next we'll need to create the **<span style='color:#F1C40F'>data loaders</span>**, which will be used to access the dataset during training
- We need to define a **<span style='color:#F1C40F'>batch_size</span>**:
> The number of **<span style='color:#F1C40F'>images extracted from the dataset each iteration</span>**, a batch size of 32 is chosen for the training data & 64 for the evaluation dataset.
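
The batch size fixes how many iterations make up one epoch. A quick sketch of the arithmetic, assuming the loader keeps the final, smaller batch (the default behaviour of `DataLoader`) and using the 5,000 training and 1,600 validation images from the splits above; `batches_per_epoch` is a hypothetical helper:

```python
import math

def batches_per_epoch(n_samples, batch_size):
    # The last batch may be smaller than batch_size, hence the ceiling
    return math.ceil(n_samples / batch_size)

print(batches_per_epoch(5000, 32))  # training loader
print(batches_per_epoch(1600, 64))  # validation loader
```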

In [None]:
''' Create dataloaders from train_ds & val_ds '''

# Create Data Loaders (training)
train_dl = DataLoader(train_ds,
                      batch_size=32,
                      shuffle=True)

# Create Data Loader (validation)
val_dl = DataLoader(val_ds,
                    batch_size=64,
                    shuffle=False)

# And get a batch of data from train_dl
for x,y in train_dl:
    print(x.shape)
    print(y.shape)
    break

# Extract a batch of data from val_dl
for x,y in val_dl:
    print(x.shape)
    print(y.shape)
    break

# <b><span style='color:#F1C40F'>7 |</span> BUILDING A MODEL</b>

- We will load an existing model from the <code>torchvision.models</code> library; **<span style='color:#F1C40F'>resnet18</span>**
- The default resnet18 model **<span style='color:#F1C40F'>fc layer</span>** outputs 1000 classes:
> - (fc): Linear(in_features=512, out_features=1000, bias=True)


- STL-10 has only 10 classes, so we will need to **<span style='color:#F1C40F'>adjust the final layer (fc)</span>** of <code>resnet18</code> to match our 10 classes
- We will be looking into two cases of the resnet18 model:
> - one with __non-pretrained__ coefficients (randomly initialised weights)
> - one with __pretrained__ coefficients (via transfer learning)


In [None]:
''' Non Pretrained Model Variant '''
model_resnet18 = models.resnet18(pretrained=False)
num_ftrs = model_resnet18.fc.in_features
model_resnet18.fc = nn.Linear(num_ftrs,10)

# device = torch.device("cuda:0")
# model_resnet18.to(device)

In [None]:
''' Pretrained Model Variant '''
# Note: torchvision 0.13+ deprecates `pretrained=` in favour of the `weights=` argument
pre_resnet18 = models.resnet18(pretrained=True)
num_ftrs = pre_resnet18.fc.in_features
pre_resnet18.fc = nn.Linear(num_ftrs,10)

# device = torch.device("cuda:0")
# pre_resnet18.to(device)

# <b><span style='color:#F1C40F'>8 |</span> DEFINING A LOSS FUNCTION & OPTIMISER</b>

<div style="color:white;display:fill;border-radius:8px;
            background-color:#323232;font-size:150%;
            font-family:Nexa;letter-spacing:0.5px">
    <p style="padding: 8px;color:white;"><b>8.1 | LOSS FUNCTION</b></p>
</div>

- The role of a loss function is to give the optimiser a quantity to minimise, steering the model towards a predefined objective
- The standard loss used for classification is **<span style='color:#F1C40F'>cross-entropy loss (log loss)</span>**
- When choosing a loss function, we need to consider the number of model outputs & their activation function
- For multiclass classification, the number of model outputs equals the number of classes, and the output activation function determines which loss function to use.

__Some Options__

- The resnet18 has linear outputs, with **<span style='color:#F1C40F'>no activation function</span>**, so let's choose:
> **<span style='color:#F1C40F'>nn.CrossEntropyLoss</span>** - which combines **<span style='color:#F1C40F'>nn.LogSoftmax()</span>** & **<span style='color:#F1C40F'>nn.NLLLoss()</span>** in one class
- If output activation uses **<span style='color:#F1C40F'>log_softmax</span>**, we can use **<span style='color:#F1C40F'>nn.NLLLoss</span>**

In [None]:
''' Defining a Loss Function '''
loss_func = nn.CrossEntropyLoss(reduction='sum')

<div style="color:white;display:fill;border-radius:8px;
            background-color:#323232;font-size:150%;
            font-family:Nexa;letter-spacing:0.5px">
    <p style="padding: 8px;color:white;"><b>8.2 | DEFINING AN OPTIMISER</b></p>
</div>

- An optimiser will hold the current state & will update the model parameters based on the computed gradients
- The **<span style='color:#F1C40F'>choice of an optimiser</span>** in a problem can be **<span style='color:#F1C40F'>considered as a hyperparameter</span>** & an investigation into which one works best is often preferred
- PyTorch's **<span style='color:#F1C40F'>torch.optim</span>** also includes other useful tools like a **<span style='color:#F1C40F'>learning rate scheduler</span>**:
> which adjusts the __learning rate__ automatically during training, in an attempt to improve the model performance


- For __classification tasks__, **<span style='color:#F1C40F'>Stochastic Gradient Descent (SGD)</span>** & **<span style='color:#F1C40F'>Adam</span>** Optimisers are very common
- __Adam__ often converges faster than plain SGD and needs less learning rate tuning, so let's choose it for this problem
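
As an illustration of a learning rate schedule, the sketch below evaluates the closed-form cosine annealing expression from the PyTorch documentation in plain Python. The function `cosine_annealing_lr` is a hypothetical helper, and the values (base rate 1e-4, minimum 1e-6, `T_max` of 5) are borrowed from the training runs later in this notebook:

```python
import math

def cosine_annealing_lr(epoch, base_lr, eta_min, t_max):
    # eta_t = eta_min + 0.5 * (base_lr - eta_min) * (1 + cos(pi * epoch / t_max))
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))

for epoch in range(11):
    lr = cosine_annealing_lr(epoch, base_lr=1e-4, eta_min=1e-6, t_max=5)
    print(f'epoch {epoch:2d}: lr = {lr:.2e}')
```

Under this expression the rate falls from the base value to the minimum over `T_max` epochs and then climbs back, so a run much longer than `T_max` epochs sees a repeating wave rather than a single decay.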

In [None]:
# get the current learning rate helper function
def get_lr(opt):
    for param_group in opt.param_groups:
        return param_group['lr']

In [None]:
''' Define an optimiser '''
optimiser= optim.Adam(model_resnet18.parameters(),
                 lr=1e-4)

current_lr = get_lr(optimiser)
print(f'current lr = {current_lr}')

''' Define learning rate scheduler '''
lr_scheduler = CosineAnnealingLR(optimiser,
                                 T_max=2,
                                 eta_min=1e-5)

In [None]:
def plot_out(loss_hist,metric_hist,epochs=None):

    # Train-Validation Progress
    fig = make_subplots(rows=1, cols=2,subplot_titles=['loss_hist','metric_hist'])

    # Plot Loss
    fig.add_trace(go.Scatter(x=[*range(1,epochs+1)],
                             y=loss_hist["train"],
                             name='loss_hist["train"]',
                             line=dict(color="#0000ff")),row=1, col=1)
    fig.add_trace(go.Scatter(x=[*range(1,epochs+1)],
                             y=loss_hist["val"],
                             name='loss_hist["val"]'),row=1, col=1)

    # Plot Metric
    fig.add_trace(go.Scatter(x=[*range(1,epochs+1)],
                             y=metric_hist["train"],
                             name='metric_hist["train"]'),row=1, col=2)
    fig.add_trace(go.Scatter(x=[*range(1,epochs+1)],
                             y=metric_hist["val"],
                             name='metric_hist["val"]'),row=1, col=2)

    fig.update_layout(template='plotly_white')
    fig.update_layout(margin={"r":0,"t":60,"l":0,"b":0},height=300)
    fig.show()

# <b><span style='color:#F1C40F'>9 |</span> MODEL EVALUATION</b>

Neural networks need their weights initialised in either case; we will try two approaches:

- __Random weight initialisation__ (standard) approach
- __Transfer Learning__ (preloaded model weights) approach

In [None]:
import copy

''' Training and Transfer Learning '''

''' Helper function to count the number of correct predictions '''
def metrics_batch(output, target):
    # get output class
    pred = output.argmax(dim=1, keepdim=True)

    # compare output class with target class
    corrects=pred.eq(target.view_as(pred)).sum().item()
    return corrects

''' Helper function to compute the loss value per batch of data '''
def loss_batch(loss_func, output, target, opt=None):

    # get loss
    loss = loss_func(output, target)

    # get performance metric
    metric_b = metrics_batch(output,target)

    if opt is not None:
        opt.zero_grad()
        loss.backward()
        opt.step()

    return loss.item(), metric_b

# define computation hardware approach (GPU/CPU)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Compute the Loss Value & Performance Metric

def loss_epoch(model,loss_func,dataset_dl,check_id=False,opt=None):

    # initialising variables
    running_loss=0.0
    running_metric=0.0
    len_data=len(dataset_dl.dataset)

    # internal loop
    for xb, yb in dataset_dl:

        xb=xb.to(device) # move X of batch to device
        yb=yb.to(device) # move y of batch to device

        output=model(xb) # get model output
        loss_b,metric_b=loss_batch(loss_func, output, yb, opt) # get loss per batch
        running_loss+=loss_b # update running loss

        if(metric_b is not None):
            running_metric+=metric_b # update running metric
        if(check_id):
            break # stop if only checking

    loss=running_loss/float(len_data) # average loss value
    metric=running_metric/float(len_data) # average metric value

    return loss, metric

In [None]:
''' TRAINING FUNCTION '''
# model - input model
# parameters - input parameter dictionary

def train_val(model, params, verbose = False):

    # extract model parameters
    epochs=params["epochs"]
    loss_func=params["loss_func"]
    opt=params["optimiser"]
    train_dl=params["train_dl"]
    val_dl=params["val_dl"]
    check_id=params["check_id"]
    lr_scheduler=params["lr_scheduler"]
    path=params["path"]

    loss_history={"train": [],"val": []} # history of loss values in each epoch
    metric_history={"train": [],"val": []} # history of metric values in each epoch
    best_model_wts = copy.deepcopy(model.state_dict()) # copy weights for best model
    best_loss=float('inf') # initialize best loss to a large value

    # main loop
    for epoch in range(epochs):

        current_lr=get_lr(opt) # get current learning rate
        if(verbose):
            print(f"Epoch {epoch}/{epochs-1}, current lr={current_lr}")

        # train model on training dataset
        model.train()
        train_loss, train_metric=loss_epoch(model,loss_func,train_dl,check_id,opt)

        # collect loss and metric for training dataset
        loss_history["train"].append(train_loss)
        metric_history["train"].append(train_metric)

        # evaluate model on validation dataset
        model.eval()
        with torch.no_grad():
            val_loss, val_metric=loss_epoch(model,loss_func,val_dl,check_id)


        # store best model
        if val_loss < best_loss:
            best_loss = val_loss
            best_model_wts = copy.deepcopy(model.state_dict())

            # store weights into a local file
            torch.save(model.state_dict(), path)
            if(verbose):
                print("Copied best model weights!")

        # collect loss and metric for validation dataset
        loss_history["val"].append(val_loss)
        metric_history["val"].append(val_metric)

        # learning rate schedule
        lr_scheduler.step()

        if(verbose):
            print(f"train loss: {train_loss:.6f}, dev loss: {val_loss:.6f}, accuracy: {100*val_metric:.2f}")
            print('')

    # load best model weights
    model.load_state_dict(best_model_wts)

    return model, loss_history, metric_history

In [None]:
def plot_out(loss_hist,metric_hist,epochs=None):

    # Train-Validation Progress
    fig = make_subplots(rows=1, cols=2,subplot_titles=['loss_hist','metric_hist'])

    # Plot Loss
    fig.add_trace(go.Scatter(x=[*range(1,epochs+1)],
                             y=loss_hist["train"],
                             name="train",
                             mode='lines',
                             line_color='#F1C40F'),
                  row=1, col=1)
    fig.add_trace(go.Scatter(x=[*range(1,epochs+1)],
                             y=loss_hist["val"],
                             name="val",
                             mode='lines',line_color='#232323'),
                  row=1, col=1)
    # Plot Metric
    fig.add_trace(go.Scatter(x=[*range(1,epochs+1)],
                             y=metric_hist["train"],
                             name="train",
                             mode='lines',
                             line_color='#F1C40F'),
                  row=1, col=2)
    fig.add_trace(go.Scatter(x=[*range(1,epochs+1)],
                             y=metric_hist["val"],
                             name="val",
                             mode='lines',line_color='#232323'),
                  row=1, col=2)

    fig.update_layout(template='plotly_white',
                      showlegend=False,
                      title='Loss & Metric History',height=400)
    fig.update_layout(yaxis2 = dict(range=[0.4,1]))
    fig.show()

<div style="color:white;display:fill;border-radius:8px;
            background-color:#323232;font-size:150%;
            font-family:Nexa;letter-spacing:0.5px">
    <p style="padding: 8px;color:#F1C40F;"><b>9.1 | RANDOM WEIGHTS RESNET MODEL</b></p>
</div>

 - The first approach uses randomised weight initialisation

In [None]:
# Send earlier defined model to device (GPU if available, otherwise CPU)
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model_resnet18.to(device)

# loss function, optimiser, LR scheduler
loss_func = nn.CrossEntropyLoss(reduction="sum")
optimiser = optim.Adam(model_resnet18.parameters(), lr=1e-4)
lr_scheduler = CosineAnnealingLR(optimiser,T_max=5,eta_min=1e-6)

params_train={
 "epochs": 100,
 "optimiser": optimiser,
 "loss_func": loss_func,
 "train_dl": train_dl,
 "val_dl": val_dl,
 "check_id": False,
 "lr_scheduler": lr_scheduler,
 "path": "resnet18.pt",
}

# train and validate the model
model_resnet18,loss_hist,metric_hist=train_val(model_resnet18,
                                               params_train,
                                               verbose=False)
plot_out(loss_hist,metric_hist,epochs=params_train["epochs"])

<div style="color:white;display:fill;border-radius:8px;
            background-color:#323232;font-size:150%;
            font-family:Nexa;letter-spacing:0.5px">
    <p style="padding: 8px;color:#F1C40F;"><b>9.2 | PRETRAINED WEIGHTS RESNET MODEL</b></p>
</div>

- The second approach starts from the pretrained weights already loaded into the model

In [None]:
# Send earlier defined model to device (GPU if available, otherwise CPU)
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
pre_resnet18.to(device)

# Loss function, optimiser, LR scheduler
loss_func = nn.CrossEntropyLoss(reduction="sum")
optimiser = optim.Adam(pre_resnet18.parameters(), lr=1e-4)
lr_scheduler = CosineAnnealingLR(optimiser,T_max=5,eta_min=1e-6)

# Set Training Parameters
params_train={
 "epochs": 100,
 "optimiser": optimiser,
 "loss_func": loss_func,
 "train_dl": train_dl,
 "val_dl": val_dl,
 "check_id": False,
 "lr_scheduler": lr_scheduler,
 "path": "pre_resnet18.pt",
}

# Train and validate the model
pre_resnet18,loss_hist,metric_hist=train_val(pre_resnet18,
                                             params_train,
                                             verbose=False)

# Plot History
plot_out(loss_hist,metric_hist,epochs=params_train["epochs"])