# <font style="color:blue">Table of Contents</font>

- [Launch TensorBoard](#launch)
- [Data Processing Utils](#utils)
- [Add Data Embeddings / Projector](#embeds)
- [System Configuration](#sys-config)
- [Training Configuration](#train-config)
- [System Setup](#sys-setup)
- [Add PR Curves to TensorBoard](#pr-curves)
- [Push Wrong Prediction to TensorBoard](#wrong-preds)
- [Training Function](#train-fn)
- [Validation Function](#validate-fn)
- [Add Histogram of Weights](#hist)
- [Add Network Graph](#graph)
- [Main Function for Training and Validation](#main)
- [Optimizer and Scheduler](#optim)
- [ResNet Model](#model)
- [Transfer Learning](#tl)
- [Fine-Tuning](#fine-tune)

# <font style="color:blue">Transfer Learning and Fine-Tuning </font>

Learn to fine-tune a pre-trained model for a different task.

When we train a network from scratch, we face two limitations:

- Huge data required: the network has millions of parameters, so finding a good set of parameters takes a lot of data.

- High computing power needed: even with enough data, training generally takes many iterations, which is demanding on computing resources.

The pre-trained models are trained on very large-scale image classification problems. The convolutional layers act as a feature extractor, and the fully-connected layers behave like a classifier.

These very large models have seen a huge number of images, so they tend to learn good, discriminative features. We can either use the convolutional layers purely as a feature extractor and replace the last layer to match the problem, or tweak the already-trained convolutional layers to suit the problem at hand. The former approach is called **Transfer Learning**, and the latter **Fine-tuning**.

To fine-tune a network, we tweak the parameters of an already-trained network so that it adapts to the new task. The initial layers of a network learn very general features, but as we go higher up the network, the layers tend to learn patterns more specific to the task they are being trained on. Thus, for fine-tuning we freeze the initial layers (keep them intact) and re-train only the later layers for the task.

Fine-tuning thus avoids both the limitations discussed above.

1. It does not require much training data because:

    - First, we are not training the entire network.

    - Second, the part that is being trained is not trained from scratch.
    
2. Fewer parameters need to be updated, so training takes less time.


Generally, when we have a small training set and the model has been pre-trained on a similar task, we use transfer learning. However, if we have enough data, we also tweak the convolutional layers so that they learn more robust features relevant to our problem. For a detailed overview of fine-tuning and transfer learning, see the [CS231n notes on transfer learning](http://cs231n.github.io/transfer-learning/).
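
In code, the difference between the two approaches comes down to which parameters receive gradient updates. The following is a minimal sketch on a toy network, not the ResNet used later in this notebook; the names `backbone`, `head`, and `count_trainable` are illustrative assumptions.

```python
import torch.nn as nn


def count_trainable(module: nn.Module) -> int:
    """Number of parameters that will receive gradient updates."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


# Toy stand-in for a pre-trained network: a convolutional feature
# extractor followed by a linear classifier.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
head = nn.Linear(8, 3)  # new classifier for a 3-class problem
model = nn.Sequential(backbone, head)

# Fine-tuning: every parameter stays trainable.
fine_tune_params = count_trainable(model)

# Transfer learning: freeze the feature extractor, train only the head.
for param in backbone.parameters():
    param.requires_grad = False
transfer_params = count_trainable(model)

print(fine_tune_params, transfer_params)
```

With the feature extractor frozen, only the small classifier is left to train, which is why transfer learning can work with little data.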

In [1]:
%matplotlib inline

In [2]:
import matplotlib.pyplot as plt  # plotting library, used for the figures sent to TensorBoard
plt.style.use('ggplot')

In [3]:
import os
import time

from typing import Iterable
from dataclasses import dataclass

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

from torchvision import datasets, transforms, models

from torch.optim import lr_scheduler

from torch.utils.tensorboard import SummaryWriter

In [36]:
import tensorflow as tf
import tensorboard as tb
tf.io.gfile = tb.compat.tensorflow_stub.io.gfile

# <font style="color:blue">Launch TensorBoard </font><a name="launch"></a>


Use the refresh button on the dashboard to display the training metrics in real time after you start training.

[The logs are also available on tensorboard.dev](https://tensorboard.dev/experiment/gEH87smeTpKbyVOdgRVknA/).

**Note:** At the time of uploading this log, tensorboard.dev supported only `scalars`, `graphs`, `histograms`, `distributions`, and `hparams`, so the link does not include `images`, `pr-curves`, or `projectors`.

In [5]:
%load_ext tensorboard
# %reload_ext tensorboard

%tensorboard --logdir=log_resnet18/transfer_learning

The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard


Reusing TensorBoard on port 6006 (pid 3396), started 0:01:26 ago. (Use '!kill 3396' to kill it.)

# <font style="color:green">Data Processing Utils</font><a name="utils"></a>

In [7]:
!wget "https://www.dropbox.com/sh/n5nya3g3airlub6/AACi7vaUjdTA0t2j_iKWgp4Ra?dl=1" -O ./data.zip


### <font style="color:green">Extract the Data</font>

In [8]:
!unzip -q ./data.zip

'unzip' is not recognized as an internal or external command,
operable program or batch file.
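
The output above shows that the `unzip` command is not available on this Windows machine. A portable alternative is to extract the archive from Python with the standard-library `zipfile` module. This is a minimal sketch; the helper name `extract_archive` is an assumption, not part of the original notebook.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path: str, dest_dir: str = ".") -> list:
    """Extract every member of zip_path into dest_dir and return the member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


# Assumes ./data.zip was downloaded by the previous cell; skipped otherwise.
if Path("./data.zip").exists():
    extract_archive("./data.zip", ".")
```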


In [10]:
#!apt-get install tree
#!tree -d ./cat-dog-panda
!tree ./cat-dog-panda

Folder PATH listing for volume OS
Volume serial number is 000000C3 10A7:0A3A
C:\USERS\MERRI\PROJECTS\DEEP\WEEK6\CAT-DOG-PANDA
├───training
│   ├───cat
│   ├───dog
│   └───panda
└───validation
    ├───cat
    ├───dog
    └───panda


In [11]:
def image_preprocess_transforms():
    
    preprocess = transforms.Compose([
        transforms.Resize(224),
        transforms.CenterCrop(224),
        transforms.ToTensor()
        ])
    
    return preprocess

In [12]:
def image_common_transforms(mean, std):
    preprocess = image_preprocess_transforms()
    
    common_transforms = transforms.Compose([
        preprocess,
        transforms.Normalize(mean, std)
    ])
    
    return common_transforms
    

In [13]:
def data_augmentation_preprocess(mean, std):
    
    # pick exactly one of these geometric augmentations per image
    initial_transform = transforms.RandomChoice([
        transforms.RandomHorizontalFlip(),
        transforms.RandomVerticalFlip(),
        transforms.RandomRotation(90)
        ])
    
    common_transforms = image_common_transforms(mean, std)
                
    aug_transforms = transforms.Compose([
        initial_transform,
        transforms.RandomGrayscale(p=0.1),
        common_transforms
        ])
    
    return aug_transforms
    

In [14]:
def data_loader(data_root, transform, batch_size=16, shuffle=False, num_workers=2):
    dataset = datasets.ImageFolder(root=data_root, transform=transform)
    
    loader = torch.utils.data.DataLoader(dataset, 
                                         batch_size=batch_size,
                                         num_workers=num_workers,
                                         shuffle=shuffle)
    
    return loader

In [15]:
def get_mean_std():
    
    mean = [0.485, 0.456, 0.406] 
    std = [0.229, 0.224, 0.225]
    
    return mean, std
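
The values returned by `get_mean_std` are the ImageNet channel statistics commonly used with torchvision's pre-trained models. `transforms.Normalize` applies `(x - mean) / std` per channel, and the plotting code later in this notebook undoes it with `x * std + mean`. The sketch below illustrates that round trip with NumPy on a channel-last image; the helper names `normalize` and `denormalize` are assumptions made for illustration.

```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])


def normalize(image_hwc: np.ndarray) -> np.ndarray:
    """Channel-wise (x - mean) / std for an image in height x width x channel layout."""
    return (image_hwc - mean) / std


def denormalize(image_hwc: np.ndarray) -> np.ndarray:
    """Inverse of normalize: x * std + mean."""
    return image_hwc * std + mean


# A 2 x 2 RGB image with values already scaled to [0, 1].
image = np.full((2, 2, 3), 0.5)
restored = denormalize(normalize(image))
print(np.allclose(restored, image))
```

The channel-last layout lets NumPy broadcast the three-element `mean` and `std` over the last axis, which is the same reason the plotting code moves the channel axis to the end before denormalizing.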

In [16]:
def get_data(batch_size, data_root, tb_writer, num_workers=4, data_augmentation=True):
    
    train_data_path = os.path.join(data_root, 'training')
       
    mean, std = get_mean_std()
    
    common_transforms = image_common_transforms(mean, std)
        
   
    # if data_augmentation is true 
    # data augmentation implementation
    if data_augmentation:    
        train_transforms = data_augmentation_preprocess(mean, std)
    # else do common transforms
    else:
        train_transforms = common_transforms
        
        
    # train dataloader
    
    train_loader = data_loader(train_data_path, 
                               train_transforms, 
                               batch_size=batch_size, 
                               shuffle=True, 
                               num_workers=num_workers)
    
    # test dataloader
    
    test_data_path = os.path.join(data_root, 'validation')
    
    test_loader = data_loader(test_data_path, 
                              common_transforms, 
                              batch_size=batch_size, 
                              shuffle=False, 
                              num_workers=num_workers)
    
    # test dataset used for the embedding projector
    
    testdata = datasets.ImageFolder(root=test_data_path, transform=common_transforms)
    
    # add embedding / projector
    
    add_data_embedings(testdata, tb_writer, n=100)
    
    return train_loader, test_loader

# <font style="color:blue">Add Data Embeddings / Projector</font><a name="embeds"></a>

In [17]:
animal_classes = ['cat', 'dog', 'panda']


In [18]:
def get_random_inputs_labels(inputs, targets, n=100):
    """
    get random inputs and labels
    """

    assert len(inputs) == len(targets)

    rand_indices = torch.randperm(len(targets))
    
    data = inputs[rand_indices][:n]
    
    labels = targets[rand_indices][:n]
    
    class_labels = [animal_classes[lab] for lab in labels]
    
    return data, class_labels

In [19]:
def add_data_embedings(dataset, tb_writer, n=100):
    """
    Add a few inputs and labels to tensorboard. 
    """
    
    dataloader = torch.utils.data.DataLoader(dataset, batch_size=n, num_workers=4, shuffle=True)
    
    images, labels = next(iter(dataloader))
    
    tb_writer.add_embedding(mat = images.view(-1, 3 * 224 * 224), 
                            metadata=labels, 
                            label_img=images)
    
    return

## <font style="color:green">System Configuration</font><a name="sys-config"></a>

In [20]:
@dataclass
class SystemConfiguration:
    '''
    Describes the common system setting needed for reproducible training
    '''
    seed: int = 21  # seed number to set the state of all random number generators
    cudnn_benchmark_enabled: bool = True  # enable CuDNN benchmark for the sake of performance
    cudnn_deterministic: bool = True  # make cudnn deterministic (reproducible training)

## <font style="color:green">Training Configuration</font><a name="train-config"></a>

In [21]:
@dataclass
class TrainingConfiguration:
    '''
    Describes configuration of the training process
    '''
    batch_size: int = 32  
    epochs_count: int = 20 
    init_learning_rate: float = 0.001  # initial learning rate for lr scheduler
    decay_rate: float = 0.1  
    log_interval: int = 500  
    test_interval: int = 1  
    data_root: str = "./cat-dog-panda" 
    num_workers: int = 10  
    device: str = 'cuda'  
    


## <font style="color:green">System Setup</font><a name="sys-setup"></a>

In [22]:
def setup_system(system_config: SystemConfiguration) -> None:
    torch.manual_seed(system_config.seed)
    if torch.cuda.is_available():
        torch.backends.cudnn.benchmark = system_config.cudnn_benchmark_enabled
        torch.backends.cudnn.deterministic = system_config.cudnn_deterministic

In [23]:
def prediction(model, device, batch_input, max_prob=True):
    """
    get prediction for batch inputs
    """
    
    # send model to cpu/cuda according to your system configuration
    model.to(device)
    
    # it is important to do model.eval() before prediction
    model.eval()

    data = batch_input.to(device)

    # gradients are not needed for inference
    with torch.no_grad():
        output = model(data)

    # get probability score using softmax
    prob = F.softmax(output, dim=1)
    
    if max_prob:
        # get the max probability
        pred_prob = prob.data.max(dim=1)[0]
    else:
        pred_prob = prob.data
    
    # get the index of the max probability
    pred_index = prob.data.max(dim=1)[1]
    
    return pred_index.cpu().numpy(), pred_prob.cpu().numpy()
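
The core of `prediction` is the step from raw model outputs to a class index and its probability. A small stand-alone sketch of that step follows; the logits are made-up values, not outputs of the notebook's model.

```python
import torch
import torch.nn.functional as F

# Hypothetical raw model outputs (logits) for two images and three classes.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])

prob = F.softmax(logits, dim=1)          # each row now sums to 1
pred_prob, pred_index = prob.max(dim=1)  # top probability and its class index

print(pred_index.tolist())
```

Because softmax preserves the ordering of the logits, taking the index of the largest probability gives the same class as taking the index of the largest logit; the softmax is needed only for the probability score.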

In [24]:
def get_target_and_prob(model, dataloader, device):
    """
    get targets and prediction probabilities
    """
    
    pred_prob = []
    targets = []
    
    for _, (data, target) in enumerate(dataloader):
        
        _, prob = prediction(model, device, data, max_prob=False)
        
        pred_prob.append(prob)
        
        target = target.numpy()
        targets.append(target)
        
    targets = np.concatenate(targets)
    targets = targets.astype(int)
    pred_prob = np.concatenate(pred_prob, axis=0)
    
    return targets, pred_prob
    
    

# <font style="color:blue">Add PR Curves to TensorBoard</font><a name="pr-curves"></a>

In [25]:
def add_pr_curves_to_tensorboard(model, dataloader, device, tb_writer, epoch, num_classes=3):
    """
    Add a precision-recall curve for each class to tensorboard.
    """
    
    targets, pred_prob = get_target_and_prob(model, dataloader, device)
    
    for cls_idx in range(num_classes):
        binary_target = targets == cls_idx
        true_prediction_prob = pred_prob[:, cls_idx]
        
        tb_writer.add_pr_curve(animal_classes[cls_idx], 
                               binary_target, 
                               true_prediction_prob, 
                               global_step=epoch)
        
    return
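
Each curve treats one class as the positive class and all others as negative (one-vs-rest). The sketch below shows, with NumPy and made-up numbers, how a single point on such a curve can be computed for one threshold; TensorBoard evaluates a range of thresholds. The helper `precision_recall_at` is an assumption for illustration and is not what `add_pr_curve` calls internally.

```python
import numpy as np

# Hypothetical labels and predicted probabilities for four samples, three classes.
targets = np.array([0, 1, 2, 1])
pred_prob = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.4, 0.3],
])


def precision_recall_at(binary_target, scores, threshold):
    """Precision and recall of the rule `score >= threshold` for one class."""
    predicted = scores >= threshold
    true_pos = np.sum(predicted & binary_target)
    precision = true_pos / max(predicted.sum(), 1)
    recall = true_pos / max(binary_target.sum(), 1)
    return float(precision), float(recall)


cls_idx = 1                              # class "dog" in animal_classes
binary_target = targets == cls_idx       # one-vs-rest ground truth
scores = pred_prob[:, cls_idx]           # predicted probability of that class

precision, recall = precision_recall_at(binary_target, scores, threshold=0.5)
print(precision, recall)
```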
    

# <font style="color:blue">Push Wrong Prediction to TensorBoard</font><a name="wrong-preds"></a>

In [26]:
def add_wrong_prediction_to_tensorboard(model, dataloader, device, tb_writer, 
                                        epoch, tag='Wrong_Predictions', max_images='all'):
    """
    Add wrong predicted images to tensorboard.
    """
    #number of images in one row
    num_images_per_row = 8
    im_scale = 3
    
    plot_images = []
    wrong_labels = []
    pred_prob = []
    right_label = []
    
    mean, std = get_mean_std()
    
    for _, (data, target) in enumerate(dataloader):
        
        
        images = data.numpy()
        pred, prob = prediction(model, device, data)
        target = target.numpy()
        indices = pred.astype(int) != target.astype(int)
        
        plot_images.append(images[indices])
        wrong_labels.append(pred[indices])
        pred_prob.append(prob[indices])
        right_label.append(target[indices])
        
    # keep the batch axis even when only one image is misclassified
    plot_images = np.concatenate(plot_images, axis=0)
    plot_images = (np.moveaxis(plot_images, 1, -1) * std) + mean
    wrong_labels = np.concatenate(wrong_labels)
    wrong_labels = wrong_labels.astype(int)
    right_label = np.concatenate(right_label)
    right_label = right_label.astype(int)
    pred_prob = np.concatenate(pred_prob)
    
    
    # count the wrongly predicted images (not the size of the last batch)
    if max_images == 'all':
        num_images = len(plot_images)
    else:
        num_images = min(len(plot_images), max_images)
        
    fig_width = num_images_per_row * im_scale
    
    # integer number of rows (ceiling division), at least one row
    num_row = max(1, -(-num_images // num_images_per_row))
        
    fig_height = num_row * im_scale
        
    plt.style.use('default')
    plt.rcParams["figure.figsize"] = (fig_width, fig_height)
    fig = plt.figure()
    
    for i in range(num_images):
        plt.subplot(num_row, num_images_per_row, i+1, xticks=[], yticks=[])
        plt.imshow((plot_images[i]*255).astype(np.uint8))
        plt.gca().set_title('{0}({1:.2}), {2}'.format(animal_classes[wrong_labels[i]], 
                                                          pred_prob[i], 
                                                          animal_classes[right_label[i]]))
        
    tb_writer.add_figure(tag, fig, global_step=epoch)
    
    return


## <font style="color:green">Training Function</font><a name="train-fn"></a>

The training function follows the standard PyTorch pipeline: forward pass, loss computation, backward pass, and optimizer step.

In [27]:
def train(
    train_config: TrainingConfiguration, model: nn.Module, optimizer: torch.optim.Optimizer,
    train_loader: torch.utils.data.DataLoader, epoch_idx: int, tb_writer: SummaryWriter
) -> tuple:
    
    # set model to training mode
    model.train()
    
    # to get batch loss
    batch_loss = np.array([])
    
    # to get batch accuracy
    batch_acc = np.array([])
        
    for batch_idx, (data, target) in enumerate(train_loader):
        
        # clone target
        indx_target = target.clone()
        # send data to device (it is mandatory if GPU has to be used)
        data = data.to(train_config.device)
        # send target to device
        target = target.to(train_config.device)

        # reset parameters gradient to zero
        optimizer.zero_grad()
        
        # forward pass to the model
        output = model(data)
        
        # cross entropy loss
        loss = F.cross_entropy(output, target)
        
        # find gradients w.r.t training parameters
        loss.backward()
        # Update parameters using gradients
        optimizer.step()
        
        batch_loss = np.append(batch_loss, [loss.item()])
        
        # get probability score using softmax
        prob = F.softmax(output, dim=1)
            
        # get the index of the max probability
        pred = prob.data.max(dim=1)[1]  
                        
        # correct prediction
        correct = pred.cpu().eq(indx_target).sum()
            
        # accuracy
        acc = float(correct) / float(len(data))
        
        batch_acc = np.append(batch_acc, [acc])

        if batch_idx % train_config.log_interval == 0 and batch_idx > 0:
            
            # global step: batches seen in previous epochs plus the current batch index
            total_batch = epoch_idx * len(train_loader) + batch_idx
            tb_writer.add_scalar('Loss/train-batch', loss.item(), total_batch)
            tb_writer.add_scalar('Accuracy/train-batch', acc, total_batch)
            
    epoch_loss = batch_loss.mean()
    epoch_acc = batch_acc.mean()
    return epoch_loss, epoch_acc

## <font style="color:green">Validation Function</font><a name="validate-fn"></a>

In [28]:
def validate(
    train_config: TrainingConfiguration,
    model: nn.Module,
    test_loader: torch.utils.data.DataLoader
) -> tuple:
    # set model to evaluation mode
    model.eval()
    test_loss = 0
    count_correct_predictions = 0
    # gradients are not needed for validation
    with torch.no_grad():
        for data, target in test_loader:
            indx_target = target.clone()
            data = data.to(train_config.device)
            
            target = target.to(train_config.device)
            
            output = model(data)
            # add loss for each mini batch
            test_loss += F.cross_entropy(output, target).item()
            
            # get probability score using softmax
            prob = F.softmax(output, dim=1)
            
            # get the index of the max probability
            pred = prob.max(dim=1)[1]
            
            # add correct prediction count
            count_correct_predictions += pred.cpu().eq(indx_target).sum().item()

    # average over number of mini-batches
    test_loss = test_loss / len(test_loader)
    
    # fraction of correctly classified samples in the dataset
    accuracy = count_correct_predictions / len(test_loader.dataset)
    
    return test_loss, accuracy

# <font style="color:blue">Add Histogram of Weights</font><a name="hist"></a>

In [29]:
def add_model_weights_as_histogram(model, tb_writer, epoch):
    for name, param in model.named_parameters():
        tb_writer.add_histogram(name.replace('.', '/'), param.data.cpu().abs(), epoch)
    return

# <font style="color:blue">Add Network Graph</font><a name="graph"></a>

In [30]:
def add_network_graph_tensorboard(model, inputs, tb_writer):
    tb_writer.add_graph(model, inputs)
    return

## <font style="color:green">Main Function for Training and Validation</font><a name="main"></a>

Use the configuration parameters defined above and start training. Important actions in the code below:

1. Set up system parameters such as the random seed and cuDNN flags, and select CPU or GPU.

2. Load the data using dataloaders.

3. Set up variables to track loss and accuracy and start training.

4. For each epoch, call the train function. For every test interval, call the validation function.

5. Do `scheduler.step()` to update the learning rate for the next epoch.


In [31]:
def main(model, optimizer, tb_writer, scheduler=None, system_configuration=SystemConfiguration(), 
         training_configuration=TrainingConfiguration(), data_augmentation=False):
    
    # system configuration
    setup_system(system_configuration)

    # batch size
    batch_size_to_set = training_configuration.batch_size
    # num_workers
    num_workers_to_set = training_configuration.num_workers
    # epochs
    epoch_num_to_set = training_configuration.epochs_count

    # if GPU is available use training config, 
    # else lower batch_size and num_workers
    if torch.cuda.is_available():
        device = "cuda"
    else:
        device = "cpu"
        batch_size_to_set = 16
        num_workers_to_set = 2

    # data loader
    train_loader, test_loader = get_data(
        batch_size=batch_size_to_set,
        data_root=training_configuration.data_root,
        tb_writer=tb_writer,
        num_workers=num_workers_to_set,
        data_augmentation=data_augmentation
    )
    
    
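    # Update training configuration, keeping the remaining user-supplied settings
    training_configuration = TrainingConfiguration(
        batch_size=batch_size_to_set,
        epochs_count=epoch_num_to_set,
        init_learning_rate=training_configuration.init_learning_rate,
        decay_rate=training_configuration.decay_rate,
        log_interval=training_configuration.log_interval,
        test_interval=training_configuration.test_interval,
        data_root=training_configuration.data_root,
        num_workers=num_workers_to_set,
        device=device
    )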
        
    # send model to device (GPU/CPU)
    model.to(training_configuration.device)
    
    
    # add network graph with inputs info
    images, labels = next(iter(test_loader))
    images = images.to(training_configuration.device)
    add_network_graph_tensorboard(model, images, tb_writer)

    best_loss = torch.tensor(np.inf)
    
    # epoch train/test loss
    epoch_train_loss = np.array([])
    epoch_test_loss = np.array([])
    
    # epoch train/test accuracy
    epoch_train_acc = np.array([])
    epoch_test_acc = np.array([])
    
    add_wrong_prediction_to_tensorboard(model, test_loader, 
                                                training_configuration.device, 
                                                tb_writer, 0, max_images=300)
    
    
    # training time measurement
    t_begin = time.time()
    for epoch in range(training_configuration.epochs_count):
        
        # Train
        train_loss, train_acc = train(training_configuration, model, optimizer, train_loader, epoch, tb_writer)
        
        epoch_train_loss = np.append(epoch_train_loss, [train_loss])
        
        epoch_train_acc = np.append(epoch_train_acc, [train_acc])
        
        # add scalar (loss/accuracy) to tensorboard
        tb_writer.add_scalar('Loss/Train',train_loss, epoch)
        tb_writer.add_scalar('Accuracy/Train', train_acc, epoch)

        elapsed_time = time.time() - t_begin
        speed_epoch = elapsed_time / (epoch + 1)
        speed_batch = speed_epoch / len(train_loader)
        eta = speed_epoch * training_configuration.epochs_count - elapsed_time
        
        # add time metadata to tensorboard
        tb_writer.add_scalar('Time/elapsed_time', elapsed_time, epoch)
        tb_writer.add_scalar('Time/speed_epoch', speed_epoch, epoch)
        tb_writer.add_scalar('Time/speed_batch', speed_batch, epoch)
        tb_writer.add_scalar('Time/eta', eta, epoch)
        

        # Validate
        if epoch % training_configuration.test_interval == 0:
            current_loss, current_accuracy = validate(training_configuration, model, test_loader)
            
            epoch_test_loss = np.append(epoch_test_loss, [current_loss])
        
            epoch_test_acc = np.append(epoch_test_acc, [current_accuracy])
            
            # add scalar (loss/accuracy) to tensorboard
            tb_writer.add_scalar('Loss/Validation', current_loss, epoch)
            tb_writer.add_scalar('Accuracy/Validation', current_accuracy, epoch)
            
            # add scalars (loss/accuracy) to tensorboard
            tb_writer.add_scalars('Loss/train-val', {'train': train_loss, 
                                           'validation': current_loss}, epoch)
            tb_writer.add_scalars('Accuracy/train-val', {'train': train_acc, 
                                               'validation': current_accuracy}, epoch)
            
            if current_loss < best_loss:
                best_loss = current_loss
                
            # add wrong predicted image to tensorboard
            add_wrong_prediction_to_tensorboard(model, test_loader, 
                                                training_configuration.device, 
                                                tb_writer, epoch, max_images=300)
        
        # scheduler step/ update learning rate
        if scheduler is not None:
            scheduler.step()
            
        # adding model weights to tensorboard as histogram
        add_model_weights_as_histogram(model, tb_writer, epoch)
        
        # add pr curves to tensor board
        add_pr_curves_to_tensorboard(model, test_loader, 
                                     training_configuration.device, 
                                     tb_writer, epoch, num_classes=3)
        
                
    print("Total time: {:.2f}, Best Loss: {:.3f}".format(time.time() - t_begin, best_loss))
    
    
    
    return model, epoch_train_loss, epoch_train_acc, epoch_test_loss, epoch_test_acc
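
The timing scalars logged in `main` follow from simple arithmetic on the elapsed time. The sketch below repeats that arithmetic in isolation with made-up numbers; the function name `timing_stats` is an assumption for illustration.

```python
def timing_stats(elapsed_time: float, epoch: int, epochs_count: int, batches_per_epoch: int):
    """Mirror the timing arithmetic in main() for a zero-based epoch index."""
    speed_epoch = elapsed_time / (epoch + 1)          # average seconds per epoch so far
    speed_batch = speed_epoch / batches_per_epoch     # average seconds per batch
    eta = speed_epoch * epochs_count - elapsed_time   # estimated seconds remaining
    return speed_epoch, speed_batch, eta


# Hypothetical numbers: 5 of 20 epochs finished after 100 seconds, 50 batches per epoch.
print(timing_stats(elapsed_time=100.0, epoch=4, epochs_count=20, batches_per_epoch=50))
```

Note that in `main` the elapsed time of later epochs also includes validation and TensorBoard logging from earlier epochs, so the estimate covers the whole loop rather than the training step alone.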

## <font style="color:green">Optimizer and Scheduler</font><a name="optim"></a>

We wrap the creation of the optimizer and the scheduler in a function because every training experiment uses it.

In [32]:
def get_optimizer_and_scheduler(model):
    train_config = TrainingConfiguration()

    init_learning_rate = train_config.init_learning_rate

    # optimizer
    optimizer = optim.SGD(
        model.parameters(),
        lr = init_learning_rate,
        momentum = 0.9
    )

    decay_rate = train_config.decay_rate

    lmbda = lambda epoch: 1/(1 + decay_rate * epoch)

    # Scheduler
    scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lmbda)
    
    return optimizer, scheduler
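
`LambdaLR` multiplies the initial learning rate by the value of the lambda at the current epoch, so the schedule above is `init_lr / (1 + decay_rate * epoch)`. The sketch below evaluates that formula directly, without PyTorch; the function name `decayed_lr` is an assumption for illustration.

```python
def decayed_lr(init_lr: float, decay_rate: float, epoch: int) -> float:
    """Learning rate after `epoch` scheduler steps: init_lr / (1 + decay_rate * epoch)."""
    return init_lr / (1 + decay_rate * epoch)


# Defaults from TrainingConfiguration: init_learning_rate=0.001, decay_rate=0.1.
for epoch in (0, 10, 19):
    print(epoch, decayed_lr(0.001, 0.1, epoch))
```

With these defaults the learning rate halves after 10 scheduler steps and keeps shrinking more slowly afterwards.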
    


# <font style="color:blue">ResNet Model</font><a name="model"></a>

Load the `resnet18` model with its pretrained weights.

If you pass the `transfer_learning` flag, the function freezes all pretrained layers and replaces only the final fully connected layer, so only that new layer is trained. Otherwise, it also replaces the final layer but retrains all the layers, starting from the pretrained weights rather than from scratch.

In [33]:
def pretrained_resnet18(transfer_learning=True, num_class=3):
    resnet = models.resnet18(pretrained=True)
    
    if transfer_learning:
        for param in resnet.parameters():
            param.requires_grad = False
            
    last_layer_in = resnet.fc.in_features
    resnet.fc = nn.Linear(last_layer_in, num_class)
    
    return resnet
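
When most of a network is frozen, one optional variation is to hand the optimizer only the parameters that still require gradients. The notebook itself passes `model.parameters()`, which also works because frozen parameters never receive gradients. The sketch below uses a toy model rather than ResNet-18; the names `frozen`, `new_head`, and `toy_model` are illustrative assumptions.

```python
import torch.nn as nn
import torch.optim as optim

# Toy stand-in for a network whose pre-trained layers are frozen.
frozen = nn.Linear(4, 4)
for param in frozen.parameters():
    param.requires_grad = False
new_head = nn.Linear(4, 3)   # freshly created layers are trainable by default
toy_model = nn.Sequential(frozen, new_head)

# Hand the optimizer only the parameters that still require gradients.
trainable = [p for p in toy_model.parameters() if p.requires_grad]
optimizer = optim.SGD(trainable, lr=0.001, momentum=0.9)

print(sum(p.numel() for p in trainable))
```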

# <font style="color:blue">Transfer Learning</font><a name="tl"></a>


In [34]:
model = pretrained_resnet18(transfer_learning=True)
print(model)
# get optimizer and scheduler
optimizer, scheduler = get_optimizer_and_scheduler(model)

# Tensorboard summary writer
transfer_learning_sw = SummaryWriter('log_resnet18/transfer_learning')   

# train and validate
model, train_loss_exp2, train_acc_exp2, val_loss_exp2, val_acc_exp2 = main(model, 
                                                                           optimizer,
                                                                           transfer_learning_sw,
                                                                           scheduler,
                                                                           data_augmentation=True)
transfer_learning_sw.close()

Downloading: "https://download.pytorch.org/models/resnet18-5c106cde.pth" to C:\Users\merri/.cache\torch\hub\checkpoints\resnet18-5c106cde.pth


  0%|          | 0.00/44.7M [00:00<?, ?B/s]

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

AttributeError: module 'tensorflow._api.v2.io.gfile' has no attribute 'get_filesystem'

# <font style="color:blue">Fine-Tuning</font><a name="fine-tune"></a>


In [37]:
model = pretrained_resnet18(transfer_learning=False)
# print(model)
# get optimizer and scheduler
optimizer, scheduler = get_optimizer_and_scheduler(model)

# Tensorboard summary writer
fine_tuning_sw = SummaryWriter('log_resnet18/fine_tuning')   

model, train_loss_exp9, train_acc_exp9, val_loss_exp9, val_acc_exp9 = main(model, 
                                                                           optimizer, 
                                                                           fine_tuning_sw,
                                                                           scheduler,
                                                                           data_augmentation=True)

fine_tuning_sw.close()

RuntimeError: Caught RuntimeError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "C:\Users\merri\anaconda3\lib\site-packages\torch\utils\data\_utils\worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "C:\Users\merri\anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "C:\Users\merri\anaconda3\lib\site-packages\torch\utils\data\_utils\collate.py", line 83, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "C:\Users\merri\anaconda3\lib\site-packages\torch\utils\data\_utils\collate.py", line 83, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "C:\Users\merri\anaconda3\lib\site-packages\torch\utils\data\_utils\collate.py", line 53, in default_collate
    storage = elem.storage()._new_shared(numel)
  File "C:\Users\merri\anaconda3\lib\site-packages\torch\storage.py", line 135, in _new_shared
    return cls._new_using_filename(size)
RuntimeError: Couldn't open shared file mapping: <torch_15036_2334844122>, error code: <1455>


As discussed earlier, use TensorBoard to display the logged data.