__Finetuning torchvision models__

1. [Load tools](#Load-tools)
1. [Initialize and reshape the networks](#Initialize-and-reshape-the-networks)
    1. [Resnet](#Resnet)
    1. [Alexnet](#Alexnet)
    1. [VGG](#VGG)
    1. [Squeezenet](#Squeezenet)
    1. [Densenet](#Densenet)
    1. [Inception v3](#Inception-v3)
1. [Inputs](#Inputs)
1. [Helper functions](#Helper-functions)
    1. [Model training and validation code](#Model-training-and-validation-code)
    1. [Set model parameters’ .requires_grad attribute](#Set-model-parameters’-.requires_grad-attribute)
    1. [Initialize models](#Initialize-models)
1. [Load data](#Load-data)
1. [Create optimizer](#Create-optimizer)
1. [Run training and validation step](#Run-training-and-validation-step)
1. [Comparison with model trained from scratch](#Comparison-with-model-trained-from-scratch)


# Load tools

<a id = 'Load-tools'></a>

In [45]:
# Standard library and settings
import os
import sys
import time
import copy
import warnings

warnings.simplefilter("ignore")
from IPython.core.display import display, HTML

display(HTML("<style>.container { width:95% !important; }</style>"))

# Data extensions and settings
import numpy as np

np.set_printoptions(threshold=np.inf, suppress=True)
import pandas as pd

pd.set_option("display.max_rows", 500)
pd.set_option("display.max_columns", 500)
pd.options.display.float_format = "{:,.6f}".format

# import PyTorch
import torch
from torch.utils.data import Dataset, DataLoader
import torch.autograd as autograd
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.jit import script, trace
import torchvision
import torchvision.transforms as transforms

# Visualization extensions and settings
import seaborn as sns
import matplotlib.pyplot as plt

# Magic functions
%matplotlib inline

# Initialize and reshape the networks

The initialization of each transfer learning model is slightly different, but in general we need to adapt the last layer of the model to match the number of classes we are trying to predict. The built-in torchvision models were trained on the ImageNet dataset, which has 1,000 classes. The final layer of each of these models therefore has 1,000 output nodes, and we need to replace it so that it works with our dataset.

Further, if we are only feature extracting, then we are only adjusting the weights in the final layer, so we need to turn off the gradients for all other layers. If we are finetuning, then we leave all gradients on.

Lastly, the inception_v3 model requires an input size of (299, 299), and all others expect (224, 224).
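The distinction between feature extraction and finetuning described above can be sketched on a small stand-in network. The layer sizes and the `nn.Sequential` layout below are assumptions chosen for illustration, not one of the torchvision models. Freezing sets `requires_grad = False` on every existing parameter, while a newly constructed layer has `requires_grad = True` by default, so only the replacement head is left trainable.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pretrained network (not a torchvision model).
model = nn.Sequential(
    nn.Linear(16, 8),    # "backbone" layer
    nn.ReLU(),
    nn.Linear(8, 1000),  # original 1,000-class head
)

feature_extract = True
num_classes = 2

# Feature extraction: freeze every existing parameter.
if feature_extract:
    for param in model.parameters():
        param.requires_grad = False

# Replace the head; a new layer has requires_grad=True by default.
model[2] = nn.Linear(model[2].in_features, num_classes)

# Only the parameters of the new head remain trainable.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

With `feature_extract = False` the freezing loop is skipped and every parameter stays trainable, which corresponds to finetuning.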

<a id = 'Initialize-and-reshape-the-networks'></a>

## Resnet

ResNet comes in several sizes that differ in the number of layers: Resnet18, Resnet34, Resnet50, Resnet101 and Resnet152. In this case we will use Resnet18 because the dataset is small and there are only two classes.

<a id = 'Resnet'></a>

In [46]:
# print model
resnet_model = torchvision.models.resnet18()
resnet_model

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Co

In [47]:
# reinitialize the last layer (num_classes is set in the Inputs section)
resnet_model.fc = nn.Linear(512, num_classes)

## Alexnet

<a id = 'Alexnet'></a>

In [48]:
# print model
alexnetModel = torchvision.models.alexnet()
alexnetModel

AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace)
    (3): Dropout(p

In [49]:
# reinitialize the last layer of the classifier
alexnetModel.classifier[6] = nn.Linear(4096, num_classes)

## VGG

<a id = 'VGG'></a>

In [50]:
# print model
vggModel = torchvision.models.vgg11_bn()
vggModel

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace)
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (4): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (5): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (6): ReLU(inplace)
    (7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (8): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (10): ReLU(inplace)
    (11): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (12): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (13): ReLU(inplace)
    (14): MaxPool2d(kernel_size=2, stride=

In [51]:
# reinitialize the last layer of the classifier
vggModel.classifier[6] = nn.Linear(4096, num_classes)

## Squeezenet

<a id = 'Squeezenet'></a>

In [52]:
# print model
squeezenetModel = torchvision.models.squeezenet1_0()
squeezenetModel

SqueezeNet(
  (features): Sequential(
    (0): Conv2d(3, 96, kernel_size=(7, 7), stride=(2, 2))
    (1): ReLU(inplace)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
    (3): Fire(
      (squeeze): Conv2d(96, 16, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(16, 64, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(16, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (4): Fire(
      (squeeze): Conv2d(128, 16, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(16, 64, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(16, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (5): Fire(
      (squeeze): Conv2d(128, 32, kerne

In [53]:
# reinitialize the final 1x1 convolution, which produces the class scores
squeezenetModel.classifier[1] = nn.Conv2d(
    512, num_classes, kernel_size=(1, 1), stride=(1, 1)
)

## Densenet

<a id = 'Densenet'></a>

In [54]:
# print model
densenetModel = torchvision.models.densenet121()
densenetModel

DenseNet(
  (features): Sequential(
    (conv0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (norm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu0): ReLU(inplace)
    (pool0): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (denseblock1): _DenseBlock(
      (denselayer1): _DenseLayer(
        (norm1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace)
        (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      )
      (denselayer2): _DenseLayer(
        (norm1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplac

In [55]:
# reinitialize the classifier layer
densenetModel.classifier = nn.Linear(1024, num_classes)

## Inception v3

This model is unique because it has two output layers in the training phase. The second output is an auxiliary output and is in the AuxLogits section of the network. The primary output is a standard linear layer. In evaluation mode only the primary output is returned.
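How the two outputs are combined during training can be sketched without building the network. The random logits and labels below are made-up stand-ins for what the model would return; the 0.4 weight on the auxiliary loss is the value used in `train_model` later in this notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

num_classes = 2
batch = 4

# Stand-ins for the two outputs Inception v3 returns in training mode.
outputs = torch.randn(batch, num_classes)      # primary logits
aux_outputs = torch.randn(batch, num_classes)  # auxiliary logits
labels = torch.tensor([0, 1, 1, 0])

criterion = nn.CrossEntropyLoss()
loss1 = criterion(outputs, labels)
loss2 = criterion(aux_outputs, labels)

# The auxiliary loss is down-weighted before being added to the primary loss.
loss = loss1 + 0.4 * loss2

# Predictions are taken from the primary output only.
_, preds = torch.max(outputs, 1)
print(loss.item(), preds)
```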

<a id = 'Inception-v3'></a>

In [56]:
# print model
inceptionModel = torchvision.models.inception_v3()
inceptionModel

Inception3(
  (Conv2d_1a_3x3): BasicConv2d(
    (conv): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), bias=False)
    (bn): BatchNorm2d(32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
  )
  (Conv2d_2a_3x3): BasicConv2d(
    (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
  )
  (Conv2d_2b_3x3): BasicConv2d(
    (conv): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
  )
  (Conv2d_3b_1x1): BasicConv2d(
    (conv): Conv2d(64, 80, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(80, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
  )
  (Conv2d_4a_3x3): BasicConv2d(
    (conv): Conv2d(80, 192, kernel_size=(3, 3), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, t

In [57]:
# reinitialize both the auxiliary and the primary output layers
inceptionModel.AuxLogits.fc = nn.Linear(768, num_classes)
inceptionModel.fc = nn.Linear(2048, num_classes)

# Inputs

<a id = 'Inputs'></a>

In [58]:
# image file directory that conforms to ImageFolder structure
data_dir = "C:/Users/petersont/Desktop/data/hymenoptera_data"

# models to choose from [resnet, alexnet, vgg, squeezenet, densenet, inception]
model_name = "squeezenet"

# number of classes to predict
num_classes = 2

# batch size
batch_size = 8

# number of epochs to train for
num_epochs = 15

# flag to indicate whether we are feature extracting or fine tuning
feature_extract = True

# Helper functions

<a id = 'Helper-functions'></a>

## Model training and validation code

<a id = 'Model-training-and-validation-code'></a>

In [65]:
# training and validation loop that keeps the weights with the best validation accuracy
import copy


def train_model(
    model, dataloaders, criterion, optimizer, num_epochs=25, is_inception=False
):
    since = time.time()

    val_acc_history = []

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print("Epoch {} / {}".format(epoch, num_epochs - 1))
        print("-" * 25)

        # each epoch has a training and validation phase
        for phase in ["train", "val"]:
            if phase == "train":
                model.train()
            else:
                model.eval()

            running_loss = 0.0
            running_corrects = 0

            # iterate over data
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the gradients
                optimizer.zero_grad()

                # forward pass
                # track history when in training mode
                with torch.set_grad_enabled(phase == "train"):
                    if is_inception and phase == "train":
                        outputs, aux_outputs = model(inputs)
                        loss1 = criterion(outputs, labels)
                        loss2 = criterion(aux_outputs, labels)
                        loss = loss1 + 0.4 * loss2
                    else:
                        outputs = model(inputs)
                        loss = criterion(outputs, labels)

                    _, preds = torch.max(outputs, 1)

                    # backward pass in training phase
                    if phase == "train":
                        loss.backward()
                        optimizer.step()

                # capture running statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            epoch_acc = running_corrects.double() / len(dataloaders[phase].dataset)

            print("{} Loss: {:.4f} Acc: {:.4f}".format(phase, epoch_loss, epoch_acc))

            # deep copy the model when validation accuracy improves
            if phase == "val" and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
            # record validation accuracy for every epoch
            if phase == "val":
                val_acc_history.append(epoch_acc)

        print()

    time_elapsed = time.time() - since
    print(
        "Training complete in {:.0f}m {:.0f}s".format(
            time_elapsed // 60, time_elapsed % 60
        )
    )
    print("Best val accuracy: {:.4f}".format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model, val_acc_history
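The running statistics in `train_model` can be checked in isolation. This is a minimal sketch that uses made-up logits and labels in place of real batches. Because `CrossEntropyLoss` returns the mean over a batch, each batch loss is multiplied by the batch size before summing, so that dividing by the dataset size gives the per-sample average even when the last batch is smaller.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
criterion = nn.CrossEntropyLoss()

# Hypothetical "dataset" of 10 samples, split into batches of 4, 4 and 2.
logits = torch.randn(10, 2)
labels = torch.randint(0, 2, (10,))

running_loss = 0.0
running_corrects = 0
for start in range(0, 10, 4):
    batch_logits = logits[start:start + 4]
    batch_labels = labels[start:start + 4]
    loss = criterion(batch_logits, batch_labels)  # mean over the batch
    _, preds = torch.max(batch_logits, 1)         # index of the largest logit
    running_loss += loss.item() * batch_logits.size(0)
    running_corrects += torch.sum(preds == batch_labels).item()

epoch_loss = running_loss / 10
epoch_acc = running_corrects / 10
print(epoch_loss, epoch_acc)
```

The weighted sum reproduces the loss computed over all 10 samples at once, which is what the epoch-level figure is meant to report.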

## Set model parameters’ .requires_grad attribute

<a id = 'Set-model-parameters’-.requires_grad-attribute'></a>

In [66]:
# freeze all parameters when feature extracting; leave them trainable when finetuning
def set_parameter_requires_grad(model, feature_extracting):
    if feature_extracting:
        for param in model.parameters():
            param.requires_grad = False

## Initialize models

<a id = 'Initialize-models'></a>

In [67]:
# load the chosen model, optionally freeze it, and replace its output layer
def initialize_model(model_name, num_classes, feature_extract, use_pretrained=True):

    model_ft = None
    input_size = 0

    if model_name == "resnet":
        model_ft = torchvision.models.resnet18(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.fc.in_features
        model_ft.fc = nn.Linear(num_ftrs, num_classes)
        input_size = 224
    elif model_name == "alexnet":
        model_ft = torchvision.models.alexnet(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier[6].in_features
        model_ft.classifier[6] = nn.Linear(num_ftrs, num_classes)
        input_size = 224
    elif model_name == "vgg":
        model_ft = torchvision.models.vgg11_bn(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier[6].in_features
        model_ft.classifier[6] = nn.Linear(num_ftrs, num_classes)
        input_size = 224
    elif model_name == "squeezenet":
        model_ft = torchvision.models.squeezenet1_0(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        model_ft.classifier[1] = nn.Conv2d(
            512, num_classes, kernel_size=(1, 1), stride=(1, 1)
        )
        model_ft.num_classes = num_classes
        input_size = 224
    elif model_name == "densenet":
        model_ft = torchvision.models.densenet121(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier.in_features
        model_ft.classifier = nn.Linear(num_ftrs, num_classes)
        input_size = 224
    elif model_name == "inception":
        model_ft = torchvision.models.inception_v3(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)

        # auxiliary net
        num_ftrs = model_ft.AuxLogits.fc.in_features
        model_ft.AuxLogits.fc = nn.Linear(num_ftrs, num_classes)

        # primary net
        num_ftrs = model_ft.fc.in_features
        model_ft.fc = nn.Linear(num_ftrs, num_classes)

        input_size = 299
    else:
        raise ValueError("Unknown model name: {}".format(model_name))
    return model_ft, input_size


# initialize the model
model_ft, input_size = initialize_model(
    model_name, num_classes, feature_extract, use_pretrained=True
)
print(model_ft)

SqueezeNet(
  (features): Sequential(
    (0): Conv2d(3, 96, kernel_size=(7, 7), stride=(2, 2))
    (1): ReLU(inplace)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
    (3): Fire(
      (squeeze): Conv2d(96, 16, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(16, 64, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(16, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (4): Fire(
      (squeeze): Conv2d(128, 16, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(16, 64, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(16, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (5): Fire(
      (squeeze): Conv2d(128, 32, kerne

# Load data

The input_size returned by initialize_model is passed to several of the data transforms. The normalization values are hard-coded: they are the per-channel mean and standard deviation of ImageNet, which the pretrained torchvision models expect.
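The effect of the normalization step can be sketched with plain tensor arithmetic. The random image below is a made-up stand-in for the output of `transforms.ToTensor()`; the subtraction and division are the same per-channel operation that `transforms.Normalize` applies.

```python
import torch

# Per-channel statistics used in the transforms (ImageNet mean and std),
# reshaped so they broadcast over height and width.
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

# Hypothetical image tensor in [0, 1], shaped (channels, height, width).
image = torch.rand(3, 224, 224)

# Same arithmetic as transforms.Normalize(mean, std).
normalized = (image - mean) / std

# An image equal to the channel means maps to all zeros.
mean_image = mean.expand(3, 224, 224)
print(((mean_image - mean) / std).abs().max())
```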

<a id = 'Load-data'></a>

In [68]:
# augmentation and normalization for training; resizing and normalization for validation
data_transforms = {
    "train": transforms.Compose(
        [
            transforms.RandomResizedCrop(input_size),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
        ]
    ),
    "val": transforms.Compose(
        [
            transforms.Resize(input_size),
            transforms.CenterCrop(input_size),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
        ]
    ),
}

data_dir = "C:/Users/petersont/Desktop/data/hymenoptera_data"
image_datasets = {
    x: torchvision.datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x])
    for x in ["train", "val"]
}
dataloaders_dict = {
    x: torch.utils.data.DataLoader(
        image_datasets[x], batch_size=batch_size, shuffle=True, num_workers=4
    )
    for x in ["train", "val"]
}
dataset_sizes = {x: len(image_datasets[x]) for x in ["train", "val"]}
class_names = image_datasets["train"].classes
device = "cpu"

# Create optimizer

<a id = 'Create-optimizer'></a>

In [69]:
# send model to device
model_ft = model_ft.to(device)

# when finetuning, every parameter is passed to the optimizer;
# when feature extracting, only parameters with requires_grad=True are passed
params_to_update = model_ft.parameters()
print("params to learn: ")
if feature_extract:
    params_to_update = []
    for name, param in model_ft.named_parameters():
        if param.requires_grad:
            params_to_update.append(param)
            print("\t", name)
else:
    for name, param in model_ft.named_parameters():
        if param.requires_grad:
            print("\t", name)

# optimize only the selected parameters
optimizer_ft = optim.SGD(params_to_update, lr=0.001, momentum=0.9)

params to learn: 
	 classifier.1.weight
	 classifier.1.bias


# Run training and validation step

<a id = 'Run-training-and-validation-step'></a>

In [70]:
# loss function
criterion = nn.CrossEntropyLoss()

# train and evaluate
model_ft, hist = train_model(
    model_ft,
    dataloaders_dict,
    criterion,
    optimizer_ft,
    num_epochs=num_epochs,
    is_inception=(model_name == "inception"),
)

Epoch 0 / 14
-------------------------
train Loss: 0.5857 Acc: 0.7213
val Loss: 0.3890 Acc: 0.8693

Epoch 1 / 14
-------------------------
train Loss: 0.3479 Acc: 0.8648
val Loss: 0.2603 Acc: 0.9085

Epoch 2 / 14
-------------------------
train Loss: 0.2623 Acc: 0.8893
val Loss: 0.2842 Acc: 0.8954

Epoch 3 / 14
-------------------------
train Loss: 0.2089 Acc: 0.8811
val Loss: 0.2961 Acc: 0.9346

Epoch 4 / 14
-------------------------
train Loss: 0.2975 Acc: 0.8893
val Loss: 0.2724 Acc: 0.9150

Epoch 5 / 14
-------------------------
train Loss: 0.2425 Acc: 0.8975
val Loss: 0.2673 Acc: 0.9216

Epoch 6 / 14
-------------------------
train Loss: 0.1890 Acc: 0.9139
val Loss: 0.2936 Acc: 0.9150

Epoch 7 / 14
-------------------------
train Loss: 0.1936 Acc: 0.9221
val Loss: 0.2429 Acc: 0.9412

Epoch 8 / 14
-------------------------
train Loss: 0.1598 Acc: 0.9344
val Loss: 0.2977 Acc: 0.9346

Epoch 9 / 14
-------------------------
train Loss: 0.2058 Acc: 0.9098
val Loss: 0.2610 Acc: 0.9346



# Comparison with model trained from scratch

<a id = 'Comparison-with-model-trained-from-scratch'></a>
