**This notebook walks through the steps of applying transfer learning for image recognition.  Pytorch has many pretrained models that can be adapted to a given application.  The pretrained models simply need to be loaded and adapated for the given application.**

**For this case, ten objects will be classified from the CIFAR-10 (https://www.cs.toronto.edu/~kriz/cifar.html) data set.  The pretrained Inception-v3 (https://arxiv.org/abs/1512.00567) model will be adapted to classify these images.**

**The ten objects to be classified are:**

*   airplane
*   automobile
*   bird
*   cat
*   deer
*   dog
*   frog
*   horse
*   ship
*   truck

**Note that truck here refers to large semi type trucks.**

**First load the necessary packages.  Note that 'models' is being imported from 'torchvision'.  This gives access to many pretrained models.  Pretrained models can be obtained elsewhere as well but this package gives plenty of options for our purposes.**




In [0]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt

import numpy as np

import torch
from torch import nn
from torch import optim
import torch.nn.functional as F

import torchvision
from torchvision import datasets, transforms, models

#from collections import OrderedDict

**The CIFAR-10 data set is included in the `torchvision.datasets` module.  It is loaded below.**

**Images need to undergo transformations before being loaded.  First the transforms for the training set are defined.**

**The random rotation aids in training.  It will allow the network to see the image from different angles during each training pass.  This will help the network to generalize.  Randomness is also added to the cropping step.  The inception v3 net is unique in that it expects size 299 (299x299 pixels).  Most models take size 224 but sometimes it is fun to be different.  A random horizontal flip is added to further aid in generalization and the normalization is defined to fit what the pretrained network expects.**

In [0]:
train_transforms = transforms.Compose([transforms.RandomRotation(30),
                                       transforms.RandomResizedCrop(299),
                                       transforms.RandomHorizontalFlip(),
                                       transforms.ToTensor(),
                                       transforms.Normalize([0.485, 0.456, 0.406],
                                                            [0.229, 0.224, 0.225])])


**No randomness is added to the test set as accuracy should be graded on the true images.  However, they are resized and then cropped to fit what the net expects.**

In [0]:
test_transforms = transforms.Compose([transforms.Resize(320),
                                      transforms.CenterCrop(299),
                                      transforms.ToTensor(),
                                      transforms.Normalize([0.485, 0.456, 0.406],
                                                           [0.229, 0.224, 0.225])])


**The train and test set are loaded below using the data loader utility.  Batches of 64 images at a time will be passed through the model.  The training set will be randomly shuffled to prevent the net from picking up patterns based on the order images are seen.  This is not necessary for the test set.  Notice that the train and test transforms are applied at this step.**  

In [0]:
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=train_transforms)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=test_transforms)
testloader = torch.utils.data.DataLoader(testset, batch_size=64)



Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz
Files already downloaded and verified


**The classes are defined listed below in the order they are indexed (alphabetically).**

In [0]:
classes = ('airplane', 'automobile', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

**The inception v3 model is loaded and described below.  It is pretrained but the output configuration needs to be determined in order to properly attach new layers.  Only 10 output layers are needed for this classification problem whereas the pretrained model outputs 1,000.  The key here is to recognize that the last layer takes 2,048 features and is named as (fc).  A small neural net will basically be attached to this layer and a new output layer will be constructed.  Syntactically this new small net will be named as (fc) and will replace the old output layer.**

In [0]:
model = models.inception_v3(pretrained = True)
model

Downloading: "https://download.pytorch.org/models/inception_v3_google-1a9a5a14.pth" to /root/.torch/models/inception_v3_google-1a9a5a14.pth
108857766it [00:01, 90671655.86it/s]


Inception3(
  (Conv2d_1a_3x3): BasicConv2d(
    (conv): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), bias=False)
    (bn): BatchNorm2d(32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
  )
  (Conv2d_2a_3x3): BasicConv2d(
    (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
  )
  (Conv2d_2b_3x3): BasicConv2d(
    (conv): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
  )
  (Conv2d_3b_1x1): BasicConv2d(
    (conv): Conv2d(64, 80, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(80, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
  )
  (Conv2d_4a_3x3): BasicConv2d(
    (conv): Conv2d(80, 192, kernel_size=(3, 3), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, t

**The first step in constructing a custom net from a pretrained net is to freeze the pretrained model.  New layers will be trained but the everything about the pretrained model being leveraged should remain the same.  Therefore the gradients are turned off.**

In [0]:
for param in model.parameters():
    param.requires_grad = False

**Now the old output layer will be replaced with the small net defined below.  It will start with a layer that takes 2,048 features as its input and it will output a log probability for each class.**

**Two fully connected rectified linear (ReLU) layers are built.  The first takes in the 2,048 features the model currently inputs to its output layer and instead outputs 500 features.  The next layer inputs those 500 and outputs 250 features.  Finally, those 250 features are converted into 10 outputs with a log softmax layer.  Note that dropout of .2 is used for both ReLU layers to combat overfitting.  This image classifier is trained to recognize 1,000 different image classes and this application only needs to classify 10**

**A good exercise would be to play with this architecture and see how it affects final accuracy.  Only a few tests were done on different configurations due to a limited gpu budget and time constraints (full time job and family).**

In [0]:
model.fc = nn.Sequential(nn.Linear(2048, 500),
                                 nn.ReLU(),
                                 nn.Dropout(.2), 
                                 nn.Linear(500, 250),
                                 nn.ReLU(),
                                 nn.Dropout(.2), 
                                 nn.Linear(250, 10),
                                 nn.LogSoftmax(dim=1))          

**Negative log likelihood loss is chosen as the cost function.  Adam is chosen as the optimizer.  Documentation on the Adam optimizer is easy to find but outside the scope of this tutorial.  At this point, it is sufficient to know that it is basically a "souped up" version of gradient descent.**

**Note the `to("cuda")` command when defining the loss function.  This is the first of a few objects that needs to be moved to the gpu to drastically speed up training.

In [0]:
criterion = nn.NLLLoss().to("cuda")
optimizer = optim.Adam(model.fc.parameters(), lr=0.001)

**Now the model is ready to be trained.  Loss is being tracked after every batch just for fun.**

**Much of the code below is explained in the 'income_model_binary_nn.ipynb' notebook in the same repo as this notebook.  This code is tracked differently simply for demonstration purposes and because it is fun to watch the model work (at least for me).**

**A big difference worth addressing is following two lines:**

                  `logps_t = model.forward(images)
                   logps = logps_t[0] #inception v3 outputs a tuple in train mode`
                    
**The code above is necessary because the inception v3 model outputs a tuple containing two tensors.  Position `[0]` in the tensor is the log proabilities that are needed to make predictions.  Position `[1]` contains the auxiliary logits with dimesionality equal to the original model's output (1,000).  That output is not needed so the log probabilities are stored and the auxiliary logits are ignored.**

**A fun exercise would be to store the batch and loss and make some plots that (hopefully) show the loss decreasing.  This is usually tracked across epochs but that takes a lot longer to run and gpu time can be limited for most people.**

In [0]:
model = model.to("cuda") #model is also moved to the gpu

In [0]:
epochs = 2
batch = 0

for epoch in range(epochs):
     
    running_loss = 0
    
    for images, labels in trainloader:
      
        #images and labels are moved to gpu as well
        images = images.to("cuda")
        labels = labels.to("cuda")
        
        batch += 1
        
        model.train()
        
        optimizer.zero_grad()
        
        logps_t = model.forward(images)
        logps = logps_t[0] #inception v3 outputs a tuple in train mode
        loss = criterion(logps, labels)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
        
#         print(f"Epoch {epoch+1} Batch {batch}.. "
#               f"Train loss: {train_loss:.3f}.. ")

    train_loss = running_loss/batch

    print(f"Epoch {epoch+1}.. "
          f"Train loss: {train_loss:.3f}.. ")
            

**Since this model has only been run for one epoch, the full model can be saved and training resumed at a later time.  Code to load the model is provided below as well.**

**First the google drive will be mounted and then the model will be saved in a folder called 'models' in the google drive.  The mounting code only needs to be run once per session so comment it out after running if models will be saved/loaded multiple times.**

In [0]:
# from google.colab import drive
# drive.mount('/content/gdrive')

**The model has been named 'inception_v3_transfer' and stored on the drive.  It can be loaded with the code that is currently commented out.**


In [0]:
# model_name = 'inception_v3_transfer.pt'
# path = F'/content/gdrive/My Drive/models/{model_name}'
# torch.save(model, path)

# model = torch.load('/content/gdrive/My Drive/models/inception_v3_transfer.pt')


**Predictions are made below.  Loss and accuracy are both tracked but should not change by batch outside of some random variation.  The model has been switched to evaluation mode so that it will remain static.  In regular practice, accuracy for each batch would be stored and aggregated to get an overall accuracy metric.**

**Note that in evaluation mode, this model now only outputs the tensor of log probabilities.  This is because the auxiliary logits are only needed for training.  And since log probabilities are less useful, the `torch.exp()` function is used to convert the output to probabilities.**

**The predictions and labels are also being stored to do some fun analysis.  `preds` and `true` will store the predictions and true labels, respectively, as a list of tensors.  Data manipulations will be performed to make it possible to get some specific information about model performance.**

In [0]:
batch = 0

preds = []
true = []

with torch.no_grad():
    model.eval()
    for images, labels in testloader:
      
      images = images.to("cuda")
      labels = labels.to("cuda")
      
      batch += 1
      
      logps = model.forward(images)
      test_loss = criterion(logps, labels)

      ps = torch.exp(logps)
      top_p, top_class = ps.topk(1, dim=1)
      equals = top_class == labels.view(*top_class.shape)
      accuracy = torch.mean(equals.type(torch.FloatTensor)).item()
      
      true.append([labels.view(*top_class.shape).view(64)])
      preds.append([top_class.view(64)])
    

      print(f"Batch {batch}.. "
            f"Test loss: {test_loss:.3f}.. "
            f"Accuracy: {accuracy}.. ")

**First note that accuracy is roughly 55% across all batches.  This is not bad considering the minimal amount of work needed to get this running.  Even better when considering the minimal amount of optimization performed to get the best configuration and hyperparamters.  Keep in mind, a random guess would give 10% accuracy.  5x improvement over random is not bad for one weekend of work.**

**This is the power of transfer learning.  If somone else has already done the heavy lifting, there is no need to go through all that yourself.  It is time consuming and expensive to architect very deep neural networks.  If someone has already incurred that expense, nobody else has to start from scratch again.**

**With all that being said, overall accuracy does not tell the whole story.  Is the model better on some classes than others?  When the model misses, is there systematic bias?  This will be interesting to evaluate.**

**First the output will be converted from lists to one dimensional tensors.  Those tensors will then be converted to arrays so that a convenient built-in function can be used.  Here the data is renamed at each step to allow tracking of the different data objects created.  It is fine to overwrite the old data structures with the new ones if the user is confident.**

In [0]:
# initialize empty tensors and make tensor conversions

true_tensor = torch.empty(len(true), 64)
preds_tensor = torch.empty(len(true), 64)

for i in range(len(true)):
  true_tensor[i] = torch.stack(true[i])
  preds_tensor[i] = torch.stack(preds[i])

# convert the tensor to 1-d
true_1d = true_tensor.view(true_tensor.shape[0]*true_tensor.shape[1])
preds_1d = preds_tensor.view(preds_tensor.shape[0]*preds_tensor.shape[1])

# convert the 1-d tensors to 1-d numpy arrays
true_class = true_1d.numpy()
predicted_class = preds_1d.numpy()


**Now a confusion matrix will be constructed.  A function is defined that can convert the raw counts to either precision or recall or simply keep the raw counts as a default.  Filling the matrix with either precision or recall sometimes gives better context than raw counts.  Typically when raw counts are displayed, most people start scrambling to convert to those percentages anyway.  However, viewing the raw counts is very useful as well.**

**It should be mentioned, the author has used the terms precision and recall loosely here as those would only technically be the diagonals of the outputted matrices.  These terms are used for brevity and hopefully the reader takes away the idea behind displaying and evaluating these values.**

**That being said, the confustion matrix will serve two purposes.  First, instead of showing overall accuracy, this will give an idea of which classes the model is better at classifying.  This could lead to many adjustments both within the model itself or in how the model is used.  Second, it shows what is happening when the model "misses".  No model is perfect but some are useful.  Knowing the limitations and biases inherent in a model will make it far more useful.**

**The function is defined below.  Explanation is given within the comments of the function.**

In [0]:
def cm_metrics(matrix_of_confusion, classes, metric = 'raw'):
    """
    This function takes returns a confusion matrix with recall or precision 
    instead of raw counts.  Recall is the number of correct predictions 
    divided by the number of true examples for a given class.  Precision
    is the number of correct prections divided by the number predicted
    for a given class.  This function assumes that true labels are 
    represented by the x-axis and predictions are represented by the y-axis.
    
    inputs:
        matrix_ of_confusion: a numpy array (output of 
        sklearn.metrics.confusion_matrix()) and outputs a 

        classes: a list of class labels in the order they appear in 
        the confusion matrix (order of indexes)
        
        metric: either 'precision' or 'recall'
              
    returns: a pandas dataframe with the requested matrix
    """
    
    # ensure pandas is imported
    
    import pandas as pd
    
    # compute the correct matrix given the desired output
    
    if metric == 'recall':
        row_sums = matrix_of_confusion.sum(axis = 1)
        return_matrix = matrix_of_confusion/row_sums[:, None] 
    
    elif metric == 'precision':
        col_sums = matrix_of_confusion.sum(axis = 0)
        return_matrix = matrix_of_confusion/col_sums[None, :] 
        
    elif metric == 'raw':
        return_matrix = matrix_of_confusion
    
    #convert output to dataframe with classes displayed
    
    cmdf = pd.DataFrame(return_matrix)
    cmdf.columns = classes
    
    cl = pd.DataFrame(classes)
    cl.columns = ['Classes']
    
    return_df = pd.concat([cl, cmdf], axis = 1)
    
    return return_df
        

**The tuple of classes needs to be converted to a list and the builtin confusion matrix function run and stored.  Then the function is run on recall and precision in that order.  A perfect model would have all 1s on the diagonal.  The rest of the entries will require careful observation to understand how the model misses.**



In [0]:
# convert the classes tuple to a list
cl = list(classes)

# run built in function and store confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(true_class, predicted_class)


In [0]:
cm_metrics(cm, cl)


Unnamed: 0,Classes,airplane,automobile,bird,cat,deer,dog,frog,horse,ship,truck
0,airplane,654,31,29,3,10,1,4,33,232,1
1,automobile,28,847,2,5,1,0,14,21,64,17
2,bird,179,18,352,32,111,18,122,123,39,5
3,cat,59,43,64,247,28,103,204,177,45,27
4,deer,29,4,55,10,451,1,178,236,29,7
5,dog,34,24,48,85,32,398,59,261,40,16
6,frog,12,6,51,27,21,1,841,25,12,4
7,horse,49,9,22,9,56,16,21,777,21,17
8,ship,74,33,5,4,9,3,10,12,839,8
9,truck,45,490,2,6,2,2,10,64,103,276


In [0]:
cm_metrics(cm, cl, metric = 'recall')



Unnamed: 0,Classes,airplane,automobile,bird,cat,deer,dog,frog,horse,ship,truck
0,airplane,0.655311,0.031062,0.029058,0.003006,0.01002,0.001002,0.004008,0.033066,0.232465,0.001002
1,automobile,0.028028,0.847848,0.002002,0.005005,0.001001,0.0,0.014014,0.021021,0.064064,0.017017
2,bird,0.179179,0.018018,0.352352,0.032032,0.111111,0.018018,0.122122,0.123123,0.039039,0.005005
3,cat,0.059178,0.043129,0.064193,0.247743,0.028084,0.10331,0.204614,0.177533,0.045135,0.027081
4,deer,0.029,0.004,0.055,0.01,0.451,0.001,0.178,0.236,0.029,0.007
5,dog,0.034102,0.024072,0.048144,0.085256,0.032096,0.399198,0.059178,0.261785,0.04012,0.016048
6,frog,0.012,0.006,0.051,0.027,0.021,0.001,0.841,0.025,0.012,0.004
7,horse,0.049147,0.009027,0.022066,0.009027,0.056169,0.016048,0.021063,0.779338,0.021063,0.017051
8,ship,0.074223,0.033099,0.005015,0.004012,0.009027,0.003009,0.01003,0.012036,0.841525,0.008024
9,truck,0.045,0.49,0.002,0.006,0.002,0.002,0.01,0.064,0.103,0.276


In [0]:
cm_metrics(cm, cl, metric = 'precision')


Unnamed: 0,Classes,airplane,automobile,bird,cat,deer,dog,frog,horse,ship,truck
0,airplane,0.562339,0.020598,0.046032,0.007009,0.01387,0.001842,0.002734,0.019086,0.162921,0.002646
1,automobile,0.024076,0.562791,0.003175,0.011682,0.001387,0.0,0.009569,0.012146,0.044944,0.044974
2,bird,0.153912,0.01196,0.55873,0.074766,0.153953,0.033149,0.08339,0.071139,0.027388,0.013228
3,cat,0.050731,0.028571,0.101587,0.577103,0.038835,0.189687,0.13944,0.102371,0.031601,0.071429
4,deer,0.024936,0.002658,0.087302,0.023364,0.62552,0.001842,0.121668,0.136495,0.020365,0.018519
5,dog,0.029235,0.015947,0.07619,0.198598,0.044383,0.732965,0.040328,0.150954,0.02809,0.042328
6,frog,0.010318,0.003987,0.080952,0.063084,0.029126,0.001842,0.574846,0.014459,0.008427,0.010582
7,horse,0.042132,0.00598,0.034921,0.021028,0.07767,0.029466,0.014354,0.449393,0.014747,0.044974
8,ship,0.063629,0.021927,0.007937,0.009346,0.012483,0.005525,0.006835,0.00694,0.589185,0.021164
9,truck,0.038693,0.325581,0.003175,0.014019,0.002774,0.003683,0.006835,0.037016,0.072331,0.730159


**The observations below are made from viewing the confusion matrix pasted below.  It was generated after running only 12 test batches through the model.  It is meant to demonstrate the means by which the results whould be interpreted.  If a users results differ, please apply the concepts and not this direct knowledge.  Time permitting, more results can be generated to see if these trends hold.**

**UPDATE:   After 120 batches, the trends still hold.  The model thinks that frogs resemble cats moreso than dogs.  This is not surprising, as the model had no opportunity to improve across test batches.  But it is good to account for small sample variation before making blanket statements about trends.**

![](http://i67.tinypic.com/2dbku1w.png)

**As one might expect, the model frequently confuses the truck for an automobile.  When it sees a truck, it is almost as likely to think it saw an automobile as a truck.  However, when it thinks it sees a truck, it is usually correct.  Somewhat conversely, if it sees an automobile, it is very likely to identify it correctly.  However, it also falsely predicts that many trucks are also automobiles.**

**This makes sense if you look at how often the model predicts automobile versus truck.  It predicts automobile a lot more than the number present in the test set.  In short, it overpredicts automobile and underpredicts truck.**

**As one might also expect, the model confuses ships with airplanes and dogs with cats.  Most would have probably expected this outcome.**

**What is somewhat surprising is that if the model sees a cat but gets it wrong, it predicts frog over twice as much as dog.  It still predicts cat most often when it sees a cat.  That is the good news.  But the model thinks frogs look more like cats than dogs do.**

**Knowing the strengths and weaknesses of a model is very important for two main reasons:**

**The first reason is that the model can then be used effectively in its current state.  Assigning a degree of confidence to different types of predictions can help to guide decisions made using the model's output.  It may cause someone to ignore certain types of predictions and/or take swift action based on others.  Ignorantly regarding every type of output as the same is a big mistake often made by relatively smart people.  Take the time to learn nuances of any model.**

**The other reason it is important is because knowledge of the current state can help accelerate development and get the model to a better state.  This could mean changing the inputs, architecture, hyperparameters, etc. of the model and retraining.  In this case, simply training longer would probably yield benefits.  It can also mean treating the probabilities differently.  In the above example automobile is predicted often when the model should be predicting truck.  Perhaps try shifting some of the automobile predictions with high truck probability to a truck assignment and assess accuracy.  There is no law stating that the highest probability gets the prediction.  It is usually the best method.  But there are exceptions.**