# Training a Deep Learning Image Classifier Using Transfer Learning

### Presented by: [Ivan Hernandez, Ph.D.](http://ivanhernandez.com), Virginia Tech


#### To run the code in each cell: 
- #### Click inside the cell
- #### Hold down Ctrl and then press Enter
- #### An Asterisk (*) should appear in the top left corner of the cell when the code is running
- #### Output should appear when the code has successfully run
- #### If you want to stop execution of the code, press the stop sign (■) at the top menu bar


# Import the Data

The first step of training a deep learning classifier is importing data.

We assume the following:
- Images are saved in a folder called "images" within the same directory as your code file.
- Images within that folder are saved in different folders corresponding to their classification (e.g., "female", "male").
- Images are saved at a size larger than 224 x 224 pixels.
- Images are saved as JPEG or PNG files.
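
As a quick check of the layout assumed above, the following sketch lists the class folders and counts the image files in each. It is illustrative only: the helper name `count_images_per_class` is hypothetical, and the example builds a small temporary folder of empty placeholder files so that it runs without your data. Pass "images" instead to inspect your own dataset.

```python
import os
import tempfile

def count_images_per_class(root):
    """Return {class_folder_name: number_of_jpeg_or_png_files} for each subfolder of root."""
    counts = {}
    for name in sorted(os.listdir(root)):
        folder = os.path.join(root, name)
        if os.path.isdir(folder):
            files = [f for f in os.listdir(folder)
                     if f.lower().endswith((".jpg", ".jpeg", ".png"))]
            counts[name] = len(files)
    return counts

# Build a tiny stand-in for the "images" folder (empty placeholder files, not real images)
demo_root = tempfile.mkdtemp()
for class_name, n_files in [("female", 3), ("male", 2)]:
    os.makedirs(os.path.join(demo_root, class_name))
    for i in range(n_files):
        open(os.path.join(demo_root, class_name, f"{i}.jpg"), "w").close()

demo_counts = count_images_per_class(demo_root)
print(demo_counts)
```

A class folder with a count of zero, or a count far smaller than the others, is worth fixing before training.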

We first import the libraries required to load the data, including:
- torch (contains libraries to construct and train neural networks)
- torchvision (contains libraries to load and transform images into data for neural networks)


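We create a normalization function that standardizes each image color channel (red, green, blue) by subtracting a fixed per-channel mean and dividing by a fixed per-channel standard deviation. The values provided are the ImageNet channel statistics, which are the ones most commonly used with models pretrained on ImageNet.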
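
To make the arithmetic concrete, here is a minimal sketch of what normalization does to a single pixel, written in plain Python rather than with torchvision. The helper `normalize_pixel` and the mid-gray example pixel are hypothetical; the means and standard deviations are the ones used in this notebook.

```python
# Per-channel statistics used by the notebook's Normalize transform
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """Apply (value - mean) / std to one pixel whose channels are already scaled to [0, 1]."""
    return [(value - m) / s for value, m, s in zip(rgb, mean, std)]

# A hypothetical mid-gray pixel (ToTensor scales 0-255 values to the 0-1 range first)
gray = [0.5, 0.5, 0.5]
normalized_gray = normalize_pixel(gray)
print(normalized_gray)
```

Each channel is shifted and scaled independently, so the same gray value ends up as a slightly different number in the red, green, and blue channels.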

We define a transformation function that applies transformations to the images during training to promote generalizability, including:
- Randomly cropping the image (the crop covers between 80% and 100% of the original image area)
- Resizing the crop to a common size of 224 x 224 pixels
- Randomly deciding whether to flip the image horizontally
- Converting the image to a numeric matrix (tensor)

In [None]:
#import the required libraries
import torchvision
import torch
torch.backends.cudnn.deterministic = True
torch.manual_seed(1234)

#create a function that defines the means and standard deviations used to standardize the image's red, green, and blue channels
normalize = torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

#Create a function that applies data transformations to the images used to train the model
transform_data = torchvision.transforms.Compose([
            torchvision.transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
            torchvision.transforms.RandomHorizontalFlip(),
            torchvision.transforms.ToTensor(),
            normalize,
        ])

#Load the images in the folder called "images".
#The "images" folder should have subfolders with names that correspond to the different categories (e.g., "male" and "female")
#Within each category subfolder there should be a collection of images belonging to that category
dataset = torchvision.datasets.ImageFolder('images', transform=transform_data)
dataset

# Break Data into Training, Validation, and Test Sets

Now that we have our data loaded, we need to split it into different sets for training, validation and testing.

<img src = "notebookfiles/train-validate-test.png">

The subsets of data are used in the following ways:
- Training set: provides the model with examples from which it learns the weight values that predict the image labels
- Validation set: used to evaluate the model during training to determine when to stop the training
- Test set: used for a final evaluation of the model; it corrects for the overfitting that would otherwise occur on the validation set, because the validation set is used to select the weights

We need to specify our validation size: the percentage of the original data that we set aside for validation. An equally sized portion is set aside for testing.

For example, if we set it to 10% and had 1000 total images, 100 of them would be used for the validation set, 100 of them would be used for the testing set, and the remaining 800 would be used for training.
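
The arithmetic behind that example can be sketched as follows. The helper name `split_sizes` is hypothetical; the calculation mirrors the one in the code cell for this section (floor the validation amount, then give the rest to training).

```python
import math

def split_sizes(num_images, validation_size):
    """Return (training, validation, test) counts for a given validation fraction."""
    validation_amount = int(math.floor(validation_size * num_images))
    train_amount = num_images - 2 * validation_amount  # whatever is left after validation and test
    return train_amount, validation_amount, validation_amount

sizes = split_sizes(1000, 0.10)
print(sizes)
```

Because the validation amount is rounded down, any leftover images from an uneven split end up in the training set.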

We also need to specify our batch size, which is how we group our images during training. With neural networks, we can pass images through the network in batches rather than one-by-one.

Batch sizes are conventionally powers of 2, which tend to work well with optimized matrix operation libraries. Generally, batch sizes of 32, 64, 128, 256, 512 and 1024 are used while training a neural network.

Using a very large batch size can make the neural network generalize poorly, because large batches tend to converge to sharp minimizers of the training function. Small batch sizes tend to converge to flat minimizers, but they give noisier updates and do not always produce the best results either. Small batches do fit into memory more easily.

You generally want to choose one that is moderately sized, but if you get a memory error, try using a smaller batch. We are using a batch size of 16.
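
One practical consequence of the batch size is the number of weight updates per epoch. The sketch below assumes a hypothetical training set of 800 images and assumes the final, smaller batch is kept (which is the default behavior of PyTorch's DataLoader); `batches_per_epoch` is an illustrative helper, not part of any library.

```python
import math

def batches_per_epoch(num_images, batch_size):
    """Number of batches needed to pass every image through the network once."""
    return math.ceil(num_images / batch_size)

num_training_images = 800  # hypothetical training set size
for size in [16, 32, 64]:
    print("batch size:", size, "batches per epoch:", batches_per_epoch(num_training_images, size))
```

Halving the batch size doubles the number of updates per epoch, which is part of why smaller batches take longer per epoch but use less memory per step.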

In [None]:
#import the numpy library and abbreviate its name to np
import numpy as np

np.random.seed(1234)

#Specify what percentage of the original set of images should be set aside for validation and final testing
validation_size = .10

#Specify how many images are used to train the model at once
batch_size = 16


num_train = len(dataset) #determine how many total images do we have
indices = list(range(num_train)) # a list of indices, one for each instance
validation_amount = int(np.floor(validation_size * num_train)) 
split = num_train - 2 * validation_amount #calculate the amount of images used to train the model
np.random.shuffle(indices) # randomly shuffle the indices

#split the indices into ones that correspond to the training images, validation images, and testing images.
train_idx, valid_idx, test_idx = indices[:split], indices[split:split+validation_amount],indices[split+validation_amount:]

#Create sampler objects that randomly sample images from a given list of indices
train_sampler = torch.utils.data.sampler.SubsetRandomSampler(train_idx)
valid_sampler = torch.utils.data.sampler.SubsetRandomSampler(valid_idx)
test_sampler = torch.utils.data.sampler.SubsetRandomSampler(test_idx)


#Create data loaders that are provided with the same dataset, but a different sampler
train_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, sampler=train_sampler)
valid_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, sampler=valid_sampler)      
test_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, sampler=test_sampler)

print("Data Loaders Initialized")

# Load a Pretrained Model

For transfer learning, you need to load a pretrained model that has been trained on a problem as close to your current problem as possible.

For image classification tasks, it is common to load models pretrained on the ImageNet task (a competition to classify the objects shown in images).

Although there are many possible neural network architectures for classifying images (VGGNet, LeNet, Inception), ResNet is a very common choice.

<img src="notebookfiles/resnetimage.png">

There are many differently sized ResNet models to choose from. In general, larger networks can attain higher accuracy, but take longer to train, and may require more training examples.

In Torchvision, some pretrained ResNet models are:
- resnet18 (ResNet with 18 layers)
- resnet34 (ResNet with 34 layers)
- resnet50 (ResNet with 50 layers)
- resnet101 (ResNet with 101 layers)
- resnet152 (ResNet with 152 layers)

We will use resnet18, which has 18 layers (the smallest of these options).

In [None]:
import torchvision.models as models

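#Load in a pretrained model that we have saved
#(on PyTorch 2.6 or newer, add weights_only=False to load a full pickled model; only do this for files you trust)
pretrained_model = torch.load("resnet18.pth")

#uncomment one of the lines below to download the model from the internet instead
#pretrained_model = models.resnet18(pretrained=True) #older torchvision versions
#pretrained_model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT) #torchvision 0.13 or newer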

print(pretrained_model) #View the model architecture

# Freeze the Pretrained Model's Weights

When performing transfer learning, you need to tell the neural network not to change any of the weights from the earlier layers.

This process is called "freezing the weights".

<img src="notebookfiles/resnettransferlearning.png">

In the code below, we will freeze all of the weights in the model, and then later we will create new layers to replace the later ones.
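
The idea can be sketched on a tiny stand-in network before applying it to ResNet. This example is only an illustration under stated assumptions: `tiny_model` is a hypothetical three-layer model, not the pretrained network, and it requires only `torch`. It freezes every parameter, replaces the last layer, and then lists which parameters would still be updated during training.

```python
import torch

# A tiny stand-in network (not ResNet): a "feature" layer followed by a classification layer
tiny_model = torch.nn.Sequential(
    torch.nn.Linear(8, 4),
    torch.nn.ReLU(),
    torch.nn.Linear(4, 2),
)

# Freeze every parameter so that no gradients are computed for it
for param in tiny_model.parameters():
    param.requires_grad = False

# Replace the final layer; the parameters of a newly created layer are trainable by default
tiny_model[2] = torch.nn.Linear(4, 2)

trainable = [name for name, p in tiny_model.named_parameters() if p.requires_grad]
frozen = [name for name, p in tiny_model.named_parameters() if not p.requires_grad]
print("trainable:", trainable)
print("frozen:", frozen)
```

Checking `requires_grad` this way on the real model is a quick way to confirm that only the new classification layer will be trained.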

In [None]:
#loop over every weight (parameter) in the model and turn off gradient tracking for it
for param in pretrained_model.parameters():
    param.requires_grad = False

print("All model weights are frozen")

# Change the Classification Layer

After freezing the weights of the model, we need to overwrite the existing layer at the end of the model, which handles classification based on the features inferred by the earlier layers.

Below, we specify how many possible outcome categories we have (2), and then determine how many features were fed into the last layer (called the fully connected or fc layer).

We then overwrite the fully connected layer with a new Linear neural network layer that accepts the same input as before, but outputs the desired number of categories.

In [None]:
number_classes = 2 #indicate how many outcome categories there are

num_ftrs = pretrained_model.fc.in_features #get the features produced by the second to last layer going into the last

#Overwrite the last layer (fully connected / fc) with a new Linear model that takes in the same amount of information
#The new layer outputs data equal to the number of possible outcome categories
pretrained_model.fc = torch.nn.Linear(num_ftrs, number_classes)
print("Last Layer Overwritten")

# Set Model to Train on the Desired Device

In [None]:
#the code will use your GPU if you have it configured correctly
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 
pretrained_model = pretrained_model.to(device) #have the model train on the preferred device
print(device)

# Initialize the Model and Optimizers

We define the loss function.

For classification tasks, CrossEntropyLoss is commonly used.

For regression tasks, MSELoss is commonly used.
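
As a rough illustration of what cross-entropy measures, the sketch below computes it by hand for one image with two output scores: a softmax turns the scores into probabilities, and the loss is the negative log of the probability assigned to the true category. The function `cross_entropy` and the example scores are hypothetical and written in plain Python; in the notebook the loss is computed by `torch.nn.CrossEntropyLoss`.

```python
import math

def cross_entropy(scores, target_index):
    """Softmax over the raw scores, then the negative log of the probability of the true class."""
    largest = max(scores)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    probability = exps[target_index] / sum(exps)
    return -math.log(probability)

confident_right = cross_entropy([3.0, 0.0], target_index=0)  # high score on the true class
confident_wrong = cross_entropy([3.0, 0.0], target_index=1)  # high score on the wrong class
undecided = cross_entropy([0.0, 0.0], target_index=0)        # equal scores for both classes
print(confident_right, undecided, confident_wrong)
```

The loss is small when the model is confidently right, larger when it is undecided, and largest when it is confidently wrong.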

We also define which optimization algorithm we will use to determine how to update the weights of the model.

We are using Stochastic Gradient Descent, but Adam and AdaDelta are also popular.

We then define the scheduler, which determines how much to reduce the learning rate (lr) after a specific number of epochs.

We are setting the learning rate to .01. If you wanted a smaller learning rate, you would change it to .001. Training with a smaller learning rate takes more time, but often improves the prediction results (up to a limit).
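
The effect of a step scheduler can be sketched in plain Python. The helper `step_lr` is hypothetical and only reproduces the arithmetic of step decay, assuming the scheduler is advanced once at the end of each epoch; its default `step_size` and `gamma` match the values used in the code cell for this section.

```python
def step_lr(initial_lr, epoch, step_size=3, gamma=0.1):
    """Learning rate in effect during a given epoch (counting from 0) under step decay."""
    return initial_lr * gamma ** (epoch // step_size)

schedule = [step_lr(0.01, epoch) for epoch in range(5)]
for epoch, lr in enumerate(schedule):
    print("Epoch:", epoch + 1, "learning rate:", lr)
```

With these settings, the first three epochs train at the initial learning rate and the remaining epochs train at one tenth of it.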

In [None]:
#Choose the Loss Function. For Classification Problems, CrossEntropyLoss is good
loss_function = torch.nn.CrossEntropyLoss() 

#Define the optimization algorithm used and its learning rate
# Observe that only the fc parameters are being optimized
optimizer = torch.optim.SGD(pretrained_model.fc.parameters(), lr=0.01, momentum=0.9)

# Decay LR by a factor of 0.1 every 3 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)

# Define Functions to Train and Test the model

The following functions are used to train the model and examine its performance on a given dataset.

The evaluate function is given:
- the dataloader containing either training or validation data
- the current epoch
- a True or False flag indicating whether the function is being used to train the model or to evaluate it on the dataset

It returns the average loss across the batches in the dataloader.

In [None]:
def evaluate(data_loader, epoch, train=True):
    losses = [] #the loss for each batch
    if train:
        pretrained_model.train() #use training behavior (e.g., batch statistics are updated)
    else:
        pretrained_model.eval() #use evaluation behavior

    #only track gradients when training
    with torch.set_grad_enabled(train):
        for batch_idx, (data, target) in enumerate(data_loader):
            #move the batch to the same device as the model
            data, target = data.to(device), target.to(device)

            output = pretrained_model(data)
            loss = loss_function(output, target)
            losses.append(loss.item())

            if train:
                optimizer.zero_grad() #clear the gradients from the previous batch
                loss.backward() #compute the gradients for this batch
                optimizer.step() #update the weights
                print("Train Epoch: ", epoch+1, "Batch :", batch_idx, "Loss: ", loss.item())

    return np.mean(losses)

print("Training Function Defined")

# Train the Model

The following code is the key part that actually trains the model with the dataset and displays its current performance.

Run this code ONLY after you have run all of the previous cells.

The code below:
- iterates through a given number of epochs (5 in this example)
- passes the training data through the model (using the train loader)
- updates the weights (occurs within the evaluate function)
- evaluates the model on the validation dataset (using the validation loader)
- compares the loss on the validation data from that epoch to the best previously observed loss
- saves a copy of the current model if it improves on the best previously observed loss

<img src="notebookfiles/learningcurve.jpg" height=300 width=300>
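
The "keep the best model" logic can be sketched without a neural network. In the example below the validation losses are made-up numbers and a small dictionary stands in for the model's weights; both are hypothetical and only illustrate the comparison and copy steps described above.

```python
import copy

# Hypothetical validation losses, one per epoch
validation_losses = [0.70, 0.52, 0.55, 0.48, 0.50]

best_loss = float("inf")  # start with the worst possible loss
best_epoch = None
best_state = None
for epoch, valid_loss in enumerate(validation_losses):
    current_state = {"epoch": epoch, "loss": valid_loss}  # stand-in for the model's weights
    if valid_loss < best_loss:  # only keep a copy when the validation loss improves
        best_loss = valid_loss
        best_epoch = epoch
        best_state = copy.deepcopy(current_state)

print("Best epoch:", best_epoch + 1, "Best validation loss:", best_loss)
```

Note that the last epoch is not the best one here, which is why the copy kept from the best epoch, rather than the final model, is used afterwards.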

In [None]:
import copy

number_of_epochs = 5 #How many passes through the data are you making


best_loss = float("inf") #start out with the loss being the worst possible (infinity)
for epoch in range(number_of_epochs): #train the model by passing the data through it once per epoch
    train_loss = evaluate(train_loader,epoch,train=True) #train the model
    valid_loss = evaluate(valid_loader,epoch,train=False) #evaluate the model on the validation data
    if valid_loss < best_loss: #if the validation loss has improved upon the best previously observed validation loss
        best_model = copy.deepcopy(pretrained_model) #copy the model as the new best model
        best_loss = valid_loss #update the best observed loss
    scheduler.step() #increment the scheduler by one epoch
    print("Epoch: ", epoch+1, "Validation Loss: ", valid_loss) #print the validation results

# View Final Evaluation

When the model has been trained, it is important to evaluate the model's performance on a separate dataset not used to select the best combination of model weights.

The following code uses the best model to make predictions on the data in the test loader, compares those predictions to the actual outcome values, and calculates the accuracy.
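
The accuracy calculation itself is simple enough to sketch in plain Python. The output scores and targets below are hypothetical; for each image, the predicted category is the position of the highest score, and accuracy is the share of images where that position matches the actual category.

```python
def accuracy(outputs, targets):
    """Fraction of rows whose highest-scoring position equals the target category."""
    correct = 0
    for scores, target in zip(outputs, targets):
        predicted = scores.index(max(scores))  # position of the highest score
        correct += int(predicted == target)
    return correct / len(targets)

# Hypothetical model outputs for four images with two categories each
outputs = [[2.1, -0.3], [0.2, 1.5], [0.9, 0.4], [-1.0, 0.1]]
targets = [0, 1, 1, 1]
test_accuracy = accuracy(outputs, targets)
print("Test Accuracy :", test_accuracy)
```

In this made-up example the third image is misclassified, so three of the four predictions are correct.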

In [None]:
correct_predictions = 0
predictions_made = 0

best_model.eval() #put the best model (the one used below) in evaluation mode

with torch.no_grad(): #gradients are not needed for evaluation
    for batch_idx, (data, target) in enumerate(test_loader):
        #move the batch to the same device as the model
        data, target = data.to(device), target.to(device)

        output = best_model(data)

        predicted = output.argmax(dim=1) #the category with the highest score for each image
        correct_predictions += (predicted == target).sum().item()
        predictions_made += len(target)

print("Test Accuracy :", correct_predictions / predictions_made)

# Save the Model

When you are satisfied with your model, you can save it to the directory where the notebook file is running.

It is common to use the extension ".pth" for PyTorch models.

In [None]:
torch.save(best_model, "updated_model.pth") #you can call the filename whatever you want. We use "updated_model.pth" here
print("model saved")

# Load the Model

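You can load previously saved models and assign them to a variable name.

In [None]:
#on PyTorch 2.6 or newer, add weights_only=False to load a full pickled model (only do this for files you trust)
pretrained_model = torch.load("updated_model.pth")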
print("model loaded")

# Make a Prediction

You can use the model to make predictions on a new image using the following code.

The predicted value is a number.

If the predicted value is 0, it means the neural network is predicting the 1st category alphabetically ("female") from the folder names.

A prediction of 1 represents the next alphabetic category ("male") that was in the folder names.

If there was another folder name ("other") whose alphabetic order was after category 1, it would be prediction 2.
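
The mapping from folder names to prediction numbers can be sketched as follows. The folder names are hypothetical examples; the sorting mirrors how torchvision's ImageFolder orders class names, and the same information is available from the `dataset.classes` and `dataset.class_to_idx` attributes of the dataset loaded earlier.

```python
# Hypothetical folder names, listed in no particular order
folder_names = ["male", "other", "female"]

# Categories are numbered in alphabetical order of the folder names
classes = sorted(folder_names)
class_to_idx = {name: index for index, name in enumerate(classes)}
print(class_to_idx)

# Convert a numeric prediction back into a category name
predicted = 1
predicted_name = classes[predicted]
print("Predicted category:", predicted_name)
```

Looking up the name this way avoids having to remember which number corresponds to which folder.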

In [None]:
from PIL import Image
import requests
from io import BytesIO

image_filename = "whitehouse/hilary.jpg"
#try the names of other images saved in the folder
# "bill.jpg"
# "barack.jpg"
# "michelle.jpg"
# "melania.jpg"
# "donald.jpg"


pretrained_model.eval()

img = Image.open(image_filename)

#If you wanted to use a url, you can use the following code (uncomment the next three lines):
#url = "https://upload.wikimedia.org/wikipedia/commons/f/f5/Poster-sized_portrait_of_Barack_Obama.jpg"
#response = requests.get(url)
#img = Image.open(BytesIO(response.content))

normalize = torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406],std=[0.229, 0.224, 0.225])

preprocess = torchvision.transforms.Compose([
   torchvision.transforms.Resize ((224,224)),
   torchvision.transforms.ToTensor(),
   normalize
])

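img_tensor = preprocess(img.convert("RGB")) #make sure the image has exactly three color channels
img_tensor = img_tensor.unsqueeze(0) #add a batch dimension

#move the image to the same device as the model
model_device = next(pretrained_model.parameters()).device
img_tensor = img_tensor.to(model_device)

with torch.no_grad(): #gradients are not needed for prediction
    output = pretrained_model(img_tensor)
predicted = output.argmax(dim=1).item()

print("Predicted category: ", predicted)
#display a 200-pixel-wide copy of the image (Image.ANTIALIAS was removed in Pillow 10; LANCZOS is the same filter)
img.resize((200, int(200 * (img.height / img.width))), resample=Image.LANCZOS)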