# Transfer Learning

In this notebook, you'll learn how to use pre-trained networks to solve challenging problems in computer vision. Specifically, you'll use networks trained on [ImageNet](http://www.image-net.org/) that are [available from torchvision](http://pytorch.org/docs/0.3.0/torchvision/models.html).

ImageNet is a massive dataset with over 1 million labeled images in 1000 categories. It's used to train deep neural networks built from convolutional layers. I'm not going to get into the details of convolutional networks here, but if you want to learn more about them, please [watch this](https://www.youtube.com/watch?v=2-Ol7ZB0MmU).

Once trained, these models work astonishingly well as feature detectors for images they weren't trained on. Using a pre-trained network on images not in the training set is called transfer learning. Here we'll use transfer learning to train a network that can classify our cat and dog photos with near perfect accuracy.

With `torchvision.models` you can download these pre-trained networks and use them in your applications. We'll include `models` in our imports now.

In [1]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt

import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torch.autograd import Variable
from torchvision import datasets, transforms, models

import time

In [2]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(device)

cuda:0


Most of the pretrained models require the input to be 224x224 images. Also, we'll need to match the normalization used when the models were trained. Each color channel was normalized separately: the means are `[0.485, 0.456, 0.406]` and the standard deviations are `[0.229, 0.224, 0.225]`.

In [3]:
data_dir = 'Cat_Dog_data/Cat_Dog_data'

# TODO: Define transforms for the training data and testing data
train_transforms = transforms.Compose([transforms.RandomRotation(30), 
                                         transforms.RandomResizedCrop(224), 
                                         transforms.RandomHorizontalFlip(), 
                                         transforms.ToTensor(), 
                                         transforms.Normalize([0.485, 0.456, 0.406], 
                                                              [0.229, 0.224, 0.225])])

test_transforms = transforms.Compose([transforms.Resize(256), 
                                     transforms.CenterCrop(224), 
                                     transforms.ToTensor(), 
                                     transforms.Normalize([0.485, 0.456, 0.406], 
                                                          [0.229, 0.224, 0.225])])


# Pass transforms in here, then run the next cell to see how the transforms look
train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)
test_data = datasets.ImageFolder(data_dir + '/test', transform=test_transforms)

trainloader = torch.utils.data.DataLoader(train_data, batch_size=32, shuffle=True)
testloader = torch.utils.data.DataLoader(test_data, batch_size=32)
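
To make the normalization step concrete, here is a minimal sketch in plain Python of the arithmetic that `transforms.Normalize` applies to a single pixel: each channel value (already scaled to the range 0 to 1 by `ToTensor`) has its channel mean subtracted and is then divided by that channel's standard deviation. The helper name `normalize_pixel` and the sample pixels are made up for illustration.

```python
# ImageNet channel statistics quoted above (order: red, green, blue)
MEANS = [0.485, 0.456, 0.406]
STDS = [0.229, 0.224, 0.225]

def normalize_pixel(pixel):
    """Apply (value - mean) / std to each channel of one RGB pixel.

    `pixel` holds channel values in [0, 1], as produced by ToTensor.
    Hypothetical helper, for illustration only.
    """
    return [(v - m) / s for v, m, s in zip(pixel, MEANS, STDS)]

# A pixel equal to the channel means maps to zero in every channel
print(normalize_pixel([0.485, 0.456, 0.406]))

# A pure white pixel lands a bit more than two standard deviations above zero
print([round(v, 3) for v in normalize_pixel([1.0, 1.0, 1.0])])
```

After this step the inputs are centered near zero with roughly unit spread per channel, which is the input range the pre-trained weights were fitted to.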

In [4]:
def print_memory_allocation():
    print('alloceted: {:.1f} MB, - max. allocated: {:.1f} MB, - cached: {:.1f} MB, - max. cached: {:.1f} MB\n'.
              format((torch.cuda.memory_allocated(device) / 1000000), 
                     (torch.cuda.max_memory_allocated(device) / 1000000), 
                     (torch.cuda.memory_cached(device) / 1000000), 
                     (torch.cuda.max_memory_cached(device) / 1000000)))

We can load in a model such as [DenseNet](http://pytorch.org/docs/0.3.0/torchvision/models.html#id5). Let's print out the model architecture so we can see what's going on.

In [5]:
# Build and train the network
model_name = 'densenet121'
#model_name = 'vgg19'
#model_name = 'vgg13'
if model_name == 'densenet121':
    model = models.densenet121(pretrained=True)
elif model_name == 'vgg19':
    model = models.vgg19(pretrained=True)
elif model_name == 'vgg13':
    model = models.vgg13(pretrained=True)
else:
    # Fail early instead of continuing with model = None
    raise ValueError('Unknown model name: {}'.format(model_name))
    
#model

This model is built out of two main parts, the features and the classifier. The features part is a stack of convolutional layers that works as a feature detector whose output can be fed into a classifier. For DenseNet-121, the classifier part is a single fully-connected layer `(classifier): Linear(in_features=1024, out_features=1000)`. This layer was trained on the ImageNet dataset, so it won't work for our specific problem. That means we need to replace the classifier, but the features will work well as they are. In general, I think about pre-trained networks as amazingly good feature detectors that can be used as the input for simple feed-forward classifiers.

In [6]:
# Freeze parameters so we don't backprop through them
for param in model.parameters():
    param.requires_grad = False

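from collections import OrderedDict
# 1024 matches the in_features of DenseNet-121's classifier;
# the VGG models feed 25088 features into their classifier instead
classifier = nn.Sequential(OrderedDict([
                          ('fc1', nn.Linear(1024, 500)),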
                          ('relu', nn.ReLU()),
                          ('fc2', nn.Linear(500, 2)),
                          ('output', nn.LogSoftmax(dim=1))
                          ]))
    
model.classifier = classifier
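
The order of operations in the cell above matters: the loop freezes every existing parameter, and the classifier created afterwards is trainable because new layers have `requires_grad=True` by default. The sketch below checks this on a tiny stand-in network so that nothing needs to be downloaded. `TinyNet` and `count_parameters` are hypothetical names for illustration, not part of torchvision.

```python
import torch
from torch import nn

# Tiny stand-in for a pre-trained network (assumption: not DenseNet itself),
# with the same two-part layout: `features` and `classifier`
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.ReLU())
        self.classifier = nn.Linear(8, 1000)

    def forward(self, x):
        x = self.features(x)
        x = x.mean(dim=(2, 3))  # global average pool to shape (batch, 8)
        return self.classifier(x)

net = TinyNet()

# Freeze everything that exists so far, then swap in a fresh classifier
for param in net.parameters():
    param.requires_grad = False
net.classifier = nn.Sequential(nn.Linear(8, 2), nn.LogSoftmax(dim=1))

def count_parameters(module):
    """Return (trainable, frozen) parameter counts for a module."""
    trainable = sum(p.numel() for p in module.parameters() if p.requires_grad)
    frozen = sum(p.numel() for p in module.parameters() if not p.requires_grad)
    return trainable, frozen

trainable, frozen = count_parameters(net)
print('trainable: {}, frozen: {}'.format(trainable, frozen))
```

Running the same `count_parameters` helper on the notebook's `model` would show how small the trainable classifier is compared with the frozen feature extractor.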

In [7]:
#model

With our model built, we need to train the classifier. However, now we're using a **really deep** neural network. If you try to train this on a CPU like normal, it will take a long, long time. Instead, we're going to use the GPU to do the calculations. The linear algebra computations are done in parallel on the GPU, which can make training up to around 100x faster. It's also possible to train on multiple GPUs, further decreasing training time.

PyTorch, along with pretty much every other deep learning framework, uses [CUDA](https://developer.nvidia.com/cuda-zone) to efficiently compute the forward and backward passes on the GPU. In PyTorch, you move your model parameters and other tensors to the GPU memory using `model.to('cuda')`. You can move them back from the GPU with `model.to('cpu')`, which you'll commonly do when you need to operate on the network output outside of PyTorch. The training loop in this notebook moves the model and every batch to `device`, so it runs on the GPU when one is available, and it prints the GPU memory usage at a few points along the way.
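
One way to see the speed difference for yourself is to time a forward and backward pass on each device. The following is a minimal sketch under stated assumptions: it uses a small stand-in network and a random batch rather than the DenseNet model, and `time_forward_backward` is a hypothetical helper name. It falls back to timing the CPU alone when no GPU is present.

```python
import time
import torch
from torch import nn

def time_forward_backward(device, repeats=3):
    """Average seconds for one forward and backward pass on `device`."""
    # Small stand-in network and random batch (assumptions, not the notebook's model)
    net = nn.Sequential(nn.Linear(1024, 500), nn.ReLU(),
                        nn.Linear(500, 2), nn.LogSoftmax(dim=1)).to(device)
    inputs = torch.randn(32, 1024, device=device)
    labels = torch.randint(0, 2, (32,), device=device)
    criterion = nn.NLLLoss()

    start = time.perf_counter()
    for _ in range(repeats):
        net.zero_grad()
        loss = criterion(net(inputs), labels)
        loss.backward()
    # CUDA calls are asynchronous; wait for the GPU before stopping the clock
    if device.type == 'cuda':
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats

devices = [torch.device('cpu')]
if torch.cuda.is_available():
    devices.append(torch.device('cuda'))

for dev in devices:
    print('{}: {:.4f} seconds per pass'.format(dev, time_forward_backward(dev)))
```

For a network this small the GPU may show little or no advantage, and the first CUDA call includes one-time startup cost. The gap tends to widen with deeper models and larger batches.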

In [8]:
criterion = nn.NLLLoss()
# Only train the classifier parameters, feature parameters are frozen
optimizer = optim.Adam(model.classifier.parameters(), lr = 0.001)

model.to(device)

epochs = 2
steps = 0
running_loss = 0
print_every = 50

start_time = time.time()

print('-----beginning\n')
print_memory_allocation()

for e in range(epochs):
    running_loss = 0
    
    model.train()
    
    for images, labels in iter(trainloader):
        steps += 1
        
        # Move input and label tensors to the GPU
        images, labels = images.to(device), labels.to(device)
        
        # Clear the gradients, do this because gradients are accumulated
        optimizer.zero_grad()
        
        if steps == 1:
            print('-----before outputs = model.forward()\n')
            print_memory_allocation()
        
        outputs = model.forward(images) # memory increase here within the first step!!!
        
        if steps == 1:
            print('-----after outputs = model.forward()\n')
            print_memory_allocation()
        
        loss = criterion(outputs, labels) 
        
        loss.backward()
        
        optimizer.step()

        running_loss += loss.item()
    
        if steps % print_every == 0:
            # Model in inference mode, dropout is off
            model.eval()
            
            accuracy = 0
            valid_loss = 0
            
            # Don't save the history: run the whole validation pass without gradients
            with torch.no_grad():
                for images, labels in testloader:
                    images, labels = images.to(device), labels.to(device)

                    outputs = model.forward(images)

                    valid_loss += criterion(outputs, labels).item()

                    # Model's output is log-softmax, take exponential to get the probabilities
                    ps = torch.exp(outputs)

                    # Class with highest probability is our predicted class
                    equality = (labels == ps.max(1)[1])

                    # Fraction of correct predictions in this batch, as a Python float
                    accuracy += equality.float().mean().item()
                
            print("Epoch: {}/{}... ".format(e+1, epochs),
                  "Loss: {:.4f}".format(running_loss/print_every),
                  "Validation Loss: {:.3f}.. ".format(valid_loss/len(testloader)),
                  "Validation Accuracy: {:.3f}".format(accuracy/len(testloader)))
            
            running_loss = 0
            
            model.train()

torch.cuda.empty_cache()
                    
# Split the elapsed time into whole minutes and the remaining seconds
elapsed = time.time() - start_time
print("Time for training and validation: {:.0f} minutes and {:.3f} seconds".format(elapsed // 60, elapsed % 60))

print('-----after training\n')
print_memory_allocation()

-----beginning

alloceted: 30.4 MB, - max. allocated: 30.4 MB, - cached: 31.5 MB, - max. cached: 31.5 MB

-----before outputs = model.forward()

alloceted: 49.6 MB, - max. allocated: 49.6 MB, - cached: 50.7 MB, - max. cached: 50.7 MB

-----after outputs = model.forward()

alloceted: 49.9 MB, - max. allocated: 306.5 MB, - cached: 462.2 MB, - max. cached: 462.2 MB

Epoch: 1/2...  Loss: 0.2465 Validation Loss: 0.187..  Validation Accuracy: 0.928
Epoch: 1/2...  Loss: 0.1988 Validation Loss: 0.049..  Validation Accuracy: 0.984
Epoch: 1/2...  Loss: 0.1677 Validation Loss: 0.048..  Validation Accuracy: 0.983
Epoch: 1/2...  Loss: 0.1947 Validation Loss: 0.069..  Validation Accuracy: 0.976
Epoch: 1/2...  Loss: 0.2053 Validation Loss: 0.049..  Validation Accuracy: 0.981
Epoch: 1/2...  Loss: 0.1661 Validation Loss: 0.053..  Validation Accuracy: 0.979
Epoch: 1/2...  Loss: 0.1826 Validation Loss: 0.044..  Validation Accuracy: 0.983
Epoch: 1/2...  Loss: 0.1995 Validation Loss: 0.040..  Validation Ac

Evaluate the model

In [9]:
model.eval()

accuracy = 0
test_loss = 0

for images, labels in iter(testloader):
    
    with torch.no_grad():
        images, labels = images.to(device), labels.to(device)
        
        outputs = model.forward(images)

        ps = torch.exp(outputs)

        equality = (labels.data == ps.max(1)[1])

        accuracy += equality.type_as(torch.FloatTensor()).mean()
        
torch.cuda.empty_cache()
    
print('Test accuracy: {:.3f}'.format(accuracy / len(testloader)))

Test accuracy: 0.983
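
The evaluation loop averages the per-batch accuracies, which matches the overall accuracy only when every batch has the same size. If the last batch is smaller, its images count for more. The plain-Python sketch below uses made-up log-probabilities to show the difference between the two ways of counting. The helper `predicted_class` and the sample data are hypothetical. Since the exponential is monotonic, the largest log-probability is also the largest probability.

```python
import math

def predicted_class(log_probs):
    """Index of the largest log-probability, i.e. the most likely class."""
    return max(range(len(log_probs)), key=lambda i: log_probs[i])

# Two made-up batches of (log-probabilities, true label) pairs; sizes 3 and 1
batches = [
    [([math.log(0.9), math.log(0.1)], 0),
     ([math.log(0.2), math.log(0.8)], 1),
     ([math.log(0.6), math.log(0.4)], 1)],  # misclassified example
    [([math.log(0.3), math.log(0.7)], 1)],
]

correct = 0
total = 0
batch_means = []
for batch in batches:
    hits = [predicted_class(lp) == label for lp, label in batch]
    correct += sum(hits)
    total += len(hits)
    batch_means.append(sum(hits) / len(hits))

print('mean of batch accuracies: {:.3f}'.format(sum(batch_means) / len(batch_means)))
print('overall accuracy:         {:.3f}'.format(correct / total))
```

With 32 images per batch and a large test set the two numbers are usually very close, so the simpler per-batch average is a reasonable shortcut here.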


In [10]:
# Save the checkpoint
def save_checkpoint():
    checkpoint = {'arch': model_name,
                  'classifier': classifier,
                  'criterion': criterion,
                  'optimizer': optimizer,
                  'optimizer_dict': optimizer.state_dict(), 
                  'state_dict': model.state_dict(),
                  'epochs': epochs}

    torch.save(checkpoint, 'checkpoint.pth')
    
save_checkpoint()
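
The checkpoint above pickles whole Python objects (the classifier, criterion, and optimizer). A common alternative is to store only `state_dict()` values and rebuild the objects in code when loading. The sketch below illustrates that pattern on a standalone classifier. It is not the approach this notebook uses, and the function name `build_classifier`, the dictionary keys, and the file name are assumptions made for illustration.

```python
import os
import tempfile
import torch
from torch import nn, optim

def build_classifier():
    # Same layer sizes as the classifier defined earlier in this notebook
    return nn.Sequential(nn.Linear(1024, 500), nn.ReLU(),
                         nn.Linear(500, 2), nn.LogSoftmax(dim=1))

clf = build_classifier()
opt = optim.Adam(clf.parameters(), lr=0.001)

# Store plain tensors and numbers only, no pickled module objects
path = os.path.join(tempfile.mkdtemp(), 'classifier_checkpoint.pth')  # hypothetical file name
torch.save({'classifier_state': clf.state_dict(),
            'optimizer_state': opt.state_dict(),
            'epochs': 2}, path)

# Rebuild the objects in code, then load the stored state into them
restored = build_classifier()
restored_opt = optim.Adam(restored.parameters(), lr=0.001)
checkpoint = torch.load(path)
restored.load_state_dict(checkpoint['classifier_state'])
restored_opt.load_state_dict(checkpoint['optimizer_state'])

# Check that the restored classifier reproduces the original's output
x = torch.randn(4, 1024)
print(torch.allclose(clf(x), restored(x)))
```

The trade-off is that the loading code must know how to construct the model, but the file no longer depends on the exact class definitions that existed when it was saved.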

In [11]:
# Load the checkpoint and rebuild the model
def load_checkpoint(filepath):
    checkpoint = torch.load(filepath)
    # get the model / architecture name
    model_name = checkpoint['arch']
    # init a new model
    if model_name == 'densenet121':
        model_a = models.densenet121(pretrained=True)
    elif model_name == 'vgg19':
        model_a = models.vgg19(pretrained=True)
    elif model_name == 'vgg13':
        model_a = models.vgg13(pretrained=True)
    else:
        # fail early instead of calling methods on None
        raise ValueError('Unknown architecture in checkpoint: {}'.format(model_name))
    # assign the classifier of the stored model to the new one
    model_a.classifier = checkpoint['classifier']
    # assign the criterion of the stored model to the new one
    model_a.criterion = checkpoint['criterion']
    # load the stored weights into the new model
    model_a.load_state_dict(checkpoint['state_dict'])
    # move the whole model, including the restored classifier, to the device
    model_a.to(device)
    # restore the optimizer and its state (kept local, not returned)
    optimizer = checkpoint['optimizer']
    optimizer.load_state_dict(checkpoint['optimizer_dict'])
    # get the epochs of the stored (and trained) model
    epochs = checkpoint['epochs']
    
    return model_a

In [12]:
new_model = load_checkpoint('checkpoint.pth')
print(new_model)
print('-----after new_model = load_checkpoint()\n')
print_memory_allocation()

DenseNet(
  (features): Sequential(
    (conv0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (norm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu0): ReLU(inplace)
    (pool0): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (denseblock1): _DenseBlock(
      (denselayer1): _DenseLayer(
        (norm1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace)
        (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      )
      (denselayer2): _DenseLayer(
        (norm1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplac

In [13]:
# check if the restored new_model is working
new_model.eval()

accuracy = 0
test_loss = 0
for images, labels in testloader:

    # don't save the history
    with torch.no_grad():
        # tensors can be used directly; wrapping them in Variable is no longer needed
        inputs, labels = images.to(device), labels.to(device)

        output = new_model.forward(inputs)

        ## Calculating the accuracy 
        # Model's output is log-softmax, take exponential to get the probabilities
        ps = torch.exp(output).data
        # Class with highest probability is our predicted class, compare with true label
        equality = (labels.data == ps.max(1)[1])
        # Accuracy is number of correct predictions divided by all predictions, just take the mean
        accuracy += equality.type_as(torch.FloatTensor()).mean()

print("Test Accuracy: {:.3f}".format(accuracy/len(testloader)))

torch.cuda.empty_cache()


Test Accuracy: 0.983


In [14]:
print('-----after optimizer.zero_grad()\n')
print_memory_allocation()

-----after optimizer.zero_grad()

alloceted: 69.5 MB, - max. allocated: 343.4 MB, - cached: 142.0 MB, - max. cached: 464.3 MB

