# Transfer Learning

In this notebook, you'll learn how to use pre-trained networks to solve challenging problems in computer vision. Specifically, you'll use networks trained on [ImageNet](http://www.image-net.org/) and [available from torchvision](http://pytorch.org/docs/0.3.0/torchvision/models.html).

ImageNet is a massive dataset with over 1 million labeled images in 1000 categories. It's used to train deep neural networks built from convolutional layers. I'm not going to get into the details of convolutional networks here, but if you want to learn more about them, please [watch this](https://www.youtube.com/watch?v=2-Ol7ZB0MmU).

Once trained, these models work astonishingly well as feature detectors for images they weren't trained on. Using a pre-trained network on images not in the training set is called transfer learning. Here we'll use transfer learning to train a network that can classify our cat and dog photos with near perfect accuracy.

With `torchvision.models` you can download these pre-trained networks and use them in your applications. We'll include `models` in our imports now.

In [1]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt

import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torchvision import datasets, transforms, models

Most of the pretrained models require the input to be 224x224 images. We'll also need to match the normalization used when the models were trained: each color channel was normalized separately, with means `[0.485, 0.456, 0.406]` and standard deviations `[0.229, 0.224, 0.225]`.
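To make the normalization concrete, here is a minimal sketch in plain Python of the per-channel arithmetic that `transforms.Normalize` applies to each pixel once `ToTensor` has scaled the values to the range [0, 1]. The helper name `normalize_pixel` is hypothetical and exists only for illustration.

```python
# Per-channel ImageNet statistics quoted above.
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """Apply (value - mean) / std to each channel of one RGB pixel in [0, 1]."""
    return [(value - mean) / std
            for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)]

# A pixel equal to the channel means maps to zero in every channel.
mean_pixel = normalize_pixel(IMAGENET_MEAN)

# A pure white pixel lands between two and three standard deviations above zero.
white_pixel = normalize_pixel([1.0, 1.0, 1.0])
print(mean_pixel, white_pixel)
```

After this step the inputs are roughly centered around zero, which is the range the pretrained weights were trained to expect.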

In [2]:
data_dir = 'Cat_Dog_data'

# Define transforms for the training data and testing data
train_transforms = transforms.Compose([transforms.RandomRotation(30),
                                       transforms.RandomResizedCrop(224),
                                       transforms.RandomHorizontalFlip(),
                                       transforms.ToTensor(),
                                       transforms.Normalize([0.485, 0.456, 0.406],
                                                           [0.229, 0.224, 0.225])])

test_transforms = transforms.Compose([transforms.Resize(255),
                                      transforms.CenterCrop(224),
                                      transforms.ToTensor(),
                                      transforms.Normalize([0.485, 0.456, 0.406],
                                                           [0.229, 0.224, 0.225])])

# Pass transforms in here, then run the next cell to see how the transforms look
train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)
test_data = datasets.ImageFolder(data_dir + '/test', transform=test_transforms)

trainloader = torch.utils.data.DataLoader(train_data, batch_size=8, shuffle=True)
testloader = torch.utils.data.DataLoader(test_data, batch_size=8, shuffle=True)

We can load in a model such as [DenseNet](http://pytorch.org/docs/0.3.0/torchvision/models.html#id5). Let's print out the model architecture so we can see what's going on.

In [31]:
model = models.densenet121(pretrained=True)
model

DenseNet(
  (features): Sequential(
    (conv0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (norm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu0): ReLU(inplace=True)
    (pool0): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (denseblock1): _DenseBlock(
      (denselayer1): _DenseLayer(
        (norm1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      )
      (denselayer2): _DenseLayer(
        (norm1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        ...

This model is built out of two main parts: the features and the classifier. The features part is a stack of convolutional layers that works as a feature detector whose output can be fed into a classifier. The classifier part is a single fully-connected layer `(classifier): Linear(in_features=1024, out_features=1000)`. This layer was trained on the ImageNet dataset, so it won't work for our specific problem. That means we need to replace the classifier, but the features can be reused as they are. In general, I think about pre-trained networks as amazingly good feature detectors that can be used as the input for simple feed-forward classifiers.

In [32]:
# Freeze parameters so we don't backprop through them
for param in model.parameters():
    param.requires_grad = False

from collections import OrderedDict
classifier = nn.Sequential(OrderedDict([
                          ('fc1', nn.Linear(1024, 500)),
                          ('relu', nn.ReLU()),
                          ('fc2', nn.Linear(500, 2)),
                          ('output', nn.LogSoftmax(dim=1))
                          ]))
    
model.classifier = classifier
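
One way to confirm that only the new classifier will be trained is to count parameters by their `requires_grad` flag. The sketch below uses a tiny stand-in network (the hypothetical `TinyNet`, not DenseNet) so that it runs without downloading any weights; the same counting function should work on the real model.

```python
import torch
from torch import nn

class TinyNet(nn.Module):
    """Hypothetical stand-in with the same features/classifier split."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 4, kernel_size=3), nn.ReLU())
        self.classifier = nn.Linear(4, 1000)

    def forward(self, x):
        x = self.features(x)
        x = x.mean(dim=(2, 3))  # global average pool to shape (batch, 4)
        return self.classifier(x)

def count_parameters(net):
    """Return (trainable, frozen) parameter counts."""
    trainable = sum(p.numel() for p in net.parameters() if p.requires_grad)
    frozen = sum(p.numel() for p in net.parameters() if not p.requires_grad)
    return trainable, frozen

tiny = TinyNet()

# Freeze everything, then attach a fresh classifier.
# Newly created layers have requires_grad=True by default.
for param in tiny.parameters():
    param.requires_grad = False
tiny.classifier = nn.Sequential(nn.Linear(4, 2), nn.LogSoftmax(dim=1))

trainable, frozen = count_parameters(tiny)
print(f"trainable={trainable}, frozen={frozen}")
```

The order matters: freezing first and replacing the classifier afterwards leaves the new layers trainable, because the loop never sees them.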

With our model built, we need to train the classifier. However, now we're using a **really deep** neural network. If you try to train this on a CPU as usual, it will take a long, long time. Instead, we're going to use the GPU to do the calculations. The linear algebra computations are done in parallel on the GPU, which can speed up training by as much as 100x. It's also possible to train on multiple GPUs, further decreasing training time.

PyTorch, along with pretty much every other deep learning framework, uses [CUDA](https://developer.nvidia.com/cuda-zone) to efficiently compute the forward and backward passes on the GPU. In PyTorch, you move your model parameters and other tensors to the GPU memory using `model.to('cuda')`. You can move them back from the GPU with `model.to('cpu')`, which you'll commonly do when you need to operate on the network output outside of PyTorch. As a demonstration of the increased speed, I'll compare how long it takes to perform a forward and backward pass with and without a GPU.

In [33]:
import time

In [34]:
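for device in ['cpu', 'cuda']:

    criterion = nn.NLLLoss()
    # Only train the classifier parameters, feature parameters are frozen
    optimizer = optim.Adam(model.classifier.parameters(), lr=0.001)

    model.to(device)

    n_batches = 4   # number of batches to time
    elapsed = 0.0   # total time spent in forward/backward passes

    for ii, (inputs, labels) in enumerate(trainloader):
        print(inputs.shape)
        # Move input and label tensors to the current device
        inputs, labels = inputs.to(device), labels.to(device)

        start = time.time()

        optimizer.zero_grad()  # clear gradients left over from the previous batch
        outputs = model.forward(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        if device == 'cuda':
            torch.cuda.synchronize()  # CUDA calls are asynchronous; wait before reading the clock
        elapsed += time.time() - start

        if ii == n_batches - 1:
            break
        
    print(f"Device = {device}; Time per batch: {elapsed/n_batches:.3f} seconds")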

torch.Size([16, 3, 224, 224])
torch.Size([16, 3, 224, 224])
torch.Size([16, 3, 224, 224])
torch.Size([16, 3, 224, 224])
Device = cpu; Time per batch: 0.177 seconds
torch.Size([16, 3, 224, 224])
torch.Size([16, 3, 224, 224])
torch.Size([16, 3, 224, 224])
torch.Size([16, 3, 224, 224])
Device = cuda; Time per batch: 0.005 seconds


You can write device-agnostic code, which will automatically use CUDA if it's available, like so:
```python
# at beginning of the script
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

...

# then whenever you get a new Tensor or Module
# this won't copy if they are already on the desired device
input = data.to(device)
model = MyModule(...).to(device)
```

From here, I'll let you finish training the model. The process is the same as before except now your model is much more powerful. You should get better than 95% accuracy easily.

>**Exercise:** Train a pretrained model to classify the cat and dog images. Continue with the DenseNet model, or try ResNet, which is also a good model to try out first. Make sure you are only training the classifier and the parameters for the features part are frozen.

In [36]:
## Use a pretrained model to classify the cat and dog images
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.classifier.parameters(), lr=0.003)

torch.cuda.empty_cache()
model.cuda()

epochs = 2

test_losses, train_losses = [], []
for e in range(epochs):
    running_loss = 0
    for images, labels in trainloader:
                
        images, labels = images.cuda(), labels.cuda()
            
        optimizer.zero_grad()
        output = model(images)
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
        
    else:
        test_loss = 0
        accuracy = 0
        
        with torch.no_grad():
            model.eval()
            for images, labels in testloader:
                
                images, labels = images.cuda(), labels.cuda()
                
                log_ps = model(images)
                # .item() stores a plain Python float instead of a GPU tensor
                test_loss += criterion(log_ps, labels).item()
                ps = torch.exp(log_ps)
                top_ps, top_pred = ps.topk(1, dim=1)
                
                matches = top_pred == labels.view(*top_pred.shape)
                accuracy += torch.mean(matches.type(torch.float)).item()
                
        model.train()

        test_losses.append(test_loss / len(testloader))
        train_losses.append(running_loss / len(trainloader))
        
    print("Epoch: {}/{}.. ".format(e+1, epochs),
      "Training Loss: {:.3f}.. ".format(train_losses[-1]),
      "Test Loss: {:.3f}.. ".format(test_losses[-1]),
      "Test Accuracy: {:.3f}".format(accuracy/len(testloader)))


Epoch: 1/2..  Training Loss: 0.191..  Test Loss: 0.036..  Test Accuracy: 0.984
Epoch: 2/2..  Training Loss: 0.182..  Test Loss: 0.034..  Test Accuracy: 0.984
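
The accuracy bookkeeping in the loop is easy to get wrong, so here is a small self-contained sketch with hand-made log-probabilities (hypothetical values, not model output). It shows why `labels` is reshaped before the comparison: `top_pred` has shape `(batch, 1)`, and comparing it with a flat `(batch,)` tensor would broadcast to a `(batch, batch)` matrix.

```python
import torch

# Hand-made log-probabilities for 4 images and 2 classes (illustrative only).
log_ps = torch.log(torch.tensor([[0.9, 0.1],
                                 [0.2, 0.8],
                                 [0.6, 0.4],
                                 [0.3, 0.7]]))
labels = torch.tensor([0, 1, 1, 1])

ps = torch.exp(log_ps)                  # back to probabilities
top_ps, top_pred = ps.topk(1, dim=1)    # top_pred has shape (4, 1)

# Reshape labels to (4, 1) so the comparison is element-wise.
matches = top_pred == labels.view(*top_pred.shape)
batch_accuracy = matches.type(torch.float).mean().item()

# Without the reshape, broadcasting produces a (4, 4) matrix instead.
broadcast_shape = (top_pred == labels).shape
print(batch_accuracy, broadcast_shape)
```

Three of the four predictions match their labels here, so the batch accuracy is 0.75. Taking the mean of the broadcast matrix instead would give a number that looks plausible but is not the accuracy.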


#### Using ResNet model

In [3]:
resmodel = models.resnet50(pretrained=True)
# Freeze all pretrained parameters so only the new classifier is trained
for params in resmodel.parameters():
    params.requires_grad = False
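
When freezing layers, the attribute name matters. Python lets you assign any attribute to a tensor, so a misspelling such as `require_grad` raises no error and leaves the parameter trainable. A quick check on a small layer (a hypothetical `nn.Linear` used only for illustration) might look like this:

```python
from torch import nn

layer = nn.Linear(4, 2)

# Misspelled attribute: accepted silently, but nothing is frozen.
for p in layer.parameters():
    p.require_grad = False
still_trainable = all(p.requires_grad for p in layer.parameters())

# Correct attribute: the parameters stop receiving gradients.
for p in layer.parameters():
    p.requires_grad = False
now_frozen = not any(p.requires_grad for p in layer.parameters())

print(still_trainable, now_frozen)
```

If the feature layers are left unfrozen, an optimizer that only receives the classifier's parameters still updates just the classifier, but gradients are computed for the whole network, which costs extra time and memory.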

In [4]:
resmodel

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 

In [5]:
classifier = nn.Sequential(nn.Linear(2048,128),
                           nn.ReLU(),
                           nn.Dropout(p=0.2),
                           nn.Linear(128,2),
                           nn.LogSoftmax(dim=1))

resmodel.fc = classifier

criterion = nn.NLLLoss()

optimizer = optim.Adam(resmodel.fc.parameters(), lr=0.002)

In [6]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
torch.cuda.empty_cache()
resmodel.to(device)


In [12]:
def validator(model, loader, batch_size):
    # Note: batch_size is the maximum number of batches to evaluate, not the size of a batch.
    # Relies on the global `criterion` and `device` defined in earlier cells.
    
    with torch.no_grad(): # turn off gradient calculations
        model.eval() # evaluation mode: disables dropout, batch norm uses running statistics
        val_step = 0
        val_loss = 0
        accuracy = 0

        for images, labels in loader: # batches from the validation set
            val_step += 1

            images, labels = images.to(device), labels.to(device) # move to the available device

            log_ps = model.forward(images) # feed forward
            ps = torch.exp(log_ps) # convert log-probabilities to probabilities
            top_ps, top_pred = ps.topk(1, dim=1) # get the top probability and its index

            matches = top_pred == labels.view(*top_pred.shape) # compare predictions with actual labels
            accuracy += torch.mean(matches.type(torch.float)).item() # accumulate accuracy for this batch

            batch_loss = criterion(log_ps, labels) # calculate loss for this batch
            val_loss += batch_loss.item() # accumulate validation losses

            if val_step >= batch_size: # stop early to avoid going through the whole validation set each time
                break

    model.train() # back to training mode (re-enables dropout)
    
    return val_loss, accuracy

In [13]:
epochs = 1
print_steps = 5

for e in range(epochs):
    running_loss = 0
    steps = 0
    for images, labels in trainloader:
        steps += 1
        
        images, labels = images.to(device), labels.to(device)
        
        optimizer.zero_grad()
        log_ps = resmodel.forward(images)
        loss = criterion(log_ps, labels)
        
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
        
        if steps % print_steps == 0:
            
            test_loss, accuracy = validator(model=resmodel, loader=testloader, batch_size=999)
            
                        
            print(f"Epoch {e+1}/{epochs}.. "
                  f"Step {steps}.."
                  f"Train loss: {running_loss/print_steps:.3f}.. "
                  f"Test loss: {test_loss/len(testloader):.3f}.. "
                  f"Test accuracy: {accuracy/len(testloader):.3f}")
            
            running_loss = 0
                

Epoch 1/1.. Step 5..Train loss: 0.470.. Test loss: 0.872.. Test accuracy: 0.698
Epoch 1/1.. Step 10..Train loss: 0.800.. Test loss: 0.248.. Test accuracy: 0.899
Epoch 1/1.. Step 15..Train loss: 0.720.. Test loss: 0.178.. Test accuracy: 0.931
Epoch 1/1.. Step 20..Train loss: 0.201.. Test loss: 0.134.. Test accuracy: 0.947
Epoch 1/1.. Step 25..Train loss: 0.673.. Test loss: 0.090.. Test accuracy: 0.968
Epoch 1/1.. Step 30..Train loss: 0.568.. Test loss: 0.180.. Test accuracy: 0.924
Epoch 1/1.. Step 35..Train loss: 0.420.. Test loss: 0.210.. Test accuracy: 0.919
Epoch 1/1.. Step 40..Train loss: 0.625.. Test loss: 0.116.. Test accuracy: 0.962
Epoch 1/1.. Step 45..Train loss: 0.494.. Test loss: 0.198.. Test accuracy: 0.920
Epoch 1/1.. Step 50..Train loss: 0.318.. Test loss: 0.153.. Test accuracy: 0.940
Epoch 1/1.. Step 55..Train loss: 0.557.. Test loss: 0.126.. Test accuracy: 0.963
Epoch 1/1.. Step 60..Train loss: 0.236.. Test loss: 0.148.. Test accuracy: 0.956
...

Epoch 1/1.. Step 510..Train loss: 0.362.. Test loss: 0.058.. Test accuracy: 0.981
Epoch 1/1.. Step 515..Train loss: 0.274.. Test loss: 0.119.. Test accuracy: 0.952
Epoch 1/1.. Step 520..Train loss: 0.181.. Test loss: 0.066.. Test accuracy: 0.975
Epoch 1/1.. Step 525..Train loss: 0.264.. Test loss: 0.054.. Test accuracy: 0.980
Epoch 1/1.. Step 530..Train loss: 0.260.. Test loss: 0.055.. Test accuracy: 0.979
Epoch 1/1.. Step 535..Train loss: 0.239.. Test loss: 0.090.. Test accuracy: 0.965
Epoch 1/1.. Step 540..Train loss: 0.304.. Test loss: 0.081.. Test accuracy: 0.968
Epoch 1/1.. Step 545..Train loss: 0.148.. Test loss: 0.061.. Test accuracy: 0.979
Epoch 1/1.. Step 550..Train loss: 0.352.. Test loss: 0.064.. Test accuracy: 0.978
Epoch 1/1.. Step 555..Train loss: 0.225.. Test loss: 0.063.. Test accuracy: 0.978
Epoch 1/1.. Step 560..Train loss: 0.293.. Test loss: 0.063.. Test accuracy: 0.979
Epoch 1/1.. Step 565..Train loss: 0.388.. Test loss: 0.162.. Test accuracy: 0.931
...

Epoch 1/1.. Step 1010..Train loss: 0.271.. Test loss: 0.080.. Test accuracy: 0.981
Epoch 1/1.. Step 1015..Train loss: 0.252.. Test loss: 0.073.. Test accuracy: 0.977
Epoch 1/1.. Step 1020..Train loss: 0.331.. Test loss: 0.067.. Test accuracy: 0.976
Epoch 1/1.. Step 1025..Train loss: 0.278.. Test loss: 0.064.. Test accuracy: 0.977
Epoch 1/1.. Step 1030..Train loss: 0.460.. Test loss: 0.076.. Test accuracy: 0.968
Epoch 1/1.. Step 1035..Train loss: 0.251.. Test loss: 0.065.. Test accuracy: 0.981
Epoch 1/1.. Step 1040..Train loss: 0.187.. Test loss: 0.066.. Test accuracy: 0.981
Epoch 1/1.. Step 1045..Train loss: 0.211.. Test loss: 0.057.. Test accuracy: 0.983
Epoch 1/1.. Step 1050..Train loss: 0.212.. Test loss: 0.061.. Test accuracy: 0.976
Epoch 1/1.. Step 1055..Train loss: 0.300.. Test loss: 0.057.. Test accuracy: 0.981
Epoch 1/1.. Step 1060..Train loss: 0.242.. Test loss: 0.055.. Test accuracy: 0.980
Epoch 1/1.. Step 1065..Train loss: 0.408.. Test loss: 0.065.. Test accuracy: 0.977
...

Epoch 1/1.. Step 1505..Train loss: 0.179.. Test loss: 0.059.. Test accuracy: 0.977
Epoch 1/1.. Step 1510..Train loss: 0.269.. Test loss: 0.050.. Test accuracy: 0.981
Epoch 1/1.. Step 1515..Train loss: 0.263.. Test loss: 0.050.. Test accuracy: 0.981
Epoch 1/1.. Step 1520..Train loss: 0.105.. Test loss: 0.060.. Test accuracy: 0.975
Epoch 1/1.. Step 1525..Train loss: 0.262.. Test loss: 0.083.. Test accuracy: 0.967
Epoch 1/1.. Step 1530..Train loss: 0.378.. Test loss: 0.052.. Test accuracy: 0.983
Epoch 1/1.. Step 1535..Train loss: 0.230.. Test loss: 0.062.. Test accuracy: 0.980
Epoch 1/1.. Step 1540..Train loss: 0.130.. Test loss: 0.067.. Test accuracy: 0.977
Epoch 1/1.. Step 1545..Train loss: 0.176.. Test loss: 0.055.. Test accuracy: 0.979
Epoch 1/1.. Step 1550..Train loss: 0.262.. Test loss: 0.046.. Test accuracy: 0.983
Epoch 1/1.. Step 1555..Train loss: 0.169.. Test loss: 0.057.. Test accuracy: 0.978
Epoch 1/1.. Step 1560..Train loss: 0.545.. Test loss: 0.057.. Test accuracy: 0.978
...

Epoch 1/1.. Step 2000..Train loss: 0.310.. Test loss: 0.047.. Test accuracy: 0.982
Epoch 1/1.. Step 2005..Train loss: 0.353.. Test loss: 0.048.. Test accuracy: 0.982
Epoch 1/1.. Step 2010..Train loss: 0.109.. Test loss: 0.050.. Test accuracy: 0.980
Epoch 1/1.. Step 2015..Train loss: 0.424.. Test loss: 0.047.. Test accuracy: 0.984
Epoch 1/1.. Step 2020..Train loss: 0.339.. Test loss: 0.057.. Test accuracy: 0.979
Epoch 1/1.. Step 2025..Train loss: 0.147.. Test loss: 0.053.. Test accuracy: 0.980
Epoch 1/1.. Step 2030..Train loss: 0.333.. Test loss: 0.050.. Test accuracy: 0.982
Epoch 1/1.. Step 2035..Train loss: 0.354.. Test loss: 0.057.. Test accuracy: 0.984
Epoch 1/1.. Step 2040..Train loss: 0.213.. Test loss: 0.068.. Test accuracy: 0.979
Epoch 1/1.. Step 2045..Train loss: 0.303.. Test loss: 0.064.. Test accuracy: 0.982
Epoch 1/1.. Step 2050..Train loss: 0.403.. Test loss: 0.066.. Test accuracy: 0.979
Epoch 1/1.. Step 2055..Train loss: 0.302.. Test loss: 0.075.. Test accuracy: 0.978
...

Epoch 1/1.. Step 2495..Train loss: 0.299.. Test loss: 0.072.. Test accuracy: 0.980
Epoch 1/1.. Step 2500..Train loss: 0.343.. Test loss: 0.061.. Test accuracy: 0.984
Epoch 1/1.. Step 2505..Train loss: 0.248.. Test loss: 0.052.. Test accuracy: 0.983
Epoch 1/1.. Step 2510..Train loss: 0.346.. Test loss: 0.049.. Test accuracy: 0.983
Epoch 1/1.. Step 2515..Train loss: 0.196.. Test loss: 0.045.. Test accuracy: 0.983
Epoch 1/1.. Step 2520..Train loss: 0.146.. Test loss: 0.041.. Test accuracy: 0.983
Epoch 1/1.. Step 2525..Train loss: 0.299.. Test loss: 0.044.. Test accuracy: 0.984
Epoch 1/1.. Step 2530..Train loss: 0.305.. Test loss: 0.049.. Test accuracy: 0.982
Epoch 1/1.. Step 2535..Train loss: 0.123.. Test loss: 0.047.. Test accuracy: 0.983
Epoch 1/1.. Step 2540..Train loss: 0.223.. Test loss: 0.056.. Test accuracy: 0.981
Epoch 1/1.. Step 2545..Train loss: 0.557.. Test loss: 0.052.. Test accuracy: 0.983
Epoch 1/1.. Step 2550..Train loss: 0.257.. Test loss: 0.091.. Test accuracy: 0.959
...