# Transfer Learning
In this notebook, we learn to use a pre-trained network to solve challenging problems in computer vision. Specifically, we use networks trained on ImageNet available from torchvision.

ImageNet is a massive dataset with over 1 million labeled images in 1000 categories. It's used to train deep neural networks using an architecture called convolutional layers.

Once trained, these models work astonishingly well as feature detectors for images they weren't trained on. Using a pre-trained network on images not in the training set is called transfer learning. Here we'll use transfer learning to train a network that can classify our cat and dog photos with near perfect accuracy.

With torchvision.models you can download these pre-trained networks and use them in your applications.

In [1]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt

import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torchvision import datasets, transforms, models
import time

Most of the pretrained models require the input to be 224x224 images. Also, we'll need to match the normalization used when the models were trained. Each color channel was normalized separately, the means are `[0.485, 0.456, 0.406]` and the standard deviations are `[0.229, 0.224, 0.225]`.

In [2]:
data_dir = 'Cat_Dog_data'

# Define transforms for the training data and testing data
train_transforms = transforms.Compose([transforms.RandomRotation(30),
                                       transforms.RandomResizedCrop(224),
                                       transforms.RandomHorizontalFlip(),
                                       transforms.ToTensor(),
                                       transforms.Normalize([0.485, 0.456, 0.406],
                                                            [0.229, 0.224, 0.225])])

test_transforms = transforms.Compose([transforms.Resize(255),
                                      transforms.CenterCrop(224),
                                      transforms.ToTensor(),
                                      transforms.Normalize([0.485, 0.456, 0.406],
                                                           [0.229, 0.224, 0.225])])

# Pass transforms in here, then run the next cell to see how the transforms look
train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)
test_data = datasets.ImageFolder(data_dir + '/test', transform=test_transforms)

trainloader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)
testloader = torch.utils.data.DataLoader(test_data, batch_size=64)

We can load in a model such as [DenseNet](http://pytorch.org/docs/0.3.0/torchvision/models.html#id5) or [ResNet](https://pytorch.org/docs/0.3.0/torchvision/models.html#id3). 
Let's print out the model architecture so we can see what's going on.

In [3]:
# Also try model = models.densenet121(pretrained=True)

model = models.resnet50(pretrained=True) # The number 50 means 50 hidden layers
model

Downloading: "https://download.pytorch.org/models/resnet50-19c8e357.pth" to /root/.torch/models/resnet50-19c8e357.pth
100%|██████████| 102502400/102502400 [00:01<00:00, 76471257.32it/s]


ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=F

This model is built out of two main parts, the features and the classifier(fc). The features part is a stack of convolutional layers and overall works as a feature detector that can be fed into a classifier. The classifier part is a single fully-connected layer `(fc): Linear(in_features=2048, out_features=1000)`. This layer was trained on the ImageNet dataset, so it won't work for our specific problem. That means we need to replace the classifier(fc), but the features will work perfectly on their own. In general, I think about pre-trained networks as amazingly good feature detectors that can be used as the input for simple feed-forward classifiers.

In [4]:
# Now, when we pass the device to all tensors and model, it will automatically run on GPU If GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [5]:
# To check if running on GPU
torch.cuda.is_available()

True

With our model built, we need to train the classifier. However, now we're using a **really deep** neural network. If you try to train this on a CPU like normal, it will take a long, long time. 
Instead, I have access while learning this course on **UDACITY's "Intro to ML with PyTorch" Nano degree** and I'm gonna use GPU to do the calculations.
The linear algebra computations are done in parallel on the GPU leading to 100x increased training speeds. It's also possible to train on multiple GPUs, further decreasing training time.

PyTorch, along with pretty much every other deep learning framework, uses [CUDA](https://developer.nvidia.com/cuda-zone) to efficiently compute the forward and backwards passes on the GPU. In PyTorch, you move your model parameters and other tensors to the GPU memory using `model.to('cuda')`. You can move them back from the GPU with `model.to('cpu')` which you'll commonly do when you need to operate on the network output outside of PyTorch.

In [6]:
# Turn off gradient for our model, so  they don't get updated when we train the model,
for param in model.parameters():
    param.requires_grad = False
    
    
# Define the new classifier
classfier = nn.Sequential(nn.Linear(2048,512),
                          nn.ReLU(),
                          nn.Dropout(p=0.2),
                          nn.Linear(512,2),
                          nn.LogSoftmax(dim=1)
                         )
model.fc = classfier

# Define criterion & optimizer
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.fc.parameters(), lr=0.003)

model.to(device)

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=F

As you can see above, the classifier(fc) has been updated

In [None]:
# Define the variables to be used during the training
epochs = 1
step = 0  # tracking the number of steps
running_loss = 0
print_every = 5  # How many steps to go before printing the validation loss  


# training loop
for epoch in range(epochs):
    for images, labels in trainloader:
        step += 1
        
        images, labels = images.to(device), labels.to(device)  # move tensors to GPU is available
        
        optimizer.zero_grad()
        
        logps = model(images)  # log probabilities
        train_loss = criterion(logps,labels)
        train_loss.backward()
        optimizer.step()
        
        running_loss += train_loss.item()  # Keep track of the losses

# Validation loop: 
# Every once in a while (print_every) we need to drop out of the training loop and

# test our network's accuracy and loss on our test dataset

        if step % print_every ==0:
            model.eval()   # start evaluation mode for making predicions
            
            test_loss = 0
            accuracy = 0
            
            for images,labels in testloader:
                
                images, labels = images.to(device), labels.to(device)  # move tensors to GPU is available
                
                logps = model(images)
                test_loss = criterion(logps,labels)
                test_loss += test_loss.item() 
                
# Calculate the accuracy

# this model is returning log probabilities[LogSoftmax()] of classes and
# to get actual probabilites we use torch.exp()
                ps = torch.exp(logps)  

                top_ps, top_class = ps.topk(1,dim=1)
                equals = top_class == labels.view(*top_class.shape)
                accuracy += torch.mean(equals.type(torch.FloatTensor)).item()
            
            
            print(f"Epoch {epoch+1}/{epochs}.. "
                  f"Train loss: {running_loss/print_every:.3f}.. "  # average of training loss
                  f"Test loss: {test_loss/len(testloader):.3f}.. "  # avg = total test_loss by the # of batches
                  f"Test accuracy: {accuracy/len(testloader):.3f}") # avg = sum of all accuracies by # of batches
            
            
            running_loss = 0 
            model.train()  # put the model back to training mode

Epoch 1/1.. Train loss: 2.162.. Test loss: 0.049.. Test accuracy: 0.579
Epoch 1/1.. Train loss: 0.431.. Test loss: 0.000.. Test accuracy: 0.955
Epoch 1/1.. Train loss: 0.179.. Test loss: 0.001.. Test accuracy: 0.963
Epoch 1/1.. Train loss: 0.211.. Test loss: 0.000.. Test accuracy: 0.952
Epoch 1/1.. Train loss: 0.262.. Test loss: 0.000.. Test accuracy: 0.975
Epoch 1/1.. Train loss: 0.191.. Test loss: 0.000.. Test accuracy: 0.964
Epoch 1/1.. Train loss: 0.116.. Test loss: 0.000.. Test accuracy: 0.977
Epoch 1/1.. Train loss: 0.170.. Test loss: 0.000.. Test accuracy: 0.960
Epoch 1/1.. Train loss: 0.162.. Test loss: 0.000.. Test accuracy: 0.974
Epoch 1/1.. Train loss: 0.189.. Test loss: 0.000.. Test accuracy: 0.975
Epoch 1/1.. Train loss: 0.163.. Test loss: 0.000.. Test accuracy: 0.981
Epoch 1/1.. Train loss: 0.202.. Test loss: 0.000.. Test accuracy: 0.981
Epoch 1/1.. Train loss: 0.168.. Test loss: 0.000.. Test accuracy: 0.970
Epoch 1/1.. Train loss: 0.142.. Test loss: 0.000.. Test accuracy