<a href="https://colab.research.google.com/github/heman1/deep-learning-v2-pytorch/blob/master/Copy_of_Part_8_Transfer_Learning_(Exercises).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Transfer Learning

In this notebook, you'll learn how to use pre-trained networks to solved challenging problems in computer vision. Specifically, you'll use networks trained on [ImageNet](http://www.image-net.org/) [available from torchvision](http://pytorch.org/docs/0.3.0/torchvision/models.html). 

ImageNet is a massive dataset with over 1 million labeled images in 1000 categories. It's used to train deep neural networks using an architecture called convolutional layers. I'm not going to get into the details of convolutional networks here, but if you want to learn more about them, please [watch this](https://www.youtube.com/watch?v=2-Ol7ZB0MmU).

Once trained, these models work astonishingly well as feature detectors for images they weren't trained on. Using a pre-trained network on images not in the training set is called transfer learning. Here we'll use transfer learning to train a network that can classify our cat and dog photos with near perfect accuracy.

With `torchvision.models` you can download these pre-trained networks and use them in your applications. We'll include `models` in our imports now.

In [0]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt

import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torchvision import datasets, transforms, models

In [0]:
# http://pytorch.org/
from os.path import exists
from wheel.pep425tags import get_abbr_impl, get_impl_ver, get_abi_tag
platform = '{}{}-{}'.format(get_abbr_impl(), get_impl_ver(), get_abi_tag())
cuda_output = !ldconfig -p|grep cudart.so|sed -e 's/.*\.\([0-9]*\)\.\([0-9]*\)$/cu\1\2/'
accelerator = cuda_output[0] if exists('/dev/nvidia0') else 'cpu'

!pip install -q http://download.pytorch.org/whl/{accelerator}/torch-0.4.1-{platform}-linux_x86_64.whl torchvision
import torch

tcmalloc: large alloc 1073750016 bytes == 0x5749c000 @  0x7fc5265832a4 0x591a07 0x5b5d56 0x502e9a 0x506859 0x502209 0x502f3d 0x506859 0x504c28 0x502540 0x502f3d 0x506859 0x504c28 0x502540 0x502f3d 0x506859 0x504c28 0x502540 0x502f3d 0x507641 0x502209 0x502f3d 0x506859 0x504c28 0x502540 0x502f3d 0x507641 0x504c28 0x502540 0x502f3d 0x507641


In [0]:
!wget -c https://s3.amazonaws.com/content.udacity-data.com/nd089/Cat_Dog_data.zip;
!unzip -qq Cat_Dog_data.zip

--2018-12-31 18:28:09--  https://s3.amazonaws.com/content.udacity-data.com/nd089/Cat_Dog_data.zip
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.224.99
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.224.99|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 580495262 (554M) [application/zip]
Saving to: ‘Cat_Dog_data.zip’


2018-12-31 18:28:46 (15.2 MB/s) - ‘Cat_Dog_data.zip’ saved [580495262/580495262]



Most of the pretrained models require the input to be 224x224 images. Also, we'll need to match the normalization used when the models were trained. Each color channel was normalized separately, the means are `[0.485, 0.456, 0.406]` and the standard deviations are `[0.229, 0.224, 0.225]`.

In [0]:
#
data_dir = 'Cat_Dog_data'

# TODO: Define transforms for the training data and testing data
train_transforms = transforms.Compose([transforms.RandomRotation(30),
                                      transforms.RandomResizedCrop(224),
                                      transforms.RandomHorizontalFlip(),
                                      transforms.ToTensor()])

test_transforms = transforms.Compose([transforms.CenterCrop(224),
                                    transforms.ToTensor()])

# Pass transforms in here, then run the next cell to see how the transforms look
train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)
test_data = datasets.ImageFolder(data_dir + '/test', transform=test_transforms)

trainloader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)
testloader = torch.utils.data.DataLoader(test_data, batch_size=64)

We can load in a model such as [DenseNet](http://pytorch.org/docs/0.3.0/torchvision/models.html#id5). Let's print out the model architecture so we can see what's going on.

In [0]:
model = models.densenet121(pretrained=True)

  nn.init.kaiming_normal(m.weight.data)
Downloading: "https://download.pytorch.org/models/densenet121-a639ec97.pth" to /root/.torch/models/densenet121-a639ec97.pth
100%|██████████| 32342954/32342954 [00:00<00:00, 72616961.42it/s]


This model is built out of two main parts, the features and the classifier. The features part is a stack of convolutional layers and overall works as a feature detector that can be fed into a classifier. The classifier part is a single fully-connected layer `(classifier): Linear(in_features=1024, out_features=1000)`. This layer was trained on the ImageNet dataset, so it won't work for our specific problem. That means we need to replace the classifier, but the features will work perfectly on their own. In general, I think about pre-trained networks as amazingly good feature detectors that can be used as the input for simple feed-forward classifiers.

In [0]:
# Freeze parameters so we don't backprop through them
# update the classifier and kep features unchanged
for param in model.parameters():
    param.requires_grad = False

from collections import OrderedDict
classifier = nn.Sequential(OrderedDict([
                          ('fc1', nn.Linear(1024, 500)),
                          ('relu', nn.ReLU()),
                          ('fc2', nn.Linear(500, 2)),
                          ('output', nn.LogSoftmax(dim=1))
                          ]))
    
model.classifier = classifier  # the new classifier is now attached to our model

With our model built, we need to train the classifier. However, now we're using a **really deep** neural network. If you try to train this on a CPU like normal, it will take a long, long time. Instead, we're going to use the GPU to do the calculations. The linear algebra computations are done in parallel on the GPU leading to 100x increased training speeds. It's also possible to train on multiple GPUs, further decreasing training time.

PyTorch, along with pretty much every other deep learning framework, uses [CUDA](https://developer.nvidia.com/cuda-zone) to efficiently compute the forward and backwards passes on the GPU. In PyTorch, you move your model parameters and other tensors to the GPU memory using `model.to('cuda')`. You can move them back from the GPU with `model.to('cpu')` which you'll commonly do when you need to operate on the network output outside of PyTorch. As a demonstration of the increased speed, I'll compare how long it takes to perform a forward and backward pass with and without a GPU.

In [0]:
import time

In [0]:
for device in ['cpu', 'cuda']:

    criterion = nn.NLLLoss()
    # Only train the classifier parameters, feature parameters are frozen
    optimizer = optim.Adam(model.classifier.parameters(), lr=0.001)

    model.to(device)

    for ii, (inputs, labels) in enumerate(trainloader):

        # Move input and label tensors to the GPU
        inputs, labels = inputs.to(device), labels.to(device)

        start = time.time()

        outputs = model.forward(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        if ii==3:
            break
        
    print(f"Device = {device}; Time per batch: {(time.time() - start)/3:.3f} seconds")

AttributeError: ignored

You can write device agnostic code which will automatically use CUDA if it's enabled like so:
```python
# at beginning of the script
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

...

# then whenever you get a new Tensor or Module
# this won't copy if they are already on the desired device
input = data.to(device)
model = MyModule(...).to(device)
```

From here, I'll let you finish training the model. The process is the same as before except now your model is much more powerful. You should get better than 95% accuracy easily.

>**Exercise:** Train a pretrained models to classify the cat and dog images. Continue with the DenseNet model, or try ResNet, it's also a good model to try out first. Make sure you are only training the classifier and the parameters for the features part are frozen.

In [0]:
!pip install Pillow==4.0.0
!pip install PIL
!pip install image

Collecting Pillow==4.0.0
[?25l  Downloading https://files.pythonhosted.org/packages/37/e8/b3fbf87b0188d22246678f8cd61e23e31caa1769ebc06f1664e2e5fe8a17/Pillow-4.0.0-cp36-cp36m-manylinux1_x86_64.whl (5.6MB)
[K    100% |████████████████████████████████| 5.6MB 7.0MB/s 
[31mtorchvision 0.2.1 has requirement pillow>=4.1.1, but you'll have pillow 4.0.0 which is incompatible.[0m
Installing collected packages: Pillow
  Found existing installation: Pillow 5.3.0
    Uninstalling Pillow-5.3.0:
      Successfully uninstalled Pillow-5.3.0
Successfully installed Pillow-4.0.0
Collecting PIL
[31m  Could not find a version that satisfies the requirement PIL (from versions: )[0m
[31mNo matching distribution found for PIL[0m


In [0]:
## TODO: Use a pretrained model to classify the cat and dog images
## check for GPU return cuda else return cpu
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

## use the pretraines model (import from torchvision)
model = models.resnet50(pretrained=True)

#make sure parameteres of model are frozen and turn off the gradients
for param in model.parameters():
  param.requires_grad = False

#define a new classifier which needs to be trained
classifier = nn.Sequential(nn.Linear(2048, 512),
                     nn.ReLU(),
                     nn.Dropout(p=0.2),
                     nn.Linear(512,2),
                     nn.LogSoftmax(dim=1))
model.fc = classifier  #attach to our model

#define loss
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.fc.parameters(), lr=0.003)
model.to(device)  #move our model to available device

epochs =1
steps = 0
running_loss = 0
print_every = 5

for epoch in range(epochs):
  for images, labels in trainloader:
    steps += 1
    
    images, labels = images.to(device), labels.to(device) #move them to GPU
    optimizer.zero_grad()
    logps = model(images)
    loss = criterion(logps, labels)
    loss.backward()
    optimizer.step()
    
    running_loss += loss.item() #keep track of running_loss
    
    #drop out of training loop and network network accuracy
    if steps % print_every == 0:
      #validation code
      model.eval()
      test_loss = 0
      accuracy = 0
      
      for images, labels in testloader:
        images, labels = images.to(device), labels.to(device)
        logps = model(images) #images from test set  
        loss = criterion(logps, labels)  #calculate loss
        test_loss += loss.item()  #keep track of loss
        
        ps = torch.exp(logps)
        top_ps, top_class = ps.topk(1, dim=1)  #gives first largest value in our probablites
        equals = top_class == labels.view(*top_class.shape)
        accuracy += torch.mean(equals.type(torch.FloatTensor)).item()
        
        #print the variables
      print(f"Epoch {epoch+1}/{epochs})"
              f"Train_loss: {running_loss/print_every:.3f}"
              f"Test Loss: {test_loss/len(testloader):.3f}"  #batches
              f"Test accuracy: {accuracy/len(testloader):.3f}"
             )
      runnnig_loss = 0
      model.train()

Epoch 1/1)Train_loss: 2.602Test Loss: 0.362Test accuracy: 0.823
Epoch 1/1)Train_loss: 3.520Test Loss: 0.741Test accuracy: 0.616
Epoch 1/1)Train_loss: 3.954Test Loss: 0.138Test accuracy: 0.953
Epoch 1/1)Train_loss: 4.328Test Loss: 0.133Test accuracy: 0.955
Epoch 1/1)Train_loss: 4.604Test Loss: 0.181Test accuracy: 0.926
Epoch 1/1)Train_loss: 4.807Test Loss: 0.127Test accuracy: 0.951
Epoch 1/1)Train_loss: 5.039Test Loss: 0.107Test accuracy: 0.960
Epoch 1/1)Train_loss: 5.317Test Loss: 0.088Test accuracy: 0.969
Epoch 1/1)Train_loss: 5.473Test Loss: 0.078Test accuracy: 0.970
Epoch 1/1)Train_loss: 5.677Test Loss: 0.080Test accuracy: 0.968
Epoch 1/1)Train_loss: 5.802Test Loss: 0.080Test accuracy: 0.970
Epoch 1/1)Train_loss: 5.995Test Loss: 0.072Test accuracy: 0.971
Epoch 1/1)Train_loss: 6.140Test Loss: 0.071Test accuracy: 0.975
Epoch 1/1)Train_loss: 6.285Test Loss: 0.083Test accuracy: 0.968
Epoch 1/1)Train_loss: 6.500Test Loss: 0.073Test accuracy: 0.973
Epoch 1/1)Train_loss: 6.672Test Loss: 0.

KeyboardInterrupt: ignored