# Transfer Learning
Now we will see how to use pre-trained networks to solve similar problems.  
We will use networks pre-trained on [ImageNet](https://image-net.org/), a dataset that is also available through [torchvision](https://pytorch.org/vision/stable/datasets.html#imagenet).  
  
ImageNet is a massive dataset with over 1 million labeled images in 1000 categories.  
It is used to train deep neural networks built from convolutional layers. To learn more about convolutional layers, see [this video](https://www.youtube.com/watch?v=2-Ol7ZB0MmU).  
Once trained, these models work astonishingly well as feature detectors for images they weren't trained on.  
Using a pre-trained network on images that weren't in its training set is called *transfer learning*.  
We'll use transfer learning to train a neural network that classifies cats and dogs with near-perfect accuracy.  
  
The `torchvision.models` module lets you download pre-trained models and use them in your own applications.

In [1]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt

import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torchvision import datasets, transforms, models

Most of the pre-trained models expect 224x224 input images. We also need to match the normalization used when the models were trained: each color channel was normalized separately, with means `[0.485, 0.456, 0.406]` and standard deviations `[0.229, 0.224, 0.225]`.
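
As a rough illustration of what this per-channel normalization does, the sketch below applies it by hand to a random tensor. The `image` tensor is a hypothetical stand-in for the output of `transforms.ToTensor()` and is not part of the dataset; the arithmetic is assumed to match what `transforms.Normalize` computes, namely `(x - mean) / std` for each channel.

```python
import torch

# ImageNet channel statistics quoted above, reshaped so they broadcast
# over a (channels, height, width) image tensor
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

# Hypothetical stand-in for a ToTensor() output: RGB values in [0, 1]
image = torch.rand(3, 224, 224)

# Per-channel normalization: subtract the mean, divide by the standard deviation
normalized = (image - mean) / std

# Undoing the normalization is useful before displaying an image
restored = normalized * std + mean

print(normalized.shape)
```

The inverse step is the part most often needed in practice, since normalized tensors look wrong when plotted directly.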

In [2]:
data_dir = "Cat_Dog_data"
train_transforms = transforms.Compose([transforms.RandomRotation(30),
                                       transforms.RandomResizedCrop(224),
                                       transforms.RandomHorizontalFlip(),
                                       transforms.ToTensor(),
                                       transforms.Normalize([0.485, 0.456, 0.406],
                                                            [0.229, 0.224, 0.225])])

test_transforms = transforms.Compose([transforms.Resize(255),
                                      transforms.CenterCrop(224),
                                      transforms.ToTensor(),
                                      transforms.Normalize([0.485, 0.456, 0.406],
                                                           [0.229, 0.224, 0.225])])


# Load the training and test images from disk, applying the transforms defined above
train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)
test_data = datasets.ImageFolder(data_dir + '/test', transform=test_transforms)

trainloader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)
testloader = torch.utils.data.DataLoader(test_data, batch_size=64)
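
One way to sanity-check loaders like these without needing the image folder on disk is to build a loader over placeholder tensors and inspect the batch shapes. This is only a sketch: `fake_images`, `fake_labels`, and the batch size of 4 are made up for illustration and stand in for the real cat and dog data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in data: 10 blank RGB images of size 224x224 with binary labels
fake_images = torch.zeros(10, 3, 224, 224)
fake_labels = torch.tensor([0, 1] * 5)
fake_data = TensorDataset(fake_images, fake_labels)

# Same DataLoader API as above, with a small batch size for the example
loader = DataLoader(fake_data, batch_size=4, shuffle=True)

# Each batch is (batch, channels, height, width); the last batch may be smaller
batch_shapes = []
for images, labels in loader:
    batch_shapes.append(tuple(images.shape))
    print(images.shape, labels.shape)
```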

We can load a pre-trained model such as [ResNet50](https://pytorch.org/docs/0.3.0/torchvision/models.html#id3); by default it is loaded on the CPU. Let's print the model architecture so we can see what is going on.

In [3]:
model = models.resnet50(pretrained=True)
model

Downloading: "https://download.pytorch.org/models/resnet50-19c8e357.pth" to C:\Users\sferro/.cache\torch\hub\checkpoints\resnet50-19c8e357.pth
100.0%


ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 

Conceptually, this model has two main parts:  
1. a feature extractor  
2. a classifier  

The feature extractor comes first and consists of several convolutional layers whose output is fed into the classifier.  
The classifier is a single fully connected layer: `(fc): Linear(in_features=2048, out_features=1000, bias=True)`.  
This last layer was trained for ImageNet, so it won't work for our problem: we have 2 classes, not 1000.  
So we only need to replace the classifier.  
**Pre-trained networks are used to extract feature representations that are then fed into feed-forward classifiers.**


In [10]:
# Freeze parameters so we don't backprop through them
for param in model.parameters():
    param.requires_grad = False

from collections import OrderedDict
classifier = nn.Sequential(OrderedDict([
                          ('fc1', nn.Linear(2048, 500)),
                          ('relu', nn.ReLU()),
                          ('fc2', nn.Linear(500, 2)),
                          ('output', nn.LogSoftmax(dim=1))
                          ]))
    
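# ResNet's forward pass uses the `fc` attribute (see the printout above),
# so the new head must replace `fc` to become part of the model
model.fc = classifier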
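
To check that a freeze-and-replace pattern like this leaves only the new head trainable, the following sketch repeats it on a tiny network that needs no download. `TinyNet` is a hypothetical stand-in, not ResNet50; it only mimics the fact that the head lives in an attribute named `fc`.

```python
import torch
from torch import nn

class TinyNet(nn.Module):
    """Hypothetical stand-in for a pre-trained network with an `fc` head."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.fc = nn.Linear(8, 1000)

    def forward(self, x):
        return self.fc(self.features(x))

net = TinyNet()

# Freeze every existing parameter
for param in net.parameters():
    param.requires_grad = False

# Replace the head; freshly created layers require gradients by default
net.fc = nn.Sequential(nn.Linear(8, 2), nn.LogSoftmax(dim=1))

# Only the new head should be listed as trainable
trainable = [name for name, p in net.named_parameters() if p.requires_grad]
print(trainable)

# One backward pass: gradients reach the head but not the frozen features
outputs = net(torch.rand(4, 3, 32, 32))
loss = nn.NLLLoss()(outputs, torch.tensor([0, 1, 0, 1]))
loss.backward()
```

Listing the trainable parameter names is a cheap check that the new head really sits on the path used by the forward pass.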

Now that the model is built, we need to train the classifier. This is a very deep neural network, so training it on the CPU alone would take a long time; instead we use the GPU. The linear algebra computations run in parallel on the GPU, which can make training on the order of 100x faster.  
It is also possible to train on multiple GPUs to reduce the training time further.  
  
PyTorch, like pretty much every other deep learning framework, uses CUDA to perform the forward and backward passes efficiently.  
In PyTorch you move the model's parameters to GPU memory with `model.to("cuda")` and back to the CPU with `model.to("cpu")`; input tensors are moved the same way with `.to(device)`.
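
A common way to write this so the same code runs with or without a GPU is to pick the device once and move both the model and its inputs to it. The sketch below assumes a small `nn.Linear` layer as a stand-in for the real model; the names `layer` and `x` are illustrative only.

```python
import torch
from torch import nn

# Use the GPU when CUDA is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in model and input; both must live on the same device
layer = nn.Linear(4, 2).to(device)
x = torch.rand(3, 4).to(device)

# The output is created on the device where the computation ran
y = layer(x)
print(y.device)
```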

In [11]:
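import time

# Time a few training batches on each device that is actually available
devices = ['cpu'] + (['cuda'] if torch.cuda.is_available() else [])
for device in devices:

    criterion = nn.NLLLoss()
    # Only train the classifier parameters, feature parameters are frozen
    optimizer = optim.Adam(classifier.parameters(), lr=0.001)

    model.to(device)

    # Start the clock once, before the first batch
    start = time.time()

    for ii, (inputs, labels) in enumerate(trainloader):

        # Move input and label tensors to the current device
        inputs, labels = inputs.to(device), labels.to(device)

        # Clear gradients left over from the previous batch
        optimizer.zero_grad()

        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # Stop after 3 batches so the average below divides by the right count
        if ii == 2:
            break

    print(f"Device = {device}; Time per batch: {(time.time() - start)/3:.3f} seconds")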

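If the new classifier is attached under a name that the forward pass never uses (on a ResNet the head is `model.fc`, not `model.classifier`), every parameter contributing to the loss is frozen and `loss.backward()` fails with:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn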

Let's check whether CUDA is available on our system.

In [13]:
cuda = torch.cuda.is_available()
print(cuda)

False
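
When CUDA is available, wall-clock comparisons between devices need some care, because CUDA kernels are launched asynchronously and the Python call can return before the GPU has finished. The sketch below is one possible timing helper under that assumption; `time_forward` is a hypothetical name, and it falls back to plain timing on the CPU.

```python
import time

import torch
from torch import nn

def time_forward(model, inputs, n_iters=3):
    """Average seconds per forward pass (hypothetical helper for illustration)."""
    # Wait for pending GPU work so it is not counted in the measurement
    if inputs.is_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    with torch.no_grad():
        for _ in range(n_iters):
            model(inputs)
    # Wait again so the launched kernels have actually finished
    if inputs.is_cuda:
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_iters

# Stand-in model and batch; on a machine without CUDA this runs on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
seconds = time_forward(nn.Linear(8, 2).to(device), torch.rand(4, 8).to(device))
print(f"Device = {device}; Time per forward pass: {seconds:.6f} seconds")
```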
