# Previously

In the past notebooks, we learned everything from the preliminaries through inference. So far, though, we've only used networks we built ourselves, with 2-3 layers.

# In this notebook,

we'll learn:
- how to use real, pre-trained deep neural network models via transfer learning
- what changes in our basic skeleton process
- how to classify cats and dogs using these models

# Pre-trained Neural Networks

***Remark:***
- these networks are pre-trained using ImageNet

**ImageNet**
- [ImageNet](http://www.image-net.org/) is a massive dataset with over 1 million labeled images in 1000 categories
- models pre-trained on it [are available via torchvision](http://pytorch.org/docs/0.3.0/torchvision/models.html)
- it was used to train deep neural networks built from convolutional layers (coming up in Lesson 6!)

**Pre-trained Neural Networks**
- trained using ImageNet, these models work astonishingly well as feature detectors for images they weren't trained on
- these networks can be downloaded with `torchvision.models`
  ***Remark:***
  - we will use [DenseNet](http://pytorch.org/docs/0.3.0/torchvision/models.html#id5) in this notebook

**Transfer Learning**
- using a network pre-trained on one dataset (here, ImageNet) to work on images that were not in its training set



# Preliminaries

The changes here are the following:
- the size of the input images to the models
  - most of the pre-trained models require 224x224 input images
- image normalization
  - must match the normalization used when the models were trained
  - each color channel was normalized separately
    - means: `[0.485, 0.456, 0.406]`
    - standard deviations: `[0.229, 0.224, 0.225]`
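
`transforms.Normalize` applies `(value - mean) / std` to each channel independently, after `transforms.ToTensor()` has scaled pixel values to the `[0, 1]` range. A minimal pure-Python sketch of that per-channel arithmetic for a single RGB pixel (the helper `normalize_pixel` is hypothetical, not part of torchvision):

```python
# ImageNet per-channel statistics (R, G, B), as used by the pre-trained models
MEANS = (0.485, 0.456, 0.406)
STDS = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, means=MEANS, stds=STDS):
    """Normalize one RGB pixel whose channels are already scaled to [0, 1]."""
    return tuple((value - mean) / std for value, mean, std in zip(rgb, means, stds))


# A pixel exactly at the ImageNet mean maps to zero in every channel
print(normalize_pixel((0.485, 0.456, 0.406)))
# A pure-white pixel lands more than two standard deviations above the mean in each channel
print(normalize_pixel((1.0, 1.0, 1.0)))
```

Each channel uses its own mean and standard deviation, which is why the three values must be passed in the same R, G, B order the models were trained with.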

In [0]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt

import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torchvision import datasets, transforms, models

In [0]:
data_dir = 'Cat_Dog_data'

# Define transforms for the training data and testing data
train_transforms = transforms.Compose([transforms.RandomRotation(30),
                                       transforms.RandomResizedCrop(224),
                                       transforms.RandomHorizontalFlip(),
                                       transforms.ToTensor(),
                                       transforms.Normalize([0.485, 0.456, 0.406],
                                                            [0.229, 0.224, 0.225])])

test_transforms = transforms.Compose([transforms.Resize(255),
                                      transforms.CenterCrop(224),
                                      transforms.ToTensor(),
                                      transforms.Normalize([0.485, 0.456, 0.406],
                                                           [0.229, 0.224, 0.225])])

# Pass the transforms in here so they are applied to every loaded image
train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)
test_data = datasets.ImageFolder(data_dir + '/test', transform=test_transforms)

trainloader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)
testloader = torch.utils.data.DataLoader(test_data, batch_size=64)
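
The test pipeline first resizes each image so its shorter edge is 255 pixels (keeping the aspect ratio), then cuts out the central 224x224 patch the models expect. A rough pure-Python sketch of that geometry, assuming the long edge is truncated to an integer and the crop offsets are rounded; torchvision's exact rounding may differ slightly between versions, and `resized_size` and `center_crop_box` are hypothetical helpers:

```python
def resized_size(width, height, short_edge=255):
    """Size after Resize(short_edge): shorter side -> short_edge, aspect ratio kept."""
    if width <= height:
        return short_edge, int(short_edge * height / width)
    return int(short_edge * width / height), short_edge


def center_crop_box(width, height, crop=224):
    """(left, top, right, bottom) of a centered crop x crop patch."""
    left = int(round((width - crop) / 2))
    top = int(round((height - crop) / 2))
    return left, top, left + crop, top + crop


# A hypothetical 500x375 (width x height) photo
w, h = resized_size(500, 375)
print((w, h))                   # (340, 255)
print(center_crop_box(w, h))    # (58, 16, 282, 240)
```

Resizing by the shorter edge and then cropping fills the whole 224x224 input with image content without stretching non-square photos.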

# Using Pre-trained Models

This part is actually the "Building/Defining Neural Networks" stage.
Here the changes are:
- using a pre-trained network as feature detector
- defining our simple feed-forward classifier

**Two Parts of the DenseNet Model**
1. Features Part
- a stack of convolutional layers
- overall works as a feature detector whose output can be fed into a classifier
- can be kept as-is; these layers already work well on their own

2. Classifier Part
- a single fully-connected layer: `(classifier): Linear(in_features=1024, out_features=1000)`

  ***Remark:***
  - this layer was trained to output the 1000 ImageNet categories, so it won't work for our specific two-class (cat vs. dog) problem
- we need to replace this part

***Remark:***
- In general, pre-trained networks are amazingly good feature detectors that can be used as the input for simple feed-forward classifiers.


In [0]:
# Loading the DenseNet model
# (torchvision 0.13+ prefers models.densenet121(weights=models.DenseNet121_Weights.DEFAULT))
model = models.densenet121(pretrained=True)
model # printing out the architecture

In [0]:
# Freeze parameters so we don't backprop through them
for param in model.parameters():
    param.requires_grad = False

from collections import OrderedDict
classifier = nn.Sequential(OrderedDict([
                          ('fc1', nn.Linear(1024, 500)),
                          ('relu', nn.ReLU()),
                          ('fc2', nn.Linear(500, 2)),
                          ('output', nn.LogSoftmax(dim=1))
                          ]))
    
model.classifier = classifier
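
One way to double-check that freezing worked is to count trainable versus frozen parameters after swapping the classifier. Downloading DenseNet isn't needed to see the idea, so the sketch below uses a hypothetical `TinyBackbone` that only mimics DenseNet's `features`/`classifier` split; it also reads `in_features` from the old head instead of hard-coding `1024`:

```python
import torch
from torch import nn


class TinyBackbone(nn.Module):
    """Hypothetical stand-in for DenseNet: a `features` part and a `classifier` part."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1),
                                      nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1),
                                      nn.Flatten())
        self.classifier = nn.Linear(8, 1000)  # plays the role of the ImageNet head

    def forward(self, x):
        return self.classifier(self.features(x))


model = TinyBackbone()

# Freeze everything that came with the "pre-trained" model
for param in model.parameters():
    param.requires_grad = False

# Replace the head; newly created layers have requires_grad=True by default
in_features = model.classifier.in_features
model.classifier = nn.Sequential(nn.Linear(in_features, 2), nn.LogSoftmax(dim=1))

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
print(trainable, frozen)  # 18 224
```

Only the new head's 18 parameters (8 x 2 weights + 2 biases) will be updated by the optimizer; the convolutional features stay fixed.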

# Training the Neural Network

## PyTorch and Processing Time

Since we're now using a **really deep** neural network, the hardware we train on makes a big difference:

**Training using CPUs**
- this will take a long, long time

**Training on GPUs**
- linear algebra computations are done in parallel, which can speed up training by as much as 100x depending on the model and hardware
- it's also possible to train on multiple GPUs, further decreasing training time

**PyTorch and GPUs**
- PyTorch uses [CUDA](https://developer.nvidia.com/cuda-zone) to efficiently compute the forward and backwards passes on the GPU
- In PyTorch, you move your model parameters and other tensors to the GPU memory using `model.to('cuda')`. (You can move them back from the GPU with `model.to('cpu')`)
- You can write device agnostic code which will automatically use CUDA if it's enabled like so:

```python
# at beginning of the script
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

...

# then whenever you get a new Tensor or Module
# this won't copy if they are already on the desired device
input = data.to(device)
model = MyModule(...).to(device)
```
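
The snippet above uses placeholders (`data`, `MyModule(...)`). A small self-contained version, with a throwaway `nn.Linear(4, 2)` standing in for a real model, shows the same pattern end to end. One detail worth knowing: for modules `.to()` moves the parameters in place, while for tensors it returns a new tensor that has to be reassigned.

```python
import torch
from torch import nn

# Pick the GPU when one is visible to PyTorch, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# nn.Module.to() moves the parameters in place (and returns the module)
model = nn.Linear(4, 2).to(device)

# Tensor.to() returns a new tensor, so the result must be reassigned
data = torch.randn(3, 4)
data = data.to(device)

output = model(data)
print(output.device)
```

The same code runs unchanged on a laptop without a GPU and on a CUDA machine.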



### Comparing CPU and GPU time via feedforward and backward pass

In [0]:
import time

In [0]:
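for device in ['cpu', 'cuda']:
    if device == 'cuda' and not torch.cuda.is_available():
        print("CUDA is not available; skipping GPU timing")
        continue

    criterion = nn.NLLLoss()
    # Only train the classifier parameters, feature parameters are frozen
    optimizer = optim.Adam(model.classifier.parameters(), lr=0.001)

    model.to(device)

    n_batches = 3
    total_time = 0
    for ii, (inputs, labels) in enumerate(trainloader):
        if ii == n_batches:
            break

        # Move input and label tensors to the selected device
        inputs, labels = inputs.to(device), labels.to(device)

        start = time.time()

        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # CUDA kernels run asynchronously; wait for them before stopping the clock
        if device == 'cuda':
            torch.cuda.synchronize()
        total_time += time.time() - start

    print(f"Device = {device}; Time per batch: {total_time/n_batches:.3f} seconds")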
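
The first GPU batch often includes one-time setup such as CUDA initialization, so skipping a warm-up call before averaging gives a fairer comparison. A generic pure-Python sketch of that idea using `time.perf_counter` (intended for measuring short intervals); `average_time_per_call` and `fake_step` are hypothetical names, and for GPU work `fn` would still need to call `torch.cuda.synchronize()` before returning:

```python
import time


def average_time_per_call(fn, n_warmup=1, n_repeats=3):
    """Average wall-clock seconds per call of fn, ignoring the first n_warmup calls."""
    for _ in range(n_warmup):
        fn()  # one-time costs (caches, lazy initialization) land here
    start = time.perf_counter()
    for _ in range(n_repeats):
        fn()
    return (time.perf_counter() - start) / n_repeats


# Hypothetical workload standing in for one forward/backward pass
calls = []


def fake_step():
    calls.append(1)
    sum(i * i for i in range(10_000))


avg = average_time_per_call(fake_step, n_warmup=1, n_repeats=3)
print(f"{avg:.6f} seconds per call")
```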

## Training using Pre-trained models

In [0]:
# Use GPU if it's available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = models.densenet121(pretrained=True)

# Freeze parameters so we don't backprop through them
for param in model.parameters():
    param.requires_grad = False
    
model.classifier = nn.Sequential(nn.Linear(1024, 256),
                                 nn.ReLU(),
                                 nn.Dropout(0.2),
                                 nn.Linear(256, 2),
                                 nn.LogSoftmax(dim=1))

criterion = nn.NLLLoss()

# Only train the classifier parameters, feature parameters are frozen
optimizer = optim.Adam(model.classifier.parameters(), lr=0.003)

model.to(device);

In [0]:
epochs = 5
steps = 0
running_loss = 0
print_every = 5
for epoch in range(epochs):
    for inputs, labels in trainloader:
        steps += 1
        # Move input and label tensors to the default device
        inputs, labels = inputs.to(device), labels.to(device)
        
        optimizer.zero_grad()
        
        logps = model.forward(inputs)
        loss = criterion(logps, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        
        if steps % print_every == 0:
            test_loss = 0
            accuracy = 0
            model.eval()
            with torch.no_grad():
                for inputs, labels in testloader:
                    inputs, labels = inputs.to(device), labels.to(device)
                    logps = model.forward(inputs)
                    batch_loss = criterion(logps, labels)
                    
                    test_loss += batch_loss.item()
                    
                    # Calculate accuracy
                    ps = torch.exp(logps)
                    top_p, top_class = ps.topk(1, dim=1)
                    equals = top_class == labels.view(*top_class.shape)
                    accuracy += torch.mean(equals.type(torch.FloatTensor)).item()
                    
            print(f"Epoch {epoch+1}/{epochs}.. "
                  f"Train loss: {running_loss/print_every:.3f}.. "
                  f"Test loss: {test_loss/len(testloader):.3f}.. "
                  f"Test accuracy: {accuracy/len(testloader):.3f}")
            running_loss = 0
            model.train()
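
The accuracy lines inside the validation loop can be tried on a tiny, made-up batch. The sketch below uses four hypothetical predictions over two classes. Note the `labels.view(*top_class.shape)`: `top_class` has shape `(4, 1)` while `labels` has shape `(4,)`, and comparing them directly would broadcast to a `(4, 4)` matrix instead of four element-wise matches.

```python
import torch

# Hypothetical log-probabilities for 4 images over 2 classes
logps = torch.log(torch.tensor([[0.9, 0.1],
                                [0.2, 0.8],
                                [0.6, 0.4],
                                [0.3, 0.7]]))
labels = torch.tensor([0, 1, 1, 1])

ps = torch.exp(logps)                    # back to probabilities
top_p, top_class = ps.topk(1, dim=1)     # both have shape (4, 1)
equals = top_class == labels.view(*top_class.shape)
accuracy = torch.mean(equals.type(torch.FloatTensor)).item()
print(accuracy)  # 0.75
```

The predicted classes are 0, 1, 0, 1, so three of the four match the labels.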

# Congratulations!

As this is the last notebook in Lesson 5: Introduction to PyTorch, here's what you learned:
1. Using PyTorch Tensors to Build Neural Networks
2. Defining Neural Networks in PyTorch (plus Forward Pass)
3. Training Neural Networks
4. Multi-Class Classification via Fashion MNIST
5. Validation Pass, Improving Performance and Inference
6. Saving and Loading Models
7. More on Preliminaries (Before Training)
8. Transfer Learning

# Next Up!

Since we've been dealing with images, let's dive into Lesson 6: Introduction to Convolutional Neural Networks (CNN) to learn more about them!