# Transfer Learning
https://classroom.udacity.com/courses/ud188/lessons/c5706f76-0e30-4b48-b74e-c19fafc33a75/concepts/80e61337-4d42-4292-9a3b-791f65733d23



In this notebook, you'll learn how to use pre-trained networks to solved challenging problems in computer vision. Specifically, you'll use networks trained on [ImageNet](http://www.image-net.org/) [available from torchvision](http://pytorch.org/docs/0.3.0/torchvision/models.html). 

https://pytorch.org/docs/stable/torchvision/models.html


ImageNet is a massive dataset with over 1 million labeled images in 1000 categories. It's used to train deep neural networks using an architecture called convolutional layers. I'm not going to get into the details of convolutional networks here, but if you want to learn more about them, please [watch this](https://www.youtube.com/watch?v=2-Ol7ZB0MmU).
https://www.youtube.com/watch?v=2-Ol7ZB0MmU

Once trained, these models work astonishingly well as feature detectors for images they weren't trained on. Using a pre-trained network on images not in the training set is called transfer learning. Here we'll use transfer learning to train a network that can classify our cat and dog photos with near perfect accuracy.

With `torchvision.models` you can download these pre-trained networks and use them in your applications. We'll include `models` in our imports now.

In [1]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt

import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torchvision import datasets, transforms, models

Most of the pretrained models require the input to be 224x224 images. Also, we'll need to match the normalization used when the models were trained. Each color channel was normalized separately, the means are `[0.485, 0.456, 0.406]` and the standard deviations are `[0.229, 0.224, 0.225]`.

In [2]:
import os
print(os.listdir())
#print(os.listdir('./olga_data_science_machine_learning2'))

['.ipynb_checkpoints', '99_2Python_Datacamp_16_interactive_data_visualization_with_bokeh.ipynb', '99_Python_24a_Bayesian_Linear_Regression_portuguese_data_normal_dist.ipynb', '99_Python_24a_Bayesian_Linear_Regression_portuguese_data_t_dist.ipynb', '99_Python_24b_Bayesian_Linear_Regression_portuguese_data_poisson.ipynb', '99_Python_24_Bayesian_Linear_Regression.ipynb', '99_Python_25a_Markov_Chain_Monte_Carlo_MCMC_awake_model.ipynb', '99_Python_25_Markov_Chain_Monte_Carlo_MCMC.ipynb', '99_Python_26_probabilties_with_Bayesian_Inference.ipynb', '99_Python_27_data_visualizations_plotly.ipynb', '99_Python_28_types_of_probability_distributions.ipynb', '99_Python_Datacamp_16_interactive_data_visualization_with_bokeh.ipynb', '99_Python_Datacamp_17_supervised-learning-with-scikit-learn.ipynb', '99_Python_Datacamp_18_machine_learning_with_the_experts_school_budgets.ipynb', '99_Python_Datacamp_18_Unsupervised_learning.ipynb', '99_Python_Datacamp_19_SQLIntro_SQLJoins.ipynb', '99_Python_Datacamp_20_

In [3]:
from os import walk
for (dirpath, dirnames, filenames) in walk('../olga_data_science_machine_learning2/'):
    print('directory path: ----', dirpath)
    print('folder name: ----', dirnames)

directory path: ---- ../olga_data_science_machine_learning2/
folder name: ---- ['.ipynb_checkpoints', 'Bayesian_methods_for_hackers', 'cats_dogs', 'cat_test', 'data', 'mnist', 'student', 'test1']
directory path: ---- ../olga_data_science_machine_learning2/.ipynb_checkpoints
folder name: ---- []
directory path: ---- ../olga_data_science_machine_learning2/Bayesian_methods_for_hackers
folder name: ---- []
directory path: ---- ../olga_data_science_machine_learning2/cats_dogs
folder name: ---- ['test1', 'train']
directory path: ---- ../olga_data_science_machine_learning2/cats_dogs\test1
folder name: ---- []
directory path: ---- ../olga_data_science_machine_learning2/cats_dogs\train
folder name: ---- []
directory path: ---- ../olga_data_science_machine_learning2/cat_test
folder name: ---- []
directory path: ---- ../olga_data_science_machine_learning2/data
folder name: ---- ['cifar-10-batches-py']
directory path: ---- ../olga_data_science_machine_learning2/data\cifar-10-batches-py
folder name

In [4]:
# TODO: Define transforms for the training data and testing data
train_transforms = transforms.Compose([transforms.RandomRotation(30),
                                       transforms.RandomResizedCrop(224),
                                       transforms.RandomHorizontalFlip(),
                                       transforms.ToTensor(),
                                       transforms.Normalize([0.485, 0.456, 0.406],
                                                            [0.229, 0.224, 0.225])])

test_transforms = transforms.Compose([transforms.Resize(255),
                                      transforms.CenterCrop(224),
                                      transforms.ToTensor(),
                                      transforms.Normalize([0.485, 0.456, 0.406],
                                                           [0.229, 0.224, 0.225])])

# Pass transforms in here, then run the next cell to see how the transforms look
#load train data
train_data = datasets.ImageFolder('../cat_tst', transform=train_transforms)
#test_data = datasets.ImageFolder('../test1', transform=test_transforms)

trainloader = torch.utils.data.DataLoader(train_data, batch_size=32)
#testloader = torch.utils.data.DataLoader(test_data, batch_size=32)

train data in in directory:
C:\Users\dave_\Documents\cat_tst\new_dir


test data is in directory:
C:\Users\dave_\Documents\test1\new_dir

otherwise doest load!!!! 

In [7]:
#load test data
#
test_data = datasets.ImageFolder('../test1', transform=test_transforms)
testloader = torch.utils.data.DataLoader(test_data, batch_size=32)


We can load in a model such as [DenseNet](http://pytorch.org/docs/0.3.0/torchvision/models.html#id5). Let's print out the model architecture so we can see what's going on.

In [8]:
model = models.densenet121(pretrained=True)
# to look what is the architechture for this model
model

Downloading: "https://download.pytorch.org/models/densenet121-a639ec97.pth" to C:\Users\dave_/.cache\torch\hub\checkpoints\densenet121-a639ec97.pth


HBox(children=(FloatProgress(value=0.0, max=32342954.0), HTML(value='')))




DenseNet(
  (features): Sequential(
    (conv0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (norm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu0): ReLU(inplace=True)
    (pool0): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (denseblock1): _DenseBlock(
      (denselayer1): _DenseLayer(
        (norm1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      )
      (denselayer2): _DenseLayer(
        (norm1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu

This model is built out of two main parts, the features and the classifier:
Look at the start:
DenseNet(
  (features): Sequential(
    (conv0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    
The features part is a stack of convolutional layers and overall works as a feature detector that can be fed into a classifier.
Look at the bottom:
)
  (classifier): Linear(in_features=1024, out_features=1000, bias=True)
)

The classifier part is a single fully-connected layer `(classifier): Linear(in_features=1024, out_features=1000)`. This layer was trained on the ImageNet dataset, so it won't work for our specific problem. That means we need to replace the classifier, but the features will work perfectly on their own. In general, I think about pre-trained networks as amazingly good feature detectors that can be used as the input for simple feed-forward classifiers.

In [9]:
# Freeze parameters so we don't backprop through them
#when we run our tensors throw the model it will not calculate
#gradients, it is not going to keep track of all these operations
for param in model.parameters():
    param.requires_grad = False

# replacement of classifier, because it was trained for imagenet dataset
#with our own classifier
from collections import OrderedDict
classifier = nn.Sequential(OrderedDict([
    #fully connected layer
                          ('fc1', nn.Linear(1024, 500)),
    #relu activation
                          ('relu', nn.ReLU()),
    #another fully connected layer
                          ('fc2', nn.Linear(500, 2)),
    #output layer
                          ('output', nn.LogSoftmax(dim=1))
                          ]))
#attach classifier to our model    
model.classifier = classifier

With our model built, we need to train the classifier. However, now we're using a **really deep** neural network. If you try to train this on a CPU like normal, it will take a long, long time. Instead, we're going to use the GPU to do the calculations. The linear algebra computations are done in parallel on the GPU leading to 100x increased training speeds. It's also possible to train on multiple GPUs, further decreasing training time.

PyTorch, along with pretty much every other deep learning framework, uses [CUDA](https://developer.nvidia.com/cuda-zone) to efficiently compute the forward and backwards passes on the GPU. In PyTorch, you move your model parameters and other tensors to the GPU memory using `model.to('cuda')`. You can move them back from the GPU with `model.to('cpu')` which you'll commonly do when you need to operate on the network output outside of PyTorch. As a demonstration of the increased speed, I'll compare how long it takes to perform a forward and backward pass with and without a GPU.

In [10]:
import time

In [11]:
for device in ['cpu', 'cuda']:
    #define criterion which will be natural log_loss
    criterion = nn.NLLLoss()
    # Only train the classifier parameters, feature parameters are frozen
    #only update parameters
    optimizer = optim.Adam(model.classifier.parameters(), lr=0.001)

    #move model parameters and other tensors to GPU memory device = cuda, see above
    model.to(device)

    #little training loop
    for ii, (inputs, labels) in enumerate(trainloader):

        # Move input and label tensors to the GPU
        inputs, labels = inputs.to(device), labels.to(device)

        start = time.time()

        outputs = model.forward(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        #update weights
        optimizer.step()
        #after first three iterations
        #how long it takes to process one batch
        if ii==3:
            break
        
    print(f"Device = {device}; Time per batch: {(time.time() - start)/3:.3f} seconds")

Device = cpu; Time per batch: 1.692 seconds
Device = cuda; Time per batch: 0.014 seconds


You can write device agnostic code which will automatically use CUDA if it's enabled like so:
```python
# at beginning of the script
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

...

# then whenever you get a new Tensor or Module
# this won't copy if they are already on the desired device
input = data.to(device)
model = MyModule(...).to(device)
```

From here, I'll let you finish training the model. The process is the same as before except now your model is much more powerful. You should get better than 95% accuracy easily.

>**Exercise:** Train a pretrained models to classify the cat and dog images. Continue with the DenseNet model, or try ResNet, it's also a good model to try out first. Make sure you are only training the classifier and the parameters for the features part are frozen.

In [15]:
#to check if GPU available, will give TRUE or FALSE
cuda = torch.cuda.is_available()
cuda

True

In [12]:
## TODO: Use a pretrained model to classify the cat and dog images
# densenet121 model
# Use GPU if it's available
#check if GPU available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

#pretrained model
model = models.densenet121(pretrained=True)

# Freeze parameters so we don't backprop through them
for param in model.parameters():
    param.requires_grad = False
    
model.classifier = nn.Sequential(nn.Linear(1024, 256),
                                 nn.ReLU(),
                                 nn.Dropout(0.2),
                                 nn.Linear(256, 2),
                                 nn.LogSoftmax(dim=1))

criterion = nn.NLLLoss()

# Only train the classifier parameters, feature parameters are frozen
optimizer = optim.Adam(model.classifier.parameters(), lr=0.003)

model.to(device);


In [13]:
epochs = 1
steps = 0
running_loss = 0
print_every = 5
for epoch in range(epochs):
    for inputs, labels in trainloader:
        steps += 1
        # Move input and label tensors to the default device
        inputs, labels = inputs.to(device), labels.to(device)
        
        optimizer.zero_grad()
        
        logps = model.forward(inputs)
        loss = criterion(logps, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        
        if steps % print_every == 0:
            test_loss = 0
            accuracy = 0
            model.eval()
            with torch.no_grad():
                for inputs, labels in testloader:
                    inputs, labels = inputs.to(device), labels.to(device)
                    logps = model.forward(inputs)
                    batch_loss = criterion(logps, labels)
                    
                    test_loss += batch_loss.item()
                    
                    # Calculate accuracy
                    ps = torch.exp(logps)
                    top_p, top_class = ps.topk(1, dim=1)
                    equals = top_class == labels.view(*top_class.shape)
                    accuracy += torch.mean(equals.type(torch.FloatTensor)).item()
                    
            print(f"Epoch {epoch+1}/{epochs}.. "
                  f"Train loss: {running_loss/print_every:.3f}.. "
                  f"Test loss: {test_loss/len(testloader):.3f}.. "
                  f"Test accuracy: {accuracy/len(testloader):.3f}")
            running_loss = 0
            model.train()

Epoch 1/1.. Train loss: 0.169.. Test loss: 0.000.. Test accuracy: 1.000
Epoch 1/1.. Train loss: 0.000.. Test loss: 0.000.. Test accuracy: 1.000
Epoch 1/1.. Train loss: 0.000.. Test loss: 0.000.. Test accuracy: 1.000
Epoch 1/1.. Train loss: 0.000.. Test loss: 0.000.. Test accuracy: 1.000
Epoch 1/1.. Train loss: 0.000.. Test loss: 0.000.. Test accuracy: 1.000
Epoch 1/1.. Train loss: 0.000.. Test loss: 0.000.. Test accuracy: 1.000
Epoch 1/1.. Train loss: 0.000.. Test loss: 0.000.. Test accuracy: 1.000
Epoch 1/1.. Train loss: 0.000.. Test loss: 0.000.. Test accuracy: 1.000
Epoch 1/1.. Train loss: 0.000.. Test loss: 0.000.. Test accuracy: 1.000
Epoch 1/1.. Train loss: 0.000.. Test loss: 0.000.. Test accuracy: 1.000
Epoch 1/1.. Train loss: 0.000.. Test loss: 0.000.. Test accuracy: 1.000
Epoch 1/1.. Train loss: 0.000.. Test loss: 0.000.. Test accuracy: 1.000
Epoch 1/1.. Train loss: 0.000.. Test loss: 0.000.. Test accuracy: 1.000
Epoch 1/1.. Train loss: 0.000.. Test loss: 0.000.. Test accuracy

In [17]:
#same but with model resnet
#  resnet18 = models.resnet18()
#the larger the number (18) the larger the model is, and better accuracy, longer it will take 
#to compute

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

#pretrained model
model = models.resnet50(pretrained=True)
model

Downloading: "https://download.pytorch.org/models/resnet50-19c8e357.pth" to C:\Users\dave_/.cache\torch\hub\checkpoints\resnet50-19c8e357.pth


HBox(children=(FloatProgress(value=0.0, max=102502400.0), HTML(value='')))




ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 

In [19]:
#need to modify model as it was trained on imagenet

#freeze parameters, turn off gradients for a model
for param in model.parameters():
    param.requires_grad = False
    
#define new classifier
classifier = nn.Sequential(nn.Linear(2048, 512),
                       nn.ReLU(),
                       nn.Dropout(p=0.2),
                       nn.Linear(512,2),
                       nn.LogSoftmax(dim=1))
    
#attach classifier to our model
model.fc = classifier
model

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 

In [21]:
#define loss
#nevative log loss
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.fc.parameters(), lr=0.003)

model.to(device);

In [26]:
#define variables that we will be using during training
epochs = 1
steps = 0
running_loss = 0
print_every = 5

for epoch in range(epochs):
    for images, labels in trainloader:
        steps += 1
        images, labels = images.to(device), labels.to(device)
        
        optimizer.zero_grad()
        #log probabilities from our model
        logps = model(images)
        #loss from criterion in the labels
        loss = criterion(logps, labels)
        #backwards pass
        loss.backward()
        #our optimizer
        optimizer.step
        
        #increment are running loss like
        #so we can track our loss
        running_loss += loss.item()
        
        #test our network's accuracy and loss on our test dataset
        # for step modulo print every...
        if steps % print_every == 0:
            model.eval()
            test_loss = 0
            accuracy = 0
            
            for images, labels in testloader:
                #transfer tensors over to GPU
                images, labels = images.to(device), labels.to(device)
                
                logps = model(images)
                loss = criterion(logps, labels)
                #keep track on test loss
                test_loss += loss.item()
                
                #calculate our accuracy
                ps = torch.exp(logps)
                #set dimention to 1 so it is looking at top probabilities
                #along the columns
                top_ps, top_class = ps.topk(1, dim=1)
                #check for equality
                equality = top_class == labels.view(*top_class.shape)
                accuracy += torch.mean(equality.type(torch.FloatTensor))
                
            print(f"Epoch {epoch+1}/{epochs}.. "
                  f"Train loss: {running_loss/print_every:.3f}.. "
                  f"Test loss: {test_loss/len(testloader):.3f}.. "
                  f"Test accuracy: {accuracy/len(testloader):.3f}")
            running_loss = 0
            model.train()
                

Epoch 1/1.. Train loss: 0.689.. Test loss: 0.678.. Test accuracy: 0.638
Epoch 1/1.. Train loss: 0.687.. Test loss: 0.676.. Test accuracy: 0.653
Epoch 1/1.. Train loss: 0.680.. Test loss: 0.674.. Test accuracy: 0.672
Epoch 1/1.. Train loss: 0.681.. Test loss: 0.676.. Test accuracy: 0.656
Epoch 1/1.. Train loss: 0.667.. Test loss: 0.676.. Test accuracy: 0.656
Epoch 1/1.. Train loss: 0.674.. Test loss: 0.676.. Test accuracy: 0.658
Epoch 1/1.. Train loss: 0.680.. Test loss: 0.677.. Test accuracy: 0.657
Epoch 1/1.. Train loss: 0.671.. Test loss: 0.674.. Test accuracy: 0.678
Epoch 1/1.. Train loss: 0.678.. Test loss: 0.674.. Test accuracy: 0.678
Epoch 1/1.. Train loss: 0.678.. Test loss: 0.673.. Test accuracy: 0.684
Epoch 1/1.. Train loss: 0.674.. Test loss: 0.680.. Test accuracy: 0.620
Epoch 1/1.. Train loss: 0.675.. Test loss: 0.683.. Test accuracy: 0.597
Epoch 1/1.. Train loss: 0.671.. Test loss: 0.685.. Test accuracy: 0.582
Epoch 1/1.. Train loss: 0.675.. Test loss: 0.685.. Test accuracy

Watch those shapes
In general, you'll want to check that the tensors going through your model and other code are the correct shapes. Make use of the .shape method during debugging and development.

A few things to check if your network isn't training appropriately
Make sure you're clearing the gradients in the training loop with optimizer.zero_grad(). If you're doing a validation loop, be sure to set the network to evaluation mode with model.eval(), then back to training mode with model.train().

CUDA errors
Sometimes you'll see this error:

RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #1 ‘mat1’

You'll notice the second type is torch.cuda.FloatTensor, this means it's a tensor that has been moved to the GPU. It's expecting a tensor with type torch.FloatTensor, no .cuda there, which means the tensor should be on the CPU. PyTorch can only perform operations on tensors that are on the same device, so either both CPU or both GPU. If you're trying to run your network on the GPU, check to make sure you've moved the model and all necessary tensors to the GPU with .to(device) where device is either "cuda" or "cpu".