# Transfer Learning

In this notebook, you'll learn how to use pre-trained networks to solved challenging problems in computer vision. Specifically, you'll use networks trained on [ImageNet](http://www.image-net.org/) [available from torchvision](http://pytorch.org/docs/0.3.0/torchvision/models.html). 

ImageNet is a massive dataset with over 1 million labeled images in 1000 categories. It's used to train deep neural networks using an architecture called convolutional layers. I'm not going to get into the details of convolutional networks here, but if you want to learn more about them, please [watch this](https://www.youtube.com/watch?v=2-Ol7ZB0MmU).

Once trained, these models work astonishingly well as feature detectors for images they weren't trained on. Using a pre-trained network on images not in the training set is called transfer learning. Here we'll use transfer learning to train a network that can classify our cat and dog photos with near perfect accuracy.

With `torchvision.models` you can download these pre-trained networks and use them in your applications. We'll include `models` in our imports now.

In [6]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt

import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torchvision import datasets, transforms, models

Most of the pretrained models require the input to be 224x224 images. Also, we'll need to match the normalization used when the models were trained. Each color channel was normalized separately, the means are `[0.485, 0.456, 0.406]` and the standard deviations are `[0.229, 0.224, 0.225]`.

In [7]:
data_dir = 'Cat_Dog_data'

# TODO: Define transforms for the training data and testing data
train_transforms = transforms.Compose([transforms.RandomRotation(30),
                                       transforms.RandomResizedCrop(224),
                                       transforms.RandomHorizontalFlip(),
                                       transforms.ToTensor()])

test_transforms = transforms.Compose([transforms.RandomResizedCrop(224),
                                       transforms.ToTensor()])

# Pass transforms in here, then run the next cell to see how the transforms look
train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)
test_data = datasets.ImageFolder(data_dir + '/test', transform=test_transforms)

trainloader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)
testloader = torch.utils.data.DataLoader(test_data, batch_size=64)

We can load in a model such as [DenseNet](http://pytorch.org/docs/0.3.0/torchvision/models.html#id5). Let's print out the model architecture so we can see what's going on.

In [8]:
model = models.densenet121(pretrained=True)
model

DenseNet(
  (features): Sequential(
    (conv0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (norm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu0): ReLU(inplace=True)
    (pool0): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (denseblock1): _DenseBlock(
      (denselayer1): _DenseLayer(
        (norm1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      )
      (denselayer2): _DenseLayer(
        (norm1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu

이 densenet model은 2개의 메인 파트로 이뤄져 있다: features랑 classifier. <br>
**features part**: classifier에 먹일 수 있게끔 하는 **feature detector** (convolutional layers and overall works)<br>
**classifier**:  ImageNet dataset으로 train되어진, single fully-connected layer. <- 우리 예제는 우리 image를 쓰고 있으므로, 얘를 replace해줘야 한다. feature는 똑같은거 사용해도 뭐 완벽히 작동한다. 일반적으로, pre-trained networks의 feature detectors는 간단한 feed-forward classifiers를 위한 인풋으로 사용하기 아주 훌륭하다라고 생각한다.

This model is built out of two main parts, the features and the classifier. The features part is a stack of convolutional layers and overall works as a feature detector that can be fed into a classifier. The classifier part is a single fully-connected layer `(classifier): Linear(in_features=1024, out_features=1000)`. This layer was trained on the ImageNet dataset, so it won't work for our specific problem. That means we need to replace the classifier, but the features will work perfectly on their own. In general, I think about pre-trained networks as amazingly good feature detectors that can be used as the input for simple feed-forward classifiers.

In [9]:
# Freeze parameters so we don't backprop through them
for param in model.parameters():
    param.requires_grad = False

print("before:\n", model.classifier)

from collections import OrderedDict
classifier = nn.Sequential(OrderedDict([
                          ('fc1', nn.Linear(1024, 500)),
                          ('relu', nn.ReLU()),
                          ('fc2', nn.Linear(500, 2)),
                          ('output', nn.LogSoftmax(dim=1))
                          ]))
# classifier 바꿔줬음.
model.classifier = classifier
print("after:\n", model.classifier)

before:
 Linear(in_features=1024, out_features=1000, bias=True)
after:
 Sequential(
  (fc1): Linear(in_features=1024, out_features=500, bias=True)
  (relu): ReLU()
  (fc2): Linear(in_features=500, out_features=2, bias=True)
  (output): LogSoftmax()
)


이렇게 모델을 만들어 줬으니까, classifier 학습해줘야지. 근데, 지금 우리 **엄청나게 딥한** 뉴럴 네트워크를 쓰려고 하고 있잖아. 일반 CPU를 써서 학습하려고 하면............ 엄청엄청엄청엄청엄청엄청엄청엄청엄청엄청 오래 걸릴꺼다. 그래서 우리는 GPU를 쓸 거다. linear algebra 계산 할 때 병렬로 되니까 100배 빠름. 일단 제대로 돌아가는지 테스트를 하거나(테스트뭐 쓰거나, 1 epoch 정도 돌리기?)한 다음에 GPU로 돌려주자.

With our model built, we need to train the classifier. However, now we're using a **really deep** neural network. If you try to train this on a CPU like normal, it will take a long, long time. Instead, we're going to use the GPU to do the calculations. The linear algebra computations are done in parallel on the GPU leading to 100x increased training speeds. It's also possible to train on multiple GPUs, further decreasing training time.

PyTorch, along with pretty much every other deep learning framework, uses [CUDA](https://developer.nvidia.com/cuda-zone) to efficiently compute the forward and backwards passes on the GPU. In PyTorch, you move your model parameters and other tensors to the GPU memory using `model.to('cuda')`. You can move them back from the GPU with `model.to('cpu')` which you'll commonly do when you need to operate on the network output outside of PyTorch. As a demonstration of the increased speed, I'll compare how long it takes to perform a forward and backward pass with and without a GPU.

In [10]:
import time

In [11]:
for device in ['cpu', 'cuda']:

    criterion = nn.NLLLoss()
    # Only train the classifier parameters, feature parameters are frozen
    optimizer = optim.Adam(model.classifier.parameters(), lr=0.001)

    model.to(device)

    for ii, (inputs, labels) in enumerate(trainloader):

        # Move input and label tensors to the GPU
        inputs, labels = inputs.to(device), labels.to(device)

        start = time.time()

        outputs = model.forward(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        if ii==3:
            break
        
    print(f"Device = {device}; Time per batch: {(time.time() - start)/3:.3f} seconds")

Device = cpu; Time per batch: 2.211 seconds
Device = cuda; Time per batch: 0.006 seconds


You can write device agnostic code which will automatically use CUDA if it's enabled like so:
```python
# at beginning of the script
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

...

# then whenever you get a new Tensor or Module
# this won't copy if they are already on the desired device
input = data.to(device)
model = MyModule(...).to(device)
```

From here, I'll let you finish training the model. The process is the same as before except now your model is much more powerful. You should get better than 95% accuracy easily.

>**Exercise:** Train a pretrained models to classify the cat and dog images. Continue with the DenseNet model, or try ResNet, it's also a good model to try out first. Make sure you are only training the classifier and the parameters for the features part are frozen.

In [16]:
# 처음부터 끝까지 

# Use GPU if it's available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"device: {device}")

model = models.densenet121(pretrained=True)

# Freeze parameters so we don't backprop through them
# Freeze parameters so we don't backprop through them
# requires_grad에 대해서는 Part 3. 에 나와있다. (Autograd)
# torch의 자동으로 gradient계산해 주는 기능.
for param in model.parameters():
    param.requires_grad = False
    
model.classifier = nn.Sequential(nn.Linear(1024, 256),
                                 nn.ReLU(),
                                 nn.Dropout(0.2),
                                 nn.Linear(256, 2),
                                 nn.LogSoftmax(dim=1))

criterion = nn.NLLLoss()

# Only train the classifier parameters, feature parameters are frozen
optimizer = optim.Adam(model.classifier.parameters(), lr=0.003)

model.to(device);

device: cuda


In [17]:
epochs = 1
steps = 0
running_loss = 0
print_every = 5
for epoch in range(epochs):
    for inputs, labels in trainloader:
        steps += 1
        # Move input and label tensors to the default device
        inputs, labels = inputs.to(device), labels.to(device)
        
        optimizer.zero_grad()
        
        logps = model.forward(inputs)
        loss = criterion(logps, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        
        if steps % print_every == 0:
            test_loss = 0
            accuracy = 0
            model.eval()
            with torch.no_grad():
                for inputs, labels in testloader:
                    inputs, labels = inputs.to(device), labels.to(device)
                    logps = model.forward(inputs)
                    batch_loss = criterion(logps, labels)
                    
                    test_loss += batch_loss.item()
                    
                    # Calculate accuracy
                    ps = torch.exp(logps)
                    top_p, top_class = ps.topk(1, dim=1)
                    equals = top_class == labels.view(*top_class.shape)
                    accuracy += torch.mean(equals.type(torch.FloatTensor)).item()
                    
            print(f"Epoch {epoch+1}/{epochs}.. "
                  f"Train loss: {running_loss/print_every:.3f}.. "
                  f"Test loss: {test_loss/len(testloader):.3f}.. "
                  f"Test accuracy: {accuracy/len(testloader):.3f}")
            running_loss = 0
            model.train()

Epoch 1/1.. Train loss: 0.854.. Test loss: 0.393.. Test accuracy: 0.804
Epoch 1/1.. Train loss: 0.394.. Test loss: 0.256.. Test accuracy: 0.891
Epoch 1/1.. Train loss: 0.287.. Test loss: 0.178.. Test accuracy: 0.934
Epoch 1/1.. Train loss: 0.236.. Test loss: 0.168.. Test accuracy: 0.924
Epoch 1/1.. Train loss: 0.191.. Test loss: 0.154.. Test accuracy: 0.934
Epoch 1/1.. Train loss: 0.171.. Test loss: 0.141.. Test accuracy: 0.940
Epoch 1/1.. Train loss: 0.192.. Test loss: 0.128.. Test accuracy: 0.949
Epoch 1/1.. Train loss: 0.158.. Test loss: 0.132.. Test accuracy: 0.949
Epoch 1/1.. Train loss: 0.173.. Test loss: 0.129.. Test accuracy: 0.948
Epoch 1/1.. Train loss: 0.151.. Test loss: 0.112.. Test accuracy: 0.954
Epoch 1/1.. Train loss: 0.183.. Test loss: 0.133.. Test accuracy: 0.941
Epoch 1/1.. Train loss: 0.186.. Test loss: 0.111.. Test accuracy: 0.954
Epoch 1/1.. Train loss: 0.167.. Test loss: 0.109.. Test accuracy: 0.955
Epoch 1/1.. Train loss: 0.205.. Test loss: 0.124.. Test accuracy

## resnet50 예시

In [39]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device

device(type='cuda')

In [36]:
model = models.resnet50(pretrained=True)
print(device)
# gradients 끄기
for param in model.parameters():
    param.requires_grad = False

# 새로운 classifier 정의
classifier = nn.Sequential(nn.Linear(2048, 512),
                           nn.ReLU(),
                           nn.Dropout(p=0.2),
                           nn.Linear(512,2),
                           nn.LogSoftmax(dim=1))

model.fc = classifier

criterion = nn.NLLLoss()

optimizer = optim.Adam(model.fc.parameters(), lr=0.003)

model.to(device);

cuda


In [44]:
epochs = 1
stops = 0
runnig_loss = 0
print_every = 5

for epoch in range(epochs):
    for images, labels in trainloader:
        steps += 1
        
        images, labels = images.to(device), labels.to(device)
        
        optimizer.zero_grad()
        
        logps = model(images)
        loss = criterion(logps, labels)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
        
        if steps % print_every == 0:
            model.eval()
            ttest_loss = 0
            accuracy = 0
            
            for test_images, test_labels in testloader:
                
                test_images, test_labels = test_images.to(device), test_labels.to(device)
                test_logps = model(test_images)
                test_loss = criterion(test_logps, test_labels)
                ttest_loss += test_loss.item()
                
                # calculate our accuracy
                ps = torch.exp(logps)
                top_ps, top_class = ps.topk(1, dim=1) 
                #print("debug:", top_class.shape, labels.view(*top_class.shape).shape)
                equality = top_class == labels.view(*top_class.shape)
                accuracy += torch.mean(equality.type(torch.FloatTensor)).item()
                
            print(f"Epoch {epoch+1}/{epochs}..)"
                  f"Train loss: {running_loss/print_every:.3f}.. "
                  f"Test loss: {ttest_loss/len(testloader):.3f}.. "
                  f"Test accurach: {accuracy/len(testloader):.3f}")
            
            running_loss =0
            model.train()

Epoch 1/1..)Train loss: 11.301.. Test loss: 0.142.. Test accurach: 0.969
Epoch 1/1..)Train loss: 0.195.. Test loss: 0.150.. Test accurach: 0.891
Epoch 1/1..)Train loss: 0.171.. Test loss: 0.162.. Test accurach: 0.953
Epoch 1/1..)Train loss: 0.173.. Test loss: 0.110.. Test accurach: 0.953
Epoch 1/1..)Train loss: 0.152.. Test loss: 0.123.. Test accurach: 0.922
Epoch 1/1..)Train loss: 0.206.. Test loss: 0.122.. Test accurach: 0.922
Epoch 1/1..)Train loss: 0.176.. Test loss: 0.145.. Test accurach: 0.906
Epoch 1/1..)Train loss: 0.185.. Test loss: 0.125.. Test accurach: 0.906
Epoch 1/1..)Train loss: 0.174.. Test loss: 0.121.. Test accurach: 0.938
Epoch 1/1..)Train loss: 0.164.. Test loss: 0.119.. Test accurach: 0.953
Epoch 1/1..)Train loss: 0.190.. Test loss: 0.146.. Test accurach: 0.891
Epoch 1/1..)Train loss: 0.107.. Test loss: 0.115.. Test accurach: 0.984
Epoch 1/1..)Train loss: 0.154.. Test loss: 0.136.. Test accurach: 0.938
Epoch 1/1..)Train loss: 0.194.. Test loss: 0.165.. Test accurac