# Transfer Learning

노트북에서, 당신은 pre-train된 네트워크를 어떻게 사용하는지 배울 것이다. [ImageNet](http://www.image-net.org/) [available from torchvision](http://pytorch.org/docs/0.3.0/torchvision/models.html). 에서 사용된 네트워크를 사용할 것이다. 

ImageNet은 1M에 육박하는 레이블링된 이미지로, 1000개의 카테고리이다. 이것은 convolution layer라는 구조의 딥 뉴럴 네트워크를 사용하였다. 자세히는 안보겠지만, 더 자세히 알고 싶으면 please [watch this](https://www.youtube.com/watch?v=2-Ol7ZB0MmU).

트레인되고 나서, 이 모델들은 새로운 이미지의 특성 파악을 매우 잘했다. pre-trained 네트워크를 가져와서 새로운 데이터에 사용하는 것을 transfer learning이라고 한다. 여기서 우리는 transfer learning을 사용해서 네트워크를 트레이닝하여 우리의 고양이와 강아지를 완벽하게 분류하고자 한다. 

`torchvision.models`를 사용하면 pre-trained 네트워크를 다운받을 수 있고 니 애플리케이션에서 사용할수도 있다. 우리는 `models`를 import에 포함할 것이다. 


In [1]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt

import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torchvision import datasets, transforms, models

pretrained model의 대부분은 224X224 이미지를 input으로 한다. 또한, 우리는 트레인할때 어떤 normalization이 사용했는지도 맞춰서 해야한다. 각각의 color channel은 따로따로 normailze되며, 평균은 `[0.485, 0.456, 0.406]` 그리고 std는 `[0.229, 0.224, 0.225]`였다.

In [7]:
data_dir = 'dogs-vs-cats'

# TODO: Define transforms for the training data and testing data
train_transforms = transforms.Compose([transforms.RandomRotation(30),
                                       transforms.RandomResizedCrop(224),
                                       transforms.RandomHorizontalFlip(),
                                       transforms.ToTensor(),
                                       transforms.Normalize([0.485, 0.456, 0.406],
                                                            [0.229, 0.224, 0.225])])

test_transforms = transforms.Compose([transforms.Resize(255),
                                      transforms.CenterCrop(224),
                                      transforms.ToTensor(),
                                      transforms.Normalize([0.485, 0.456, 0.406],
                                                           [0.229, 0.224, 0.225])])

# Pass transforms in here, then run the next cell to see how the transforms look
train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)
test_data = datasets.ImageFolder(data_dir + '/test', transform=test_transforms)

trainloader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)
testloader = torch.utils.data.DataLoader(test_data, batch_size=64)

We can load in a model such as [DenseNet](http://pytorch.org/docs/0.3.0/torchvision/models.html#id5). Let's print out the model architecture so we can see what's going on. 

In [10]:
model = models.densenet121(pretrained=True) #torchvision.models.densenet121(pretrained=True)

This model is built out of two main parts, the features and the classifier. The features part is a stack of convolutional layers and overall works as a feature detector that can be fed into a classifier. The classifier part is a single fully-connected layer `(classifier): Linear(in_features=1024, out_features=1000)`. This layer was trained on the ImageNet dataset, so it won't work for our specific problem. That means we need to replace the classifier, but the features will work perfectly on their own. In general, I think about pre-trained networks as amazingly good feature detectors that can be used as the input for simple feed-forward classifiers.

이 모델은 2개의 메인 파트로 만들어졌는데, feature와 classifier이다. feature 파트는 convolutional layer의 스택으로서 feature detector로 작용하여 classifier로 들어간다. classifier 파트는 single fully-connected layer로서 `(classifier): Linear(in_features=1024, out_features=1000)`이다. 이 레이어는 ImageNet dataset에 대해 트레인 되어서, 우리의 특정한 문제 해결에는 해결할 수 없을 수 있다. 이는 우리는 classifier를 대체해야하며, feature는 스스로 완벽히 일할 것이다. 보통, pre-trained networks는 놀랍도록 좋은 feature detector라서 간단한 feed-forward classifier의 input으로 사용될 수 있다. 


In [9]:
# Freeze parameters so we don't backprop through them
for param in model.parameters():
    param.requires_grad = False

from collections import OrderedDict
classifier = nn.Sequential(OrderedDict([
                          ('fc1', nn.Linear(1024, 500)),
                          ('relu', nn.ReLU()),
                          ('fc2', nn.Linear(500, 2)),
                          ('output', nn.LogSoftmax(dim=1))
                          ]))
    
model.classifier = classifier

With our model built, we need to train the classifier. However, now we're using a **really deep** neural network. If you try to train this on a CPU like normal, it will take a long, long time. Instead, we're going to use the GPU to do the calculations. The linear algebra computations are done in parallel on the GPU leading to 100x increased training speeds. It's also possible to train on multiple GPUs, further decreasing training time.

우리의 모델은, 우리는 classifier를 train할 필요가 있다. 그러나, 이제 우리는 **really deep**한 neural network. 만약 당신이 보통처럼 CPU로 학습시킨다면, 오랜 시간이 걸린다. 그러나 GPU를 사용할 것이다. 선형대수 계산은 GPU에서 병렬로 이루어지며 이러면 100배로 속도가  향상된다. 또한 멀티 GPU로 학습해서 더 빨리 할 수도 있다. 

PyTorch, along with pretty much every other deep learning framework, uses [CUDA](https://developer.nvidia.com/cuda-zone) to efficiently compute the forward and backwards passes on the GPU. In PyTorch, you move your model parameters and other tensors to the GPU memory using `model.to('cuda')`. You can move them back from the GPU with `model.to('cpu')` which you'll commonly do when you need to operate on the network output outside of PyTorch. As a demonstration of the increased speed, I'll compare how long it takes to perform a forward and backward pass with and without a GPU.

pytorch는 다른 deep learning 프레임워크와 마찬가지로 [CUDA](https://developer.nvidia.com/cuda-zone)를 써서 forword, backward로 효율적으로 계산합니다. pytorch에서 당신은 `model.to('cuda')`를 통해 model parameter와 다른 tensor를 GPU 메모리에 옮길 수 있다. You can move them back from the GPU with `model.to('cpu')` which you'll commonly do when you need to operate on the network output outside of PyTorch. As a demonstration of the increased speed, I'll compare how long it takes to perform a forward and backward pass with and without a GPU.


In [11]:
import time

In [12]:
for device in ['cpu', 'cuda']:

    criterion = nn.NLLLoss()
    # Only train the classifier parameters, feature parameters are frozen
    optimizer = optim.Adam(model.classifier.parameters(), lr=0.001)

    model.to(device)

    for ii, (inputs, labels) in enumerate(trainloader):

        # Move input and label tensors to the GPU
        inputs, labels = inputs.to(device), labels.to(device)

        start = time.time()

        outputs = model.forward(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        if ii==3:
            break
        
    print(f"Device = {device}; Time per batch: {(time.time() - start)/3:.3f} seconds")

Device = cpu; Time per batch: 2.992 seconds


AssertionError: 
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx

You can write device agnostic code which will automatically use CUDA if it's enabled like so:

```python
# at beginning of the script
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

...

# then whenever you get a new Tensor or Module
# this won't copy if they are already on the desired device
input = data.to(device)
model = MyModule(...).to(device)
```

From here, I'll let you finish training the model. The process is the same as before except now your model is much more powerful. You should get better than 95% accuracy easily.

>**Exercise:** Train a pretrained models to classify the cat and dog images. Continue with the DenseNet model, or try ResNet, it's also a good model to try out first. Make sure you are only training the classifier and the parameters for the features part are frozen. pretrained model을 학습시켜서 고양이와 강아지 그림을 분류해보세요. DenseNet model을 가지고 하거나, ResNet을 써도 됩니다. 당신은 classifier만 학습시키고 feature에 대한 parameter 파트는 얼려있다. 

In [14]:
# Use GPU if it's available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = models.densenet121(pretrained=True)

# Freeze parameters so we don't backprop through them
for param in model.parameters():
    param.requires_grad = False
    
model.classifier = nn.Sequential(nn.Linear(1024, 256),
                                 nn.ReLU(),
                                 nn.Dropout(0.2),
                                 nn.Linear(256, 2),
                                 nn.LogSoftmax(dim=1))

criterion = nn.NLLLoss()

# Only train the classifier parameters, feature parameters are frozen
optimizer = optim.Adam(model.classifier.parameters(), lr=0.003)

model.to(device);

In [None]:
epochs = 1
steps = 0
running_loss = 0
print_every = 5
for epoch in range(epochs):
    
    for inputs, labels in trainloader:
    
        steps += 1
        
        # Move input and label tensors to the default device
        inputs, labels = inputs.to(device), labels.to(device)
        
        optimizer.zero_grad()
        
        logps = model.forward(inputs)
        loss = criterion(logps, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        
        if steps % print_every == 0:
            test_loss = 0
            accuracy = 0
            model.eval()
            with torch.no_grad():
                for inputs, labels in testloader:
                    inputs, labels = inputs.to(device), labels.to(device)
                    logps = model.forward(inputs)
                    batch_loss = criterion(logps, labels)
                    
                    test_loss += batch_loss.item()
                    
                    # Calculate accuracy
                    ps = torch.exp(logps)
                    top_p, top_class = ps.topk(1, dim=1)
                    equals = top_class == labels.view(*top_class.shape)
                    accuracy += torch.mean(equals.type(torch.FloatTensor)).item()
                    
            print(f"Epoch {epoch+1}/{epochs}.. "
                  f"Train loss: {running_loss/print_every:.3f}.. "
                  f"Test loss: {test_loss/len(testloader):.3f}.. "
                  f"Test accuracy: {accuracy/len(testloader):.3f}")
            running_loss = 0
            model.train()

Epoch 1/1.. Train loss: 0.822.. Test loss: 0.316.. Test accuracy: 0.866
Epoch 1/1.. Train loss: 0.315.. Test loss: 0.123.. Test accuracy: 0.964
Epoch 1/1.. Train loss: 0.254.. Test loss: 0.093.. Test accuracy: 0.972
Epoch 1/1.. Train loss: 0.272.. Test loss: 0.083.. Test accuracy: 0.972
Epoch 1/1.. Train loss: 0.216.. Test loss: 0.067.. Test accuracy: 0.978
Epoch 1/1.. Train loss: 0.199.. Test loss: 0.058.. Test accuracy: 0.980
Epoch 1/1.. Train loss: 0.163.. Test loss: 0.084.. Test accuracy: 0.969
Epoch 1/1.. Train loss: 0.202.. Test loss: 0.116.. Test accuracy: 0.954
Epoch 1/1.. Train loss: 0.228.. Test loss: 0.101.. Test accuracy: 0.961
Epoch 1/1.. Train loss: 0.279.. Test loss: 0.054.. Test accuracy: 0.982
Epoch 1/1.. Train loss: 0.278.. Test loss: 0.061.. Test accuracy: 0.979
Epoch 1/1.. Train loss: 0.168.. Test loss: 0.074.. Test accuracy: 0.974
Epoch 1/1.. Train loss: 0.193.. Test loss: 0.058.. Test accuracy: 0.982
Epoch 1/1.. Train loss: 0.154.. Test loss: 0.052.. Test accuracy