## PyTorch Tutorial 15 - Transfer Learning
Taken from:

* https://www.youtube.com/watch?v=K0lWSB2QoIQ

* https://github.com/python-engineer/pytorchTutorial/blob/master/15_transfer_learning.py

A model developed for one task can be reused as the starting point for a model on a second task. For example, a model trained to classify birds and cats can be reused to classify bees and dogs, with modifications only in the **last layer**.

This makes it possible to build new models quickly. Training a completely new model from scratch can be very time-consuming (it can take days or weeks).

So this approach focuses on training only the last layer; we do not need to train the whole model again.

Have a look at the CNN architecture: we train only the last fully connected layers and create a new model from the old one:

<img src="images/transfer-learning.png" width=900>


Here we will use a pre-trained **ResNet-18 CNN**, which was trained on more than a million images from the ImageNet database. This network is **18 layers** deep and can classify images into **1000 object categories.**

But in our example we have only a two-class problem: bees and ants. Let's try.


We will also use:

* ImageFolder

* Scheduler (to change the learning rate)

* Transfer Learning


We have saved the data from https://download.pytorch.org/tutorial/hymenoptera_data.zip into the *./data* folder, and the structure looks like this:

<img src="images/folder_structure.png" width=600>
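
`ImageFolder` expects one subfolder per class and assigns integer labels by sorting the subfolder names. The sketch below reproduces that convention with the standard library only, on a temporary directory. The `ants`/`bees` layout and the helper names `find_classes_sketch` and `build_demo_tree` are assumptions for illustration, not torchvision's own code.

```python
import os
import tempfile


def find_classes_sketch(root):
    """Map each class subfolder of `root` to an integer label, in sorted order."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    return classes, class_to_idx


def build_demo_tree(base):
    """Create an empty train/val layout with one subfolder per class (assumed layout)."""
    for split in ("train", "val"):
        for cls in ("bees", "ants"):
            os.makedirs(os.path.join(base, split, cls))


with tempfile.TemporaryDirectory() as tmp:
    build_demo_tree(tmp)
    demo_classes, demo_class_to_idx = find_classes_sketch(os.path.join(tmp, "train"))

print(demo_classes)
print(demo_class_to_idx)
```

The label of each image is therefore determined by the folder it sits in, which is why `image_datasets['train'].classes` can be used directly as the list of class names.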


In [16]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os
import copy


# some transforms needed
mean = np.array([0.5, 0.5, 0.5])
std = np.array([0.25, 0.25, 0.25])

data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean, std)
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean, std)
    ]),
}

# import data
data_dir = 'data/hymenoptera_data'
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in ['train', 'val']}
dataloaders    = {x: torch.utils.data.DataLoader(image_datasets[x], 
                                                 batch_size = 4,
                                                 shuffle = True, 
                                                 num_workers = 0) for x in ['train', 'val']}


dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
class_names = image_datasets['train'].classes

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


In [17]:
image_datasets

{'train': Dataset ImageFolder
     Number of datapoints: 244
     Root location: data/hymenoptera_data/train
     StandardTransform
 Transform: Compose(
                RandomResizedCrop(size=(224, 224), scale=(0.08, 1.0), ratio=(0.75, 1.3333), interpolation=PIL.Image.BILINEAR)
                RandomHorizontalFlip(p=0.5)
                ToTensor()
                Normalize(mean=[0.5 0.5 0.5], std=[0.25 0.25 0.25])
            ),
 'val': Dataset ImageFolder
     Number of datapoints: 153
     Root location: data/hymenoptera_data/val
     StandardTransform
 Transform: Compose(
                Resize(size=256, interpolation=PIL.Image.BILINEAR)
                CenterCrop(size=(224, 224))
                ToTensor()
                Normalize(mean=[0.5 0.5 0.5], std=[0.25 0.25 0.25])
            )}
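
`ToTensor()` scales pixel values to the range [0, 1], and `Normalize(mean, std)` then computes `(x - mean) / std` per channel. A minimal NumPy sketch of that arithmetic with the values used above (the sample pixel values are made up for illustration):

```python
import numpy as np

mean = np.array([0.5, 0.5, 0.5])
std = np.array([0.25, 0.25, 0.25])

# Three made-up RGB pixels after ToTensor(): darkest, mid-gray, brightest
pixels = np.array([
    [0.0, 0.0, 0.0],
    [0.5, 0.5, 0.5],
    [1.0, 1.0, 1.0],
])

# Same per-channel formula that Normalize applies: (x - mean) / std
normalized = (pixels - mean) / std
print(normalized)
```

With these settings the input range [0, 1] maps to [-2, 2]. The pretrained torchvision ResNet weights were trained with the ImageNet channel statistics (mean `[0.485, 0.456, 0.406]`, std `[0.229, 0.224, 0.225]`), so those values are a common alternative.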

In [18]:
dataloaders

{'train': <torch.utils.data.dataloader.DataLoader at 0x7fea08883d90>,
 'val': <torch.utils.data.dataloader.DataLoader at 0x7fea08883d50>}

In [14]:
dataset_sizes

{'train': 244, 'val': 153}

In [19]:
def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()
    
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    
    for epoch in range(num_epochs):
        print(f'Epoch {epoch}/{num_epochs-1}')
        print('-'*10)
        
        # Each epoch has a training and a validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # set model to training mode
            else:
                model.eval() # set model to evaluation mode
            
            running_loss = 0.0
            running_corrects = 0
            
            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)
                
                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)
                    
                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()
                        optimizer.zero_grad()
                        
                # statistics
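                # loss is the mean over the batch; multiply by the batch size to get the batch sum
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
                
            if phase == 'train':
                scheduler.step()     # advance the learning-rate schedule once per epoch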
                
            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]
            
            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))
            
            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
        print()
            
    # outside the loop now     
    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:.4f}'.format(best_acc))
    
    # load best model weights
    model.load_state_dict(best_model_wts)
    return model
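
`nn.CrossEntropyLoss()` returns the mean loss over a batch by default, so `loss.item() * inputs.size(0)` recovers the summed loss of that batch. Adding those sums and dividing by the dataset size gives the exact per-sample average even when the last batch is smaller (153 validation images with batch size 4 leave a final batch of 1). A plain-Python sketch with made-up loss values:

```python
# Made-up per-sample losses for a tiny "dataset" of 5 samples
per_sample_losses = [0.2, 0.4, 0.6, 0.8, 3.0]
batch_size = 2

# Split into batches: the last batch holds a single sample
batches = [per_sample_losses[i:i + batch_size]
           for i in range(0, len(per_sample_losses), batch_size)]

running_loss = 0.0
naive_sum_of_means = 0.0
for batch in batches:
    batch_mean = sum(batch) / len(batch)      # what a mean-reduced criterion reports
    running_loss += batch_mean * len(batch)   # weight by batch size, as in train_model
    naive_sum_of_means += batch_mean

weighted_avg = running_loss / len(per_sample_losses)
naive_avg = naive_sum_of_means / len(batches)
true_avg = sum(per_sample_losses) / len(per_sample_losses)

print(f"weighted: {weighted_avg:.4f}  naive: {naive_avg:.4f}  true: {true_avg:.4f}")
```

Averaging the batch means directly would give the single-sample last batch as much weight as a full batch, which skews the epoch loss.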

### Let's use transfer learning now

Load a pretrained model and reset the final fully connected layer. The loaded model was pretrained on ImageNet data (https://www.image-net.org/ , https://www.image-net.org/download.php)

In [21]:
# Load a pretrained model and reset final fully connected layer.
# Pretraining of the loaded model is on Imagenet data
model = models.resnet18(pretrained=True)

# Exchange last fc layer 
#1. Get number of input features from last layer
num_ftrs = model.fc.in_features
print(f'Num features in input features of last layer: {num_ftrs}')


Num features in input features of last layer: 512


In [20]:
# Create a new layer and assign it as the last layer
# Here the size of each output sample is set to 2 (because we now have 2 classes: ants and bees)
# Alternatively, it can be generalized to nn.Linear(num_ftrs, len(class_names)).
model.fc = nn.Linear(num_ftrs, 2)

model = model.to(device)

# now we have new model - we define our loss criteria
criterion = nn.CrossEntropyLoss()

# Observe that all parameters are being optimized
optimizer = optim.SGD(model.parameters(), lr=0.001)

# scheduler: this will update the learning rate
# - the parameters below mean that every 7 epochs, our learning rate
# - will be multiplied by gamma (=0.1)
step_lr_scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)
#for epoch in range(100):
#    train()  # optimizer.step()
#    evaluate()
#    scheduler.step()

model = train_model(model, criterion, optimizer, step_lr_scheduler, num_epochs=25)



Downloading: "https://download.pytorch.org/models/resnet18-5c106cde.pth" to /Users/jvsingh/.cache/torch/hub/checkpoints/resnet18-5c106cde.pth



Epoch 0/24
----------
train Loss: 0.6288 Acc: 0.6557
val Loss: 0.4501 Acc: 0.7974

Epoch 1/24
----------
train Loss: 0.4771 Acc: 0.7828
val Loss: 0.3480 Acc: 0.8824

Epoch 2/24
----------
train Loss: 0.4582 Acc: 0.8197
val Loss: 0.2824 Acc: 0.9281

Epoch 3/24
----------
train Loss: 0.4025 Acc: 0.8484
val Loss: 0.2541 Acc: 0.9216

Epoch 4/24
----------
train Loss: 0.4055 Acc: 0.7951
val Loss: 0.2194 Acc: 0.9412

Epoch 5/24
----------
train Loss: 0.3646 Acc: 0.8730
val Loss: 0.2252 Acc: 0.9542

Epoch 6/24
----------
train Loss: 0.4495 Acc: 0.7664
val Loss: 0.2076 Acc: 0.9346

Epoch 7/24
----------
train Loss: 0.3374 Acc: 0.8525
val Loss: 0.2048 Acc: 0.9346

Epoch 8/24
----------
train Loss: 0.3727 Acc: 0.8402
val Loss: 0.2130 Acc: 0.9412

Epoch 9/24
----------
train Loss: 0.3565 Acc: 0.8566
val Loss: 0.2046 Acc: 0.9542

Epoch 10/24
----------
train Loss: 0.3251 Acc: 0.8689
val Loss: 0.2008 Acc: 0.9477

Epoch 11/24
----------
train Loss: 0.3640 Acc: 0.8566
val Loss: 0.2228 Acc: 0.9412

Ep

We get very good results: validation accuracy reaches about 95% (0.9542) within the first ten epochs.
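
The `StepLR(optimizer, step_size=7, gamma=0.1)` setting used above multiplies the learning rate by 0.1 after every 7 calls to `scheduler.step()`. Since `train_model` steps the scheduler once per epoch, the rate in effect at a given epoch follows a closed form. A plain-Python sketch of that formula (the helper name `step_lr_sketch` is hypothetical):

```python
def step_lr_sketch(base_lr, epoch, step_size=7, gamma=0.1):
    """Learning rate in effect during `epoch` (0-based) under a step decay."""
    return base_lr * gamma ** (epoch // step_size)


# Learning rate in effect for each of the 25 epochs of the run above
schedule = [step_lr_sketch(0.001, epoch) for epoch in range(25)]

# Print the epochs on either side of each decay step
for epoch in (0, 6, 7, 13, 14, 20, 21, 24):
    print(f"epoch {epoch:2d}: lr = {schedule[epoch]:.0e}")
```

Over 25 epochs the rate drops three times, at epochs 7, 14, and 21, so the final epochs make only very small updates.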

In [23]:
#### ConvNet as fixed feature extractor ####
# Here, we freeze the whole network except the final layer.
# Setting requires_grad = False freezes the parameters so that
# their gradients are not computed in backward()
model_conv = models.resnet18(pretrained=True)
for param in model_conv.parameters():
    param.requires_grad = False

num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, 2)

model_conv = model_conv.to(device)

criterion = nn.CrossEntropyLoss()

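# All parameters are passed in, but only the new fc layer has requires_grad=True;
# the frozen layers receive no gradients, so SGD leaves them unchanged
optimizer = optim.SGD(model_conv.parameters(), lr=0.001)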

step_lr_scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

model = train_model(model_conv, criterion, optimizer, step_lr_scheduler, num_epochs=25)

Epoch 0/24
----------
train Loss: 0.6864 Acc: 0.5410
val Loss: 0.5791 Acc: 0.7255

Epoch 1/24
----------
train Loss: 0.6136 Acc: 0.6598
val Loss: 0.4548 Acc: 0.8431

Epoch 2/24
----------
train Loss: 0.5377 Acc: 0.7459
val Loss: 0.3790 Acc: 0.8497

Epoch 3/24
----------
train Loss: 0.5046 Acc: 0.7664
val Loss: 0.3407 Acc: 0.8889

Epoch 4/24
----------
train Loss: 0.4657 Acc: 0.7951
val Loss: 0.3042 Acc: 0.9085

Epoch 5/24
----------
train Loss: 0.4397 Acc: 0.8115
val Loss: 0.2925 Acc: 0.9020

Epoch 6/24
----------
train Loss: 0.4576 Acc: 0.7787
val Loss: 0.2801 Acc: 0.9085

Epoch 7/24
----------
train Loss: 0.4404 Acc: 0.7992
val Loss: 0.2804 Acc: 0.8954

Epoch 8/24
----------
train Loss: 0.4139 Acc: 0.8484
val Loss: 0.2697 Acc: 0.9085

Epoch 9/24
----------
train Loss: 0.4268 Acc: 0.8156
val Loss: 0.2810 Acc: 0.9020

Epoch 10/24
----------
train Loss: 0.3749 Acc: 0.8607
val Loss: 0.2637 Acc: 0.9085

Epoch 11/24
----------
train Loss: 0.3874 Acc: 0.8566
val Loss: 0.2584 Acc: 0.9281

Ep

We can see that this takes less time: only 9m, compared with 19m for training the full network.
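
The saving comes from skipping gradient computation for the frozen layers: parameters with `requires_grad = False` get no gradients, and the optimizer leaves any parameter without a gradient untouched. A minimal sketch on a tiny stand-in network (the two-layer `nn.Sequential` is an assumption for illustration, not ResNet-18):

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Stand-in "backbone" + "head"; far smaller than ResNet-18, same freezing idea
backbone = nn.Linear(8, 4)
head = nn.Linear(4, 2)
net = nn.Sequential(backbone, nn.ReLU(), head)

# Freeze the backbone only
for param in backbone.parameters():
    param.requires_grad = False

backbone_before = backbone.weight.detach().clone()
head_before = head.weight.detach().clone()

# Passing all parameters is fine: frozen ones never receive a gradient
optimizer = optim.SGD(net.parameters(), lr=0.1)

inputs = torch.randn(16, 8)            # random stand-in features
labels = torch.randint(0, 2, (16,))    # random stand-in labels
loss = nn.CrossEntropyLoss()(net(inputs), labels)
loss.backward()
optimizer.step()

trainable = sum(p.numel() for p in net.parameters() if p.requires_grad)
backbone_changed = not torch.equal(backbone_before, backbone.weight)
head_changed = not torch.equal(head_before, head.weight)

print("trainable parameters:", trainable)
print("backbone changed:", backbone_changed)
print("head changed:", head_changed)
```

After one optimizer step only the head's weights differ from their initial values, which mirrors what happens to `model_conv.fc` in the frozen ResNet-18 run.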