# Transfer Learning
This notebook covers transfer learning and how it can be implemented in PyTorch.

We will cover:
- What transfer learning is
- Using the pretrained ResNet-18 model
- Applying transfer learning to classify ants and bees
- Replacing the last fully connected layer
- Trying two methods: fine-tuning the whole network or training only the last layer
- Evaluating the results

Transfer learning is a machine learning method where a model developed for one task is reused as the starting point for a second task. For example, we can train a model to classify birds and dogs, then modify its last layer slightly and use the new model to classify bees and ants. This is a popular approach in deep learning because it allows new models to be built quickly: training a completely new model is expensive, whereas with a pretrained model we can replace only the last layer and avoid training the whole network from scratch.

<center><img src='./images/transferLearn.PNG' width=800px></center> 

Example:
- Here we use the pretrained ResNet-18 CNN, a model trained on more than a million images from the ImageNet database.
- This model is 18 layers deep and can classify images into 1000 object categories.
- In our example we have only two classes, so we only want to detect bees and ants.

We will also see how to use:
- ImageFolder: to load datasets from a folder structure
- Scheduler: to change the learning rate
- Transfer learning: to reuse a pretrained model for a new task

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os
import copy

In [None]:
# Set the data: 
mean = np.array([0.5, 0.5, 0.5])
std = np.array([0.25, 0.25, 0.25])

data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean, std)
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean, std)
    ]),
}

# The data is stored in the layout ImageFolder expects: a root folder with the
# subfolders train and val, each containing one folder per class (ants and bees)
# that holds the images of that class.
data_dir = 'data/hymenoptera_data'
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['train', 'val']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4,
                                             shuffle=True, num_workers=0)
              for x in ['train', 'val']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
class_names = image_datasets['train'].classes

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(class_names)


def imshow(inp, title):
    """Imshow for Tensor."""
    inp = inp.numpy().transpose((1, 2, 0))
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    plt.title(title)
    plt.show()


# Get a batch of training data
inputs, classes = next(iter(dataloaders['train']))

# Make a grid from batch
out = torchvision.utils.make_grid(inputs)

imshow(out, title=[class_names[x] for x in classes])
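
The `imshow` helper above has to undo the normalization before plotting. As a small check with NumPy only, the sketch below applies the same per-channel formula that `transforms.Normalize` uses and then inverts it. The random `image` array is made up, and it is laid out as H x W x C, matching the array inside `imshow` after the transpose.

```python
import numpy as np

mean = np.array([0.5, 0.5, 0.5])
std = np.array([0.25, 0.25, 0.25])

rng = np.random.default_rng(0)
image = rng.random((4, 4, 3))  # made-up H x W x C image with values in [0, 1)

normalized = (image - mean) / std   # the per-channel formula used by Normalize
restored = std * normalized + mean  # what imshow computes before plotting

print(np.allclose(restored, image))
```

With these values of `mean` and `std`, pixel values in [0, 1] map to the range [-2, 2].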

In [None]:
# Training loop
def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        optimizer.zero_grad()
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            if phase == 'train':
                scheduler.step()

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:.4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model
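
The training loop keeps the best weights with `copy.deepcopy(model.state_dict())`. A minimal sketch of why the deep copy matters, using a tiny `nn.Linear` as a stand-in for the real model (an assumption for illustration): the tensors returned by `state_dict()` share memory with the model, so without a copy the "best" weights would silently follow later updates.

```python
import copy

import torch
import torch.nn as nn

layer = nn.Linear(2, 1)  # tiny stand-in for the real model

live_view = layer.state_dict()                # tensors share memory with the layer
snapshot = copy.deepcopy(layer.state_dict())  # independent copy of the weights

# Simulate further training by changing the weights in place.
with torch.no_grad():
    layer.weight.add_(1.0)

view_changed = torch.equal(live_view["weight"], layer.weight)
snapshot_kept = not torch.equal(snapshot["weight"], layer.weight)
print(view_changed, snapshot_kept)

# Restoring the snapshot brings back the earlier weights.
layer.load_state_dict(snapshot)
```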

# transfer learning

In [None]:
#### Fine-tuning the convnet ####
# Load a pretrained model and reset the final fully connected layer.

# Available in the torchvision.models module.
# Note: torchvision 0.13+ deprecates `pretrained=True` in favor of
# `weights=models.ResNet18_Weights.DEFAULT`.
model = models.resnet18(pretrained=True)

# Replace the last fully connected layer. First read the number of input features of that layer.
num_ftrs = model.fc.in_features

# Create a new layer with 2 outputs and assign it as the new head.
# Alternatively, it can be generalized to nn.Linear(num_ftrs, len(class_names)).
model.fc = nn.Linear(num_ftrs, 2)  # 2 outputs in our case: ants or bees

model = model.to(device)

In [None]:
# Optimizer and loss function
criterion = nn.CrossEntropyLoss()

# Observe that all parameters are being optimized
optimizer = optim.SGD(model.parameters(), lr=0.001)

In [None]:
# Scheduler: to update the learning rate
step_lr_scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)  # every 7 epochs the learning rate is multiplied by 0.1, i.e. reduced to 10% of its previous value
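
To see the schedule this produces, here is a minimal sketch that steps a `StepLR` scheduler for 15 epochs and records the learning rate used in each one. The single dummy parameter and the empty `optimizer.step()` are assumptions that stand in for a real model and a real epoch of training.

```python
import torch
from torch import optim
from torch.optim import lr_scheduler

# A single dummy parameter is enough to construct an optimizer.
param = torch.zeros(1, requires_grad=True)
optimizer = optim.SGD([param], lr=0.001)
scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

lrs = []
for epoch in range(15):
    lrs.append(optimizer.param_groups[0]["lr"])  # rate used during this epoch
    optimizer.step()   # stands in for a full epoch of training
    scheduler.step()   # step the scheduler once per epoch, after the optimizer

for epoch in (0, 6, 7, 13, 14):
    print(epoch, f"{lrs[epoch]:.0e}")
```

Epochs 0 to 6 run at 1e-3, epochs 7 to 13 at 1e-4, and epoch 14 starts the 1e-5 stage.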

This is how we can use transfer learning.
In the first case we use a technique called 'fine-tuning': we train the whole model again, but only a little. That is, we fine-tune all the weights on the new data together with the new last layer.

The second option is to freeze all the earlier layers and train only the last layer. To do this we loop over the model's parameters and set `requires_grad = False` on each one.
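
The difference between the two options shows up in how many parameters remain trainable. The snippet below is a minimal sketch on a tiny stand-in network (an assumption for illustration; the layer sizes and the `count_trainable` helper are hypothetical, not part of the notebook).

```python
import torch.nn as nn
from torch import optim

# Tiny stand-in network: a "backbone" followed by a "head".
model = nn.Sequential(
    nn.Linear(8, 4),  # plays the role of the pretrained layers
    nn.ReLU(),
    nn.Linear(4, 3),  # plays the role of the original last layer
)


def count_trainable(module):
    """Number of parameter values that will receive gradients."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


# Option 1 (fine-tuning): every parameter stays trainable.
trainable_before = count_trainable(model)

# Option 2 (feature extractor): freeze everything that exists so far...
for param in model.parameters():
    param.requires_grad = False

# ...then add a new head; new layers have requires_grad=True by default.
model[2] = nn.Linear(4, 2)
trainable_after = count_trainable(model)

# Only the head's parameters are handed to the optimizer.
optimizer = optim.SGD(model[2].parameters(), lr=0.001, momentum=0.9)
print(trainable_before, trainable_after)
```

Freezing also lets `backward()` skip gradient computation for the frozen layers, which is why the second option usually trains faster.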

In [None]:
#---------------------------------------------------------------------------------------#
# StepLR Decays the learning rate of each parameter group by gamma every step_size epochs
# Decay LR by a factor of 0.1 every 7 epochs
# Learning rate scheduling should be applied after optimizer’s update
# e.g., you should write your code this way:

# for epoch in range(100):
#     train(...)
#     validate(...) or evaluate
#     scheduler.step()
#---------------------------------------------------------------------------------------#

# Train the fine-tuning model created above
model = train_model(model, criterion, optimizer, step_lr_scheduler, num_epochs=25)

In [None]:
#### ConvNet as fixed feature extractor ####
# Here, we need to freeze the whole network except the final layer.
# Setting requires_grad = False freezes the parameters so that their gradients are not computed in backward().
model_conv = torchvision.models.resnet18(pretrained=True)
for param in model_conv.parameters():
    param.requires_grad = False

# Parameters of newly constructed modules have requires_grad=True by default
num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, 2) # setup new last layer

model_conv = model_conv.to(device)

In [None]:
criterion = nn.CrossEntropyLoss()

# Observe that only parameters of final layer are being optimized as
# opposed to before.
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)

model_conv = train_model(model_conv, criterion, optimizer_conv,
                         exp_lr_scheduler, num_epochs=25)